Does Your Browser Support Multiple Languages?
... and would you like to see what's in those really BIG fonts?
Unicode is the world's standard for encoding text. Almost all of the characters
used in modern writing systems have already been assigned to unique
code positions, and work is under way to add some fairly exotic modern
scripts as well as to provide standardized encoding for ancient scripts.
If your browser has multilingual capabilities, it probably uses Unicode to
address the various letters, characters, and symbols shown on your screen.
If you're using Windows 2000, you already have a Unicode-aware system along
with special tools, such as the enhanced Character Map, enabling use and
display of special characters from large fonts like MS Hei, MS Song, GulimChe,
or MingLiU. (Beware that the W2K Character Map does not offer all of the valid
Unicode ranges and thus often can't show all of the glyphs in a font's
repertoire.)
But if you're using an older operating system, you may have tried to see
some of those special characters and become fairly frustrated when your font
viewer failed or choked.
Using Unicode-aware software such as Opera, Netscape, Internet Explorer, or
Outlook Express, to name a few, it is possible to view special characters and
simulate the cut and paste features of the Windows 2000 Character Map even on
the Windows 95 operating system.
HOW TO:
DOWNLOADS:
- Unicode reference sheets:
- on-line
- zipped for download so you can view them any time you want, off-line, for free. File size: 88,985 bytes. (When un-zipped and run, about 3,468,582 bytes.)
Many of the sheets appear blank. This is because many of the codes have
not yet been assigned or formalized, or because your browser and system
lack the font(s) required to display the characters.
- Code2000 shareware demo
Unicode font. (Version 1.17)
- Code2001: a font for Plane One. Updated April 5, 2020.
- Code2002: a beta test font for Plane Two! Updated April 5, 2020. Code2002 has about forty percent coverage of CJK ideographs in Plane Two and is still rather rough.
- Ol Cemet' font (freeware)
TEST PAGES (for Multilingual and Unicode support and testing):
- Script Links and Unicode test pages
- TITUS Unicode Support Testing with Multilingual Sample Pages, a comprehensive site maintained by Jost Gippert
and Carl-Martin Bunz.
- In addition to concise charts showing the character repertoire mapped
to hexadecimal codepoints, multilingual sample pages are available showing
many exotic scripts.
- Be sure to follow the link to the TITUS homepage to
learn about digitally encoding the treasury of our world's documentary
heritage to enable study and allow preservation of ancient writings in
their original scripts.
HOW TO USE UNICODE NCRs IN A WEB PAGE (HTML File)
NCR stands for Numeric Character Reference. All of the characters in
Unicode can be included as "text" in any HTML file which can be read
by all modern web browsers including Opera, Netscape, and Internet Explorer.
Microsoft® offers many free Unicode-based fonts covering many
different scripts, including Chinese.
Once you've made sure that the latest version of your favorite browser is
installed on your system, you can visit:
http://www.microsoft.com/windows/ie/downloads/recommended/ime/default.asp
... This is Microsoft's IME page and offers language packs including fonts
and input methods for various writing systems.
There are several fine Unicode-based commercial fonts available on the World
Wide Web.
Mr. Ronald Ogawa has a very nice font currently available as beta-test freeware
which includes Latin, Cyrillic, Greek and the UCAS (Unified Canadian Aboriginal
Syllabics). The UCAS are used for writing languages such as Cree, Naskapi, Ojibwe,
and Inuktitut. The font is called "Ballymun RO"
and is available at:
http://nexus.brocku.ca/rogawa/ucas
How To Add Special Characters (to HTML):
The web browser substitutes special characters from fonts whenever it finds
this sequence:
the ampersand symbol, the number sign, 99999, the semi-colon
(Where 99999 stands for the decimal number of any Unicode code point.
Characters in the Basic Multilingual Plane are numbered up to 65535; the
supplementary planes run up to 1114111.)
So, &#65; will produce the capital letter "A" because
the number 65 is the decimal code point assigned in Unicode for "A".
Of course, most of us would simply type the letter "A".
If the trademark symbol "™" is needed, however, it doesn't
appear on most keyboards.
&#8482; will produce the trademark symbol.
&#1044;&#1086;&#1083;&#1080;&#1085;&#1072; &#1050;&#1091;&#1082;&#1086;&#1083;
will produce "Dolina Kukol" in the Cyrillic script. This is the book
title "Valley of the Dolls" in Russian:
Долина Кукол.
(Many people know that
Долина Кукол
was written by Жаклин Сьюзан.)
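Typing each NCR by hand gets tedious for longer passages. As a minimal sketch, assuming Python is available, the following converts every non-ASCII character in a string to its decimal NCR. The helper name to_ncr is hypothetical and chosen only for illustration; html.unescape is the standard-library function that turns NCRs back into characters.

```python
import html


def to_ncr(text):
    """Replace each non-ASCII character with its decimal NCR (hypothetical helper)."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


# "Dolina Kukol" in Cyrillic, written with escapes so this file stays ASCII
title = "\u0414\u043e\u043b\u0438\u043d\u0430 \u041a\u0443\u043a\u043e\u043b"

encoded = to_ncr(title)
print(encoded)                          # the NCR form to paste into an HTML file
print(html.unescape(encoded) == title)  # decoding restores the original text
```

Plain ASCII letters pass through unchanged, so only the characters that are hard to type end up as NCRs.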
Although Unicode is comprehensive, it isn't yet complete. The Unicode
Consortium welcomes input from users of the world's various scripts.
It is possible to represent any of the letter-plus-diacritic combinations
found in Vietnamese with a single Unicode NCR (at least, as far as I can
tell...) But, some languages have combinations which aren't included in
Unicode as "precomposed" characters. One example is the Guarani language
which uses the letter g combined with the tilde. When a precomposed form
isn't directly encoded in Unicode it is necessary to use one of the characters
found in the combining diacritic range. So, to get the Latin letter 'g with
tilde', use the letter g followed by the NCR for the tilde as a combining
diacritic. Thus, "g&#771;" should produce the symbol
"g̃".
Many of the newer e-mail programs can be set to handle HTML, so multilingual
e-mail is possible.
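Returning to the combining-diacritic case described above: one way to check whether a letter-plus-diacritic pair has a precomposed form is to ask for its NFC normalization. This is only a sketch using Python's standard unicodedata module; the helper name compose is hypothetical.

```python
import unicodedata


def compose(text):
    """Return the NFC form: precomposed characters wherever Unicode has them."""
    return unicodedata.normalize("NFC", text)


a_tilde = compose("a\u0303")  # a + combining tilde: a precomposed form exists
g_tilde = compose("g\u0303")  # g + combining tilde: no precomposed form

for label, result in (("a + tilde", a_tilde), ("g + tilde", g_tilde)):
    ncrs = "".join(f"&#{ord(c)};" for c in result)
    print(label, "->", len(result), "code point(s):", ncrs)
```

When the result is still two code points, as with g and the tilde, the base letter followed by the combining NCR is the only way to write it.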
Several word processors allow "global search and replace" which
means that the word processor could substitute the NCR-macro any time it
finds a certain letter or combination of letters. For example, if the
HTML sheet needs a lot of the trademark symbols, the author could use
any keyboard symbol which isn't needed in the document and then "Find
and Replace" every appearance of that symbol with the desired NCR macro.
(I use the ` and I could replace all the ` signs with &#8482;)
The more sophisticated word processors allow for a series of "global
search and replace" operations to be "programmed".
So, someone wishing to set type in the Cherokee script would be able to
type phonetically using the Latin script and then use the pre-programmed
series to convert the Latin script file into Cherokee Unicode NCRs:
for example:
FIND AND REPLACE ALL te WITH &#5078;
FIND AND REPLACE ALL di WITH &#5079;
FIND AND REPLACE ALL ti WITH &#5080;
FIND AND REPLACE ALL do WITH &#5081;
FIND AND REPLACE ALL du WITH &#5082;
et cetera.
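The same series of replacements can be sketched in a short script instead of a word processor macro. The example below is an illustration only: it covers just the five syllables listed above, the names SYLLABLES and to_cherokee_ncr are hypothetical, and the sample input is an arbitrary test string rather than a real Cherokee word.

```python
import html
import re

# Partial table: only the five syllables listed above, as decimal code points.
SYLLABLES = {"te": 5078, "di": 5079, "ti": 5080, "do": 5081, "du": 5082}

# One pattern that matches any key, trying longer keys first.
PATTERN = re.compile(
    "|".join(sorted(map(re.escape, SYLLABLES), key=len, reverse=True))
)


def to_cherokee_ncr(latin):
    """Replace each known Latin syllable with its Cherokee NCR in a single pass."""
    return PATTERN.sub(lambda m: f"&#{SYLLABLES[m.group(0)]};", latin)


sample = to_cherokee_ncr("dodi tedu")  # arbitrary test input
print(sample)                 # NCR form for the HTML file
print(html.unescape(sample))  # what a browser with a Cherokee font would show
```

Doing all the substitutions in one pass means the output of one replacement is never rescanned by a later one, which matters once the table grows to include keys of different lengths.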
Folks used to have to make gifs or bmps (picture files) of any special symbol
or script and then insert the gifs into the document. Picture files take up
a lot of room and sometimes take forever to load. Using the font(s) that are
already installed on your web page reader's computer saves time and storage
space.
Sometimes, it is necessary to make a picture file of an unusual script.
The HTML author may wish to display a specific type face of a script (as one
example) because the author believes the reader's computer lacks the proper
font(s).
I do this by creating my HTML document, calling it up on my web browser
(off-line, because it is on my hard drive), using the "Screen Capture"
feature in my registered copy of IrfanView32, saving the "capture"
as a Windows BMP (bitmap) file, modifying the bitmap in Windows Paint (if
any modification is necessary, like trimming off the explorer bar),
opening the modified bitmap in IrfanView32,
then finally saving the bitmap as a gif. (Gifs are much smaller than bmps,
and thus take up less space and load much faster.)
WHILE VIEWING THE CHARTS...
If you want to see the difference between various large fonts, such as
Chinese Simplified or Chinese Traditional, try changing the default font
in your browser while viewing the specific reference sheet. If you are
using Microsoft Internet Explorer, pull down the "View" menu,
slide the mouse across the "Fonts" selection, and choose an
alternate font style from the listing.
My name is James Kass. Email me to let me know if you
like my site or have any suggestions.