[Indic-computing-users] Why Unicode Won't Work on the Internet - Part 2


> http://www.hastingsresearch.com/net/04-unicode-limitations.shtml
I've seen this site before; they basically have a very detailed explanation
of the missing characters from Chinese, Japanese and Korean scripts in
Unicode. Excellent research.
Now, is that *relevant* to the Indian scripts?

=====


Analysis
The problems perceived with the Tamil Unicode segment are as follows.
--------------------------------------------------------------------------------

Character Sequence
The primary problem is that the sequence of characters does not match the
traditional Tamil ordering method. This is why many Tamil scholars have
requested a reorganization of the Tamil segment. However, since some
software, including the popular Windows 2000, already implements Unicode in
its current form, it is no longer possible to change the Tamil segment
entirely.

In Unicode-based databases, the actual ordering of characters is determined
by specifying a collation sequence. Unicode has detailed documents on writing
these collation sequences. A document needs to be prepared, in accordance
with the Unicode standards, that specifies the correct Tamil ordering
sequence (tamiz neTunkaNakku).
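To illustrate why a tailored collation sequence is needed, here is a minimal
Python sketch that sorts bare Tamil consonants first by raw code point and
then by the traditional order. The TRADITIONAL_CONSONANTS table and the
traditional_key function are hypothetical names; the sketch covers only the
18 native consonants and ignores vowel signs, the pulli and grantha letters.
A real solution would be written as a tailoring for the Unicode Collation
Algorithm, not as an ad hoc key function like this one.

```python
# Hypothetical table: the 18 native Tamil consonants in traditional order.
# Comments give the Unicode character-name spellings.
TRADITIONAL_CONSONANTS = [
    "\u0B95", "\u0B99", "\u0B9A", "\u0B9E", "\u0B9F", "\u0BA3",  # ka nga ca nya tta nna
    "\u0BA4", "\u0BA8", "\u0BAA", "\u0BAE", "\u0BAF", "\u0BB0",  # ta na pa ma ya ra
    "\u0BB2", "\u0BB5", "\u0BB4", "\u0BB3", "\u0BB1", "\u0BA9",  # la va llla lla rra nnna
]

RANK = {ch: i for i, ch in enumerate(TRADITIONAL_CONSONANTS)}


def traditional_key(word: str) -> list[int]:
    # Known consonants get their traditional rank; any other code point
    # sorts after all of them, ordered by its code point value.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]


letters = ["\u0BA9", "\u0BAA", "\u0BA8"]  # nnna, pa, na

# Code point order puts nnna (U+0BA9) between na (U+0BA8) and pa (U+0BAA).
by_code_point = sorted(letters)
# Traditional order puts nnna last, after pa.
by_tradition = sorted(letters, key=traditional_key)

print(by_code_point)
print(by_tradition)
```

The same mismatch appears for RRA and the LLA/LLLA pair, which Unicode
places among the other consonants rather than at the end of the sequence.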


--------------------------------------------------------------------------------

Missing Characters
Some characters used in Tamil script are missing from Unicode. These are not
native to the Tamil language, but they are often used in Tamil documents.
Significant among them are SRI, KSHA and the Tamil numeral ZERO.

While zero is not used in the original Tamil number system, it is often used
in present-day documents when numbers are written in Tamil numerals using the
international (positional) number system. It is noted [1] that Mauritius
currency uses Tamil numerals with zero. This character should be added at
U+0BE6.
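The difference between the two ways of writing numbers can be sketched in
Python. This assumes ZERO sits at U+0BE6 as proposed above, next to the
existing digits one to nine at U+0BE7..U+0BEF and the signs for ten, hundred
and thousand at U+0BF0..U+0BF2. The function names positional and traditional
are hypothetical, and the traditional form is simplified: it handles only
1..9999 and omits the multiplier "one" before ten, hundred and thousand.

```python
# Assumption: TAMIL DIGIT ZERO at U+0BE6, so digit d is chr(ZERO + d).
ZERO = 0x0BE6
TEN, HUNDRED, THOUSAND = "\u0BF0", "\u0BF1", "\u0BF2"


def positional(n: int) -> str:
    """International (place-value) style: every place needs a digit, including zero."""
    if n < 0:
        raise ValueError("sketch only handles non-negative numbers")
    return "".join(chr(ZERO + int(d)) for d in str(n))


def traditional(n: int) -> str:
    """Simplified traditional style for 1..9999: digit + sign, empty places skipped."""
    if not 1 <= n <= 9999:
        raise ValueError("sketch only handles 1..9999")
    out = []
    for value, sign in ((1000, THOUSAND), (100, HUNDRED), (10, TEN)):
        q, n = divmod(n, value)
        if q == 1:
            out.append(sign)                   # "one hundred" is just the hundred sign
        elif q > 1:
            out.append(chr(ZERO + q) + sign)   # e.g. two + hundred sign
    if n:
        out.append(chr(ZERO + n))              # remaining units digit
    return "".join(out)


print(positional(205))   # two, zero, five
print(traditional(205))  # two, hundred sign, five
```

For 205 the traditional form needs no zero at all, because the empty tens
place is simply left out; the positional form must mark it, which is why
documents using the international system need a Tamil ZERO.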

KSHA and SRI are grantha characters that are not native to the Tamil
language. However, they have been used in Tamil script for over six centuries
[2]. Devanagari and many other Indian scripts treat these characters as
ligatures (combinations of several characters). In Tamil they are considered
separate characters, not ligatures. Thus, including these characters in the
Tamil Unicode range makes good sense and will eliminate complex ligature
handling for Tamil. (Possible locations: KSHA = U+0BBA; SRI = U+0BF3.)
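Without dedicated code points, KSHA and SRI have to be stored as sequences
that the renderer must join into a single glyph. The short Python check
below, using the standard unicodedata module, lists the code points behind
each one as they are commonly encoded; KSHA, SRI and describe are only
illustrative names.

```python
import unicodedata

# KSHA: KA + VIRAMA (pulli) + SSA
KSHA = "\u0B95\u0BCD\u0BB7"
# SRI: SA + VIRAMA (pulli) + RA + VOWEL SIGN II
SRI = "\u0BB8\u0BCD\u0BB0\u0BC0"


def describe(seq: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


for label, seq in (("KSHA", KSHA), ("SRI", SRI)):
    print(label, len(seq), "code points:", describe(seq))
```

A Tamil reader sees one character in each case, but software sees three and
four code points respectively, and must rely on ligature handling to display
them correctly.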

Another character often requested in the Tamil Unicode range is OM. The
Devanagari and Gujarati ranges in Unicode have a separate character for OM,
at U+0950 and U+0AD0 respectively. If there is a consensus on adding this
character, it can be added at U+0BD0.

There have been some suggestions to include the Rupee sign and the Tamil
fraction symbols in Unicode. These characters are not in popular use, so they
may need to be considered only when they come into use.


--------------------------------------------------------------------------------

Incorrect Characters
The Tamil Unicode range has a character TAMIL SIGN ANUSVARA at U+0B82. This
character does not exist in native Tamil script, and its existence in grantha
script needs to be examined. It is sometimes mistaken for the Tamil oRRu,
which is at U+0BCD (TAMIL SIGN VIRAMA). It would be better if this character
were deprecated from the Unicode standard.
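Because U+0B82 is easily confused with the pulli, existing text may contain
it by mistake. Below is a minimal Python sketch of a cleanup pass. The rule
it applies, replacing U+0B82 with U+0BCD whenever it directly follows a Tamil
consonant, is an assumption made for illustration, and fix_anusvara is a
hypothetical helper; a real tool should probably flag such cases for review
rather than rewrite them silently.

```python
ANUSVARA = "\u0B82"  # TAMIL SIGN ANUSVARA
VIRAMA = "\u0BCD"    # TAMIL SIGN VIRAMA (pulli / oRRu)


def is_tamil_consonant(ch: str) -> bool:
    # Tamil consonant letters lie in U+0B95..U+0BB9 (the range has unassigned gaps).
    return "\u0B95" <= ch <= "\u0BB9"


def fix_anusvara(text: str) -> tuple[str, list[int]]:
    """Replace ANUSVARA after a consonant with VIRAMA.

    Returns the corrected text and the indices that were changed.
    Assumed heuristic: ANUSVARA after a consonant is a mistyped pulli.
    """
    chars = list(text)
    changed = []
    for i in range(1, len(chars)):
        if chars[i] == ANUSVARA and is_tamil_consonant(chars[i - 1]):
            chars[i] = VIRAMA
            changed.append(i)
    return "".join(chars), changed


fixed, positions = fix_anusvara("\u0BAE\u0B82")  # MA followed by a stray ANUSVARA
print(positions)
```

Returning the changed positions alongside the text keeps the pass auditable,
which matters because the heuristic can be wrong for genuine grantha usage.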

The character TAMIL SIGN VISARGA (Aytam) at U+0B83 is incorrectly classified
as a modifier in Unicode, while in Tamil script it is considered a character
in its own right. This error needs to be corrected. The glyph that represents
this character should also be changed to remove the character placeholder
(dotted circle).


(http://www.tamil.net/people/sivaraj/tamil_unicode.html)
