[Indic-computing-users] Why Unicode Won't Work on the Internet - Part 2
From: <ma...@ch...> - 2003-01-20 18:05:38
> http://www.hastingsresearch.com/net/04-unicode-limitations.shtml

I've seen this site before. It gives a very detailed explanation of the
characters missing from the Chinese, Japanese and Korean scripts in Unicode.
Excellent research. Now, is that *relevant* to the Indian scripts?

=====
Analysis

The problems perceived with the Tamil Unicode segment are as follows.

----------------------------------------------------------------------------
Character Sequence

The primary problem is that the sequence of characters does not match the
traditional Tamil ordering method. This is the reason many Tamil scholars
have requested a reorganization of the Tamil segment. However, since some
software, including the popular Windows 2000, has already implemented
Unicode in its current form, it is no longer possible to change the Tamil
segment entirely.

The actual ordering of characters in Unicode-based databases is determined
by specifying a collation sequence. Unicode has detailed documents on
writing these collation sequences. A document needs to be prepared, in
accordance with Unicode standards, for the correct Tamil ordering sequence
(tamiz neTunkaNakku).

----------------------------------------------------------------------------
Missing Characters

Some characters used in Tamil script are missing from Unicode. They are not
native to the Tamil language but are often used in Tamil documents.
Significant among them are SRI, KSHA and the Tamil numeral ZERO.

While zero is not used in the original Tamil number system, it is often
used in present-day documents when numbers are written in Tamil numerals
using the international number system. It is noted [1] that Mauritius uses
Tamil numerals with zero on its currency. This character should be added at
U+0BE6.

KSHA and SRI are Grantha characters not native to the Tamil language.
However, they have been used in Tamil script for over six centuries [2].
Devanagari and many other Indian scripts treat these characters as
ligatures (combinations of several characters). In Tamil they are
considered separate characters, not ligatures. Including them in the Tamil
Unicode range therefore makes good sense and would eliminate complex
ligature handling for Tamil. (Possible locations: KSHA = U+0BBA;
SRI = U+0BF3.)

Another character often requested for the Tamil Unicode range is OM. The
Devanagari and Gujarati ranges in Unicode have a separate character for OM,
at U+0950 and U+0AD0. If there is a consensus on adding this character, it
can be added at U+0BD0.

There have been some suggestions for including the Rupee sign and Tamil
fraction symbols in Unicode. These characters are not in popular use and
need to be considered only when they come into use.

----------------------------------------------------------------------------
Incorrect Characters

The Tamil Unicode range has a character for TAMIL SIGN ANUSVARA at U+0B82.
This character does not exist in native Tamil script. Its existence in
Grantha script needs to be examined. It is sometimes mistaken for the Tamil
oRRu, which is at U+0BCD (TAMIL SIGN VIRAMA). It would be better if this
character were deprecated from the Unicode standard.

The character TAMIL SIGN VISARGA (Aytam) at U+0B83 is incorrectly
classified as a modifier in Unicode, while it is considered a character in
Tamil script. This error needs to be corrected. The glyph that represents
this character should be changed to remove the character placeholder
(dotted circle).

(http://www.tamil.net/people/sivaraj/tamil_unicode.html)
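
To illustrate the collation point made under Character Sequence: the sketch
below, in Python, sorts a few Tamil consonants first by raw code point and
then with a custom sort key. It is only a minimal sketch, not a real
collation. The TRADITIONAL_ORDER table, the tamil_sort_key helper and the
choice to place the Grantha letters last are all assumptions made for
illustration, and vowel signs and the oRRu (virama) are not handled at all.

```python
# Hypothetical, simplified ordering table for Tamil base letters.
# The order is an assumption for illustration, not a complete collation.
TRADITIONAL_ORDER = [
    # vowels: a aa i ii u uu e ee ai o oo au
    0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
    0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94,
    # aytam (TAMIL SIGN VISARGA)
    0x0B83,
    # native consonants: ka nga ca nya tta nna ta na pa ma ya ra la va
    0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3, 0x0BA4,
    0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0, 0x0BB2, 0x0BB5,
    # ... then llla lla rra nnna, which code point order places earlier
    0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9,
    # Grantha consonants ja ssa sa ha, placed last (assumption)
    0x0B9C, 0x0BB7, 0x0BB8, 0x0BB9,
]

# Rank of each letter in the table above.
RANK = {chr(cp): i for i, cp in enumerate(TRADITIONAL_ORDER)}


def tamil_sort_key(text):
    """Map each character to its rank in the table.

    Characters not in the table sort after all listed letters,
    in code point order among themselves.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in text]


# nnna, pa, rra, la, lla, llla
letters = [chr(cp) for cp in (0x0BA9, 0x0BAA, 0x0BB1, 0x0BB2, 0x0BB3, 0x0BB4)]

by_code_point = sorted(letters)                      # default string ordering
by_tradition = sorted(letters, key=tamil_sort_key)   # table-driven ordering

print(" ".join(by_code_point))
print(" ".join(by_tradition))
```

With this table the code point sort keeps nnna (U+0BA9) ahead of pa
(U+0BAA), while the table-driven sort moves it to the end of the native
consonants. A real system would more likely express the same order as a
tailoring of the Unicode collation rules than as a hand-built table like
this one.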