[Indic-computing-devel] Conflicting Encoding schemes

Conflicting Encoding schemes

Indian language usage on computers is primarily focused on document management, which includes office correspondence, DTP and Web pages. In that context, even sophisticated technologies providing solutions for localization at the OS level or for database management often fail. With this in view, the developer community often provides a solution by hacking the available font technology. Every piece of language software available today is based on age-old glyph encoding techniques.

In spite of the need and the passion, NO technology has been developed in India to cater to the needs of the Indian languages. This is the failure of various Govts and Govt bodies to develop any technology. It is strange that our bureaucracy is even bent upon supporting outdated techniques in the name of language technology, which does not serve the cause of the languages.

When the Govts failed to recognize the need for technology development for Indian languages, passionate developers continued to develop solutions based on font techniques. Due to lack of vision, some of the state governments are satisfied with this and have announced certain glyph encoding schemes.

In this context, the Government of Tamilnadu announced two glyph encoding schemes as standards during 1999: TAM (Tamil Monolingual) and TAB (Tamil Bilingual). Alongside these, TSCII (an encoding scheme yet to be recognized by the Govt of Tamilnadu) is being used widely on the web.

The Govt of Karnataka also announced a monolingual glyph encoding standard at the end of 2000. No other encoding, whether a character encoding or a glyph encoding for bilingual requirements, has been announced as a standard by the Govt of Karnataka.

Now, the Govt of India is actively involved in evolving a national glyph encoding standard for all Indian languages.

For effective language content processing, language identification is a crucial requirement. Some of the standards recognized this need and prefix font names with appropriate language IDs, for example TAM and TAB for Tamil fonts. Similarly, the Govt of India is also following some such scheme. However, the monolingual standard announced by the Govt of Karnataka doesn't prefix font names with any ID for the glyph standard. This also opens a Pandora's box.
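To illustrate why the prefix matters, here is a minimal Python sketch of language identification from a font name. Only the TAM and TAB prefixes come from the discussion above; the separator rule (prefix followed by a hyphen, underscore or space), the function name and the font names are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Prefixes taken from the post; the descriptions are informal labels.
PREFIX_TO_SCHEME = {
    "TAM": ("Tamil", "monolingual glyph encoding"),
    "TAB": ("Tamil", "bilingual glyph encoding"),
}

# Assumption: the ID is followed by '-', '_' or a space, e.g. "TAB-Example".
_PREFIX_RE = re.compile(r"^(TAM|TAB)[-_ ]", re.IGNORECASE)


def identify_scheme(font_name: str) -> Optional[Tuple[str, str]]:
    """Return (language, scheme) if the font name carries a known ID prefix."""
    match = _PREFIX_RE.match(font_name.strip())
    if match is None:
        # No ID prefix: the language cannot be inferred from the font name.
        return None
    return PREFIX_TO_SCHEME[match.group(1).upper()]


if __name__ == "__main__":
    # All font names here are hypothetical.
    for name in ("TAB-Example", "TAM-Example", "Example Kannada"):
        print(name, "->", identify_scheme(name))
```

A font that follows a standard without an ID prefix falls into the `None` branch, so the processing software has to learn the language some other way.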

Strangely, KGP has evolved an alternate encoding scheme called KSCLP, which is not part of the Govt of Karnataka standard. As the Govt of Karnataka insists on using the glyph encoding schemes, and due to the lack of technological support for KSCLAP, there are no users of KSCLAP for web requirements.

While supporting ISCII, the support needs to be extended to PC-ISCII, which is also part of ISCII.

While developing Indic support in Mozilla, every glyph encoding can be supported by providing internal conversion, as pointed out for ISCII. This means that Mozilla would always process Unicode data. This would strengthen the technology being developed for Indian languages in terms of linguistic components.
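A rough Python sketch of such an internal conversion is given here. The byte values in the table are invented for illustration and do NOT correspond to TAM, TAB, TSCII or any real font encoding; a real converter would need the published table for each scheme. The sketch also assumes the glyph data is stored in visual order, so a vowel sign drawn to the left of a consonant arrives first and has to be moved after the consonant to get Unicode's logical order.

```python
# Hypothetical glyph table (invented byte values, for illustration only).
GLYPH_TO_UNICODE = {
    0xA1: "\u0B95",  # TAMIL LETTER KA
    0xA2: "\u0BAE",  # TAMIL LETTER MA
    0xB1: "\u0BBE",  # TAMIL VOWEL SIGN AA (drawn to the right)
    0xC1: "\u0BC6",  # TAMIL VOWEL SIGN E  (drawn to the left)
    0xC2: "\u0BC7",  # TAMIL VOWEL SIGN EE (drawn to the left)
}

# Tamil vowel signs that are drawn before the consonant they follow in Unicode.
PREBASE_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def glyphs_to_unicode(data: bytes) -> str:
    """Convert visually ordered glyph bytes to logically ordered Unicode."""
    out = []
    pending = None  # a left-side vowel sign waiting for its consonant
    for byte in data:
        if byte < 0x80:
            # ASCII passes through; flush any sign that never found a base.
            if pending is not None:
                out.append(pending)
                pending = None
            out.append(chr(byte))
            continue
        char = GLYPH_TO_UNICODE.get(byte, "\uFFFD")  # unknown glyph
        if char in PREBASE_SIGNS:
            if pending is not None:
                out.append(pending)
            pending = char
        else:
            # Simplification: any other mapped glyph is treated as the base.
            out.append(char)
            if pending is not None:
                out.append(pending)
                pending = None
    if pending is not None:
        out.append(pending)
    return "".join(out)


if __name__ == "__main__":
    # Visual order "EE-sign, KA" becomes logical order "KA, EE-sign".
    text = glyphs_to_unicode(bytes([0xC2, 0xA1]))
    print([f"U+{ord(c):04X}" for c in text])
```

With one such converter per scheme at the input boundary, everything downstream of it sees only Unicode text.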

N. ANBARASAN
APPLESOFT
BANGALORE - 10