RE: [Indic-computing-users] why_OTF : Can txt file be in txt format.
From: Andy W. <And...@bt...> - 2003-02-09 21:18:48
Jitendra wrote:
> Nice to hear about the OTF meet in Bangalore.
> The attachment (in the announcement of the OTF meet) of Karunakar's
> 'why_otf.txt' was not legible. Can the same be put up once
> again. Jitendra
>

Here you are (see below).

Andy


Why OpenType?

Background
----------
Any language is written using a script. A script (writing system) is a
collection of graphical shapes or symbols that evolved over time, each
usually representing a distinct sound, idea or thing. These basic
shapes are also called letters, alphabets or characters; 'character' is
the term widely used in computing. In digital typography the graphical
shape of a character is called a glyph, and a font is a collection of
glyphs of similar style. Glyph data is either a bitmap of the shape or
a set of points describing the outline of the glyph.

The collection of characters for a script is called the character set.
To use the script on computers, which do all their processing in bits
and bytes, each character is assigned a numeric code (called a code
point or character code), giving us a character encoding. The codes are
usually 7-bit, 8-bit or 16-bit. They are usually decided upon by the
countries where the language is mainly used and accepted by standards
bodies such as ISO, Unicode or BIS (Indian).

For some characters in Devanagari:
? ? ? ? ? ? ? ?

Issues with TrueType
--------------------
A TrueType font stores font data in tables, one of which is the CMAP
table (character to glyph mapping). For 8-bit fonts this table is 8-bit
and can therefore hold a maximum of 256 glyphs. Of these 256 slots only
about 190-200 are actually available for glyphs, as the rest are
occupied by control codes.

This is a problem for Indic scripts. Apart from the basic character set
(around 60-100 unique characters, i.e. consonants, vowels, vowel
matras, punctuation marks etc.), they have a large number of
consonant-vowel combinations and conjuncts (2 or more consonants
combining), usually 200-1000 or more.
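
The figure of roughly 190-200 usable slots can be checked with a quick
count. The sketch below is only an estimate: it assumes the unavailable
slots are the C0 control codes (0x00-0x1F), DEL (0x7F) and the C1
control codes (0x80-0x9F), as in ISO 8859-style encodings. Real
platform encodings differ in detail, and the variable names are purely
illustrative.

```python
# Rough count of the slots left for glyphs in an 8-bit font encoding.
# Assumption: control codes occupy C0 (0x00-0x1F), DEL (0x7F)
# and C1 (0x80-0x9F); actual encodings may reserve more or fewer.
TOTAL_SLOTS = 2 ** 8

control_codes = set(range(0x00, 0x20)) | {0x7F} | set(range(0x80, 0xA0))
usable_slots = TOTAL_SLOTS - len(control_codes)

print(TOTAL_SLOTS, len(control_codes), usable_slots)  # 256 65 191
```

Under these assumptions 191 slots remain, which falls within the range
quoted above.
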
For Devanagari, for example, we have 35 consonants and 16 vowels, so
theoretically we would have

    35*35 + 35*16 = 1225 + 560 = 1785

1785 glyphs cannot be put in an 8-bit TTF font. This number can be
reduced by studying the roots of the script and making a few
compromises on the conjuncts to be used. An obvious solution is to
break the script elements down into glyph parts, which can be combined
to give the required shapes. One evident simplification is that glyphs
for consonant-vowel combinations are not needed; they can be built from
a basic set of consonants, vowels and vowel matras (so 35+16+16 = 67).
Most consonant conjuncts use half forms, so many of them can be covered
just by having the half forms.

So three things are needed:

  i)   The large glyph set needs to be reduced to 190-200.
  ii)  A mapping function for converting characters to glyphs is
       needed. Here we will have 1-to-1, 1-to-many and many-to-1
       mappings.
  iii) Appropriate rendering features to give proper positioning.

With an 8-bit TTF, after making a few compromises in the number of
conjuncts, the glyph set can be brought into the 190-200 range. But now
glyphs are not accessed by character codes but by glyph codes, i.e. a
font encoding (such-and-such a glyph is given such-and-such a code),
which can vary from font to font. The complexity, however, increases in
step (ii).

Step (i) is the domain of the font designer, but (ii) and (iii) have to
be done by the application developer. He has to write a library or API
to take care of (ii) and (iii). This library will be used wherever
there is a need for script processing and display. It could be used
universally in many applications if the font encoding were fixed, so
that many fonts would work with the same library. Unfortunately, in the
current situation there is no standardized font encoding: font vendors
have different encodings and application developers adopt different
approaches to script processing.

Step (i) could be avoided by having a 16-bit table for character to
glyph mapping, which leaves scope for a large glyph set. This is what a
Unicode font gives.
Such a font provides a basic range for the script and a private area in
which one can put one's own glyphs. But there is still no standard
access mechanism that could simplify (ii) and (iii). Script processing
logic and language details still have to be known by the application
developer, and much of the font-specific detail still has to be
hard-coded into the application or the library, as said above. Even if
(i)-(iii) could be achieved in some way, it does not relieve the
programmer of the burden of knowing the language processing details,
nor does it ease the job of the font designer or give him the
flexibility to show his creativity. There is also an interdependence
between font designer and programmer: with Unicode fonts the font
designer's job becomes a little easier, but not enough to make him and
the programmer independent.

All this can be made easy if we have:

- A font access mechanism that depends not on the font encoding but on
  the character encoding.
- Provision for a large glyph set.
- Some way to keep the mapping information within the font.
- Script processing available as a library with a simple API to access
  such fonts and do text rendering.

This is what the OpenType format provides, and more. OpenType is an
extension to TrueType and uses Unicode as the standard for character
encoding. It provides additional tables for defining a rich set of
mappings between characters and glyphs. It also provides for a large
glyph set and even glyph variants. All the features of the OpenType
format can be used through an application-independent, preferably
system-level, library with an API usable by applications.

For Indic script processing, OpenType tables like GSUB (glyph
substitution) and GPOS (glyph positioning) let the font designer define
his own rules on which conjuncts or combinations are made available.
The application programmer is relieved of the burden of knowing all the
linguistic details.
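
To make the division of labour concrete, here is a minimal sketch of
how substitution rules kept inside the font can drive conjunct
formation. It is a toy model, not the OpenType binary format: the
dictionaries CMAP and SUBSTITUTIONS, the glyph names such as "k_ssa"
and "ka.half", and the function shape() are all hypothetical, and a
single longest-match pass stands in for real GSUB lookup processing.

```python
# Toy model of an OpenType-style font. All names and rules here are
# hypothetical; a real font stores this data in binary cmap/GSUB tables.

KA, VIRAMA, SSA, TA = "\u0915", "\u094D", "\u0937", "\u0924"

# cmap-like table: Unicode character -> glyph name (1-to-1).
CMAP = {KA: "ka", VIRAMA: "virama", SSA: "ssa", TA: "ta"}

# GSUB-like rules: glyph sequence -> replacement glyphs (many-to-1 here).
SUBSTITUTIONS = {
    ("ka", "virama", "ssa"): ["k_ssa"],  # full conjunct glyph
    ("ka", "virama"): ["ka.half"],       # half form of ka
}


def shape(text):
    """Map characters to glyphs, then apply the longest matching rule."""
    glyphs = [CMAP[ch] for ch in text]
    longest = max(len(key) for key in SUBSTITUTIONS)
    out, i = [], 0
    while i < len(glyphs):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(longest, 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in SUBSTITUTIONS:
                out.extend(SUBSTITUTIONS[key])
                i += n
                break
        else:
            # No rule matched at this position: keep the glyph as it is.
            out.append(glyphs[i])
            i += 1
    return out


print(shape(KA + VIRAMA + SSA))  # ['k_ssa']
print(shape(KA + VIRAMA + TA))   # ['ka.half', 'ta']
```

The application passes only Unicode characters; the glyph names and
rules belong to the font, so another font could ship a different rule
table without any change to shape(). Actual GSUB processing applies
lookups per script and feature rather than in one longest-match pass,
so this should be read as a simplification.
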
OpenType also more or less makes the concept of a glyph standard or
font encoding standard redundant, again giving font vendors the freedom
to follow their own glyph sets without really affecting the
application.

To summarize, OpenType provides a lot of benefits to Indic computing
and also renders redundant some of the issues faced in Indic computing.