indic-computing-devel Mailing List for The Indic-Computing Project (Page 23)

I'm trying to refine my understanding of the basic algorithms involved
in Indic glyph rendering, for future inclusion into the Handbook.

My current understanding:

There seem to be two major issues when rendering Indic scripts ---
given a sequence of code points representing characters in some
encoding like Unicode or ISCII:

   (A) the presentation (i.e. visual) order of glyphs need not match
       the order of code points in the sequence.

   (B) these scripts use a number of glyph shapes representing
       combinations of characters, so there isn't a 1-1 mapping of
       character encoding code points to glyphs.

(A) can come about because of the structure of the character encoding
  used.  For example, Unicode follows the convention that the code
  point for a 'base character' precedes the code points for any
  modifiers.  However, some Indic scripts require that the glyphs
  representing certain modifiers (e.g. "vowel marks") be placed before
  the glyph for the 'base character'.

  [Note: One could also design a character encoding in which text is
   stored in "visual" order.  Some transliteration schemes for
   Indian languages use such "visual" order encodings. ]
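To make (A) concrete, here is a minimal sketch of re-ordering for one
case I am sure of: DEVANAGARI VOWEL SIGN I (U+093F) follows its
consonant in Unicode but is drawn to its left.  The function name and
the one-entry table are my own, for illustration only.  The sketch
moves the mark before the single preceding character; it does not
handle a multi-consonant cluster, which a real renderer would have to.

```python
# Vowel signs that are stored after the base consonant but drawn before it.
# Only one entry is listed; a real table would cover each script.
PRE_BASE_MATRAS = {"\u093F"}  # DEVANAGARI VOWEL SIGN I


def to_visual_order(text):
    """Return the code points of `text` in a simplified visual order.

    Assumption: each pre-base vowel sign moves in front of exactly one
    preceding character (no conjunct handling).
    """
    out = []
    for ch in text:
        if ch in PRE_BASE_MATRAS and out:
            # Place the mark just before the character it follows.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    # KA + VOWEL SIGN I in logical order
    print([hex(ord(c)) for c in to_visual_order("\u0915\u093F")])
```

A post-base mark such as VOWEL SIGN AA (U+093E) passes through
unchanged, so logical and visual order agree in that case.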

(B) is a property of the script: most (all?) Indic scripts have
  special glyph shapes for double consonants, consonant+vowel
  combinations, etc.

So, our rendering process has to map:

   `M' code points -> `N' language glyph shapes

and in doing so we have to do glyph re-ordering (A) and composite
glyph selection (B).
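One way to picture (B) is as a longest-match lookup over the code
point sequence.  The sketch below assumes a small hand-written table;
the glyph names ("k_ssa", "t_ra") and the function are hypothetical
and not taken from any font or library.  It only shows how `M' code
points can collapse into fewer glyphs, and is not a claim about how
any existing renderer does it.

```python
# Hypothetical ligature table: code point sequence -> composite glyph name.
LIGATURES = {
    ("\u0915", "\u094D", "\u0937"): "k_ssa",  # KA + VIRAMA + SSA
    ("\u0924", "\u094D", "\u0930"): "t_ra",   # TA + VIRAMA + RA
}
MAX_LEN = max(len(key) for key in LIGATURES)


def select_glyphs(chars):
    """Greedy longest-match mapping of code points to glyph names.

    Code points with no table entry are passed through unchanged and
    stand in for their own default glyph.
    """
    glyphs = []
    i = 0
    while i < len(chars):
        # Try the longest candidate first, down to length 2.
        for n in range(min(MAX_LEN, len(chars) - i), 1, -1):
            key = tuple(chars[i:i + n])
            if key in LIGATURES:
                glyphs.append(LIGATURES[key])
                i += n
                break
        else:
            # No composite glyph starts here.
            glyphs.append(chars[i])
            i += 1
    return glyphs


if __name__ == "__main__":
    # KA + VIRAMA + SSA + VOWEL SIGN AA: four code points, two glyphs
    print(select_glyphs("\u0915\u094D\u0937\u093E"))
```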

[Q: Are there any other issues to be taken care of when rendering
    Indic scripts? ]

Some Indian language fonts are designed to contain "partial glyphs";
these fonts require a sequence of glyphs to be specified to render one
full language glyph on screen (for example, Baraha for Kannada).  For
such fonts, each of the `N' language glyph shapes selected above will
need to be mapped further into a sequence of font-specific glyph
indices, giving `O' indices in all.
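The extra step for "partial glyph" fonts could be sketched as a table
from each language glyph to a list of font glyph indices.  Everything
in the table below is made up for illustration: the glyph names and
index values are not real Baraha (or any other font) data, and using
index 0 for a missing glyph is my own assumption.

```python
# Hypothetical per-font table: language glyph -> partial glyph indices.
FONT_PARTS = {
    "k_ssa": [201, 87],  # drawn from two partial glyphs
    "\u093E": [12],      # drawn from a single glyph
}


def to_font_indices(glyphs, table=FONT_PARTS, missing=0):
    """Expand N language glyphs into O font-specific glyph indices.

    Glyphs absent from the table map to the `missing` index.
    """
    indices = []
    for glyph in glyphs:
        indices.extend(table.get(glyph, [missing]))
    return indices


if __name__ == "__main__":
    print(to_font_indices(["k_ssa", "\u093E"]))
```

Keeping this table separate from glyph selection means the script
logic stays the same while only the table changes from font to font.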

My questions are:

  - do we do reordering of glyphs (A) before looking for composite
    glyphs (B), or is it best done the other way round?

  - do (A) and (B) have to be done multiple times?

  - is there ONE algorithm that can handle correct glyph rendering for
    every Indic script, or are the glyph selection/re-ordering
    algorithms script specific?

Thanks in advance for answers; this discussion will form the basis of
a section on Indic rendering in our Indic-Computing Handbook.

Regards,
Koshy
<jk...@fr...>

indic-computing-devel — Discussing the development of tools for Indian language information processing