Re: [Indic-computing-devel] Re: NCST Indix Examined


Arun,

as> That is for efficiency reasons. man XTextExtents. XQueryTextExtents
as> (something more powerful than that) was the proposed new mechanism.

And even *that* isn't being used in the Athena widget set.

Folks, please read the code before offering suggestions.  It would
help to keep the signal-to-noise ratio reasonable.

as> The proposed new algorithm:
as> 
as> FindPosition(textpos, startx, pixel_width)
as> 	// Make a single request to the X Server - this doesn't exist in
as> 	// the X protocol yet
as> 	nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
as> 		               // other args font etc)
as> 
as> 	// everything starting from textpos to textpos + nchars is "selected"

Well, you've just changed your X client.

I thought you were going to describe an algorithm that would allow X
clients to work unchanged in the presence of arbitrary glyph
reordering, substitution and positioning by the X server.

as> I just said it was inconsistent in the use of character codes vs glyph
as> codes - not that it was ambiguous or in error. This seems to be a
as> consequence of it being designed at a time, when the distinction between
as> the two was not as important as it is today.

The distinction between characters and glyphs is important even for
Latin scripts.  Consider ligatures and diacritical marks; some Latin
encodings have separate character codes for the diacritical marks; a
"c" and a "cedilla" (two code points) together can have a different
glyph in these languages.  Similarly "f", "f" and "i" combine to form
a distinct glyph "ffi".

The X protocol was explicitly designed NOT to support these kinds of
transformations.

as> And you yourself (along with others on this list) accepted that certain
as> references were ambiguous. What's all the fuss about then ? :)

One place in the X protocol specification uses the phrase 'string of
characters'.  Now the word 'character' has (today) become an
overloaded term, with meanings ranging from the visual
representation (the letterform), the 'abstract' character itself, the
code point assigned to the character in a given encoding, a specific
glyph in a font, etc.  The exact meaning is usually clear from the
context.

Nowhere does the X11 protocol specification say that 'character codes'
are to be used in text drawing requests.  In fact, it EXPLICITLY
states that the semantics of character `codes' are NOT to be honored
by the X server.

If you change this, you'll end up with some other "protocol", not the
X protocol.  This new graphics "protocol" is however:

a. inconsistent
   i. how do you map a screen coordinate back to position in the text 
      stream if you are doing complex text rendering?

b. incomplete
   i.  how do you specify text in a different character encoding?
   ii. how do you access glyphs in a font that do not correspond to
       a `character'?

c. suffers from new problems
   i.  If you are indexing fonts using character codes, how do you use
       fonts that do not contain glyphs of 'letters'?
       You don't want glyph combining and reordering happening for
       the glyphs in a symbol font for example.

...etc...

as> need to come up with the pros and cons of each approach. I've given
as> several tangible advantages of implementing it on the X
as> server. Perhaps you could articulate your thoughts on why you think
as> it should be done in a client side library ?

Implementing Indic script support in the X server alone without
changing clients appears to be infeasible.

However, you don't need to change the X server to support Indic
scripts.  Here is one way it could work:

>> Client side Indic Rendering I

In a client side rendering model, the client transforms:

   `M' code-points -> `N' PolyText protocol requests

The client then draws glyphs on screen using the standard
PolyText/ImageText requests.

In this model, the client does the necessary glyph substitution,
reordering and positioning, using whatever algorithm it considers
appropriate for the script it is processing.  The end result of the
transformation is a set of [font, x/y-position, glyph-list] tuples
that would go out as protocol requests.

Further, in this model, the client has all the information required to
map an [x,y] screen coordinate returned in an X event back to a
position in the 'text' stream (since it did all the reordering,
positioning and glyph substitution).
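
A minimal sketch of this model follows, in Python for brevity (a real
client would do this in C on top of Xlib).  Everything in it is an
assumption made for illustration: the glyph indices, advance widths,
the ligature table and the names shape() and text_position() are
hypothetical, and the "shaping" is a toy, not a real Indic algorithm.
It only shows that a client which does the substitution and reordering
can keep a glyph -> text-position record and use it to answer pointer
events.

```python
from dataclasses import dataclass

# Hypothetical font data: glyph indices and advance widths are made up.
ADVANCE = {1: 6, 2: 7, 3: 12, 10: 9, 11: 4}             # glyph -> pixels
CMAP = {"f": 1, "i": 2, "\u0915": 10, "\u093f": 11}     # one char -> one glyph
LIGATURES = {"ffi": 3}                                  # several chars -> one glyph
PREBASE = {"\u093f"}  # DEVANAGARI VOWEL SIGN I is drawn left of its consonant


@dataclass
class PlacedGlyph:
    glyph: int    # index into the (hypothetical) font
    x: int        # pen position in pixels
    start: int    # index of the first code point of the source cluster
    length: int   # number of code points in the source cluster


def shape(text, startx=0):
    """Turn M code points into N positioned glyphs (toy rules only)."""
    clusters = []  # (start, length, glyphs in visual order)
    i = 0
    while i < len(text):
        lig = next((s for s in LIGATURES if text.startswith(s, i)), None)
        if lig:
            # Substitution: several code points become one glyph.
            clusters.append((i, len(lig), [LIGATURES[lig]]))
            i += len(lig)
        elif i + 1 < len(text) and text[i + 1] in PREBASE:
            # Reordering: the vowel sign glyph is emitted before the base.
            clusters.append((i, 2, [CMAP[text[i + 1]], CMAP[text[i]]]))
            i += 2
        else:
            clusters.append((i, 1, [CMAP[text[i]]]))
            i += 1
    placed, x = [], startx
    for start, length, glyphs in clusters:
        for g in glyphs:
            placed.append(PlacedGlyph(g, x, start, length))
            x += ADVANCE[g]
    return placed


def text_position(placed, px):
    """Map a pointer x coordinate back to an index in the text stream."""
    for p in placed:
        if p.x <= px < p.x + ADVANCE[p.glyph]:
            return p.start
    return None


placed = shape("ffi\u0915\u093f")
print([(p.glyph, p.x, p.start) for p in placed])
print(text_position(placed, 13))
```

The glyph lists produced by shape() are what would go out in the
PolyText16 requests (XDrawText16 or XDrawString16 in Xlib); the
start/length fields never leave the client.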

o this is efficient in terms of network bandwidth (glyph indices are
  sent over)

o it doesn't break anything; you are still using the X11 protocol :)

o it will work on every X server in the world; no need for extensions.

o the X server is still doing the rendering of glyphs onto the screen
  and can apply the usual caching/pre-rendering optimizations done
  for text.

o you can support multiple encodings (KSCLP, TSCII, UNICODE, whatever)

o you can support multiple algorithms for Indic rendering 

The downside: 

Client side rendering requires fonts to be coded to a well-known font
encoding scheme, since the client has to transform character
code-points to lists of glyph indices and their positions.  

Question to the list: What font encoding standards are available for
Indic scripts?  How complete are they --- do they cover every
letterform (graphical shape) used by a language's writing system?

>>  Client side Indic Rendering II

Another way of getting Indic rendering to work without any X server
modifications would be to have the client render glyphs onto a bitmap
and send this "final" bitmap across.  

I.e., the client transforms

   `M' code points -> 1 bitmap

This doesn't have the dependency on "well-known" font encodings (in
fact the font need not be present at the X server at all) but has at
least three drawbacks:

o sending a bitmap over is costlier than sending over glyph indices 

o the client has to do the text rendering itself, adding to
  its complexity, and complexity of administration

o the X server can't optimize its use of the glyphs of a font

The other characteristics are like that of ``Client Side Indic
Rendering I''.
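
To put a rough number on the first drawback, here is a
back-of-the-envelope comparison in Python.  The assumptions are mine:
16-bit glyph indices, a one-bit-deep bitmap, scanlines padded to 32
bits (the actual padding is server dependent), and request headers
ignored.  The line length and glyph sizes are invented for the example.

```python
def glyph_payload_bytes(n_glyphs, bytes_per_index=2):
    """Payload when sending glyph indices (assumes 16-bit indices)."""
    return n_glyphs * bytes_per_index


def bitmap_payload_bytes(width_px, height_px, scanline_pad_bits=32):
    """Payload for a 1-bit-deep bitmap with padded scanlines."""
    # Round each scanline up to a multiple of scanline_pad_bits.
    padded_bits = -(-width_px // scanline_pad_bits) * scanline_pad_bits
    return padded_bits // 8 * height_px


# Hypothetical line of text: 40 glyphs, about 10 pixels wide, 16 pixels tall.
print(glyph_payload_bytes(40))          # 80
print(bitmap_payload_bytes(40 * 10, 16))  # 832
```

Under these assumptions the bitmap is roughly ten times the size of
the glyph indices, and the gap widens with larger point sizes or deeper
pixmaps.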

Regards,
Koshy
<jk...@fr...>