Re: [Indic-computing-devel] Re: NCST Indix Examined


On Tue, Feb 19, 2002 at 11:17:32PM -0800, Joseph Koshy wrote:

Hi Koshy,

> 
> 
> Arun,
> 
> as> That is for efficiency reasons. man XTextExtents. XQueryTextExtents
> as> (something more powerful than that) was the proposed new mechanism.
> 
> And even *that* isn't being used in the Athena widget set.

And when did I say Xaw was using XTextExtents or XQueryTextExtents? My
description of the FindPosition() algorithm didn't make any reference
to either of the two.

> 
> Folks, please read the code before offering suggestions.  It would
> help to keep the signal-to-noise ratio reasonable.
> 

The FindPosition() algorithm was written after referring to the code. It'd
help the quality of the discussion if you respected other people's
intelligence and knowledge.

[ back to the topic under discussion ]

> as> The proposed new algorithm:
> as> 
> as> FindPosition(textpos, startx, pixel_width)
> as> 	// Make a single request to the X Server - this doesn't exist in
> as> 	// the X protocol yet
> as> 	nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
> as> 		               // other args font etc)
> as> 
> as> 	// everything starting from textpos to textpos + nchars is "selected"
> 
> Well, you've just changed your X client.
> 

I never said we could support Indic scripts without changing X clients or
the protocol. Obviously, some extensions are needed.
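To make the quoted FindPosition() proposal a bit more concrete, here is a
minimal simulation in C that runs entirely on the client. Everything in it is
an assumption for illustration: compute_width() is a hypothetical stand-in for
the proposed XComputeWidth request (which does not exist in the X protocol),
char_advance() is a made-up metric table, and startx is left out because the
advances in this toy do not depend on the origin.

```c
#include <stddef.h>

/* Made-up per-character advances in pixels; a real implementation would
 * take these from the font selected in the GC. */
static int char_advance(unsigned char c)
{
    if (c == 'i' || c == 'l')
        return 4;
    if (c == 'm' || c == 'w')
        return 12;
    return 8;
}

/* Hypothetical stand-in for the proposed XComputeWidth request: returns how
 * many characters of text[0..len) fit completely within pixel_width. */
size_t compute_width(const char *text, size_t len, int pixel_width)
{
    size_t n = 0;
    int used = 0;

    while (n < len) {
        int adv = char_advance((unsigned char)text[n]);
        if (used + adv > pixel_width)
            break;
        used += adv;
        n++;
    }
    return n;
}

/* FindPosition as described in the quoted pseudocode: everything from
 * textpos up to the returned index counts as "selected". */
size_t find_position(const char *textbuf, size_t textpos, size_t line_end,
                     int pixel_width)
{
    return textpos + compute_width(textbuf + textpos, line_end - textpos,
                                   pixel_width);
}
```

In the proposal, the loop inside compute_width() is what would move to the
server, so the client issues one request per lookup instead of walking the
per-character metrics itself.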

What I did say, however, is that not everybody is interested in installing
client side libraries specific to Indic scripts. I, for example, do not
install Cyrillic fonts on my machine.

A possible solution could consist of:

1. Some generic (i.e. script/language independent) extensions to the X
   protocol get standardized and installed on most machines around the world.

2. An Indic language server side extension, installed on the machine
   running the X server only by those interested in running
   Unicode-compliant applications with Indic scripts.

> I thought you were going to describe an algorithm that would allow X
> clients to work unchanged in the presence of arbitrary glyph
> reordering, substitution and positioning by the X server.

No. See above.

> The distinction between characters and glyphs is important even for
> Latin scripts.  Consider ligatures and diacritical marks; some Latin
> encodings have separate character codes for the diacritical marks; a
> "c" and a "cedilla" (two code points) together can have a different
> glyph in these languages.  Similarly "f", "f" and "i" combine to form
> a distinct glyph "ffi".
> 
> The X protocol was explicitly designed NOT to support these kinds of
> transformations.

Yes, the designers of X wanted to keep X as nothing more than an
image rendering protocol, and they probably had a reason too (which I
haven't found even after quite a bit of searching; I would appreciate
references to X design rationale. I already have the O'Reilly Xlib book).

Sure, we should pay attention to the wisdom of these people, but we also
should keep in mind that things were very different 15 years ago.
Reading http://www.xfree86.org/~keithp/talks/render.html confirms that.

However, IMO we should question their design decisions and consider
possible implementations that introduce new extensions without breaking
backward compatibility. Perhaps the right thing to do is to implement
Indic support in client side libraries. Who knows? But it doesn't hurt
to have all the options on the table and discuss the pros and cons of
each.

> 
> If you change this, you'll end up with some other "protocol", not the
> X protocol.  This new graphics "protocol" is however:
> 
> a. inconsistent
>    i. how do you map a screen coordinate back to position in the text 
>       stream if you are doing complex text rendering?

Inconsistent with what ? I'd say it's more consistent because all the
codes that go on the wire are character codes and glyph codes are
internal to the X server.

If it's possible to do it on the client side, it must be possible to do
it on the server. The server has all the information it needs to do this
computation.

That's not to say it's desirable - just that it's possible.
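As a rough sketch of why the mapping is computable wherever the shaping
happens: whoever turns characters into glyphs can record, for each glyph,
which characters it came from. The struct shaped_glyph, the function
char_at_x() and the sample run for "office" (with an "ffi" ligature and
invented advances) are all hypothetical names and numbers, not anything from
Xlib or IndiX.

```c
#include <stddef.h>

/* One glyph after shaping, plus the source characters it covers. */
struct shaped_glyph {
    int advance;        /* horizontal advance in pixels */
    size_t first_char;  /* index of the first source character */
    size_t nchars;      /* number of source characters covered */
};

/* "office" shaped as o + ffi + c + e: four glyphs for six characters.
 * The advances are invented for the example. */
static const struct shaped_glyph office_run[] = {
    { 8, 0, 1 },   /* o   */
    { 14, 1, 3 },  /* ffi */
    { 7, 4, 1 },   /* c   */
    { 8, 5, 1 },   /* e   */
};

/* Map a pixel offset x (relative to the start of the run) back to the index
 * of the first character of the glyph under it. Returns total_chars when x
 * lies past the end of the run. */
size_t char_at_x(const struct shaped_glyph *run, size_t nglyphs, int x,
                 size_t total_chars)
{
    int pen = 0;

    for (size_t i = 0; i < nglyphs; i++) {
        if (x < pen + run[i].advance)
            return run[i].first_char;
        pen += run[i].advance;
    }
    return total_chars;
}
```

The same table works for reordered glyphs as long as first_char is recorded
at shaping time; the code is the same on either side of the wire, and only
the place where the table lives differs.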

> 
> b. incomplete
>    i.  how do you specify text in a different character encoding?

Simple. Put font1 with encoding1 in the GC and call PolyText. Put
font2 with encoding2 in the GC and call PolyText again.

>    ii. how do you access glyphs in a font that do not correspond to
>        a `character'?

The client doesn't need to. It just deals with character strings (in the
conventional meaning of the word `character').

> 
> c. suffers from new problems
>    i.  If you are indexing fonts using character codes, how do you use
>        fonts that do not contain glyphs of 'letters'?
>        You don't want glyph combining and reordering happening for
>        the glyphs in a symbol font for example.

By using an encoding for that font in which glyph code == character code,
so that no combining or reordering applies to it.

> 
> ...etc...
> 
> as> need to come up with the pros and cons of each approach. I've given
> as> several tangible advantages of implementing it on the X
> as> server. Perhaps you could articulate your thoughts on why you think
> as> it should be done in a client side library ?
> 
> Implementing Indic script support in the X server alone without
> changing clients appears to be infeasible.
> 

Agree. Changing clients is necessary - but the change could be generic
and not Indic script specific.

> >> Client side Indic Rendering I
> 
> o this is efficient in terms of network bandwidth (glyph indices are
>   sent over)

You didn't count the overhead of sending the font information from the X
server to the client. As things stand now, this is a documented problem
with Unicode fonts that have a large difference between minChar and
maxChar. And that is before counting the relatively large number of glyphs
that Indic scripts need for a small range of the Unicode code space.
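To put a rough number on that overhead: in the core protocol, a QueryFont
reply for a font with per-character metrics carries one 12-byte CHARINFO
record for every code point in the font's declared range, whether or not a
glyph exists there. The helper below is only a back-of-the-envelope sketch;
charinfo_bytes() is a hypothetical name, and the ranges one feeds it are
illustrative rather than measurements of any real font.

```c
#include <stddef.h>

/* Size of one CHARINFO record in a core-protocol QueryFont reply:
 * five INT16 metrics plus one CARD16 of attributes. */
#define CHARINFO_BYTES 12

/* Bytes of per-character metrics for a font whose declared range spans
 * rows min_byte1..max_byte1 and columns min_byte2..max_byte2 (inclusive).
 * For a one-byte font, pass 0 for both row bounds. */
size_t charinfo_bytes(unsigned min_byte1, unsigned max_byte1,
                      unsigned min_byte2, unsigned max_byte2)
{
    size_t rows = (size_t)(max_byte1 - min_byte1) + 1;
    size_t cols = (size_t)(max_byte2 - min_byte2) + 1;

    return rows * cols * CHARINFO_BYTES;
}
```

A printable-ASCII range (columns 32..126 in a single row) comes to 1140
bytes, while a font that declares the full 256 x 256 matrix comes to 786432
bytes of metrics before a single glyph is drawn.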

> 
> o it doesn't break anything; you are still using the X11 protocol :)
> 

I think there are ways of doing a server side implementation without
"breaking" the letter of the X protocol, though they would break its spirit.

> o it will work on every X server in the world; no need for extensions.

Granted.

> 
> o the X server is still doing the rendering of glyphs onto the screen
>   and can apply the usual caching/pre-rendering optimizations done
>   for text.

True for a server side implementation too.

> 
> o you can support multiple encodings (KSCLP, TSCII, UNICODE, whatever)
> 

True for a server side implementation too. Multiple PolyText requests
with a different font (with a different encoding) in the GC each time.

> o you can support multiple algorithms for Indic rendering 

True for a server side implementation too. In fact, this argument works
better for a server side implementation. Imagine installing:

for each Indic language L:
	for each font (possibly using a different algorithm) A:
		for each client machine C:
			install a client side library

For the server side implementation C = 1 and hopefully, we can keep A
down to 1. Also, L is not a small number :)
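The nested loop above is just a product, which a few lines of C make explicit.
install_count() is a hypothetical helper, and the figures mentioned after it
are invented purely to show the scaling.

```c
/* Number of separate installations needed when every combination of
 * language, rendering algorithm and machine gets its own library. */
unsigned long install_count(unsigned long languages,
                            unsigned long algorithms,
                            unsigned long machines)
{
    return languages * algorithms * machines;
}
```

With, say, 10 languages, 2 algorithms and 50 client machines, the client side
approach needs 1000 installations; with C = 1 and A = 1 the server side
approach needs 10.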

Conclusion: the only advantage I can see that's specific to this scheme
is that it doesn't require any changes to the X server or the X protocol.

I think the choice between the two is an implementation detail that doesn't
affect applications, as long as they call the following time-tested Xlib
interface:

 XDrawText(display, d, gc, x, y, items, nitems)
             Display *display;
             Drawable d;
             GC gc;
             int x, y;
             XTextItem *items;
             int nitems;

There may be some value in experimenting with this interface using both
approaches and learning from the experience. In some cases, even though one
approach is technically superior, the "market" may decide differently.

I have yet to study the IndiX code, which I finally downloaded today. I will
probably chew on it for a while.

	-Arun