Thread: [Indic-computing-devel] Re: NCST IndiX examined
From: Sastry R. <rs...@mg...> - 2002-02-04 15:40:31
Dear Koshy,

I sincerely appreciate your efforts to support Indian languages on Linux. Now that you have shared the insights you gained by examining both IndLinux (IITM) and IndiX (NCST), what are your final suggestions and recommendations? When can we expect a complete release that fully supports at least Hindi without breaking compatibility with the X Window System protocol?

Warm Regards,
Sastry
From: Keyur S. <key...@ya...> - 2002-02-08 09:22:03
Hi,

I have just joined the list. Let me first thank Mr. Joseph Koshy for evaluating the IndiX system. It's really good to have someone independently evaluate our work. Thanks a lot, Mr. Koshy! I am also thankful to Mr. Tapan Parikh for telling me about this mailing list.

There are certain points I want to comment on.

--- Joseph Koshy wrote:
> [[ This is a 28MB download. I wonder why they didn't (also)
> put up a "diff" wrt. XFree86 sources instead. ]]

Agreed. I was too lazy to put a 'diff' on the website :(. I'll do it.

> IMO, the plus points of this work are:

At this moment, I don't want to say anything about the plus points of IndiX. I would rather discuss the negative points so that I can improve the design and the system :)

> The NCST changes are, unfortunately, intrusive, and break
> the semantics of the X Window system protocol.
>
> The problem with the NCST design arises from confusion over
> character codes and the glyph indices used in X11 text
> drawing calls.

I have an objection to this point. X11 text drawing calls accept character codes and send them to the X server, along with other data, in the form of a request. We have not changed this semantic. These character codes are then used by the font library to get the glyph codes.

> In the NCST work however, all text strings fed into X11
> text calls are assumed to be UNICODE character streams
> encoded in UTF-8 format.

True, we have made this assumption. There are certain reasons behind it, but we'll discuss them later.

> The NCST system cannot be considered an implementation of
> the X Window System protocol. Applications using the NCST
> X library will not work correctly on other X servers and
> applications compiled on other systems will not work
> correctly on the NCST X server.

Again, this is not true. I have been downloading binary RPMs and using them on my machine, where IndiX is installed. You can also take applications compiled on the IndiX system and use them without any problem on your machine.

> Compatibility of clients using the NCST X11 library with
> `stock' X servers is broken because of a change to
> XQueryTextExtents(): in the NCST system, the text string
> sent over to the X server is assumed to be in UTF-8 format
> and is first converted to UCS-2 by their X11 library. Thus
> the bytes (in UCS-2 format) that get sent out will be
> quite different from what the client passed in. The NCST X
> server will deal correctly with this UCS-2 encoded data,
> but stock X servers will not.

Err! Please look carefully at the source code of xc/lib/X11/QuTextExt.c in the original XFree86. It also first converts the string into UCS-2 before sending the request to the X server. The only difference lies in the conversion: originally, Xlib pads each element of the string with an extra zero byte to make it UCS-2, whereas we assume the incoming sequence is UTF-8 and convert it into UCS-2. The changes made in IndiX were earlier breaking other languages such as French and German (all with iso-8859-* encodings), but now I am taking care of this as well.

> Nice system, nice code; unfortunately not compatible with
> the X Window System protocol.

In my view, it is compatible with the X Window System protocol.

Thanks,
Keyur

__________________________________________________
Do You Yahoo!?
Send FREE Valentine eCards with Yahoo! Greetings!
http://greetings.yahoo.com
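To make the disputed difference concrete, here is a minimal C sketch (not the actual Xlib or IndiX source) of the two conversions the message describes: padding each 8-bit value with a zero high byte, which both sides agree stock Xlib does for XQueryTextExtents(), versus decoding the bytes as UTF-8 into UCS-2 values, as IndiX is described as doing. The Char2b type (modelled on Xlib's XChar2b) and both function names are hypothetical, and the decoder handles only BMP sequences without rejecting overlong forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Xlib's XChar2b: byte1 is the high byte (row),
   byte2 the low byte (column); byte1 precedes byte2 in the request. */
typedef struct {
    unsigned char byte1;
    unsigned char byte2;
} Char2b;

/* Stock-style widening: every 8-bit value becomes row 0, column = value.
   Returns the number of Char2b elements written (always len). */
size_t widen_string8(const unsigned char *s, size_t len, Char2b *out)
{
    for (size_t i = 0; i < len; i++) {
        out[i].byte1 = 0;     /* row 0 of the 2-D glyph matrix */
        out[i].byte2 = s[i];  /* value passed through unchanged */
    }
    return len;
}

/* IndiX-style assumption: treat the bytes as UTF-8 and emit one UCS-2 value
   per decoded code point. Only 1- to 3-byte sequences are handled; returns
   (size_t)-1 on malformed or unsupported input. */
size_t utf8_to_ucs2(const unsigned char *s, size_t len, Char2b *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char c = s[i];
        uint32_t cp;
        if (c < 0x80) {
            cp = c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < len &&
                   (s[i + 1] & 0xC0) == 0x80) {
            cp = ((uint32_t)(c & 0x1F) << 6) | (s[i + 1] & 0x3F);
            i += 2;
        } else if ((c & 0xF0) == 0xE0 && i + 2 < len &&
                   (s[i + 1] & 0xC0) == 0x80 && (s[i + 2] & 0xC0) == 0x80) {
            cp = ((uint32_t)(c & 0x0F) << 12) |
                 ((uint32_t)(s[i + 1] & 0x3F) << 6) | (s[i + 2] & 0x3F);
            i += 3;
        } else {
            return (size_t)-1;  /* e.g. a lone Latin-1 byte such as 0xE9 */
        }
        out[n].byte1 = (unsigned char)(cp >> 8);
        out[n].byte2 = (unsigned char)(cp & 0xFF);
        n++;
    }
    return n;
}
```

For pure ASCII the two functions produce identical output. They diverge as soon as a byte is above 0x7F: the UTF-8 bytes C3 A9 ('é') become two row-0 values under widening but a single U+00E9 under decoding, and a lone Latin-1 byte 0xE9 is rejected by this sketch's decoder.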
From: <jk...@Fr...> - 2002-02-09 05:42:47
Dear Keyur,

Welcome!

ks> Err! Please carefully see the source code of xc/lib/X11/QuTextExt.c
ks> in original XFree86. It also first converts the string into UCS-2
ks> before sending request to the X Server. The only difference between
ks> the conversion is that, originally X Server pads an extra byte to
ks> each element of the string to make it UCS-2. We assume incoming
ks> sequence into UTF-8 and convert it into UCS-2. The changes made in
ks> IndiX was earlier breaking relationship with other foreign languages
ks> like French, German (all with iso-8859-* encoding). But now I am
ks> taking care of this also.

I don't see anything specific to Unicode or UCS-2 in this file.

http://cvsweb.xfree86.org/cvsweb/xc/lib/X11/QuTextExt.c?rev=1.4&content-type=text/x-cvsweb-markup

There is only one [QueryTextExtents] protocol request in the X11
protocol. This request is used for both `XQueryTextExtents16()' and
`XQueryTextExtents()'. It expects 2-byte glyph indices. For "linear"
(single byte) glyph indices, the X library sets the MSB of each 2-byte
glyph index to zero (i.e. linear encodings are treated as row 0 of a
2-D glyph matrix). All this is explained in the X Protocol
specification. [See page 37, QueryTextExtents]

Compare:

http://cvsweb.xfree86.org/cvsweb/xc/lib/X11/QuTextE16.c?rev=1.4&content-type=text/x-cvsweb-markup

ks> X11 text drawing calls accepts character codes and send them to the
ks> X Server along with other data in the form of a request. We have not
ks> changed this semantic. This character codes are then used by the
ks> subsequent font library to get the glyph codes.

ks> In my view, it is compatible with the X Window System protocol.

You seem to have ignored the part of the X protocol specification
(which I quoted in my review) that explicitly states that the X
protocol DOES NOT deal with character codes, and that clients just use
indices into the glyph array:

  ``Font: A font is a matrix of glyphs (typically characters). The
  protocol does no translation or interpretation of character sets.
  The client simply indicates values used to index the glyph array. A
  font contains additional metric information to determine interglyph
  and interline spacing.''

  X Protocol Specification, Glossary, p. 154.

If you want to see the effect of your changes on protocol compliance,
you could:

(a) run the X protocol test suite. In particular,

      /tset/CH06/drwimgstr16/Test{all}
      ... and others in this section ...
      /tset/XPROTO/imgtxt16/Test{all}
      /tset/XPROTO/plytxt16/Test{all}
      /tset/XPROTO/qrytxtextn/Test{all}

(b) attempt to view Cyrillic, Korean or Japanese text (i.e. character
    encodings whose code points fall outside the US-ASCII range).

You are probably the best placed in our group of developers to talk
about the technology behind Indic script rendering, and I am looking
forward to learning from your experience. Do you have a tutorial or
write-up on Indic rendering that you could share with this group?

Regards,
Koshy
<jk...@fr...>
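The row/column arithmetic behind "values used to index the glyph array" can be sketched in C. As far as I understand the QueryFont description in the protocol specification, with D = max-char-or-byte2 - min-char-or-byte2 + 1, char-infos element N corresponds to byte1 = N / D + min-byte1 and byte2 = N mod D + min-char-or-byte2; the function below inverts that. FontRange and glyph_array_index are hypothetical names (the field names follow Xlib's XFontStruct), and a real server would fall back to the font's default character instead of returning -1.

```c
#include <stddef.h>

/* Hypothetical subset of a font's index ranges, named after the
   corresponding XFontStruct fields. */
typedef struct {
    unsigned min_byte1, max_byte1;                 /* range of rows    */
    unsigned min_char_or_byte2, max_char_or_byte2; /* range of columns */
} FontRange;

/* Position of the glyph selected by (byte1, byte2) in a row-major glyph
   array, or -1 if the value lies outside the font's matrix. A "linear"
   8-bit font has min_byte1 == max_byte1 == 0, so only byte2 matters. */
long glyph_array_index(const FontRange *f, unsigned byte1, unsigned byte2)
{
    if (byte1 < f->min_byte1 || byte1 > f->max_byte1 ||
        byte2 < f->min_char_or_byte2 || byte2 > f->max_char_or_byte2)
        return -1;
    unsigned long cols =
        (unsigned long)f->max_char_or_byte2 - f->min_char_or_byte2 + 1;
    return (long)((byte1 - f->min_byte1) * cols +
                  (byte2 - f->min_char_or_byte2));
}
```

Note that nothing here knows what character a value stands for: the same (byte1, byte2) pair selects the same array slot whatever encoding the font happens to follow.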
From: Keyur S. <key...@ya...> - 2002-02-11 08:04:37
Dear Joseph,

--- Joseph Koshy <jk...@Fr...> wrote:
> ks> Err! Please carefully see the source code of
> xc/lib/X11/QuTextExt.c
> I don't see anything specific to Unicode or UCS-2 in this
> file.

When Xlib converts an 8-bit string into a 16-bit string, it sends the MSB first. This is the same as Little-Endian UCS-2.

> There is only one [QueryTextExtents] protocol request in
> the X11 protocol. This request is used for both the
> `XQueryTextExtents16()' and `XQueryTextExtents()'. It
> expects 2 byte glyph indices.

These are not glyph indices. These are character codes passed in the request. The X server passes them to the appropriate font library, which then maps these character codes to glyph codes and does the further processing.

"The client simply indicates values used to index the glyph array." In this sentence, 'values used to index the glyph array' means 'character codes', which are used to index the glyph array through some mapping table in the font (e.g., the cmap table in a TrueType font).

> For "linear" (single byte) glyph indices, the X library
> makes the MSB of each 2 byte glyph index to be zero (i.e.
> linear encodings are treated as row 0 of a 2-D glyph
> matrix). All this is explained in the X Protocol
> specification. [See Page 37, QueryTextExtents]

Please read the first sentence in X Protocol Specification, Glossary, pp 37:

  "This request returns the logical extents of the specified
   string of characters in the specified font".
   ^^^^^^^^^^^^^^^^^^^^

Let me explain this through an example. The client passes a string of characters, e.g., "Hello World", to XQueryTextExtents. Xlib converts it into a 16-bit string before sending it to the X server in a 'QueryTextExtents' request. At this point no conversion from character codes to glyph codes is done. On the server side, the proper font renderer (font library) is chosen (see xc/lib/font). This font library then gets the glyph ids and other glyph information (glyph metrics etc.) from the character string using a mapping table stored in the font. The font library passes this information back to the X server, which then processes the request further and finally either sends a reply/error/event or fulfills the request (as in the case of XDrawString).

In the mapping table of the font, the character code is not necessarily the same as the glyph code (glyph id). For example, character 'A', which has character code 65, may be at glyph position 10 and thus have glyph code 10. The font table then contains a mapping from character code 65 to glyph code 10.

If you are still not happy with my explanation, put a 'printf' statement in the function XQueryTextExtents and look at the values passed in the request. :)

> ks> In my view, it is compatible with the X Window System
> protocol.
>
> You seem to have ignored the part of the X protocol
> specification (that I had quoted in my review) that
> explicitly states that the X protocol DOES NOT deal with
> character codes and that the clients just use indices
> into the glyph array.

As I explained earlier, you have misinterpreted the sentence. How can a client have knowledge of the glyph indices? A client always passes a character string to the Xlib routine.

> If you want to see the effect of your changes on protocol
> compliance, you could:
>
> (a) run the X protocol test suite. In particular,

Unfortunately, I don't have the test suite installed on my system. It is not in xc/test :( and I am also not able to locate it on the XFree86 site. Could you please tell me where I can get it from?

> You are probably the best placed in our group of
> developers to talk about the technology behind Indic
> script rendering. I am looking forward to learning from
> your experience. Do you have tutorial or writeup on Indic
> rendering that you could share with this group?

Sure. Working as a group, we shall definitely arrive at some solution. I'll be happy to share my experience with this group. I would also like to comment on the various design issues that you explained in one of your earlier mails.

There are some documents on Indic rendering (not written by me); I'll send you pointers. I'll also share the document written by us. We have also developed a series of printing tools that can produce high-quality PS files using outlines. They use OpenType fonts and support UTF-8, ISCII, and UCS-2 (Little-Endian and Big-Endian) encodings. I am looking forward to your feedback on these tools. I'll also register all our projects on SourceForge.

Regards,
Keyur
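Keyur's font-side mapping (character code 65 to glyph 10) amounts to a lookup table stored in the font. A minimal C sketch, assuming a plain sorted array of (code, glyph) pairs rather than any real TrueType 'cmap' subtable format; CmapEntry and cmap_lookup are hypothetical names, and the sample values used with it are the illustrative ones from this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical character-code -> glyph-id table. Real
   TrueType cmap subtables use more compact on-disk formats. */
typedef struct {
    uint32_t code;   /* character code, e.g. a Unicode value */
    uint16_t glyph;  /* glyph id inside the font */
} CmapEntry;

/* Binary search over entries sorted by code. Returns glyph 0, which
   TrueType fonts reserve for the "missing glyph", if code is unmapped. */
uint16_t cmap_lookup(const CmapEntry *table, size_t n, uint32_t code)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].code == code)
            return table[mid].glyph;
        if (table[mid].code < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

With a table such as {65, 10}, {0x0915, 156}, {0x0930, 183}, looking up 65 yields glyph 10. Whether such a lookup belongs in the X server's font library or in the client is exactly what the thread is debating; the sketch only shows what the table does.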
From: <jk...@Fr...> - 2002-02-11 11:11:36
Dear Keyur,

ks> When Xlib converts 8-bit string into 16-bit string, it
ks> first send MSB first. This is same as Little-Endian UCS-2.

The X11 Protocol definition predates Unicode. This isn't Little-Endian
UCS-2; it's just a 2-byte encoding of the 8-bit glyph indices.

ks> If you are still not happy with my explanation, then put a
ks> 'printf' sentence in the function XQueryTextExtents and see
ks> the values passed in the request. :)

ks> How can client have knowledge about the glyph indices?

That is what the encoding field in the long name of X fonts is for.
For Latin fonts this will be `-iso8859-1', meaning the font is encoded
compatibly with the ISO8859-1 character encoding. There can be other
encodings: Big5 (Chinese), iso8859-8 (latin+hebrew) or iso8859-5
(latin+cyrillic). You can have fonts that are not indexed by character
codes, and fonts that follow different encoding schemes, e.g.
hp-roman8. The client has to select the correct glyph indices in the X
text drawing calls appropriately.

If the font's glyph encoding matches the character encoding, then an X
client can just send over the numeric values of 'characters' unchanged
and the correct glyphs will get selected automatically. This is what
you are seeing when you put a "printf()" in "XQueryTextExtents()".

TrueType fonts do have a 'cmap' that maps from character codes to
internal glyph indices. This happens to work in X because the X client
is assuming a font encoding (like iso8859-1) when sending over the
glyph indices, and the font's 'cmap' is set up to map the same
character encoding to its internal layout. Such "remapping" by
TrueType fonts is out of the scope of the X protocol.

ks> Please read the first sentence in
ks> X Protocol Specification, Glossary, pp 37
ks> "This request returns the logical extents of the
ks> specified string of characters in the specified font".
              ^^^^^^^^^^^^^^^^^^^^

Agreed, this is poorly worded. You need to read the formal definitions
of FONT, STRING8 and STRING16 to put the definition in context. See
also the protocol descriptions for PolyText{8,16} and ImageText{8,16}.

ks> Unfortunately, I don't have test suite installed on my
ks> system. It is not there in xc/test :( I am also not able to
ks> locate it on XFree86 site. Will you please tell me where
ks> can I get it from?

It is part of the XFree86 repository, available under the directory
"test/", a sibling of the directory "xc/". It can be retrieved in the
usual ways (anonymous CVS checkout, CVSup mirroring, etc.).

Anyone changing the X11 library or the X server really should run the
test suite to check for breakage. Do be sure to run the test suite
from a remote (unmodified) system as well as locally.

Regards,
Koshy
<jk...@fr...>
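Koshy's point that the client picks glyph indices according to the font's encoding can be sketched in C: read the CHARSET_REGISTRY and CHARSET_ENCODING fields (the last two fields of an XLFD font name), then convert a Unicode code point into the 8-bit value that encoding uses. Only ISO 8859-1 and ISO 8859-5 (Latin/Cyrillic) are covered; xlfd_charset and encode_for_charset are hypothetical helpers, not Xlib functions, and the font names used with them are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the "REGISTRY-ENCODING" tail of an XLFD name, i.e. the text
   after the 13th '-', or NULL if the name has too few fields.
   Example: "-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso8859-1"
   yields "iso8859-1". */
const char *xlfd_charset(const char *name)
{
    int dashes = 0;
    for (const char *p = name; *p; p++) {
        if (*p == '-' && ++dashes == 13)
            return p + 1;
    }
    return NULL;
}

/* Maps a Unicode code point to the 8-bit value a client would send to a
   font with the given charset; -1 if the font has no slot for it or the
   charset is not handled by this sketch. */
int encode_for_charset(const char *charset, uint32_t cp)
{
    if (strcmp(charset, "iso8859-1") == 0)
        return cp <= 0xFF ? (int)cp : -1;

    if (strcmp(charset, "iso8859-5") == 0) {
        if (cp < 0xA0)
            return (int)cp;                 /* ASCII and C1 controls */
        if (cp == 0x00A0 || cp == 0x00AD)
            return (int)cp;                 /* NBSP, soft hyphen */
        if (cp == 0x00A7)
            return 0xFD;                    /* section sign */
        if (cp == 0x2116)
            return 0xF0;                    /* numero sign */
        if ((cp >= 0x0401 && cp <= 0x040C) || (cp >= 0x040E && cp <= 0x044F) ||
            (cp >= 0x0451 && cp <= 0x045C) || (cp >= 0x045E && cp <= 0x045F))
            return (int)(cp - 0x0360);      /* Cyrillic block */
        return -1;
    }
    return -1;
}
```

For Cyrillic 'А' (U+0410), a client talking to an iso8859-5 font would send 0xB0, a value unrelated to the Unicode code point, while for an iso8859-1 font the same character has no slot at all.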
From: Keyur S. <key...@ya...> - 2002-02-12 06:20:50
--- Joseph Koshy <jk...@Fr...> wrote:
>
> Dear Keyur,
>
> ks> When Xlib converts 8-bit string into 16-bit string, it
> ks> first send MSB first. This is same as Little-Endian UCS-2.
>
> The X11 Protocol definition predates Unicode. This isn't
> Little-Endian UCS-2, its just a 2-byte encoding of the 8
> bit glyph indices.

I don't say that it "is" UCS-2. I say that it is the "same as" UCS-2 (or compatible with UCS-2).

> ks> How can client have knowledge about the glyph indices?
>
> That is what the encoding field in the long name of X
> fonts is for. For Latin fonts this will be `-iso8859-1'
> meaning the font is encoded compatibly with the ISO8859-1
> character encoding.

As you said, this is really a character encoding, not a font encoding. Some distinction should be made between "character" and "glyph".

-----------
According to the Unicode standard (see glossary), a character is:

(1) The smallest component of written language that has semantic
    value; refers to the abstract meaning and/or shape, rather than a
    specific shape (see also glyph), though in code tables some form of
    visual representation is essential for the reader's understanding.
(2) Synonym for abstract character.
(3) Loosely, the basic unit of encoding for the Unicode character
    encoding, a 16-bit unit of textual representation.
(4) Synonym for code value.
(5) The English name for the ideographic written elements of Chinese
    origin.

Abstract character: A unit of information used for the organization,
control, or representation of textual data. (See also character (1, 2))

And glyph has been defined as:

(1) An abstract form that represents one or more glyph images.
(2) A synonym for glyph image. In displaying Unicode character data,
    one or more glyphs may be selected to depict a particular
    character. These glyphs are selected by a rendering engine during
    composition and layout processing.
---------------

As can be seen from the above definitions, a client passes "something" that has semantic value, i.e. "characters". One or more glyphs may be selected to display a particular character. So the client is in no position to decide the glyph indices to be used for a character. It is entirely at the discretion of the font designer to select the _proper_ glyph(s) for a character. We can't say that a particular glyph should be used for a character.

> You can have fonts that are not indexed by character
> codes and fonts that follow different encoding schemes
> e.g:- hp-roman8.

Can you give me a few font formats used in the X Window System that don't use a mapping table? Even in the case of a different encoding like hp-roman8, or a font encoding like ISFOC, there should be a mapping from these encoding values to the glyph codes. In the case of ISFOC, the font glyph encoding matches the ISFOC encoding.

> The client has to select the correct glyph indices in the
> X text drawing calls, appropriately.

In
From: Keyur S. <key...@ya...> - 2002-02-12 10:02:13
Hello,

I don't understand what is happening. Now sending it again. :(

It seems that my earlier mail was not sent in full, so I am sending it again. Sorry for the inconvenience.

--- Joseph Koshy <jk...@Fr...> wrote:
>
> Dear Keyur,
>
> ks> When Xlib converts 8-bit string into 16-bit string, it
> ks> first send MSB first. This is same as Little-Endian UCS-2.
>
> The X11 Protocol definition predates Unicode. This isn't
> Little-Endian UCS-2, its just a 2-byte encoding of the 8
> bit glyph indices.

I don't say that it "is" UCS-2. I say that it is the "same as" UCS-2 (or compatible with UCS-2).

> ks> How can client have knowledge about the glyph indices?
>
> That is what the encoding field in the long name of X
> fonts is for. For Latin fonts this will be `-iso8859-1'
> meaning the font is encoded compatibly with the ISO8859-1
> character encoding.

As you said, this is really a character encoding, not a font encoding. Some distinction should be made between "character" and "glyph".

-----------
According to the Unicode standard (see glossary), a character is:

(1) The smallest component of written language that has semantic
    value; refers to the abstract meaning and/or shape, rather than a
    specific shape (see also glyph), though in code tables some form of
    visual representation is essential for the reader's understanding.
(2) Synonym for abstract character.
(3) Loosely, the basic unit of encoding for the Unicode character
    encoding, a 16-bit unit of textual representation.
(4) Synonym for code value.
(5) The English name for the ideographic written elements of Chinese
    origin.

Abstract character: A unit of information used for the organization,
control, or representation of textual data. (See also character (1, 2))

And glyph has been defined as:

(1) An abstract form that represents one or more glyph images.
(2) A synonym for glyph image. In displaying Unicode character data,
    one or more glyphs may be selected to depict a particular
    character. These glyphs are selected by a rendering engine during
    composition and layout processing.
---------------

As can be seen from the above definitions, a client passes "something" that has semantic value, i.e. "characters". One or more glyphs may be selected to display a particular character. So the client is in no position to decide the glyph indices to be used for a character. It is entirely at the discretion of the font designer to select the _proper_ glyph(s) for a character. We can't say that a particular glyph should be used for a character.

> You can have fonts that are not indexed by character
> codes and fonts that follow different encoding schemes
> e.g:- hp-roman8.

Can you give me a few font formats used in the X Window System that don't use a mapping table? Even in the case of a different encoding like hp-roman8, or a font encoding like ISFOC, there should be a mapping from these encoding values to the glyph codes. In the case of ISFOC, the font glyph encoding matches the ISFOC encoding.

> The client has to select the correct glyph indices in the
> X text drawing calls, appropriately.

In the X Window System, the client doesn't have direct access to font resources when fonts are loaded by the font library interacting with the X server. Also, all font resources and security data are kept by the X server. Clients can only send requests to the X server to display a character string or to get the extents of a character string.

> If the font's glyph encoding matches the character
> encoding, then an X client can just send over the numeric
> values of 'characters' unchanged and the correct glyphs
> will get selected automatically. This is what you are
> seeing when you put a "printf()" in "XQueryTextExtents()".

I object to the word "automatically". The glyphs are not selected automatically; they are displayed properly because the glyph codes and character codes match. It is also possible for a font designer to use two glyphs, "/" and "\", for the character "X". In that case it is the job of the mapping tables to do things properly. The client will only request that the glyph for character "X" be drawn. It will not send the indices for "/" and "\".

> TrueType fonts do have a 'cmap' that maps from character
> codes to internal glyph indices. This happens to work in X
> because the X client is assuming a font encoding (like
> iso8859-1) when sending over the glyph indices and the
> fonts 'cmap' is setup to map the same character encoding
> to its internal layout.

So you are coming to the point. As you said, TrueType fonts do have a 'cmap' table that maps character codes to internal glyph indices. This means that clients have to pass character codes to such fonts, and clients do pass character codes. My stand becomes clearer if you take the example of XDrawString16 or XQueryTextExtents16. In these functions we use the XChar2b structure to pass character codes (e.g., Unicode). A font may have as many as 500 glyphs, but we pass values like the following:

----
XChar2b str[10];

/* U+0915 DEVANAGARI LETTER KA */
str[0].byte1 = 0x09;
str[0].byte2 = 0x15;

/* U+0930 DEVANAGARI LETTER RA */
str[1].byte1 = 0x09;
str[1].byte2 = 0x30;

XDrawString16(dpy, drawable, gc, x, y, str, 2);
----

Clearly, we are passing the Unicode values U+0915 and U+0930, which are the Unicode characters "Devanagari Ka" and "Devanagari Ra" respectively. The glyphs for these characters may be at positions 156 and 183 respectively. We are not passing the values "156" or "183".

> Such "remapping" by TrueType fonts is out of the scope of
> the X protocol.
>
> ks> Please read the first sentence in
> ks> X Protocol Specification, Glossary, pp 37
> ks> "This request returns the logical extents of the
> ks> specified string of characters in the specified font".
>               ^^^^^^^^^^^^^^^^^^^^
>
> Agreed, this is poorly worded. You need to read the formal
> definitions of FONT, STRING8 and STRING16 to put the
> definition in context. See also the protocol descriptions
> for PolyText{8,16} and ImageText{8,16}.

OK. Here are the definitions.

-------
FONT (Page 154)
A font is a matrix of glyphs (typically characters). The protocol does
no translation or interpretation of character sets. The client simply
indicates values used to index the glyph array. A font contains
additional metric information to determine interglyph and interline
spacing.
-------

Here, "values used to index" doesn't necessarily mean glyph codes. "Character codes" are also values used to index the glyph array, through some mapping table.

---------
(Page 3)
STRING8   -> LISTofCARD8
STRING16  -> LISTofCHAR2B
CHAR2B    -> [byte1, byte2: CARD8]
BYTE      -> 8-bit value
CARD8     -> 8-bit unsigned integer
CARD16    -> 16-bit unsigned integer
---------

Nowhere have they indicated anything about glyph indices. In fact, the protocol doesn't explicitly describe anything about the "values" used in the protocol; that freedom was left to the implementation. The X Window System is not merely the X protocol: it includes the X library, the X protocol, the X server, and now font renderers. It is entirely up to the implementation to decide what these "values" mean, and the developers have decided to pass "character codes" as the values in the X protocol.

Regards,
Keyur
From: <jk...@Fr...> - 2002-02-18 10:48:34
ks> While starting my work on IndiX, I also decided to give support
ks> using an X extension. But unfortunately, I had to work under
ks> strictly imposed constraints :-( esp. that applications should not
ks> be modified for Indian language support. So I did the thing in
ks> whatever way the people wanted me to do and also tried to do it in
ks> best possible way!

Most unmodified X applications may not work correctly with an X server
that does "behind-the-scenes" glyph reordering and substitution.
Consider the following scenario:

 - the user presses Button-1 down on some `x[1],y[1]' location on
   screen and sweeps the pointer over the screen
 - Button-1 is released at location `x[2],y[2]'

These screen coordinates get reported back to the application in the
form of "events". Given these two pixel coordinates, the X application
has to figure out the region of the underlying text that was
"selected". This involves going backwards from `x,y' coordinates to
the character code points in its text buffer.

If the X server is doing arbitrary glyph reordering and glyph
substitution unknown to the X client, then this translation will go
wrong in the client.

I don't think the requirement of X applications running unchanged with
Indic scripts is a feasible one.

Regards,
Koshy
<jk...@fr...>
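The selection scenario above can be sketched as a hit test: the client walks per-character advance widths in logical (storage) order to turn a pixel offset into a character index. The widths below are hypothetical, chosen for Devanagari KA (U+0915) followed by VOWEL SIGN I (U+093F), whose glyph is drawn to the left of the consonant. char_at_x is an illustrative helper, not code from any real toolkit.

```c
#include <stddef.h>

/* Given per-character advance widths in logical order, return the index
   of the character whose cell contains pixel offset x (measured from the
   start of the string), or n if x lies past the end of the string. */
size_t char_at_x(const int *advance, size_t n, int x)
{
    int left = 0;
    for (size_t i = 0; i < n; i++) {
        if (x < left + advance[i])
            return i;
        left += advance[i];
    }
    return n;
}
```

With logical widths {10, 4} (KA, then the vowel sign), a click at x = 2 maps to index 0, KA, in the client's model. If the server silently draws the vowel-sign glyph first (visual widths {4, 10}), that pixel actually lies on the vowel sign, i.e. logical index 1, so the client's translation goes wrong exactly as described.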
From: Arun S. <ar...@sh...> - 2002-02-18 20:57:26
Joseph Koshy wrote:
>
>If the X server is doing arbitrary glyph reordering and glyph
>substitution unknown to the X client, then this translation will go
>wrong in the client.
>

Possible. The proposal here:

http://www.pps.jussieu.fr/~jch/software/UTF8_STRING/

may be relevant. It basically says that X selections should be
UTF8_STRINGs and not glyph codes (which may differ between fonts).

However, if your point was that the client can't easily map
(x[1], y[1], x[2], y[2]) to a UTF-8 string, I don't think it would be
much harder than the existing algorithms in:

xc/lib/Xaw/AsciiSink.c - FindPosition()

In a nutshell, the server, which has knowledge of complex glyph codes
and reordering, responds to client requests for XQueryTextExtents. I
do see the problem of increased network traffic, but I see it as
unavoidable. If we have to do the same thing on the client side, we'll
have to communicate the information in the OpenType cmap tables over
the X protocol to the client.

>
>I don't think the requirement of X applications running unchanged with
>Indic scripts is a feasible one.
>

I'd qualify that with "as things stand today". However, if Xlib is
made fully Unicode or UTF-8 capable, I believe (based on my limited
understanding of X) that we could make it work with server-side
modifications only.

Another observation: X doesn't seem to be consistent about where the
character -> glyph mapping should be done. While many sources hint
that the codes in requests (e.g. PolyText, the RENDER extension, etc.)
should be glyph codes, others indicate that the values stored in
XSelections should be UTF-8. In one case the X server doesn't know
about the character codes; in the other it does. This is true of the
X protocol also: some bits on the wire indicate glyph codes
(PolyText16) and some (UTF8_STRING) indicate character codes.

Does anyone know how Indic support on MS Windows works with
network-transparent protocols like Citrix? Their marketing literature
(talk about thin clients) seems to hint at pushing the complexity to
the Windows Terminal Server side (Windows Terminal Server = analogous
to the X client). In that respect, it makes sense to push the
complexity to the X client, because the X server could be a very
low-powered handheld, not capable of dealing with the complexity. This
is the only argument I can find in favour of doing reordering etc. on
the X client side.

I think we should bring this up on the right XFree86 fora and resolve
it there.

-Arun
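The UTF8_STRING proposal mentioned above has a selection carry character codes in logical order, independent of any font's glyph ids. A minimal UTF-8 encoder in C sketches what such a payload contains; utf8_encode and utf8_encode_all are hypothetical helpers, and storing the bytes as a selection property is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode scalar value as UTF-8 into buf (at least 4 bytes).
   Returns the number of bytes written, or 0 for surrogates and for values
   above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encodes code points in logical (storage) order into out, which must be
   large enough (4 bytes per code point suffices). Returns bytes written. */
size_t utf8_encode_all(const uint32_t *cps, size_t n, unsigned char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += utf8_encode(cps[i], out + len);
    return len;
}
```

For the syllable KA + VOWEL SIGN I (U+0915 U+093F) this yields E0 A4 95 E0 A4 BF, in storage order, regardless of the order in which a server or client later draws the glyphs.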
From: <jk...@Fr...> - 2002-02-19 09:45:00
as> Another observation: X doesn't seem to be consistent on where the
as> character -> glyph mapping should be done. While many sources hint
as> that the codes in the requests (for eg: PolyText, RENDER extension
as> etc) should be glyph codes, there are others who indicate that the
as> values stored in XSelections should be UTF-8.

X selections are a client-side concept: defined and managed by clients, not by the X server. Selections are built using X "properties" (name/value pairs). The X server serves as a repository for properties but does not deal with their contents. This is basic X (application) programming stuff.

as> However, if your point was that the client can't easily map (x[1],
as> y[1], x[2], y[2]) to a UTF-8 string, I don't think it would be
as> much harder than the existing algorithms in:
as> xc/lib/Xaw/AsciiSink.c - FindPosition()
as> In a nutshell, the server, which has the knowledge of complex
as> glyph codes and reordering, responds to client requests for
as> XQueryTextExtents.

Well, the Xaw widget set doesn't seem to be using XQueryTextExtents() at all. I'd really like to see this 'not so hard' algorithm whose existence you have postulated :).

as> I think we should bring this up on the right XFree86 fora and
as> resolve it there.

I think that it would be prudent to first understand how the X Window System actually works. Especially so if you are going to claim that the X protocol specification is ambiguous/in error, and that the error has gone undetected for the two decades (or so) that the specification has been around :).

Here is a short list of reading material that I found useful:

o Among others, O'Reilly Inc. publishes a set of books on X Window System programming which cover the basics of the system. People who are interested in working with/extending X SHOULD first read and understand these.

o The mailing lists hosted at XFree86.org are a good resource, though they assume that you are already familiar with the basic design issues.
o The newsgroup "comp.windows.x" is another resource which could be useful on the days the S/N ratio is tolerable.

o Documentation in the X source tree "xc/doc/*"

Regards,
Koshy
<jk...@fr...>
From: Arun S. <ar...@sh...> - 2002-02-19 18:07:03
On Tue, Feb 19, 2002 at 01:44:57AM -0800, Joseph Koshy wrote:
>
> as> However, if your point was that the client can't easily map (x[1],
> as> y[1], x[2], y[2]) to a UTF-8 string, I don't think it would be
> as> much harder than the existing algorithms in:
> as> xc/lib/Xaw/AsciiSink.c - FindPosition()
> as> In a nutshell, the server, which has the knowledge of complex
> as> glyph codes and reordering, responds to client requests for
> as> XQueryTextExtents.
>
> Well, the Xaw widget set doesn't seem to be using XQueryTextExtents()
> at all.

That is for efficiency reasons; see man XTextExtents. XQueryTextExtents (something more powerful than that) was the proposed new mechanism.

>
> I'd really like to see this 'not so hard' algorithm whose existence
> you have postulated :).
>

The existing algorithm in Xaw:

    // startx = x[1]
    // pixel_width = x[2] - x[1]
    FindPosition(textpos, startx, pixel_width)
        nchars = 0
        curpos = startx
        while 1:
            // Uses XFontStruct for efficiency
            width = compute the width of the next char in the text buf
            nchars++;
            curpos += width
            if (curpos >= startx + pixel_width)
                break;

        // everything starting from textpos to textpos + nchars is "selected"

The proposed new algorithm:

    FindPosition(textpos, startx, pixel_width)
        // Make a single request to the X Server - this doesn't exist in
        // the X protocol yet
        nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
                               // other args font etc)

        // everything starting from textpos to textpos + nchars is "selected"

The X server would deal with all the context-sensitive reordering and joining and compute nchars. I don't have an algorithm for doing this, but I am postulating that the X server has all the information it needs to compute nchars.

> as> I think we should bring this up on the right XFree86 fora and
> as> resolve it there.
>
> I think that it would be prudent to first understand how the X window
> system actually works.

Is it really that important? :)

> Especially so, if you are going to claim that
> the X protocol specification is ambiguous/in error, and that the error
> has been undetected for the two decades (or so) that the specification
> has been around :).

I just said it was inconsistent in the use of character codes vs glyph codes, not that it was ambiguous or in error. This seems to be a consequence of it being designed at a time when the distinction between the two was not as important as it is today. And you yourself (along with others on this list) accepted that certain references were ambiguous. What's all the fuss about then? :)

I think, where we stand today, both approaches are feasible and we need to come up with the pros and cons of each. I've given several tangible advantages of implementing it on the X server. Perhaps you could articulate your thoughts on why you think it should be done in a client-side library?

-Arun
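For readers who want to run the Xaw-style loop from Arun's pseudocode above, here is a minimal Python sketch. The function name, the `char_width` callback and the fixed 8-pixel width are illustrative assumptions, not Xaw's actual code. Like the pseudocode, it looks up each character's width on its own, and that is exactly why it goes wrong for context-sensitive scripts, where the width of KA followed by a vowel sign is not simply the sum of two independent widths.

```python
from typing import Callable


def find_position(text: str, textpos: int, startx: int, pixel_width: int,
                  char_width: Callable[[str], int]) -> int:
    """Count the characters, starting at text[textpos], covered by a
    pointer drag from startx to startx + pixel_width.

    Mirrors the per-character Xaw loop: each character's width is looked
    up independently, which is only correct when a glyph's width does not
    depend on its neighbours.
    """
    nchars = 0
    curpos = startx
    for ch in text[textpos:]:          # also stops at the end of the text
        curpos += char_width(ch)       # width of this character alone
        nchars += 1
        if curpos >= startx + pixel_width:
            break
    return nchars


# Usage with a hypothetical fixed-width font of 8 pixels per character.
print(find_position("hello world", 0, 0, 20, lambda ch: 8))  # 3
```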
From: Arun S. <ar...@sh...> - 2002-02-19 19:03:14
On Tue, Feb 19, 2002 at 10:10:41AM -0800, Arun Sharma wrote:
> The proposed new algorithm:
>
> FindPosition(textpos, startx, pixel_width)
>     // Make a single request to the X Server - this doesn't exist in
>     // the X protocol yet
>     nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
>                            // other args font etc)
>
>     // everything starting from textpos to textpos + nchars is "selected"
>

Hmm, I don't think I needed to invent a new protocol request. XQueryTextExtents seems to be good enough.

Another thought: we'll have to implement this algorithm on the X server side with OpenType fonts anyway, in order to service this particular request. Why reimplement it in a client-side library?

To summarize my thoughts on the advantages of taking the client-side approach:

- Preserve the status quo (use glyph codes)
- Less pressure on the X server; good for "thin clients" = "thin X servers"
- Reduced network traffic? I think we'll have to determine this one empirically (is XQueryTextExtents traffic >> shipping the OpenType font info to the client once and processing XTextExtents locally?). On the other hand, X has not been optimized for network efficiency (it always assumed a fast Ethernet environment).

More?

-Arun
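The traffic question Arun raises can be made concrete with a toy model that simply counts round trips. Everything here is hypothetical: `FakeShapingServer` is not an Xlib type, `query_text_extents` only stands in for one XQueryTextExtents request/reply, and `compute_nchars` stands in for the proposed batched XComputeWidth/XComputeNChars request, which the X protocol does not have. The per-query loop asks about growing prefixes so that the server always sees the full shaping context. Either way, it costs one round trip per character.

```python
class FakeShapingServer:
    """Stand-in for an X server that shapes text itself (as IndiX does).

    shaped_width(s) returns the pixel width of s after server-side
    reordering/joining. Each method call counts as one round trip.
    """

    def __init__(self, shaped_width):
        self.shaped_width = shaped_width
        self.round_trips = 0

    def query_text_extents(self, s):
        # Models one XQueryTextExtents request/reply.
        self.round_trips += 1
        return self.shaped_width(s)

    def compute_nchars(self, s, pixel_width):
        # Models the hypothetical batched request; not part of X11.
        self.round_trips += 1
        for n in range(1, len(s) + 1):
            if self.shaped_width(s[:n]) >= pixel_width:
                return n
        return len(s)


def find_position_per_query(server, s, pixel_width):
    # One query per growing prefix: correct shaping context, but one
    # network round trip for every character the drag passes over.
    for n in range(1, len(s) + 1):
        if server.query_text_extents(s[:n]) >= pixel_width:
            return n
    return len(s)


def width(s):
    return 8 * len(s)   # toy metric: 8 pixels per character


a, b = FakeShapingServer(width), FakeShapingServer(width)
print(find_position_per_query(a, "abcdef", 20), a.round_trips)  # 3 3
print(b.compute_nchars("abcdef", 20), b.round_trips)            # 3 1
```

Both paths return the same answer. Only the number of round trips differs, and that is the quantity Arun suggests measuring against the one-time cost of shipping font tables to the client.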
From: <jk...@Fr...> - 2002-02-20 07:17:33
Arun,

as> That is for efficiency reasons. man XTextExtents. XQueryTextExtents
as> (something more powerful than that) was the proposed new mechanism.

And even *that* isn't being used in the Athena widget set.

Folks, please read the code before offering suggestions. It would help to keep the signal-to-noise ratio reasonable.

as> The proposed new algorithm:
as>
as> FindPosition(textpos, startx, pixel_width)
as>     // Make a single request to the X Server - this doesn't exist in
as>     // the X protocol yet
as>     nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
as>                            // other args font etc)
as>
as>     // everything starting from textpos to textpos + nchars is "selected"

Well, you've just changed your X client. I thought you were going to describe an algorithm that would allow X clients to work unchanged in the presence of arbitrary glyph reordering, substitution and positioning by the X server.

as> I just said it was inconsistent in the use of character codes vs glyph
as> codes - not that it was ambiguous or in error. This seems to be a
as> consequence of it being designed at a time, when the distinction between
as> the two was not as important as it is today.

The distinction between characters and glyphs is important even for Latin scripts. Consider ligatures and diacritical marks; some Latin encodings have separate character codes for the diacritical marks; a "c" and a "cedilla" (two code points) together can have a different glyph in these languages. Similarly "f", "f" and "i" combine to form a distinct glyph "ffi".

The X protocol was explicitly designed NOT to support these kinds of transformations.

as> And you yourself (along with others on this list) accepted that certain
as> references were ambiguous. What's all the fuss about then ? :)

One place in the X protocol specification uses the phrase 'string of characters'.
Now the word 'character' has (today) become an overloaded phrase, with meanings ranging from the visual representation (the letterform), the 'abstract' character itself, the code point assigned to the character in a given encoding, a specific glyph in a font, etc. The exact meaning is usually clear from the context.

Nowhere does the X11 protocol specification say that 'character codes' are to be used in text drawing requests. In fact, it EXPLICITLY states that the semantics of character `codes' are NOT to be honored by the X server.

If you change this, you'll end up with some other "protocol", not the X protocol. This new graphics "protocol" is however:

a. inconsistent
   i. how do you map a screen coordinate back to position in the text
      stream if you are doing complex text rendering?

b. incomplete
   i. how do you specify text in a different character encoding?
   ii. how do you access glyphs in a font that do not correspond to
       a `character'?

c. suffers from new problems
   i. If you are indexing fonts using character codes, how do you use
      fonts that do not contain glyphs of 'letters'?
      You don't want glyph combining and reordering happening for
      the glyphs in a symbol font for example.

...etc...

as> need to come up with the pros and cons of each approach. I've given
as> several tangible advantages of implementing it on the X
as> server. Perhaps you could articulate your thoughts on why you think
as> it should be done in a client side library ?

Implementing Indic script support in the X server alone, without changing clients, appears to be infeasible.

However, you don't need to change the X server to support Indic scripts. Here is one way it would work:

>> Client side Indic Rendering I

In a client side rendering model, the client transforms:

   `M' code-points -> `N' PolyText protocol requests

The client then draws glyphs on screen using the standard PolyText/ImageText requests.
In this model, the client does the necessary glyph substitution, reordering and positioning, using whichever algorithm it considers appropriate for the script it is processing. The end result of the transformation is a set of [font, x/y-position, glyph-list] tuples that go out as protocol requests.

Further, in this model, the client has all the information required to map an [x,y] screen coordinate returned in an X event back to a position in the 'text' stream (since it did all the reordering, positioning and glyph substitution).

o this is efficient in terms of network bandwidth (glyph indices are sent over)
o it doesn't break anything; you are still using the X11 protocol :)
o it will work on every X server in the world; no need for extensions.
o the X server is still doing the rendering of glyphs onto the screen and can apply the usual caching/pre-rendering optimizations done for text.
o you can support multiple encodings (KSCLP, TSCII, UNICODE, whatever)
o you can support multiple algorithms for Indic rendering

The downside: client side rendering requires fonts to be coded to a well-known font encoding scheme, since the client has to transform character code points to lists of glyph indices and their positions.

Question to the list: what font encoding standards are available for Indic scripts? How complete are they? Do they cover every letterform (graphical shape) used by a language's writing system?

>> Client side Indic Rendering II

Another way of getting Indic rendering to work without any X server modifications would be to have the client render glyphs onto a bitmap and send this "final" bitmap across.
I.e., the client transforms:

   `M' code points -> 1 bitmap

This doesn't have the dependency on "well-known" font encodings (in fact the font need not be present at the X server at all) but has at least three drawbacks:

o sending a bitmap over is costlier than sending over glyph indices
o the client has to do text rendering inside of itself, adding to its complexity and to the complexity of administration
o the X server can't optimize its use of the glyphs of a font

The other characteristics are like those of ``Client Side Indic Rendering I''.

Regards,
Koshy
<jk...@fr...>
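Koshy's "Client side Indic Rendering I" model can be sketched in a few lines. This is a minimal sketch, assuming a client-side shaper has already produced clusters (a span of code points, the glyph indices chosen for it, and its advance width). The glyph indices 101 to 103, the font name "indic-font" and all function names are made up for illustration. The point it shows is the one Koshy makes: because the client did the reordering itself, it keeps a cluster map that lets it turn a pointer x coordinate back into a text position. In the same way, it can emit [font, x/y-position, glyph-list] tuples for PolyText.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cluster:
    text_start: int     # index of the first code point covered
    text_end: int       # one past the last code point covered
    glyphs: List[int]   # glyph indices chosen by the client-side shaper
    x: int              # pen x position where the cluster starts
    advance: int        # total advance width of the cluster


def layout(shaped: List[Tuple[int, int, List[int], int]], x0: int) -> List[Cluster]:
    """Place pre-shaped clusters (start, end, glyphs, advance) left to right."""
    clusters, x = [], x0
    for start, end, glyphs, advance in shaped:
        clusters.append(Cluster(start, end, glyphs, x, advance))
        x += advance
    return clusters


def text_item(font: str, clusters: List[Cluster], y: int):
    """One [font, x/y-position, glyph-list] tuple, ready for a PolyText request."""
    return (font, clusters[0].x, y, [g for c in clusters for g in c.glyphs])


def hit_test(clusters: List[Cluster], px: int) -> int:
    """Map a pointer x coordinate back to a position in the text stream."""
    if not clusters:
        return 0
    if px < clusters[0].x:
        return clusters[0].text_start
    for c in clusters:
        if px < c.x + c.advance:
            # Left half of a cluster -> caret before it, right half -> after.
            return c.text_start if px < c.x + c.advance / 2 else c.text_end
    return clusters[-1].text_end


# Text U+0915 U+093F U+092E (KA, VOWEL SIGN I, MA). KA + I is one cluster whose
# vowel-sign glyph (101) is drawn before the KA glyph (102); MA (103) stands alone.
clusters = layout([(0, 2, [101, 102], 20), (2, 3, [103], 12)], x0=10)
print(text_item("indic-font", clusters, 50))   # ('indic-font', 10, 50, [101, 102, 103])
print(hit_test(clusters, 12), hit_test(clusters, 40))  # 0 3
```

Hit-testing here works at cluster granularity, so a click can never land "inside" a reordered KA + I pair. That property is also what Keyur's syllable-based selection proposal later in the thread relies on.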
From: Arun S. <ar...@sh...> - 2002-02-20 09:12:54
On Tue, Feb 19, 2002 at 11:17:32PM -0800, Joseph Koshy wrote:

Hi Koshy,

>
>
> Arun,
>
> as> That is for efficiency reasons. man XTextExtents. XQueryTextExtents
> as> (something more powerful than that) was the proposed new mechanism.
>
> And even *that* isn't being used in the Athena widget set.

And when did I say Xaw was using XTextExtents or XQueryTextExtents? My description of the FindPosition() algorithm didn't make any reference to either of the two.

>
> Folks, please read the code before offering suggestions. It would
> help to keep the signal-to-noise ratio reasonable.
>

The algorithm FindPosition() was written after referring to the code. It would help the quality of the discussion if you respected other people's intelligence and knowledge.

[ back to the topic under discussion ]

> as> The proposed new algorithm:
> as>
> as> FindPosition(textpos, startx, pixel_width)
> as>     // Make a single request to the X Server - this doesn't exist in
> as>     // the X protocol yet
> as>     nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
> as>                            // other args font etc)
> as>
> as>     // everything starting from textpos to textpos + nchars is "selected"
>
> Well, you've just changed your X client.
>

I never said we could support Indic scripts without changing X clients or the protocol. Obviously, some extensions are needed. What I did say, however, is that not everybody is interested in installing client-side libraries specific to Indic scripts. I, for example, do not install Cyrillic fonts on my machine.

A possible solution could consist of:

1. Some generic (i.e. script/language independent) extensions to the X protocol that get standardized and installed on most machines around the world.

2. An Indic language server-side extension that only someone interested in running a Unicode-compliant application with Indic scripts installs on the machine running the X server.
> I thought you were going to describe an algorithm that would allow X
> clients to work unchanged in the presence of arbitrary glyph
> reordering, substitution and positioning by the X server.

No. See above.

> The distinction between characters and glyphs is important even for
> Latin scripts. Consider ligatures and diacritical marks; some Latin
> encodings have separate character codes for the diacritical marks; a
> "c" and a "cedilla" (two code points) together can have a different
> glyph in these languages. Similarly "f", "f" and "i" combine to form
> a distinct glyph "ffi".
>
> The X protocol was explicitly designed NOT to support these kinds of
> transformations.

Yes, the designers of X wanted to keep X nothing more than an image rendering protocol, and they probably had a reason (which I haven't found even after quite a bit of searching; I would appreciate references to the X design rationale. I already have the O'Reilly Xlib book).

Sure, we should pay attention to the wisdom of these people, but we should also keep in mind that things were very different 15 years ago. Reading:

http://www.xfree86.org/~keithp/talks/render.html

confirms that. However, IMO we should still question their design decisions and consider possible implementations that introduce new extensions without breaking backward compatibility.

Perhaps the right thing to do is to implement Indic support in client-side libraries. Who knows? But it doesn't hurt to have all the options on the table and discuss the pros and cons of each.

>
> If you change this, you'll end up with some other "protocol", not the
> X protocol. This new graphics "protocol" is however:
>
> a. inconsistent
>    i. how do you map a screen coordinate back to position in the text
>       stream if you are doing complex text rendering?

Inconsistent with what? I'd say it's more consistent, because all the codes that go on the wire are character codes, and glyph codes are internal to the X server.
If it's possible to do it on the client side, it must be possible to do it on the server. The server has all the information it needs to do this computation. That's not to say it's desirable, just that it's possible.

>
> b. incomplete
>    i. how do you specify text in a different character encoding?

Simple. Put font1 with encoding1 in the GC and call PolyText. Put font2 with encoding2 in the GC and call PolyText again.

>    ii. how do you access glyphs in a font that do not correspond to
>        a `character'?

The client doesn't need to. It just deals with character strings (in the conventional meaning of the word `character').

>
> c. suffers from new problems
>    i. If you are indexing fonts using character codes, how do you use
>       fonts that do not contain glyphs of 'letters'?
>       You don't want glyph combining and reordering happening for
>       the glyphs in a symbol font for example.

By using an encoding in which glyph code == character code.

>
> ...etc...
>
> as> need to come up with the pros and cons of each approach. I've given
> as> several tangible advantages of implementing it on the X
> as> server. Perhaps you could articulate your thoughts on why you think
> as> it should be done in a client side library ?
>
> Implementing Indic script support in the X server alone without
> changing clients appears to be infeasible.
>

Agreed. Changing clients is necessary, but the change could be generic rather than specific to Indic scripts.

>
> >> Client side Indic Rendering I
>
> o this is efficient in terms of network bandwidth (glyph indices are
>   sent over)

You didn't count the overhead of sending the font information from the X server to the client. As things stand now, this is a documented problem with Unicode fonts that have a large difference between minChar and maxChar. And this is not counting the relatively large number of glyphs for a small range of Unicode code space in Indic scripts.
>
> o it doesn't break anything; you are still using the X11 protocol :)
>

There are ways of doing server-side implementations that don't "break" the letter of the X protocol, though I think they break its spirit.

> o it will work on every X server in the world; no need for extensions.

Granted.

>
> o the X server is still doing the rendering of glyphs onto the screen
>   and can apply the usual caching/pre-rendering optimizations done
>   for text.

True for a server-side implementation too.

>
> o you can support multiple encodings (KSCLP, TSCII, UNICODE, whatever)
>

True for a server-side implementation too: multiple PolyText requests, with a different font (with a different encoding) in the GC each time.

> o you can support multiple algorithms for Indic rendering

True for a server-side implementation too. In fact, this argument works better for a server-side implementation. Imagine installing:

    for each Indic language L:
        for each font (possibly using a different algorithm) A:
            for each client machine C:
                install a client side library

For the server-side implementation, C = 1 and, hopefully, we can keep A down to 1. Also, L is not a small number :)

Conclusion: the only advantage I can see that's specific to this scheme is that it doesn't require any changes to the X server or the X protocol.

I think the issue is an implementation detail and doesn't affect any applications, as long as they call the following time-tested Xlib interface:

    XDrawText(display, d, gc, x, y, items, nitems)
          Display *display;
          Drawable d;
          GC gc;
          int x, y;
          XTextItem *items;
          int nitems;

There may be some value in experimenting with this interface using both approaches and learning from the experience. In some cases, though one approach may be technically superior, the "market" may decide differently.

I have yet to study the IndiX code, which I finally downloaded today. I will probably chew on it for a while.

-Arun
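Arun's nested loop can be read as a back-of-envelope count of installations, with L languages, A fonts/algorithms per language and C client machines. This is only a restatement of his argument, not a measurement:

```latex
N_{\text{client}} = L \cdot A \cdot C,
\qquad
N_{\text{server}} = L \cdot A \cdot 1 = L \cdot A,
\qquad
\frac{N_{\text{client}}}{N_{\text{server}}} = C
```

With purely illustrative values L = 10, A = 2 and C = 50, that is 1000 client-side installs versus 20 server-side ones. If A can be kept at 1, the server-side count drops to L = 10.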
From: Arun S. <ar...@sh...> - 2002-02-20 17:55:24
On Wed, Feb 20, 2002 at 01:16:27AM -0800, Arun Sharma wrote:
> Yes, the designers of X wanted to keep X to be nothing more than an
> image rendering protocol and they probably had a reason too (which I
> haven't found even after quite a bit of searching - would appreciate
> references to X design rationale - I already have the OReilly Xlib book).

Some more thoughts on this topic:

- Most of the X designers worked for companies that had a thin client (X server) and fat server (X client) ideology. So naturally, they were inclined to keep the X server simple enough to be implemented in cheap hardware (NCD X terminals etc.). However, the design center (at least numerically) for X has shifted to x86 PCs running some form of free UNIX.

- Advances in hardware technology have also pushed more functionality into the X server. To be fair, most of these have been in the area of "acceleration".

- I think we should consider yet another alternative, apart from the ones we're discussing:

  1. client-side library implementation (Koshy's proposal)
  2. server-side implementation (IndiX)
  3. a "character -> glyph code" server

  Basically, have an external process do the complex mapping between characters and glyphs. This is not very different from, say, XIM servers for CJK. This has the advantages of both 1 and 2, namely:

  - Installing Indic software on fewer machines
  - Simplicity of the X server
  - No extension to the protocol, keeping the spirit of the X protocol (image drawing server)

  However, X has often been criticized for sluggish performance due to having too much stuff running in many different address spaces (X server, window manager, X client), and this will only add to the misery.

- Another point to consider: font selection in a Unicode text buffer containing codes from multiple scripts. Communication overhead might increase if the client has to query the server for the correct font for each of the scripts. In a server-side implementation, could this be done with less network overhead?

-Arun
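Both Arun's "Put font1 in the GC and call PolyText, put font2 in the GC and call PolyText again" answer and his multi-script font-selection point come down to splitting a Unicode buffer into runs that can each be drawn with one font. Here is a minimal sketch, assuming a fixed table of Unicode blocks and made-up font names ("font-deva" etc.); real itemizers also let neutral characters such as spaces and digits inherit the surrounding script, which this one does not do.

```python
# Hypothetical script -> font table; block ranges are the Unicode blocks for
# Devanagari (U+0900..U+097F), Bengali (U+0980..U+09FF), Tamil (U+0B80..U+0BFF).
SCRIPT_RANGES = [
    (0x0900, 0x097F, "devanagari"),
    (0x0980, 0x09FF, "bengali"),
    (0x0B80, 0x0BFF, "tamil"),
]
FONTS = {"devanagari": "font-deva", "bengali": "font-beng",
         "tamil": "font-taml", "latin": "font-latin"}


def script_of(ch):
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "latin"   # everything else falls back to a default font


def font_runs(text):
    """Split text into maximal (font, substring) runs; each run would be
    one PolyText call with that font set in the GC."""
    runs = []
    for ch in text:
        font = FONTS[script_of(ch)]
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


for font, run in font_runs("ab\u0915\u093f cd"):
    print(font, len(run))
```

Whether this itemization happens in the client (one query per unknown font) or in the server is exactly the trade-off Arun raises. The sketch is agnostic about where it runs.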
From: Keyur S. <key...@ya...> - 2002-02-21 17:47:01
Hi,

--- Joseph Koshy <jk...@Fr...> wrote:
> The distinction between characters and glyphs is important even for
> Latin scripts. Consider ligatures and diacritical marks; some Latin
> encodings have separate character codes for the diacritical marks; a
> "c" and a "cedilla" (two code points) together can have a different
> glyph in these languages. Similarly "f", "f" and "i" combine to form
> a distinct glyph "ffi".
>
> The X protocol was explicitly designed NOT to support these kinds of
> transformations.

True. So now you agree that we can't handle such complexity with the existing X protocol requests. Indic scripts have similar complexity, even more than this.

> One place in the X protocol specification uses the phrase 'string of
> characters'. Now the word 'character' has (today) become an
> overloaded phrase, with meanings ranging from the visual
> representation (the letterform), the 'abstract' character itself, the
> code point assigned to the character in a given encoding, a specific
> glyph in a font, etc. The exact meaning is usually clear from the context.
>
> Nowhere does the X11 protocol specification say that 'character codes'
> are to be used in text drawing requests. In fact, it EXPLICITLY
> states that the semantics of character `codes' are NOT to be honored
> by the X server.

It says that the X protocol does no translation of character sets. That doesn't mean that character 'codes' are not to be honored by the X server. The X protocol also doesn't EXPLICITLY say that glyph codes have to be passed in the request. Everywhere, it talks about "values" passed in the request; it was left to the implementation to decide what these "values" are. Actually, at that time there was no distinction between character codes and glyph codes.

>
> If you change this, you'll end up with some other "protocol", not the
> X protocol.
> This new graphics "protocol" is however:

I think that, when talking about X, you are simply ignoring the fact that the X Window System is not merely the X protocol: it consists of the X protocol, the X library, the X server, and the font renderer used by the X server. Since all the font renderers used by the X server now have a mapping table from character codes to glyph codes, an X client cannot specify a particular glyph to be displayed just by passing glyph codes in XDrawString, unless the font has "char codes = glyph codes".

Can a client use any X TrueType font for Indic with Unicode encoding and determine which glyph to use for the character "KA"? That is not possible for the client, since the font is loaded by the server and the glyph information is kept by the font renderer, which is not directly accessible to the client. The client will simply pass the Unicode value of the character "KA" in an XDrawString16 call, and the font renderer will take care of it. You can try this out by writing a simple application that draws the glyph for the character "KA" from some Unicode-encoded Indic font.

So even if the X protocol didn't allow character codes to be passed in protocol requests, that documented feature (as you said) doesn't match the current reality of the X Window System. Does it mean that all X clients are *now* violating the X protocol specification?

>
> a. inconsistent
>    i. how do you map a screen coordinate back to position in the text
>       stream if you are doing complex text rendering?

This is certainly not the job of the X server.

> b. incomplete
>    i. how do you specify text in a different character encoding?

An OpenType font allows you to design your font in _any_ encoding currently in use.

>    ii. how do you access glyphs in a font that do not correspond to
>        a `character'?

This should be done using intermediate tables in the font, like the GSUB table in an OpenType font.
Indic scripts have many glyphs which don't have any character code value in Unicode but, as you must know, these glyphs are displayed in IndiX, which uses OpenType fonts.

>
> c. suffers from new problems
>    i. If you are indexing fonts using character codes, how do you use
>       fonts that do not contain glyphs of 'letters'?
>       You don't want glyph combining and reordering happening for
>       the glyphs in a symbol font for example.

Your font must specify some character encoding. There will be a mapping table from "character code" in this encoding to "glyph code". Using this mapping table, you can display all the glyphs in your font.

On the other hand, I would like to ask you: how can one determine the glyph id for a character when all the font information is kept by the server?

> However, you don't need to change the X server to support Indic
> scripts. Here is one way it would work:
>
> >> Client side Indic Rendering I
>
> o this is efficient in terms of network bandwidth (glyph indices are
>   sent over)

Not necessarily for Indic scripts. There are one-to-one, one-to-many, many-to-one, and many-to-many mappings between character codes and glyph codes; I gave a few examples in one of my earlier mails. It means that 'm' character codes may be mapped to 'n' glyphs, and it is possible that n > m.

> o it doesn't break anything; you are still using the X11 protocol :)

We'll skip this point until we decide whether passing character codes actually breaks the X protocol.

I think the other points have already been addressed by Arun, so I will wait for responses from you and other people.

Regards,
Keyur
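Keyur's point that 'm' code points can become 'n' glyphs, with n possibly greater than m, is easy to see with a toy substitution table. The glyph names below are hypothetical (a real OpenType font derives them from its cmap and GSUB tables), and greedy longest-match substitution is a deliberately simple stand-in for a real shaping engine. The table covers Koshy's Latin "ffi" ligature, the Devanagari conjunct KSSA (KA + VIRAMA + SSA, many-to-one), the reordered VOWEL SIGN I that is drawn before its consonant, and the Tamil VOWEL SIGN O, which is drawn as two pieces around the consonant (one-to-many).

```python
KA, SSA, VIRAMA, I_SIGN = "\u0915", "\u0937", "\u094D", "\u093F"  # Devanagari
TA_KA, TA_O_SIGN = "\u0B95", "\u0BCA"                              # Tamil

# Hypothetical glyph names; a real OpenType font gets these from cmap/GSUB.
CMAP = {KA: "ka", SSA: "ssa", VIRAMA: "virama", I_SIGN: "i_sign",
        TA_KA: "ta_ka", "f": "f", "i": "i"}
RULES = {
    ("f", "f", "i"): ["f_f_i"],                # many-to-one: Latin ligature
    (KA, VIRAMA, SSA): ["k_ss"],               # many-to-one: conjunct KSSA
    (KA, I_SIGN): ["i_sign", "ka"],            # reordering: sign drawn first
    (TA_KA, TA_O_SIGN): ["ta_e_sign", "ta_ka", "ta_aa_sign"],  # one-to-many
}
LONGEST = max(len(k) for k in RULES)


def shape(text):
    """Greedy longest-match substitution from code points to glyph names."""
    glyphs, i = [], 0
    while i < len(text):
        for n in range(min(LONGEST, len(text) - i), 1, -1):
            rule = RULES.get(tuple(text[i:i + n]))
            if rule is not None:
                glyphs.extend(rule)
                i += n
                break
        else:                      # no multi-character rule matched here
            glyphs.append(CMAP[text[i]])
            i += 1
    return glyphs


for s in ["ffi", KA + VIRAMA + SSA, KA + I_SIGN, TA_KA + TA_O_SIGN]:
    print(len(s), "code points ->", len(shape(s)), "glyphs:", shape(s))
```

The last case sends more glyph indices than there were code points. That is Keyur's objection to assuming glyph indices are always the cheaper thing to put on the wire. A real shaper would also handle combinations this toy misses, such as VOWEL SIGN I reordering around a whole conjunct.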
From: Arun S. <ar...@sh...> - 2002-02-22 01:18:21
On Thu, Feb 21, 2002 at 09:46:55AM -0800, Keyur Shroff wrote:
> renderer used by the X server. Since now all the font
> renderers used by X Server do have mapping table which maps
> from character codes to glyph codes, it is not possible for
> an X client to specify particular glyph to be displayed
> just by passing glyph codes in XDrawString unless there is
> "char codes = glyph codes" in the font.

The man page for XDrawString maintains the ambiguity about what the arguments represent: character codes or glyph codes. So it's possible to call XDrawString with glyph codes, which get transmitted via PolyText* requests, assuming the X server can look up the glyph by glyph code and render it.

In general, I have no doubt that both implementations are _possible_. It's just a matter of which one has more desirable properties. In the following stack:

    Application -> Toolkit (eg: Qt) -> Xlib -> X protocol -> X server -> renderer

we certainly know that the input to the toolkit is a character code. But beyond that, implementations can do what they want.

>
> Can client use any X TrueType font for Indic with Unicode
> encoding and determine which glyph to use for character
> "KA"? It is not possible for the client since the font is
> loaded by the server and the glyph information is again
> kept by the font renderer which is not directly accessible
> by the client.

A client-side implementation will have to enhance the X protocol to transmit the cmap tables to the client. Also, there is talk on the XFree86-render mailing lists of using "local client side fonts" and transmitting glyphs to the server. In that implementation, fonts are accessible to the client, but not to the server. More on this below.

> The client will simply pass the Unicode
> value of the character "KA" in XDrawString16 call and the
> font renderer will take care of it. You can try this out by
> writing a simple application to draw glyph for character
> "KA" from some Unicode encoded Indic font.
Few clients will call XDrawString directly. They may call gtk_label_new("KA"), and then a library like Pango can take over, convert "KA" to a glyph string and call XDrawString with the glyph string.

http://cvs.gnome.org/lxr/source/pango/pango/pango-layout.c
http://cvs.gnome.org/lxr/source/pango/libpango/glyphstring.c
http://cvs.gnome.org/lxr/source/gtk+/gtk/gtklabel.c

Based on my web searches, here are some other reasons why people might have gravitated towards client-side solutions:

- Project management issues, licensing issues (my conclusions/speculation). People working on GNOME or KDE might find it easier to get their code into their own repositories than into the XFree86 one. Some people may want to keep their code under the GPL rather than the MIT/X style licenses.

- The "glacial" speed of X server development:
  http://www.xfree86.org/~keithp/talks/xtc2001/paper/
  http://www.xfree86.org/pipermail/render/2001-August/001291.html

I'd also like to write up a proposal for a binary distribution strategy for IndiX. It's my belief that we'll end up with both client-side and server-side solutions, which could possibly be installed side by side on the same machine. Fortunately, the apps aren't affected and may even choose one dynamically using LD_PRELOAD.

-Arun
From: Arun S. <ar...@sh...> - 2002-02-22 07:31:06
On Tue, Feb 19, 2002 at 10:10:41AM -0800, Arun Sharma wrote:
> The existing algorithm in Xaw:
>
> // startx = x[1]
> // pixel_width = x[2] - x[1]
> FindPosition(textpos, startx, pixel_width)
>   nchars = 0
>   curpos = startx
>   while 1:
>     // Uses XFontStruct for efficiency
>     width = compute the width of the next char in the text buf
>     nchars++;
>     curpos += width
>     if (curpos >= startx + pixel_width)
>       break;
>
> // everything starting from textpos to textpos + nchars is "selected"

I've studied how gtk implements this algorithm. The algorithm can be found at:

gtk/gtktext.c - find_mouse_cursor_at_line()

The algorithm is very similar, proceeding one character at a time (in other words, broken for context-sensitive Indic fonts). It has a 256-byte cache which holds the widths of all the characters in the range 0-255 (gtk/gtktext.c - find_char_width()) per font. (Hint: ammo for those of you looking for "latin1 bias" :)

For everything above 255, it delegates to gdk, which delegates to XTextExtents(). XTextExtents is network-efficient, in that it can answer requests locally (unlike XQueryTextExtents, which has to consult the X server).

The way IndiX has implemented it, XTextExtents is mapped to XQueryTextExtents, sacrificing network efficiency. In other words, when you drag your mouse, gtk will execute the above loop and an X protocol request is made for every character to compute its width.

A new protocol request such as XComputeWidth (is XComputeNChars a better name?) would batch these XQueryTextExtents requests into a single request. Keyur, I didn't understand your comment on why this functionality doesn't belong in the X server. Or did I misread your statement?

Another data point: using character codes is NOT unprecedented in XFree86. I just finished reading the i18n specs:

http://www.x-docs.org/i18n/Framework.pdf

and looked at the implementation:

xc/lib/X11/omDefault.c - _Xutf8DefaultDrawString

The input is clearly a UTF-8 string. It calls _XmbDefaultDrawString, which calls XDrawString, which issues a PolyText request in the X protocol. In other words, a character code is being sent over the X protocol.

On a related note, can anyone on the list enlighten me on the difference between UTF-8 and mbs (multi-byte string)? I thought UTF-8 was an mbs too.

-Arun |
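To make the round-trip argument concrete, here is a minimal Python simulation (not Xlib code) of the FindPosition loop quoted above. It assumes hypothetical fixed glyph widths, a gtk-style local width table for code points 0-255, one simulated XQueryTextExtents round trip for every other character (the IndiX behaviour Arun describes), and a batched request modelled on the proposed, hypothetical XComputeNChars. The names `SimulatedServer`, `count_chars` and `find_position` are illustrative only, and `startx` is normalised to 0.

```python
from typing import Callable

LATIN1_WIDTH = 6   # hypothetical advance width for code points 0-255
OTHER_WIDTH = 8    # hypothetical advance width for everything else


def glyph_width(ch: str) -> int:
    """Toy font metrics (an assumption, not real font data)."""
    return LATIN1_WIDTH if ord(ch) < 256 else OTHER_WIDTH


def count_chars(text: str, pixel_width: int, width_of: Callable[[str], int]) -> int:
    """The quoted loop: add widths until the dragged extent is reached."""
    nchars = 0
    curpos = 0  # startx normalised to 0
    for ch in text:
        nchars += 1
        curpos += width_of(ch)
        if curpos >= pixel_width:
            break
    return nchars


class SimulatedServer:
    """Stands in for the X server and counts simulated protocol round trips."""

    def __init__(self) -> None:
        self.round_trips = 0

    def query_text_extents(self, ch: str) -> int:
        # One request/reply pair per character, like XQueryTextExtents.
        self.round_trips += 1
        return glyph_width(ch)

    def compute_nchars(self, text: str, pixel_width: int) -> int:
        # Hypothetical batched request: the whole loop runs server-side.
        self.round_trips += 1
        return count_chars(text, pixel_width, glyph_width)


# gtk-style per-font cache covering code points 0-255, filled locally.
LATIN1_CACHE = [glyph_width(chr(cp)) for cp in range(256)]


def find_position(text: str, pixel_width: int, server: SimulatedServer) -> int:
    def width_of(ch: str) -> int:
        if ord(ch) < 256:
            return LATIN1_CACHE[ord(ch)]      # answered locally, no round trip
        return server.query_text_extents(ch)  # one round trip per character
    return count_chars(text, pixel_width, width_of)
```

Dragging across "ab" followed by the six-code-point Devanagari word नमस्ते costs six simulated round trips with the per-character loop, but only one with the batched request.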
From: Keyur S. <key...@ya...> - 2002-02-22 10:13:56
|
Hi,

--- Arun Sharma <ar...@sh...> wrote:
> For everything > 256, it delegates it to gdk, which delegates it to
> XTextExtents(). XTextExtents is a network efficient function, in that it
> is capable of responding to requests locally (unlike XQueryTextExtents,
> which has to consult the X server).
>
> The way IndiX has implemented it, XTextExtents is mapped to
> XQueryTextExtents - sacrificing network efficiency. In other words, when
> you drag your mouse, gtk will execute the above loop and a X protocol
> request is made for every character to compute its width.
                            ^^^^^^^^^

It should be "syllable", not "character". In Indic scripts it is better to select the entire syllable while dragging the mouse over a string. Unlike foreign languages, in which a character is the smallest unit of the writing system, we consider the syllable to be the basic typographical unit of our writing system. It also simplifies interaction with the user.

This also avoids the mouse-selection problem Koshy raised earlier, since reordering is done within each syllable and a selection always covers entire syllables.

The following operations are proposed:

(1) Selection should select entire syllables.
(2) The cursor should move over syllables.
(3) The Delete key should remove the entire next syllable.
(4) Backspace should delete one character from the previous syllable.
(5) Inserting a new character may increase or decrease the number of syllables.
(6) Deleting a character may decrease or increase the number of syllables.

Yes, in (5) and (6) both cases are possible. I'll give examples when we get to a proper discussion.

> A new protocol request - such as XComputeWidth (Is XComputeNChars a
> better name ?) will batch these XQueryTextExtent requests into a single
> request. Keyur, I didn't understand your comment on why this
> functionality doesn't belong to the X server. Or did I misread your
> statement ?
I meant that maintaining a back-buffer for the text and handling text selection are not functions of the server. The client does all of that itself, and it can also send requests to the server to compute the width of a character stream. I think my sentence was ambiguous.

> Another data point: Using character codes is NOT unprecedented in
> XFree86. I just finished reading the i18n specs.
>
> http://www.x-docs.org/i18n/Framework.pdf
>
> and looked at the implementation:
>
> xc/lib/X11/omDefault.c - _Xutf8DefaultDrawString
>
> The input is clearly a UTF-8 string. It's calling _XmbDefaultDrawString
> which is calling XDrawString, which is doing a PolyText request in the X
> protocol. In other words, a character code is being sent over the X
> protocol.

I wonder why this example didn't come to mind while I was arguing about the use of character codes in XDrawString. It is a perfect example showing that the current XFree86 implementation doesn't prevent clients from sending character codes in a PolyText request. Arun, thanks for drawing my attention to it.

> On a related note, can anyone on the list enlighten me on the difference
> between utf8 and mbs (multi byte string) ? I thought UTF-8 was a mbs too.

Yes, UTF-8 is also an mbs. In fact, any string in a locale-dependent encoding can be treated as an mbs when calling XmbDrawString. XmbDrawString draws a string in the locale's encoding, which may be stateful or stateless. It calls various conversion routines to decode the string into a form suitable for XDrawString (e.g., a 16-bit Unicode string, in a UTF-8 locale using an ISO-10646 encoded font) and finally uses XDrawString to send the request to the server. An mbs is always locale-dependent, while UTF-8 is always locale-independent. Under a non-UTF-8 locale XmbDrawString and Xutf8DrawString behave differently, but under a UTF-8 locale they should behave exactly the same.
Regards, Keyur __________________________________________________ Do You Yahoo!? Yahoo! Sports - Coverage of the 2002 Olympic Games http://sports.yahoo.com |
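Keyur's proposed operations (1)-(6) can be prototyped entirely on the client side. The Python sketch below uses a deliberately simplified Devanagari syllable rule, which is an assumption and not IndiX's actual algorithm: a consonant that follows a virama joins the current syllable, and nukta, virama, dependent vowel signs, candrabindu, anusvara and visarga attach to the preceding syllable. ZWJ/ZWNJ, other scripts and many real-world edge cases are ignored, and all function names are hypothetical.

```python
from typing import List, Tuple

VIRAMA = "\u094d"
NUKTA = "\u093c"
MODIFIERS = {"\u0901", "\u0902", "\u0903"}  # candrabindu, anusvara, visarga


def is_consonant(ch: str) -> bool:
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F


def is_combining(ch: str) -> bool:
    # Marks that never start a syllable (simplified set).
    return ch in MODIFIERS or ch in (NUKTA, VIRAMA) or 0x093E <= ord(ch) <= 0x094C


def syllables(text: str) -> List[str]:
    clusters: List[str] = []
    for ch in text:
        joins_previous = bool(clusters) and (
            is_combining(ch) or (is_consonant(ch) and clusters[-1].endswith(VIRAMA))
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def boundaries(text: str) -> List[int]:
    """Code-point offsets where the cursor may rest (operation 2)."""
    offsets = [0]
    for cluster in syllables(text):
        offsets.append(offsets[-1] + len(cluster))
    return offsets


def next_boundary(text: str, pos: int) -> int:
    return next((b for b in boundaries(text) if b > pos), len(text))


def prev_boundary(text: str, pos: int) -> int:
    return max((b for b in boundaries(text) if b < pos), default=0)


def snap_selection(text: str, start: int, end: int) -> Tuple[int, int]:
    """Operation 1: widen a selection so it covers whole syllables."""
    lo = max(b for b in boundaries(text) if b <= start)
    hi = min(b for b in boundaries(text) if b >= end)
    return lo, hi


def delete_forward(text: str, pos: int) -> str:
    """Operation 3: Delete removes the whole next syllable."""
    return text[:pos] + text[next_boundary(text, pos):]


def backspace(text: str, pos: int) -> Tuple[str, int]:
    """Operation 4: Backspace removes one code point of the previous syllable."""
    if pos == 0:
        return text, 0
    return text[:pos - 1] + text[pos:], pos - 1
```

Removing the virama from नमस्ते (U+0928 U+092E U+0938 U+094D U+0924 U+0947) splits its last syllable in two, taking the word from three syllables to four, which is the kind of change Keyur describes in (5) and (6).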
From: Keyur S. <key...@ya...> - 2002-02-19 17:32:15
|
Hi,

I am currently in Delhi for the IndiaSoft 2002 exhibition. I will comment on this after I return to Mumbai on 25 February.

Regards,
Keyur

--- Joseph Koshy <jk...@Fr...> wrote:
> ks> While starting my work on IndiX, I also decided to give support
> ks> using an X extension. But unfortunately, I had to work under
> ks> strictly imposed constraints :-( esp. that applications should not
> ks> be modified for Indian language support. So I did the thing in
> ks> whatever way the people wanted me to do and also tried to do it in
> ks> best possible way!
>
> Most unmodified X applications may not work correctly with an X server
> that does "behind-the-scenes" glyph reordering and substitution.
>
> Consider the following scenario:
> - the user presses Button-1 down on some `x[1],y[1]' location on screen
>   and sweeps the pointer over the screen
> - Button-1 is released at location `x[2],y[2]'
>
> These screen coordinates get reported back to the application in the
> form of "events". Given these two pixel coordinates, the X application
> has to figure out the region of the underlying text that was
> "selected". This involves going backwards from 'x,y' coordinates to the
> character code points in its text buffer.
>
> If the X server is doing arbitrary glyph reordering and glyph
> substitution unknown to the X client, then this translation will go
> wrong in the client.
>
> I don't think the requirement of X applications running unchanged with
> Indic scripts is a feasible one.
>
> Regards,
> Koshy
> <jk...@fr...> |
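Koshy's scenario can be reproduced with a toy model. The Python sketch below is an illustration under stated assumptions, not IndiX code: it gives क (U+0915) and the short-i sign ि (U+093F) hypothetical advance widths, has the client lay glyphs out in string order (what an unmodified client assumes), and has a simulated server that silently draws ि in front of the consonant it follows in the string. Hit-testing the same pixel against the two layouts yields different character indices.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical advance widths, in pixels.
WIDTHS: Dict[str, int] = {"\u0915": 10, "\u093f": 4}

Span = Tuple[int, int, int]  # (start_x, end_x, index into the text buffer)


def place(text: str, order: List[int]) -> List[Span]:
    """Place glyphs left to right in the given order of text indices."""
    spans, x = [], 0
    for i in order:
        spans.append((x, x + WIDTHS[text[i]], i))
        x += WIDTHS[text[i]]
    return spans


def logical_layout(text: str) -> List[Span]:
    # What an unmodified client assumes: glyphs appear in string order.
    return place(text, list(range(len(text))))


def server_visual_layout(text: str) -> List[Span]:
    # Toy "behind-the-scenes" reordering: the short-i sign is drawn
    # before the consonant that precedes it in the string.
    order = list(range(len(text)))
    for i in range(1, len(text)):
        if text[i] == "\u093f":
            order[i - 1], order[i] = order[i], order[i - 1]
    return place(text, order)


def hit_test(spans: List[Span], px: int) -> Optional[int]:
    for start, end, index in spans:
        if start <= px < end:
            return index
    return None
```

For कि, a click at x = 2 lands on the ि glyph as actually drawn (index 1), but the client's own layout resolves it to index 0 (क), so the selected region is wrong.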
From: <jk...@Fr...> - 2002-02-25 02:33:19
|
ks> It says that X protocol does no translation of character
ks> sets. It doesn't mean that characters 'codes' are not to be
ks> honored by the X server. X protocol also doesn't EXPLICITLY
ks> says that glyph codes have to be passed in the request.

The protocol says very clearly that fonts in X are collections of glyph bitmaps indexed by a glyph index. It states that the values used in text drawing requests are indices into the font.

It also states very clearly that 'character codes' (i.e., code points of any character set) are not used in the protocol.

It states very clearly how drawing requests are to place the bitmaps of the specified glyphs next to each other (i.e., it disallows reordering or substitution of glyphs).

X clients rely on the X server placing exactly the glyphs specified, at exactly the pixel coordinates they specify, when implementing user interfaces.

There is a whole sub-standard (the X Logical Font Description) that clients use to select fonts with the desired font encodings so that everything 'just works'.

This is one of the cornerstones of X's design.

ks> Everywhere it says about "values" passed in the request. It
ks> was left for implementation to decide what are these
ks> "values".

You are implying ambiguity in the specification where none exists. If you had a question about this, you could have asked on the XFree86 lists.

I wonder what you are hoping to achieve by arguing about the X protocol on /this/ list:

- If there was really a doubt, you could have asked for clarification from the rest of the X community; I see no mail from you on this topic in the XFree86 archives.
- You haven't run the test suite.
- A cursory search in the XFree86 archives for ``glyph indices'' or other keywords would have revealed enough.

The initial review of IndiX was posted on <indic-computing-devel> to make public the rationale for why it wouldn't be bundled in the 'Bootable OS' sub-project.
As and when IndiX gets re-designed to be protocol compliant, we'll be happy to look at it again.

Regards,
Koshy
<jk...@fr...> |
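For readers unfamiliar with XLFD: a fully specified font name has 14 hyphen-separated fields, and the last two (CHARSET_REGISTRY and CHARSET_ENCODING) identify the font's encoding, i.e. how the values a client sends map to glyphs in that font. A minimal Python parser, handling only fully specified names (no `*` or `?` wildcards as accepted by font-listing calls), might look like this; the helper names are illustrative.

```python
from typing import Dict

XLFD_FIELDS = (
    "foundry", "family_name", "weight_name", "slant", "setwidth_name",
    "add_style_name", "pixel_size", "point_size", "resolution_x",
    "resolution_y", "spacing", "average_width",
    "charset_registry", "charset_encoding",
)


def parse_xlfd(name: str) -> Dict[str, str]:
    """Split a fully specified XLFD name into its 14 named fields."""
    if not name.startswith("-"):
        raise ValueError("an XLFD name starts with '-'")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected {len(XLFD_FIELDS)} fields, got {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))


def font_encoding(name: str) -> str:
    """Return CHARSET_REGISTRY-CHARSET_ENCODING, e.g. 'iso10646-1'."""
    fields = parse_xlfd(name)
    return f"{fields['charset_registry']}-{fields['charset_encoding']}"
```

Applied to `-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso10646-1`, `font_encoding` returns `iso10646-1`; note the empty ADD_STYLE_NAME field between the two adjacent hyphens.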
From: Keyur S. <key...@ya...> - 2002-02-25 07:59:03
|
Hi,

--- Joseph Koshy <jk...@Fr...> wrote:
> The protocol says very clearly that fonts in X are collections of
> glyph bitmaps indexed by a glyph index. It states that the values
> used in text drawing requests are indices into the font.

It states that the values used in text drawing requests are used to index glyphs. That doesn't mean they are glyph indices.

> It also states very clearly that 'character codes' (i.e code points of
> any character set) are not used in the protocol.
>
> It states very clearly how the drawing requests are to place the
> bitmaps of the specified glyphs next to each other (i.e, it disallows
> reordering or substitution of glyphs).

Please give a reference for each of the above.

> There is a whole sub-standard (the X Logical Font Description) that is
> used by clients to select fonts with desired font encodings so that
> everything 'just works'.
>
> This is one of the cornerstones of X's design.

XLFD (X Logical Font Description) is not meant only for clients. It is also used by the X library. When everything is wrapped in the X library, an X client does not necessarily use XLFD itself.

> ks> Everywhere it says about "values" passed in the request. It
> ks> was left for implementation to decide what are these
> ks> "values".
>
> You are implying ambiguity in the specification where none exists.

You can send us pointers to the places in the X protocol specification where it is _clearly_ stated that only glyph indices are used. I have searched the specification, and nowhere have I found that only glyph indices are used or that character codes should not be used.

> If you had a question about this, you could have asked on the XFree86
> lists.
>
> I wonder what you are hoping to achieve by arguing about the X
> protocol on /this/ list:

This discussion will help us decide on the correct design of the bootable OS. When we have doubts about another person's idea, it is better to clarify them before moving further with the design. This can only help everyone contribute towards a better design. And I remind you that it was you who raised questions about the X protocol, not me. I am just answering your questions.

> - If there was really a doubt, you could have asked for clarification
>   from the rest of the X community; I see no mail from you on this
>   topic in the XFree86 archives.

I don't have any doubt. The doubt is in your mind, so I don't see any need to raise this issue on the XFree86 mailing list.

> - You haven't run the test suite.

In one of my earlier mails I stated that the older copy of IndiX (the one on the website) breaks support for other foreign languages. I have fixed the problem now, and Pablo Saratxaga (maintainer of Mandrake Linux) tested it on his machine. IndiX showed French without any problem. I'll put the changes on the web.

> - A cursory search in the XFree86 archives for ``glyph indices'' or
>   other keywords would have revealed enough.

Also see 'man XDrawString'. It clearly states that a "character string" is passed to the function. I can tell you that the X library doesn't convert these character codes into glyph codes before sending them in the protocol request.

> The initial review of IndiX had been posted on <indic-computing-devel>
> to make public the rationale for why it wouldn't be bundled in the
> 'Bootable OS' sub-project.

When someone reviews my system publicly and posts the review on a public list, I reserve every right to defend my system in public. Why do you want to discuss this topic off the list when it is so closely connected to the design of the Indic OS?

> As and when IndiX gets re-designed to be protocol compliant, we'll be
> happy to look at it again.

I can definitely think about redesigning IndiX once it is proved that it really breaks the protocol. Remember that you have conveniently ignored the questions raised by Arun. You have also kept quiet on issues like Xutf8DrawString and XmbDrawString, which we believe send character codes in X protocol requests. If you believe that sending character codes breaks the X protocol design, then you should show that these functions really deal with glyph codes and not character codes.

You should also try writing a small application that draws a string using some Unicode-encoded TrueType font by calling XDrawString16. If you want, I can send a small program which I have tested. It will show Indic characters on any X server, not only IndiX. Instead of just quoting what is in the X protocol, you should also look at the current implementation of XFree86.

Regards,
Keyur |
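Keyur's XDrawString16 experiment can be reasoned about without an X server. XDrawString16 takes an array of XChar2b, each holding `byte1` and `byte2`; for a two-byte font the pair selects a row and column in the font. For a font whose XLFD encoding is iso10646-1, that pair is the high and low byte of a BMP code point, which is why the text displays. The Python sketch below (hypothetical helper names, no Xlib involved) only shows the packing: the values are sent in string order, so the short-i sign ि is still sent after its consonant, and any reordering would have to happen before the request is built.

```python
from typing import List, Tuple


def to_xchar2b(text: str) -> List[Tuple[int, int]]:
    """Pack BMP code points as (byte1, byte2) pairs, mirroring Xlib's XChar2b."""
    pairs = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError("XChar2b can only carry 16-bit values")
        pairs.append((cp >> 8, cp & 0xFF))  # (high byte, low byte)
    return pairs


def row_column_index(byte1: int, byte2: int) -> int:
    """Combine row (byte1) and column (byte2) into a single 16-bit value."""
    return (byte1 << 8) | byte2
```

For कि the pairs are (0x09, 0x15) then (0x09, 0x3F): the consonant first, exactly as stored in the buffer.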
From: <jk...@Fr...> - 2002-02-25 10:26:02
|
ks> I can definately think about redesigning of IndiX once it
ks> will be proved that it really breaks protocol.

I'm happy to hear that. I've heard only good reports of the Indic (Devanagari) rendering aspects of IndiX.

ks> have also decided to keep quiet on the issues like
ks> XUtf8DrawString and XmbDrawString which we believe sends

I didn't think this list was the place to clear up basic misconceptions about X programming. Since you seem unable (unwilling?) to do your own homework, I'll do it for you (this once).

Please start with "i18n mechanism":

http://www.xfree86.org/pipermail/i18n/2002-February/003074.html

and follow the thread, in particular the replies from Tomohiro KUBOTA:

http://www.xfree86.org/pipermail/i18n/2002-February/003077.html

and Keith Packard:

http://www.xfree86.org/pipermail/i18n/2002-February/003078.html

OR the answers to my question "Complex text layout and mapping screen coordinates":

http://www.xfree86.org/pipermail/fonts/2002-February/001331.html

OR Arun Sharma's continuation of this argument on those lists:

http://www.xfree86.org/pipermail/fonts/2002-February/001341.html

OR the discussion "Another approach to text in X":

http://www.xfree86.org/pipermail/fonts/2002-February/001339.html

These links should clarify matters more efficiently than any discussion on <indic-computing-devel> could hope to.

Could you please take further discussion of basic X concepts and X programming to the XFree86 lists? They are a better place for such discussions.

We can discuss IndiX being part of the Bootable CD once it is re-designed to be protocol compliant.

Regards,
Koshy
<jk...@fr...> |
From: Guntupalli K. <kar...@fr...> - 2002-02-05 10:17:19
|
On Mon, 4 Feb 2002 20:58:38 +0530
"Sastry Ramachandrula" <rs...@mg...> wrote:
> Dear Koshy,
>
> I sincerely appreciate your efforts in trying to support Indian
> Languages on Linux. After having shared the keen insights you have
> gained by looking at both IndLinux(IITM) and Indix(NCST), what are
> your final suggestions/recommendations?
>
> When can we expect a complete release that would support atleast
> Hindi completely without breaking the compatibility with the X
> Window System protocol?

Maybe something like this is needed:

http://www.x.org/contrib/i18n/

Some lead developers (of KDE/GNOME, and even X) are of the opinion that complex text services should be carried out at higher layers (e.g., toolkits). Xft and the Xrender mechanism address the font issues, but not issues like text reordering, cluster formation and glyph selection, which is what Pango does.

Some discussions have been going on about modifying X's text support, mainly to accommodate complex text and the new font mechanisms.

A team from Sun made the following posting:

http://XFree86.Org/pipermail/i18n/2001-December/002727.html

One font guru's opinion:

http://XFree86.Org/pipermail/fonts/2001-December/001210.html

FreeType's plans in reference to the above:

http://www.freetype.org/pipermail/devel/2001-December/002740.html

Regards,
Karunakar |
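The layering Karunakar describes (cluster formation, reordering and glyph selection done in the toolkit, as Pango does) can be sketched as a toy pipeline. Everything here is an assumption for illustration: the glyph IDs, advances and the conjunct ligature for क्ष are invented, and only consonants, virama and the short-i sign are handled. The point of this design is that the client, not the server, keeps a glyph-to-cluster map, so hit-testing on the reordered output still lands on the right syllable.

```python
from typing import Dict, List, Optional, Tuple

VIRAMA = "\u094d"
SHORT_I = "\u093f"

# Hypothetical font tables (invented for this example).
GLYPH: Dict[str, int] = {"\u0915": 10, "\u0937": 11, SHORT_I: 20, VIRAMA: 30}
LIGATURE: Dict[str, int] = {"\u0915\u094d\u0937": 40}  # conjunct KSSA
ADVANCE: Dict[int, int] = {10: 10, 11: 10, 20: 4, 30: 0, 40: 14}


def is_consonant(ch: str) -> bool:
    return 0x0915 <= ord(ch) <= 0x0939


def form_clusters(text: str) -> List[str]:
    """Cluster formation: a new cluster starts at a consonant not preceded by virama."""
    clusters: List[str] = []
    for ch in text:
        if clusters and (not is_consonant(ch) or clusters[-1].endswith(VIRAMA)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def shape(text: str) -> List[Tuple[int, int]]:
    """Return (glyph_id, cluster_index) pairs in visual (drawing) order."""
    out: List[Tuple[int, int]] = []
    for ci, cluster in enumerate(form_clusters(text)):
        has_i = cluster.endswith(SHORT_I)
        base = cluster[:-1] if has_i else cluster
        # Glyph selection: prefer a conjunct ligature for the whole base.
        glyphs = [LIGATURE[base]] if base in LIGATURE else [GLYPH[ch] for ch in base]
        if has_i:
            glyphs.insert(0, GLYPH[SHORT_I])  # reordering: short-i drawn first
        out.extend((g, ci) for g in glyphs)
    return out


def cluster_at(glyphs: List[Tuple[int, int]], px: int) -> Optional[int]:
    """Hit-test a pixel offset against the shaped run, returning a cluster index."""
    x = 0
    for gid, ci in glyphs:
        x += ADVANCE[gid]
        if px < x:
            return ci
    return None
```

For क्षिक the run is the short-i glyph, the conjunct glyph, then क; any click inside the first two glyphs maps back to cluster 0 even though ि was drawn first. The client can then send the resulting glyph indices in string-independent visual order, which keeps the server doing exactly what the protocol asks.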