indic-computing-devel Mailing List for The Indic-Computing Project (Page 21)
Status: Alpha
Brought to you by:
jkoshy
From: Arun S. <ar...@sh...> - 2002-02-22 17:12:09
|
On Fri, Feb 22, 2002 at 01:25:58PM +0530, Tapan S. Parikh wrote:
> It still seems to me mistakes can be made...
>
> Take an example text = (a b c d e) [char codes]
>
> But after reordering = (f-a, f-b, f-c, f-e, f-d) [glyph codes]
>
> Then if ComputeWidth returns 4, wouldn't the client improperly assume char
> codes (a, b, c, d) were selected, and not the correct (a, b, c, f)?

Usually, a text widget's (such as gtktext) idea of a selection is offsets into the character buffer (not the glyph string buffer). So I'm not sure what it means to say that (a b c f) is selected.

A request such as XComputeWidth could send the string "abcde" over the network in a single request. The X server could compute the glyph string, compute the widths and compute the number of syllables selected (using the IndiX algorithm). Further, it could convert the number of syllables to the number of characters and send it back to the client.

As a result, (a b) may be selected or (a b c d) may be selected, but not (a b c), assuming that (c d) is a syllable.

> If this is indeed a problem, it seems one way to handle this would be to use an
> algorithm like Keyur's on the client to make sure only full syllables are
> highlighted...

Sure, it could be done on the client (as it is done in IndiX) - but the number of round trips to the server to compute the width of each syllable could be large, depending on the size of the selection.

Reading XFree86 lists, people wanting to optimize X performance are often told that:

- Text requests are not a problem
- Optimize round trip latencies
- Optimize image performance

At this point, XComputeWidth is purely a latency reducing, bandwidth conserving optimization. XQueryTextExtents provides all the functionality needed for correctness.

-Arun |
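A minimal Python sketch of the syllable snapping described in this message, under stated assumptions: `compute_nchars` and the `SYLLABLES` table are hypothetical illustrations, not IndiX code and not an existing X request. Each syllable is given as a (character count, rendered pixel width) pair, standing in for what a server could derive after shaping; the widths are made up.

```python
from typing import List, Tuple

# Hypothetical shaped line for the text (a b c d e), where (c d) is one syllable.
# Each entry is (number of character codes, rendered width in pixels).
SYLLABLES: List[Tuple[int, int]] = [(1, 10), (1, 10), (2, 15), (1, 10)]

def compute_nchars(syllables: List[Tuple[int, int]], pixel_width: int) -> int:
    """Return the number of characters selected by a drag of pixel_width.

    A syllable is selected as a whole as soon as the drag reaches into it,
    so the result always falls on a syllable boundary.
    """
    nchars = 0
    covered = 0
    for char_count, width in syllables:
        if covered >= pixel_width:
            break
        covered += width
        nchars += char_count
    return nchars

print(compute_nchars(SYLLABLES, 20))  # drag ends exactly after (a b)
print(compute_nchars(SYLLABLES, 25))  # drag ends inside the (c d) syllable
```

With this table the result for any drag width is 0, 1, 2, 4 or 5 characters, never 3, which is the property the message relies on: the client receives a character count and never needs to see glyph codes.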
From: Keyur S. <key...@ya...> - 2002-02-22 10:13:56
|
Hi,

--- Arun Sharma <ar...@sh...> wrote:
> For everything > 256, it delegates it to gdk, which delegates it to
> XTextExtents(). XTextExtents is a network efficient function, in that it
> is capable of responding to requests locally (unlike
> XQueryTextExtents, which has to consult the X server).
>
> The way IndiX has implemented it, XTextExtents is mapped to
> XQueryTextExtents - sacrificing network efficiency. In other words, when you
> drag your mouse, gtk will execute the above loop and an X protocol
> request is made for every character to compute its width.
                            ^^^^^^^^^
It should be syllable.

In Indic scripts, it is better to select the entire syllable while dragging the mouse over the character string. Unlike other foreign languages in which a character is the smallest unit of the writing system, we consider the syllable as the basic typographical unit of our writing system. It also simplifies interaction with the user. This will also not create a problem at the time of selection through the mouse, as Koshy probed earlier, since reordering is done for each syllable and we select the entire syllable in the selection operation.

The following operations are proposed:

(1) Selection should select the entire syllable
(2) Cursor should move over the syllables
(3) Delete key should remove the entire next syllable
(4) Backspace should delete a character from the previous syllable
(5) Insertion of a new character may increase/decrease the number of syllables
(6) Deletion of a character may decrease/increase the number of syllables

Yes. In (5) and (6), both the cases are possible. I'll give examples when it comes to proper discussion.

> A new protocol request - such as XComputeWidth (Is XComputeNChars a
> better name ?) will batch these XQueryTextExtents requests into a single
> request. Keyur, I didn't understand your comment on why this
> functionality doesn't belong to the X server. Or did I misread your
> statement ?

I meant to say that maintaining a backbuffer for text and selection of text is not the functionality of the server. The client itself does all the job and can also send requests to the server to compute the width of a character stream. I think my sentence was ambiguous.

> Another data point: Using character codes is NOT unprecedented in
> XFree86. I just finished reading the i18n specs.
>
> http://www.x-docs.org/i18n/Framework.pdf
>
> and looked at the implementation:
>
> xc/lib/X11/omDefault.c - _Xutf8DefaultDrawString
>
> The input is clearly a UTF-8 string. It's calling _XmbDefaultDrawString
> which is calling XDrawString, which is doing a PolyText request in the X
> protocol. In other words, a character code is being sent over the X protocol.

I wonder why this example didn't come into my mind while arguing about the use of character codes in XDrawString. This is a perfect example which proves that the current implementation of XFree86 doesn't restrict clients from sending character codes in a PolyText request. Arun, thanks for drawing my attention towards it.

> On a related note, can anyone on the list enlighten me on the difference
> between utf8 and mbs (multi byte string) ? I thought UTF-8 was a mbs too.

Yes. UTF-8 is also an mbs. In fact any string encoded in a locale dependent encoding can be thought of as an mbs while calling XmbDrawString. XmbDrawString is a function to draw a string encoded in a locale dependent encoding. A locale may have a stateful or stateless encoding scheme. XmbDrawString calls various conversion routines to correctly decode the string into a form (e.g., a 16-bit Unicode string in a UTF-8 locale using an ISO-10646 encoded font) suitable for XDrawString and then finally uses XDrawString to send the request to the server. mbs is always locale dependent while UTF-8 is always locale independent. Under a non-UTF-8 locale their functionality is different, but under a UTF-8 locale their functionality should be exactly the same.

Regards, Keyur |
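The decoding step mentioned in this message (a UTF-8 multibyte string turned into 16-bit values) can be illustrated with a small Python sketch. It is not Xlib code: `utf8_to_char2b` is a hypothetical helper, and the (byte1, byte2) pairs only mimic the high-byte/low-byte layout of Xlib's XChar2b for code points that fit in 16 bits.

```python
from typing import List, Tuple

# DEVANAGARI LETTER KA (U+0915) as a UTF-8 multibyte string.
KA_UTF8 = "\u0915".encode("utf-8")

def utf8_to_char2b(mbs: bytes) -> List[Tuple[int, int]]:
    """Decode UTF-8 bytes into (byte1, byte2) pairs, high byte first."""
    pairs = []
    for ch in mbs.decode("utf-8"):
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("code point does not fit in 16 bits")
        pairs.append((code >> 8, code & 0xFF))
    return pairs

print(KA_UTF8.hex())            # e0a495: three bytes for one character
print(utf8_to_char2b(KA_UTF8))  # [(9, 21)]: one 16-bit value, 0x0915
```

The point of the example is only that the same character has a three-byte multibyte form and a single 16-bit form; which of the two travels in a drawing request is exactly what the thread is debating.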
From: Tapan S. P. <ta...@ya...> - 2002-02-22 08:13:31
|
It still seems to me mistakes can be made...

Take an example text = (a b c d e) [char codes]

But after reordering = (f-a, f-b, f-c, f-e, f-d) [glyph codes]

Then if ComputeWidth returns 4, wouldn't the client improperly assume char codes (a, b, c, d) were selected, and not the correct (a, b, c, f)?

If this is indeed a problem, it seems one way to handle this would be to use an algorithm like Keyur's on the client to make sure only full syllables are highlighted...

While we discuss this, let's also not forget our task of documenting the current state of affairs wrt Indic computing... Since I don't think there is a clear answer to this X question, one positive thing that could come out of this is to document the various approaches we have considered so far for adding Indic support to X, and to document the design methodology, pros, cons, caveats, compatibility, etc... Any volunteers?

I for my part will be trying to find time to document the Mithi and CDAC Indian language toolkits, listing features, bugs, pros, cons, etc... so that we can get an idea of what more will be required from our end to have a truly robust Indic language working environment...

--Tapan

> > > The proposed new algorithm:
> > >
> > > FindPosition(textpos, startx, pixel_width)
> > >     // Make a single request to the X Server - this doesn't exist in
> > >     // the X protocol yet
> > >     nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
> > >                            // other args font etc)
> > >
> > >     // everything starting from textpos to textpos + nchars is "selected"
> > >
> >
> > Is textbuf supposed to be modified and returned by XComputeWidth?
>
> No. Textbuf contains character codes. The reordering happens on glyph
> codes. The X server could do the complex mapping from character ->
> glyph including reordering etc, then compute the width and return the result.
> The client is completely ignorant about the glyph codes.
>
> XComputeWidth, XPolyText' (a brand new request, which uses character
> codes) and friends could potentially be in an X extension for a server side
> implementation. |
From: Guntupalli K. <kar...@fr...> - 2002-02-22 07:48:10
|
On Thu, 21 Feb 2002 09:12:56 -0800 (PST) Keyur Shroff <key...@ya...> wrote:
>
> --- Guntupalli Karunakar <kar...@fr...> wrote:
> > Hi,
> > Please check out the preview of glyphs in the upcoming Devanagari
> > opentype font at
> > http://www.indlinux.org/fonts/
>
> I think you have missed out Zero-Width-Joiner (U+200D) and
> Zero-Width-NonJoiner (U+200C) in your font. As you must be
> knowing, these Unicode characters have special treatment
> with respect to Indic scripts. Even if these characters are
> non-printing characters, they should have some shape in the
> font and you can omit their glyphs from being displayed by
> putting appropriate rules in the GSUB table.
>
> Moreover there should be two different glyphs for 'numeric
> sign five' as they have different shapes in Hindi and
> Marathi. Appropriate glyphs should be chosen depending on
> the language selection in the font.

Yep, even for numeric sign 'eight', they too have different shapes for Hindi & Marathi (the current one reflects the Marathi one). Actually there are variations for some characters in Hindi itself, eg. 'JHA', varying according to the region.

> I also don't see the glyphs for Udatta, Anudatta, Acute
> Accent and Grave Accent. I suggest that your Unicode
> encoded font should contain all glyphs from Unicode under
> Devanagari range even if your font supports only Hindi and
> Marathi languages.
>

Thanks for the suggestions, I will update the font designer to make the necessary changes.

Regards, Karunakar |
From: Arun S. <ar...@sh...> - 2002-02-22 07:31:06
|
On Tue, Feb 19, 2002 at 10:10:41AM -0800, Arun Sharma wrote:
> The existing algorithm in Xaw:
>
> // startx = x[1]
> // pixel_width = x[2] - x[1]
> FindPosition(textpos, startx, pixel_width)
>     nchars = 0
>     curpos = startx
>     while 1:
>         // Uses XFontStruct for efficiency
>         width = compute the width of the next char in the text buf
>         nchars++;
>         curpos += width
>         if (curpos >= startx + pixel_width)
>             break;
>
>     // everything starting from textpos to textpos + nchars is "selected"

I've studied how gtk implements this algorithm. The algorithm can be found at:

gtk/gtktext.c - find_mouse_cursor_at_line()

The algorithm is very similar, proceeding one character at a time (in other words, broken for (context sensitive) Indic fonts). It has a 256 byte cache which caches the widths of all the characters in the range 0-256 (gtk/gtktext.c - find_char_width()) per font.

(Hint: Ammo for those of you looking for "latin1 bias" :)

For everything > 256, it delegates it to gdk, which delegates it to XTextExtents(). XTextExtents is a network efficient function, in that it is capable of responding to requests locally (unlike XQueryTextExtents, which has to consult the X server).

The way IndiX has implemented it, XTextExtents is mapped to XQueryTextExtents - sacrificing network efficiency. In other words, when you drag your mouse, gtk will execute the above loop and an X protocol request is made for every character to compute its width.

A new protocol request - such as XComputeWidth (Is XComputeNChars a better name ?) will batch these XQueryTextExtents requests into a single request. Keyur, I didn't understand your comment on why this functionality doesn't belong to the X server. Or did I misread your statement ?

Another data point: Using character codes is NOT unprecedented in XFree86. I just finished reading the i18n specs.

http://www.x-docs.org/i18n/Framework.pdf

and looked at the implementation:

xc/lib/X11/omDefault.c - _Xutf8DefaultDrawString

The input is clearly a UTF-8 string. It's calling _XmbDefaultDrawString which is calling XDrawString, which is doing a PolyText request in the X protocol. In other words, a character code is being sent over the X protocol.

On a related note, can anyone on the list enlighten me on the difference between utf8 and mbs (multi byte string) ? I thought UTF-8 was a mbs too.

-Arun |
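The per-character loop quoted at the top of this message can be written as runnable Python. This is a sketch rather than Xaw or gtk code: `find_position` and the `char_width` callback are hypothetical names, and the callback stands in for whatever supplies a character's width (a local XFontStruct lookup, or a server round trip when XTextExtents is mapped to XQueryTextExtents).

```python
from typing import Callable, List

def find_position(text: str, textpos: int, startx: int, pixel_width: int,
                  char_width: Callable[[str], int]) -> int:
    """Count the characters from textpos covered by a drag of pixel_width."""
    nchars = 0
    curpos = startx
    for ch in text[textpos:]:
        curpos += char_width(ch)   # one width lookup per character
        nchars += 1
        if curpos >= startx + pixel_width:
            break
    return nchars

# Record every width lookup to show how many requests the loop issues.
lookups: List[str] = []

def fixed_width(ch: str) -> int:
    lookups.append(ch)
    return 10  # assumed constant width, for illustration only

selected = find_position("abcde", 0, 0, 35, fixed_width)
print(selected, len(lookups))
```

Every iteration performs one width lookup, so if each lookup is a protocol request the number of round trips grows with the length of the selection; that is the cost a batched request is meant to remove.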
From: Arun S. <ar...@sh...> - 2002-02-22 01:18:21
|
On Thu, Feb 21, 2002 at 09:46:55AM -0800, Keyur Shroff wrote:
> renderer used by the X server. Since now all the font
> renderers used by X Server do have mapping table which maps
> from character codes to glyph codes, it is not possible for
> an X client to specify particular glyph to be displayed
> just by passing glyph codes in XDrawString unless there is
> "char codes = glyph codes" in the font.

The man page for XDrawString maintains the ambiguity about what the arguments represent - character or glyph codes. So it's possible to call XDrawString with glyph codes, which get transmitted via PolyText* requests, assuming the X server can look up the glyph by glyph code and render it.

In general, I don't have any doubts that both implementations are _possible_. It's just a matter of which one has more desirable properties.

In the following stack:

Application -> Toolkit (eg: Qt) -> Xlib -> X protocol -> X server -> renderer

we certainly know that the input to the Toolkit is character code. But beyond that, implementations could do what they want.

>
> Can client use any X TrueType font for Indic with Unicode
> encoding and determine which glyph to use for character
> "KA"? It is not possible for the client since the font is
> loaded by the server and the glyph information is again
> kept by the font renderer which is not directly accessible
> by the client.

A client side implementation will have to enhance the X protocol to transmit the cmap tables to the client. Also, there is talk of using "local client side fonts" and transmitting glyphs to the server on the XFree86-render mailing lists. In this implementation, fonts are accessible to the client, but not the server. More on this below.

> The client will simply pass the Unicode
> value of the character "KA" in XDrawString16 call and the
> font renderer will take care of it. You can try this out by
> writing a simple application to draw glyph for character
> "KA" from some Unicode encoded Indic font.

Few clients will call XDrawString directly. They may call gtk_label_new("KA") and then a library like pango can take over, convert "KA" to a glyph string and then call XDrawString with the glyph string.

http://cvs.gnome.org/lxr/source/pango/pango/pango-layout.c
http://cvs.gnome.org/lxr/source/pango/libpango/glyphstring.c
http://cvs.gnome.org/lxr/source/gtk+/gtk/gtklabel.c

Based on my web searches, other reasons why people might have gravitated towards client side solutions:

- Project management issues, licensing issues (My conclusions/speculation)

  People working on gnome or KDE might find it easier to get their code into their own repositories than the XFree86 one. People wanting to keep their code under GPL and not the MIT/X style licenses.

- "Glacial" speed of X server development

  http://www.xfree86.org/~keithp/talks/xtc2001/paper/
  http://www.xfree86.org/pipermail/render/2001-August/001291.html

I'd also like to write up a proposal for a binary distribution strategy for IndiX. It's my belief that we'll end up with both client side and server side solutions, which could possibly be installed on the same machine side-by-side. Fortunately, the apps don't get affected and may even dynamically choose one using LD_PRELOAD.

-Arun |
From: <fpo...@ba...> - 2002-02-21 20:28:48
|
> The distinction between characters and glyphs is important even for
> Latin scripts. Consider ligatures and diacritical marks; some Latin
> encodings have separate character codes for the diacritical marks; a
> "c" and a "cedilla" (two code points) together can have a different
> glyph in these languages. Similarly "f", "f" and "i" combine to form
> a distinct glyph "ffi".

Since I am the writer/language police :) I noticed the phrase "Latin scripts". Are you talking about Western Latin 8859-1 and its various cousins (8859-2, 3, 4 etc.)? JUST for clarification.

>
> The X protocol was explicitly designed NOT to support these kinds of
> transformations.

> with meanings ranging from the visual
> representation (the letterform), the 'abstract' character itself, the
> code point assigned to the character in a given encoding, a specific
> glyph in a font, etc. The exact meaning is usually clear from the context.

May I suggest that it often isn't? That is what is driving my literal mind completely around the semantical bend: it is not always made clear what the writers are referring to. Just a personal observation. Maybe it might be an idea to clarify Xlib and X Protocol documentation on this point and take more recent developments in Unicode/ISCII etc. into account?

>
> Nowhere does the X11 protocol specification say that 'character codes'
> are to be used in text drawing requests. In fact, it EXPLICITLY
> states that the semantics of character `codes' are NOT to
> be honored by the X server.

>It says that X protocol does no translation of character
>sets. It doesn't mean that characters 'codes' are not to be
>honored by the X server.

So which one is it? A translation of character sets into....glyphs is - to my knowledge at least - the task of the font rendering software used by the X server. What I don't understand is what an X client does. Just display the glyphs? I am not sure what "honoring character codes" would mean here. Are you talking about character set codes or glyph sets?

>X protocol also doesn't EXPLICITLY
>says that glyph codes have to be passed in the request.
>Everywhere it says about "values" passed in the request. It
>was left for implementation to decide what are these
>"values". Actually at that time, there was no distinction
>between character codes and glyph codes.

Yes, there was. But there wasn't a terribly urgent need for software developers at the time to pay any attention to the distinction.

>
> If you change this, you'll end up with some other "protocol", not the
> X protocol. This new graphics "protocol" is however:

>The client will simply pass the Unicode
>value of the character "KA" in XDrawString16 call and the
>font renderer will take care of it. You can try this out by
>writing a simple application to draw glyph for character
>"KA" from some Unicode encoded Indic font.

Interesting. I am beginning to understand what is going on here. So does the passing of character codes (or whatever is meant by this) break the X protocol?

Confused again

-Frank |
From: Keyur S. <key...@ya...> - 2002-02-21 17:47:01
|
Hi,

--- Joseph Koshy <jk...@Fr...> wrote:
> The distinction between characters and glyphs is important even for
> Latin scripts. Consider ligatures and diacritical marks; some Latin
> encodings have separate character codes for the diacritical marks; a
> "c" and a "cedilla" (two code points) together can have a different
> glyph in these languages. Similarly "f", "f" and "i" combine to form
> a distinct glyph "ffi".
>
> The X protocol was explicitly designed NOT to support these kinds of
> transformations.

True. So now you agree on the point that under existing X Protocol requests we can't handle such complexity. Indic scripts have similar complexity; even more than this.

> One place in the X protocol specification uses the phrase 'string of
> characters'. Now the word 'character' has (today) become an
> overloaded phrase, with meanings ranging from the visual
> representation (the letterform), the 'abstract' character itself, the
> code point assigned to the character in a given encoding, a specific
> glyph in a font, etc. The exact meaning is usually clear from the context.
>
> Nowhere does the X11 protocol specification say that 'character codes'
> are to be used in text drawing requests. In fact, it EXPLICITLY
> states that the semantics of character `codes' are NOT to
> be honored by the X server.

It says that X protocol does no translation of character sets. It doesn't mean that character 'codes' are not to be honored by the X server. X protocol also doesn't EXPLICITLY say that glyph codes have to be passed in the request. Everywhere it talks about "values" passed in the request. It was left for the implementation to decide what these "values" are. Actually at that time, there was no distinction between character codes and glyph codes.

>
> If you change this, you'll end up with some other "protocol", not the
> X protocol. This new graphics "protocol" is however:

I think while talking about X, you are simply ignoring the fact that the X Window system is not merely the X Protocol but consists of the X Protocol, X library, X server, and the font renderer used by the X server. Since now all the font renderers used by the X Server do have a mapping table which maps from character codes to glyph codes, it is not possible for an X client to specify a particular glyph to be displayed just by passing glyph codes in XDrawString unless there is "char codes = glyph codes" in the font.

Can a client use any X TrueType font for Indic with Unicode encoding and determine which glyph to use for character "KA"? It is not possible for the client since the font is loaded by the server and the glyph information is again kept by the font renderer, which is not directly accessible by the client. The client will simply pass the Unicode value of the character "KA" in an XDrawString16 call and the font renderer will take care of it. You can try this out by writing a simple application to draw the glyph for character "KA" from some Unicode encoded Indic font.

So even if X protocol didn't entertain character codes to be passed in the protocol request, that documented feature (as you said) doesn't match with the current scenario in the X Window system. Does it mean that all X clients are *now* violating the X protocol specification?

>
> a. inconsistent
> i. how do you map a screen coordinate back to position in the text
> stream if you are doing complex text rendering?

This is certainly not the job of the X server.

> b. incomplete
> i. how do you specify text in a different character encoding?

OpenType font allows you to design your font in _any_ encoding currently in use.

> ii. how do you access glyphs in a font that do not correspond to
> a `character'?

This should be done using some intermediate tables in the font like the GSUB table in an OpenType font. Indic scripts have many glyphs which don't have any character code value in Unicode, but as you must be knowing these glyphs are getting displayed in IndiX, which uses OpenType fonts.

>
> c. suffers from new problems
> i. If you are indexing fonts using character codes, how do you use
> fonts that do not contain glyphs of 'letters'?
> You don't want glyph combining and reordering happening for
> the glyphs in a symbol font for example.

Your font must specify some character encoding. There will be a mapping table from "character code" in this encoding to "glyph code". Using this mapping table, you can display all the glyphs from your font. On the other hand, I would like to ask you how one can determine the glyph id for a character when all the font information is kept by the server?

> However, you don't need to change the X server to support Indic
> scripts. Here is one way how it would work:
>
> >> Client side Indic Rendering I
>
> o this is efficient in terms of network bandwidth (glyph
> indices are sent over)

Not necessarily for Indic scripts. There is one-to-one, one-to-many, many-to-one, and many-to-many mapping between character codes and glyph codes. I gave a few examples in one of my earlier mails. It means that 'm' character codes may be mapped to 'n' glyphs and it is possible that n > m.

> o it doesn't break anything; you are still using the X11
> protocol :)

This point we'll skip until we decide whether passing character codes actually breaks the X protocol.

I think the other points have already been addressed by Arun, so I will wait for a response from you and other people.

Regards, Keyur |
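The one-to-many and many-to-one mappings mentioned in this message can be made concrete with a toy Python shaper. This is only an illustration: the glyph names, the rule tables, and the `shape` function are hypothetical and much simpler than a real GSUB lookup. It covers three cases: the Latin "ffi" ligature from the thread, the Devanagari vowel sign I (U+093F), which is drawn before its consonant, and the Tamil vowel sign O (U+0BCA), which is drawn in two parts around its consonant.

```python
from typing import Dict, List, Tuple

# Vowel signs drawn BEFORE the consonant they follow in the character stream.
PREBASE: Dict[str, str] = {"\u093f": "i_matra"}

# Vowel signs drawn in two parts, one on each side of the consonant.
SPLIT: Dict[str, Tuple[str, str]] = {"\u0bca": ("e_part", "aa_part")}

def glyph_for(ch: str) -> str:
    """Stand-in for a one-to-one cmap lookup: name the glyph by code point."""
    return "U+%04X" % ord(ch)

def shape(text: str) -> List[str]:
    """Map character codes to an ordered list of (hypothetical) glyph names."""
    glyphs: List[str] = []
    i = 0
    while i < len(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if text.startswith("ffi", i):        # many-to-one: 3 chars, 1 glyph
            glyphs.append("ffi_ligature")
            i += 3
        elif nxt in PREBASE:                 # reordering: 2 chars, 2 glyphs
            glyphs += [PREBASE[nxt], glyph_for(text[i])]
            i += 2
        elif nxt in SPLIT:                   # one-to-many: 2 chars, 3 glyphs
            left, right = SPLIT[nxt]
            glyphs += [left, glyph_for(text[i]), right]
            i += 2
        else:                                # one-to-one
            glyphs.append(glyph_for(text[i]))
            i += 1
    return glyphs

print(shape("ffi"))            # ['ffi_ligature']
print(shape("\u0915\u093f"))   # ['i_matra', 'U+0915']
print(shape("\u0b95\u0bca"))   # ['e_part', 'U+0B95', 'aa_part']
```

In the last case two character codes become three glyphs (n > m), so sending glyph indices is not automatically smaller than sending character codes.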
From: Keyur S. <key...@ya...> - 2002-02-21 17:13:01
|
--- Guntupalli Karunakar <kar...@fr...> wrote:
> Hi,
> Please check out the preview of glyphs in the upcoming Devanagari
> opentype font at
> http://www.indlinux.org/fonts/

I think you have missed out Zero-Width-Joiner (U+200D) and Zero-Width-NonJoiner (U+200C) in your font. As you must be knowing, these Unicode characters have special treatment with respect to Indic scripts. Even if these characters are non-printing characters, they should have some shape in the font and you can omit their glyphs from being displayed by putting appropriate rules in the GSUB table.

Moreover there should be two different glyphs for 'numeric sign five' as they have different shapes in Hindi and Marathi. Appropriate glyphs should be chosen depending on the language selection in the font.

I also don't see the glyphs for Udatta, Anudatta, Acute Accent and Grave Accent. I suggest that your Unicode encoded font should contain all glyphs from Unicode under the Devanagari range even if your font supports only Hindi and Marathi languages.

Regards, Keyur |
From: Rajkumar S <s_...@my...> - 2002-02-21 16:27:41
|
On Thu, 21 Feb 2002, Guntupalli Karunakar wrote: > Those interested in sponsoring fonts (this gives you the right to name > the font after yourself or any name you choose), please contact Venky > Hariharan <ve...@vs...> or Prakash Advani <pr...@fr...>. How can I sponsor developing a Malayalam OpenType font? How much money is involved? Who will design the glyphs? raj |
From: Guntupalli K. <kar...@fr...> - 2002-02-21 15:08:03
|
Hi, Sun's 'Standard Type Services Framework' project has started at http://stsf.sourceforge.net/ At present there are only docs related to their API. Regards, Karunakar |
From: Keyur S. <key...@ya...> - 2002-02-21 12:19:26
|
--- Guntupalli Karunakar <kar...@fr...> wrote:
> Hi,
> Please check out the preview of glyphs in the upcoming Devanagari
> opentype font at
> http://www.indlinux.org/fonts/
> Please ignore the appearing encoding information. It contains the
> basic Unicode range + 204 glyphs ( with total of 308 glyphs ). Right
> now only those combinations that are practically used have been
> supported, unlike some existing fonts which contain all theoretically
> possible combinations. This issue can be debated later once the font
> is released.

The following would be a more appropriate sentence :-)

"Right now only those combinations that are widely used have been supported, unlike some existing fonts which contain less frequently used combinations also."

Regards, Keyur |
From: Guntupalli K. <kar...@fr...> - 2002-02-21 11:48:07
|
Hi, Please check out the preview of glyphs in the upcoming Devanagari opentype font at http://www.indlinux.org/fonts/ Please ignore the appearing encoding information. It contains the basic Unicode range + 204 glyphs ( with total of 308 glyphs ). Right now only those combinations that are practically used have been supported, unlike some existing fonts which contain all theoretically possible combinations. This issue can be debated later once the font is released. More updates, screenshots and text samples written using the font will follow soon. Actual font and documents regarding designing Devanagari opentype fonts along with the font sources ( font data + opentype tables ) will be released under an opensource license, once we get a sponsor for the font. Those interested in sponsoring fonts (this gives you the right to name the font after yourself or any name you choose), please contact Venky Hariharan <ve...@vs...> or Prakash Advani <pr...@fr...>. A Telugu OTF is also in the making, which is based on an existing TTF font ( Tikkana 1.2, by Prasad Chodavarapu http://chaitanya.bhaavana.net/fonts ). Will put up a preview of it also soon. Regards, Karunakar |
From: Arun S. <ar...@sh...> - 2002-02-21 06:10:04
|
I generated a diff for the benefit of people, who may want to review the code. http://www.sharma-home.net/~adsharma/misc/indix-patch.bz2 [ 75 KB] Keyur, hope you don't mind :) -Arun |
From: Arun S. <ar...@sh...> - 2002-02-20 21:37:21
|
On Thu, Feb 21, 2002 at 01:46:25AM +0530, Tapan S. Parikh wrote:
> > The proposed new algorithm:
> >
> > FindPosition(textpos, startx, pixel_width)
> >     // Make a single request to the X Server - this doesn't exist in
> >     // the X protocol yet
> >     nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
> >                            // other args font etc)
> >
> >     // everything starting from textpos to textpos + nchars is "selected"
> >
>
> Is textbuf supposed to be modified and returned by XComputeWidth?

No. Textbuf contains character codes. The reordering happens on glyph codes. The X server could do the complex mapping from character -> glyph including reordering etc, then compute the width and return the result. The client is completely ignorant about the glyph codes.

XComputeWidth, XPolyText' (a brand new request, which uses character codes) and friends could potentially be in an X extension for a server side implementation.

> 2) From this inadequate knowledge, it seems to me that the philosophies of
> Open Type Fonts / True Type Fonts and X may not match so well together, in
> that the X Server expects clients to send glyph codes, but fonts are
> maintained server-side (not a big deal if char codes _are_ glyph codes),
> while in the TTF/OTF world, the font files contain the necessary
> information for doing char-glyph mapping. It seems in X both Server and
> Client would need open type font info (the client for doing char->glyph
> mapping, the server for doing rendering, positioning, etc.) Somewhat of a
> catch-22, or am I missing something?

TTF and OTF were designed primarily for the Windows/Mac environment, where network transparent protocols aren't the main issue.

In the client side rendering model, the X server (or the X font server) will have to open the OTF file, parse the cmap tables, and respond to client requests for the font information. The client uses that information to compute the ordering of the glyphs and sends PolyText requests to draw the glyphs.

Thus information originates in the X server (or the font server and then moves to the X server), goes to the client, where all the heavy duty stuff happens, and comes back to the server in the form of a sequence of glyph codes, which is then rendered by the X server. This imposes some network traffic overhead, unless the font information is cached on the client.

In the server side model, the code moves to where the data is - the X server, thereby reducing the network overhead. Some overhead is added because character codes occupy more bytes than glyph codes.

It's not clear to me which one is the dominant overhead - some empirical data would be useful.

-Arun |
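As a rough illustration of the "character codes occupy more bytes than glyph codes" point, the payload sizes can be compared for one conjunct. This is only a sketch under stated assumptions: it assumes a hypothetical character-code request would carry UTF-8, that glyph codes travel as 2-byte values (as in the 16-bit PolyText requests), and that the font draws the conjunct as a single glyph. Request headers and the cost of fetching font information are ignored, so it says nothing about which overhead dominates.

```python
def utf8_bytes(text: str) -> int:
    """Payload size if character codes are shipped as UTF-8 (assumption)."""
    return len(text.encode("utf-8"))


def glyph_bytes(glyph_count: int) -> int:
    """Payload size if 16-bit glyph codes are shipped, 2 bytes per glyph."""
    return 2 * glyph_count


# Devanagari KA + VIRAMA + SSA: three code points forming one conjunct.
ksha = "\u0915\u094d\u0937"

print(utf8_bytes(ksha))  # 9: each of these code points takes 3 bytes in UTF-8
print(glyph_bytes(1))    # 2: assuming the font has ONE glyph for the conjunct
```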
From: Tapan S. P. <ta...@ya...> - 2002-02-20 20:14:45
|
> The proposed new algorithm:
>
> FindPosition(textpos, startx, pixel_width)
>     // Make a single request to the X Server - this doesn't exist in
>     // the X protocol yet
>     nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
>                            // other args font etc)
>
>     // everything starting from textpos to textpos + nchars is "selected"
>

Is textbuf supposed to be modified and returned by XComputeWidth? Otherwise it seems with reordering there may be some problems at the boundaries where chars are reordered.

Two things I am noticing...

1) My knowledge of X and its design decisions is inadequate.

2) From this inadequate knowledge, it seems to me that the philosophies of Open Type Fonts / True Type Fonts and X may not match so well together, in that the X Server expects clients to send glyph codes, but fonts are maintained server-side (not a big deal if char codes _are_ glyph codes), while in the TTF/OTF world, the font files contain the necessary information for doing char-glyph mapping. It seems in X both Server and Client would need open type font info (the client for doing char->glyph mapping, the server for doing rendering, positioning, etc.) Somewhat of a catch-22, or am I missing something?

--tapan |
From: Arun S. <ar...@sh...> - 2002-02-20 17:55:24
|
On Wed, Feb 20, 2002 at 01:16:27AM -0800, Arun Sharma wrote:
> Yes, the designers of X wanted to keep X to be nothing more than an
> image rendering protocol and they probably had a reason too (which I
> haven't found even after quite a bit of searching - would appreciate
> references to X design rationale - I already have the OReilly Xlib book).

Some more thoughts on this topic:

- Most of the X designers worked for companies that had a thin client (X server) and fat server (X client) ideology. So naturally, they were inclined to keep the X server simple enough to be implemented in cheap hardware - NCD xterminals etc. However, the design center (at least numerically) for X has shifted to x86 PCs running some form of Free UNIX.

- Advances in hardware technology also have pushed more functionality to the X server. To be fair, most of these have been in the area of "acceleration".

- I think we should consider yet another alternative to the ones we're discussing. That would be (apart from):

  1. client side library implementation (Koshy's proposal)
  2. server side implementation (IndiX)

  3. "Character -> glyph code" server

  Basically, have an external process do the complex mapping between characters and glyphs. This is not very different from, say, XIM servers for CJK.

  This has the advantages of both 1 and 2, namely:

  - Installing Indic software on fewer machines
  - Simplicity of the X server
  - No extension to the protocol, keeping the spirit of the X protocol (image drawing server)

  However, X has often been criticized for sluggish performance due to having too much stuff running in many different address spaces (X server, window manager, X client) and this will only add to the misery.

- Another point to consider - font selection in a Unicode text buffer containing codes from multiple scripts. Communication overhead might increase, if the client has to query the server for the correct font for each of the scripts. In a server side implementation, this could be done with less network overhead ?

-Arun |
From: Arun S. <ar...@sh...> - 2002-02-20 09:12:54
|
On Tue, Feb 19, 2002 at 11:17:32PM -0800, Joseph Koshy wrote:

Hi Koshy,

> Arun,
>
> as> That is for efficiency reasons. man XTextExtents. XQueryTextExtents
> as> (something more powerful than that) was the proposed new mechanism.
>
> And even *that* isn't being used in the Athena widget set.

And when did I say Xaw was using XTextExtents or XQueryTextExtents ? My description of the FindPosition() algorithm didn't make any references to either of the two.

> Folks, please read the code before offering suggestions. It would
> help to keep the signal-to-noise ratio reasonable.

The algorithm FindPosition() was written after referring to the code. It'd help the quality of the discussion, if you respected other people's intelligence and knowledge.

[ back to the topic under discussion ]

> as> The proposed new algorithm:
> as>
> as> FindPosition(textpos, startx, pixel_width)
> as>     // Make a single request to the X Server - this doesn't exist in
> as>     // the X protocol yet
> as>     nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
> as>                            // other args font etc)
> as>
> as>     // everything starting from textpos to textpos + nchars is "selected"
>
> Well, you've just changed your X client.

I never said we could support Indic scripts without changing X clients or the protocol. Obviously, some extensions are needed. What I did say however is that not everybody is interested in installing client side libraries specific to Indic scripts. I, for example, do not install Cyrillic fonts on my machine.

A possible solution could consist of:

1. Some generic (i.e. script/language independent) extensions to the X protocol get standardized and installed on most machines around the world.

2. An Indic language server side extension that only someone interested in running a Unicode compliant application with Indic script installs on the machine running the X server.

> I thought you were going to describe an algorithm that would allow X
> clients to work unchanged in the presence of arbitrary glyph
> reordering, substitution and positioning by the X server.

No. See above.

> The distinction between characters and glyphs is important even for
> Latin scripts. Consider ligatures and diacritical marks; some Latin
> encodings have separate character codes for the diacritical marks; a
> "c" and a "cedilla" (two code points) together can have a different
> glyph in these languages. Similarly "f", "f" and "i" combine to form
> a distinct glyph "ffi".
>
> The X protocol was explicitly designed NOT to support these kinds of
> transformations.

Yes, the designers of X wanted to keep X to be nothing more than an image rendering protocol and they probably had a reason too (which I haven't found even after quite a bit of searching - would appreciate references to X design rationale - I already have the OReilly Xlib book).

Sure, we should pay attention to the wisdom of these people, but we also should keep in mind that things were very different 15 years ago. Reading:

http://www.xfree86.org/~keithp/talks/render.html

confirms that. However, questioning their design decisions and considering possible implementations that introduce new extensions without breaking backward compatibility should be done, IMO.

Perhaps, the right thing to do is implement Indic support in client side libraries. Who knows ? But it doesn't hurt to have all the options on the table and discuss the pros and cons of each.

> If you change this, you'll end up with some other "protocol", not the
> X protocol. This new graphics "protocol" is however:
>
> a. inconsistent
>    i. how do you map a screen coordinate back to position in the text
>       stream if you are doing complex text rendering?

Inconsistent with what ? I'd say it's more consistent because all the codes that go on the wire are character codes and glyph codes are internal to the X server.

If it's possible to do it on the client side, it must be possible to do it on the server. The server has all the information it needs to do this computation. That's not to say it's desirable - just that it's possible.

> b. incomplete
>    i. how do you specify text in a different character encoding?

Simple. Put font1 with encoding1 in the GC and call PolyText. Put font2 with encoding2 in the GC and call PolyText again.

>    ii. how do you access glyphs in a font that do not correspond to
>        a `character'?

The client doesn't need to. It just deals with character strings (in the conventional meaning of the word `character').

> c. suffers from new problems
>    i. If you are indexing fonts using character codes, how do you use
>       fonts that do not contain glyphs of 'letters'?
>       You don't want glyph combining and reordering happening for
>       the glyphs in a symbol font for example.

By using a glyph code == character code encoding.

> ...etc...
>
> as> need to come up with the pros and cons of each approach. I've given
> as> several tangible advantages of implementing it on the X
> as> server. Perhaps you could articulate your thoughts on why you think
> as> it should be done in a client side library ?
>
> Implementing Indic script support in the X server alone without
> changing clients appears to be infeasible.

Agree. Changing clients is necessary - but the change could be generic and not Indic script specific.

> >> Client side Indic Rendering I
>
> o this is efficient in terms of network bandwidth (glyph indices are
>   sent over)

You didn't count the overhead of sending the font information from the X server to the client. As things stand now, this is a documented problem with Unicode fonts with a large difference between minChar and maxChar. And this is not counting the relatively large number of glyphs for a small range of Unicode code space in Indic scripts.

> o it doesn't break anything; you are still using the X11 protocol :)

There are ways of doing the server side implementations without "breaking" the letter of the X protocol, while breaking the spirit, I think.

> o it will work on every X server in the world; no need for extensions.

Granted.

> o the X server is still doing the rendering of glyphs onto the screen
>   and can apply the usual caching/pre-rendering optimizations done
>   for text.

True for a server side implementation too.

> o you can support multiple encodings (KSCLP, TSCII, UNICODE, whatever)

True for a server side implementation too. Multiple PolyText requests with a different font (with a different encoding) in the GC each time.

> o you can support multiple algorithms for Indic rendering

True for a server side implementation too. In fact, this argument works better for a server side implementation. Imagine installing:

    for each Indic language L:
        for each font (possibly using a different algorithm) A:
            for each client machine C:
                install a client side library

For the server side implementation C = 1 and hopefully, we can keep A down to 1. Also, L is not a small number :)

Conclusion: the only advantage I can see that's specific to this scheme is that it doesn't require any changes to the X server or the X protocol.

I think the issue is an implementation detail and doesn't affect any applications, as long as they call the following time tested Xlib interface:

    XDrawText(display, d, gc, x, y, items, nitems)
          Display *display;
          Drawable d;
          GC gc;
          int x, y;
          XTextItem *items;
          int nitems;

There may be some value in experimenting with this interface with both the approaches and learning from the experience. In some cases, though one approach may be technically superior, the "market" may decide differently.

I'm yet to study the IndiX code - which I finally downloaded today. Will probably chew on it for a while.

-Arun |
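The nested-loop installation argument in the message above comes down to a product of three counts. A minimal sketch of that arithmetic follows; the figures (10 languages, 3 rendering algorithms, 50 client machines) are invented purely for illustration and are not measurements from any deployment.

```python
def library_installs(languages: int, algorithms: int, machines: int) -> int:
    """Number of installs implied by the nested loops: L * A * C."""
    return languages * algorithms * machines


# Hypothetical site: 10 Indic languages, 3 algorithms, 50 client machines.
client_side = library_installs(languages=10, algorithms=3, machines=50)

# Server side case as argued in the message: C = 1 and, hopefully, A = 1.
server_side = library_installs(languages=10, algorithms=1, machines=1)

print(client_side, server_side)  # 1500 10
```

The comparison covers a single X server; a site with several displays would multiply the server side figure by the number of X servers.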
From: <jk...@Fr...> - 2002-02-20 07:17:33
|
Arun,

as> That is for efficiency reasons. man XTextExtents. XQueryTextExtents
as> (something more powerful than that) was the proposed new mechanism.

And even *that* isn't being used in the Athena widget set.

Folks, please read the code before offering suggestions. It would help to keep the signal-to-noise ratio reasonable.

as> The proposed new algorithm:
as>
as> FindPosition(textpos, startx, pixel_width)
as>     // Make a single request to the X Server - this doesn't exist in
as>     // the X protocol yet
as>     nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
as>                            // other args font etc)
as>
as>     // everything starting from textpos to textpos + nchars is "selected"

Well, you've just changed your X client.

I thought you were going to describe an algorithm that would allow X clients to work unchanged in the presence of arbitrary glyph reordering, substitution and positioning by the X server.

as> I just said it was inconsistent in the use of character codes vs glyph
as> codes - not that it was ambiguous or in error. This seems to be a
as> consequence of it being designed at a time, when the distinction between
as> the two was not as important as it is today.

The distinction between characters and glyphs is important even for Latin scripts. Consider ligatures and diacritical marks; some Latin encodings have separate character codes for the diacritical marks; a "c" and a "cedilla" (two code points) together can have a different glyph in these languages. Similarly "f", "f" and "i" combine to form a distinct glyph "ffi".

The X protocol was explicitly designed NOT to support these kinds of transformations.

as> And you yourself (along with others on this list) accepted that certain
as> references were ambiguous. What's all the fuss about then ? :)

One place in the X protocol specification uses the phrase 'string of characters'. Now the word 'character' has (today) become an overloaded term, with meanings ranging from the visual representation (the letterform), the 'abstract' character itself, the code point assigned to the character in a given encoding, a specific glyph in a font, etc. The exact meaning is usually clear from the context.

Nowhere does the X11 protocol specification say that 'character codes' are to be used in text drawing requests. In fact, it EXPLICITLY states that the semantics of character `codes' are NOT to be honored by the X server.

If you change this, you'll end up with some other "protocol", not the X protocol. This new graphics "protocol" is however:

a. inconsistent
   i. how do you map a screen coordinate back to position in the text
      stream if you are doing complex text rendering?

b. incomplete
   i. how do you specify text in a different character encoding?
   ii. how do you access glyphs in a font that do not correspond to
       a `character'?

c. suffers from new problems
   i. If you are indexing fonts using character codes, how do you use
      fonts that do not contain glyphs of 'letters'?
      You don't want glyph combining and reordering happening for
      the glyphs in a symbol font for example.

...etc...

as> need to come up with the pros and cons of each approach. I've given
as> several tangible advantages of implementing it on the X
as> server. Perhaps you could articulate your thoughts on why you think
as> it should be done in a client side library ?

Implementing Indic script support in the X server alone without changing clients appears to be infeasible.

However, you don't need to change the X server to support Indic scripts. Here is one way how it would work:

>> Client side Indic Rendering I

In a client side rendering model, the client transforms:

    `M' code-points -> `N' PolyText protocol requests

The client then draws glyphs on screen using the standard PolyText/ImageText requests.

In this model, the client does the necessary glyph substitution, reordering and positioning, using whatever algorithm it chooses as appropriate for the script it is processing. The end result of the transformation is a set of [font, x/y-position, glyph-lists] tuples that would go out as protocol requests.

Further, in this model, the client has all the information required to map an [x,y] screen coordinate returned in an X event back to a position in the 'text' stream (since it did all the reordering, positioning and glyph substitution).

o this is efficient in terms of network bandwidth (glyph indices are
  sent over)

o it doesn't break anything; you are still using the X11 protocol :)

o it will work on every X server in the world; no need for extensions.

o the X server is still doing the rendering of glyphs onto the screen
  and can apply the usual caching/pre-rendering optimizations done
  for text.

o you can support multiple encodings (KSCLP, TSCII, UNICODE, whatever)

o you can support multiple algorithms for Indic rendering

The downside:

Client side rendering requires fonts to be coded to a well-known font encoding scheme, since the client has to transform character code-points to lists of glyph indices and their positions.

Question to the list: What font encoding standards are available for Indic scripts? How complete are they --- do they cover every letterform (graphical shape) used by a language's writing system?

>> Client side Indic Rendering II

Another way of getting Indic rendering to work without any X server modifications would be to have the client render glyphs onto a bitmap and send this "final" bitmap across.

I.e., the client transforms

    `M' code points -> 1 bitmap

This doesn't have the dependency on "well-known" font encodings (in fact the font need not be present at the X server at all) but has at least three drawbacks:

o sending a bitmap over is costlier than sending over glyph indices

o the client has to do text rendering inside of itself, adding to its
  complexity, and complexity of administration

o the X server can't optimize its use of the glyphs of a font

The other characteristics are like that of ``Client Side Indic Rendering I''.

Regards,
Koshy
<jk...@fr...> |
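The claim in "Client side Indic Rendering I" that the client can map a screen coordinate back to a text position can be sketched as follows. This is a minimal illustration, not code from Xaw or IndiX: the `PlacedGlyph` record, the `hit_test` helper, and all glyph numbers and pixel widths are hypothetical. The only script fact relied on is that the Devanagari vowel sign I (U+093F) is drawn to the left of the consonant it follows in the text buffer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlacedGlyph:
    glyph: int   # glyph index sent to the X server (made-up values below)
    x: int       # left edge on screen, in pixels
    width: int   # advance width, in pixels
    start: int   # offset of the first code point this glyph came from
    end: int     # one past the last code point of that cluster


def hit_test(run: List[PlacedGlyph], x: int) -> int:
    """Return the text-buffer offset of the cluster under pixel x, or -1."""
    for g in run:
        if g.x <= x < g.x + g.width:
            return g.start
    return -1


# Text buffer: KA (offset 0), VOWEL SIGN I (offset 1), GA (offset 2).
# On screen the vowel sign comes first, but it belongs to cluster [0, 2).
run = [
    PlacedGlyph(glyph=201, x=0, width=6, start=0, end=2),    # vowel sign I
    PlacedGlyph(glyph=105, x=6, width=12, start=0, end=2),   # KA
    PlacedGlyph(glyph=110, x=18, width=12, start=2, end=3),  # GA
]

print(hit_test(run, 3))    # 0: a click on the reordered sign maps to its cluster
print(hit_test(run, 20))   # 2
print(hit_test(run, 100))  # -1: outside the run
```

Because every glyph carries the range of code points it came from, reordering and substitution do not break the reverse mapping, which is the property the message attributes to the client side model.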
From: <fpo...@ba...> - 2002-02-19 19:10:58
|
> As one of the non-Programmers (well, I can't claim that for much
> longer...) on the list, I would appreciate a hint as to which parts
> of the X protocol specification is relevant for the understanding of
> i18n issues.

>ls requests | egrep -i font|text

>In other words, in the X protocol doc (www.x-docs.org), all requests
>containing the word "Text" or "Font".

Thanks.

-Frank |
From: Arun S. <ar...@sh...> - 2002-02-19 19:05:16
|
On Tue, Feb 19, 2002 at 09:53:19AM -0500, fpo...@ba... wrote:
> As one of the non-Programmers (well, I can't claim that for much
> longer...) on the list, I would appreciate a hint as to which parts
> of the X protocol specification is relevant for the understanding of
> i18n issues.

ls requests | egrep -i font\|text

In other words, in the X protocol doc (www.x-docs.org), all requests containing the word "Text" or "Font".

Also, the open type font specs from Adobe and MS.

-Arun |
From: Arun S. <ar...@sh...> - 2002-02-19 19:03:14
|
On Tue, Feb 19, 2002 at 10:10:41AM -0800, Arun Sharma wrote:
> The proposed new algorithm:
>
> FindPosition(textpos, startx, pixel_width)
>     // Make a single request to the X Server - this doesn't exist in
>     // the X protocol yet
>     nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
>                            // other args font etc)
>
>     // everything starting from textpos to textpos + nchars is "selected"
>

Hmm, I don't think I needed to invent a new protocol request. XQueryTextExtents seems to be good enough.

Another thought: we'll have to implement this algorithm on the X server side with Open Type fonts anyway, in order to service this particular request. Why reimplement it in a client side library ?

To summarize my thoughts on the advantages of taking the client side approach:

- Preserve status quo (use glyph codes)
- Less pressure on the X server - good for "thin clients" = "thin X servers"
- Reduced network traffic ? I think we'll have to empirically determine this one (Is XQueryTextExtents traffic >> Shipping the Open Type font info to the client once and processing XTextExtents locally ?). On the other hand, X has not been optimized for network efficiency (it always assumed a fast ethernet environment).

More ?

-Arun |
From: Arun S. <ar...@sh...> - 2002-02-19 18:07:03
|
On Tue, Feb 19, 2002 at 01:44:57AM -0800, Joseph Koshy wrote:
>
> as> However, if your point was that the client can't easily map (x[1],
> as> y[1], x[2], y[2]) to a UTF-8 string, I don't think it would be
> as> much harder than the existing algorithms in:
> as>    xc/lib/Xaw/AsciiSink.c - FindPosition()
> as> In a nutshell, the server, which has the knowledge of complex
> as> glyph codes and reordering, responds to client requests for
> as> XQueryTextExtents.
>
> Well, the Xaw widget set doesn't seem to be using XQueryTextExtents()
> at all.

That is for efficiency reasons. man XTextExtents. XQueryTextExtents (something more powerful than that) was the proposed new mechanism.

> I'd really like to see this 'not so hard' algorithm whose existence
> you have postulated :).

The existing algorithm in Xaw:

    // startx = x[1]
    // pixel_width = x[2] - x[1]
    FindPosition(textpos, startx, pixel_width)
        nchars = 0
        curpos = startx
        while 1:
            // Uses XFontStruct for efficiency
            width = compute the width of the next char in the text buf
            nchars++;
            curpos += width
            if (curpos >= startx + pixel_width)
                break;

        // everything starting from textpos to textpos + nchars is "selected"

The proposed new algorithm:

    FindPosition(textpos, startx, pixel_width)
        // Make a single request to the X Server - this doesn't exist in
        // the X protocol yet
        nchars = XComputeWidth(textbuf[textpos:end-of-line], startx, pixel_width,
                               // other args font etc)

        // everything starting from textpos to textpos + nchars is "selected"

The X server would deal with all the context sensitive reordering and joining and compute nchars. I don't have an algorithm for doing this, but am postulating that the X server has all the information that it needs to compute nchars.

> as> I think we should bring this up on the right XFree86 fora and
> as> resolve it there.
>
> I think that it would be prudent to first understand how the X window
> system actually works.

Is it really that important ? :)

> Especially so, if you are going to claim that
> the X protocol specification is ambiguous/in error, and that the error
> has been undetected for the two decades (or so) that the specification
> has been around :).

I just said it was inconsistent in the use of character codes vs glyph codes - not that it was ambiguous or in error. This seems to be a consequence of it being designed at a time, when the distinction between the two was not as important as it is today.

And you yourself (along with others on this list) accepted that certain references were ambiguous. What's all the fuss about then ? :)

I think, where we stand today, both the approaches are feasible and we need to come up with the pros and cons of each approach. I've given several tangible advantages of implementing it on the X server. Perhaps you could articulate your thoughts on why you think it should be done in a client side library ?

-Arun |
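For concreteness, the two FindPosition() variants in this message can be sketched in runnable form. This is an illustration under assumptions, not the Xaw code and not an actual protocol request: the per-character width table stands in for XFontStruct metrics, and the list of (character count, rendered width) clusters stands in for whatever a hypothetical XComputeWidth would derive on the server. The second function shows one possible behaviour, namely only ever returning character counts that end on a syllable boundary.

```python
from typing import Dict, Sequence, Tuple


def find_position_per_char(text: str, textpos: int, pixel_width: int,
                           widths: Dict[str, int]) -> int:
    """Xaw-style loop: add per-character widths until the swept range is covered."""
    nchars = 0
    covered = 0
    while textpos + nchars < len(text) and covered < pixel_width:
        covered += widths[text[textpos + nchars]]
        nchars += 1
    return nchars


def find_position_per_cluster(clusters: Sequence[Tuple[int, int]],
                              pixel_width: int) -> int:
    """Cluster-aware variant: each entry is (chars in syllable, rendered width)."""
    nchars = 0
    covered = 0
    for count, width in clusters:
        if covered >= pixel_width:
            break
        covered += width
        nchars += count  # a syllable is selected whole or not at all
    return nchars


# Made-up metrics: five characters of width 10; "c" and "d" form one
# syllable that renders 18 pixels wide.
widths = {ch: 10 for ch in "abcde"}
clusters = [(1, 10), (1, 10), (2, 18), (1, 10)]

print(find_position_per_char("abcde", 0, 25, widths))  # 3: may split the syllable
print(find_position_per_cluster(clusters, 25))         # 4: never stops at 3
```

With these numbers the per-character loop selects (a b c), while the cluster-aware variant selects (a b c d), since (c d) is treated as one unit.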
From: Keyur S. <key...@ya...> - 2002-02-19 17:32:15
|
Hi,

At present I am in Delhi for the IndiaSoft 2002 exhibition. I would like to comment on this after coming back to Mumbai on 25th February.

Regards,
Keyur

--- Joseph Koshy <jk...@Fr...> wrote:
>
> ks> While starting my work on IndiX, I also decided to give support
> ks> using an X extension. But unfortunately, I had to work under
> ks> strictly imposed constraints :-( esp. that applications should not
> ks> be modified for Indian language support. So I did the thing in
> ks> whatever way the people wanted me to do and also tried to do it in
> ks> best possible way!
>
> Most unmodified X applications may not work correctly with an X server
> that does "behind-the-scenes" glyph reordering and substitution.
>
> Consider the following scenario:
> - the user presses Button-1 down on some `x[1],y[1]' location on screen
>   and sweeps the pointer over the screen
> - Button-1 is released at location `x[2],y[2]'
>
> These screen coordinates get reported back to the application in the
> form of "events". Given these two pixel coordinates, the X
> application has to figure out the region of the underlying text that
> was "selected". This involves going backwards from 'x,y' coordinates
> to the character code points in its text buffer.
>
> If the X server is doing arbitrary glyph reordering and glyph
> substitution unknown to the X client, then this translation will go
> wrong in the client.
>
> I don't think the requirement of X applications running unchanged with
> Indic scripts is a feasible one.
>
> Regards,
> Koshy
> <jk...@fr...>
|