indic-computing-devel Mailing List for The Indic-Computing Project (Page 22)
Status: Alpha
Brought to you by:
jkoshy
From: <fpo...@ba...> - 2002-02-19 15:07:23
Hello,

>I think that it would be prudent to first understand how the X window
>system actually works.  Especially so, if you are going to claim that
>the X protocol specification is ambiguous/in error, and that the error
>has been undetected for the two decades (or so) that the specification
>has been around :).

As one of the non-programmers (well, I can't claim that for much longer...) on the list, I would appreciate a hint as to which parts of the X protocol specification are relevant for understanding i18n issues. Chapter and verse would be appreciated :)

>Here is a short list of reading material, that I found useful:
>o Among others, O'Reilly Inc. publishes a set of books on X window
>  system programming which cover the basics of the system

Although they are NOT about X itself, it might be helpful to know that you can find the Motif books online.

http://www.oreilly.com/openbook/motif/

-Frank
From: <jk...@Fr...> - 2002-02-19 09:45:00
as> Another observation: X doesn't seem to be consistent on where the
as> character -> glyph mapping should be done. While many sources hint
as> that the codes in the requests (for eg: PolyText, RENDER extension
as> etc) should be glyph codes, there are others who indicate that the
as> values stored in XSelections should be UTF-8.

X selections are a client side concept: defined and managed by clients, not the X server. Selections are built using X "properties" (name/value pairs). The X server serves as a repository for properties but does not deal with their contents.

This is basic X (application) programming stuff.

as> However, if your point was that the client can't easily map (x[1],
as> y[1], x[2], y[2]) to a UTF-8 string, I don't think it would be
as> much harder than the existing algorithms in:
as>   xc/lib/Xaw/AsciiSink.c - FindPosition()
as> In a nutshell, the server, which has the knowledge of complex
as> glyph codes and reordering, responds to client requests for
as> XQueryTextExtents.

Well, the Xaw widget set doesn't seem to be using XQueryTextExtents() at all. I'd really like to see this 'not so hard' algorithm whose existence you have postulated :).

as> I think we should bring this up on the right XFree86 fora and
as> resolve it there.

I think that it would be prudent to first understand how the X window system actually works. Especially so, if you are going to claim that the X protocol specification is ambiguous/in error, and that the error has been undetected for the two decades (or so) that the specification has been around :).

Here is a short list of reading material that I found useful:

o Among others, O'Reilly Inc. publishes a set of books on X window
  system programming which cover the basics of the system. People who
  are interested in working with/extending X SHOULD first read and
  understand these.
o The mailing lists hosted at XFree86.org are a good resource, though
  they assume that you are already familiar with the basic design
  issues.
o The newsgroup "comp.windows.x" is another resource which could be
  useful on the days the S/N ratio is tolerable.
o Documentation in the X source tree "xc/doc/*"

Regards,
Koshy
<jk...@fr...>
From: Arun S. <ar...@sh...> - 2002-02-18 20:57:26
Joseph Koshy wrote:

>If the X server is doing arbitrary glyph reordering and glyph
>substitution unknown to the X client, then this translation will go
>wrong in the client.

Possible. The proposal over here:

http://www.pps.jussieu.fr/~jch/software/UTF8_STRING/

may be relevant. It is basically saying that the X selections should be UTF8_STRINGs and not glyph codes (which may be different between different fonts).

However, if your point was that the client can't easily map (x[1], y[1], x[2], y[2]) to a UTF-8 string, I don't think it would be much harder than the existing algorithms in:

xc/lib/Xaw/AsciiSink.c - FindPosition()

In a nutshell, the server, which has the knowledge of complex glyph codes and reordering, responds to client requests for XQueryTextExtents.

I do see the problem of increased network traffic - but I see it as unavoidable. If we have to do the same thing on the client side, we'll have to communicate the information in the open type cmap tables over the X protocol to the client.

>I don't think the requirement of X applications running unchanged with
>Indic scripts is a feasible one.

I'd qualify that with "as things stand today". However, if Xlib is made fully Unicode or UTF-8 capable, I believe (based on my limited understanding of X) that we could make it work, with server side modifications only.

Another observation: X doesn't seem to be consistent on where the character -> glyph mapping should be done. While many sources hint that the codes in the requests (for eg: PolyText, RENDER extension etc) should be glyph codes, there are others who indicate that the values stored in XSelections should be UTF-8.

In one case the X server doesn't know about the character codes, in the other it does. This is true of the X protocol also - some bits on the wire indicate glyph codes (PolyText16) and some (UTF8_STRING) indicate character codes.

Does anyone know how the Indic support on MS Windows works with network transparent protocols like Citrix? Their marketing literature (talk about thin clients) seems to hint at pushing the complexity to the Windows terminal server side (Windows Terminal Server = analogous X client).

In that respect, it makes sense to push the complexity to the X client, because the X server could be a very low powered hand held, not capable of dealing with the complexity. This is the only argument I can find in favour of doing reordering etc on the X client side.

I think we should bring this up on the right XFree86 fora and resolve it there.

-Arun
From: <jk...@Fr...> - 2002-02-18 10:48:34
ks> While starting my work on IndiX, I also decided to give support
ks> using an X extension. But unfortunately, I had to work under
ks> strictly imposed constraints :-( esp. that applications should not
ks> be modified for Indian language support. So I did the thing in
ks> whatever way the people wanted me to do and also tried to do it in
ks> best possible way!

Most unmodified X applications may not work correctly with an X server that does "behind-the-scenes" glyph reordering and substitution.

Consider the following scenario:

- the user presses Button-1 down on some `x[1],y[1]' location on
  screen and sweeps the pointer over the screen
- Button-1 is released at location `x[2],y[2]'

These screen coordinates get reported back to the application in the form of "events". Given these two pixel coordinates, the X application has to figure out the region of the underlying text that was "selected". This involves going backwards from 'x,y' coordinates to the character code points in its text buffer.

If the X server is doing arbitrary glyph reordering and glyph substitution unknown to the X client, then this translation will go wrong in the client.

I don't think the requirement of X applications running unchanged with Indic scripts is a feasible one.

Regards,
Koshy
<jk...@fr...>
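The failure described in this message can be illustrated with a small sketch. It is not taken from any X client; the glyph widths and the visual order are made-up assumptions, and `index_at_x` is a hypothetical helper. It only shows that a client which maps a pixel position back to its buffer in logical order picks the wrong character once the display order differs.

```python
def index_at_x(advances, x):
    """Return the index of the on-screen slot whose horizontal span contains pixel x."""
    left = 0
    for i, width in enumerate(advances):
        if x < left + width:
            return i
        left += width
    return len(advances) - 1


# KA followed by the vowel sign I, stored in logical (typing) order.
text = "\u0915\u093f"

# Assumed fixed widths in pixels for the two on-screen slots (hypothetical values).
advances = [10, 10]

# An unmodified client assumes slot n shows character n of its buffer.
client_order = [0, 1]
# A reordering renderer draws the I-matra to the left of KA.
server_order = [1, 0]

slot = index_at_x(advances, 3)       # the user clicks in the leftmost slot
client_guess = client_order[slot]    # client believes it hit KA
actual = server_order[slot]          # the glyph under the pointer is the matra

print(f"client thinks U+{ord(text[client_guess]):04X}, screen shows U+{ord(text[actual]):04X}")
```

The client needs to know the visual order (or ask whoever did the reordering) before it can turn pointer coordinates into a text selection.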
From: Ashish K. <as...@mi...> - 2002-02-18 08:44:22
Just wanted to add IIIT-Hyderabad who are also doing work in this space. Are they already on this list?

- Ashish

----- Original Message -----
From: Keyur Shroff <key...@ya...>

> them to this mailing list. Are there people from
> IIT-Kanpur, IIT-Madras, C-DAC on this list? If they are not
> there then I think we should tell them to join the list so
> that we can work in unity.
From: Arun S. <ar...@sh...> - 2002-02-17 09:53:54
All,

I spent some time thinking about how we should develop the Indic computing model with X. Of the two approaches (client side libraries vs server side), I tend to think that the changes should go on the server side.

The rationale is - if I'm remotely logged into a box in some corner of the world (say Timbuktu) and running a Unicode compliant X application, I can't really expect the administrator of the box to install Indic support. However, I can always control what goes into my own machine, which is running the X server.

My understanding is that IndiX uses the server side approach. However, the issue Koshy brought up is that it breaks the X11 protocol, which is being looked into.

Some relevant questions:

Do we all agree that

* the arguments to the PolyText and PolyText16 requests are
  character codes in UTF-8 and not glyph codes ?
* the reordering and conjunctions etc happen on the server side ?
* No X11 protocol extensions are necessary ?

I saw talk of using the RENDER extension (which was also recommended by the XFree86 folks). While using an extension has the advantage of not having to worry about protocol breakages, it doesn't come for free. I'll have to require the sysadmin in Timbuktu to install the library part of the extension. Also, the main purpose of RENDER seems to be to produce anti-aliased text using alpha blending techniques.

Other disadvantages:

* RENDER uses glyphs, rather than character codes as a part of the
  protocol, which runs into the Timbuktu issue

Also, I think we need to ping the XFree86 developers to see how receptive they are to the idea of interpreting the bytes in XDrawString16 as character codes and not glyph codes. The writings here:

http://www.cl.cam.ac.uk/~mgk25/unicode.html#x11

still seem to assume glyph codes and character codes.

Keyur, I was wondering if we could make your work into a library that can be made into a loadable module. This would ease the binary distribution issues. I'm very willing to spend some cycles there.

Looking forward to hearing your comments,

-Arun
From: Guntupalli K. <kar...@fr...> - 2002-02-15 14:04:32
On Fri, 15 Feb 2002 02:52:56 -0800 (PST) Keyur Shroff <key...@ya...> wrote:

> --- Arun Sharma <ar...@sh...> wrote:
> >
> > The JDK 1.4 announcement that went out today claimed hi_IN
> > support. I've downloaded and installed the Linux version, but
> > can't find anything specific to Hindi. Does anyone on this forum
> > know exactly what the announcement means ?
>
> I am not sure but this was probably for Java on Windows.
> Couple of weeks ago we searched on the internet for Java
> Hindi support on Linux. At that time we found that Java
> supports rendering of Indic scripts on Windows platform
> only and it was contributed by IBM. The classes were first
> designed by IBM then handed over to Sun. We also found that
> Java doesn't support Indic rendering on Linux. We really
> have to check it out for this latest version of JDK.

It works on Linux also (since all betas). I think it does this only for Swing components.

Harsha - please throw more light on this.

Regards,
Karunakar
From: Karunakar <kar...@fr...> - 2002-02-15 13:44:37
----- Original Message -----
From: "Arun Sharma" <ar...@sh...>
To: <ind...@li...>
Sent: Friday, February 15, 2002 11:57 AM
Subject: [Indic-computing-devel] Java and hi_IN support

> The JDK 1.4 announcement that went out today claimed hi_IN support. I've
> downloaded and installed the Linux version, but can't find anything
> specific to Hindi. Does anyone on this forum know exactly what the
> announcement means ?

JDK 1.4 has a hi_IN Unicode locale. It has Indic rendering support built in for Swing components and also an OpenType font (LucidaSansRegular) supporting Devanagari. (This font is different from the one in IBM's JDK.)

You can check the support by using '\uXXXX' char sequences in strings, say like '\u0915'+'\u094d'+'\u0930' (this gives a kra).

Java-based input methods are needed to type on Linux. The IISc guys can throw more light on this; they did a notepad application with Hindi input methods.

Regards,
Karunakar
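The same three code points can be inspected outside Java. The following is a minimal Python sketch, not part of the JDK discussion itself: it only lists the code points behind "kra" using the standard `unicodedata` module, and whether they display as a single conjunct depends on the terminal or toolkit doing the shaping.

```python
import unicodedata

# KA + VIRAMA + RA: the sequence quoted in the message for "kra".
kra = "\u0915\u094d\u0930"

# Print each code point with its Unicode character name.
for ch in kra:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The string holds three code points even when a shaping-aware
# renderer draws them as one conjunct.
print(len(kra))
```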
From: Keyur S. <key...@ya...> - 2002-02-15 10:52:57
--- Arun Sharma <ar...@sh...> wrote:
>
> The JDK 1.4 announcement that went out today claimed hi_IN
> support. I've downloaded and installed the Linux version, but
> can't find anything specific to Hindi. Does anyone on this forum
> know exactly what the announcement means ?

I am not sure but this was probably for Java on Windows. Couple of weeks ago we searched on the internet for Java Hindi support on Linux. At that time we found that Java supports rendering of Indic scripts on Windows platform only and it was contributed by IBM. The classes were first designed by IBM then handed over to Sun. We also found that Java doesn't support Indic rendering on Linux. We really have to check it out for this latest version of JDK.

Regards,
Keyur

__________________________________________________
Do You Yahoo!?
Got something to say? Say it better with Yahoo! Video Mail
http://mail.yahoo.com
From: Keyur S. <key...@ya...> - 2002-02-15 07:05:16
Hello,

There are many other people/organizations involved in Indic computing development. I feel that we should also bring them to this mailing list. Are there people from IIT-Kanpur, IIT-Madras, C-DAC on this list? If they are not there then I think we should tell them to join the list so that we can work in unity.

Regards,
Keyur
From: Keyur S. <key...@ya...> - 2002-02-15 07:03:32
--- "Tapan S. Parikh" <ta...@ya...> wrote:
> > > [Note: You could possibly think of a character encoding where
> > > text is encoded in "visual" order. Some transliteration schemes
> > > for indian languages use such "visual" order encodings. ]
> >
> > this is like the shusha font scheme uses.
>
> But then this typically wouldnt match the user input order. Do you
> do some reordering at the input level, or must user input in
> visual order also?

In fact there are many transliteration schemes in existence. Shusha is one among them. In this scheme the user is expected to type exactly in visual order and no reordering is done at the input level. There is a one-to-one mapping between character code and glyph code. This kind of scheme doesn't deal very well with complex Indic scripts. It is somewhat suitable for web applications where rich typography is not expected and emphasis is put on a simpler solution. Unfortunately, it breaks the standards.

Regards,
Keyur
From: Arun S. <ar...@sh...> - 2002-02-15 06:31:27
|
The JDK 1.4 announcement that went out today claimed hi_IN support. I've downloaded and installed the Linux version, but can't find anything specific to Hindi. Does anyone on this forum know exactly what the announcement means ? -Arun |
From: Tapan S. P. <ta...@ya...> - 2002-02-15 06:06:54
> > [Note: You could possibly think of a character encoding where text
> > is encoded in "visual" order. Some transliteration schemes for
> > indian languages use such "visual" order encodings. ]
>
> this is like the shusha font scheme uses.

But then this typically wouldn't match the user input order. Do you do some reordering at the input level, or must the user input in visual order also?

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com
From: Keyur S. <key...@ya...> - 2002-02-14 11:02:45
--- Guntupalli Karunakar <kar...@fr...> wrote:
>
> glyph selection/reordering are script/language specific.

Yes. It is script specific. But a routine can be written so that it looks script independent. It should be table driven and the modules should be arranged in a proper way. See the reordering algorithm available on Microsoft's website. My experience says that there are some flaws in the algorithm. It doesn't cover some exceptional cases. I'll update on it when the proper time comes.

Some useful links:

http://rohini.ncst.ernet.in/shrinath/indicFeb1999.pdf
http://www.emille.lancs.ac.uk/lesal/omvikas.pdf
http://www.emille.lancs.ac.uk/

Regards,
Keyur

__________________________________________________
Do You Yahoo!?
Send FREE Valentine eCards with Yahoo! Greetings!
http://greetings.yahoo.com
From: Guntupalli K. <kar...@fr...> - 2002-02-14 08:31:26
On Mon, 11 Feb 2002 23:05:32 -0800 (PST) jk...@Fr... (Joseph Koshy) wrote:

> I'm trying to refine my understanding of the basic algorithms
> involved in Indic glyph rendering, for future inclusion into the
> Handbook.

A well researched model used in Win2k/XP (a similar one is used in Indix) is documented at

http://www.microsoft.com/typography/otspec/indicot/shaping.htm

The complete document is at

http://www.microsoft.com/typography/otspec/indicot/default.htm

but the gist of the above has already been given in another reply to this post, so I just cover the font issues.

> [Note: You could possibly think of a character encoding where text
> is encoded in "visual" order. Some transliteration schemes for
> indian languages use such "visual" order encodings. ]

this is like the shusha font scheme uses.

> (B) is a property of the script: most (all?) indic scripts have
> special glyph shapes for double-consonants, consonants+vowel
> combinations, etc.
>
> So, our rendering process has to map:
>
>   `M' code points -> `N' language glyph shapes
>
> and in doing so we have to do glyph re-ordering "(A)" and composite
> glyph selection "(B)".
>
> [Q: Are there any other issues to be taken care of when rendering
> indic scripts? ]
>
> Some indian language fonts are designed to contain "partial glyphs";
> these fonts require a sequence of glyphs to be specified to render a
> full language glyph on screen (for example, Baraha (Kannada)). For
> such fonts, each of the `N' language glyph shapes selected above
> will need to be mapped further into `O' font-specific glyph indices.

This was done because of the restriction in 8-bit fonts, where not more than approximately 220 encoded glyphs for an Indian language can be put. More glyphs can be put, but no code can be assigned to them if it is to be an 8-bit font. All this mapping info you need to put in your code, or in a separate file, or, with an OpenType font, in the OT tables.

> My questions are:
>
> - do we do reordering of glyphs (A) before looking for composite
>   glyphs (B), or is it best done the other way round?

Reordering is done at character code level.

> - do (A) and (B) have to be done multiple times?

(A) once, (B) multiple times; at each step you look for a specific combination of glyphs and do the substitution.

> - is there ONE algorithm that can handle correct glyph rendering
>   for every indic script, or are the glyph selection/re-ordering
>   algorithms language specific?

Glyph selection/reordering are script/language specific. In Devanagari, say, reordering needs to be done only with the vowel sign 'I' and the 'RA' forms. In languages like Bengali, Tamil etc. you have surrounding vowels (here the vowel sign has 2 parts that go to either side of the base consonant). Also, Hindi and Marathi use the same script, but there are variations in glyph shapes for some characters, e.g. the SHA (U+0936) glyph in Marathi is different from that used for Hindi, and likewise for some digits like '8' and '9'.

There is as such no ONE algorithm, but since the Microsoft way (Uniscribe + OpenType layout services library + Indic fonts) has been (researched and) documented well, it has become a kind of 'de facto standard' for doing Indic rendering. Pango also follows the Uniscribe model. The FreeType project is working on a 'Freetype services layout' library which will do the OpenType stuff in FreeType.

Regards,
Karunakar
From: Keyur S. <key...@ya...> - 2002-02-14 05:57:01
There is a typo in my previous mail. Please correct it.

--- Keyur Shroff <key...@ya...> wrote:
> In (A), the first two character should be turned into
> 'Reph' glyph and place on top of glyph for U+0916 since it
> is base character. For that the sequence U+0930 and U+094D
> should be moved after U+0917 and the first syllable will
                        ^^^^^^
Read it as : should be moved after U+0916 and the first syllable will

> be reordered as
>
> A1. U+0915 U+094D U+0916 U+0930 U+094D

Regards,
Keyur
From: Keyur S. <key...@ya...> - 2002-02-14 05:45:58
--- Arun Sharma <ar...@sh...> wrote:
> > > - do (A) and (B) have to be done multiple times?
> >
> > Yes. It has to be done for each syllable.
>
> Let me pose the question in a different way - can it be done with
> one pass over the character codes ?

No. It has to be done for each syllable. While reordering, we look at the characteristics of characters, e.g., whether it is a base character or matra, etc. Doing it in one pass is not possible as it mixes all the characters and it becomes very difficult to decide to which place a character sequence should move.

> If not, can you give an example of a character code string, where
> multiple passes would be necessary to determine the correct
> sequence of glyphs ?

Take a simple example,

  U+0930 U+094D U+0915 U+094D U+0916 U+0917

In the above sequence there are two syllables,

  A. U+0930 U+094D U+0915 U+094D U+0916
  B. U+0917

In (A), the first two characters should be turned into the 'Reph' glyph and placed on top of the glyph for U+0916 since it is the base character. For that the sequence U+0930 and U+094D should be moved after U+0917 and the first syllable will be reordered as

  A1. U+0915 U+094D U+0916 U+0930 U+094D

No reordering is necessary for the second syllable since there is only one character.

Please note that this reordering depends somewhat on the design of the OpenType font substitution table (GSUB). Microsoft and Adobe have published guidelines for designing OpenType fonts so that this reordering remains almost the same for all fonts.

This cannot be done within one pass because it is difficult to determine whether U+0916 or U+0917 is the base character and consequently where to move the sequence U+0930 U+094D.

> Also a related question - are there any algorithms to efficiently
> determine how far back do you go in a string of characters (for
> doing a re-layout), when somebody hits a backspace ? Stop at the
> word boundary ?

A generally accepted action for Backspace is that it should remove the last character from the previous syllable. However, the Delete key should remove the entire next syllable. Similarly, the cursor should move over syllables, not characters. These are not standards but people generally feel comfortable with these operations. The same mechanism has been adopted in MS Windows 2000/XP. Using the same "Syllable Breaking State Machine" we can determine how far to move back or forth.

> Is it correct to state that the state machine that you talked
> about is specified in the font file and not a special purpose
> library ?

No. The syllable breaking state machine and the reordering of characters are not part of the font; they are part of a special purpose library. An OpenType font contains only the logic for Substitution and Positioning. And Reordering is absolutely necessary for Substitution.

However, in the ISCII/ISFOC standard, even this substitution logic is part of the library and not the font. But there the font designer must design the font according to the ISFOC standard layout. In this standard the whole glyph set has been defined and each glyph must occupy a previously defined position. ISCII/ISFOC doesn't cover information for positioning of glyphs and does not provide so-called "Rich Typography".

Regards,
Keyur
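The two-step process in this message (find syllable boundaries first, then reorder within each syllable) can be sketched as follows. This is a deliberately narrow illustration, not the Uniscribe or IndiX algorithm: it assumes a syllable is a consonant followed by zero or more virama+consonant pairs, it ignores matras and other signs, and the function names are made up for the example.

```python
VIRAMA = "\u094d"
RA = "\u0930"


def is_consonant(ch):
    # Devanagari consonants KA..HA; a narrow character class for this sketch only.
    return "\u0915" <= ch <= "\u0939"


def split_syllables(text):
    """First pass: greedy split into C (VIRAMA C)* clusters; other characters stand alone."""
    syllables, i = [], 0
    while i < len(text):
        j = i + 1
        if is_consonant(text[i]):
            while j + 1 < len(text) and text[j] == VIRAMA and is_consonant(text[j + 1]):
                j += 2
        syllables.append(text[i:j])
        i = j
    return syllables


def reorder_reph(syllable):
    """Second pass: move a leading RA+VIRAMA to the end of its syllable."""
    if len(syllable) > 2 and syllable.startswith(RA + VIRAMA):
        return syllable[2:] + RA + VIRAMA
    return syllable


def as_codes(s):
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


text = "\u0930\u094d\u0915\u094d\u0916\u0917"
syllables = split_syllables(text)
reordered = [reorder_reph(s) for s in syllables]

for before, after in zip(syllables, reordered):
    print(as_codes(before), "->", as_codes(after))
```

The reordering step cannot tell where the RA+VIRAMA pair should go until the first pass has decided where the syllable ends, which is the reason given above for not doing it in a single pass.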
From: Arun S. <ar...@sh...> - 2002-02-13 23:49:08
On Wed, Feb 13, 2002 at 01:10:40AM -0800, Keyur Shroff wrote:
> >
> > - do (A) and (B) have to be done multiple times?
>
> Yes. It has to be done for each syllable.

Let me pose the question in a different way - can it be done with one pass over the character codes ? If not, can you give an example of a character code string, where multiple passes would be necessary to determine the correct sequence of glyphs ?

Also a related question - are there any algorithms to efficiently determine how far back do you go in a string of characters (for doing a re-layout), when somebody hits a backspace ? Stop at the word boundary ?

Is it correct to state that the state machine that you talked about is specified in the font file and not a special purpose library ?

-Arun
From: Keyur S. <key...@ya...> - 2002-02-13 09:29:07
--- Joseph Koshy <jk...@Fr...> wrote:
>
> Shall we move further discussion on X protocol issues off this
> list? I'm afraid we may be moving away from the charter of
> <indic-computing-devel>.

Like Tapan, I also believe that this discussion is more or less related to [Indic-computing-devel]. But if you wish, we can move it off the list.

Regards,
Keyur
From: Keyur S. <key...@ya...> - 2002-02-13 09:25:48
--- "Tapan S. Parikh" <ta...@ya...> wrote:
> Let me just add that I tend to think that adding support for
> rendering complex scripts through an extension or a toolkit may
> make it hard(er) to port existing X apps (XEmac, Mozilla, etc.) to
> support Indic Languages, but that is a trade-off we may have to
> live with, esp. if it involves a grievous violation of the X
> Protocol. Maybe I was just being too revolutionary in my thinking.

On this point I fully agree with Koshy's idea that we should write an extension. X has evolved through extensions. Writing an extension and then incorporating it into various applications is a long process. However, in my opinion this is an acceptable solution which may prove better in the long run. Changing or breaking the X protocol is definitely not a great idea. Provision is there for you to define new data structures and write an extension. So why not take advantage of it?

While starting my work on IndiX, I also decided to give support using an X extension. But unfortunately, I had to work under strictly imposed constraints :-( esp. that applications should not be modified for Indian language support. So I did the thing in whatever way the people wanted me to do and also tried to do it in the best possible way!

(I have downloaded the test suite. But I think I'll have to read the documents before running it. At the moment I am busy with preparation for the IndiaSoft exhibition to be held in Delhi. So only after 24th Feb will I be able to test it.)

> However, if we modified the X system in a well-understood way, the
> behavior of which was well-documented and easy to reason about,
> then the approach Keyur took, that forfeits 100% compatibility for
> existing applications, but does a good job for developing new
> applications and porting popular existing applications for use in
> the Indian market, this does not seem like an unreasonable
> direction. For instance we have lived with both System V and BSD
> now for over thirty years, so I dont see why this couldnt co-exist
> with "Standard" X. Maybe once again Im being too revolutionary...

And we also have two desktop environments in X (KDE & GNOME). ;-)

Regards,
Keyur
From: Keyur S. <key...@ya...> - 2002-02-13 09:10:41
Hi,

--- Joseph Koshy <jk...@Fr...> wrote:
> (B) these scripts use a number of glyph shapes representing
>     combinations of characters, so there isn't a 1-1 mapping of
>     character encoding code points to glyphs.
>
> So, our rendering process has to map:
>
>   `M' code points -> `N' language glyph shapes

Yes. All of the following mappings are possible. In the brackets there are sample character codes:

  Charcode(s)    Glyphcode(s)
  one         -> one   (e.g., U+0915)
  one         -> many  (e.g., U+0BCA)
  many        -> one   (e.g., Kssa conjunct in Devanagari)
  many        -> many  (e.g., other consonant conjuncts)

> [Q: Are there any other issues to be taken care of when rendering
> indic scripts? ]

"Syllable breaking" is a major issue that we have to take care of. A rule based algorithm is used to determine boundaries of syllables in a given character string. Following is the sequence:

  (1) Take the input string
  (2) Break it into various scripts

  For each script run,
  (3) Break the string depending upon various properties (if
      applicable), e.g., colour of characters.
  (4) Now break them into syllables

  For each syllable,
  (5) Reorder characters within the syllable
  (6) Get glyph codes
  (7) Apply Substitution to get new glyph codes
  (8) Apply Positioning (if applicable)
  (9) Render each glyph

> My questions are:
>
> - do we do reordering of glyphs (A) before looking for composite
>   glyphs (B), or is it best done the other way round?

Yes. Reordering is necessary before looking for the glyphs. We reorder the characters so that a character sequence which is converted into an attached glyph is placed on top of (or below) the base character. Now this base character can only be determined from the properties of the character codes. We cannot determine it from glyph codes unless we maintain history. So it is easier to reorder the characters before converting them into glyphs.

> - do (A) and (B) have to be done multiple times?

Yes. It has to be done for each syllable.

> - is there ONE algorithm that can handle correct glyph rendering
>   for every indic script, or are the glyph selection/re-ordering
>   algorithms language specific?

Syllable breaking logic is common for all Indic scripts. We can classify each character in all Indic scripts. The syllable breaking state machine uses these classes to determine syllable boundaries. Reordering of characters is however script specific.

Regards,
Keyur
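The one-to-one, one-to-many and many-to-one mappings listed in this message can be illustrated with a toy substitution table. The sketch below is an assumption-laden illustration of the substitution step only: the glyph names are invented, the table has three hand-written entries, and a real font expresses such rules in its own GSUB table rather than in a Python dictionary.

```python
# Hypothetical glyph names; a real font defines its own glyph IDs and rules.
SUBSTITUTIONS = {
    ("\u0915",): ["ka"],                                  # one -> one
    ("\u0bca",): ["tamil_o_left", "tamil_o_right"],       # one -> many (two-part vowel sign)
    ("\u0915", "\u094d", "\u0937"): ["kssa"],             # many -> one (KA + VIRAMA + SSA)
}


def to_glyphs(syllable):
    """Map code points to glyph names, preferring the longest matching sequence."""
    glyphs, i = [], 0
    while i < len(syllable):
        for length in range(len(syllable) - i, 0, -1):
            key = tuple(syllable[i:i + length])
            if key in SUBSTITUTIONS:
                glyphs.extend(SUBSTITUTIONS[key])
                i += length
                break
        else:
            # No rule matched: fall back to a placeholder name for the single code point.
            glyphs.append(f"u{ord(syllable[i]):04X}")
            i += 1
    return glyphs


print(to_glyphs("\u0915"))              # one code point, one glyph
print(to_glyphs("\u0bca"))              # one code point, two glyphs
print(to_glyphs("\u0915\u094d\u0937"))  # three code points, one glyph
```

In the sequence of steps above this corresponds to getting glyph codes and applying substitution, and it would run on each syllable after reordering.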
From: Tapan S. P. <ta...@ya...> - 2002-02-13 08:07:04
>>[Q: Are there any other issues to be taken care of when rendering
>> indic scripts? ]

Positioning - some glyphs have to be positioned specially depending on the glyphs that are nearby.

I have some knowledge on the other questions, but I'll wait for more authoritative answers from Keyur or Karunakar.

Sorry if I took the X question too far... maybe in the end you are right Koshy....

-- Tapan
From: Tapan S. P. <ta...@ya...> - 2002-02-13 07:44:47
|
I tend to think that this discussion is quite germane to <indic-computing-devel>, as it plays directly into issues neccesary for a bootable OS (esp. if we are basing that on the X arch), but if others feel otherwise, I dont mind doing it back-channel, or on the X lists themselves. Let me just add that I tend to think that adding support for rendering complex scripts through an extension or a toolkit may make it hard(er) to port existing X apps (XEmac, Mozilla, etc.) to support Indic Languages, but that is a trade-off we may have to live with, esp. if it involves a grievous violation of the X Protocol. Maybe I was just being too revolutionary in my thinking. However, if we modified the X system in a well-understood what, the behavior of which was well-documented and easy to reason about, than the approach Keyur took, that forfeits 100% compatibility for existing applications, but does a good job for developing new applications and porting popular existing applications for use in the Indian market, this does not seem like an unreasonable direction. For instance we have lived with both System V and BSD now for over thirty years, so I dont see why this couldnt co-exist with "Standard" X. Maybe once again Im being too revolutionary... --tapan ----- Original Message ----- From: "Joseph Koshy" <jk...@Fr...> To: "Tapan S. Parikh" <ta...@ya...> Cc: <ind...@so...>; "Keyur Shroff" <ke...@ko...> Sent: Wednesday, February 13, 2002 10:14 AM Subject: Re: [Indic-computing-devel] Re: NCST Indix Examined > > > Dear Tapan, > > tp> Remember, X was designed and developed in a time when we were > tp> dealing with a (ISO) Latin-only world, with no idea of supporting > tp> complex scripts and / or fonts , where 8-bit char codes most likely > tp> meant glyph codes as well, so the distinction was moot. 
At that > tp> point in time there was no idea of supporting complex scripts and > tp> issues such as ligatures, conjuncts, positioning, etc., nor even the > tp> idea of true type (not to mention open-type) fonts and cmap tables. > > The X protocol caters only to the simplest model of text rendering > possible: X fonts are simple collections of glyphs (no additional > semantics) and X text rendering works by placing glyphs ``next'' to > each other. > > This is not sufficient even for high-quality Latin text rendering > (eg:- no kerning, no ligatures), and is quite inadequate for other > writing systems. > > X was NOT designed for high-quality text rendering. Please note, > though, that this was a reasonable design in its time. > > tp> As we move into the 21st century, it seems very much out of the X > tp> frame of view to have the client worrying about the specific font it > tp> will use and the corresponding mapping table, as well as positioning > tp> issues. To me this seems clearly within the domain of the X Server, > tp> and its rendering engine. If the X protocol is vague on this issue, > tp> it is my opinion that the X protocol should change, not our approach > tp> to this problem. > > Well, the X protocol /can/ be extended in a controlled way without > necessarily breaking the 'base' semantics. These are called > 'extensions'. Nearly every X server you would use today implements > one or more extensions to the core X protocol (you can run `xdpyinfo' > and see the extensions present in your X server). > > tp> So this becomes a deeper issue of modifying the X Server to > tp> support Unicode and Open (or True) Type Fonts, which must be going > tp> on elsewhere as well. Anyone know? > > The Fonts and I18N sub-groups at the XFree86 project have been working > on these issues for quite a while now. 
> > http://www.xfree86.org/pipermail/fonts > http://www.xfree86.org/pipermail/i18n > > Keith Packard and a few other developers are developing "RENDER", a > protocol extension to X that provides somewhat more sophisticated text > rendering. > > http://www.xfree86.org/~keithp/render/ > > "RENDER" uses *new* protocol requests to do glyph rendering (named > `CompositeGlyphs{8/16/32}' respectively). It doesn't modify the > semantics of `PolyText{8/16}' and `ImageText{8/16}', the existing > protocol requests that do text rendering. Thus existing X clients > will continue to work unchanged. > > "RENDER" moves font-specific and encoding-specific knowledge away from > the X server into the X client. > > Shall we move further discussion on X protocol issues off this list? > I'm afraid we may be moving away from the charter of > <indic-computing-devel>. > > Regards, > Koshy > <jk...@fr...> |
From: Tapan S. P. <ta...@ya...> - 2002-02-13 07:35:38
|
I tend to think that this discussion is quite germane to <indic-computing-devel>, as it plays directly into issues necessary for a bootable OS (esp. if we are basing that on the X architecture), but if others feel otherwise, I don't mind doing it back-channel, or on the X lists themselves. --tapan |
From: <jk...@Fr...> - 2002-02-13 04:44:46
|
Dear Tapan, tp> Remember, X was designed and developed in a time when we were tp> dealing with a (ISO) Latin-only world, with no idea of supporting tp> complex scripts and / or fonts , where 8-bit char codes most likely tp> meant glyph codes as well, so the distinction was moot. At that tp> point in time there was no idea of supporting complex scripts and tp> issues such as ligatures, conjuncts, positioning, etc., nor even the tp> idea of true type (not to mention open-type) fonts and cmap tables. The X protocol caters only to the simplest model of text rendering possible: X fonts are simple collections of glyphs (no additional semantics) and X text rendering works by placing glyphs ``next'' to each other. This is not sufficient even for high-quality Latin text rendering (eg:- no kerning, no ligatures), and is quite inadequate for other writing systems. X was NOT designed for high-quality text rendering. Please note, though, that this was a reasonable design in its time. tp> As we move into the 21st century, it seems very much out of the X tp> frame of view to have the client worrying about the specific font it tp> will use and the corresponding mapping table, as well as positioning tp> issues. To me this seems clearly within the domain of the X Server, tp> and its rendering engine. If the X protocol is vague on this issue, tp> it is my opinion that the X protocol should change, not our approach tp> to this problem. Well, the X protocol /can/ be extended in a controlled way without necessarily breaking the 'base' semantics. These are called 'extensions'. Nearly every X server you would use today implements one or more extensions to the core X protocol (you can run `xdpyinfo' and see the extensions present in your X server). tp> So this becomes a deeper issue of modifying the X Server to tp> support Unicode and Open (or True) Type Fonts, which must be going tp> on elsewhere as well. Anyone know? 
The Fonts and I18N sub-groups at the XFree86 project have been working on these issues for quite a while now. http://www.xfree86.org/pipermail/fonts http://www.xfree86.org/pipermail/i18n Keith Packard and a few other developers are developing "RENDER", a protocol extension to X that provides somewhat more sophisticated text rendering. http://www.xfree86.org/~keithp/render/ "RENDER" uses *new* protocol requests to do glyph rendering (named `CompositeGlyphs{8/16/32}' respectively). It doesn't modify the semantics of `PolyText{8/16}' and `ImageText{8/16}', the existing protocol requests that do text rendering. Thus existing X clients will continue to work unchanged. "RENDER" moves font-specific and encoding-specific knowledge away from the X server into the X client. Shall we move further discussion on X protocol issues off this list? I'm afraid we may be moving away from the charter of <indic-computing-devel>. Regards, Koshy <jk...@fr...> |
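As a rough illustration of what "moving font-specific and encoding-specific knowledge into the X client" implies, the sketch below does the character-to-glyph step on the client side: it maps Unicode code points to glyph indices and then substitutes a conjunct glyph for a known sequence. The glyph numbers and both tables are hypothetical (a real client would read them from the font), and nothing here talks to an X server; the resulting glyph list is simply what a client would then hand to a glyph-rendering request such as RENDER's `CompositeGlyphs`.

```python
# Hypothetical character -> glyph index table (stands in for a font's cmap).
CMAP = {
    "\u0915": 10,  # DEVANAGARI LETTER KA
    "\u094D": 11,  # DEVANAGARI SIGN VIRAMA
    "\u0937": 12,  # DEVANAGARI LETTER SSA
    "\u093E": 13,  # DEVANAGARI VOWEL SIGN AA
}

# Hypothetical substitution table: glyph sequence -> single conjunct glyph.
LIGATURES = {
    (10, 11, 12): 40,  # KA + VIRAMA + SSA -> "kssa" conjunct
}

NOTDEF = 0  # glyph used for characters the font does not cover


def shape(text):
    """Map characters to glyph indices, then apply sequence substitutions."""
    glyphs = [CMAP.get(ch, NOTDEF) for ch in text]
    # Try longer sequences first so the longest match wins.
    rules = sorted(LIGATURES.items(), key=lambda kv: len(kv[0]), reverse=True)
    out, i = [], 0
    while i < len(glyphs):
        for seq, replacement in rules:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(replacement)
                i += len(seq)
                break
        else:
            # No rule matched at this position: keep the glyph as-is.
            out.append(glyphs[i])
            i += 1
    return out


print(shape("\u0915\u094D\u0937\u093E"))  # conjunct followed by the AA sign
print(shape("\u0915\u093E"))              # no substitution applies
```

The point of the sketch is only the division of labour: the character string never reaches the server, which sees glyph indices and has no need to know about encodings or conjunct formation.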