indic-computing-devel Mailing List for The Indic-Computing Project (Page 20)
Status: Alpha
Brought to you by:
jkoshy
From: Rajkumar S <s_...@my...> - 2002-03-06 20:16:13

On Wed, 6 Mar 2002, Primoz Peterlin wrote:

> I find this hard to believe. According to
> <http://www.portalkerala.com/hevents2.htm>, newspapers have been
> published in Malayalam for over 150 years. I am sure that the
> typesetters had to make some selection of glyphs?

They all choose a subset of glyphs, but the problem is that they choose different subsets. Since any conjunct in Malayalam can be represented as a combination of base glyphs and Halant, readability is not compromised even if there are differences in the actual glyphs chosen. In general, the more glyphs the better, but you can make do with fewer glyphs as well.

> If you take any Malayalam newspaper, or any textbook, is it typeset in
> the written script or the reformed one? I guess we should aim at this,
> as people are used to read it.

It is in the reformed script, but some higher-quality papers choose to include more glyphs.

> Thank you! Not being a native Malayalam reader, I will sure need and
> appreciate your help.

It is my pleasure.

> None of them are free, so I cannot include it in the GPL-ed font (I
> can look at them when I design one, I guess :) None of them seem to
> contain more than some 80 ligatures. Do you know of any better?

Take a look at the fonts by Hellingman for Malayalam TeX, which should be available at CTAN. They are free, though the shapes of the glyphs are not really high quality. I will also try to hunt down some good Malayalam fonts as they appear in newspapers and other print media, if you can send me your postal address offline.

raj
From: Primoz P. <pri...@bi...> - 2002-03-06 20:06:07

On Wed, 6 Mar 2002, Rajkumar S wrote:

> > As a first question, I would like to ask whether there is any
> > agreement on the sets of ligatures needed to render particular Indic
> > scripts, i.e.
> Absolutely none for Malayalam.

I find this hard to believe. According to <http://www.portalkerala.com/hevents2.htm>, newspapers have been published in Malayalam for over 150 years. I am sure that the typesetters had to make some selection of glyphs?

> > * a minimal set, e.g. for use in email (for instance, like the lam-alif
> >   in Arabic)
> > * a practical set, e.g. for use on WWW or in newspaper (required to
> >   typeset a modern language)
> This two sets are minimally different for Malayalam. In fact their are two
> different ligature sets for Malayalam, One is the original written script
> which has a lot of ligatures. Mr Hussin who has done extensive work in the
> original Malayalam script has so far identified about 900 glyphs and still
> counting.
> As for the so called "reformed" script which was mainly truncating
> Malayalam for typewriting, I am not exactly aware of the exact number of
> glyphs involved. but in general it can be taken from some of the CDAC
> fonts that are lying around.

If you take any Malayalam newspaper, or any textbook, is it typeset in the written script or the reformed one? I guess we should aim at this, as people are used to read it.

> I will be more than happy to provide any information to help you to create
> freely available Malayalam fonts.

Thank you! Not being a native Malayalam reader, I will sure need and appreciate your help. So far, I have found three sites offering Malayalam fonts:

http://www.deepika.com/font.htm
http://www.malayalamvarikha.com/font/download.htm
http://www.goodnews-weekly.com/malayalam.htm

None of them are free, so I cannot include it in the GPL-ed font (I can look at them when I design one, I guess :) None of them seem to contain more than some 80 ligatures. Do you know of any better?

With kind regards,
Primoz

--
Primož Peterlin, Inštitut za biofiziko, Med. fakulteta, Univerza v Ljubljani
Lipičeva 2, SI-1000 Ljubljana, Slovenija. primoz.peterlin@biofiz.mf.uni-lj.si
Tel: +386-1-5437632, fax: +386-1-4315127, http://sizif.mf.uni-lj.si/~peterlin/
F8021D69 OpenPGP fingerprint: CB 6F F1 EE D9 67 E0 2F 0B 59 AF 0D 79 56 19 0F
From: Primoz P. <pri...@bi...> - 2002-03-06 19:50:34

Hello,

On Wed, 6 Mar 2002, Guntupalli Karunakar wrote:

> > As a first question, I would like to ask whether there is any
> > agreement on the sets of ligatures needed to render particular Indic
> > scripts, ...
> Not all scripts have officially standardized glyph set, except for
> Tamil & Kannada.

Thank you. I just learned that the Kannada standard set is available on the Karnataka Directorate of Information Technology site. Do you have any pointers to the Tamil specifications?

> > http://www.indlinux.org/fonts/
> This font was worked keeping in mind the above categories, we
> started from minimal to practical and in future to a maximal . So
> Currently it satisfies the first two. This font will be soon released
> under GNU GPL (now that it has finally got a sponsor :-) .

Congratulations!

> > What is the situation with other Indic scripts?
> There are the Bharatbhasha shusha set of fonts which take the minimal
> approach , available at http://www.bharatbhasha.org.in/ They cater to
> the scripts - Devanagari (Hindi & Marathi), Gurmukhi, Bengali,
> Gujarati.

That's a new URL, thank you. The last one I knew of was http://www.bharabhasha.net/, and it seemed to be neglected.

> For Kannada, an organisation Kannada Ganaka Parishat (
> www.ganakaparishat.org ) has done glyph standardisation (contact
> person: Dr U B Pavanaja < pav...@vi... > )

I believe this was a very wise and important step.

> For Tamil , A glyph standard called TSCII has already been evolved.
> http://www.geocities.com/Athens/5180/tnet99.html
> http://www.geocities.com/Athens/5180/tscii.html

Thank you for the link!

> For Telugu there is no standard yet but a GPLed font is available at
> (http://chaitanya.bhaavana.net/fonts/). It has a rich set of glyphs,
> and can cater to first 2 categories. I am making a opentype version of
> it.

Thank you, I've found this font already, but as I cannot read Telugu, I wasn't able to work out which glyphs represent which ligatures.

Thank you very much,
Primoz
From: Primoz P. <pri...@bi...> - 2002-03-06 19:35:09

Hello,

On Wed, 6 Mar 2002, Dr. U.B. Pavanaja wrote:

> > the PfaEdit PostScript font editor <http://pfaedit.sourceforge.net/>,
> I went through the manual pages. I did not find info on how do I add
> the OpenType Layouts. Currently I am creating a OpenType font for
> Kannada using Microsoft's VOLT.

PfaEdit is a rapidly evolving piece of software. I believe that the main reason why it doesn't support OpenType layout yet is that none of its current users felt any real need for it. I am sure that commercial font editors are more advanced, but then again, their support teams don't reply to bug reports in a matter of minutes, either... :)

> > As a first question, I would like to ask whether there is any
> > agreement on the sets of ligatures needed to render particular Indic
> > scripts,
> As far as Kannada is concerned, there is a standard. There is also a
> free Kannada script software (for Windows) available for FREE
> download at http://www.bangaloreit.com/html/education/Nudi.html. The
> s/w includes the standard font (glyph set).

Thank you for the URL, I'll have a look.

> For OpenType font, we need more glyphs. There is no need of any font
> glyph set standard for OpenType font. It is the job of the rendering
> engine (Uniscribe on Windows XP) to display the font properly.

I realize that the glyph set is an "open set", to which glyphs can be added should the need arise. What I meant by an agreed set of required ligatures need not necessarily be an official standard. But on the other hand, I believe that newspapers and textbooks are printed in all major Indian languages, so a century (or centuries) ago, well before any computers, typesetters had to make such lists. I would guess that printing scholarly publications, poetry etc. might require a richer set of glyphs, but nevertheless, I would like to have some goal...

On the Uniscribe engine... I have been reading the Microsoft Typography pages, and wasn't smart enough to guess whether Uniscribe simply substitutes the right ligature for the given sequence of characters (using the GSUB table?), or has some smart way of actually *creating* the needed glyphs on-the-fly. You seem to have some first-hand experience with it, perhaps you could help me?

With kind regards,
Primoz
From: Rajkumar S <s_...@my...> - 2002-03-06 18:39:30

On Wed, 6 Mar 2002, Dr. U.B. Pavanaja wrote:

> I went through the manual pages. I did not find info on how do I add
> the OpenType Layouts. Currently I am creating a OpenType font for
> Kannada using Microsoft's VOLT.

As far as I know VOLT is the only tool capable of adding OpenType Layouts for Indic scripts.

raj

> Note: I don't worry about pselling mixtakes

:)
From: Rajkumar S <s_...@my...> - 2002-03-06 18:31:20

On Wed, 6 Mar 2002, Primoz Peterlin wrote:

> Dear gentlemen,
>
> Encouraged by the URW++ release of core 35 PostScript fonts under the
> terms of GNU GPL and the steady improvement of the PfaEdit PostScript
> font editor <http://pfaedit.sourceforge.net/>, I set myself a goal to
> compile a set of free (GPL-ed) outline fonts covering a range of
> ISO10646/Unicode as broad as reasonably achievable. The partial
> results of this effort are available on the project page,
> <http://savannah.gnu.org/projects/freefont/>.
>
> As Indic scripts seem to remain Unicode's largest grey area, I was
> very happy when Prof. Hariharan told me of this project, as it seems
> that a large portion of knowledge is concentrated here.
>
> As a first question, I would like to ask whether there is any
> agreement on the sets of ligatures needed to render particular Indic
> scripts, i.e.

Absolutely none for Malayalam.

> * a minimal set, e.g. for use in email (for instance, like the lam-alif
>   in Arabic)
> * a practical set, e.g. for use on WWW or in newspaper (required to
>   typeset a modern language)

These two sets are minimally different for Malayalam. In fact there are two different ligature sets for Malayalam. One is the original written script, which has a lot of ligatures. Mr Hussin, who has done extensive work on the original Malayalam script, has so far identified about 900 glyphs and is still counting.

As for the so-called "reformed" script, which was mainly a truncation of Malayalam for typewriting, I am not aware of the exact number of glyphs involved, but in general it can be taken from some of the CDAC fonts that are lying around.

I will be more than happy to provide any information to help you create freely available Malayalam fonts.

raj
From: Guntupalli K. <kar...@fr...> - 2002-03-06 15:45:13

On Wed, 6 Mar 2002 11:18:16 +0100 (MET) Primoz Peterlin <pri...@bi...> wrote:

> Dear gentlemen,
>
> Encouraged by the URW++ release of core 35 PostScript fonts under
> the terms of GNU GPL and the steady improvement of the PfaEdit
> PostScript font editor <http://pfaedit.sourceforge.net/>, I set
> myself a goal to compile a set of free (GPL-ed) outline fonts
> covering a range of ISO10646/Unicode as broad as reasonably
> achievable. The partial results of this effort are available on the
> project page, <http://savannah.gnu.org/projects/freefont/>.
>
> As Indic scripts seem to remain Unicode's largest grey area, I was
> very happy when Prof. Hariharan told me of this project, as it seems
> that a large portion of knowledge is concentrated here.
>
> As a first question, I would like to ask whether there is any
> agreement on the sets of ligatures needed to render particular Indic
> scripts, i.e.

Not all scripts have an officially standardized glyph set; the exceptions are Tamil & Kannada.

> * a minimal set, e.g. for use in email (for instance, like the
>   lam-alif in Arabic)
> * a practical set, e.g. for use on WWW or in newspaper (required to
>   typeset a modern language)
> * a maximal set, including all glyphs needed to render traditional
>   texts, including rare or theoretical ligatures
>
> So far, I know of three sets of ligatures for Devanagari alone:
>
> * Frans Velthuis' Devanagari metafont (ca. 120 ligatures)
>   http://www.ctan.org/tex-archive/language/devanagari/
> * Prof Joshi's Raghu font (468 ligatures)
>   http://rohini.ncst.ernet.in/indix/download/font/
> * Indlinux Devanagari font (204 ligatures)
>   http://www.indlinux.org/fonts/

This font was developed with the above categories in mind. We started from minimal, moved to practical, and in future will go to maximal, so currently it satisfies the first two. This font will soon be released under the GNU GPL (now that it has finally got a sponsor :-) .

> It would be nice if they would somehow correspond to the above three
> categories... :)
>
> What is the situation with other Indic scripts?

There is the Bharatbhasha shusha set of fonts, which takes the minimal approach, available at http://www.bharatbhasha.org.in/ They cater to the scripts Devanagari (Hindi & Marathi), Gurmukhi, Bengali and Gujarati.

For Kannada, an organisation, Kannada Ganaka Parishat ( www.ganakaparishat.org ), has done glyph standardisation (contact person: Dr U B Pavanaja < pav...@vi... > ).

For Tamil, a glyph standard called TSCII has already evolved.
http://www.geocities.com/Athens/5180/tnet99.html
http://www.geocities.com/Athens/5180/tscii.html

For Telugu there is no standard yet, but a GPLed font is available at (http://chaitanya.bhaavana.net/fonts/). It has a rich set of glyphs and can cater to the first 2 categories. I am making an OpenType version of it.

I don't have much info about the other scripts (Gujarati, Gurmukhi, Bengali, Oriya).

Regards,
Karunakar
From: Dr. U.B. P. <pav...@vi...> - 2002-03-06 14:05:07

From: Primoz Peterlin <pri...@bi...>

> the PfaEdit PostScript
> font editor <http://pfaedit.sourceforge.net/>,

I went through the manual pages. I did not find info on how to add the OpenType Layouts. Currently I am creating an OpenType font for Kannada using Microsoft's VOLT.

> As Indic scripts seem to remain Unicode's largest grey area, I was
> very happy when Prof. Hariharan told me of this project, as it seems
> that a large portion of knowledge is concentrated here.
>
> As a first question, I would like to ask whether there is any
> agreement on the sets of ligatures needed to render particular Indic
> scripts, i.e.

<snip>

> What is the situation with other Indic scripts?

As far as Kannada is concerned, there is a standard. There is also free Kannada script software (for Windows) available for FREE download at http://www.bangaloreit.com/html/education/Nudi.html. The s/w includes the standard font (glyph set).

For an OpenType font, we need more glyphs. There is no need for any font glyph set standard for an OpenType font. It is the job of the rendering engine (Uniscribe on Windows XP) to display the font properly.

-Pavanaja

-----------------------------------------------------
Dr. U.B. Pavanaja
Editor, Vishva Kannada
World's first Internet magazine in Kannada
http://www.vishvakannada.com/
Note: I don't worry about pselling mixtakes
From: Primoz P. <pri...@bi...> - 2002-03-06 10:18:47

Dear gentlemen,

Encouraged by the URW++ release of core 35 PostScript fonts under the terms of GNU GPL and the steady improvement of the PfaEdit PostScript font editor <http://pfaedit.sourceforge.net/>, I set myself a goal to compile a set of free (GPL-ed) outline fonts covering a range of ISO10646/Unicode as broad as reasonably achievable. The partial results of this effort are available on the project page, <http://savannah.gnu.org/projects/freefont/>.

As Indic scripts seem to remain Unicode's largest grey area, I was very happy when Prof. Hariharan told me of this project, as it seems that a large portion of knowledge is concentrated here.

As a first question, I would like to ask whether there is any agreement on the sets of ligatures needed to render particular Indic scripts, i.e.

* a minimal set, e.g. for use in email (for instance, like the lam-alif in Arabic)
* a practical set, e.g. for use on WWW or in newspaper (required to typeset a modern language)
* a maximal set, including all glyphs needed to render traditional texts, including rare or theoretical ligatures

So far, I know of three sets of ligatures for Devanagari alone:

* Frans Velthuis' Devanagari metafont (ca. 120 ligatures)
  http://www.ctan.org/tex-archive/language/devanagari/
* Prof Joshi's Raghu font (468 ligatures)
  http://rohini.ncst.ernet.in/indix/download/font/
* Indlinux Devanagari font (204 ligatures)
  http://www.indlinux.org/fonts/

It would be nice if they would somehow correspond to the above three categories... :)

What is the situation with other Indic scripts?

With kind regards,
Primoz Peterlin
From: Dr. U.B. P. <pav...@vi...> - 2002-03-06 08:32:32

From: Rajkumar S <s_...@my...>

> > Devanagari has "Akhand",
>
> "Akhand" feature is confusing for me, I am having a dialogue with
> various people here (I am a native Malayalam speaker) and Apurva
> about this. So far I have never come across this feature in Malayalam
> grammar, but it seems that the two Akhand ligatures get priority over
> other ligatures when rendering. The way I tested this by asking this
> question
>
> Given a font that has a glyph each for the conjuncts: KaKa, Kssa.
> Given that the font contains a lookup with the following substitution
> rule:
>
> Ka Halant Ka -> KaKa
> Ka Halant Ssa -> Kssa
> Ja Halant Nya -> Dnya
> Nya Halant Nya -> Nnya
>
> Now given a theoretical sequences: Ka Halant Ka Halant Ssa Halant Ma
> and Ja Halant Nya Halant Nya Halant Ma, How will you render them.
>
> All of them answered that they will give priority for Akhand. But
> later when I explained the concept of Akhand they were surprised. But
> even now I don't know if their is any linguistic basis for clustering
> priorities.

In OTL services, Akhand follows Nukta, and all other features are applied later. Hence Akhand gets priority. "Is there any linguistic basis for this?" I doubt that there is any positive answer to this question. Devanagari has the akhand feature for KSssa and JaNya. MS has implemented akhand for these in their Tunga font for Kannada. Actually, there is no akhand feature in Kannada.

> Any one has any idea about sorting Unicode Indic data, esp in the
> context of any database? Any Unicode aware DB out there?

Sorting in Unicode follows the collation table mentioned in Unicode Technical Report 10 (tr10). This is based on ISCII. ISCII has wrongly placed La and Lla together for Kannada (and all other languages). Hindi does not contain Lla. Hence the ISCII order is Ya, Ra, Rra (old), La, Lla, Va, Sha, Ssa, Sa, Ha; while the correct order for Kannada is Ya, Ra, Rra (old), La, Va, Sha, Ssa, Sa, Ha, Lla.

This mistake in Unicode has been corrected now (March 01), but MS is yet to implement the correction in XP. People working on other languages should check these orders for their own scripts. MS Access, which ships with WinXP and OfficeXP, has Unicode sorting for Indic. The same is true of SQL2000 from MS.

-Pavanaja

-----------------------------------------------------
Dr. U.B. Pavanaja
Editor, Vishva Kannada
World's first Internet magazine in Kannada
http://www.vishvakannada.com/
Note: I don't worry about pselling mixtakes
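The difference between the two orders described above can be sketched with a small sort key. This is only an illustration under stated assumptions: the names `KANNADA_ORDER` and `kannada_key` are hypothetical, only the ten consonants from Ya to Ha are tailored, and a real collation (conjuncts, vowel signs, the full tr10 algorithm) involves much more than this.

```python
# Kannada consonants Ya..Ha as Unicode code points (U+0CAF..U+0CB9).
YA, RA, RRA, LA, LLA = "\u0caf", "\u0cb0", "\u0cb1", "\u0cb2", "\u0cb3"
VA, SHA, SSA, SA, HA = "\u0cb5", "\u0cb6", "\u0cb7", "\u0cb8", "\u0cb9"

# Order given in the message as correct for Kannada: Lla comes after Ha.
# (Hypothetical table name; not taken from any real collation library.)
KANNADA_ORDER = [YA, RA, RRA, LA, VA, SHA, SSA, SA, HA, LLA]
RANK = {ch: i for i, ch in enumerate(KANNADA_ORDER)}


def kannada_key(word):
    """Sort key: tailored rank for Ya..Ha, plain code point for everything else."""
    return tuple(
        ord(YA) + RANK[ch] if ch in RANK else ord(ch)
        for ch in word
    )


letters = [LLA, VA, LA, HA]
by_code_point = sorted(letters)                 # La, Lla, Va, Ha (the ISCII-style order)
by_tailoring = sorted(letters, key=kannada_key)  # La, Va, Ha, Lla
```

Plain `sorted` compares code points, which reproduces the La, Lla adjacency inherited from ISCII; the key function only moves Lla to the end of the group.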
From: Keyur S. <key...@ya...> - 2002-02-28 07:58:12

Hi,

--- Rajkumar S <s_...@my...> wrote:

> Has any one got the Indic support for Openoffice working?
> According to a
> technical report All the indian languages are supported.

Yes, but partially. Our NCST Bangalore team is responsible for OpenOffice localization. They have done it for OpenOffice on Windows. Last week they demonstrated it in Delhi during the exhibition. As OpenOffice is a cross-platform office suite, it should work on Linux with few changes. There is a separate layer in OpenOffice for platform-dependent code. Once the platform-independent code has been written and fully tested, we only have to write code for the Linux-specific routines.

- Keyur
From: Arun S. <ar...@sh...> - 2002-02-27 17:11:54

On Wed, Feb 27, 2002 at 09:49:23AM +0530, Rajkumar S wrote:

> Supporting UTF-8 is not enough. We need some mechanism so that language
> specific sorting algorithms can be applied to UTF-8 data.

For Linux, there is a hi_IN locale in glibc, which has a sorting order specified. It was created by someone in Japan, working for IBM (isn't it ironic for a country of 1 billion people, boasting hundreds of thousands of programmers?). The language experts on this list should review the sorting order specified there.

In the absence of a locale specifying a sorting order, I think it falls back to numeric comparison of the Unicode code points.

FreeBSD didn't support UTF-8 locales last I looked, but there is an ISCII-based locale that I contributed. Again, language experts please review.

The sorting order is encapsulated in the LC_COLLATE section for Linux:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/localedata/locales/hi_IN?rev=1.2&content-type=text/x-cvsweb-markup&cvsroot=glibc
http://www.freebsd.org/cgi/cvsweb.cgi/src/share/colldef/hi_IN.ISCII-DEV.src?rev=1.1&content-type=text/x-cvsweb-markup

-Arun
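For anyone who wants to try the locale-based ordering described above, here is a minimal sketch using Python's standard `locale` module. The function name `sort_words` is hypothetical, the locale name you pass (for example `hi_IN.UTF-8`) is an assumption about how the locale is installed on your system, and the explicit fallback to code point order is a choice made in this sketch, not a statement about what glibc itself does.

```python
import locale


def sort_words(words, locale_name):
    """Sort with the named locale's LC_COLLATE rules.

    If the locale is not installed, fall back to plain code point order.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # Locale not available: Python compares strings by code point.
        return sorted(words)
    try:
        # strxfrm turns each string into a key that follows the locale's collation.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the old setting


# With a locale name that does not exist, the result is code point order.
fallback = sort_words(["\u0916", "\u0915", "a"], "no_SUCH.locale")
```

Comparing `sort_words(words, "hi_IN.UTF-8")` with `sorted(words)` on real word lists would be one way for language experts to review the order the locale specifies.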
From: Arun S. <ar...@sh...> - 2002-02-27 16:59:08

On Wed, Feb 27, 2002 at 01:37:17AM +0530, Rajkumar S wrote:

> Any one has any idea about sorting Unicode Indic data, esp in the context
> of any database? Any Unicode aware DB out there?

For Kannada (of which I'm a native speaker) there is a document published by the Kannada Ganaka Parishat which clarifies these rules, such as whether Ka Halant comes before or after Ka in the sorting order.

About databases, yes: I've experimented with postgres. My first experiment was a user-contributed Indian language dictionary.

http://www.sharma-home.net:8180/indict/find_word.py?word=Internet
http://www.sharma-home.net:8180/indict/index.jsp

Since very few people seem to have access to a browser that can display Indian languages, I haven't had much response. But hopefully, that'll change over time.

-Arun
From: Rajkumar S <s_...@my...> - 2002-02-27 16:34:39

Hi,

Has anyone got the Indic support for OpenOffice working? According to a technical report, all the Indian languages are supported. See

http://www.openoffice.org/files/documents/9/68/Technical_report.html?JServSessionIdservlets=381gqnmk21

From the report:

2.2. INDIAN LANGUAGES SUPPORT:
Complex Text Layout now works for the following Indian Language script -
i) Devanagari (Hindi, Konkani, Sanskrit and Marathi),
ii) Tamil,
iii) Kannada,
iv) Telugu,
v) Gurmukhi (Punjabi),
vi) Malayalam,
vii) Oriya,
viii) Bengali and
ix) Gujarati

raj
From: Rajkumar S <s_...@my...> - 2002-02-27 16:11:00

On Wed, 27 Feb 2002, Keyur Shroff wrote:

> Tomorrow someone from Kerala (don't remember his name) called me up at
> NCST and asked about this "Chillaksharam" problem.

You mean Yesterday? :)

> separate code points could have been assigned to all Akhand in Indic
> scripts.

What are the advantages of having a separate code point? I am against encoding conjuncts in Unicode. Just because Latin has these encoded does not mean that we need to have them. Actually, it breaks the rule of Unicode that only characters are encoded. Akhand is a rendering problem, and the solution should also be in the rendering engine.

> I am sure that in the next coming proposal our Government has proposed
> to include all Akhand for a separate code point in Unicode.

Wouldn't that mess up the sorting rules? I guess unless we can find some unencoded "characters" we should not bother with expanding the character set. Vedic characters are one example that comes to my mind that requires encoding.

> I'll try to gather some information on sorting order. Many database
> including Oracle now supports UTF-8 format of Unicode.

Supporting UTF-8 is not enough. We need some mechanism so that language-specific sorting algorithms can be applied to UTF-8 data.

raj
From: Keyur S. <key...@ya...> - 2002-02-27 11:00:09

--- Rajkumar S <s_...@my...> wrote:

> On Mon, 25 Feb 2002, Keyur Shroff wrote:
>
> > For example, Malayalam has "Chillaksharam"
>
> My first impression with "Chillaksharam" was that It is a unique
> feature of Malayalam and it will require some modifications in
> Unicode stds to accommodate them.

Our Ministry has prepared a new proposal for the introduction of new Indic characters in the next version of the Unicode standard. For that, the Ministry is gathering information on various scripts from the Language Resource Centres in India for those scripts. Tomorrow someone from Kerala (don't remember his name) called me up at NCST and asked about this "Chillaksharam" problem. It is true that the "Chillaksharam" form can be produced using Zero-Width-Joiner (ZWJ) and Zero-Width-NonJoiner (ZWNJ).

> > Devanagari has "Akhand",
>
> "Akhand" feature is confusing for me, I am having a dialogue with
> various people here (I am a native Malayalam speaker) and Apurva
> about this. So far I have never come across this feature in Malayalam
> grammar, but it seems that the two Akhand ligatures get priority over
> other ligatures when rendering. The way I tested this by asking this
> question
>
> Given a font that has a glyph each for the conjuncts: KaKa, Kssa.
> Given that the font contains a lookup with the following substitution
> rule:
>
> Ka Halant Ka -> KaKa
> Ka Halant Ssa -> Kssa
> Ja Halant Nya -> Dnya
> Nya Halant Nya -> Nnya
>
> Now given a theoretical sequences: Ka Halant Ka Halant Ssa Halant Ma
> and Ja Halant Nya Halant Nya Halant Ma, How will you render them.
>
> All of them answered that they will give priority for Akhand. But
> later when I explained the concept of Akhand they were surprised. But
> even now I don't know if their is any linguistic basis for clustering
> priorities.

I also discussed applying features with Apurva Joshi @ Microsoft. She is the daughter of Prof. R.K. Joshi, who is working here at NCST in the font design area. The Raghu font was designed by Prof. R.K. Joshi. Here I am quoting Apurva's message:

<quote>
All the akhands I have come across so far in Indic scripts are made up of two consonants. They are thus essentially treated as conjuncts, not consonants. And, more importantly, they have an additional status of being processed first in any given input sequence.
</quote>

Thus, "Akhand" is also a consonant conjunct, but it is "special" in the sense that it is given priority over others. Actually, like many Latin-1 Supplement and Latin Extended-A characters in Unicode, separate code points could have been assigned to all Akhands in Indic scripts. But because of improper lobbying at the Unicode Consortium, we couldn't make them assign separate code points for Akhand. However, I am sure that in the next coming proposal our Government has proposed to include all Akhands for a separate code point in Unicode.

> Any one has any idea about sorting Unicode Indic data, esp in the
> context of any database? Any Unicode aware DB out there?

I'll try to gather some information on sorting order. Many databases, including Oracle, now support the UTF-8 format of Unicode.

- Keyur
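The "processed first" status of Akhand can be sketched with the four substitution rules from the test question in this thread. This is a toy model under stated assumptions, not how any real shaping engine is implemented: the function names are hypothetical, glyphs are plain name strings, and Kssa and Dnya are taken to be the two Akhand ligatures as identified elsewhere in the thread.

```python
# Rules from the thread, split into an Akhand pass and a pass for other conjuncts.
AKHAND_RULES = {
    ("Ka", "Halant", "Ssa"): "Kssa",
    ("Ja", "Halant", "Nya"): "Dnya",
}
OTHER_RULES = {
    ("Ka", "Halant", "Ka"): "KaKa",
    ("Nya", "Halant", "Nya"): "Nnya",
}


def apply_rules(glyphs, rules):
    """One left-to-right pass replacing each matching triple with its ligature."""
    out = []
    i = 0
    while i < len(glyphs):
        triple = tuple(glyphs[i:i + 3])
        if triple in rules:
            out.append(rules[triple])
            i += 3
        else:
            out.append(glyphs[i])
            i += 1
    return out


def shape_akhand_first(glyphs):
    """Akhand pass over the whole run first; other conjuncts see what is left."""
    return apply_rules(apply_rules(glyphs, AKHAND_RULES), OTHER_RULES)


def shape_single_pass(glyphs):
    """For contrast: all four rules in one pass, first match from the left wins."""
    return apply_rules(glyphs, {**AKHAND_RULES, **OTHER_RULES})


seq1 = ["Ka", "Halant", "Ka", "Halant", "Ssa", "Halant", "Ma"]
seq2 = ["Ja", "Halant", "Nya", "Halant", "Nya", "Halant", "Ma"]

akhand_first_1 = shape_akhand_first(seq1)  # Ka Halant Kssa Halant Ma
single_pass_1 = shape_single_pass(seq1)    # KaKa Halant Ssa Halant Ma
akhand_first_2 = shape_akhand_first(seq2)  # Dnya Halant Nya Halant Ma
```

For the first sequence the two strategies disagree, which is exactly the priority question being asked; for the second sequence the Akhand happens to come first from the left, so both give the same result.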
From: Rajkumar S <s_...@my...> - 2002-02-27 07:59:03

On Mon, 25 Feb 2002, Keyur Shroff wrote:

> For example, Malayalam has "Chillaksharam"

I have been experimenting with the IndiX patch for Malayalam support and have had some initial success with it. My first impression of "Chillaksharam" was that it is a unique feature of Malayalam and that it will require some modifications in the Unicode standard to accommodate it. But later Apurva Joshi <ap...@mi...>, who deals with Indic scripts at MS, clarified that

<quote>
I assume that "chillu form" above means the "chillaksharam form" that only a few consonants in Malayalam take. If so, the following explanation is how this form is currently implemented:

If the last consonant in a Malayalam word is capable of forming a chillaksharam, and it is followed by a Halant/Virama followed by a word delimiter [in most Indic scripts this is the space]; this sequence is displayed as consonant Halant. Thus: Kha Ka Halant is displayed as Kha Ka_Chillaksharam.

In the above case if you would like to convert the chillaksharam to its consonant+Halant form you need to insert a ZWJ after the Halant; thus: Kha Ka Halant ZWJ
This will display as Kha Ka Halant.

And for input sequences like those given below, where the consonant capable of forming a chillaksharam is not the last consonant in a syllable, the following is done:
Kha Na Halant Kha; the final display will be Kha Na Halant Kha.

If the Na Halant, in the above case, which does not occur at the end of the word; needs to be retained as the chillaksharam form, you need to insert a ZWNJ thus:
Kha Na Halant ZWNJ Kha; this will display as Kha Na_chillaksharam Kha.
</quote>

> Devanagari has "Akhand",

The "Akhand" feature is confusing for me. I am having a dialogue with various people here (I am a native Malayalam speaker) and with Apurva about this. So far I have never come across this feature in Malayalam grammar, but it seems that the two Akhand ligatures get priority over other ligatures when rendering.
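As an aside on the chillaksharam rules quoted from Apurva above, the four cases can be written down as a small token-level sketch. Everything here is illustrative: `display_word` and `CHILLU_CAPABLE` are hypothetical names, tokens are the names used in the quote rather than real code points, and the set of chillu-capable consonants contains only the two that appear in the quoted examples. It models the behaviour as described in the quote, not any actual rendering engine.

```python
# Hypothetical: only the consonants used in the quoted examples.
CHILLU_CAPABLE = {"Ka", "Na"}


def display_word(tokens):
    """Map the input tokens of ONE word to display tokens, per the quoted rules.

    The end of the list stands for the word delimiter (e.g. a space).
    """
    out = []
    i = 0
    n = len(tokens)
    while i < n:
        tok = tokens[i]
        if tok in CHILLU_CAPABLE and i + 1 < n and tokens[i + 1] == "Halant":
            following = tokens[i + 2] if i + 2 < n else None
            if following is None:
                # Consonant + Halant at the end of the word: chillaksharam form.
                out.append(tok + "_Chillaksharam")
                i += 2
                continue
            if following == "ZWNJ":
                # ZWNJ after the Halant keeps the chillaksharam form mid-word.
                out.append(tok + "_Chillaksharam")
                i += 3
                continue
            if following == "ZWJ":
                # ZWJ after the Halant asks for the explicit consonant + Halant.
                out.extend([tok, "Halant"])
                i += 3
                continue
        # Any other token, or consonant + Halant before another consonant.
        out.append(tok)
        i += 1
    return out


word_final = display_word(["Kha", "Ka", "Halant"])
with_zwj = display_word(["Kha", "Ka", "Halant", "ZWJ"])
mid_word = display_word(["Kha", "Na", "Halant", "Kha"])
with_zwnj = display_word(["Kha", "Na", "Halant", "ZWNJ", "Kha"])
```

The four calls correspond, in order, to the four example sequences in the quote.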
I tested this by asking the following question. Given a font that has a glyph each for the conjuncts KaKa and Kssa, and given that the font contains a lookup with the substitution rules "Ka Halant Ka -> KaKa", "Ka Halant Ssa -> Kssa", "Ja Halant Nya -> Dnya" and "Nya Halant Nya -> Nnya", how would you render the theoretical sequences "Ka Halant Ka Halant Ssa Halant Ma" and "Ja Halant Nya Halant Nya Halant Ma"? All of them answered that they would give priority to the Akhand. But later, when I explained the concept of Akhand, they were surprised. Even now I don't know if there is any linguistic basis for clustering priorities. > Tamil has "two-side split matra", etc. Detailed discussion of these > features is required. For Malayalam I take this to mean U+0D4A, U+0D4B and U+0D4C. From what I understand, these have to be split into the corresponding component marks, i.e. 0D15 0D4A is first split into 0D15 0D46 0D3E, and is then reordered to 0D46 0D15 0D3E for rendering. The Tamil section of the Unicode standard gives more information about this. Does anyone have any idea about sorting Unicode Indic data, especially in the context of a database? Is there any Unicode-aware DB out there? raj |
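The split-and-reorder step described above for U+0D4A can be sketched in a few lines of Python. This is a rough illustration, not a rendering engine: it assumes that Unicode canonical decomposition (NFD) yields the component marks, and the set of left-side signs and the helper names are chosen here for the example only. It handles a single consonant + vowel-sign syllable and nothing more.

```python
import unicodedata

# Malayalam vowel-sign parts drawn to the left of the consonant
# (assumed set for this sketch: signs E, EE and AI).
PREBASE_SIGNS = {"\u0d46", "\u0d47", "\u0d48"}


def split_matras(text):
    """Split two-part vowel signs (e.g. U+0D4A) into their component marks."""
    return unicodedata.normalize("NFD", text)


def visual_order(syllable):
    """Return the characters of one consonant+vowel syllable in display order."""
    chars = list(split_matras(syllable))
    prebase = [c for c in chars if c in PREBASE_SIGNS]
    rest = [c for c in chars if c not in PREBASE_SIGNS]
    return prebase + rest


def hexes(chars):
    """Format characters as four-digit hex code points for easy reading."""
    return [f"{ord(c):04X}" for c in chars]


ka_o = "\u0d15\u0d4a"  # KA followed by VOWEL SIGN O
print(hexes(split_matras(ka_o)))   # ['0D15', '0D46', '0D3E']
print(hexes(visual_order(ka_o)))   # ['0D46', '0D15', '0D3E']
```

A real shaper also has to deal with conjuncts, reph-like forms and multi-syllable text, so the reordering above should be read as the simplest possible case.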
From: Keyur S. <key...@ya...> - 2002-02-26 07:40:45
|
Hello, I think we should discuss the features that are specific to each Indic script. For example, Malayalam has "Chillaksharam", Devanagari has "Akhand", Tamil has "two-side split matra", etc. Detailed discussion of these features is required. Not all of us have a full understanding of all Indic scripts. I believe that I have good knowledge of Devanagari (and Gujarati), and I know a little about Tamil and other scripts. Can anyone throw some light on these? Regards, Keyur |
From: Tapan S. P. <ta...@ya...> - 2002-02-26 06:16:32
|
Just a quick word. Can we try to remember we are a community working together to solve a problem (or set of problems)? Due respect for each other and each other's ideas is a requirement for any community to work together. I respect all of you, and I think that, working together, we have a diverse set of technical abilities and can achieve a lot. We should focus on funneling our energies together, not apart. That is how it should be when our end goal is the same. But if we continue to work as individuals, that is what we will get: individual advances and individual projects. Can I remind you that this is how this whole Indic language mess started in the first place? That said, regarding this X issue, as I see it there are several approaches, each with its own pros and cons. Compliance with the X protocol should probably be discussed on the X lists, so you can get the input of people who are involved with the protocol implementation itself. But on this list I feel we should be able to work freely and collaboratively, with no sense of hierarchy, and discuss anything that is of common interest, while at the same time keeping our end goals in sight and trying to move towards them. Just my four anna. Regards, Tapan |
From: Arun S. <ar...@sh...> - 2002-02-25 18:25:19
|
I just noticed that a second doc has appeared on the website: http://stsf.sourceforge.net/ * Xst Client API spec [PDF] [StarOffice 6.0] [Rich Text] Having read the client API spec (very cursorily), I think it has taken an IndiX-like approach. XSTLineNewForWidth seems to be very much like the XComputeNChars I talked about earlier. -Arun |
From: <jk...@Fr...> - 2002-02-25 10:26:02
|
ks> I can definitely think about redesigning of IndiX once it ks> will be proved that it really breaks protocol. I'm happy to hear that. I've only heard good reports of the Indic (Devanagari) rendering aspects of IndiX. ks> have also decided to keep quiet on the issues like ks> XUtf8DrawString and XmbDrawString which we believe sends I didn't think this list was the place to clear up basic misconceptions about X programming. Since you seem unable (unwilling?) to do your own homework, I'll do it for you (this once). Please start with "i18n mechanism": http://www.xfree86.org/pipermail/i18n/2002-February/003074.html and follow the thread, in particular the replies from Tomohiro KUBOTA: http://www.xfree86.org/pipermail/i18n/2002-February/003077.html and Keith Packard: http://www.xfree86.org/pipermail/i18n/2002-February/003078.html OR the answers to my question here, "Complex text layout and mapping screen coordinates": http://www.xfree86.org/pipermail/fonts/2002-February/001331.html OR Arun Sharma's continuation of this argument on those lists: http://www.xfree86.org/pipermail/fonts/2002-February/001341.html OR this discussion, "Another approach to text in X": http://www.xfree86.org/pipermail/fonts/2002-February/001339.html These links should help to clarify matters in a more efficient manner than any discussion on <indic-computing-devel> could hope to achieve. Could you please take further discussion of basic X concepts and X programming to the XFree86 lists? They are a better place for such discussions. We can discuss IndiX being part of the Bootable CD once it is re-designed to be protocol compliant. Regards, Koshy <jk...@fr...> |
From: Keyur S. <key...@ya...> - 2002-02-25 07:59:03
|
Hi, --- Joseph Koshy <jk...@Fr...> wrote: > The protocol says very clearly that fonts in X are > collections of > glyph bitmaps indexed by a glyph index. It states that > the values > used in text drawing requests are indices into the font. It states that the values used in text drawing requests are values used to index glyphs. This doesn't mean that these values are glyph indices. > It also states very clearly that 'character codes' (i.e > code points of > any character set) are not used in the protocol. > > It states very clearly how the drawing requests are to > place the > bitmaps of the specified glyphs next to each other (i.e, > it disallows > reordering or substitution of glyphs). Please give a reference for each of the above. > There is a whole sub-standard (the X Logical Font > Description) that is > used by clients to select fonts with desired font > encodings so that everything 'just works'. > > This is one of the cornerstones of X's design. XLFD (X Logical Font Description) is not meant just for clients. It is also used by the X library. When everything is wrapped in the X library, the X client does not necessarily use XLFD itself. > ks> Everywhere it says about "values" passed in the > request. It > ks> was left for implementation to decide what are these > ks> "values". > > You are implying ambiguity in the specification where > none exists. You can send us pointers into the X protocol specs where it is _clearly_ mentioned that only glyph indices are used. I have searched the specification, but nowhere have I found a statement that only glyph indices are used and that character codes should not be used. > If you had a question about this, you could have asked on > the XFree86 lists. > > I wonder what you are hoping to achieve by arguing about > the X protocol on /this/ list: This discussion will help us in deciding the correct design of the bootable OS. When we have doubts about another person's idea, it is better to clarify them before moving further in the design. 
This will only help everyone to contribute towards a better design. And I remind you that it was you who raised questions about the X protocol, not me. I am just answering your questions. > - If there was really a doubt, you could have asked for > clarification > from the rest of the X community; I see no mail from > you on this > topic in the XFree86 archives. I don't have any doubt. The doubt is in your mind. So I don't see any need to raise this issue on the XFree86 mailing list. > - You haven't run the test suite. In one of my earlier mails I stated that the older copy of IndiX (which is on the website) breaks the handling of other foreign languages. I have fixed the problem now, and Pablo Saratxaga (maintainer of Mandrake Linux) tested it on his machine. IndiX was showing French without any problem. I'll put the changes on the web. > - A cursory search in the XFree86 archives for ``glyph > indices'' or > other keywords would have revealed enough. Also see 'man XDrawString'. It clearly states that a "character string" is passed to the function. I tell you that the X library doesn't convert these character codes into glyph codes before sending them in the protocol request. > The initial review of IndiX had been posted on > <indic-computing-devel> > to make public the rationale for why it wouldn't be > bundled in the > 'Bootable OS' sub-project. When someone reviews my system publicly and posts his mail on a public list, I reserve every right to defend my system in public. I wonder why you want to discuss this topic off the list when it is very much connected to the design of the Indic OS? > As and when IndiX gets re-designed to be protocol > compliant, we'll be happy to look at it again. I can definitely think about redesigning of IndiX once it will be proved that it really breaks protocol. Remember that you have safely ignored the questions raised by Arun. 
You have also decided to keep quiet on the issues like XUtf8DrawString and XmbDrawString which we believe sends character codes in X Protocol request. If you believe that sending character codes breaks the X protocol design, then you should prove that these functions really deal with glyph codes and not character codes. You should also try writing a small application that draws a string using some Unicode-encoded TrueType font by calling XDrawString16. If you want, I can send a small program which I have tested. It will show Indic characters on any X server, not only IndiX. Instead of just saying what is in the X protocol, you should also think about the current implementation of XFree86. Regards, Keyur |
From: Arun S. <ar...@sh...> - 2002-02-25 07:36:05
|
On Sun, Feb 24, 2002 at 06:33:41PM -0800, Joseph Koshy wrote: > The right lists for basic X11 questions would be the ones hosted by > X.Org or XFree86.Org, primarily because that is where X expertise is > most likely to be found. Discussions of X11 protocol extensions (for > example) would best be held on those lists. > > On the other hand, this /is/ the right list to discuss Indic > algorithms. For example: issues with the Kannada rendering algorithm > used by Uniscribe ('arkavathu' handling), the problems of handling of > Malayalam 'chillaksharams', the strengths/weaknesses of the Graphite > system [SIL.ORG] etc. etc. Do you think it makes sense to create a separate list under indic-computing-* for discussing: "Indian language support implementation issues on UNIX compatible systems using the X protocol" ? -Arun |
From: <jk...@Fr...> - 2002-02-25 02:33:41
|
Folks, The review of NCST IndiX seems to have sparked off a lot of email traffic, and the discussions have wandered off in a number of directions, including the design of protocol extensions for X, explanations of the X system etc. The right lists for basic X11 questions would be the ones hosted by X.Org or XFree86.Org, primarily because that is where X expertise is most likely to be found. Discussions of X11 protocol extensions (for example) would best be held on those lists. On the other hand, this /is/ the right list to discuss Indic algorithms. For example: issues with the Kannada rendering algorithm used by Uniscribe ('arkavathu' handling), the problems of handling Malayalam 'chillaksharams', the strengths/weaknesses of the Graphite system [SIL.ORG] etc. etc. While there seem to be a number of resources available on the 'Net dealing with programming X/Windows(r)/MacOS(r), there seem to be very few which deal with the specific issues an Indian language developer has to face. When we set up these lists, the intent was that these lists and the Handbook would serve to (partially) fill this need. Regards, Koshy <jk...@fr...> |
From: <jk...@Fr...> - 2002-02-25 02:33:19
|
ks> It says that X protocol does no translation of character ks> sets. It doesn't mean that characters 'codes' are not to be ks> honored by the X server. X protocol also doesn't EXPLICITLY ks> says that glyph codes have to be passed in the request. The protocol says very clearly that fonts in X are collections of glyph bitmaps indexed by a glyph index. It states that the values used in text drawing requests are indices into the font. It also states very clearly that 'character codes' (i.e code points of any character set) are not used in the protocol. It states very clearly how the drawing requests are to place the bitmaps of the specified glyphs next to each other (i.e, it disallows reordering or substitution of glyphs). X clients rely on the X server placing the exact glyphs specified, at the exact pixel coordinates specified by them, when implementing user interfaces. There is a whole sub-standard (the X Logical Font Description) that is used by clients to select fonts with desired font encodings so that everything 'just works'. This is one of the cornerstones of X's design. ks> Everywhere it says about "values" passed in the request. It ks> was left for implementation to decide what are these ks> "values". You are implying ambiguity in the specification where none exists. If you had a question about this, you could have asked on the XFree86 lists. I wonder what you are hoping to achieve by arguing about the X protocol on /this/ list: - If there was really a doubt, you could have asked for clarification from the rest of the X community; I see no mail from you on this topic in the XFree86 archives. - You haven't run the test suite. - A cursory search in the XFree86 archives for ``glyph indices'' or other keywords would have revealed enough. The initial review of IndiX had been posted on <indic-computing-devel> to make public the rationale for why it wouldn't be bundled in the 'Bootable OS' sub-project. 
As and when IndiX gets re-designed to be protocol compliant, we'll be happy to look at it again. Regards, Koshy <jk...@fr...> |