Thread: [Indic-computing-devel] Free UCS outline font
From: Primoz P. <pri...@bi...> - 2002-03-06 10:18:47
|
Dear gentlemen,

Encouraged by the URW++ release of core 35 PostScript fonts under the terms of GNU GPL and the steady improvement of the PfaEdit PostScript font editor <http://pfaedit.sourceforge.net/>, I set myself a goal to compile a set of free (GPL-ed) outline fonts covering a range of ISO10646/Unicode as broad as reasonably achievable. The partial results of this effort are available on the project page, <http://savannah.gnu.org/projects/freefont/>.

As Indic scripts seem to remain Unicode's largest grey area, I was very happy when Prof. Hariharan told me of this project, as it seems that a large portion of knowledge is concentrated here.

As a first question, I would like to ask whether there is any agreement on the sets of ligatures needed to render particular Indic scripts, i.e.

* a minimal set, e.g. for use in email (for instance, like the lam-alif in Arabic)
* a practical set, e.g. for use on WWW or in newspaper (required to typeset a modern language)
* a maximal set, including all glyphs needed to render traditional texts, including rare or theoretical ligatures

So far, I know of three sets of ligatures for Devanagari alone:

* Frans Velthuis' Devanagari metafont (ca. 120 ligatures)
  http://www.ctan.org/tex-archive/language/devanagari/
* Prof Joshi's Raghu font (468 ligatures)
  http://rohini.ncst.ernet.in/indix/download/font/
* Indlinux Devanagari font (204 ligatures)
  http://www.indlinux.org/fonts/

It would be nice if they would somehow correspond to the above three categories... :)

What is the situation with other Indic scripts?

With kind regards,
Primoz Peterlin
--
Primož Peterlin, Inštitut za biofiziko, Med. fakulteta, Univerza v Ljubljani
Lipičeva 2, SI-1000 Ljubljana, Slovenija. primoz.peterlin@biofiz.mf.uni-lj.si
Tel: +386-1-5437632, fax: +386-1-4315127, http://sizif.mf.uni-lj.si/~peterlin/
F8021D69 OpenPGP fingerprint: CB 6F F1 EE D9 67 E0 2F 0B 59 AF 0D 79 56 19 0F
|
From: Guntupalli K. <kar...@fr...> - 2002-03-06 15:45:13
|
On Wed, 6 Mar 2002 11:18:16 +0100 (MET) Primoz Peterlin <pri...@bi...> wrote:

> Dear gentlemen,
>
> Encouraged by the URW++ release of core 35 PostScript fonts under
> the terms of GNU GPL and the steady improvement of the PfaEdit
> PostScript font editor <http://pfaedit.sourceforge.net/>, I set
> myself a goal to compile a set of free (GPL-ed) outline fonts
> covering a range of ISO10646/Unicode as broad as reasonably
> achievable. The partial results of this effort are available on the
> project page, <http://savannah.gnu.org/projects/freefont/>.
>
> As Indic scripts seem to remain Unicode's largest grey area, I was
> very happy when Prof. Hariharan told me of this project, as it seems
> that a large portion of knowledge is concentrated here.
>
> As a first question, I would like to ask whether there is any
> agreement on the sets of ligatures needed to render particular Indic
> scripts, i.e.

No script has an officially standardized glyph set, except for Tamil & Kannada.

> * a minimal set, e.g. for use in email (for instance, like the
>   lam-alif in Arabic)
> * a practical set, e.g. for use on WWW or in newspaper (required to
>   typeset a modern language)
> * a maximal set, including all glyphs needed to render traditional
>   texts, including rare or theoretical ligatures
>
> So far, I know of three sets of ligatures for Devanagari alone:
>
> * Frans Velthuis' Devanagari metafont (ca. 120 ligatures)
>   http://www.ctan.org/tex-archive/language/devanagari/
> * Prof Joshi's Raghu font (468 ligatures)
>   http://rohini.ncst.ernet.in/indix/download/font/
> * Indlinux Devanagari font (204 ligatures)
>   http://www.indlinux.org/fonts/

This font was developed with the above categories in mind: we started from minimal, moved on to practical, and in future will go to maximal. So currently it satisfies the first two. This font will soon be released under the GNU GPL (now that it has finally got a sponsor :-).

> It would be nice if they would somehow correspond to the above three
> categories... :)
>
> What is the situation with other Indic scripts?

There is the Bharatbhasha shusha set of fonts, which take the minimal approach, available at http://www.bharatbhasha.org.in/ They cater to the scripts Devanagari (Hindi & Marathi), Gurmukhi, Bengali and Gujarati.

For Kannada, an organisation called Kannada Ganaka Parishat (www.ganakaparishat.org) has done glyph standardisation (contact person: Dr U B Pavanaja <pav...@vi...>).

For Tamil, a glyph standard called TSCII has already evolved.
http://www.geocities.com/Athens/5180/tnet99.html
http://www.geocities.com/Athens/5180/tscii.html

For Telugu there is no standard yet, but a GPLed font is available at (http://chaitanya.bhaavana.net/fonts/). It has a rich set of glyphs and can cater to the first 2 categories. I am making an OpenType version of it.

I don't have much info about other scripts (Gujarati, Gurmukhi, Bengali, Oriya).

Regards,
Karunakar
|
From: Primoz P. <pri...@bi...> - 2002-03-06 19:50:34
|
Hello,

On Wed, 6 Mar 2002, Guntupalli Karunakar wrote:

> > As a first question, I would like to ask whether there is any
> > agreement on the sets of ligatures needed to render particular Indic
> > scripts, ...
> No script has an officially standardized glyph set, except for
> Tamil & Kannada.

Thank you. I just learned that the Kannada standard set is available on the Karnataka Directorate of Information Technology site. Do you have any pointers to the Tamil specifications?

> > http://www.indlinux.org/fonts/
> This font was developed with the above categories in mind: we
> started from minimal, moved on to practical, and in future will go
> to maximal. So currently it satisfies the first two. This font will
> soon be released under the GNU GPL (now that it has finally got a
> sponsor :-).

Congratulations!

> > What is the situation with other Indic scripts?
> There is the Bharatbhasha shusha set of fonts, which take the minimal
> approach, available at http://www.bharatbhasha.org.in/ They cater to
> the scripts Devanagari (Hindi & Marathi), Gurmukhi, Bengali and
> Gujarati.

That's a new URL, thank you. The last one I knew was http://www.bharabhasha.net/, and it seemed to be neglected.

> For Kannada, an organisation called Kannada Ganaka Parishat
> (www.ganakaparishat.org) has done glyph standardisation (contact
> person: Dr U B Pavanaja <pav...@vi...>).

I believe this was a very wise and important step.

> For Tamil, a glyph standard called TSCII has already evolved.
> http://www.geocities.com/Athens/5180/tnet99.html
> http://www.geocities.com/Athens/5180/tscii.html

Thank you for the link!

> For Telugu there is no standard yet, but a GPLed font is available at
> (http://chaitanya.bhaavana.net/fonts/). It has a rich set of glyphs
> and can cater to the first 2 categories. I am making an OpenType
> version of it.

Thank you, I've found this font already, but as I cannot read Telugu, I wasn't able to work out which glyphs represent which ligatures.

Thank you very much,
Primoz
|
From: Rajkumar S <s_...@my...> - 2002-03-06 18:31:20
|
On Wed, 6 Mar 2002, Primoz Peterlin wrote:

> Dear gentlemen,
>
> Encouraged by the URW++ release of core 35 PostScript fonts under the
> terms of GNU GPL and the steady improvement of the PfaEdit PostScript
> font editor <http://pfaedit.sourceforge.net/>, I set myself a goal to
> compile a set of free (GPL-ed) outline fonts covering a range of
> ISO10646/Unicode as broad as reasonably achievable. The partial
> results of this effort are available on the project page,
> <http://savannah.gnu.org/projects/freefont/>.
>
> As Indic scripts seem to remain Unicode's largest grey area, I was
> very happy when Prof. Hariharan told me of this project, as it seems
> that a large portion of knowledge is concentrated here.
>
> As a first question, I would like to ask whether there is any
> agreement on the sets of ligatures needed to render particular Indic
> scripts, i.e.

Absolutely none for Malayalam.

> * a minimal set, e.g. for use in email (for instance, like the lam-alif
>   in Arabic)
> * a practical set, e.g. for use on WWW or in newspaper (required to
>   typeset a modern language)

These two sets are minimally different for Malayalam. In fact there are two different ligature sets for Malayalam. One is the original written script, which has a lot of ligatures. Mr Hussin, who has done extensive work on the original Malayalam script, has so far identified about 900 glyphs and is still counting.

As for the so-called "reformed" script, which mainly truncated Malayalam for typewriting, I am not aware of the exact number of glyphs involved, but in general it can be taken from some of the CDAC fonts that are lying around.

I will be more than happy to provide any information to help you create freely available Malayalam fonts.

raj
|
From: Primoz P. <pri...@bi...> - 2002-03-06 20:06:07
|
On Wed, 6 Mar 2002, Rajkumar S wrote:

> > As a first question, I would like to ask whether there is any
> > agreement on the sets of ligatures needed to render particular Indic
> > scripts, i.e.
> Absolutely none for Malayalam.

I find this hard to believe. According to <http://www.portalkerala.com/hevents2.htm>, newspapers have been published in Malayalam for over 150 years. I am sure that the typesetters had to make some selection of glyphs?

> > * a minimal set, e.g. for use in email (for instance, like the lam-alif
> >   in Arabic)
> > * a practical set, e.g. for use on WWW or in newspaper (required to
> >   typeset a modern language)
> These two sets are minimally different for Malayalam. In fact there are
> two different ligature sets for Malayalam. One is the original written
> script, which has a lot of ligatures. Mr Hussin, who has done extensive
> work on the original Malayalam script, has so far identified about 900
> glyphs and is still counting.
> As for the so-called "reformed" script, which mainly truncated
> Malayalam for typewriting, I am not aware of the exact number of
> glyphs involved, but in general it can be taken from some of the CDAC
> fonts that are lying around.

If you take any Malayalam newspaper, or any textbook, is it typeset in the written script or the reformed one? I guess we should aim at this, as people are used to reading it.

> I will be more than happy to provide any information to help you
> create freely available Malayalam fonts.

Thank you! Not being a native Malayalam reader, I will surely need and appreciate your help. So far, I have found three sites offering Malayalam fonts:

http://www.deepika.com/font.htm
http://www.malayalamvarikha.com/font/download.htm
http://www.goodnews-weekly.com/malayalam.htm

None of them are free, so I cannot include them in the GPL-ed font (I can look at them when I design one, I guess :) None of them seem to contain more than some 80 ligatures. Do you know of any better?

With kind regards,
Primoz
|
From: Rajkumar S <s_...@my...> - 2002-03-06 20:16:13
|
On Wed, 6 Mar 2002, Primoz Peterlin wrote:

> I find this hard to believe. According to
> <http://www.portalkerala.com/hevents2.htm>, newspapers have been
> published in Malayalam for over 150 years. I am sure that the
> typesetters had to make some selection of glyphs?

They all chose a subset of glyphs, but the problem is that they chose different subsets. Since any conjunct in Malayalam can be represented as a combination of base glyphs and halant, readability is not compromised even if there are differences in the actual glyphs chosen. In general, the more glyphs the better, but you can make do with fewer glyphs too.

> If you take any Malayalam newspaper, or any textbook, is it typeset in
> the written script or the reformed one? I guess we should aim at this,
> as people are used to reading it.

It is in the reformed script, but some higher quality papers choose to include more glyphs.

> Thank you! Not being a native Malayalam reader, I will surely need and
> appreciate your help.

It is my pleasure.

> None of them are free, so I cannot include them in the GPL-ed font (I
> can look at them when I design one, I guess :) None of them seem to
> contain more than some 80 ligatures. Do you know of any better?

Take a look at the fonts by Hellingman for Malayalam TeX, which should be available at CTAN. They are free, though the shapes of the glyphs are not really high quality.

I will also try to hunt down some good Malayalam fonts as they appear in newspapers and other print media, if you can send me your postal address offline.

raj
|
From: Keyur S. <key...@ya...> - 2002-03-07 07:07:55
|
Hi,

--- Primoz Peterlin <pri...@bi...>
> Encouraged by the URW++ release of core 35 PostScript fonts under the
> terms of GNU GPL and the steady improvement of the PfaEdit PostScript
> font editor <http://pfaedit.sourceforge.net/>, I set myself a goal to
> compile a set of free (GPL-ed) outline fonts covering a range of
> ISO10646/Unicode as broad as reasonably achievable. The partial
> results of this effort are available on the project page,
> <http://savannah.gnu.org/projects/freefont/>.

This is really a very good effort.

> As a first question, I would like to ask whether there is any
> agreement on the sets of ligatures needed to render particular Indic
> scripts, i.e.

As far as I know, during the last few months our Ministry was in the process of standardizing glyph sets for all Indic scripts. However, technology like OpenType gives the font designer freedom to define his/her own glyph set, so no standardization is required for OpenType fonts. Such standardization will, however, help in designing fonts for other kinds of technologies. I'll learn more about it when I attend a meeting with our Ministry and other organizations (C-DAC, IIT-Kanpur, etc.) in Delhi on Saturday.

> * a minimal set, e.g. for use in email (for instance, like the lam-alif
>   in Arabic)
> * a practical set, e.g. for use on WWW or in newspaper (required to
>   typeset a modern language)
> * a maximal set, including all glyphs needed to render traditional
>   texts, including rare or theoretical ligatures
>
> * Prof Joshi's Raghu font (468 ligatures)
>   http://rohini.ncst.ernet.in/indix/download/font/

The Raghu font has 674 glyphs and very much fits in the third category. We are also planning to design variants of the Raghu font which will fall in the first and second categories respectively. Hopefully within the next year we shall design an OpenType font for each of the other Indic scripts and put it in the public domain.

> What is the situation with other Indic scripts?

To my knowledge there are two such widely used standards currently available in India. ISFOC defines a glyph set for each of the Indic scripts. TSCII is another standard, for Tamil script only. There may be other standards which I don't know about.

Regards,
Keyur
|
From: Arun S. <ar...@sh...> - 2002-03-07 17:53:05
|
On Wed, Mar 06, 2002 at 11:07:53PM -0800, Keyur Shroff wrote:

> As far as I know, during the last few months our Ministry was
> in the process of standardizing glyph sets for all Indic scripts.
> However, technology like OpenType gives the font designer freedom
> to define his/her own glyph set.

Perhaps what the standardization efforts could do is:

Publish charts similar to this one:

http://www.sharma-home.net/~adsharma/languages/kannada/kaguNita.html

[ Requires fonts such as Tunga from MS to view. This font has some known errors ]

and then designate certain ranges as optional.

-Arun
|
From: Primoz P. <pri...@bi...> - 2002-03-08 18:15:30
|
On Thu, 7 Mar 2002, Arun Sharma wrote:

> On Wed, Mar 06, 2002 at 11:07:53PM -0800, Keyur Shroff wrote:
> > As far as I know, during the last few months our Ministry was
> > in the process of standardizing glyph sets for all Indic scripts.
> > However, technology like OpenType gives the font designer freedom
> > to define his/her own glyph set.
> Perhaps what the standardization efforts could do is:
> Publish charts similar to this one:
> http://www.sharma-home.net/~adsharma/languages/kannada/kaguNita.html
> [ Requires fonts such as Tunga from MS to view. This font has some known
> errors ]
> and then designate certain ranges as optional.

For those without the Tunga font, it would help a lot if the table were also available as a bitmap image. But I believe that not all combinations really appear in the live written language, or do they?

With kind regards,
Primoz
|
From: Arun S. <ar...@sh...> - 2002-03-08 18:46:08
|
On Fri, Mar 08, 2002 at 07:14:25PM +0100, Primoz Peterlin wrote:

> For those without the Tunga font, it would help a lot if the table
> were also available as a bitmap image.

Actually, I tried saving it as PDF before I mailed it out, but it failed. When I open the PDF, it says "can't find the Tunga font". I tried to make the PDF writer embed the font in the PDF, without much success. Anyone on this list know how to do it?

> But I believe that not all combinations really appear in the live
> written language, or do they?

I can certainly say that characters like U+919 and U+91E (Dev) and their equivalents in Kannada are extremely rare in the written language.

If we had large amounts of representative Unicode text available in Indian languages, we could've done a frequency analysis to figure out which ones were more common.

I'll try to write something up later today. While we're on the topic, any opinions on how programs like "wc" should behave for Indian languages? Should they not count the combination of a consonant and a vowel as a character?

-Arun
|
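The "wc" question above comes down to what is counted as a character. As a rough illustration only, the sketch below contrasts a plain code-point count with a count that skips combining marks, so that a consonant followed by a dependent vowel sign counts once. It assumes Python 3 and its standard unicodedata module; the function names are hypothetical and nothing here is taken from an actual wc implementation. Conjuncts (consonant, halant, consonant) would still be counted as two under this rule.

```python
import unicodedata


def count_code_points(text):
    """Count every Unicode code point, which is what len() does in Python 3."""
    return len(text)


def count_base_characters(text):
    """Count code points that are not combining marks.

    General categories Mn, Mc and Me cover dependent vowel signs and the
    halant, so a consonant plus a vowel sign is counted as one unit.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


# KANNADA LETTER KA (U+0C95) followed by KANNADA VOWEL SIGN I (U+0CBF)
sample = "\u0c95\u0cbf"
print(count_code_points(sample), count_base_characters(sample))
```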
From: Arun S. <ar...@sh...> - 2002-03-10 01:19:14
|
On Fri, Mar 08, 2002 at 10:51:40AM -0800, Arun Sharma wrote:

[ Context: on the topic of coming up with a "common minimum" glyph set for Indian languages ]

> If we had large amounts of representative Unicode text available in
> Indian languages, we could've done a frequency analysis to figure out
> which ones were more common.
>
> I'll try to write something up later today. While we're on the topic,
> any opinions on how programs like "wc" should behave for Indian
> languages? Should they not count the combination of a consonant and a
> vowel as a character?

OK, I wrote up a script:

http://www.sharma-home.net/~adsharma/languages/scripts/lf.py

On running the script on this page:

[ <meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta http-equiv="content-language" content="kn-IN"> ]

http://www.sharma-home.net/~adsharma/languages/kannada/shivarama-karant.html

I get this:

http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt

Interesting stats:

1. The number of times the halant was used - I guess this is because every "vattu" needs one.
2. The dependent vowel "e" came in second (might be similar to English, where e is the most frequent letter)

TODO: to count the frequency on a per-syllable basis, rather than a per-character basis. Will need libraries to do the consonant-vowel composition and then run it through lf.py.

I see some code in Emacs Lisp which is doing such computation:

http://www.mit.edu/afs/athena.mit.edu/project/ptest/emacs/emacs-20.5/lisp/language/devan-util.el

-Arun
|
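The per-character frequency count described above can be sketched in a few lines. This is not the lf.py script linked in the message, whose contents are not reproduced in this thread; it is a minimal illustration assuming Python 3, with a hypothetical function name and a short hard-coded sample in place of the web page.

```python
from collections import Counter


def char_frequencies(text):
    """Count each non-ASCII code point, so markup and spaces are ignored."""
    return Counter(ch for ch in text if ord(ch) > 0x7F)


# The word "Kannada" in Kannada script: KA, NA, halant, NA, DDA
sample = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1"

freq = char_frequencies(sample)
for ch, count in freq.most_common():
    print(ch, count)
```

Feeding it the decoded text of a whole page instead of the sample would give a table of the same general shape as the freq.txt output mentioned above.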
From: Arun S. <ar...@sh...> - 2002-03-11 06:47:25
|
On Sat, Mar 09, 2002 at 05:24:45PM -0800, Arun Sharma wrote:

> TODO: to count the frequency on a per-syllable basis, rather than a
> per-character basis. Will need libraries to do the consonant-vowel
> composition and then run it through lf.py.

I finished this work today. Please review the state machine I used to do the composition:

http://www.sharma-home.net/~adsharma/languages/scripts/state-machine.jpg

The code:

http://www.sharma-home.net/~adsharma/languages/scripts/lf.py
http://www.sharma-home.net/~adsharma/languages/scripts/kannada.py
http://www.sharma-home.net/~adsharma/languages/scripts/indian.py

The result of running the above code on:

http://www.sharma-home.net/~adsharma/languages/kannada/shivarama-karant.html

is here:

http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt

The above data can be used to

(a) Design keyboards based on the analysis of which syllables are more frequent and which syllables often occur next to each other etc.
(b) Publish simplified keyboards and fonts, which contain smaller, more manageable, but incomplete subsets of the language/script.

The above code is easily extensible to other Indian languages. All you need to do is copy and modify kannada.py to indicate the vowels, consonants and matras in your language.

The code is not very efficient yet. I'm focussing on getting the code right.

Python specific issues:

1. Python assumes that the input.py file is ASCII. Specifying unicode literals requires usage of this idiom:

   x = unicode("foobar", "utf8")

2. Printing unicode text is done as follows:

   print x.encode("utf8")

If there is enough interest, I can collect all this code (and other language specific modules that you may contribute) and try to get them included in the standard python distribution.

I'd love to run these scripts on large bodies of unicode text in Indian languages. Any suggestions on where to get such text?

-Arun
|
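The consonant-vowel composition described above can be illustrated with a small state machine. The sketch below is not the code from lf.py, kannada.py or indian.py, and it does not reproduce the linked state-machine diagram; it is a simplified approximation assuming Python 3 and the Kannada block of Unicode (consonants U+0C95 to U+0CB9, dependent vowel signs U+0CBE to U+0CCC, halant U+0CCD). All function names are hypothetical, and corner cases such as nukta and length marks are ignored.

```python
from collections import Counter

HALANT = "\u0ccd"  # KANNADA SIGN VIRAMA


def is_consonant(ch):
    return "\u0c95" <= ch <= "\u0cb9"


def is_vowel(ch):
    return "\u0c85" <= ch <= "\u0c94"  # independent vowels


def is_matra(ch):
    return "\u0cbe" <= ch <= "\u0ccc"  # dependent vowel signs


def is_modifier(ch):
    return ch in "\u0c82\u0c83"  # anusvara, visarga


def syllables(text):
    """Split text into syllables of the form (C halant)* C [matra] [modifier]."""
    result, current = [], ""
    for ch in text:
        if is_consonant(ch) and current.endswith(HALANT):
            current += ch  # halant joins this consonant to the cluster
        elif is_consonant(ch) or is_vowel(ch):
            if current:
                result.append(current)
            current = ch  # start a new syllable
        elif current and (ch == HALANT or is_matra(ch) or is_modifier(ch)):
            current += ch  # sign attaches to the open syllable
        else:
            if current:
                result.append(current)
            current = ""
            if not ch.isspace():
                result.append(ch)  # pass other characters through unchanged
    if current:
        result.append(current)
    return result


# "Kannada" in Kannada script splits as ka / nna / Da
word = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1"
print(Counter(syllables(word)))
```

Under these assumptions, adapting the sketch to another script would mean changing only the range checks, which mirrors the "copy and modify kannada.py" approach described in the message.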
From: Arun S. <ar...@sh...> - 2002-03-11 07:10:09
|
On Sun, Mar 10, 2002 at 10:53:05PM -0800, Arun Sharma wrote:

> The above data can be used to
>
> (a) Design keyboards based on the analysis of which syllables are more
>     frequent and which syllables often occur next to each other etc.
> (b) Publish simplified keyboards and fonts, which contain smaller, more
>     manageable, but incomplete subsets of the language/script.

(c) Cryptanalysis of course :)

> The above code is easily extensible to other Indian languages. All you
> need to do is copy and modify kannada.py to indicate the vowels,
> consonants and matras in your language.

I've added devanagari.py now.

> The code is not very efficient yet. I'm focussing on getting the code
> right.

Took 4 mins on an 800 MHz Duron to process 20,000 lines of text.

> I'd love to run these scripts on large bodies of unicode text in Indian
> languages. Any suggestions on where to get such text?

I ran it on the last 20,000 lines of a UTF-8 encoded English-Hindi dictionary.

http://www.sharma-home.net/~adsharma/languages/scripts/dict.txt

For those who can't read unicode, top 5 syllables:

1. ra
2. ka
3. nA
4. ta
5. pa

-Arun
|
From: Guntupalli K. <kar...@fr...> - 2002-03-11 08:36:35
|
On Sun, 10 Mar 2002 23:15:57 -0800 Arun Sharma <ar...@sh...> wrote:

> On Sun, Mar 10, 2002 at 10:53:05PM -0800, Arun Sharma wrote:
> > The above data can be used to
> >
> > (a) Design keyboards based on the analysis of which syllables are
> >     more frequent and which syllables often occur next to each
> >     other etc.
> > (b) Publish simplified keyboards and fonts, which contain smaller,
> >     more manageable, but incomplete subsets of the language/script.
>
> > I'd love to run these scripts on large bodies of unicode text in
> > Indian languages. Any suggestions on where to get such text?

This site contains a lot of text, though not Unicode (but ISCII versions were there when I last checked, though I can't get through to it now).

http://sanskrit.gde.to

Contact the LTRC team at IIIT Hyd (vc at iiit.net, amba at iiit.net); they have large amounts of ISCII text.

> I ran it on the last 20,000 lines of a UTF-8 encoded
> English-Hindi dictionary.
>
> http://www.sharma-home.net/~adsharma/languages/scripts/dict.txt
>
> For those who can't read unicode, top 5 syllables:
>
> 1. ra
> 2. ka
> 3. nA
> 4. ta
> 5. pa

This supports the Inscript keyboard layout. On the Inscript layout they are under the keys

ra - 'j'
ka - 'k'
nA - 'v'
ta - 'l'
pa - 'h'

similarly

halant on 'd'
VS I on 'f'

And these keys are under the normal finger positions 'a s d f' 'h j k l' on a typewriter :)

Regards,
Karunakar
|
From: Arun S. <ar...@sh...> - 2002-03-11 17:27:17
|
[ Snip lin...@li... - some of this may be off topic there ]

On Mon, Mar 11, 2002 at 02:13:32PM +0530, Guntupalli Karunakar wrote:

[...]

> This supports the Inscript keyboard layout. On the Inscript layout
> they are under the keys
> ra - 'j'
> ka - 'k'
> nA - 'v'
> ta - 'l'
> pa - 'h'
> similarly
> halant on 'd'
> VS I on 'f'
>
> And these keys are under the normal finger positions 'a s d f' 'h
> j k l' on a typewriter :)

Yes, somebody must've run a similar analysis before designing the Inscript keyboard. But I suspect that the analysis was done on Devanagari. I'm not sure I could reproduce the same numbers on Kannada, for example.

So does Kannada-inscript make sense? Maybe: if the characteristics are mostly the same, the cost of deviating from devanagari-inscript may not be justifiable.

I found that I was typing the syllable "lli" (U+cb2 U+ccd U+cb2 U+cbf) very often in Kannada and it's pretty inconvenient with kannada-inscript. I don't think this occurs often enough in Devanagari based languages. I'm sure you'll appreciate this if you have typed your name with Inscript :)

-Arun

PS: Should this discussion be on -standards?
|
From: <jk...@Fr...> - 2002-03-11 09:26:21
|
> http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt

A suggestion: could you print the Unicode numbers (i.e. U+ABCD) alongside the UTF-8 string displayed?

This would help people on platforms without support for Unicode rendering to make sense of the data.

Regards,
Koshy <jk...@fr...>
|
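Printing the code point numbers next to each string, as suggested above, needs only a small helper. The sketch below assumes Python 3; the function name is hypothetical and this is not the change that was actually made to lf.py.

```python
def code_points(s):
    """Render a string as space-separated U+XXXX code point labels."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# The Kannada syllable "lli": LA, halant, LA, vowel sign I
syllable = "\u0cb2\u0ccd\u0cb2\u0cbf"
print(syllable, code_points(syllable))
```

A reader without Kannada fonts can then look the numbers up in the Unicode code charts even if the glyphs themselves do not render.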
From: Arun S. <ar...@sh...> - 2002-03-11 17:14:23
|
On Mon, Mar 11, 2002 at 01:25:11AM -0800, Joseph Koshy wrote:

> > http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt
>
> A suggestion: could you print the Unicode numbers (i.e. U+ABCD)
> alongside the UTF-8 string displayed?
>
> This would help people on platforms without support for Unicode
> rendering to make sense of the data.

That's a good one. I've made the code change and am running the script again now - by the time you read this, you should see the unicode numbers in

http://www.sharma-home.net/~adsharma/languages/scripts/dict.txt

A quick profiling of the code indicated that the performance problems are due to the string manipulation:

   str = str + "abc"

is inefficient in Python, because strings are immutable and doing string concatenation in a loop creates too many objects. (This is true of Java also.) The trick is to collect them in a list and do string.join(list). Will make the change later today.

-Arun
|
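The two approaches mentioned above can be put side by side. This sketch assumes Python 3, where the string.join(list) function of the Python 2 string module is spelled as the str.join method; the function names are hypothetical. Whether join is measurably faster depends on the interpreter and the input size, so the example only shows that both produce the same result.

```python
def build_by_concatenation(parts):
    """Repeated concatenation: each step may copy the whole string so far."""
    out = ""
    for part in parts:
        out = out + part
    return out


def build_by_join(parts):
    """Collect the pieces in a list and join them once at the end."""
    pieces = []
    for part in parts:
        pieces.append(part)
    return "".join(pieces)


parts = ["ra", "ka", "nA", "ta", "pa"]
print(build_by_concatenation(parts) == build_by_join(parts))
```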
From: Dr. U.B. P. <pav...@vi...> - 2002-03-29 07:56:22
|
> On Sat, Mar 09, 2002 at 05:24:45PM -0800, Arun Sharma wrote:
> The above data can be used to
>
> (a) Design keyboards based on the analysis of which syllables are more
>     frequent and which syllables often occur next to each other etc.
> (b) Publish simplified keyboards and fonts, which contain smaller,
>     more manageable, but incomplete subsets of the language/script.
>
> The above code is easily extensible to other Indian languages. All you
> need to do is copy and modify kannada.py to indicate the vowels,
> consonants and matras in your language.

People at Prajavani (a leading Kannada daily) have done the frequency analysis for Kannada letters looooooooong ago. They had even designed their Monotype Kannada layout based on these data. Now all these have gone due to the advent of the GoK (Govt of Karnataka) standard Kannada keyboard layout (also known as the KGP (Kannada Ganaka Parishat) std).

I hope everyone remembers the story about the QWERTY keyboard that we use for English. It was designed to make typing SLOW rather than FAST ;-)

Rgds,
Pavanaja
-----------------------------------------------------
Dr. U.B. Pavanaja
Editor, Vishva Kannada
World's first Internet magazine in Kannada
http://www.vishvakannada.com/
Note: I don't worry about pselling mixtakes
|
From: Arun S. <ar...@sh...> - 2002-03-29 08:08:39
|
On Fri, Mar 29, 2002 at 01:26:06PM +0530, Dr. U.B. Pavanaja wrote:

> People at Prajavani (a leading Kannada daily) have done the frequency
> analysis for Kannada letters looooooooong ago. They had even designed
> their Monotype Kannada layout based on these data.

Did they publish their findings somewhere?

> Now all these have gone due to the advent of the GoK (Govt of
> Karnataka) standard Kannada keyboard layout (also known as the KGP
> (Kannada Ganaka Parishat) std).

Did the KGP keyboard take into account such analyses? I'm sure it's easy to use for people already comfortable with qwerty.

If the world didn't have qwerty or English and you were (the Indian) Sholes designing a keyboard for Kannada, how would you design it?

> I hope everyone remembers the story about the QWERTY keyboard that we
> use for English. It was designed to make typing SLOW rather than
> FAST ;-)

Qwerty isn't the unchallenged king of the keyboard universe. I recently bumped into Dvorak:

http://www.mwbrooks.com/dvorak/

-Arun
|
From: Dr. U.B. P. <pav...@vi...> - 2002-03-29 08:35:26
|
> > People at Prajavani (a leading Kannada daily) have done the frequency
> > analysis for Kannada letters looooooooong ago. They had even designed
> > their Monotype Kannada layout based on these data.
>
> Did they publish their findings somewhere?

I tried to hunt for it, but could not get hold of it. So many people have changed over these many years in their technical section that nobody knows the whereabouts of that data.

> > Now all these have gone due to the advent of the GoK (Govt of
> > Karnataka) standard Kannada keyboard layout (also known as the KGP
> > (Kannada Ganaka Parishat) std).
>
> Did the KGP keyboard take into account such analyses? I'm sure it's
> easy to use for people already comfortable with qwerty.

No. The KGP/GoK keyboard is based on QWERTY. Except for the Z, X, Q, W and f (link key) keys, all other keys are intuitive for those who are familiar with QWERTY. For example, k is for ka, K is for Kha, etc. For people who use English as well as Kannada, this keyboard layout becomes very easy. The KGP layout was not invented entirely by KGP. It was originally by KP Rao and was modified slightly by KGP. KP Rao is 80+ years of age now, but still active. He is currently with his daughter in Delhi. I am in touch with him about our common interests.

> If the world didn't have qwerty or English and you were (the Indian)
> Sholes designing a keyboard for Kannada, how would you design it?

Probably I would have designed it very close to the Monotype layout once used by Prajavani.

> > I hope everyone remembers the story about the QWERTY keyboard that
> > we use for English. It was designed to make typing SLOW rather than
> > FAST ;-)
>
> Qwerty isn't the unchallenged king of the keyboard universe. I
> recently bumped into Dvorak:

I am aware of another layout by IBM. It was called something EBC... I had used it during my card-punching days at BARC. It was far superior to qwerty, but it died.

Rgds,
Pavanaja
|