indic-computing-devel Mailing List for The Indic-Computing Project (Page 19)
Status: Alpha
Brought to you by:
jkoshy
From: Dr. U.B. P. <pav...@vi...> - 2002-03-29 07:56:22
|
> On Sat, Mar 09, 2002 at 05:24:45PM -0800, Arun Sharma wrote:
> The above data can be used to
>
> (a) Design keyboards based on the analysis of which syllables are more
> frequent and which syllables often occur next to each other etc.
> (b) Publish simplified keyboards and fonts, which contain smaller, more
> manageable, but incomplete subsets of the language/script.
>
> The above code is easily extensible to other Indian languages. All you
> need to do is copy and modify kannada.py to indicate the vowels,
> consonants and matras in your language.

People at Prajavani (a leading Kannada daily) have done the frequency analysis for Kannada letters looooooooong ago. They had even designed their Montype Kannada layout based on these data. Now all these have gone due to the advent of the GoK (Govt of Karnataka) standard Kannada keyboard layout (also known as the KGP (Kannada Ganaka Parishat) std).

I hope everyone remembers the story about the QWERTY keyboard that we use for English. It was designed to make typing SLOW rather than FAST ;-)

Rgds,
Pavanaja
-----------------------------------------------------
Dr. U.B. Pavanaja
Editor, Vishva Kannada
World's first Internet magazine in Kannada
http://www.vishvakannada.com/
Note: I don't worry about pselling mixtakes
From: <as...@mi...> - 2002-03-20 14:45:26
|
great idea. maybe you can lead this part of the discussion at the workshop, and maybe also coordinate development of such a platform?

are u still coming to bombay this weekend? vijay is here this wknd also and we can all get together and talk...

--tapan

----- Original Message -----
From: as...@mi...
To: 'Tapan S. Parikh' <ta...@ya...>
Cc: ind...@li... ; sf-...@mm...
Sent: Tuesday, March 19, 2002 11:01 PM
Subject: Re: [Indic-computing-devel] Re: Value (from FSF-India)

Tapan,

I see a lot of value in what you are proposing. In fact I was thinking along the same lines. I just attended a two-day meeting of regional partners (read NGOs) of OneWorld South Asia in Delhi in which I presented a case for enabling ICTs for local languages keeping in mind the work these NGOs are doing among people at the grassroots level.

I could see some of the NGOs were pretty savvy when it came to use of ICTs, while most were ICT challenged. They need help, handholding and constant motivation in getting onto the ICT bandwagon. And almost all of them expressed a desire to be part of a forum/platform where their ICT related queries would be answered timely and in a language they understand.

This is where networks like OneWorld or sourceforge can play a major role. Maybe we can have a parallel network for NGOs (NGO-computing ?) on the lines of Indic-computing. I proposed to OneWorld to act as ICT evangelist to these NGOs.

BTW, most of these NGOs are on email and even have a functional website.

Ashish

> -----------------------Original Message-------------------------
> From : 'Tapan S. Parikh' <TA...@YA...>
> To : 'fsf...@mm...' <FSF...@MM...>
> Sent Date : Tue Mar 19 18:51:58 IST 2002
> Subject : [Indic-computing-devel] Re: Value (from FSF-India)
>
> Raj,
>
> This is one hope of the workshop that we have been talking about and that you alluded to. To bring the developer community, particularly progressive, aware technicians such as ourselves, closer to the NGO, development and govt communities, so we can better understand their needs, and provide better thought-out tools that meet their requirements, while still remaining technically sound, true to our ideals of interoperability and freedom, whatever they may be for us.
>
> There are tons of NGOs to work with, Sristi (who I used to work with) and SEWA in Ahmedabad come immediately to mind because I have first-hand knowledge. Also many various govt agencies could use help, and as someone enlightened me about earlier today, starting to interface with them on the advantages of free software is a definite possibility. Get together with other developers, and people you know in the larger community, and start to make inroads if you can.
>
> But I know this is hard until there is a platform of discussion to which we can both come to discuss, which is not technically intimidating to the NGO / govt side, nor tedious and undirected to the technical side. Otherwise discussion will be unfocused and vague and will lead nowhere, which I also tell you from first-hand experience.
>
> One thing I am imagining as I write this is very interesting - how about a sourceforge type platform for rural and development groups to post software problems on? These could then be refined into full-fledged software specifications, and then taken up by people like us who find such projects interesting to develop. That would be a great work model, and maybe we can start talking about building such a web platform. Where to start? Can we start discussing the requirements for such a platform? Can we make it a sourceforge project, and start working on it? (Hypothetical questions, because of course we can!) I look forward to it, I think it would be a great contribution if we could do that.
>
> --Tapan
>
> > Can we collect the details of NGOs and other groups so that we can get
> > in touch with them. Kerala Sastra Sahitya Parishad is one such
> > organization in Kerala.
> >
> > Any one involved with NGOs here?
> >
> > > - setting up training and education forums and trying to get
> > > people, particularly from these communities, involved
> >
> > Rather, can we work on creating some training material so that the
> > NGOs can use it to learn themselves. This will be more productive in
> > the long run as they need not be dependent on us.
> >
> > One such example is the tutorial collection of TugIndia.
> > <http://www.tug.org.in/tutorials.html> If we can get about 10 - 15
> > people and if each can write a chapter we can create some nice
> > tutorial and use this mail list to answer the queries for the tutorial
> > users.
> >
> > > - analyzing the needs of local language and rural software in
> > > India and start developing tools and applications
> >
> > This is a very important step. I am looking fwd to attend the
> > conference you are organizing. I myself am working on and off on
> > various stuff for Malayalam.
> >
> > raj
>
> _________________________________________________________
> Do You Yahoo!?
> Get your free @yahoo.com address at http://mail.yahoo.com
>
> _______________________________________________
> Indic-computing-devel mailing list
> http://indic-computing.sourceforge.net/
> Ind...@li...
> https://lists.sourceforge.net/lists/listinfo/indic-computing-devel
From: Rajkumar S <s_...@my...> - 2002-03-19 19:57:44
|
On Tue, 19 Mar 2002 as...@mi... wrote:

> I could see some of the NGOs were pretty savvy when it came to use of
> ICTs, while most were ICT challenged. They need help, handholding and
> constant motivation in getting onto the ICT bandwagon. And almost all
> of them expressed a desire to be part of a forum/platform where their
> ICT related queries would be answered timely and in a language they
> understand.

FSF India can act as a gateway between NGOs and the geeks. I have already made a proposal at the FSF India list to use a topic of the FSF India forum for NGOs. I will talk about this in the FSF working group also.

> This is where networks like OneWorld or sourceforge can play a major
> role. Maybe we can have a parallel network for NGOs (NGO-computing ?)
> on the lines of Indic-computing. I proposed to OneWorld to act as ICT
> evangelist to these NGOs.

FSF can help OneWorld in making technical choices and even train some of the people in GNU/Linux.

It will be an interesting development if the Free software community and NGOs come together.

raj
From: Tapan S. P. <ta...@ya...> - 2002-03-19 13:01:09
|
Raj,

This is one hope of the workshop that we have been talking about and that you alluded to. To bring the developer community, particularly progressive, aware technicians such as ourselves, closer to the NGO, development and govt communities, so we can better understand their needs, and provide better thought-out tools that meet their requirements, while still remaining technically sound, true to our ideals of interoperability and freedom, whatever they may be for us.

There are tons of NGOs to work with, Sristi (who I used to work with) and SEWA in Ahmedabad come immediately to mind because I have first-hand knowledge. Also many various govt agencies could use help, and as someone enlightened me about earlier today, starting to interface with them on the advantages of free software is a definite possibility. Get together with other developers, and people you know in the larger community, and start to make inroads if you can.

But I know this is hard until there is a platform of discussion to which we can both come to discuss, which is not technically intimidating to the NGO / govt side, nor tedious and undirected to the technical side. Otherwise discussion will be unfocused and vague and will lead nowhere, which I also tell you from first-hand experience.

One thing I am imagining as I write this is very interesting - how about a sourceforge type platform for rural and development groups to post software problems on? These could then be refined into full-fledged software specifications, and then taken up by people like us who find such projects interesting to develop. That would be a great work model, and maybe we can start talking about building such a web platform. Where to start? Can we start discussing the requirements for such a platform? Can we make it a sourceforge project, and start working on it? (Hypothetical questions, because of course we can!) I look forward to it, I think it would be a great contribution if we could do that.

--Tapan

> Can we collect the details of NGOs and other groups so that we can get
> in touch with them. Kerala Sastra Sahitya Parishad is one such
> organization in Kerala.
>
> Any one involved with NGOs here?
>
> > - setting up training and education forums and trying to get
> > people, particularly from these communities, involved
>
> Rather, can we work on creating some training material so that the
> NGOs can use it to learn themselves. This will be more productive in
> the long run as they need not be dependent on us.
>
> One such example is the tutorial collection of TugIndia.
> <http://www.tug.org.in/tutorials.html> If we can get about 10 - 15
> people and if each can write a chapter we can create some nice
> tutorial and use this mail list to answer the queries for the tutorial
> users.
>
> > - analyzing the needs of local language and rural software in
> > India and start developing tools and applications
>
> This is a very important step. I am looking fwd to attend the
> conference you are organizing. I myself am working on and off on
> various stuff for Malayalam.
>
> raj

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com
From: Arun S. <ar...@sh...> - 2002-03-15 17:43:47
|
Rajkumar S wrote:

>On Thu, 14 Mar 2002, Pat Hall wrote:
>
>>What would also be sensible would be for some migration path to be
>>laid down which defined mappings from ISCII to Unicode.
>
>From what I know of ISCII and Unicode, It should be possible to write a
>straight forward converter from ISCII to Unicode. Since Unicode is based
>on ISCII.

I've collected some here:

http://www.sharma-home.net/~adsharma/languages/scripts/iscii2utf8.pl
http://www.sharma-home.net/~adsharma/languages/scripts/

-Arun
From: Arun S. <ar...@sh...> - 2002-03-14 07:49:20
|
On Thu, Mar 14, 2002 at 12:47:58PM +0530, Tapan S. Parikh wrote:
>
> Not if you use StringBuffer in Java. (Not that Im some Java advocate or
> anything, sometimes it _is_ painfully slow...) Is there any similar
> mechanism in Python, or maybe this is it?

StringBuffer can be used in jython, but that's not a part of the standard python API. I picked up the idiom from here:

http://manatee.mojam.com/~skip/python/fastpython.html#stringcat

Am yet to try it out.

-Arun
From: Tapan S. P. <ta...@ya...> - 2002-03-14 07:25:17
|
> A quick profiling of the code indicated that the performance problems
> are due to the string manipulation:
>
> str = str + "abc"
>
> is inefficient in python, because strings are immutable and doing string
> concatenation in a loop creates too many objects. (This is true of Java
> also). The trick is to collect them in a list and do string.join(list).
> Will make the change later today.

Not if you use StringBuffer in Java. (Not that I'm some Java advocate or anything, sometimes it _is_ painfully slow...) Is there any similar mechanism in Python, or maybe this is it?

--Tapan

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com
From: Arun S. <ar...@sh...> - 2002-03-11 17:27:17
|
[ Snip lin...@li... - some of this may be off topic there ]

On Mon, Mar 11, 2002 at 02:13:32PM +0530, Guntupalli Karunakar wrote:
[...]
> To support the Inscript keyb layout.
> on Inscript keyb layout they are under keys
> ra - 'j'
> ka - 'k'
> nA - 'v'
> ta - 'l'
> pa - 'h'
> similarly
> halant on 'd'
> VS I on 'f'
>
> And these keys are the under normal finger positions 'a s d f' 'h
> j k l' on a typewriter :)

Yes, somebody must've run a similar analysis before designing the inscript keyboard. But I suspect that the analysis was done on devanagari. I'm not sure I could reproduce the same numbers on kannada for example.

So does Kannada-inscript make sense? Maybe: if the characteristics are mostly the same, the cost of deviating from devanagari-inscript may not be justifiable.

I found that I was typing the syllable "lli" (U+cb2 U+ccd U+cb2 U+cbf) very often in kannada and it's pretty inconvenient with kannada-inscript. I don't think this occurs often enough in devanagari based languages. I'm sure you'll appreciate this if you have typed your name with inscript :)

-Arun

PS: Should this discussion be on -standards ?
From: Arun S. <ar...@sh...> - 2002-03-11 17:14:23
|
On Mon, Mar 11, 2002 at 01:25:11AM -0800, Joseph Koshy wrote:
>
> > http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt
>
> A suggestion: could you print the Unicode numbers (i.e U+ABCD) along
> side the UTF-8 string displayed.
>
> This would help people on platforms without support for Unicode
> rendering to make sense of the data.

That's a good one. I've made the code change, running the script again now - by the time you read this, you should see the unicode numbers in

http://www.sharma-home.net/~adsharma/languages/scripts/dict.txt

A quick profiling of the code indicated that the performance problems are due to the string manipulation:

    str = str + "abc"

is inefficient in python, because strings are immutable and doing string concatenation in a loop creates too many objects. (This is true of Java also). The trick is to collect them in a list and do string.join(list). Will make the change later today.

-Arun
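The list-and-join idiom described in this message can be sketched as follows. This is a minimal illustration in present-day Python 3, where the call is `"".join(parts)` rather than the `string.join(list)` of the Python 2.1 era; the function names `concat_slow` and `concat_fast` are made up for this example and are not taken from lf.py.

```python
def concat_slow(pieces):
    """Build a string by repeated concatenation.

    Each step may allocate a new string, since strings are immutable.
    """
    result = ""
    for piece in pieces:
        result = result + piece
    return result


def concat_fast(pieces):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)
```

Both functions return the same string; how large the speed difference is depends on the interpreter and its version, so it is worth profiling rather than assuming.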
From: <jk...@Fr...> - 2002-03-11 09:26:21
|
> http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt A suggestion: could you print the Unicode numbers (i.e U+ABCD) along side the UTF-8 string displayed. This would help people on platforms without support for Unicode rendering to make sense of the data. Regards, Koshy <jk...@fr...> |
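A possible way to print the U+XXXX numbers next to each string, as suggested here, is sketched below in Python 3. The helper names `codepoints` and `annotate` are assumptions for illustration; the actual change made to the frequency script is not shown in the thread.

```python
def codepoints(text):
    """Return the code points of text in U+XXXX notation, space separated."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


def annotate(text):
    """Pair a string with its code points, e.g. for one line of a listing."""
    return "%s (%s)" % (text, codepoints(text))
```

A reader without a Kannada or Devanagari font can then still identify each entry from the hexadecimal values alone.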
From: Guntupalli K. <kar...@fr...> - 2002-03-11 08:36:35
|
On Sun, 10 Mar 2002 23:15:57 -0800 Arun Sharma <ar...@sh...> wrote:

> On Sun, Mar 10, 2002 at 10:53:05PM -0800, Arun Sharma wrote:
> > The above data can be used to
> >
> > (a) Design keyboards based on the analysis of which syllables are
> > more frequent and which syllables often occur next to each
> > other etc.
> > (b) Publish simplified keyboards and fonts, which
> > contain smaller, more manageable, but incomplete subsets of the
> > language/script.
>
> > I'd love to run these scripts on large bodies of unicode text in
> > Indian languages. Any suggestions on where to get such text ?

This site contains a lot of text, though not Unicode (but ISCII versions were there when I last checked, though I can't get through to it now).

http://sanskrit.gde.to

Contact IIIT Hyd, LTRC team ( vc at iiit.net , amba at iiit.net ), they have large amounts of ISCII text.

> I ran it on the last 20,000 lines of a UTF-8 encoded
> English-Hindi dictionary.
>
> http://www.sharma-home.net/~adsharma/languages/scripts/dict.txt
>
> For those who can't read unicode, top 5 syllables:
>
> 1. ra
> 2. ka
> 3. nA
> 4. ta
> 5. pa

To support the Inscript keyb layout: on the Inscript keyb layout they are under keys

ra - 'j'
ka - 'k'
nA - 'v'
ta - 'l'
pa - 'h'

similarly

halant on 'd'
VS I on 'f'

And these keys are under the normal finger positions 'a s d f' 'h j k l' on a typewriter :)

Regards,
Karunakar
From: Arun S. <ar...@sh...> - 2002-03-11 07:10:09
|
On Sun, Mar 10, 2002 at 10:53:05PM -0800, Arun Sharma wrote:
> The above data can be used to
>
> (a) Design keyboards based on the analysis of which syllables are more
> frequent and which syllables often occur next to each other etc.
> (b) Publish simplified keyboards and fonts, which contain smaller, more
> manageable, but incomplete subsets of the language/script.

(c) Cryptanalysis of course :)

> The above code is easily extensible to other Indian languages. All you
> need to do is copy and modify kannada.py to indicate the vowels,
> consonants and matras in your language.

I've added devanagari.py now.

> The code is not very efficient yet. I'm focussing on getting the code
> right.

Took 4 mins on a 800 MHz Duron to process 20,000 lines of text.

> I'd love to run these scripts on large bodies of unicode text in Indian
> languages. Any suggestions on where to get such text ?

I ran it on the last 20,000 lines of a UTF-8 encoded English-Hindi dictionary.

http://www.sharma-home.net/~adsharma/languages/scripts/dict.txt

For those who can't read unicode, top 5 syllables:

1. ra
2. ka
3. nA
4. ta
5. pa

-Arun
From: Arun S. <ar...@sh...> - 2002-03-11 06:47:25
|
On Sat, Mar 09, 2002 at 05:24:45PM -0800, Arun Sharma wrote:
> TODO: to count the frequency on a per-syllable basis, rather than a per
> character basis. Will need libraries to do the consonant-vowel
> composition and then run it through lf.py.

I finished this work today. Please review the state machine I used to do the composition:

http://www.sharma-home.net/~adsharma/languages/scripts/state-machine.jpg

The code:

http://www.sharma-home.net/~adsharma/languages/scripts/lf.py
http://www.sharma-home.net/~adsharma/languages/scripts/kannada.py
http://www.sharma-home.net/~adsharma/languages/scripts/indian.py

The result of running the above code on:

http://www.sharma-home.net/~adsharma/languages/kannada/shivarama-karant.html

is here:

http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt

The above data can be used to

(a) Design keyboards based on the analysis of which syllables are more frequent and which syllables often occur next to each other etc.
(b) Publish simplified keyboards and fonts, which contain smaller, more manageable, but incomplete subsets of the language/script.

The above code is easily extensible to other Indian languages. All you need to do is copy and modify kannada.py to indicate the vowels, consonants and matras in your language.

The code is not very efficient yet. I'm focussing on getting the code right.

Python specific issues:

1. Python assumes that the input.py file is ASCII. Specifying unicode literals requires usage of this idiom:

    x = unicode("foobar", "utf8")

2. Printing unicode text is done as follows:

    print x.encode("utf8")

If there is enough interest, I can collect all this code (and other language specific modules that you may contribute) and try to get them included in the standard python distribution.

I'd love to run these scripts on large bodies of unicode text in Indian languages. Any suggestions on where to get such text ?

-Arun
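For readers who cannot fetch the linked scripts, here is a rough sketch of per-syllable counting in Python 3. It is not the state machine from state-machine.jpg and not the code in lf.py or kannada.py. It is a simplified stand-in that relies on Unicode character properties instead of per-language tables: a combining mark joins the current cluster, and a letter that follows a halant (canonical combining class 9) joins it too. The names `syllables` and `syllable_frequencies` are invented for this example, and joiners such as ZWJ/ZWNJ are deliberately not handled.

```python
import unicodedata
from collections import Counter

VIRAMA_CCC = 9  # canonical combining class shared by halant/virama signs


def syllables(text):
    """Split text into rough consonant-vowel clusters (simplified rule)."""
    clusters = []
    current = ""
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        is_letter = category.startswith("L")
        after_halant = (
            current != ""
            and unicodedata.combining(current[-1]) == VIRAMA_CCC
        )
        if current and (is_mark or (after_halant and is_letter)):
            current += ch          # matra, halant, or conjunct member
        else:
            if current:
                clusters.append(current)
            current = ch           # start a new cluster
    if current:
        clusters.append(current)
    return clusters


def syllable_frequencies(text):
    """Count clusters that begin with a letter, skipping spaces etc."""
    return Counter(
        c for c in syllables(text)
        if unicodedata.category(c[0]).startswith("L")
    )
```

With this rule the sequence U+0CB2 U+0CCD U+0CB2 U+0CBF is treated as a single unit, while text without conjuncts splits at every consonant or independent vowel. A table-driven module per script, as the message describes, would allow finer control than this generic rule.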
From: Arun S. <aru...@sh...> - 2002-03-11 04:38:24
|
Sorry - forwarding a newsgroup message failed because of my mail user agent error. Let me try again.

-Arun

Arun Sharma wrote:

> Forgot to mention - using python 2.1
>
> -Arun
>
> Arun Sharma wrote:
>
>> I would like to iterate over the following unicode string one
>> character at a time.
>>
>> line = u"ಡಾ|| ಶಿವರಾಮ ಕಾರಂತ"
>> for c in line:
>>     print c
>>
>> fails miserably. What is the right way to do it ? I would also like to
>> be able to slice the string i.e. line[i] to get the i'th character.
>>
>> Thanks in advance,
>>
>> -Arun
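One way to approach the question above is to decode the UTF-8 bytes into a text string first and only then iterate or index. The sketch below uses Python 3, where `str` is always a sequence of code points; it is not the Python 2.1 answer the poster was after, and the helper name `characters` is hypothetical.

```python
def characters(raw_bytes, encoding="utf-8"):
    """Decode first, then iterate: each list item is one code point."""
    return list(raw_bytes.decode(encoding))


# Two Kannada code points (U+0CA1, U+0CBE) occupy six bytes in UTF-8.
raw = "\u0ca1\u0cbe".encode("utf-8")
decoded = characters(raw)
```

Indexing `decoded[i]` then yields the i'th code point. Note that a code point is still not a full syllable: a consonant and its vowel sign remain separate items.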
From: Arun S. <aru...@sh...> - 2002-03-11 04:33:48
|
Does anyone know how to do this ? It fails because of the variable length encoding used in utf8. -Arun |
From: Rajkumar S <s_...@my...> - 2002-03-10 14:34:09
|
Hi,

The current version of Yudit has complete support for Malayalam and other Indic languages. It can also use OpenType layout tables of Malayalam fonts. I think Yudit is the first application that can use OpenType tables for Malayalam, as MS is yet to release its engine for Malayalam.

raj

---------- Forwarded message ----------
Date: Sun, 10 Mar 2002 10:59:41 +0900 (JST)
From: Gaspar Sinai <gs...@yu...>
To: lin...@nl...
Subject: Yudit 2.5.4

Yudit 2.5.4 has been released. It can be downloaded from:
http://www.yudit.org/download.html

Changes:
o Malayalam, Kannada and Telugu support
o Software glyph-mirroring
o iso-8859-15, iso-8859-16, koi8-c, koi8-u, ncr and rovas converters
o Ukrainian kmap and menu translations
o Old Hungarian (rovAsA-rAs) has been added using Unicode Private Use Area: http://www.yudit.org/download/pua
o Fallback font (yudit.ttf) encoding changed from cp-1251 to unicode
o HOWTO-rovasiras.txt HOWTO-malayalam.txt has been added
o Some of the bugs have been fixed

There are a lot more fixes that I planned but I did not have time for...

Enjoy,
gaspar
From: Arun S. <ar...@sh...> - 2002-03-10 01:19:14
|
On Fri, Mar 08, 2002 at 10:51:40AM -0800, Arun Sharma wrote:

[ Context: on the topic of coming up with a "common minimum" glyph set for Indian languages ]

> If we had large amounts of representative unicode text available in
> Indian languages, we could've done a frequency analysis to figure out
> which ones were more common.
>
> I'll try to write something up later today. While we're on the topic,
> any opinions on how programs like "wc" should behave for Indian
> languages ? Should they not count the combination of a consonant and a
> vowel as a character ?

ok, I wrote up a script:

http://www.sharma-home.net/~adsharma/languages/scripts/lf.py

On running the script on this page:

[ <meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta http-equiv="content-language" content="kn-IN"> ]

http://www.sharma-home.net/~adsharma/languages/kannada/shivarama-karant.html

I get this:

http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt

Interesting stats

1. The number of times the halant was used - I guess this is because every "vattu" needs one.
2. The dependent vowel "e" came in second (might be similar to English, where e is the most frequent letter)

TODO: to count the frequency on a per-syllable basis, rather than a per character basis. Will need libraries to do the consonant-vowel composition and then run it through lf.py.

I see some code in Emacs lisp, which is doing such computation:

http://www.mit.edu/afs/athena.mit.edu/project/ptest/emacs/emacs-20.5/lisp/language/devan-util.el

-Arun
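The per-character count described in this message could look roughly like the Python 3 sketch below. It is an assumption about the general approach, not a copy of lf.py: it counts every letter and combining mark (so the halant and the dependent vowels are tallied individually) and ignores spaces, digits and punctuation. The names `char_frequencies` and `report` are illustrative.

```python
import unicodedata
from collections import Counter


def char_frequencies(text):
    """Count letters and combining marks, one code point at a time."""
    return Counter(
        ch for ch in text
        if unicodedata.category(ch)[0] in ("L", "M")
    )


def report(text, top=5):
    """Format the most frequent code points with their Unicode names."""
    lines = []
    for ch, count in char_frequencies(text).most_common(top):
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append("U+%04X %s %d" % (ord(ch), name, count))
    return lines
```

Because this works per code point, a conjunct such as "lli" contributes four separate counts, which is exactly the limitation the TODO in the message points out.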
From: Arun S. <ar...@sh...> - 2002-03-08 18:46:08
|
On Fri, Mar 08, 2002 at 07:14:25PM +0100, Primoz Peterlin wrote:
>
> For those without the Tunga font, it would help a lot if the table would
> also be available as bitmap image.

Actually, I tried saving it as PDF before I mailed it out, but it failed. When I open the PDF, it says "can't find the Tunga font". I tried to make the PDF writer embed the font in the PDF, without much success. Anyone on this list know how to do it ?

> But I believe that not all combinations really appear in live
> written language, or do they?

I can certainly say that characters like U+919 and U+91E (Dev) and their equivalents in Kannada are extremely rare in the written language.

If we had large amounts of representative unicode text available in Indian languages, we could've done a frequency analysis to figure out which ones were more common.

I'll try to write something up later today. While we're on the topic, any opinions on how programs like "wc" should behave for Indian languages ? Should they not count the combination of a consonant and a vowel as a character ?

-Arun
From: Primoz P. <pri...@bi...> - 2002-03-08 18:15:30
|
-----BEGIN PGP SIGNED MESSAGE-----

On Thu, 7 Mar 2002, Arun Sharma wrote:

> On Wed, Mar 06, 2002 at 11:07:53PM -0800, Keyur Shroff wrote:
> > As far as I know, during last few months our Ministry was
> > in process to standardize glyph sets for all Indic scripts.
> > However the technology like OpenType gives freedom to font
> > designer to define his/her own glyphset.
>
> Perhaps what the standardization efforts could do is:
>
> Publish charts similar to this one:
>
> http://www.sharma-home.net/~adsharma/languages/kannada/kaguNita.html
> [ Requires fonts such as Tunga from MS to view. This font has some known
> errors ]
>
> and then designate certain ranges as optional.

For those without the Tunga font, it would help a lot if the table would also be available as bitmap image. But I believe that not all combinations really appear in live written language, or do they?

With kind regards,
Primoz

- --
Primož Peterlin, Inštitut za biofiziko, Med. fakulteta, Univerza v Ljubljani
Lipičeva 2, SI-1000 Ljubljana, Slovenija. primoz.peterlin@biofiz.mf.uni-lj.si
Tel: +386-1-5437632, fax: +386-1-4315127, http://sizif.mf.uni-lj.si/~peterlin/
F8021D69 OpenPGP fingerprint: CB 6F F1 EE D9 67 E0 2F 0B 59 AF 0D 79 56 19 0F

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (HP-UX)
Comment: For info see http://www.gnupg.org

iQB1AwUBPIj/hT3bcxr4Ah1pAQGolgMAobCA+wiZVeZj/Tw5auLPI6S5Qq52WaNO
U9ptx9UFoFKeVL4+p0MMlqoZ0cDq7bfpV7/Gp93haGwE+isT/Lg+a+VFOCNeoZFJ
qCldaLIG2pXcTspu9bqQJ72hnqCN8qi0
=bRfF
-----END PGP SIGNATURE-----
From: Primoz P. <pri...@bi...> - 2002-03-08 18:12:51
|
On Thu, 7 Mar 2002, Dr. U.B. Pavanaja wrote:

> Uniscribe renders the OpenType font. The logic for substitution and
> positioning of the glyphs is supplied by the font. Uniscribe CAN NOT
> create any glyph on-the-fly. Uniscribe works only on Windows2000 and
> XP. It does not work on Windows 95/98/ME.

Thank you for your explanation.

With kind regards,
Primoz
--
Primož Peterlin, Inštitut za biofiziko, Med. fakulteta, Univerza v Ljubljani
Lipičeva 2, SI-1000 Ljubljana, Slovenija. primoz.peterlin@biofiz.mf.uni-lj.si
Tel: +386-1-5437632, fax: +386-1-4315127, http://sizif.mf.uni-lj.si/~peterlin/
F8021D69 OpenPGP fingerprint: CB 6F F1 EE D9 67 E0 2F 0B 59 AF 0D 79 56 19 0F
From: Keyur S. <key...@ya...> - 2002-03-08 06:02:44
|
Hello, --- "Dr. U.B. Pavanaja" <pav...@vi...> wrote: > What we need is a public domain OpenType font rendering > engine for > Indic scripts for all platforms, especially for Linux. > Indix from > NCST is supposed to be one. I have no experience on that. > Anyone has? Yes, I have ;-). IndiX has its own Indic Script Shaping engine. It has functions for Syllable breaking, Reordering, etc. Logic for turning ON/OFF script features is also inside the Indic library. For parsing tables of Opentype font, it uses FreeType library. Regards, Keyur __________________________________________________ Do You Yahoo!? Try FREE Yahoo! Mail - the world's greatest free email! http://mail.yahoo.com/ |
From: Arun S. <ar...@sh...> - 2002-03-07 17:56:29
|
On Thu, Mar 07, 2002 at 10:37:09PM +0530, Dr. U.B. Pavanaja wrote: > What we need is a public domain OpenType font rendering engine for > Indic scripts for all platforms, especially for Linux. Indix from > NCST is supposed to be one. I have no experience on that. Anyone has? Keyur Shroff is one of the developers on the IndiX project. He's subscribed to this list. -Arun |
From: Arun S. <ar...@sh...> - 2002-03-07 17:53:05
|
On Wed, Mar 06, 2002 at 11:07:53PM -0800, Keyur Shroff wrote: > As far as I know, during last few months our Ministry was > in process to standardize glyph sets for all Indic scripts. > However the technology like OpenType gives freedom to font > designer to define his/her own glyphset. Perhaps what the standardization efforts could do is: Publish charts similar to this one: http://www.sharma-home.net/~adsharma/languages/kannada/kaguNita.html [ Requires fonts such as Tunga from MS to view. This font has some known errors ] and then designate certain ranges as optional. -Arun |
From: Dr. U.B. P. <pav...@vi...> - 2002-03-07 17:07:46
|
>From: Primoz Peterlin <pri...@bi...>
>
> > For OpenType font, we need more glyphs. There is no need of any font
> > glyph set standard for OpenType font. It is the job of the rendering
> > engine (Uniscribe on Windows XP) to display the font properly.
>
> I realize that the glyph set is an "open set", to which glyphs can be
> added, should the need arise. What I meant by an agreed set of
> required ligatures needs not necessarily be an official standard. But
> on the other hand, I believe that newspapers and textbooks are printed
> in all major Indian languages, so a century(-ies) ago, well before any
> computers, typesetters had to make such lists. I would guess that
> printing scholarly publications, poetry etc. might require a richer
> set of glyphs, but nevertheless, I would like to have some goal...

The Kannada OpenType font Tunga that ships with XP has 407 glyphs, which is more than enough for printing almost any book in Kannada. In the OpenType font that I am making, I have knocked off some glyphs from this set.

Some publishers ask for special glyphs for printing Sanskrit scriptures in Kannada, music notations, etc. In an OpenType font we can add these extra glyphs in the Private Use Area of Unicode. To get these glyphs into the text that we type, we have to send their respective Unicode values. In Office XP this is done by typing the Unicode value and hitting Alt-X.

> On the Uniscribe engine... I have been reading the Microsoft
> Typography pages, and wasn't smart enough to guess whether Uniscribe
> simply substitutes the right ligature for the given sequence of
> characters (using the GSUB table?), or has some smart way of actually
> *creating* the needed glyphs on-the-fly. You seem to have some
> first-hand experience with it, perhaps you could help me?

Uniscribe renders the OpenType font. The logic for substitution and positioning of the glyphs is supplied by the font. Uniscribe CANNOT create any glyph on-the-fly.

Uniscribe works only on Windows 2000 and XP. It does not work on Windows 95/98/ME. What we need is a public domain OpenType font rendering engine for Indic scripts for all platforms, especially for Linux. Indix from NCST is supposed to be one. I have no experience with it. Does anyone?

Regards,
Pavanaja

-----------------------------------------------------
Dr. U.B. Pavanaja
Editor, Vishva Kannada
World's first Internet magazine in Kannada
http://www.vishvakannada.com/
Note: I don't worry about pselling mixtakes
|
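The point made above, that the substitution logic is supplied by the font and the engine cannot create glyphs on the fly, can be illustrated with a small sketch. This is a toy model only: the rule table, the glyph names, and the `shape` function are hypothetical and are not taken from Tunga, Uniscribe, or the real GSUB table format. It assumes the font provides a mapping from glyph sequences to ligature glyphs, and that the engine merely applies it, preferring the longest match.

```python
# Toy model of font-supplied ligature substitution (loosely GSUB-like).
# All glyph names and rules below are hypothetical examples.

# The "font" side: sequences of input glyphs mapped to one ligature glyph.
LIGATURE_RULES = {
    ("ka", "virama", "ssa"): "ka_virama_ssa.liga",
    ("ka", "virama", "ka"): "ka_virama_ka.liga",
    ("ta", "virama"): "ta.half",
}


def shape(glyphs, rules=LIGATURE_RULES):
    """Apply the font's rules left to right, longest match first.

    The "engine" side: it only looks sequences up in the rule table.
    A sequence with no rule is passed through unchanged, because the
    engine has no way to build a glyph the font does not contain.
    """
    max_len = max((len(key) for key in rules), default=1)
    out = []
    i = 0
    while i < len(glyphs):
        # Try the longest candidate first, down to sequences of length 2.
        for n in range(min(max_len, len(glyphs) - i), 1, -1):
            candidate = tuple(glyphs[i:i + n])
            if candidate in rules:
                out.append(rules[candidate])
                i += n
                break
        else:
            # No rule matched at this position: keep the glyph as it is.
            out.append(glyphs[i])
            i += 1
    return out


if __name__ == "__main__":
    print(shape(["ka", "virama", "ssa", "aa"]))
    print(shape(["pa", "virama", "pa"]))  # no rule in this toy font
```

In this sketch, adding a new ligature means adding an entry to the font's table; nothing changes in `shape`. A real OpenType font expresses far more than this (contextual rules, reordering, positioning), so the example only shows the division of labour between font and engine.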
From: Keyur S. <key...@ya...> - 2002-03-07 07:07:55
|
Hi,

--- Primoz Peterlin <pri...@bi...>
> Encouraged by the URW++ release of core 35 PostScript fonts under the
> terms of GNU GPL and the steady improvement of the PfaEdit PostScript font
> editor <http://pfaedit.sourceforge.net/>, I set myself a goal to compile a
> set of free (GPL-ed) outline fonts covering a range of ISO10646/Unicode as
> broad as reasonably achievable. The partial results of this effort are
> available on the project page,
> <http://savannah.gnu.org/projects/freefont/>.

This is really a very good effort.

> As a first question, I would like to ask whether there is any agreement on
> the sets of ligatures needed to render particular Indic scripts, i.e.

As far as I know, over the last few months our Ministry has been in the process of standardizing glyph sets for all Indic scripts. However, a technology like OpenType gives the font designer the freedom to define his/her own glyph set, so no standardization is required for OpenType fonts. Such standardization will still help in designing fonts for other kinds of technologies. I'll learn more about it when I attend a meeting with our Ministry and other organizations (C-DAC, IIT-Kanpur, etc.) in Delhi on Saturday.

> * a minimal set, e.g. for use in email (for instance, like the lam-alif
>   in Arabic)
> * a practical set, e.g. for use on WWW or in newspaper (required to
>   typeset a modern language)
> * a maximal set, including all glyphs needed to render traditional texts,
>   including rare or theoretical ligatures
>
> * Prof Joshi's Raghu font (468 ligatures)
>   http://rohini.ncst.ernet.in/indix/download/font/

The Raghu font has 674 glyphs and fits very much in the third category. We are also planning to design variants of the Raghu font which will fall in the first and second categories respectively. Hopefully within the next year we shall design an OpenType font for each of the other Indic scripts and put it in the public domain.

> What is the situation with other Indic scripts?

To my knowledge, there are two such widely used standards currently available in India. ISFOC defines a glyph set for each of the Indic scripts. TSCII is another standard, for Tamil script only. There may be other standards which I don't know about.

Regards,
Keyur
|