Re: Letter frequency in unicode (Was Re: [Indic-computing-devel] Free UCS outline font)
From: Guntupalli K. <kar...@fr...> - 2002-03-11 08:36:35
On Sun, 10 Mar 2002 23:15:57 -0800 Arun Sharma <ar...@sh...> wrote:
> On Sun, Mar 10, 2002 at 10:53:05PM -0800, Arun Sharma wrote:
> > The above data can be used to
> >
> > (a) Design keyboards based on the analysis of which syllables are
> >     more frequent and which syllables often occur next to each
> >     other etc.
> > (b) Publish simplified keyboards and fonts, which
> >     contain smaller, more manageable, but incomplete subsets of the
> >     language/script.
> >
> > I'd love to run these scripts on large bodies of unicode text in
> > Indian languages. Any suggestions on where to get such text ?

This site contains a lot of text, though not in Unicode (ISCII versions
were there when I last checked, though I can't get through to it now):

http://sanskrit.gde.to

Contact the LTRC team at IIIT Hyd (vc at iiit.net, amba at iiit.net);
they have large amounts of ISCII text.

> I ran it on the last 20,000 lines of a UTF-8 encoded
> English-Hindi dictionary.
>
> http://www.sharma-home.net/~adsharma/languages/scripts/dict.txt
>
> For those who can't read unicode, top 5 syllables:
>
> 1. ra
> 2. ka
> 3. nA
> 4. ta
> 5. pa

This supports the Inscript keyboard layout. On the Inscript layout
they are under these keys:

ra - 'j'
ka - 'k'
nA - 'v'
ta - 'l'
pa - 'h'

Similarly, halant is on 'd' and vowel sign I (VS I) is on 'f'.

And these keys are under the normal finger positions
'a s d f' 'h j k l' on a typewriter :)

Regards,
Karunakar
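
For anyone who wants to try this kind of check on their own text, here is a rough sketch in Python. It is not Arun's script. It counts individual Devanagari code points rather than full syllables, which is a simplification, and the key table holds only the seven assignments listed above, not the whole Inscript layout. The names INSCRIPT_KEYS, REST_KEYS, devanagari_counts and rest_key_share are made up for this illustration.

```python
from collections import Counter

# Key assignments as listed in the message above (assumption: this is
# only a partial table, not the complete Inscript layout).
INSCRIPT_KEYS = {
    "\u0930": "j",  # DEVANAGARI LETTER RA
    "\u0915": "k",  # DEVANAGARI LETTER KA
    "\u0928": "v",  # DEVANAGARI LETTER NA
    "\u0924": "l",  # DEVANAGARI LETTER TA
    "\u092A": "h",  # DEVANAGARI LETTER PA
    "\u094D": "d",  # DEVANAGARI SIGN VIRAMA (halant)
    "\u093F": "f",  # DEVANAGARI VOWEL SIGN I
}

# The finger-position keys named in the message: 'a s d f' 'h j k l'.
REST_KEYS = set("asdfhjkl")


def devanagari_counts(text):
    """Count code points in the Devanagari block (U+0900..U+097F)."""
    return Counter(ch for ch in text if "\u0900" <= ch <= "\u097F")


def rest_key_share(text):
    """Fraction of Devanagari code points whose listed key is in REST_KEYS.

    Characters missing from INSCRIPT_KEYS count as not on a rest key.
    """
    counts = devanagari_counts(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    on_rest = sum(
        n for ch, n in counts.items() if INSCRIPT_KEYS.get(ch) in REST_KEYS
    )
    return on_rest / total


if __name__ == "__main__":
    sample = "\u0915\u0930 \u0928\u0924"  # ka ra, na ta
    for ch, n in devanagari_counts(sample).most_common():
        print(f"U+{ord(ch):04X} {n} key={INSCRIPT_KEYS.get(ch, '?')}")
    print(rest_key_share(sample))
```

To run it on a real corpus, read a UTF-8 file with open(path, encoding="utf-8").read() and pass the result to rest_key_share. ISCII text would need converting to Unicode first.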