Re: Letter frequency in unicode (Was Re: [Indic-computing-devel] Free UCS outline font)
From: Arun S. <ar...@sh...> - 2002-03-11 17:14:23
On Mon, Mar 11, 2002 at 01:25:11AM -0800, Joseph Koshy wrote:
> > http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt
>
> A suggestion: could you print the Unicode numbers (i.e U+ABCD) along
> side the UTF-8 string displayed.
>
> This would help people on platforms without support for Unicode
> rendering to make sense of the data.

That's a good one. I've made the code change and am running the script
again now. By the time you read this, you should see the Unicode numbers in

http://www.sharma-home.net/~adsharma/languages/scripts/dict.txt

A quick profiling of the code indicated that the performance problems are
due to the string manipulation:

    str = str + "abc"

is inefficient in Python, because strings are immutable: each concatenation
in a loop creates a new string object and copies everything accumulated so
far. (This is true of Java also.) The trick is to collect the pieces in a
list and do string.join(list).

Will make the change later today.

-Arun
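
For anyone following along, here is a minimal sketch of the list-and-join idea applied to a frequency report that also prints the U+ number next to each character. It is not the actual script, which isn't shown in this thread. The function name format_report and the output layout are made up for illustration, and the syntax is current Python, where the join is spelled "\n".join(lines) rather than string.join(list).

```python
from collections import Counter


def format_report(text):
    """Return one line per character: code point, character, count.

    Hypothetical helper for illustration; not taken from the original script.
    """
    # Count every non-whitespace character in the input.
    counts = Counter(ch for ch in text if not ch.isspace())

    # Most frequent first; ties broken by code point so the order is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    lines = []  # collect the pieces in a list instead of growing one string
    for ch, n in ordered:
        lines.append("U+%04X %s %d" % (ord(ch), ch, n))

    # A single join builds the final string once, at the end.
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report("abracadabra"))
```

The same function works on any Unicode text, so a reader without font support can still identify each character from its U+ number.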