Re: Letter frequency in unicode (Was Re: [Indic-computing-devel] Free UCS outline font)


On Mon, Mar 11, 2002 at 01:25:11AM -0800, Joseph Koshy wrote:
> 
> 
> > http://www.sharma-home.net/~adsharma/languages/scripts/freq.txt
> 
> A suggestion: could you print the Unicode numbers (e.g. U+ABCD)
> alongside the UTF-8 string displayed?
> 
> This would help people on platforms without support for Unicode
> rendering to make sense of the data.

That's a good one. I've made the code change and am running the script
again now; by the time you read this, you should see the Unicode numbers in

http://www.sharma-home.net/~adsharma/languages/scripts/dict.txt
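
For anyone wanting to do the same with their own data, here is a minimal
sketch of the formatting in Python. It is not the script that produced
dict.txt (that code isn't shown here); the function name describe is
hypothetical.

```python
def describe(text):
    """Return text followed by its code points in U+XXXX notation."""
    # ord() gives the code point; %04X pads to at least four hex digits.
    codes = " ".join("U+%04X" % ord(ch) for ch in text)
    return "%s  (%s)" % (text, codes)
```

For example, describe("\u0915") (DEVANAGARI LETTER KA) returns
"क  (U+0915)", so the entry can still be read on a terminal that
cannot render Devanagari.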

A quick profiling run showed that the performance problems come from
string manipulation of the form

    str = str + "abc"

This is inefficient in Python because strings are immutable: each
concatenation in a loop creates a new string object and copies
everything accumulated so far, so the total work grows roughly
quadratically with the length of the result. (The same is true of
Java.) The trick is to collect the pieces in a list and call
string.join(list) once at the end. I'll make the change later today.
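
A minimal sketch of the two approaches, written for current Python: the
string-module function string.join no longer exists in Python 3, and the
equivalent is the str method "".join(parts). The function names and the
sample data are hypothetical. Note that newer CPython releases can
sometimes optimize s = s + p in place, so the measured gap may be smaller
than the quadratic argument suggests; the list-and-join form is the one
that is efficient regardless of interpreter.

```python
import timeit

def build_by_concat(pieces):
    # Each + builds a new string and copies everything accumulated so far.
    s = ""
    for p in pieces:
        s = s + p
    return s

def build_by_join(pieces):
    # Collect the pieces in a list, then join them once at the end.
    parts = []
    for p in pieces:
        parts.append(p)
    return "".join(parts)

pieces = ["abc"] * 5000
assert build_by_concat(pieces) == build_by_join(pieces)

if __name__ == "__main__":
    # Time each strategy on the same input.
    for fn in (build_by_concat, build_by_join):
        t = timeit.timeit(lambda: fn(pieces), number=10)
        print("%s: %.4f s" % (fn.__name__, t))
```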

	-Arun