On Thu, Aug 23, 2012 at 7:53 PM, Harshula <harshula@gmail.com> wrote:
Hi Sandaruwan,

1) I did the following:

iconv -f UTF-16 -t UTF-8 $src | awk '{print $1}' | LANG=si_LK.UTF-8 sort -u -k 1 > $dst

affixcompress $dst 2> /dev/null
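
For reference, the conversion step above could be wrapped in a small shell function. This is only a sketch: the name extract_words and the optional third argument (the collation locale) are made up for illustration, and it sets LC_ALL instead of LANG so that an LC_ALL already present in the environment cannot override the choice. The si_LK.UTF-8 locale has to be installed for Sinhala collation to take effect.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are illustrative, not from the thread).
# Usage: extract_words SRC DST [LOCALE]
#   SRC    - UTF-16 word list, one entry per line, word in the first column
#   DST    - output file: UTF-8, sorted, duplicates removed
#   LOCALE - collation locale; defaults to si_LK.UTF-8 as in the command above
extract_words() {
    src=$1
    dst=$2
    locale=${3:-si_LK.UTF-8}

    # Convert to UTF-8, keep only the first whitespace-separated field,
    # then sort and de-duplicate under the requested locale.
    iconv -f UTF-16 -t UTF-8 "$src" |
        awk '{print $1}' |
        LC_ALL="$locale" sort -u -k 1 > "$dst"
}
```

Calling extract_words "$src" "$dst" should then behave like the one-liner, with the file names quoted so that paths containing spaces are handled.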

2) I noted you added the following:
--------------------------------------
SET UTF-8

TRY ්ාෘුැූෑිීෙ

REP 25
REP න ණ
REP ණ න
REP ල ළ
REP ළ ල
REP ස ෂ
REP ෂ ස
REP ස ශ
REP ශ ස
REP ච ඡ
REP ඡ ච
REP බ භ
REP භ බ
REP ද ධ
REP ධ ද
REP ර් ්‍ර
REP ට ඨ
REP ඨ ට
REP ක ඛ
REP ඛ ක
REP ඩ ඪ
REP ඪ ඩ
REP ඉ ඊ
REP ඊ ඉ
REP ප ඵ
REP ඵ ප
--------------------------------------
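
Since the "REP 25" header has to match the number of REP entries that follow it, a quick consistency check may be handy whenever pairs are added or removed. The function below is a rough sketch; check_rep_count is a hypothetical name, and it assumes the header is the first REP line with a single argument.

```shell
#!/bin/sh
# Hypothetical helper: verify that the "REP <count>" header in an .aff file
# equals the number of "REP <from> <to>" entries.
# Usage: check_rep_count FILE.aff   (exit status 0 if consistent, 1 otherwise)
check_rep_count() {
    awk '
        # First two-field REP line is taken to be the count header.
        $1 == "REP" && NF == 2 && !seen { declared = $2; seen = 1; next }
        # Every REP line with at least a pattern and a replacement is an entry.
        $1 == "REP" && NF >= 3 { entries++ }
        END {
            if (!seen) { print "no REP header found"; exit 1 }
            if (declared + 0 != entries + 0) {
                printf "REP header says %d, found %d entries\n", declared, entries
                exit 1
            }
        }
    ' "$1"
}
```

It prints nothing and returns 0 when the count matches, and prints a short message and returns 1 otherwise.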
Does that complete the steps from the UCSC word list file to the dictionary?


Yes. BTW, the word list in DistinctWords.txt is already sorted, so you can omit the sort step.
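
Before dropping the sort, it may be worth confirming the ordering, since sort -u also removes duplicates. A minimal sketch, assuming the list has already been converted to UTF-8 with only the first column kept; is_sorted_unique is a hypothetical name, and the si_LK.UTF-8 default mirrors the locale in the original command:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE is already sorted under LOCALE
# and contains no duplicate lines (POSIX "sort -c -u" checks both).
# Usage: is_sorted_unique FILE [LOCALE]
is_sorted_unique() {
    LC_ALL="${2:-si_LK.UTF-8}" sort -c -u "$1" 2> /dev/null
}
```

If is_sorted_unique returns 0 for the converted list, skipping the sort should not change the result.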
 
3) Were you able to determine the license under which the UCSC word list
is distributed? The word list license would impact your right to
distribute a derived work.

No, it doesn't appear to be mentioned anywhere. Perhaps someone affiliated with Language Lab can resolve this?

--
Best Regards,
Sandaruwan Gunathilake