Yes, it can be automated. You can use affixcompress, and then add a few extra rules to the .aff file, such as "ණ <-> න".
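A rough sketch of how the two steps could be scripted, in Python rather than a shell script or makefile. Several things here are assumptions and not what I actually ran: the word list is taken to be a UTF-8 text file with one word per line, affixcompress is assumed to be on the PATH and to write its output next to the input as <wordlist>.aff and <wordlist>.dic, and the extra rules are written as Hunspell MAP entries (the real si-LK.aff may express them differently). The names EXTRA_RULES, prepend_rules and build are made up for illustration. Downloading the list from the UCSC page is left out.

```python
import subprocess
from pathlib import Path

# Assumption: the confusable letter pairs are expressed as Hunspell MAP
# entries. The first line gives the number of MAP definitions that follow.
EXTRA_RULES = ["MAP 2", "MAP ණන", "MAP ලළ"]


def prepend_rules(aff_text: str, rules: list[str]) -> str:
    """Return aff_text with the given rule lines placed at the top."""
    return "\n".join(rules) + "\n" + aff_text


def build(wordlist: Path) -> None:
    """Compress a word list with affixcompress, then add the extra rules."""
    # affixcompress expects a sorted word list, so sort and de-duplicate first.
    lines = wordlist.read_text(encoding="utf-8").splitlines()
    words = sorted({w.strip() for w in lines if w.strip()})
    wordlist.write_text("\n".join(words) + "\n", encoding="utf-8")

    # Assumption: output lands in <wordlist>.aff and <wordlist>.dic.
    subprocess.run(["affixcompress", str(wordlist)], check=True)

    aff_path = Path(str(wordlist) + ".aff")
    aff_text = aff_path.read_text(encoding="utf-8")
    aff_path.write_text(prepend_rules(aff_text, EXTRA_RULES), encoding="utf-8")
```

Calling build(Path("wordlist.txt")) on the downloaded list (the file name is hypothetical) should then leave wordlist.txt.aff and wordlist.txt.dic ready to be renamed to si-LK.aff and si-LK.dic.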
Can the processing steps be automated in a shell script or makefile?
That way, Parag can download the UCSC word list and build the final output.
On Thu, 2012-08-23 at 11:09 +0530, Sandaruwan Gunathilake wrote:
> The original word list is still available on the UCSC
> page: http://www.ucsc.cmb.ac.lk/ltrl/?page=downloads
> I don't have the processed file at the moment. I'll dig up my backups
> and check whether I still have it. It's still in the Firefox add-on,
> though: https://addons.mozilla.org/en-us/firefox/addon/sinhala-spellchecker/
> On Thu, Aug 23, 2012 at 10:45 AM, Harshula <email@example.com> wrote:
> Hi Sandaruwan,
> Parag (CC'd) is wondering where the upstream source tarball for the
> word list went?
> On Mon, 2010-07-05 at 00:59 +0530, Sandaruwan Gunathilake wrote:
> > Hi,
> > On Sun, Jul 4, 2010 at 11:57 PM, Harshula <firstname.lastname@example.org> wrote:
> > Hi Sandaruwan,
> > On Sun, 2010-07-04 at 22:01 +0530, Sandaruwan wrote:
> > > What about the Sinhala word list on the UCSC language lab page?
> > >
> > > http://www.ucsc.cmb.ac.lk/ltrl/?page=downloads
> > >
> > > I switched to that word list in spellchecker 0.2.
> > The LTRL word list states it has 70142 distinct Sinhala words.
> > si-LK.dic appears to have 26707 words. Did you take a subset of the
> > words from the LTRL word list?
> > No, everything is there. I just compressed the word list with the
> > "affixcompress" utility and added a few extra rules at the top of
> > the .aff file to support "ණ/න/ල/ළ", etc.
> > --
> > Best Regards,
> > Sandaruwan Gunathilake
> Best Regards,
> Sandaruwan Gunathilake