More ideas (there are some research papers available on some of these topics too):

Sinhala morphological analyzer (this will be useful for almost all Sinhala-related NLP applications) - Sinhala stemming algo?
Sinhala sentence boundary detection
Sinhala named entity recognition
Sinhala name transliteration (Soundex Algo for Sinhala?)
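
To give a feel for the sentence boundary detection idea, here is a minimal rule-based sketch in Python. It only assumes that sentences end with ".", "?" or "!" followed by whitespace. The abbreviation list and the single-character "initial" rule are hypothetical placeholders; a real Sinhala system would need a curated abbreviation list and would probably combine rules with statistics.

```python
import re

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"Dr", "Mr", "Mrs", "Prof"}

# Candidate boundary: '.', '?' or '!' followed by whitespace.
_BOUNDARY = re.compile(r"([.?!])\s+")


def split_sentences(text):
    """Split text into sentences using simple punctuation rules."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        # Token immediately before the punctuation mark.
        tokens = text[start:match.start()].split()
        last_token = tokens[-1] if tokens else ""
        # A period after a known abbreviation or a one-character
        # initial is assumed NOT to end the sentence.
        if match.group(1) == "." and (
            last_token in ABBREVIATIONS or len(last_token) == 1
        ):
            continue
        sentences.append(text[start:match.end(1)].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Ruvan wrote a paper. It is about MT. Is it useful?"))
```

Note that the one-character rule counts code points, so a Sinhala initial written with a vowel sign would not be caught by it; that is one of the places where language-specific work is needed.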

Re: Sinhala <-> MT system,

Some work has been reported on this (see publications by Dr. Ruvan Weerasinghe & J. Liyanapathirana - http://ucsc.cmb.ac.lk/index.php?option=com_content&task=category&sectionid=8&id=106&Itemid=106). Unless you are going to develop a rule-based system, you will need a parallel corpus (at least 1 million sentences or so) to train your system.

Re: Subasa spell checker.
The architecture of Subasa is completely different from the Hunspell/Aspell architecture and is based on n-gram statistics (more info can be found in the papers cited by Eranga). Maybe you could focus on improving Hunspell by incorporating a Sinhala morphological analyzer module (e.g. to improve the accuracy of Hunspell's suffix/prefix generation algo).
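
To illustrate what "based on n-gram statistics" can mean in general, here is a toy Python sketch that counts character bigrams in a known-good word list and flags words containing bigrams never seen in training. This is not Subasa's actual algorithm (see the cited papers for that); the function names and the three-word training list are made up for illustration.

```python
from collections import Counter


def char_bigrams(word):
    """Return the character bigrams of a word, with ^ and $ as
    start/end markers. Works on Unicode code points, so Sinhala
    vowel signs count as separate characters."""
    padded = "^" + word + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(words):
    """Count how often each bigram occurs in a list of correct words."""
    counts = Counter()
    for word in words:
        counts.update(char_bigrams(word))
    return counts


def unseen_bigrams(word, counts):
    """List the bigrams of the word that never occurred in training."""
    return [bg for bg in char_bigrams(word) if counts[bg] == 0]


def looks_misspelled(word, counts):
    """Flag a word if it contains at least one unseen bigram."""
    return bool(unseen_bigrams(word, counts))


# Tiny hypothetical training list; a real model needs a large corpus.
counts = train(["ගස", "ගම", "මල"])
print(looks_misspelled("ගම", counts))
print(unseen_bigrams("සග", counts))
```

With such a small training list the check produces many false alarms on valid words; a practical system would use a large corpus and probabilities rather than a seen/unseen test.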

Hope this helps

Asanka

 

On Mon, Jan 3, 2011 at 12:51 AM, Buddhika Laknath <blaknath@gmail.com> wrote:
Hi Maheeka,

As you have noticed, some work has been done in the field of Sinhala spell checking and on support for popular applications such as Firefox, OpenOffice, etc. Currently most of these applications use Hunspell as the spell-checking framework, and using a common framework across many applications comes in handy because it gives a chance to focus on improving wordlists rather than getting used to various technologies.

At the moment there are several wordlists, including the UCSC word list and one being developed by me. If you are interested in this approach of building/improving spell checkers, it might be better to set up Hunspell and check how it works out with Sinhala (this might help as a starting point). Another approach would be to forgo any frameworks and build a spell checker from scratch specifically for Sinhala. This could be advantageous for exploiting some unique Sinhala language features, such as some forms of "Sandhi", to make the spell-checking process more accurate, but on the other hand it would have the disadvantage of being unusable with common text editing applications.
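
As a rough illustration of the wordlist-plus-affix idea behind Hunspell (which expresses such rules as SFX/PFX entries in its .aff file), here is a small Python sketch that expands stems with flagged suffix rules. It does not use Hunspell itself; the flag "A", the single rule, and the two stems are hypothetical examples, not entries from any existing Sinhala dictionary.

```python
# Hypothetical suffix rules, keyed by flag. Each rule is
# (text to strip from the end of the stem, text to append).
AFFIX_RULES = {
    "A": [("", "ට")],  # assumed rule: append the dative marker
}

# Hypothetical dictionary: stem -> string of flags.
DICTIONARY = {"ගස": "A", "මල": "A"}


def expand(stem, flags, rules):
    """Return the stem plus every form produced by its flagged rules."""
    forms = {stem}
    for flag in flags:
        for strip, add in rules.get(flag, []):
            if stem.endswith(strip):
                base = stem[:len(stem) - len(strip)]
                forms.add(base + add)
    return forms


def build_wordlist(dictionary, rules):
    """Expand every dictionary entry into its accepted word forms."""
    words = set()
    for stem, flags in dictionary.items():
        words |= expand(stem, flags, rules)
    return words


words = build_wordlist(DICTIONARY, AFFIX_RULES)
print(sorted(words))
```

The point of the design is that improving coverage means adding stems and rules to the data, not changing code, which is why a shared framework lets people concentrate on the wordlists.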

Hope this helps.

-Laknath



On 01/01/2011 06:08 PM, Maheeka Jayasuriya wrote:
Thank you for your concern regarding this matter. I hope to get help and support from everyone experienced and knowledgeable in this area as my project continues. Thanks.

Regards, 
Maheeka

On Sat, Jan 1, 2011 at 4:49 PM, Harshula <harshula@gmail.com> wrote:
Hi All,

There are a number of you who can probably help Maheeka.

cya,
#

-------- Forwarded Message --------

I am a final-year student at APIIT. I have to select my final-year
project within this week. After going through many ideas, I have
ended up with a few useful ones regarding Sinhala.

Initially I wanted to create an English-to-Sinhala translator, which
I know is not an easy task, and it is almost impossible within the 8
months of my FYP period. So after I talked with my supervisor, he
advised me to go for a Sinhala-to-English translator, which is a
little more manageable than the reverse, considering the time. Even
then I had to select a limited area of the language. The reason I had
to limit my scope is that I have to develop everything from scratch.
Therefore, I have selected only sentences adhering to "prathama
wibhakthi" and "sambanda wibhakthi".

My other idea was to create a Sinhala spell checker. After some
googling I found out that there is already a spell checker for
OpenOffice and an add-on for Firefox. The Firefox add-on does not
work with the newer versions, and the OpenOffice plug-in needs some
improvements to its word set. After going through the threads in
this forum, I saw a thread that referred to the need for a browser
spell checker. I explored this idea further and informed my
supervisor of my intention to develop such a plug-in. I personally
prefer this over the translator, but I have not confirmed my
selection as yet.

Since it is the vacation, I still have not been able to get my
supervisor's opinion about these two ideas. But since both are about
Sinhala, I would like to get support from this group's members for
this project. I would like to hear your ideas regarding this matter.

Regards,

Maheeka

_______________________________________________
sinhala-technical mailing list
sinhala-technical@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sinhala-technical