beginner's guide?

Help
paul
2008-10-31
2013-06-03
  • paul

    paul - 2008-10-31

    I'm here because I thought I could make an OpenOffice.org spellchecker for the minority language I'm working with, to help the language community get access to open source software, that kind of thing. Just looking around the SourceForge site here, I'm quickly realising that this is significantly more complicated than taking the wordlist I have and creating an 'extension' for the OpenOffice dictionary repository... Is there an easier way to implement a spellchecker for a minority language (and I don't mean using a workaround such as replacing the contents of an existing spellchecker, as suggested on some forum threads)? Any help would be greatly appreciated! Thanks to all who read and respond to this thread.

     
    • Jeppe Bundsgaard

      Hi Paul
      It sure isn't a simple task to create a well-functioning spell checker. That is not only because of the technical tools, but also because you have to describe the system of inflections for the language. This is both a pretty complex linguistic task and a challenge to express in technical language.

      What you have to do is describe the system of inflections in a so-called affix file, and then tag the words in your word list in relation to these affixes. You can read about this in the documentation - this document: https://sourceforge.net/docman/display_doc.php?docid=29374&group_id=143754 will become your closest friend ;-)
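
To make the idea of an affix file plus a tagged word list a bit more concrete, here is a minimal sketch in Python. It assumes the Hunspell-style layout used by OpenOffice.org dictionaries (SFX lines in the affix file, word/FLAG entries in the word list) and handles only simple suffix rules. The English sample data and the names parse_suffix_rules and expand are made up for illustration; the real format has many more features (prefixes, cross products, compounding), so treat this as a toy, not as how the spell checker itself works.

```python
import re
from typing import NamedTuple


class SuffixRule(NamedTuple):
    flag: str       # the tag that word-list entries refer to
    strip: str      # characters removed from the end of the word
    add: str        # characters appended after stripping
    condition: str  # pattern the end of the word must match


# Hypothetical affix file: flag S builds a plural.
AFF_TEXT = """\
SFX S Y 2
SFX S 0 s [^sy]
SFX S y ies y
"""

# Hypothetical word list: first line is the entry count, "/S" tags a word.
DIC_TEXT = """\
3
cat/S
pony/S
sheep
"""


def parse_suffix_rules(aff_text):
    """Collect SFX rule lines; header lines have only four fields."""
    rules = []
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] == "SFX":
            _, flag, strip, add, condition = parts
            rules.append(SuffixRule(
                flag,
                "" if strip == "0" else strip,  # "0" means nothing to strip
                "" if add == "0" else add,      # "0" means nothing to add
                condition,
            ))
    return rules


def expand(dic_text, rules):
    """Return every word form that the tagged word list generates."""
    forms = set()
    for entry in dic_text.splitlines()[1:]:  # skip the count line
        entry = entry.strip()
        if not entry:
            continue
        word, _, flags = entry.partition("/")
        forms.add(word)
        for rule in rules:
            # Apply a rule only if the word carries its flag and its
            # ending satisfies the rule's condition.
            if rule.flag in flags and re.search(rule.condition + "$", word):
                stem = word[:len(word) - len(rule.strip)] if rule.strip else word
                forms.add(stem + rule.add)
    return forms


if __name__ == "__main__":
    print(sorted(expand(DIC_TEXT, parse_suffix_rules(AFF_TEXT))))
```

The point of the sketch is that one tagged entry such as pony/S stands for several accepted forms, which is why describing the inflection rules well matters more than simply having a long word list.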

      In the Danish word list project, www.Stavekontrolden.dk, we have developed a web application that helps us do the job of describing the affix file and tagging the words. If you are interested, we will be happy to share the system with you and help you get going - but I won't pretend it will be a piece of cake anyway. So what you may want to do is gather a team so you don't have to do all the work yourself.

      We were lucky to get access to a very large corpus of words that were already tagged (in a different system, though), which helped us reach a critical mass of words in our list. Maybe you can get access to a similar corpus in your language.

      Best regards,
      Jeppe

       
