From: Jimmy O'R. <jo...@gm...> - 2010-06-14 16:28:58
On Monday, June 14, 2010, Francis Tyers <ft...@pr...> wrote:
> On Mon, 14 Jun 2010 at 16:45 +0200, mikel otxandorena wrote:
>> Hi, I am a student who is doing a small project. I have to do
>> something similar to Porter stemming for the Basque language and
>> maybe you can help me by telling me if you have some open code in
>> your project!!
>
> Kaixo Mikel,
>
> I'm not entirely sure how the Porter stemmer works, but suspect that it
> is probably just removing common suffixes from English words (e.g. -ing,
> -ised, -ed etc.)
>

Almost, but close enough. Mikel, are you just writing a stemmer, or are
you writing it in Porter's Snowball language specifically? If the latter,
the Snowball list would perhaps be the best place to direct your
questions.

> There are morphological analysers for Basque that will do the same
> trick. The IXA group has one, 'Xuxen' I think it is called, but it isn't
> open source. There is a partial conversion of this in the Apertium
> project:
>
> https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-eu-es/apertium-eu-es.eu.dix
>
> But your mileage might vary.
>

The <l> parts of the paradigms will provide you with most of what you
will need to trim for a Porter-type stemmer, but not *when*.

> The other thing you could look at is the hunspell Basque spellchecker
> (see e.g. hunspell-eu-es in Debian/Ubuntu), which has a rather long
> affix file eu-ES.aff.

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
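
To make the "what to trim, but not *when*" point concrete, here is a minimal sketch of a suffix stripper in Python. It is not Porter's algorithm and is not taken from the Apertium paradigms: the suffix list, the MIN_STEM_LENGTH threshold and the strip_suffix name are all made up for illustration. The length check is only a stand-in for the conditions a real stemmer would need before trimming.

```python
# Hypothetical suffix list, for illustration only. A real list might be
# derived from a resource such as the <l> parts of the .dix paradigms.
SUFFIXES = ["ised", "ing", "ed"]

# Assumed "when" condition: only strip if this many characters remain.
MIN_STEM_LENGTH = 3


def strip_suffix(word, suffixes=SUFFIXES, min_stem=MIN_STEM_LENGTH):
    """Remove the longest matching suffix, if enough of the word remains."""
    # Try longer suffixes first so "ised" is preferred over "ed".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    # No suffix matched, or stripping would leave too short a stem.
    return word
```

With these assumed values, strip_suffix("walking") gives "walk", while "sing" and "red" are left alone because too little of the word would remain. Knowing the suffixes is the easy half; the conditions are where most of the work lies.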