From: Neal R. <ne...@ri...> - 2002-12-02 21:11:29
This is on my list of things to work on.. An alternative is to have a
separate word stemmer which stores the words in the index in stemmed
form. The Porter Stemming algorithm is good for this, and I have code
to do it.

Thanks.

On Fri, 29 Nov 2002, Lachlan Andrew wrote:

> On Fri, 29 Nov 2002 02:21, Gilles Detillieux wrote:
> > if someone wants a dictionary
> > for some other language and finds that such a dictionary
> > is better supported or more complete/correct in aspell
>
> Ahh... That makes sense. Thanks. However I still don't
> understand why we wouldn't want the English dictionary to
> stem unrealised and realises together (which implicitly
> allows the non-word "unrealises").
>
> > A quick grep shows there are a lot of *hs words in there
> > that htfuzzy can't make use of. The quick fix would be
> > to grab one of the available flags (BCEFKLOQW) and use
> > that for th->ths, but it might be more logical to keep
> > the S flag for th->ths pluralizations, and use something
> > like E for th->thes conjugations.
>
> Yes. I've suggested a new rule to the ispell maintainer
> ([^cst]h -> s, [cst]h -> es) which fixes most problems
> (*gh,*ph), while maintaining compatibility with ispell as
> much as possible. You're right that we can add lots more
> rules to improve stemming in lots of ways.
>
> Cheers,
> Lachlan
>
> --
> Lachlan Andrew Phone: +613 8344-3816 Fax: +613 8344-6678
> Dept of Electrical and Electronic Engg CRICOS Provider Code
> University of Melbourne, Victoria, 3010 AUSTRALIA 00116K

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
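
For readers following the thread, the suffix rule quoted above ([^cst]h -> s, [cst]h -> es) can be sketched in a few lines of Python. This is only an illustration of the rule as written in the message, not code from ispell or htfuzzy; the function name add_s_suffix is hypothetical, and words not ending in "h" are simply left to other rules.

```python
import re

# Hypothetical helper illustrating the rule quoted in the thread:
#   [^cst]h -> s    (e.g. *gh, *ph words)
#   [cst]h  -> es   (ch, sh, th words)
# It is not taken from ispell or htfuzzy.
_ES_ENDING = re.compile(r"[cst]h$")
_S_ENDING = re.compile(r"[^cst]h$")


def add_s_suffix(word):
    """Return the suffixed form chosen by the quoted rule.

    Returns None when the word does not end in a letter followed by "h",
    meaning this particular rule does not apply.
    """
    if _ES_ENDING.search(word):
        return word + "es"
    if _S_ENDING.search(word):
        return word + "s"
    return None


if __name__ == "__main__":
    for w in ("laugh", "graph", "church", "wish", "cat"):
        print(w, "->", add_s_suffix(w))
```

Note that under the rule exactly as quoted, words ending in "th" take the "es" branch, which is the th->thes case discussed earlier in the thread; a separate flag would still be needed for th->ths plurals.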