RE: [Classifier4j-devel] Bayesian Case Study
From: Matt C. <MCo...@my...> - 2003-11-14 04:36:23
Looking for Java stemmers, I found these:

Lovins Stemmer
http://sourceforge.net/projects/stemmers/

Snowball
Source code: http://snowball.tartarus.org/snowball_java.tgz
Home page: http://snowball.tartarus.org/

I don't even know what this is:
http://mailweb.udlap.mx/~hermes/javadoc/mx/udlap/ict/u_dl_a/irserver/qprocessors/EnglishStemmer.html

This is evidently the OFFICIAL Porter stemmer:
http://www.tartarus.org/~martin/PorterStemmer/

Lucene evidently uses Snowball, as previously stated by Moedusa.

One important piece of information I picked up from the vector-space material was to run the stop list BEFORE stemming.

That's it for now; surely one of these will do the trick.

Matt Collier
RemoteIT
mco...@my...
877-4-NEW-LAN

-----Original Message-----
From: Nick Lothian <nl...@es...>
To: "'cla...@li...'" <classifier4j-de...@li...>
Date: Fri, 14 Nov 2003 11:30:55 +1030
Subject: RE: [Classifier4j-devel] Bayesian Case Study

> > 3) the dreaded "s", a result no doubt of incorrectly
> > tokenizing possessive nouns and pronouns, contractions etc.
> > Anybody have a good algorithm for handling this?
>
> One way to handle it would be to run a Stemmer (search for "Porter Stemmer")
> on each word before classifying it.
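
For anyone following the thread, the "stop list before stemming" ordering mentioned in the message can be sketched roughly as below. This is only an illustration under stated assumptions: the class name `StopThenStem`, the tiny stop list, and the `toyStem` function are all hypothetical, and `toyStem` is a one-rule placeholder, not the Porter, Lovins, or Snowball algorithm. Any real stemmer could be passed in through the `UnaryOperator<String>` parameter instead. It assumes Java 9 or later for `Set.of`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

public class StopThenStem {
    // Hypothetical, deliberately tiny stop list (illustration only).
    // It includes "s" so the stray token left behind when "user's" is
    // split on the apostrophe gets dropped.
    static final Set<String> STOP_WORDS = Set.of(
            "the", "a", "an", "and", "are", "in", "is", "of", "s", "to", "this");

    // Placeholder stemmer: strips a single trailing "s" from longer words.
    // This is NOT the Porter algorithm; swap in a real stemmer here.
    static String toyStem(String word) {
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Tokenize, drop stop words FIRST, then stem whatever survives.
    static List<String> process(String text, UnaryOperator<String> stemmer) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // skip empty splits and stop words before stemming
            }
            result.add(stemmer.apply(token));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(
                process("The user's cats are in the gardens", StopThenStem::toyStem));
    }
}
```

One plausible reason for the ordering: a stemmer can change a word so that it no longer matches the stop list. With the toy rule above, "this" would become "thi" if stemmed first, and "thi" is not in the list.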