RE: [Classifier4j-devel] Bayesian Case Study
From: Matt C. <MCo...@my...> - 2003-11-14 05:23:42
Attached are input and output files from the Snowball stemmer. Clearly we need to remove punctuation before stemming with this one. Does this look OK?

Does anybody know why these stemmers like using input strings and single-character inputs? How do we quickly and easily send a string to this class?

Matt Collier
RemoteIT
mco...@my...
877-4-NEW-LAN

-----Original Message-----
From: "Matt Collier" <MCo...@my...>
To: cla...@li...
Date: Thu, 13 Nov 2003 22:38:19 -0600
Subject: RE: [Classifier4j-devel] Bayesian Case Study

> Looking for Java stemmers, I found these:
>
> Lovins Stemmer
> http://sourceforge.net/projects/stemmers/
>
> Snowball
> Source Code
> http://snowball.tartarus.org/snowball_java.tgz
> Home Page
> http://snowball.tartarus.org/
>
> I don't even know what this is:
> http://mailweb.udlap.mx/~hermes/javadoc/mx/udlap/ict/u_dl_a/irserver/qprocessors/EnglishStemmer.html
>
> This is evidently the OFFICIAL Porter stemmer
> http://www.tartarus.org/~martin/PorterStemmer/
>
> Lucene evidently uses Snowball, as previously stated by Moedusa.
>
> One important piece of information I picked up from the vector-space
> information was to run the stop-list BEFORE stemming.
>
> That's it for now, surely one of these will do the trick.
>
> Matt Collier
> RemoteIT
> mco...@my...
> 877-4-NEW-LAN
>
>
> -----Original Message-----
> From: Nick Lothian <nl...@es...>
> To: "'cla...@li...'" <classifier4j-de...@li...>
> Date: Fri, 14 Nov 2003 11:30:55 +1030
> Subject: RE: [Classifier4j-devel] Bayesian Case Study
>
> > > 3) the dreaded "s", a result no doubt of incorrectly
> > > tokenizing possessive nouns
> > > and pronouns, contractions etc. Anybody have a good
> > > algorithm for handling
> > > this?
> >
> > One way to handle it would be to run a Stemmer (search for "Porter Stemmer")
> > on each word before classifying it.
> >
> > -------------------------------------------------------
> > This SF.Net email sponsored by: ApacheCon 2003,
> > 16-19 November in Las Vegas. Learn firsthand the latest
> > developments in Apache, PHP, Perl, XML, Java, MySQL,
> > WebDAV, and more! http://www.apachecon.com/
> > _______________________________________________
> > Classifier4j-devel mailing list
> > Cla...@li...
> > https://lists.sourceforge.net/lists/listinfo/classifier4j-devel
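
For anyone following the thread, here is a rough Java sketch of the order of operations discussed above: strip punctuation, apply the stop-list, then stem. It is only an illustration under stated assumptions. The class name `StemPipeline`, the tiny stop-list, and the `naiveStem` placeholder are all made up for this example; `naiveStem` is not the Porter or Snowball algorithm, it just trims a possessive "'s" or a plural "s". The stemmer is passed in as a plain `String -> String` function, so a real stemmer could be wrapped in the same way. The Snowball Java classes are usually driven with `setCurrent(word)`, `stem()`, `getCurrent()`, but check the version you downloaded before relying on that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

class StemPipeline {
    // Tiny illustrative stop-list (an assumption); a real one would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Placeholder stemmer, NOT Porter/Snowball: trims a possessive 's or a plural s.
    static String naiveStem(String word) {
        if (word.endsWith("'s")) {
            return word.substring(0, word.length() - 2);
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Lowercase, split on whitespace, strip punctuation, drop stop words, then stem.
    static List<String> process(String text, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Remove leading/trailing non-alphanumerics; inner apostrophes survive.
            String token = raw.replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // stop-list runs BEFORE stemming
            }
            out.add(stemmer.apply(token));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
                process("The classifier's tokens, and the stemmers!", StemPipeline::naiveStem));
    }
}
```

Because `process` takes the whole string and hands the stemmer one cleaned word at a time, the caller never has to feed characters in individually; swapping in a real stemmer only means replacing the `StemPipeline::naiveStem` argument.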