RE: [Classifier4j-devel] Bayesian Case Study
From: Matt C. <MCo...@my...> - 2003-11-14 05:29:06
Another little Snowball stemming test. I suppose consistency is the key to the stemming process, whatever the outcome.

Matt Collier
RemoteIT
mco...@my...
877-4-NEW-LAN

-----Original Message-----
From: "Matt Collier" <MCo...@my...>
To: cla...@li...
Date: Thu, 13 Nov 2003 23:25:19 -0600
Subject: RE: [Classifier4j-devel] Bayesian Case Study

> Attached are input and output files from the Snowball stemmer. Clearly we
> need to remove punctuation before stemming with this one. Does this look OK?
>
> Does anybody know why these stemmers like using input streams and
> single-character inputs? How do we quickly and easily send a string to this
> class?
>
> Matt Collier
> RemoteIT
> mco...@my...
> 877-4-NEW-LAN
>
> -----Original Message-----
> From: "Matt Collier" <MCo...@my...>
> To: cla...@li...
> Date: Thu, 13 Nov 2003 22:38:19 -0600
> Subject: RE: [Classifier4j-devel] Bayesian Case Study
>
> > Looking for Java stemmers, I found these:
> >
> > Lovins Stemmer
> > http://sourceforge.net/projects/stemmers/
> >
> > Snowball
> > Source Code
> > http://snowball.tartarus.org/snowball_java.tgz
> > Home Page
> > http://snowball.tartarus.org/
> >
> > I don't even know what this is:
> > http://mailweb.udlap.mx/~hermes/javadoc/mx/udlap/ict/u_dl_a/irserver/qprocessors/EnglishStemmer.html
> >
> > This is evidently the OFFICIAL Porter stemmer
> > http://www.tartarus.org/~martin/PorterStemmer/
> >
> > Lucene evidently uses Snowball, as previously stated by Moedusa.
> >
> > One important piece of information I picked up from the vector-space
> > information was to run the stop list BEFORE stemming.
> >
> > That's it for now; surely one of these will do the trick.
> >
> > Matt Collier
> > RemoteIT
> > mco...@my...
> > 877-4-NEW-LAN
> >
> > -----Original Message-----
> > From: Nick Lothian <nl...@es...>
> > To: "'cla...@li...'" <classifier4j-de...@li...>
> > Date: Fri, 14 Nov 2003 11:30:55 +1030
> > Subject: RE: [Classifier4j-devel] Bayesian Case Study
> >
> > > > 3) the dreaded "s", a result no doubt of incorrectly
> > > > tokenizing possessive nouns
> > > > and pronouns, contractions etc. Anybody have a good
> > > > algorithm for handling
> > > > this?
> > >
> > > One way to handle it would be to run a stemmer (search for "Porter
> > > Stemmer") on each word before classifying it.
> > >
> > > -------------------------------------------------------
> > > This SF.Net email sponsored by: ApacheCon 2003,
> > > 16-19 November in Las Vegas. Learn firsthand the latest
> > > developments in Apache, PHP, Perl, XML, Java, MySQL,
> > > WebDAV, and more! http://www.apachecon.com/
> > > _______________________________________________
> > > Classifier4j-devel mailing list
> > > Cla...@li...
> > > https://lists.sourceforge.net/lists/listinfo/classifier4j-devel
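
For reference, the ordering discussed in this thread (strip punctuation, run the stop list, then stem each word) could be sketched roughly as below. This is only an illustration under assumptions: StemPipeline, STOP_WORDS, naiveStem and preprocess are hypothetical names, not Classifier4J or Snowball code, and naiveStem is a trivial stand-in rather than a Porter or Snowball implementation. A real stemmer would be passed in through the stemmer parameter instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Classifier4J or Snowball.
class StemPipeline {
    // Tiny illustrative stop list; a real one would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "is"));

    // Stand-in stemmer (NOT Porter or Snowball): drops a single trailing "s"
    // from words longer than three characters that do not end in "ss".
    static String naiveStem(String word) {
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Lowercase, drop possessive 's, turn remaining punctuation into spaces,
    // remove stop words, then stem whatever is left.
    static List<String> preprocess(String text, Set<String> stopWords,
                                   UnaryOperator<String> stemmer) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("'s\\b", "")          // possessive 's
                .replaceAll("[^a-z0-9\\s]", " "); // other punctuation
        List<String> result = new ArrayList<>();
        for (String token : cleaned.trim().split("\\s+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue; // stop list runs BEFORE stemming
            }
            result.add(stemmer.apply(token));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [dog, bone, cat]
        System.out.println(preprocess("The dog's bones, and the cats!",
                STOP_WORDS, StemPipeline::naiveStem));
    }
}
```

Taking the stemmer as a UnaryOperator<String> keeps the string-in, string-out question separate from the pipeline. If the Snowball Java classes work the way I recall (setCurrent(word), then stem(), then getCurrent()), a small lambda wrapping those three calls should slot in here, but treat that as an unverified assumption and check it against the Snowball distribution.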