RE: [Classifier4j-devel] Bayesian Case Study


Another little snowball stemming test.  I suppose consistency is the key to 
the stemming process whatever the outcome.
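
For what it's worth, here is a rough sketch of the order discussed further down this thread: strip possessives and punctuation first, run the stop-list next, and stem last. Everything named here (StemPipelineSketch, STOP_WORDS, TOY_STEMMER, preprocess) is made up for illustration and is not Classifier4J or Snowball code. The "stemmer" is a toy stand-in that only trims a trailing "s"; a real Porter or Snowball stemmer would be plugged in at that point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

final class StemPipelineSketch {

    // Tiny illustrative stop-list; a real one would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "is", "are"));

    // Stand-in stemmer (NOT Porter or Snowball): trims a trailing "s" from
    // words longer than three letters, leaving "ss" endings alone.
    static final UnaryOperator<String> TOY_STEMMER =
            w -> (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss"))
                    ? w.substring(0, w.length() - 1)
                    : w;

    // Lower-case, clean up apostrophes and punctuation, drop stop words, then stem.
    static List<String> preprocess(String text, UnaryOperator<String> stemmer) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("['\u2019]s\\b", "")   // drop possessive/contracted 's
                .replaceAll("['\u2019]", "")       // drop remaining apostrophes (don't -> dont)
                .replaceAll("[^a-z0-9\\s]", " ");  // all other punctuation becomes a space
        List<String> out = new ArrayList<>();
        for (String token : cleaned.trim().split("\\s+")) {
            // The stop-list runs BEFORE the stemmer, as suggested in the thread.
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            out.add(stemmer.apply(token));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [dog, owner, walking, park]
        System.out.println(preprocess("The dog's owners are walking to the parks.", TOY_STEMMER));
    }
}
```

Passing the stemmer in as a UnaryOperator<String> is just one way to keep the tokenizing and stop-list steps independent of whichever stemmer we end up choosing.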

Matt Collier
RemoteIT
mco...@my...
877-4-NEW-LAN

-----Original Message-----
From: "Matt Collier" <MCo...@my...>
To: cla...@li...
Date: Thu, 13 Nov 2003 23:25:19 -0600
Subject: RE: [Classifier4j-devel] Bayesian Case Study

> Attached are input and output files from the snowball stemmer.  Clearly need 
> to remove punctuation before stemming with this one.  Does this look OK?
> 
> Anybody know why these stemmers like using input strings and single-character
> inputs?  How do we quickly and easily send a string to this class?
> 
> 
> Matt Collier
> RemoteIT
> mco...@my...
> 877-4-NEW-LAN
> 
> 
> -----Original Message-----
> From: "Matt Collier" <MCo...@my...>
> To: cla...@li...
> Date: Thu, 13 Nov 2003 22:38:19 -0600
> Subject: RE: [Classifier4j-devel] Bayesian Case Study
> 
> > Looking for java stemmers, I found these:
> > 
> > Lovins Stemmer
> > http://sourceforge.net/projects/stemmers/
> > 
> > Snowball
> > Source Code
> > http://snowball.tartarus.org/snowball_java.tgz
> > Home Page
> > http://snowball.tartarus.org/
> > 
> > I don't even know what this is:
> > http://mailweb.udlap.mx/~hermes/javadoc/mx/udlap/ict/u_dl_a/irserver/qprocessors/EnglishStemmer.html
> > 
> > This is evidently the OFFICIAL Porter stemmer
> > http://www.tartarus.org/~martin/PorterStemmer/
> > 
> > Lucene evidently uses snowball, as previously stated by Moedusa.
> > 
> > One important point I picked up from the vector-space 
> > material was to run the stop-list BEFORE stemming.
> > 
> > That's it for now, surely one of these will do the trick.
> > 
> > Matt Collier
> > RemoteIT
> > mco...@my...
> > 877-4-NEW-LAN
> > 
> > 
> > -----Original Message-----
> > From: Nick Lothian <nl...@es...>
> > To: "'cla...@li...'" <classifier4j-
> > de...@li...>
> > Date: Fri, 14 Nov 2003 11:30:55 +1030
> > Subject: RE: [Classifier4j-devel] Bayesian Case Study
> > 
> > > > 
> > > > 3) the dreaded "s", a result no doubt of incorrectly 
> > > > tokenizing possessive nouns 
> > > > and pronouns, contractions etc.  Anybody have a good 
> > > > algorithm for handling 
> > > > this?
> > > > 
> > > 
> > > One way to handle it would be to run a Stemmer (search for "Porter Stemmer")
> > > on each word before classifying it.
> > > 
> > > 
> > > -------------------------------------------------------
> > > This SF.Net email sponsored by: ApacheCon 2003,
> > > 16-19 November in Las Vegas. Learn firsthand the latest
> > > developments in Apache, PHP, Perl, XML, Java, MySQL,
> > > WebDAV, and more! http://www.apachecon.com/
> > > _______________________________________________
> > > Classifier4j-devel mailing list
> > > Cla...@li...
> > > https://lists.sourceforge.net/lists/listinfo/classifier4j-devel
> > 
> > 
> > 