Re: [Classifier4j-devel] New Stop Words Provider
From: Matt C. <MCo...@my...> - 2003-11-18 21:52:20
Attached is GammaStopWordsProvide.java. I discovered the ArrayList class and put it to use.

Still need to devise a way for users to pass the path to their custom stop list, or implement moedusa's idea about automatic resource location. Also should throw an exception either in addition to or instead of printing an error message.

Matt Collier
RemoteIT
mco...@my...
877-4-NEW-LAN

-----Original Message-----
From: "Matt Collier" <MCo...@my...>
To: cla...@li...
Date: Sat, 15 Nov 2003 13:23:54 -0600
Subject: Re: [Classifier4j-devel] New Stop Words Provider

> Attached, find BetaStopWordsProvider which EXTENDS DefaultStopWordsProvider.
> I think I'm getting the hang of this.
>
> To use this, you need to do something like this in your code:
>
> ICategorisedWordsDataSource wds=null; //define wds how you like
> IStopWordProvider swp=new BetaStopWordsProvider();
> ITokenizer tok=new DefaultTokenizer();
>
> BayesianClassifier classifier = new BayesianClassifier(wds,tok,swp);
>
> Everything has become clear to me now!
>
> One question remains in my mind: is it correct to say that our html stripper
> and stemmer will both have to work out of ITokenizer/DefaultTokenizer?
>
> Place BetaStopWordsProvider.java in the same directory as your
> DefaultStopWordsProvider.java, make sure you have a stop-list at
> c:/stoplist/english.stop and you should be in business.
>
> Matt Collier
> RemoteIT
> mco...@my...
> 877-4-NEW-LAN
>
>
> -----Original Message-----
> From: "Matt Collier" <MCo...@my...>
> To: "Classifier4J" <cla...@li...>
> Date: Sat, 15 Nov 2003 12:37:03 -0600
> Subject: [Classifier4j-devel] New Stop Words Provider
>
> > Attached is an alternate stop words provider for classifier4J. I simply
> > copied the whole of DefaultStopWordsProvider.java and renamed it to
> > AlphaStopWordsProvider.java.
> >
> > I am pretty sure that this is not the correct way to do this since there is
> > a comment about overriding the getStopWords method, but I'm not sure how to
> > do this right now. I wanted to get this code out for review. Please advise.
> >
> > This reads the stop list from a file "c:/stoplist/english.stop". You will
> > need to download the stop list or create your own. There is a link on the
> > wiki site for the stop-list that Nick found:
> >
> > http://www.ishmaelswiki.org/wiki/index.php/TextClassification
> >
> > There should be a single word on each line of your stop list file.
> >
> > Matt Collier
> > RemoteIT
> > mco...@my...
> > 877-4-NEW-LAN
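
The two open items from the newest message (letting users pass the path to their own stop list, and throwing an exception rather than only printing an error) could be sketched roughly as below. This is not the attached GammaStopWordsProvide.java. The names StopWordProviderSketch and FileStopWordsProviderSketch are hypothetical, the small interface is only a stand-in so the example compiles without Classifier4J on the classpath, and the case-insensitive matching and UTF-8 encoding are assumptions. It also uses a HashSet instead of an ArrayList, purely as a design choice for faster lookups, and it needs a far newer JDK than was current in 2003.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the library's stop word provider interface,
// declared here only so that this sketch is self-contained.
interface StopWordProviderSketch {
    boolean isStopWord(String word);
}

// Hypothetical provider that loads one stop word per line from a
// caller-supplied file instead of a hard-coded c:/stoplist/english.stop.
class FileStopWordsProviderSketch implements StopWordProviderSketch {
    private final Set<String> stopWords = new HashSet<>();

    // A missing or unreadable file surfaces as an IOException to the caller
    // rather than being reported with a printed error message.
    FileStopWordsProviderSketch(Path stopListFile) throws IOException {
        for (String line : Files.readAllLines(stopListFile, StandardCharsets.UTF_8)) {
            // Assumption: words are matched case-insensitively, blank lines are skipped.
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                stopWords.add(word);
            }
        }
    }

    @Override
    public boolean isStopWord(String word) {
        return word != null
                && stopWords.contains(word.trim().toLowerCase(Locale.ROOT));
    }
}
```

A caller would construct it with something like `new FileStopWordsProviderSketch(Paths.get("c:/stoplist/english.stop"))` and handle the IOException at that point. Wiring it into BayesianClassifier would additionally require implementing the real IStopWordProvider interface.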