RE: [Classifier4j-devel] Bayesian Case Study
From: Matt C. <MCo...@my...> - 2003-11-14 01:11:43
Very nice. Should we keep these in a flat file? That would make a lot of sense in my opinion.

Do we want to modify the default tokenizer and stop list provider, or do we want to extend them? If we want to extend them, can you please point me to the quickest way to do this? I think I understand that we will create a class that "extends default tokenizer" etc., but how will this new class be used by the other classes and methods such as bayesian.classify? Surely we won't have to modify all this code, or perhaps we do. I don't know... which is why I'm asking... :)

Matt Collier
RemoteIT
mco...@my...
877-4-NEW-LAN

-----Original Message-----
From: Nick Lothian <nl...@es...>
To: "'cla...@li...'" <classifier4j-de...@li...>
Date: Fri, 14 Nov 2003 11:29:55 +1030
Subject: RE: [Classifier4j-devel] Bayesian Case Study

> > 2) "we" see several occurrences of useless pronouns in this list. This can be
> > addressed by an improved "stop list". There is evidently an excellent paper
> > written on the topic of stop lists, aptly named "A stop list for general text"
> > by Christopher Fox, published in ACM SIGIR Forum Volume 24 Issue 2 1989
> > ISSN:0163-5840. If anyone has access to this paper, please advise.
>
> Here's a list of stop words I've been saving to add into classifier4J
> sometime (from <ftp://ftp.cs.cornell.edu/pub/smart/>).
> > a > a's > able > about > above > according > accordingly > across > actually > after > afterwards > again > against > ain't > all > allow > allows > almost > alone > along > already > also > although > always > am > among > amongst > an > and > another > any > anybody > anyhow > anyone > anything > anyway > anyways > anywhere > apart > appear > appreciate > appropriate > are > aren't > around > as > aside > ask > asking > associated > at > available > away > awfully > b > be > became > because > become > becomes > becoming > been > before > beforehand > behind > being > believe > below > beside > besides > best > better > between > beyond > both > brief > but > by > c > c'mon > c's > came > can > can't > cannot > cant > cause > causes > certain > certainly > changes > clearly > co > com > come > comes > concerning > consequently > consider > considering > contain > containing > contains > corresponding > could > couldn't > course > currently > d > definitely > described > despite > did > didn't > different > do > does > doesn't > doing > don't > done > down > downwards > during > e > each > edu > eg > eight > either > else > elsewhere > enough > entirely > especially > et > etc > even > ever > every > everybody > everyone > everything > everywhere > ex > exactly > example > except > f > far > few > fifth > first > five > followed > following > follows > for > former > formerly > forth > four > from > further > furthermore > g > get > gets > getting > given > gives > go > goes > going > gone > got > gotten > greetings > h > had > hadn't > happens > hardly > has > hasn't > have > haven't > having > he > he's > hello > help > hence > her > here > here's > hereafter > hereby > herein > hereupon > hers > herself > hi > him > himself > his > hither > hopefully > how > howbeit > however > i > i'd > i'll > i'm > i've > ie > if > ignored > immediate > in > inasmuch > inc > indeed > indicate > indicated > indicates > inner > insofar > instead > into > inward > is > isn't 
> it > it'd > it'll > it's > its > itself > j > just > k > keep > keeps > kept > know > knows > known > l > last > lately > later > latter > latterly > least > less > lest > let > let's > like > liked > likely > little > look > looking > looks > ltd > m > mainly > many > may > maybe > me > mean > meanwhile > merely > might > more > moreover > most > mostly > much > must > my > myself > n > name > namely > nd > near > nearly > necessary > need > needs > neither > never > nevertheless > new > next > nine > no > nobody > non > none > noone > nor > normally > not > nothing > novel > now > nowhere > o > obviously > of > off > often > oh > ok > okay > old > on > once > one > ones > only > onto > or > other > others > otherwise > ought > our > ours > ourselves > out > outside > over > overall > own > p > particular > particularly > per > perhaps > placed > please > plus > possible > presumably > probably > provides > q > que > quite > qv > r > rather > rd > re > really > reasonably > regarding > regardless > regards > relatively > respectively > right > s > said > same > saw > say > saying > says > second > secondly > see > seeing > seem > seemed > seeming > seems > seen > self > selves > sensible > sent > serious > seriously > seven > several > shall > she > should > shouldn't > since > six > so > some > somebody > somehow > someone > something > sometime > sometimes > somewhat > somewhere > soon > sorry > specified > specify > specifying > still > sub > such > sup > sure > t > t's > take > taken > tell > tends > th > than > thank > thanks > thanx > that > that's > thats > the > their > theirs > them > themselves > then > thence > there > there's > thereafter > thereby > therefore > therein > theres > thereupon > these > they > they'd > they'll > they're > they've > think > third > this > thorough > thoroughly > those > though > three > through > throughout > thru > thus > to > together > too > took > toward > towards > tried > tries > truly > try > trying > twice > two 
> u > un > under > unfortunately > unless > unlikely > until > unto > up > upon > us > use > used > useful > uses > using > usually > uucp > v > value > various > very > via > viz > vs > w > want > wants > was > wasn't > way > we > we'd > we'll > we're > we've > welcome > well > went > were > weren't > what > what's > whatever > when > whence > whenever > where > where's > whereafter > whereas > whereby > wherein > whereupon > wherever > whether > which > while > whither > who > who's > whoever > whole > whom > whose > why > will > willing > wish > with > within > without > won't > wonder > would > would > wouldn't > x > y > yes > yet > you > you'd > you'll > you're > you've > your > yours > yourself > yourselves > z > zero
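
On the flat-file question at the top of this message, here is a rough sketch of how a stop list read from a flat file could be handed to a classifier without touching the classification code itself. Everything in it is an assumption made for illustration: `StopWordSource`, `FlatFileStopWords` and `SketchClassifier` are hypothetical names, not the actual Classifier4J classes, and the one-word-per-line file format is just a guess at what "flat file" would mean here. The only point is that if the classifier depends on an interface and receives the implementation through its constructor, swapping in a new stop list does not require changes to the classify logic.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical interface for this sketch; not taken from the Classifier4J source.
interface StopWordSource {
    boolean isStopWord(String word);
}

// Stop list loaded from a flat file, assumed to hold one word per line.
class FlatFileStopWords implements StopWordSource {
    private final Set<String> words = new HashSet<>();

    FlatFileStopWords(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean isStopWord(String word) {
        return word != null && words.contains(word.toLowerCase(Locale.ROOT));
    }
}

// Hypothetical stand-in for a classifier: it only knows the interface,
// so any stop list implementation can be passed in at construction time.
class SketchClassifier {
    private final StopWordSource stopWords;

    SketchClassifier(StopWordSource stopWords) {
        this.stopWords = stopWords;
    }

    // Returns the lower-cased tokens that survive stop-word filtering.
    List<String> significantWords(String text) {
        List<String> kept = new ArrayList<>();
        // Split on anything that is not a letter or an apostrophe,
        // so entries such as "a's" and "ain't" stay in one piece.
        for (String token : text.split("[^\\p{L}']+")) {
            if (!token.isEmpty() && !stopWords.isStopWord(token)) {
                kept.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return kept;
    }
}
```

In real use the reader would wrap the flat file, e.g. `new SketchClassifier(new FlatFileStopWords(new FileReader("stopwords.txt")))`, where `stopwords.txt` is a made-up file name. Whether Classifier4J's own classifier accepts a stop list provider this way would need to be checked against its source.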