Re: [Classifier4j-devel] Comparing Documents
From: Mike H. <mh...@av...> - 2004-12-29 19:23:30
C4J relies on Naive Bayes (http://en.wikipedia.org/wiki/Naive_Bayes). To classify something with it, you need to teach it what each class is AND what each class is not. For comparison purposes as you've described in your message, I'm not sure that C4J is a good solution.

-Mike

On Sun, 2004-12-26 at 15:37, Colin Bell wrote:
> Hi all
>
> I would like to start by saying what an exciting piece of software
> C4J is, thanks to all those involved.
>
> I have written a bit of code to use C4J to compare documents (in this
> case stored in a JDBC database) to each other and find out how similar
> they are. I pick the document against which I want to compare, and then
> add each word of it to a SimpleWordsDataSource using a loop
> (wds.addMatch(wordList[i])). I then use BayesianClassifier(wds) to get
> the result for each document.
>
> Problem is that my results are obviously very poor (always 0.99,
> sometimes 0.5) because I don't have any non-matches. Does anyone have
> an idea on how I could do this? What could I possibly use as
> non-matches, or am I missing a trick?
>
> Many thanks
>
> Regards
>
> Colin
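
To make the point about non-matches concrete, here is a minimal, self-contained sketch of a match/non-match word classifier. It does not use C4J itself: the class MatchOnlyBayesSketch, its methods, and the 0.01 / 0.5 / 0.99 bounds are assumptions chosen for illustration (the bounds are picked to mirror the scores Colin reports, not taken from the C4J source). Per-word probabilities are combined as prod(p) / (prod(p) + prod(1 - p)).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; this is NOT the C4J implementation.
class MatchOnlyBayesSketch {
    // Assumed bounds, chosen to mirror the 0.99 / 0.5 scores in the thread.
    static final double LOWER = 0.01, NEUTRAL = 0.5, UPPER = 0.99;

    private final Map<String, Integer> matches = new HashMap<>();
    private final Map<String, Integer> nonMatches = new HashMap<>();

    void addMatch(String word) { matches.merge(word, 1, Integer::sum); }
    void addNonMatch(String word) { nonMatches.merge(word, 1, Integer::sum); }

    static double clamp(double p) { return Math.max(LOWER, Math.min(UPPER, p)); }

    // Probability that a single word indicates a match.
    double wordProbability(String word) {
        int m = matches.getOrDefault(word, 0);
        int n = nonMatches.getOrDefault(word, 0);
        if (m + n == 0) return NEUTRAL;        // never seen: no evidence either way
        return clamp((double) m / (m + n));    // seen only as a match: clamps to UPPER
    }

    // Combine word probabilities: prod(p) / (prod(p) + prod(1 - p)).
    // Fine for short inputs; long documents would need log-space arithmetic.
    double classify(String[] words) {
        double yes = 1.0, no = 1.0;
        for (String w : words) {
            double p = wordProbability(w);
            yes *= p;
            no *= 1.0 - p;
        }
        return clamp(yes / (yes + no));
    }

    public static void main(String[] args) {
        MatchOnlyBayesSketch c = new MatchOnlyBayesSketch();
        for (String w : "the quick brown fox".split(" ")) c.addMatch(w);

        // Trained on matches only: any overlap saturates at the upper bound,
        // and no overlap stays neutral.
        System.out.println(c.classify("quick fox jumps".split(" ")));
        System.out.println(c.classify("lorem ipsum".split(" ")));

        // Once non-matches exist, words can also pull the score down.
        c.addNonMatch("jumps");
        c.addNonMatch("over");
        System.out.println(c.classify("quick jumps over".split(" ")));
    }
}
```

With only addMatch calls, every known word scores at the upper bound and every unknown word is neutral, so the combined score can only be "saturated" or "neutral". That is why the result carries no graded similarity until some non-match data is supplied.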