[Classifier4j-devel] Comparing Documents
From: Colin B. <co...@ga...> - 2004-12-26 22:37:17
Hi all,

I would like to start by saying what an exciting piece of software C4J is. Thanks to all those involved.

I have written a bit of code that uses C4J to compare documents (in this case stored in a JDBC database) to each other and find out how similar they are. I pick the document to compare against, and then add each word of it to a SimpleWordsDataSource using a loop (wds.addMatch(wordList[i])). I then use BayesianClassifier(wds) to score each of the other documents.

The problem is that my results are obviously very poor (almost always 0.99, sometimes 0.5) because I don't have any non-matches.

Does anyone have an idea of how I could do this? What could I possibly use as non-matches, or am I missing a trick?

Many thanks

Regards
Colin
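
To make the symptom concrete, here is a small toy model in plain Java. It is not Classifier4J's code: the class name MatchOnlyBayesSketch, the neutral value 0.5 for unseen words, and the clamping bounds 0.01 and 0.99 are all assumptions chosen to mirror the scores reported above. It only illustrates why a match/non-match word store trained solely on matches tends to produce exactly two scores.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical class, not part of Classifier4J).
class MatchOnlyBayesSketch {
    // Assumed constants: neutral score for unseen words, clamping bounds.
    static final double NEUTRAL = 0.5;
    static final double LOWER = 0.01;
    static final double UPPER = 0.99;

    private final Map<String, Integer> matches = new HashMap<>();
    private final Map<String, Integer> nonMatches = new HashMap<>();

    void addMatch(String word) { matches.merge(word, 1, Integer::sum); }
    void addNonMatch(String word) { nonMatches.merge(word, 1, Integer::sum); }

    // Per-word probability: share of "match" sightings, clamped to [LOWER, UPPER].
    double wordProbability(String word) {
        int m = matches.getOrDefault(word, 0);
        int n = nonMatches.getOrDefault(word, 0);
        if (m + n == 0) {
            return NEUTRAL; // never seen: carries no evidence either way
        }
        double p = (double) m / (m + n);
        return Math.max(LOWER, Math.min(UPPER, p));
    }

    // Combine word probabilities as prod(p) / (prod(p) + prod(1 - p)), then clamp.
    double classify(String[] words) {
        double product = 1.0;
        double inverse = 1.0;
        for (String w : words) {
            double p = wordProbability(w);
            product *= p;
            inverse *= (1.0 - p);
        }
        double score = product / (product + inverse);
        return Math.max(LOWER, Math.min(UPPER, score));
    }

    public static void main(String[] args) {
        MatchOnlyBayesSketch wds = new MatchOnlyBayesSketch();
        for (String w : new String[] {"bayesian", "classifier", "document"}) {
            wds.addMatch(w); // matches only, as in the loop described above
        }
        // Shares at least one trained word: saturates at the upper bound.
        System.out.println(wds.classify(new String[] {"document", "weather"}));
        // Shares no trained word: stays at the neutral value.
        System.out.println(wds.classify(new String[] {"weather", "report"}));
    }
}
```

In this model, every trained word has a match ratio of 1.0 (clamped to the upper bound) because nothing was ever recorded against it, so a single shared word is enough to saturate the score, and a document sharing no words stays neutral. Once some words also have non-match counts, intermediate scores become possible.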