I've committed the changes to JDBCWordsDataSource discussed in
<http://sourceforge.net/mailarchive/forum.php?thread_id=2697458&forum_id=34026>.
(For those of you using SourceForge's anonymous CVS, I would expect it
to show up tomorrow.)
It now uses a single table:
    CREATE TABLE word_probability (
        word            VARCHAR(255) NOT NULL,
        category        VARCHAR(20) NOT NULL,
        match_count     INT DEFAULT 0 NOT NULL,
        nonmatch_count  INT DEFAULT 0 NOT NULL,
        PRIMARY KEY(word, category)
    )
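
To make the composite key concrete, here is a minimal in-memory sketch of how rows in this table behave. It is not the JDBCWordsDataSource code: the class WordProbabilityTable, its method names, and the simple match ratio are all assumptions made up for illustration. The only things taken from the schema are the (word, category) key and the two counters defaulting to 0.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the word_probability table.
// Not part of Classifier4J; names and behaviour are illustrative only.
class WordProbabilityTable {
    // Key mirrors PRIMARY KEY(word, category).
    // Value holds {match_count, nonmatch_count}.
    private final Map<List<String>, int[]> rows = new HashMap<>();

    private static List<String> key(String word, String category) {
        return Arrays.asList(word, category);
    }

    // A missing row is created with both counters at 0, like the column DEFAULTs.
    private int[] rowForUpdate(String word, String category) {
        return rows.computeIfAbsent(key(word, category), k -> new int[2]);
    }

    void addMatch(String word, String category) {
        rowForUpdate(word, category)[0]++;
    }

    void addNonMatch(String word, String category) {
        rowForUpdate(word, category)[1]++;
    }

    int matchCount(String word, String category) {
        int[] row = rows.get(key(word, category));
        return row == null ? 0 : row[0];
    }

    int nonMatchCount(String word, String category) {
        int[] row = rows.get(key(word, category));
        return row == null ? 0 : row[1];
    }

    // Assumed, simplistic ratio: matches / (matches + non-matches).
    // Returns 0.5 for an unseen word; that neutral value is also an assumption.
    double matchRatio(String word, String category) {
        int m = matchCount(word, category);
        int n = nonMatchCount(word, category);
        return (m + n == 0) ? 0.5 : (double) m / (m + n);
    }

    public static void main(String[] args) {
        WordProbabilityTable table = new WordProbabilityTable();
        table.addMatch("offer", "spam");
        table.addMatch("offer", "spam");
        table.addNonMatch("offer", "spam");
        // Same word, different category: a separate row with its own counters.
        table.addMatch("offer", "newsletter");
        System.out.println(table.matchRatio("offer", "spam"));
        System.out.println(table.matchRatio("offer", "newsletter"));
    }
}
```

The point of the sketch is that the same word can carry independent counts in each category, which is what the two-column primary key gives you in the real table.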
I've attached a simple program that processes the files in a few
directories and then outputs the most significant words. (The directories,
database names, etc. are all hard-coded, but it should show roughly how to
use it.)
Nick