Thread: [Classifier4j-devel] Several questions regarding Classifier4J
From: Benjamin P. <ben...@we...> - 2004-05-07 10:30:33
Hello,

I am very interested in implementing a Bayes-based rating system in my
feed reader. I played around with Classifier4J a bit and have some
questions:

- Are there any open-source examples available that use Classifier4J?
- Is there any support for storing the ratings in a file and reloading
  them on the next JVM startup (some sort of serialisation)?
- Is it possible to have a finer rating granularity than just match and
  not match? I thought of something like 5 rating levels that a news
  item may be given (very good, good, moderate, bad, very bad).
- How do I train the system: with classifier.teachMatch()... or with
  the WordProbability class?

With best regards,
Ben
RSSOwl Development Team
From: <br...@bj...> - 2004-05-07 17:12:35
> - Are there any open-source examples available that use Classifier4J?

I'm using Classifier4J as a spam email classifier in my Webgate
software (see http://webgate.sourceforge.net).

> - Is there any support for storing the ratings in a file and
> reloading them on the next JVM startup (some sort of
> serialisation)?

I don't know the answer to this. From my dealings with it, all results
are stored in a database.

> - Is it possible to have a finer rating granularity than
> just match and not match? I thought of something like
> 5 rating levels that a news item may be given (very good, good,
> moderate, bad, very bad).

Well, the matching returns a percentage match, I think. So you could
test the returned result and say, for example, that 0-.2 would be very
good, .2-.4 good, .4-.6 moderate, etc.

> - How do I train the system: with classifier.teachMatch()...
> or with the WordProbability class?

I used teachMatch() and teachNonMatch() (I think that's the correct
method name) against known good and bad emails in my inbox.

Hope this helps!

- Brent
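
The bucketing Brent describes can be sketched in plain Java as below. This is only an illustration and does not call Classifier4J: the class name RatingBuckets and the method toLevel are made up for the example, the score is assumed to be a value between 0 and 1, and the bands for "bad" and "very bad" are an assumed continuation of the "etc." in the message. Which end of the scale should count as "very good" depends on what the classifier was trained to match; the order here simply follows Brent's example.

```java
/** Maps a classifier score in [0, 1] onto five rating levels (illustrative only). */
final class RatingBuckets {
    // Labels in the order used in the message: the lowest score band comes first.
    private static final String[] LEVELS = {
        "very good", "good", "moderate", "bad", "very bad"
    };

    private RatingBuckets() {}

    /**
     * Returns the label for a score, using five equal bands of width 0.2.
     * Assumption: the score is a probability-like value in [0, 1].
     */
    static String toLevel(double score) {
        if (Double.isNaN(score) || score < 0.0 || score > 1.0) {
            throw new IllegalArgumentException("score must be in [0, 1]: " + score);
        }
        // (int) (score * 5) yields 0..5; clamp so that exactly 1.0 falls in the last band.
        int band = Math.min((int) (score * LEVELS.length), LEVELS.length - 1);
        return LEVELS[band];
    }
}
```

A caller would pass in whatever score the classifier returns, for example RatingBuckets.toLevel(0.5), which falls in the "moderate" band.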
From: Benjamin P. <ben...@we...> - 2004-05-07 21:39:55
br...@bj... wrote:

>> - Are there any open-source examples available that use Classifier4J?
>
> I'm using Classifier4J as a spam email classifier in my
> Webgate software (see http://webgate.sourceforge.net).

Thanks, going to have a look at it.

>> - Is there any support for storing the ratings in a file and
>> reloading them on the next JVM startup (some sort of
>> serialisation)?
>
> I don't know the answer to this. From my dealings with it,
> all results are stored in a database.

So what kind of database are you using? Is it a real database that is
running on a server, or some sort of file-based DB?

>> - Is it possible to have a finer rating granularity than
>> just match and not match? I thought of something like
>> 5 rating levels that a news item may be given (very good, good,
>> moderate, bad, very bad).
>
> Well, the matching returns a percentage match, I think. So
> you could test the returned result and say, for example, that
> 0-.2 would be very good, .2-.4 good, .4-.6 moderate, etc.

Well, I am talking about the other way round: giving good or moderate
as input to Classifier4J. Using teachMatch and teachNonMatch this does
not seem to be possible.

>> - How do I train the system: with classifier.teachMatch()...
>> or with the WordProbability class?
>
> I used teachMatch() and teachNonMatch() (I think that's the correct
> method name) against known good and bad emails in my inbox.

Yeah, those sound good, but as mentioned I don't think it will be
possible to use them with 5 rating levels. For a spam filter with only
two levels (spam / ham) it's sufficient.

Ben

> Hope this helps!
>
> - Brent
From: Nick L. <ni...@ma...> - 2004-05-08 02:39:15
>>> - Is there any support for storing the ratings in a file and
>>> reloading them on the next JVM startup (some sort of
>>> serialisation)?
>>
>> I don't know the answer to this. From my dealings with it,
>> all results are stored in a database.
>
> So what kind of database are you using? Is it a real database
> that is running on a server, or some sort of file-based DB?

Classifier4J supports SQL-based databases (tested with MySQL and
HSQLDB) and the JDBM non-relational database. I would recommend the
JDBM database for performance reasons.

>>> - Is it possible to have a finer rating granularity than just
>>> match and not match? I thought of something like
>>> 5 rating levels that a news item may be given (very good, good,
>>> moderate, bad, very bad).
>>
>> Well, the matching returns a percentage match, I think. So
>> you could test the returned result and say, for example, that
>> 0-.2 would be very good, .2-.4 good, .4-.6 moderate, etc.
>
> Well, I am talking about the other way round: giving good or
> moderate as input to Classifier4J. Using teachMatch and
> teachNonMatch this does not seem to be possible.

This would be possible in theory, but there is no support for it in C4J
and it would be quite a lot of work to add it.

Nick
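
For readers who only want the "save to a file, reload on the next JVM startup" behaviour from the original question, a rough stand-alone sketch follows. It is not Classifier4J code and does not use its JDBM or SQL data sources: the class WordCountFile, its save and load methods, and the "word=matchCount,nonMatchCount" layout are all assumptions made for the illustration, built only on java.util.Properties from the standard library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical file-backed store of per-word match/non-match counts. */
final class WordCountFile {
    private WordCountFile() {}

    /** Writes one line per word in the form "word=matchCount,nonMatchCount". */
    static void save(Map<String, int[]> counts, Path file) throws IOException {
        Properties props = new Properties();
        for (Map.Entry<String, int[]> entry : counts.entrySet()) {
            int[] c = entry.getValue(); // c[0] = match count, c[1] = non-match count
            props.setProperty(entry.getKey(), c[0] + "," + c[1]);
        }
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "word=matchCount,nonMatchCount");
        }
    }

    /** Reads the file written by save() back into a fresh map. */
    static Map<String, int[]> load(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        Map<String, int[]> counts = new HashMap<>();
        for (String word : props.stringPropertyNames()) {
            String[] parts = props.getProperty(word).split(",");
            counts.put(word, new int[] {
                Integer.parseInt(parts[0]), Integer.parseInt(parts[1])
            });
        }
        return counts;
    }
}
```

An application would call save() on shutdown and load() on startup. For anything beyond a small vocabulary, the database-backed storage Nick recommends is likely the better fit.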
From: Peter L. <pe...@le...> - 2004-05-08 03:00:00
> >>> - Is it possible to have a finer rating granularity than just
> >>> match and not match? I thought of something like
> >>> 5 rating levels that a news item may be given (very good, good,
> >>> moderate, bad, very bad).
> >>
> >> Well, the matching returns a percentage match, I think. So
> >> you could test the returned result and say, for example, that
> >> 0-.2 would be very good, .2-.4 good, .4-.6 moderate, etc.
> >
> > Well, I am talking about the other way round: giving good or
> > moderate as input to Classifier4J. Using teachMatch and
> > teachNonMatch this does not seem to be possible.
>
> This would be possible in theory, but there is no support for it in
> C4J and it would be quite a lot of work to add it.

Couldn't you set up 5 categories (very good, good, moderate, bad, very
bad)? In training, let the user select the category an item is best
suited to; under the covers, call "teachMatch" for the category they
select and "teachNonMatch" for every other category. Then it's a matter
of calling isMatch with each category to determine which category the
item is best suited to.

Pete
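
Pete's one-classifier-per-category idea can be sketched as follows. This is a self-contained illustration, not Classifier4J code: BinaryClassifier is a hypothetical stand-in for a two-class trainable classifier, ToyClassifier is a deliberately crude word-count scorer that exists only so the sketch runs, and FiveLevelRater is an invented name. One deviation from the message is also an assumption: instead of calling isMatch per category, the sketch compares raw scores and picks the highest, so that exactly one level is returned even when several categories (or none) would match.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a two-class trainable classifier (not the C4J API). */
interface BinaryClassifier {
    void teachMatch(String text);
    void teachNonMatch(String text);
    double classify(String text); // score in [0, 1]; higher means "match"
}

/** Toy word-count scorer, included only so the sketch runs on its own. */
final class ToyClassifier implements BinaryClassifier {
    private final Map<String, Integer> match = new HashMap<>();
    private final Map<String, Integer> nonMatch = new HashMap<>();

    @Override public void teachMatch(String text) { count(match, text); }
    @Override public void teachNonMatch(String text) { count(nonMatch, text); }

    @Override public double classify(String text) {
        double sum = 0.0;
        int words = 0;
        for (String w : text.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) continue;
            int m = match.getOrDefault(w, 0);
            int n = nonMatch.getOrDefault(w, 0);
            sum += (m + 1.0) / (m + n + 2.0); // smoothed share of "match" sightings
            words++;
        }
        return words == 0 ? 0.5 : sum / words; // no words: stay neutral
    }

    private static void count(Map<String, Integer> counts, String text) {
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        }
    }
}

/** One binary classifier per rating level, trained one-vs-rest. */
final class FiveLevelRater {
    static final List<String> LEVELS =
        Arrays.asList("very good", "good", "moderate", "bad", "very bad");

    private final Map<String, BinaryClassifier> byLevel = new LinkedHashMap<>();

    FiveLevelRater() {
        for (String level : LEVELS) byLevel.put(level, new ToyClassifier());
    }

    /** Teaches a match to the chosen level and a non-match to every other level. */
    void teach(String chosenLevel, String text) {
        if (!byLevel.containsKey(chosenLevel)) {
            throw new IllegalArgumentException("unknown level: " + chosenLevel);
        }
        for (Map.Entry<String, BinaryClassifier> e : byLevel.entrySet()) {
            if (e.getKey().equals(chosenLevel)) e.getValue().teachMatch(text);
            else e.getValue().teachNonMatch(text);
        }
    }

    /** Returns the level whose classifier scores the text highest (first level wins ties). */
    String rate(String text) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, BinaryClassifier> e : byLevel.entrySet()) {
            double score = e.getValue().classify(text);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

The cost of this scheme is that every training item touches all five classifiers and every rating request runs five classifications, which is probably acceptable for a feed reader but worth keeping in mind.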