classifier4j-devel Mailing List for Classifier4J (Page 7)
Status: Beta
Brought to you by:
nicklothian
From: Peter L. <pe...@le...> - 2004-05-08 03:00:00
> >>> - Is it possible to have a better rating granularity than just match
> >>>   and not match? I thought of something like
> >>>   5 rating levels that a news may be rated (very good, good,
> >>>   moderate, bad, very bad).
> >>
> >> Well - the matching returns a percentage match I think. So
> >> you could test the returned result and say, for example.. that
> >> a 0-.2 would be very good, .2-.4 good, .4-.6 moderate, etc.
> >
> > Well I am talking of the other way, giving good or moderate as
> > input into classifier4j. Using trainMatch and trainNonMatch
> > this does not seem to be possible.
>
> This would be possible in theory, but there is no support for it in C4J
> and it would be quite a lot of work to add it.

Couldn't you have 5 categories (very good, good, moderate, bad, very bad) set up?

In training, allow the user to select the category an item is best suited to. Under the covers, call "trainMatch" on the category they select, and then call "trainNonMatch" on every other category.

Then it's a matter of calling isMatch against each category to determine which category the item is best suited to.

Pete
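Pete's suggestion can be sketched roughly as follows. This is only an illustration under assumptions: `BinaryClassifier` and `WordCountClassifier` are hypothetical stand-ins written so the example runs on its own, not Classifier4J types. In a real setup each rating level would presumably hold its own Classifier4J classifier with its own data source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a trainable two-class classifier (not the Classifier4J interface). */
interface BinaryClassifier {
    void teachMatch(String text);
    void teachNonMatch(String text);
    double classify(String text); // score in [0, 1]
}

/** Toy implementation: average, over known words, of how often each word was seen as a match. */
class WordCountClassifier implements BinaryClassifier {
    private final Map<String, Integer> match = new HashMap<>();
    private final Map<String, Integer> nonMatch = new HashMap<>();

    private static String[] words(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    public void teachMatch(String text) {
        for (String w : words(text)) if (!w.isEmpty()) match.merge(w, 1, Integer::sum);
    }

    public void teachNonMatch(String text) {
        for (String w : words(text)) if (!w.isEmpty()) nonMatch.merge(w, 1, Integer::sum);
    }

    public double classify(String text) {
        double sum = 0;
        int known = 0;
        for (String w : words(text)) {
            int m = match.getOrDefault(w, 0);
            int n = nonMatch.getOrDefault(w, 0);
            if (m + n == 0) continue; // unknown word: ignored
            sum += (double) m / (m + n);
            known++;
        }
        return known == 0 ? 0.5 : sum / known; // nothing known: neutral score
    }
}

/** One classifier per rating level; training one level counts as a non-match for all others. */
class RatingLevels {
    private final Map<String, BinaryClassifier> byLevel = new LinkedHashMap<>();

    public RatingLevels(List<String> levels) {
        for (String level : levels) byLevel.put(level, new WordCountClassifier());
    }

    public void train(String chosenLevel, String text) {
        for (Map.Entry<String, BinaryClassifier> e : byLevel.entrySet()) {
            if (e.getKey().equals(chosenLevel)) e.getValue().teachMatch(text);
            else e.getValue().teachNonMatch(text);
        }
    }

    /** Returns the level whose classifier gives the highest score (first level wins ties). */
    public String bestLevel(String text) {
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, BinaryClassifier> e : byLevel.entrySet()) {
            double score = e.getValue().classify(text);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        RatingLevels r = new RatingLevels(
                Arrays.asList("very good", "good", "moderate", "bad", "very bad"));
        r.train("very good", "excellent insightful release notes");
        r.train("very bad", "spam advert casino");
        System.out.println(r.bestLevel("insightful release"));
    }
}
```

Picking the highest score rather than asking each classifier for a yes/no match avoids the case where several levels, or none, report a match.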
From: Nick L. <ni...@ma...> - 2004-05-08 02:39:15
>>> - Is there any support of storing the ratings in a file and reload
>>>   it on next JVM startup (some sort of Serialisation)?
>>
>> I don't know the answer to this. From my dealings with it,
>> all results are stored in a database.
>
> So what kind of database are you using? Is it a real database
> that is running on a server, or some sort of file-based DB?

Classifier4J supports SQL-based databases (tested with MySQL & HSQLDB) and the JDBM non-relational database. I would recommend the JDBM database for performance reasons.

>>> - Is it possible to have a better rating granularity than just match
>>>   and not match? I thought of something like
>>>   5 rating levels that a news may be rated (very good, good,
>>>   moderate, bad, very bad).
>>
>> Well - the matching returns a percentage match I think. So
>> you could test the returned result and say, for example.. that
>> a 0-.2 would be very good, .2-.4 good, .4-.6 moderate, etc.
>
> Well I am talking of the other way, giving good or moderate as
> input into classifier4j. Using trainMatch and trainNonMatch
> this does not seem to be possible.

This would be possible in theory, but there is no support for it in C4J and it would be quite a lot of work to add it.

Nick
From: Benjamin P. <ben...@we...> - 2004-05-07 21:39:55
br...@bj... wrote:

>> - Are there any open-source examples available that use classifier4j?
>
> I'm using Classifier4J as a spam email classifier in my
> Webgate software (see http://webgate.sourceforge.net).

Thanks, going to have a look at it.

>> - Is there any support of storing the ratings in a file and
>>   reload it on next JVM startup (some sort of Serialisation)?
>
> I don't know the answer to this. From my dealings with it,
> all results are stored in a database.

So what kind of database are you using? Is it a real database that is running on a server, or some sort of file-based DB?

>> - Is it possible to have a better rating granularity than
>>   just match and not match? I thought of something like
>>   5 rating levels that a news may be rated (very good, good,
>>   moderate, bad, very bad).
>
> Well - the matching returns a percentage match I think. So
> you could test the returned result and say, for example.. that
> a 0-.2 would be very good, .2-.4 good, .4-.6 moderate, etc.

Well I am talking of the other way, giving good or moderate as input into classifier4j. Using trainMatch and trainNonMatch this does not seem to be possible.

>> - How do I train the system. With using
>>   classifier.teachMatch()... or with using the WordProbability class?
>
> I used teachMatch() and teachNonMatch() (i think thats the correct
> method name) against known good and bad emails in my inbox.

Yeah, those sound good, but as mentioned I don't think it will be possible to use them with 5 rating levels. For a spam filter with only two levels, Spam / Ham, it's sufficient.

Ben

> Hope this helps!
>
> - Brent
From: <br...@bj...> - 2004-05-07 17:12:35
> - Are there any open-source examples available that use classifier4j?

I'm using Classifier4J as a spam email classifier in my Webgate software (see http://webgate.sourceforge.net).

> - Is there any support of storing the ratings in a file and
>   reload it on next JVM startup (some sort of Serialisation)?

I don't know the answer to this. From my dealings with it, all results are stored in a database.

> - Is it possible to have a better rating granularity than
>   just match and not match? I thought of something like
>   5 rating levels that a news may be rated (very good, good,
>   moderate, bad, very bad).

Well - the matching returns a percentage match, I think. So you could test the returned result and say, for example, that 0-.2 would be very good, .2-.4 good, .4-.6 moderate, etc.

> - How do I train the system. With using
>   classifier.teachMatch()... or with using the WordProbability class?

I used teachMatch() and teachNonMatch() (I think that's the correct method name) against known good and bad emails in my inbox.

Hope this helps!

- Brent
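Brent's idea of splitting the returned score into bands could look something like the sketch below. The class name, the labels and the equal-width bands are assumptions for illustration. The direction of the mapping follows Brent's wording (a low score is "very good"), which only makes sense if "match" was trained on the undesirable items; flip the label order if it was trained the other way round.

```java
class RatingBands {
    // Labels ordered from the lowest score band to the highest, as in Brent's example.
    private static final String[] LABELS = {"very good", "good", "moderate", "bad", "very bad"};

    /** Maps a score in [0, 1] to one of five equal-width bands. */
    static String label(double score) {
        if (Double.isNaN(score) || score < 0.0 || score > 1.0) {
            throw new IllegalArgumentException("score must be in [0, 1]: " + score);
        }
        // A score of exactly 1.0 would index past the end, so clamp to the last band.
        int band = Math.min((int) (score * LABELS.length), LABELS.length - 1);
        return LABELS[band];
    }

    public static void main(String[] args) {
        System.out.println(label(0.1));
        System.out.println(label(0.5));
        System.out.println(label(0.95));
    }
}
```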
From: Benjamin P. <ben...@we...> - 2004-05-07 10:30:33
Hello,

I am very interested in implementing a Bayes-based rating system in my feedreader. I played around with Classifier4J a bit and have some questions:

- Are there any open-source examples available that use classifier4j?
- Is there any support for storing the ratings in a file and reloading them on the next JVM startup (some sort of serialisation)?
- Is it possible to have a better rating granularity than just match and not match? I thought of something like 5 rating levels that a news item may be rated (very good, good, moderate, bad, very bad).
- How do I train the system? By using classifier.teachMatch()... or by using the WordProbability class?

With best regards,
Ben
RSSOwl Development Team
From: Nick L. <nl...@es...> - 2004-02-19 22:26:50
> > > However, I tried to classify some text using the Bayesian Classifier
> > > to guess language. All responses are 0.5, so I guess it's me doing
> > > something wrong.
> >
> > You need to train the classifier with both matches and non-matches.
>
> What kind of non-matches should I fill it with? String of text from all
> other languages?

I guess so. I'm not sure how well it is going to work for this, though.

Usually we ignore the most common words in the language (stop words). In your case it might make more sense to have a vocabulary of nothing but stop words in each language, because that way you can pretty much guarantee that you'll get a match in the correct language.

Nick
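Nick's stop-word idea can be tried without a Bayesian classifier at all. The sketch below is a hypothetical illustration, not Classifier4J code: it simply counts how many tokens of a text appear in each language's stop-word list. The word lists are tiny made-up samples, and `\u00e4r` is the Swedish word "är" written as a Unicode escape.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class StopWordLanguageGuesser {
    private final Map<String, Set<String>> stopWords = new LinkedHashMap<>();

    void addLanguage(String language, String... words) {
        stopWords.put(language, new HashSet<>(Arrays.asList(words)));
    }

    /** Returns the language whose stop-word list covers the most tokens, or null if none match. */
    String guess(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = null;
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> e : stopWords.entrySet()) {
            int hits = 0;
            for (String t : tokens) {
                if (e.getValue().contains(t)) hits++;
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        StopWordLanguageGuesser g = new StopWordLanguageGuesser();
        g.addLanguage("english", "the", "is", "my", "and", "of", "a");
        g.addLanguage("swedish", "och", "\u00e4r", "jag", "det", "som", "en");
        System.out.println(g.guess("hello, my name is karl."));
    }
}
```

Because stop words occur in almost any sentence, even short inputs tend to produce at least one hit in the right language, which is the property Nick is relying on.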
From: karl w. <we...@us...> - 2004-02-19 18:21:26
On Wed, 18 Feb 2004 08:38:46 +1030 Nick Lothian <nl...@es...> wrote:

> > However, I tried to classify some text using the Bayesian Classifier
> > to guess language. All responses are 0.5, so I guess it's me doing
> > something wrong.
>
> You need to train the classifier with both matches and non-matches.

What kind of non-matches should I fill it with? Strings of text from all other languages?

Footnote: I'm sure that a simple N-gram classification could come up with better results per clock tick than a Bayesian one, but then this is just a test of your classifier.

--
karl
--
kalle
From: Nick L. <nl...@es...> - 2004-02-17 22:59:17
Yeah, I was just thinking that. I was also trying to think of a better way to do it - the API isn't totally clean here.

Nick

> -----Original Message-----
> From: Peter Leschev [mailto:pe...@le...]
> Sent: Tuesday, 17 February 2004 9:21 AM
> To: cla...@li...
> Subject: RE: [Classifier4j-devel] stuff
> Importance: Low
>
> It may be worth putting this type of question down as a FAQ
> item... It's been asked a few times.
From: Peter L. <pe...@le...> - 2004-02-17 22:55:13
It may be worth putting this type of question down as a FAQ item... It's been asked a few times.

On Wed, 18 Feb 2004 08:38:46 +1030, Nick Lothian wrote:

> You need to train the classifier with both matches and non-matches.
>
> Use the .addNonMatch(String) method to train the non-matches.
From: Nick L. <nl...@es...> - 2004-02-17 22:13:37
You need to train the classifier with both matches and non-matches.

Use the .addNonMatch(String) method to train the non-matches.

> -----Original Message-----
> From: karl wettin [mailto:we...@us...]
> Sent: Tuesday, 17 February 2004 2:16 AM
> To: cla...@li...
> Subject: [Classifier4j-devel] stuff
> Importance: Low
>
> However, I tried to classify some text using the Bayesian Classifier
> to guess language. All responses are 0.5, so I guess it's me doing
> something wrong.
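To see why both kinds of training matter, here is a small self-contained sketch. It does not use Classifier4J: `WordCounts` is a hypothetical stand-in for a words data source, and the probability rule (match count over total count, clamped to [0.01, 0.99], with 0.5 for unseen words) is an assumption chosen for illustration. Each language's data is trained with its own text as matches and the other language's text as non-matches.

```java
import java.util.HashMap;
import java.util.Map;

class TwoSidedTraining {
    /** Hypothetical stand-in for a words data source: per-word match / non-match counts. */
    static class WordCounts {
        private final Map<String, int[]> counts = new HashMap<>();

        void addMatch(String word) { counts.computeIfAbsent(word, k -> new int[2])[0]++; }
        void addNonMatch(String word) { counts.computeIfAbsent(word, k -> new int[2])[1]++; }

        double probability(String word) {
            int[] c = counts.get(word);
            if (c == null) return 0.5; // never seen: neutral
            double p = (double) c[0] / (c[0] + c[1]);
            return Math.max(0.01, Math.min(0.99, p)); // keep away from 0 and 1
        }
    }

    private static String[] words(String text) {
        return text.toLowerCase().split("[^a-z]+");
    }

    /** Trains one data source with matching text and non-matching text. */
    static WordCounts train(String matching, String nonMatching) {
        WordCounts data = new WordCounts();
        for (String w : words(matching)) if (!w.isEmpty()) data.addMatch(w);
        for (String w : words(nonMatching)) if (!w.isEmpty()) data.addNonMatch(w);
        return data;
    }

    /** Combines per-word probabilities into one score in (0, 1). */
    static double classify(WordCounts data, String text) {
        double match = 1.0;
        double nonMatch = 1.0;
        for (String w : words(text)) {
            if (w.isEmpty()) continue;
            double p = data.probability(w);
            match *= p;
            nonMatch *= 1.0 - p;
        }
        return match / (match + nonMatch);
    }

    public static void main(String[] args) {
        String english = "hello my name is karl and this is english text";
        String swedish = "hej jag heter karl och detta kallas svensk text";
        WordCounts en = train(english, swedish);
        WordCounts sv = train(swedish, english);
        System.out.println("en: " + classify(en, "hello, my name is karl."));
        System.out.println("sv: " + classify(sv, "hello, my name is karl."));
    }
}
```

In this sketch a data source that knows none of the words in the input returns exactly 0.5, since every word contributes a neutral probability.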
From: karl w. <we...@us...> - 2004-02-17 15:50:48
Not too much action here, I guess.

However, I tried to classify some text using the Bayesian Classifier to guess language. All responses are 0.5, so I guess it's me doing something wrong.

The code looked something like this:

{
  SimpleWordsDataSource wds_en = new SimpleWordsDataSource();
  Enumerator enum = enumerateWords(englishText);
  while (enum.hasNext())
    wds_en.addMatch((String)enum.nextElement());

  SimpleWordsDataSource wds_sv = new SimpleWordsDataSource();
  enum = enumerateWords(swedishText);
  while (enum.hasNext())
    wds_sv.addMatch((String)enum.nextElement());

  BayesianClassifier c_en = new BayesianClassifier(wds_en);
  System.out.println(c_en.classify("hello, my name is karl.")); // returns 0.5

  BayesianClassifier c_sv = new BayesianClassifier(wds_sv);
  System.out.println(c_sv.classify("hello, my name is karl.")); // returns 0.5
}

What do I do wrong?

--
karl
From: karl w. <we...@us...> - 2004-01-18 19:25:10
Project,

I want to create user profiles with interests based on the documents they insert into my system. My thought on this is to extract the "fingerprint" of a document: the weights of phrases and words.

How can I (if possible) use classifier4j for this purpose? I honestly don't understand how to use the library at all. Are there any sample applications available, apart from the snippets of code available in the documentation?

I'd really appreciate some pointers in the right direction.

Thanks,
karl
From: emen <em...@o2...> - 2004-01-14 02:19:43
There's something I at first had taken as a bug, but then realised that it's probably my improper use. Nick asked me to report it here anyway, so I do. I'll dump a lot of code here, so read patiently or not at all ;)

Looking at the example included in the classifier, I made myself a small class that makes my life easier when it comes to classifying things or training the classifier. I have added some comments inside the code to show three states: 1 error, 2 still error, 3 works fine. So here it is:

package blackwidow.Classifiers.Text;

import blackwidow.Conf.*;
import java.io.*;
import java.sql.SQLException;
import net.sf.classifier4J.DefaultTokenizer;
import net.sf.classifier4J.ITrainableClassifier;
import net.sf.classifier4J.bayesian.*;
import blackwidow.GUI.*;
import net.sf.classifier4J.*;

/**
 * <p>Title: BackWidow</p>
 * <p>Description: </p>
 * <p>Copyright: Copyright (c) emen 2003</p>
 * <p>Company: </p>
 * @author emen
 * @version 0.0.1
 */
public class bwBayesianTextClassifier {
  //public final static String connectionString = bwConfig.propConf.getProperty("JdbmUrl");
  public final static String relativeDBPath = bwConfig.propConf.getProperty("RelativeDBPath");
  //public final static String username = bwConfig.propConf.getProperty("JdbmUserName");
  //public final static String password = bwConfig.propConf.getProperty("JdbmUserPass");
  JDBMWordsDataSource wds;
  BayesianClassifier classifier;

  public bwBayesianTextClassifier() {
    try {
      wds = new JDBMWordsDataSource(relativeDBPath);
      wds.open();
      classifier = new BayesianClassifier(wds);
    } catch(Exception ex) {
      bwDebugOutput.display(ex);
      ex.printStackTrace();
    }
  }

  //************ following code was added in step 2 / remove it for step 3
  public void reInit() {
    wds.close();
    try {
      wds = new JDBMWordsDataSource(relativeDBPath);
      wds.open();
      //classifier = new BayesianClassifier(wds);
    } catch(Exception ex) {
      bwDebugOutput.display(ex);
      ex.printStackTrace();
    }
  }
  //************ end of step 2

  //************ following code was added in step 3
  public void reInit() {
    try {
      wds = new JDBMWordsDataSource(relativeDBPath);
      wds.open();
      classifier = new BayesianClassifier(wds);
    } catch(Exception ex) {
      bwDebugOutput.display(ex);
      ex.printStackTrace();
    }
  }
  //************ end of step 3

  public void trainClassifierFromFile(boolean isMatch, String filename)
      throws FileNotFoundException, IOException, ClassifierException {
    reInit(); // <======================== This line added in step 2
    InputStream input = new FileInputStream(filename);
    BufferedReader reader = new BufferedReader(new InputStreamReader(input));
    //int c;
    String line = "";
    StringBuffer fileContents = new StringBuffer("");
    while((line = reader.readLine()) != null) {
      fileContents.append(line);
    }
    String contents = fileContents.toString();
    int length = new DefaultTokenizer().tokenize(contents).length;
    long startTime = System.currentTimeMillis();
    if(isMatch) {
      System.out.println("Training Classifier4J with " + length + " matching words. This may take a while.");
      bwDebugOutput.display("bwBayesianTextClassifier.trainClassifierFromFile(): Training Classifier4J with " + length + " matching words. This may take a while.");
      classifier.teachMatch(contents);
    } else {
      System.out.println("Training Classifier4J with " + length + " non-matching words. This may take a while.");
      bwDebugOutput.display("bwBayesianTextClassifier.trainClassifierFromFile(): Training Classifier4J with " + length + " non-matching words. This may take a while.");
      classifier.teachNonMatch(contents);
    }
    long endTime = System.currentTimeMillis();
    long time = (endTime - startTime) / 1000;
    if(time == 0) {
      time = 1;
    }
    System.out.println("Done. Took " + time + " seconds, which is " + length / time + " words per second.");
    bwDebugOutput.display("bwBayesianTextClassifier.trainClassifierFromFile(): Done. Took " + time + " seconds, which is " + length / time + " words per second.");
    wds.close();
  }

  public void trainClassifier(boolean isMatch, String contents)
      throws FileNotFoundException, IOException, ClassifierException {
    reInit(); // <======================== This line added in step 2
    if(contents == null) {
      bwDebugOutput.display("bwBayesianTextClassifier.trainClassifier(): null content passed as training data, skipping.");
      return;
    }
    int length = new DefaultTokenizer().tokenize(contents).length;
    long startTime = System.currentTimeMillis();
    if(isMatch) {
      System.out.println("Training Classifier4J with " + length + " matching words. This may take a while.");
      bwDebugOutput.display("bwBayesianTextClassifier.trainClassifier(): Training Classifier4J with " + length + " matching words. This may take a while.");
      classifier.teachMatch(contents);
    } else {
      System.out.println("Training Classifier4J with " + length + " non-matching words. This may take a while.");
      bwDebugOutput.display("bwBayesianTextClassifier.trainClassifier(): Training Classifier4J with " + length + " non-matching words. This may take a while.");
      classifier.teachNonMatch(contents);
    }
    long endTime = System.currentTimeMillis();
    long time = (endTime - startTime) / 1000;
    if(time == 0) {
      time = 1;
    }
    System.out.println("Done. Took " + time + " seconds, which is " + length / time + " words per second.");
    bwDebugOutput.display("bwBayesianTextClassifier.trainClassifier(): Done. Took " + time + " seconds, which is " + length / time + " words per second.");
    wds.close();
  }

  public boolean classifyFromFile(String filename) {
    reInit(); // <======================== This line added in step 2
    InputStream input = null;
    //int c;
    String line = "";
    StringBuffer fileContents = new StringBuffer("");
    try {
      input = new FileInputStream(filename);
      BufferedReader reader = new BufferedReader(new InputStreamReader(input));
      while ( (line = reader.readLine()) != null) {
        fileContents.append( line );
      }
      reader.close();
    } catch(Exception ex) {
      bwDebugOutput.display(ex);
      ex.printStackTrace();
    }
    String contents = fileContents.toString();
    int length = new DefaultTokenizer().tokenize(contents).length;
    System.out.println("Analysing " + filename + " (contains " + length + " words). This may take a while.");
    bwDebugOutput.display("bwBayesianTextClassifier.classifyFromFile(): Analysing " + filename + " (contains " + length + " words). This may take a while.");
    long startTime = System.currentTimeMillis();
    double matchProb = 0.0;
    try {
      matchProb = classifier.classify(contents);
    } catch(ClassifierException ex2) {
      bwDebugOutput.display(ex2);
      ex2.printStackTrace();
    }
    long endTime = System.currentTimeMillis();
    long time = (endTime - startTime)/1000;
    if(time == 0) {
      time = 1;
    }
    System.out.println("Done. Took " + time + " seconds, which is " + length/time + " words per second.");
    bwDebugOutput.display("bwBayesianTextClassifier.classifyFromFile(): Done. Took " + time + " seconds, which is " + length/time + " words per second.");
    System.out.println("Match Probability = " + matchProb);
    bwDebugOutput.display("bwBayesianTextClassifier.classifyFromFile(): Match Probability = " + matchProb);
    boolean retVal = classifier.isMatch(matchProb);
    System.out.println("Is considered a match: " + retVal);
    bwDebugOutput.display("bwBayesianTextClassifier.classifyFromFile(): Is considered a match: " + retVal);
    //wds.close();
    return retVal;
  }

  public boolean classify(String contents) {
    reInit(); // <======================== This line added in step 2
    int length = new DefaultTokenizer().tokenize(contents).length;
    System.out.println("Analysing user string input (contains " + length + " words). This may take a while.");
    bwDebugOutput.display("bwBayesianTextClassifier.classify(): Analysing user string input (contains " + length + " words). This may take a while.");
    long startTime = System.currentTimeMillis();
    double matchProb = 0.0;
    try {
      matchProb = classifier.classify(contents);
    } catch(ClassifierException ex) {
      bwDebugOutput.display(ex);
      ex.printStackTrace();
    }
    long endTime = System.currentTimeMillis();
    long time = (endTime - startTime)/1000;
    if(time == 0) {
      time = 1;
    }
    System.out.println("Done. Took " + time + " seconds, which is " + length/time + " words per second.");
    bwDebugOutput.display("bwBayesianTextClassifier.classify(): Done. Took " + time + " seconds, which is " + length/time + " words per second.");
    System.out.println("Match Probability = " + matchProb);
    bwDebugOutput.display("bwBayesianTextClassifier.classify(): Match Probability = " + matchProb);
    boolean retVal = classifier.isMatch(matchProb);
    System.out.println("Is considered a match: " + retVal);
    bwDebugOutput.display("bwBayesianTextClassifier.classify(): Is considered a match: " + retVal);
    //wds.close();
    return retVal;
  }
}

The exception below is thrown after the first successful training of the classifier. It occurs at every subsequent attempt to teach a match or non-match. Step 3 above is the workaround, and with it everything seems to work fine.

Stack trace for errors:

java.lang.NullPointerException
  at jdbm.recman.PageManager.getFirst(PageManager.java:211)
  at jdbm.recman.PageCursor.next(PageCursor.java:90)
  at jdbm.recman.FreePhysicalRowIdPageManager.get(FreePhysicalRowIdPageManager.java:82)
  at jdbm.recman.PhysicalRowIdManager.alloc(PhysicalRowIdManager.java:162)
  at jdbm.recman.PhysicalRowIdManager.insert(PhysicalRowIdManager.java:77)
  at jdbm.recman.RecordManager.insert(RecordManager.java:143)
  at jdbm.recman.RecordManager.insert(RecordManager.java:156)
  at jdbm.btree.BPage.<init>(BPage.java:206)
  at jdbm.btree.BPage.insert(BPage.java:364)
  at jdbm.btree.BPage.insert(BPage.java:326)
  at jdbm.btree.BTree.insert(BTree.java:270)
  at net.sf.classifier4J.bayesian.JDBMWordsDataSource.addNonMatch(JDBMWordsDataSource.java:157)
  at net.sf.classifier4J.bayesian.BayesianClassifier.teachNonMatch(BayesianClassifier.java:267)
  at net.sf.classifier4J.bayesian.BayesianClassifier.teachNonMatch(BayesianClassifier.java:218)
  at net.sf.classifier4J.bayesian.BayesianClassifier.teachNonMatch(BayesianClassifier.java:190)
  at blackwidow.Classifiers.Text.bwBayesianTextClassifier.trainClassifier(bwBayesianTextClassifier.java:166)
  at blackwidow.Classifiers.Text.bwClassifierTrainer.train(bwClassifierTrainer.java:49)
  at blackwidow.GUI.bwLearnByExamplesTab$1.actionPerformed(bwLearnByExamplesTab.java:101)
  at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1764)
  at javax.swing.AbstractButton$ForwardActionEvents.actionPerformed(AbstractButton.java:1817)
  at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:419)
  at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:257)
  at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:245)
  at java.awt.Component.processMouseEvent(Component.java:5093)
  at java.awt.Component.processEvent(Component.java:4890)
  at java.awt.Container.processEvent(Container.java:1566)
  at java.awt.Component.dispatchEventImpl(Component.java:3598)
  at java.awt.Container.dispatchEventImpl(Container.java:1623)
  at java.awt.Component.dispatchEvent(Component.java:3439)
  at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:3450)
  at java.awt.LightweightDispatcher.processMouseEvent(Container.java:3165)
  at java.awt.LightweightDispatcher.dispatchEvent(Container.java:3095)
  at java.awt.Container.dispatchEventImpl(Container.java:1609)
  at java.awt.Window.dispatchEventImpl(Window.java:1585)
  at java.awt.Component.dispatchEvent(Component.java:3439)
  at java.awt.EventQueue.dispatchEvent(EventQueue.java:450)
  at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:197)
  at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
  at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:144)
  at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:136)
  at java.awt.EventDispatchThread.run(EventDispatchThread.java:99)

I want to add that when classifying I also get only 0.01 or 0.99 probabilities and no others. This seems very strange to me. I taught the classifier about the same number of positive and negative examples, so the probability should vary.

Other than that, C4J seems to work quite well and accurately if taught well.

Regards,
emen
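One possible explanation for seeing only 0.01 or 0.99, offered as an assumption rather than a description of Classifier4J's internals: if per-word probabilities are multiplied together in the style of naive Bayes spam filters, even mildly informative words push the combined score to an extreme once there are a few dozen of them. The hypothetical `combine` method below shows the effect in isolation.

```java
import java.util.Arrays;

class CombinedProbability {
    /** Combines independent per-word probabilities: prod(p) / (prod(p) + prod(1 - p)). */
    static double combine(double[] wordProbabilities) {
        double match = 1.0;
        double nonMatch = 1.0;
        for (double p : wordProbabilities) {
            match *= p;
            nonMatch *= 1.0 - p;
        }
        return match / (match + nonMatch);
    }

    public static void main(String[] args) {
        // One slightly "matching" word leaves the score close to neutral.
        System.out.println(combine(new double[] {0.6}));

        // Twenty such words together push the score very close to 1.
        double[] twenty = new double[20];
        Arrays.fill(twenty, 0.6);
        System.out.println(combine(twenty));
    }
}
```

If the final score is additionally clamped to a fixed range such as [0.01, 0.99], longer documents would almost always land on one of the two bounds, which is consistent with what emen reports.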
From: Nick L. <ni...@ma...> - 2004-01-13 13:47:22
This is now checked into CVS. It'll take a couple of days to show up in anonymous CVS, though (unless SourceForge have performed miracles with their CVS servers).

Nick

> I tested on Windows, which would explain it. I didn't know that about MySQL
> (and it isn't very impressive if you ask me. Most databases have a setting
> for this kind of thing).
>
> I'll apply the patch in the next couple of days.
>
> Nick
From: Nick L. <nl...@es...> - 2004-01-12 22:12:32
|
I tested on Windows, which would explain it. I didn't know that about MySQL (and it isn't very impressive, if you ask me; most databases have a setting for this kind of thing). I'll apply the patch in the next couple of days. Nick > -----Original Message----- > From: ASARI Takashi [mailto:as...@so...] > Sent: Monday, 12 January 2004 5:35 PM > To: cla...@li... > Subject: Re: [Classifier4j-devel] JDBCWordsDataSource SQLException? > Importance: Low > > I sent that patch, and I'm sorry for the late reply. I'm using MySQL 4.1a on Linux (Red Hat). > The case-sensitivity problem won't happen if you use Windows, I think. > > Please see the MySQL manual: > http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#Name_case_sensitivity > > Sorry for my poor English. > > -- > ASARI Takashi > > On 2004.1.2, at 07:44 AM, Nick Lothian wrote: > > Another possibility is case-sensitivity. I did receive a patch that fixed an alleged case-sensitivity problem, but I never found out what database it was for or any other details, so I haven't applied it. > > > > That patch modified the line > > > > ResultSet rs = dbm.getTables(null, null, "WORD_PROBABILITY", null); > > > > to > > > > ResultSet rs = dbm.getTables(null, null, "word_probability", null); > > > > so that might be worth trying. Please post back if you try it and it works. > > > > Nick |
From: ASARI T. <as...@so...> - 2004-01-12 07:05:04
|
I sent that patch, and I'm sorry for the late reply. I'm using MySQL 4.1a on Linux (Red Hat). The case-sensitivity problem won't happen if you use Windows, I think. Please see the MySQL manual: http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#Name_case_sensitivity Sorry for my poor English. -- ASARI Takashi On 2004.1.2, at 07:44 AM, Nick Lothian wrote: > Another possibility is case-sensitivity. I did receive a patch that fixed an alleged case-sensitivity problem, but I never found out what database it was for or any other details, so I haven't applied it. > > That patch modified the line > > ResultSet rs = dbm.getTables(null, null, "WORD_PROBABILITY", null); > > to > > ResultSet rs = dbm.getTables(null, null, "word_probability", null); > > so that might be worth trying. Please post back if you try it and it works. > > Nick |
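The case-sensitivity problem described in this message can also be handled without hard-coding one spelling of the table name. What follows is a minimal sketch, assuming only the standard JDBC `DatabaseMetaData.getTables` call quoted in the thread. The class `TableLookup` and its two methods are hypothetical names invented for illustration; they are not part of Classifier4J, and this is not the patch that was applied.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Classifier4J. */
public class TableLookup {

    /** Spellings to try, in order: as given, upper case, lower case (duplicates dropped). */
    public static Set<String> candidateNames(String tableName) {
        Set<String> names = new LinkedHashSet<>();
        names.add(tableName);
        names.add(tableName.toUpperCase(Locale.ROOT));
        names.add(tableName.toLowerCase(Locale.ROOT));
        return names;
    }

    /** Returns true if the driver reports a table under any of the candidate spellings. */
    public static boolean tableExists(DatabaseMetaData dbm, String tableName) throws SQLException {
        for (String name : candidateNames(tableName)) {
            // Same call as in the thread; only the table-name argument varies.
            try (ResultSet rs = dbm.getTables(null, null, name, null)) {
                if (rs.next()) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A caller would use `TableLookup.tableExists(connection.getMetaData(), "word_probability")` before attempting to create the table. Trying several spellings costs a few extra metadata queries but avoids depending on how a particular database and operating system fold identifier case.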
From: Nick L. <nl...@es...> - 2004-01-07 22:29:34
|
> > I am enjoying experimenting with your Classifier4J 0.5, > but I ran across a result I did not expect. I have > trained a BayesianClassifier with 22 positive examples > and 1600 negative examples. Many of the positive > examples contain the word "http". None of the > negative examples contain this word. > > The surprising result is that the score of a sentence > with "http" is 0.01. Can you help me to understand > why? > > Here is the sentence and the WordProbability > probabilities for each of the words in the sentence > that were in the training data: > > score = 0.01 for "Mozilla/4.0 (compatible; > grub-client-1.3.7; Crawl your own stuff with > http://grub.org)" > 0.11822660098522167 Mozilla > 0.020618556701030927 4 > 0.07223476297968397 0 > 0.029239766081871343 compatible > 0.10619469026548672 1 > 0.5454545454545454 3 > 0.01 7 > 0.99 http > That sounds right to me. The math goes like this (and I'm going to round those numbers off, because I can't be bothered typing them into my calculator): score = ((0.11)(0.02)(0.07)(0.02)(0.11)(0.55)(0.01)(0.99)) / ((0.11)(0.02)(0.07)(0.02)(0.11)(0.55)(0.01)(0.99) + (1 - 0.11)(1 - 0.02)(1 - 0.07)(1 - 0.02)(1 - 0.11)(1 - 0.55)(1 - 0.01)(1 - 0.99)) = 0.000000001844766 / (0.000000001844766 + (0.89)(0.98)(0.93)(0.98)(0.89)(0.45)(0.99)(0.01)) = 0.000000001844766 / (0.000000001844766 + 0.003151830266046) = 0.000000001844766 / 0.003151832110812 = 0.00000058529957660871 Classifier4J has a cut-off system where anything under 0.01 gets 0.01. Does that help explain things? The code for this is in net.sf.classifier4J.bayesian.BayesianClassifier (see <http://classifier4j.sourceforge.net/xref/net/sf/classifier4J/bayesian/BayesianClassifier.html>) Nick Lothian |
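The arithmetic in this reply can be reproduced in a few lines. This is a sketch of the formula as written out in the message, not a copy of the Classifier4J source. The class name `BayesCombine` and its methods are hypothetical. The lower cut-off of 0.01 comes from the message; the matching upper cut-off of 0.99 is an assumption, suggested by the 0.99 results reported elsewhere on the list.

```java
/** Hypothetical illustration of the combining formula described above. */
public class BayesCombine {
    static final double LOWER = 0.01; // cut-off stated in the message
    static final double UPPER = 0.99; // assumed symmetric upper cut-off

    /** Combines per-word probabilities: prod(p) / (prod(p) + prod(1 - p)). */
    public static double combine(double[] wordProbabilities) {
        double match = 1.0;
        double nonMatch = 1.0;
        for (double p : wordProbabilities) {
            match *= p;
            nonMatch *= (1.0 - p);
        }
        return match / (match + nonMatch);
    }

    /** Applies the cut-off so the reported score stays within [LOWER, UPPER]. */
    public static double clamp(double score) {
        return Math.max(LOWER, Math.min(UPPER, score));
    }

    public static void main(String[] args) {
        // The rounded values used in the message, in the same order.
        double[] rounded = {0.11, 0.02, 0.07, 0.02, 0.11, 0.55, 0.01, 0.99};
        double raw = combine(rounded);
        System.out.println("raw score      = " + raw);        // roughly 5.85e-7
        System.out.println("reported score = " + clamp(raw)); // cut off at 0.01
    }
}
```

The single 0.99 for "http" is outweighed because every other word in the sentence leans towards non-match, and each of those factors multiplies into the result.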
From: Mike M. <set...@ya...> - 2004-01-07 06:44:28
|
I am enjoying experimenting with your Classifier4J 0.5, but I ran across a result I did not expect. I have trained a BayesianClassifier with 22 positive examples and 1600 negative examples. Many of the positive examples contain the word "http". None of the negative examples contain this word. The surprising result is that the score of a sentence with "http" is 0.01. Can you help me to understand why? Here is the sentence and the WordProbability probabilities for each of the words in the sentence that were in the training data: score = 0.01 for "Mozilla/4.0 (compatible; grub-client-1.3.7; Crawl your own stuff with http://grub.org)" 0.11822660098522167 Mozilla 0.020618556701030927 4 0.07223476297968397 0 0.029239766081871343 compatible 0.10619469026548672 1 0.5454545454545454 3 0.01 7 0.99 http Thanks for providing this great software. It generally works well, but this one result is surprising and I would like to understand it better. Mike Moore |
From: Nick L. <ni...@ma...> - 2004-01-04 08:45:04
|
Yes, passing the same name will be fine. Alternatively, you can pass no category name and it will just use a "default" category. Nick ----- Original Message ----- From: "Brent L Johnson" <br...@bj...> To: <cla...@li...> Sent: Sunday, January 04, 2004 4:04 AM Subject: RE: [Classifier4j-devel] Bayesian Results 0.99 for Everything? > Aaahhhh - that must be the problem then. I can export some emails > from my inbox and let it classify those as matches and see > if that helps. And what about categories? Is it OK to teachMatch and > teachNonMatch all the emails with the same category name? > > Sorry for the simple questions, but I want to make sure I'm using > it right so I get the best possible probabilities for spam > matching at the server :) > > Thanks, > > - Brent > > > -----Original Message----- > > From: cla...@li... > > [mailto:cla...@li...] On > > Behalf Of Nick Lothian > > Sent: Saturday, January 03, 2004 6:30 AM > > To: cla...@li... > > Subject: Re: [Classifier4j-devel] Bayesian Results 0.99 for > > Everything? > > > > > > Are you teaching it non-matches as well as matches? > > > > It needs something to compare the probability against. > > > > Nick |
From: Brent L J. <br...@bj...> - 2004-01-03 17:34:42
|
Aaahhhh - that must be the problem then. I can export some emails from my inbox and let it classify those as matches and see if that helps. And what about categories? Is it OK to teachMatch and teachNonMatch all the emails with the same category name? Sorry for the simple questions, but I want to make sure I'm using it right so I get the best possible probabilities for spam matching at the server :) Thanks, - Brent > -----Original Message----- > From: cla...@li... [mailto:cla...@li...] On Behalf Of Nick Lothian > Sent: Saturday, January 03, 2004 6:30 AM > To: cla...@li... > Subject: Re: [Classifier4j-devel] Bayesian Results 0.99 for Everything? > > Are you teaching it non-matches as well as matches? > > It needs something to compare the probability against. > > Nick |
From: Nick L. <ni...@ma...> - 2004-01-03 11:30:01
|
Are you teaching it non-matches as well as matches? It needs something to compare the probability against. Nick ----- Original Message ----- From: "Brent L Johnson" <br...@bj...> To: <cla...@li...> Sent: Saturday, January 03, 2004 8:44 AM Subject: [Classifier4j-devel] Bayesian Results 0.99 for Everything? > How does the BayesianClassifier differ from a program like "SpamBayes" (which has an Outlook plugin that uses the Bayesian algorithm to classify emails)? I "taught" the Bayesian classifier with some spam emails I have. Now it returns a 0.99 classify result for practically EVERYTHING. > > A little background: > > I exported about 6400 spam emails from Outlook to an mbox-ish format using Outport (outport.sourceforge.net). I then read the subject and body of each email and ran BayesianClassifier.teachMatch("spam", "..."); > > I pass it a string consisting of all the words in the subject and body of the message (separated by spaces). This ended up creating about 60,000 rows in the word_probability table. > > I wrote a BayesianMatcher class for James (james.apache.org). Basically, James (an SMTP/POP3 server) uses a FetchPop class to pull down emails from my POP3 account and route them to a local email user. During this time I pass the email through a matcher that uses BayesianClassifier.classify() to test whether it gets a 90% or better classification. If so, the letter is filed into "deadletters" and I never see it in Outlook; if it's less than 90%, the matcher leaves the email untouched and delivers it. > > The problem is that everything is getting sent to deadletters because of a 0.99 match on everything it receives. > > Is this not the proper way to use the classifier? Is there a different way I should use it to get the results I'm looking for? > > Any help/suggestions would be appreciated! > > Thanks, > > - Brent |
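To see why a classifier taught only matches scores everything at 0.99, consider how a per-word probability might be derived from training counts. The sketch below assumes a simple model: the fraction of a word's occurrences that came from matches, limited to the range 0.01 to 0.99, with 0.5 for a word never seen. That model is an assumption inferred from figures quoted on this list (for example 0.5454..., which equals 6/11, and 0.99 for a word seen only in matches); it is not taken from the Classifier4J source. `WordScore` is a hypothetical name.

```java
/** Hypothetical per-word probability model; an assumption, not Classifier4J code. */
public class WordScore {

    /** Fraction of occurrences that were matches, limited to [0.01, 0.99]. */
    public static double probability(int matches, int nonMatches) {
        int total = matches + nonMatches;
        if (total == 0) {
            return 0.5; // never seen in training: treated as neutral (assumption)
        }
        double p = (double) matches / total;
        return Math.max(0.01, Math.min(0.99, p));
    }

    public static void main(String[] args) {
        // Trained on spam only: every known word has zero non-match occurrences.
        System.out.println(probability(40, 0)); // hits the upper limit
        // After also teaching non-matches, common words move away from the limit.
        System.out.println(probability(40, 35));
    }
}
```

Under this model, with no non-match training every known word sits at the upper limit, so any message built from known words combines to the upper limit as well. Teaching ordinary mail with `teachNonMatch` gives the word counts something to be compared against.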
From: Brent L J. <br...@bj...> - 2004-01-02 22:14:39
|
How does the BayesianClassifier differ from a program like "SpamBayes" (which has an Outlook plugin that uses the Bayesian algorithm to classify emails)? I "taught" the Bayesian classifier with some spam emails I have. Now it returns a 0.99 classify result for practically EVERYTHING. A little background: I exported about 6400 spam emails from Outlook to an mbox-ish format using Outport (outport.sourceforge.net). I then read the subject and body of each email and ran BayesianClassifier.teachMatch("spam", "..."); I pass it a string consisting of all the words in the subject and body of the message (separated by spaces). This ended up creating about 60,000 rows in the word_probability table. I wrote a BayesianMatcher class for James (james.apache.org). Basically, James (an SMTP/POP3 server) uses a FetchPop class to pull down emails from my POP3 account and route them to a local email user. During this time I pass the email through a matcher that uses BayesianClassifier.classify() to test whether it gets a 90% or better classification. If so, the letter is filed into "deadletters" and I never see it in Outlook; if it's less than 90%, the matcher leaves the email untouched and delivers it. The problem is that everything is getting sent to deadletters because of a 0.99 match on everything it receives. Is this not the proper way to use the classifier? Is there a different way I should use it to get the results I'm looking for? Any help/suggestions would be appreciated! Thanks, - Brent |
From: Matt C. <MCo...@my...> - 2004-01-02 18:34:13
|
Matt Collier RemoteIT mco...@my... 877-4-NEW-LAN -----Original Message----- From: "Jacob Hinkle" <JH...@Tr...> To: <mco...@my...> Date: Fri, 2 Jan 2004 13:24:54 -0500 Subject: TradeStation 7 Here is the link to the EL Manuals. Call or email me with any other questions. http://www.tradestationsupport.com/books/ Thanks, Jake Hinkle Active Trader Sales Representative TradeStation Securities, Inc. A wholly-owned subsidiary of TradeStation Group, Inc http://www.TradeStation.com Toll Free Direct Line: 888-288-2123 Local: 954-652-7413 FAX: 954-652-5413 Email: jh...@tr... *System access and trade placement and execution may be delayed or fail due to market volatility and volume, quote delays, system and software errors, Internet traffic, outages and other factors. TradeStation Group, Inc. is a publicly-traded holding company (Nasdaq: TRAD) of two operating subsidiaries, TradeStation Securities, Inc. (Member NASD, SIPC and NFA) and TradeStation Technologies, Inc. (formerly known as Omega Research, Inc.). TradeStation Securities provides securities brokerage services for institutional and active individual traders. TradeStation Technologies provides software products and services for traders. The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer. |
From: Brent L J. <br...@bj...> - 2004-01-01 22:56:48
|
That did indeed fix the problem. Thanks, - Brent > Actually I don't think I've heard of that problem before. > > What version of MySQL are you using? I've only tested with v4. > > Another possibility is case-sensitivity. I did receive a patch that fixed an alleged case-sensitivity problem, but I never found out what database it was for or any other details, so I haven't applied it. > > That patch modified the line > > ResultSet rs = dbm.getTables(null, null, "WORD_PROBABILITY", null); > > to > > ResultSet rs = dbm.getTables(null, null, "word_probability", null); > > so that might be worth trying. Please post back if you try it and it works. > > Nick |
From: Nick L. <nl...@es...> - 2004-01-01 22:46:17
|
Actually I don't think I've heard of that problem before. What version of MySQL are you using? I've only tested with v4. Another possibility is case-sensitivity. I did receive a patch that fixed an alleged case-sensitivity problem, but I never found out what database it was for or any other details, so I haven't applied it. That patch modified the line ResultSet rs = dbm.getTables(null, null, "WORD_PROBABILITY", null); to ResultSet rs = dbm.getTables(null, null, "word_probability", null); so that might be worth trying. Please post back if you try it and it works. Nick > -----Original Message----- > From: Brent L Johnson [mailto:br...@bj...] > Sent: Wednesday, 31 December 2003 2:51 PM > To: cla...@li... > Subject: [Classifier4j-devel] JDBCWordsDataSource SQLException? > Importance: Low > > I'm sure this is a fairly common question, but I didn't see it in the mailing list archives. > > I'm using MySQL, and when I first create a JDBCWordsDataSource it creates a table 'word_probability'. But when I run it after that I get the following exception: > > net.sf.classifier4J.bayesian.WordsDataSourceException: Problem creating table > at net.sf.classifier4J.bayesian.JDBCWordsDataSource.createTable(JDBCWordsDataSource.java:252) > at net.sf.classifier4J.bayesian.JDBCWordsDataSource.<init>(JDBCWordsDataSource.java:100) > at com.li.sentinel.agent.classify.Classifier.main(Classifier.java:32) > Caused by: java.sql.SQLException: General error, message from server: "Table 'word_probability' already exists" > ... > > Even though the table exists, it looks like the following line of code isn't finding the table. In JDBCWordsDataSource.java(233): > > ResultSet rs = dbm.getTables(null, null, "WORD_PROBABILITY", null); > > I get an empty result set, apparently. This is when using 0.5. Any ideas? > > Thanks, > > - Brent |