Thread: [Classifier4j-devel] Update Word Probability Break Down
Brought to you by:
nicklothian
From: Matt C. <MCo...@my...> - 2003-11-11 21:41:56
Attachments:
Connect.java
dbTest.java
|
Hello All!

I have been working around the clock on various issues relating to my ignorance of Java and the nuances of Classifier4J.

Thanks to Nick, and using the latest CVS code, I have succeeded in implementing Classifier4J after only 60 hours!

I have now come upon an interesting problem.

My project involves categorizing a large volume of data. That data exists in a blob field in a MySQL (4.0.16) database. I am using this same database to store my word_probability table. I am using MySQL Connector/J 3.0.9 and Java SDK 1.4.2_02.

My project begins by teaching Classifier4J large amounts of already classified data. I am providing a category and a string taken from the MySQL blob field. All is well at this point.

The Bayesian teachMatch function works great for about 4000 words (in my environment, results may vary), then:
---
SQL Exception in updateWordProbability : Unable to connect to any hosts due to exception: java.net.BindException: Address already in use: connect

WordsDataSourceException Occurred during teachMatch : Problem updating WordProbability
---

I added a System.out print of e.getMessage() to the exception handler in the updateWordProbability function to produce the above result. Otherwise, you simply see an SQL Exception.

Initially I thought this problem related to my ignorance and improper implementation of connection pooling. I wrote the attached test program to eliminate this possibility. I found that the error still exists and is 100% reproducible on my system.

This program effectively loops through x calls to teachMatch. On my system, the program starts generating exceptions just before 4000, usually between 3800 and 4900 iterations.

Just to make sure I didn't have some environmental problem, I wrote another program that writes x records to MySQL, emulating the function of updateWordProbability. No problems here, at least up to 100,000 records.

I hope someone with more knowledge and experience will be able to figure this one out.
Matt Collier RemoteIT mco...@my... 877-4-NEW-LAN |
From: Matt C. <MCo...@my...> - 2003-11-12 03:22:57
|
I have duplicated this problem in a completely separate computing environment (my home). In this case, MySQL is running on localhost. Exact same problem, and exact same symptoms.

I would also add that my clients in both environments are running Windows XP Pro.
----
More data on this issue:

Switching to HSQLDB produces the exact same results. I have attached the revised connect.java for use with HSQLDB.
----
Another interesting discovery. If I attempt to run connect.java a second time immediately after the first run errors out, the following message is displayed immediately:

WordsDataSourceException Occurred : Problem creating table
java.lang.IllegalArgumentException: IWordsDataSource can't be null
        at net.sf.classifier4J.bayesian.BayesianClassifier.<init>(BayesianClassifier.java:141)
        at net.sf.classifier4J.bayesian.BayesianClassifier.<init>(BayesianClassifier.java:128)
        at net.sf.classifier4J.bayesian.BayesianClassifier.<init>(BayesianClassifier.java:118)
        at Connect.main(Connect.java:26)
Exception in thread "main"

However, if I wait about 60-90 seconds between executions, it will process the ~3900 records again and die.
----
I just discovered that the reply address on the list messages is not the list but the sender. Is it possible to alter this setting, and would we want to?

Matt Collier RemoteIT mco...@my... 877-4-NEW-LAN |
From: Matt C. <MCo...@my...> - 2003-11-12 15:32:35
Attachments:
JDBCWordsDataSource.java
updateWordTrace.txt
|
Hi Nick, yes I am using the latest CVS code.

How did you determine that the problem resides in the createTable function?

Have you been able to reproduce the problem?

I am not catching an exception there; I'm catching it in updateWordProbability.

I am including the stack trace and my JDBCWordsDataSource with the additional debug code in it.

I am still configured to use HSQLDB, which is reflected in the trace.

Matt Collier RemoteIT mco...@my... 877-4-NEW-LAN

-----Original Message-----
From: Nick Lothian <nl...@es...>
To: Classifier4J <cla...@li...>
Date: Wed, 12 Nov 2003 16:19:47 +1030
Subject: RE: [Classifier4j-devel] Update Word Probability Break Down

> > ----
> > More data on this issue:
> >
> > Switching to HSQLDB produces the exact same results. I have attached the
> > revised connect.java for use with HSQLDB.
> > ----
> > Another interesting discovery. If I attempt to run connect.java a second
> > time immediately after the first run errors out, the following message is
> > displayed immediately:
> >
> > WordsDataSourceException Occurred : Problem creating table
> > java.lang.IllegalArgumentException: IWordsDataSource can't be null
> >         at net.sf.classifier4J.bayesian.BayesianClassifier.<init>(BayesianClassifier.java:141)
> >         at net.sf.classifier4J.bayesian.BayesianClassifier.<init>(BayesianClassifier.java:128)
> >         at net.sf.classifier4J.bayesian.BayesianClassifier.<init>(BayesianClassifier.java:118)
> >         at Connect.main(Connect.java:26)
> > Exception in thread "main"
> >
> > However, if I wait about 60-90 seconds between executions, it will process
> > the ~3900 records again and die.
> > ----
>
> You are getting the second exception trace
> (java.lang.IllegalArgumentException: IWordsDataSource can't be null) because
> you are ignoring the WordsDataSourceException, which means that the
> IWordsDataSource you are using is null. That makes sense.
>
> Exactly why you are getting the original problem is escaping me at the
> moment.
>
> The error comes from line 247 in the CVS version of JDBCWordsDataSource.java
> (you are using the CVS version, right?).
>
> It occurs if an exception occurs somewhere in the following code:
>
> 224    con = connectionManager.getConnection();
> 225
> 226    // check if the word_probability table exists
> 227    DatabaseMetaData dbm = con.getMetaData();
> 228    ResultSet rs = dbm.getTables(null, null, "WORD_PROBABILITY", null);
> 229    if (!rs.next()) {
> 230        // the table does not exist
> 231        Statement stmt = con.createStatement();
> 232        // Under Axion 1.0M1, use
> 233        // stmt.executeUpdate( "CREATE TABLE word_probability ( "
> 234        //     + " word VARCHAR(255) NOT NULL,"
> 235        //     + " category VARCHAR(20) NOT NULL,"
> 236        //     + " match_count INTEGER NOT NULL,"
> 237        //     + " nonmatch_count INTEGER NOT NULL, "
> 238        //     + " PRIMARY KEY(word, category) ) ");
> 239        stmt.executeUpdate( "CREATE TABLE word_probability ( "
> 240            + " word VARCHAR(255) NOT NULL,"
> 241            + " category VARCHAR(20) NOT NULL,"
> 242            + " match_count INT DEFAULT 0 NOT NULL,"
> 243            + " nonmatch_count INT DEFAULT 0 NOT NULL, "
> 244            + " PRIMARY KEY(word, category) ) ");
> 245    }
>
> There are three possibilities here:
>
> 1) connectionManager.getConnection(); is failing
> 2) DatabaseMetaData dbm = con.getMetaData(); or ResultSet rs =
>    dbm.getTables(null, null, "WORD_PROBABILITY", null); is failing
> 3) The create table query is failing.
>
> I suspect it is one of the first two. I found a reference to MySQL giving
> incorrect error messages when tables are missing
> <http://dbforums.com/arch/174/2003/10/952374>, and the error given is the
> error you were getting when you were using MySQL.
>
> Could you put an e.printStackTrace() in where it catches the SQLException
> (i.e., just before line 247) and send the stack trace you get?
>
> Nick |
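Nick's explanation of the second trace can be illustrated with a small Java sketch. It is not Classifier4J code: FakeWordsDataSource, DataSourceSetupException and FailFastSetup are hypothetical stand-ins, and the connect failure is simulated. It only shows why swallowing the setup exception leaves a null data source for a later constructor to reject, and how letting the exception propagate stops at the real cause instead.

```java
import java.sql.SQLException;

// Hypothetical stand-ins for illustration only; these are not Classifier4J classes.
class DataSourceSetupException extends Exception {
    DataSourceSetupException(String message, Throwable cause) {
        super(message, cause);
    }
}

class FakeWordsDataSource {
    // Simulates a data source whose table-creation step may fail on connect.
    FakeWordsDataSource(boolean connectFails) throws DataSourceSetupException {
        if (connectFails) {
            throw new DataSourceSetupException("Problem creating table",
                    new SQLException("simulated connect failure", "08S01"));
        }
    }
}

class FailFastSetup {
    // Swallows the setup failure, so the caller silently receives null.
    static FakeWordsDataSource openIgnoringErrors(boolean connectFails) {
        try {
            return new FakeWordsDataSource(connectFails);
        } catch (DataSourceSetupException e) {
            System.out.println("Setup failed: " + e.getMessage());
            return null;
        }
    }

    // Lets the failure propagate, so the first error is the one that gets reported.
    static FakeWordsDataSource openOrFail(boolean connectFails) throws DataSourceSetupException {
        return new FakeWordsDataSource(connectFails);
    }

    public static void main(String[] args) {
        FakeWordsDataSource ds = openIgnoringErrors(true);
        System.out.println("data source is null: " + (ds == null));
        try {
            openOrFail(true);
        } catch (DataSourceSetupException e) {
            System.out.println("stopped at: " + e.getMessage()
                    + " / " + e.getCause().getMessage());
        }
    }
}
```

With the second style, a test program would exit on "Problem creating table" and its underlying SQLException rather than on the later "can't be null" complaint.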
From: Matt C. <MCo...@my...> - 2003-11-12 17:05:24
|
Is it correct to say that our database connection is getting set up and torn down each time updateWordProbability is called?

From what I gather, this is not good practice to begin with. Opening and closing a database connection 60-80 times per second has to be taxing. As I understand it, this is where connection pooling comes in.

I wonder if JDBC might have some protection mechanism built in for clients that go haywire. Perhaps it closes connections for processes that open and close connections too many times. Maybe it just fails.

AH HA! This is a difference between my dbTest.java and connect.java. In dbTest.java I am not connecting and disconnecting on each record. I will rebuild it to test.

I don't know the first thing about how to implement connection pooling to begin with, much less in this context, but I guess that's what I'll start working on!

BTW, I've narrowed the error to the call to connectionManager.getConnection() in updateWordProbability. I have extended the exception handling to produce the following information:

SQLState: 08S01
VendorError: 0
NextException: null

SQLState 08S01 = MySQL error ER_BAD_HOST_ERROR according to:

http://mysql.mirror.trueserver.nl/doc/en/Error-returns.html

Matt Collier RemoteIT mco...@my... 877-4-NEW-LAN |
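For anyone wanting the same SQLState / VendorError / NextException detail in their own handler, here is a minimal Java sketch using only the standard java.sql.SQLException accessors. The class name SqlExceptionReport and the simulated exception are made up for illustration; this is not the debug code that was added to JDBCWordsDataSource.

```java
import java.sql.SQLException;

class SqlExceptionReport {
    // Formats every exception in the chain, one block of four lines per exception.
    static String describe(SQLException first) {
        StringBuilder out = new StringBuilder();
        for (SQLException e = first; e != null; e = e.getNextException()) {
            out.append("Message: ").append(e.getMessage()).append('\n');
            out.append("SQLState: ").append(e.getSQLState()).append('\n');
            out.append("VendorError: ").append(e.getErrorCode()).append('\n');
            out.append("NextException: ").append(e.getNextException()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Simulated failure; a real handler would pass in the SQLException it caught.
        SQLException simulated = new SQLException("simulated connect failure", "08S01", 0);
        System.out.print(describe(simulated));
    }
}
```

In a real catch block this would typically sit next to e.printStackTrace(), so that both the driver's codes and the call path end up in the output.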
From: Matt C. <MCo...@my...> - 2003-11-12 17:38:59
Attachments:
dbTest.java
|
Oh! Happy Day!

After I reconstituted dbTest.java (attached) to open and close the database connection on each iteration, as updateWordProbability does, the exact same error occurs at exactly the same point (around 3900 iterations). This is without using any Classifier4J code.

So, now the questions arise...

1) Is this problem still somehow isolated to my configuration? I would love it if someone could reproduce this problem.

2) Is this behavior somehow by design, and if so, is there a setting to be altered?

3) If this problem is not isolated to my environment, how has it gone undetected? It seems doubtful that no one has attempted to classify or teachMatch() a 4000+ word document, or maybe it is possible.

4) If this problem is not limited to my configuration, what is to be done about it? It was suggested that I "might" want to implement connection pooling in my own code. It seems to me, in light of this issue, that Classifier4J needs to implement connection pooling internally. Is this possible?

5) Meanwhile, any hints on implementing connection pooling in conjunction with Classifier4J would be greatly appreciated.

I really wish I had some idea what I was talking about...

Matt Collier RemoteIT mco...@my... 877-4-NEW-LAN |
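On the request for hints: the simplest thing to try is probably connection reuse rather than a full pool. The Java sketch below is only that, a sketch under stated assumptions. ConnectionOpener, ReusedConnectionSource and ReuseDemo are hypothetical names, not Classifier4J API, and the demo uses a stand-in Connection built with java.lang.reflect.Proxy so it runs without a database. A plausible but unconfirmed reading of "Address already in use: connect" is that the client runs out of local ports when thousands of short-lived connections are opened in quick succession; nothing in this thread verifies that. Either way, opening one connection and handing it out repeatedly avoids the open/close churn.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical names throughout; none of this is Classifier4J API.
interface ConnectionOpener {
    // A real implementation might wrap DriverManager.getConnection(url, user, password).
    Connection open() throws SQLException;
}

class ReusedConnectionSource {
    private final ConnectionOpener opener;
    private Connection current;

    ReusedConnectionSource(ConnectionOpener opener) {
        this.opener = opener;
    }

    // Opens a connection only when there is none, or the last one was closed.
    synchronized Connection getConnection() throws SQLException {
        if (current == null || current.isClosed()) {
            current = opener.open();
        }
        return current;
    }

    // Closes the shared connection once, when the whole training run is done.
    synchronized void shutdown() throws SQLException {
        if (current != null && !current.isClosed()) {
            current.close();
        }
        current = null;
    }
}

class ReuseDemo {
    // Builds an opener that hands out stand-in connections and counts how many it opened.
    static ConnectionOpener countingOpener(final int[] opened) {
        return () -> {
            opened[0]++;
            final boolean[] closed = {false};
            return (Connection) Proxy.newProxyInstance(
                    ReuseDemo.class.getClassLoader(),
                    new Class<?>[] {Connection.class},
                    (proxy, method, args) -> {
                        if (method.getName().equals("close")) {
                            closed[0] = true;
                            return null;
                        }
                        if (method.getName().equals("isClosed")) {
                            return closed[0];
                        }
                        return null; // other Connection methods are not needed for this demo
                    });
        };
    }

    public static void main(String[] args) throws SQLException {
        int[] opened = {0};
        ReusedConnectionSource source = new ReusedConnectionSource(countingOpener(opened));
        for (int i = 0; i < 5000; i++) {
            source.getConnection(); // a real loop would run its INSERT or UPDATE here
        }
        source.shutdown();
        System.out.println("connections opened: " + opened[0]); // prints "connections opened: 1"
    }
}
```

This shares one connection and is not a real pool: it does no validation, has no per-thread handling, and does not recover from a dropped link. A pooling library would be the sturdier choice if several threads teach the classifier at once.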