[Classifier4j-devel] Update Word Probability Break Down
From: Matt C. <MCo...@my...> - 2003-11-12 03:22:57
I have duplicated this problem in a completely separate computing environment (my home). In this case, MySQL is running on localhost. Exact same problem and exact same symptoms. I would also add that my clients in both environments are running Windows XP Pro.

----

More data on this issue: switching to HSQLDB produces the exact same results. I have attached the revised connect.java for use with HSQLDB.

----

Another interesting discovery. If I attempt to run connect.java a second time immediately after the first run errors out, the following message is displayed immediately:

    WordsDataSourceException Occurred : Problem creating table
    java.lang.IllegalArgumentException: IWordsDataSource can't be null
        at net.sf.classifier4J.bayesian.BayesianClassifier.<init>(BayesianClassifier.java:141)
        at net.sf.classifier4J.bayesian.BayesianClassifier.<init>(BayesianClassifier.java:128)
        at net.sf.classifier4J.bayesian.BayesianClassifier.<init>(BayesianClassifier.java:118)
        at Connect.main(Connect.java:26)
    Exception in thread "main"

However, if I wait about 60-90 seconds between executions, it will process the ~3900 records again and die.

----

I just discovered that the reply address on the list messages is not the list but the sender. Is it possible to alter this setting, and would we want to?

Matt Collier
RemoteIT
mco...@my...
877-4-NEW-LAN

-----Original Message-----
From: "Matt Collier" <MCo...@my...>
To: "Classifier4J" <cla...@li...>
Date: Tue, 11 Nov 2003 15:43:39 -0600
Subject: [Classifier4j-devel] Update Word Probability Break Down

> Hello All!
>
> I have been working around the clock on various issues relating to my
> ignorance of Java and the nuances of Classifier4J.
>
> Thanks to Nick, and using the latest CVS code, I have succeeded in
> implementing Classifier4J after only 60 hours!
>
> I have now come upon an interesting problem.
>
> My project involves categorizing a large volume of data. That data exists
> in a blob field in a MySQL (4.0.16) database. I am using this same database
> to store my word_probability table. I am using the MySQL Connector/J 3.0.9.
> I am using Java SDK 1.4.2_02.
>
> My project begins by teaching Classifier4J large amounts of already
> classified data. I am providing a category and a string taken from the
> MySQL blob field. All is well at this point.
>
> The Bayesian teachMatch function works great for about 4000 words (in my
> environment, results may vary), then:
> ---
> SQL Exception in updateWordProbability : Unable to connect to any hosts due
> to exception: java.net.BindException: Address already in use: connect
>
> WordsDataSourceException Occurred during teachMatch : Problem updating
> WordProbability
> ---
>
> I have added System.out e.getMessage() to the exception handler in the
> updateWordProbability function to produce the above result. Otherwise, you
> simply see an SQL Exception.
>
> Initially I thought this problem related to my ignorance and improper
> implementation of connection pooling. I wrote the attached test program to
> eliminate this possibility. I found that the error still existed and is
> 100% reproducible on my system.
>
> This program effectively loops through x number of teachMatch functions. On
> my system, the program starts generating exceptions just before 4000,
> usually between 3800 and 4900 iterations.
>
> Just to make sure I didn't have some environmental problem, I wrote another
> program that writes x records to MySQL, emulating the function of
> updateWordProbability. No problems here, at least up to 100,000 records.
>
> I hope someone with more knowledge and experience will be able to figure
> this one out.
>
> Matt Collier
> RemoteIT
> mco...@my...
> 877-4-NEW-LAN
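
----

For readers without the attachment, here is a rough sketch of the kind of test loop the quoted message describes: call teachMatch repeatedly, print e.getMessage() on the first exception, and report the iteration at which it occurred. This is not the attached connect.java. The Trainer interface, the FailingAfter class, and the TeachLoopSketch class are all hypothetical stand-ins so the sketch runs without Classifier4J or a database; the failure is simulated at a fixed call count rather than caused by any real connection problem.

```java
/** Hypothetical harness; not the attached connect.java. */
class TeachLoopSketch {
    /**
     * Calls teachMatch up to `iterations` times.
     * Returns the 1-based iteration of the first failure, or -1 if no call failed.
     */
    static int firstFailure(Trainer trainer, int iterations) {
        for (int i = 1; i <= iterations; i++) {
            try {
                trainer.teachMatch("category", "sample text " + i);
            } catch (Exception e) {
                // Print the underlying message, in the spirit of the
                // e.getMessage() output added in the quoted message.
                System.out.println("Failed at iteration " + i + ": " + e.getMessage());
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 3900 is only an illustrative limit, loosely based on the ~3900 figure above.
        int failedAt = firstFailure(new FailingAfter(3900), 100000);
        System.out.println("First failure at: " + failedAt);
    }
}

/** Hypothetical stand-in for the classifier's teachMatch call; not the Classifier4J API. */
interface Trainer {
    void teachMatch(String category, String input) throws Exception;
}

/** Fake trainer that succeeds `limit` times and then throws on every later call. */
class FailingAfter implements Trainer {
    private final int limit;
    private int calls = 0;

    FailingAfter(int limit) {
        this.limit = limit;
    }

    public void teachMatch(String category, String input) throws Exception {
        calls++;
        if (calls > limit) {
            throw new Exception("simulated failure on call " + calls);
        }
    }
}
```

With the simulated limit of 3900, main reports the first failure at iteration 3901. To try this against a real classifier, the fake trainer would be replaced by a small adapter around an actual classifier instance; that adapter is left out here because its details depend on the data source setup.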