Re: [Classifier4j-devel] Dev Plan
From: Peter L. <pe...@le...> - 2003-08-08 14:21:06
Hi Nick,

I just did a cvs update and took a look at version 0.4. I like the
IStopWordProvider concept...

A couple of points:

- Is there a reason why you've used tabs instead of spaces? Spaces are
  generally preferred, since they're more standard. Some people have
  their tab size set to 4 while others have it set to 8, and so on. If
  you always convert tabs to spaces, the code looks the same for
  everyone.

- Have you seen hsqldb? http://hsqldb.sourceforge.net/ provides an
  in-memory / on-disk Java-based database with a JDBC interface. It
  would be interesting to compare performance between different
  database solutions, e.g.

      JDBMWordsDataSource
      vs. JDBCWordsDataSource -> hsqldb
      vs. HibernateWordsDataSource -> hsqldb / mysql etc.

I'll look into the following:

- Fix the following in BayesianClassifier:

      * @todo need an option to only use the "X" most "important" words
      *       when calculating overall probability
      * "important" is defined as being most distant from
      *       NEUTRAL_PROBABILITY

- Look into the current Tokenizer. For example, "1.4" currently gets
  split into "1" and "4". Shouldn't it just be "1.4"? Also, "peter's"
  is split into "peter" and "s". Shouldn't this be "peter's"? It's
  probably worth coming up with a set of test cases.

- Implement an HTML Tokenizer (depending on how it is configured, HTML
  tags will be either included or ignored).

- Implement HibernateWordsDataSource.

- Implement a project which uses Classifier4J.

It's looking good!

Pete

----- Original Message -----
From: "Nick Lothian" <ni...@ma...>
To: <cla...@li...>
Sent: Sunday, August 03, 2003 5:12 PM
Subject: Re: [Classifier4j-devel] Dev Plan

> Currently I'm focused on two things:
>
> 1) Refactoring category support.
> -- I've added ICategorisedClassifier and ICategorisedWordsDataSource
> interfaces which have methods like ICategorisedClassifier.classify(String
> category, String input); etc, so the categories can be used directly from
> the classifier, without having to do "setCategory" on the datasource. I
> can't see why we need to keep that state, so I'm removing it. I've just
> added these changes to CVS.
>
> 2) A Classifier4J-Optional jar, which (currently) contains a couple of
> demos, a JDBMWordsDataSource (very fast and reliable) and a
> JispWordsDataSource (fast, but prone to data corruption, so I'll probably
> throw it out). Currently this is not in CVS.
>
> If you are still interested in the HibernateWordsDataSource, I would see it
> going in here.
>
> As well as those changes I've done some work on Text Summary
> (http://www.mackmo.com/nick/blog/java/?permalink=TextSummaryApp.txt), which
> is also available.
>
> I have some plans to do a 0.4 release sometime this week.
>
> What are you interested in working on?
>
> Nick
>
>
> ----- Original Message -----
> From: "Peter Leschev" <pe...@le...>
> To: <cla...@li...>
> Sent: Friday, August 01, 2003 9:46 AM
> Subject: [Classifier4j-devel] Dev Plan
>
>
> > Hi Nick,
> >
> > what are your current plans for JClassifier? What are you planning
> > on implementing in the near future? I just don't want to double up
> > on what we do...
> >
> > Pete
> >
> >
> > -------------------------------------------------------
> > This SF.Net email sponsored by: Free pre-built ASP.NET sites including
> > Data Reports, E-commerce, Portals, and Forums are available now.
> > Download today and enter to win an XBOX or Visual Studio .NET.
> > http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
> > _______________________________________________
> > Classifier4j-devel mailing list
> > Cla...@li...
> > https://lists.sourceforge.net/lists/listinfo/classifier4j-devel
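
As a rough illustration of two items from Pete's list (the tokenizer
cases for "1.4" and "peter's", and the option to use only the "X" most
"important" words), here is a minimal Java sketch. It is not
Classifier4J code: the class name DevPlanSketch, both method names, the
token regex, and the value 0.5 for the neutral probability are all
assumptions made for this example, and it uses Java 8 syntax that
postdates the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from Classifier4J itself.
public class DevPlanSketch {

    // Assumed value: the thread names the constant but does not give its value.
    public static final double NEUTRAL_PROBABILITY = 0.5;

    // A token is a run of letters/digits. A single '.' or '\'' is kept inside
    // the token only when more letters/digits follow it, so "1.4" and
    // "peter's" stay whole while a sentence-ending '.' is dropped.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['.][\\p{L}\\p{N}]+)*");

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Returns the x probabilities that lie furthest from NEUTRAL_PROBABILITY,
    // most distant first. Ties keep their input order (List.sort is stable).
    public static List<Double> mostImportant(List<Double> probabilities, int x) {
        List<Double> sorted = new ArrayList<>(probabilities);
        sorted.sort(Comparator.comparingDouble(
                (Double p) -> Math.abs(p - NEUTRAL_PROBABILITY)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(x, sorted.size())));
    }

    public static void main(String[] args) {
        // prints [Version, 1.4, of, peter's, code]
        System.out.println(tokenize("Version 1.4 of peter's code."));
        // prints [0.99, 0.05]
        System.out.println(mostImportant(Arrays.asList(0.5, 0.99, 0.4, 0.05, 0.7), 2));
    }
}
```

The two sample inputs in main could serve as the first entries in the
set of tokenizer test cases Pete suggests. How the selected
probabilities would then be combined into an overall score is left out,
since the thread does not describe it.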