RE: [Classifier4j-devel] First look at classifier4j...
From: Nick L. <nl...@es...> - 2003-07-02 00:06:48
Welcome! I'm very interested in having help.

1) I hadn't noticed the problems with the zip file. I'll have to check that. You are talking about the source zip? I just zipped the source with WinZip, so it doesn't surprise me.

2) I'm reasonably familiar with Hibernate (in theory at least). I'm reluctant to replace the JDBC Data Source with a Hibernate one because I want to make Classifier4J very easy to drop into people's code without too many dependencies. However, I'm not opposed to a HibernateWordsDataSource if you'd like to work on that.

Please be aware that there are performance problems at the moment when using a database backend, and I'm not convinced that a normal DB backend will ever be able to deliver sufficient performance on large documents. I think I'll need to have a table that contains the precalculated word probability for each word to get rid of the two-queries-per-word issue (<http://classifier4j.sourceforge.net/task-list.html#net.sf.classifier4J.bayesian.JDBCWordsDataSource.methods>). I'm not stuck on the DB schema anyway, so if something else works better with Hibernate then go with it.

3) Yeah. I'm thinking an extra column on the *_words tables that joins to a classifier_types table or something.

Any comments on the API, net.sf.classifier4J.IClassifier (<http://classifier4j.sourceforge.net/apidocs/net/sf/classifier4J/IClassifier.html>) in particular?

Thanks for the feedback.

Nick

-----Original Message-----
From: Peter Leschev [mailto:pe...@le...]
Sent: Tuesday, 1 July 2003 10:03 PM
To: cla...@li...
Subject: [Classifier4j-devel] First look at classifier4j...

Heya,

I need a Bayesian classifier for a pet project that I'm going to be working on. Instead of reinventing the wheel, I thought I'd check out what was available....

I'd like to help out with this project (If you're interested that is!)...

I understand that 0.2 was an initial release, so I'm probably stating the obvious here. Anyhow, here are a couple of suggestions:

- Deployment
  - The zip file has CVS entries in it. From memory, performing a zip from ant doesn't include the CVS entries by default.
  - Also, it would be nice if all the files were under "Classifier4J-0.2" instead of "."....
- Database
  - I would recommend using hibernate (http://hibernate.bluemars.net) instead of directly using JDBC. I've used this product with other projects and I wholeheartedly recommend it. This would allow classifier4j to get cached query results etc. for free... It also keeps the code cleaner...
  - Currently the model only supports one classifier in the database. I would require a number of separate classifiers to be stored in the database.

Pete
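
For illustration, the precalculated word probability idea from Nick's reply might look roughly like the sketch below. It is only a sketch under stated assumptions: the class and method names (PrecalculatedWordStore, addMatch, addNonMatch, getProbability) are hypothetical and not part of the Classifier4J API, an in-memory map stands in for the database table, and the formula (matches / (matches + nonMatches), clamped to 0.01..0.99, with 0.5 for unseen words) is an assumed one. The point it shows is that the probability is recomputed when counts change, so a lookup needs one read per word instead of two.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; these names are NOT part of the Classifier4J API. */
final class PrecalculatedWordStore {
    // Assumed constants, chosen for illustration only.
    static final double NEUTRAL = 0.5;
    static final double LOWER_BOUND = 0.01;
    static final double UPPER_BOUND = 0.99;

    /** Stands in for one row of a words table with a precalculated column. */
    private static final class Row {
        int matches;
        int nonMatches;
        double probability = NEUTRAL;
    }

    // In-memory stand-in for the database table, keyed by word.
    private final Map<String, Row> rows = new HashMap<>();

    /** Records a matching occurrence and refreshes the stored probability. */
    void addMatch(String word) {
        Row row = rows.computeIfAbsent(word, w -> new Row());
        row.matches++;
        recalculate(row);
    }

    /** Records a non-matching occurrence and refreshes the stored probability. */
    void addNonMatch(String word) {
        Row row = rows.computeIfAbsent(word, w -> new Row());
        row.nonMatches++;
        recalculate(row);
    }

    /** Single lookup per word: no counting or arithmetic at read time. */
    double getProbability(String word) {
        Row row = rows.get(word);
        return row == null ? NEUTRAL : row.probability;
    }

    // Only called after an increment, so the denominator is never zero.
    private static void recalculate(Row row) {
        double raw = (double) row.matches / (row.matches + row.nonMatches);
        row.probability = Math.max(LOWER_BOUND, Math.min(UPPER_BOUND, raw));
    }

    public static void main(String[] args) {
        PrecalculatedWordStore store = new PrecalculatedWordStore();
        store.addMatch("java");
        store.addMatch("java");
        store.addMatch("java");
        store.addNonMatch("java");
        System.out.println(store.getProbability("java"));
    }
}
```

The trade-off is extra work on every write in exchange for cheaper reads, which seems to suit classification of large documents where each word is looked up far more often than it is trained.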