RE: [Classifier4j-devel] Project Improvements

> Bayesian tokenizer.  It was reported that the tokenizer improperly
> handles a number of strings, including possessive pronouns and others.
> Anybody working on this?
> 

I don't remember this discussion. Could you post a reference?

> HTML tokenizer for Bayesian system.  Idea was to be able to "ignore"
> XML in a classification string.  This happens to be required for my
> current project.  I've either got to remove HTML from my source
> documents or get C4J to ignore it.
> 

Yes, this would be nice.

If you want to do it in C4J then you need to implement the
net.sf.classifier4J.ITokenizer interface.
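
As a rough starting point, a tag-stripping tokenizer might look something
like the sketch below. To keep it compilable on its own it declares a local
stand-in interface (SimpleTokenizer, a hypothetical name) with a single
String[] tokenize(String) method; check the real
net.sf.classifier4J.ITokenizer for the exact signature before swapping it
in. The class name and the regular expressions are also just illustrative
assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for net.sf.classifier4J.ITokenizer, declared locally
// so this sketch compiles by itself. The real interface may differ.
interface SimpleTokenizer {
    String[] tokenize(String input);
}

// Strips anything that looks like a markup tag, then splits on non-word
// characters. This is a naive approach: it does not decode entities such as
// &amp; and does not skip the contents of <script> or <style> elements.
class HtmlStrippingTokenizer implements SimpleTokenizer {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    @Override
    public String[] tokenize(String input) {
        if (input == null) {
            return new String[0];
        }
        // Replace each tag with a space so adjacent words do not run together.
        String text = TAG.matcher(input).replaceAll(" ");
        List<String> tokens = new ArrayList<>();
        for (String token : NON_WORD.split(text)) {
            // split() can yield a leading empty string; drop it.
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens.toArray(new String[0]);
    }
}
```

Replacing tags with a space rather than deleting them outright keeps
"<b>foo</b>bar" from collapsing into a single token. Anything more robust
(entities, comments, malformed markup) probably wants a real HTML parser
instead of a regex.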

> Connection pooling.  What ARE we going to do about connection pooling?
> 

Still looking at this.

> Documentation.  We need some.  I would like to help with this.  How do
> we do it?  What framework are we using for documentation?
> 

Cool.

I'm using Maven to build the website (which contains the docs, such as they
are).

The docs themselves are in CVS (see
<http://cvs.sourceforge.net/viewcvs.py/classifier4j/Classifier4J/xdocs/>) in
xdoc format. The xdoc format is (kind of) documented at
<http://jakarta.apache.org/site/jakarta-site-tags.html>.

Patches/new docs/whatever are gratefully accepted.