Re: [Classifier4j-devel] detect inappropriate content in web post


On 6 Feb 2006, at 15:31, Jeff Thorne wrote:
> I would like to analyze each user's post for various words and
> expressions before publishing the post to the DB. I was wondering
> if someone could shed some light on the best way to tackle this
> problem with Classifier4J, or with another API if that makes more sense?
>
> How would the performance be with Classifier4J, and which
> Classifier4J datasource and classifier would you recommend we use?

I doubt you want to use C4J for this. I would probably build n-grams
of the words and of the text and weight them, to make sure no one is
trying to hide profanities inside other words or by misspelling them.
The Lucene spell check library does this for you, and it is really
fast. An easier way out would be to simply match the text against the
word list:

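for (String prophanity : prophanities) {
     // indexOf returns -1 when the word is absent, so any index >= 0 is a hit
     // (the original "> 1" missed matches at positions 0 and 1)
     if (input.toLowerCase().indexOf(prophanity.toLowerCase()) >= 0) {
         reportProphanity(input);
         break; // one report per post is enough
     }
}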
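For the n-gram route, here is a minimal plain-Java sketch of the idea, not the Lucene spell checker's API. It assumes character trigrams padded with "_" at the word edges and a Dice-coefficient cut-off of 0.6; both are arbitrary choices, and the class and method names are hypothetical. It catches simple misspellings of listed words. It does not catch a profanity embedded inside a longer word, which is what the substring loop above is for.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper: flags words whose character trigrams are close to a
 * word on the profanity list (Dice coefficient), so simple misspellings
 * such as doubled or inserted letters are still caught.
 */
public class NgramProfanityFilter {

    private static final int N = 3;              // trigram size (assumption)
    private static final double THRESHOLD = 0.6; // similarity cut-off (assumption)

    private final List<String> profanities;

    public NgramProfanityFilter(List<String> profanities) {
        this.profanities = profanities;
    }

    /** Character n-grams of a word, padded with '_' so the word edges count. */
    static Set<String> ngrams(String word) {
        String padded = "_" + word + "_";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + N <= padded.length(); i++) {
            grams.add(padded.substring(i, i + N));
        }
        return grams;
    }

    /** Dice coefficient 2|A ∩ B| / (|A| + |B|), between 0 and 1. */
    static double dice(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return 2.0 * common.size() / (a.size() + b.size());
    }

    /** Returns the first word resembling a listed profanity, or null if none does. */
    public String findSuspiciousWord(String input) {
        for (String word : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty token for a leading delimiter
            }
            Set<String> wordGrams = ngrams(word);
            for (String profanity : profanities) {
                Set<String> badGrams = ngrams(profanity.toLowerCase(Locale.ROOT));
                if (dice(wordGrams, badGrams) >= THRESHOLD) {
                    return word;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NgramProfanityFilter filter = new NgramProfanityFilter(Arrays.asList("damn"));
        System.out.println(filter.findSuspiciousWord("Well dammn, that hurt")); // dammn
        System.out.println(filter.findSuspiciousWord("The dam held"));          // null
    }
}
```

The threshold is the knob that matters: "dammn" scores 6/9 (about 0.67) against "damn", while the innocent "dam" scores 4/7 (about 0.57), so lowering the cut-off much further starts flagging legitimate words.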

-- 
karl