Re: [Classifier4j-devel] detect inappropriate content in web post
From: karl w. <we...@ho...> - 2006-02-06 14:47:55
On 6 Feb 2006, at 15:31, Jeff Thorne wrote:
> I would like to analyze each user's post for various words and
> expressions before publishing their post to the DB. I was wondering
> if someone could shed some light on the best way to tackle this
> problem with Classifier4J or another API if doing so makes more sense?
>
> How would the performance be with Classifier4J, and which
> Classifier4J datasource and classifier do you recommend we use?
I doubt you want to use C4J for this. I would probably build n-grams
of the words and of the text and weight them, to make sure no one is
trying to hide the profanities inside other words or by misspelling
them. The Lucene spell check library does this for you, and really
fast. An easier way out would be to simply match the text against the
words with:
    for (String profanity : profanities) {
        // indexOf returns -1 when there is no match; 0 is a match at the very start
        if (input.indexOf(profanity) >= 0) {
            reportProfanity(input);
        }
    }
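
For the n-gram idea, here is a rough sketch in plain Java, without Lucene. It is not how the Lucene spell checker works internally, just one simple way to do it: split each word into character bigrams and compare the sets with a Dice coefficient. The class name, the method names, the bigram size and the threshold are all made up for illustration, and the threshold would need tuning against real posts.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NgramProfanityCheck {

    // Collect the distinct character n-grams of a word,
    // e.g. ngrams("darn", 3) gives {"dar", "arn"}.
    static Set<String> ngrams(String word, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Dice coefficient over the two n-gram sets: 2 * |shared| / (|a| + |b|).
    // The result is between 0.0 (nothing in common) and 1.0 (same n-gram set).
    static double similarity(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        if (ga.isEmpty() || gb.isEmpty()) {
            // Words shorter than n have no n-grams; fall back to exact match.
            return a.equals(b) ? 1.0 : 0.0;
        }
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        return 2.0 * shared.size() / (ga.size() + gb.size());
    }

    // True if any token of the input is at least `threshold` similar
    // to any word on the list (assumed threshold, to be tuned).
    static boolean containsProfanity(String input, List<String> profanities, double threshold) {
        // Split on anything that is not a letter or digit.
        for (String token : input.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            for (String profanity : profanities) {
                if (similarity(token, profanity.toLowerCase(), 2) >= threshold) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With a word list containing "blasted" and a threshold of 0.7, this flags "blastted" (misspelled) and "xxblastedxx" (padded), while "a nice post" passes. The cost is one comparison per token per listed word, so for a long word list an index such as the one the Lucene spell checker builds is likely the better choice.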
--
karl