RE: [Classifier4j-devel] stuff
From: Nick L. <nl...@es...> - 2004-02-19 22:26:50
> > > However, I tried to classify some text using the Bayesian Classifier
> > > to guess language. All responses are 0.5, so I guess it's me doing
> > > something wrong.
> >
> > You need to train the classifier with both matches and non-matches.
>
> What kind of non-matches should I fill it with? String of text from all
> other languages?

I guess so. I'm not sure how well it is going to work for this, though.

Usually we ignore the most common words in the language (stop words). In your case it might make more sense to have a vocabulary of nothing but stop words in each language, because that way you can pretty much guarantee that you'll get a match in the correct language.

Nick
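
To make the stop-word idea a bit more concrete, here is a rough sketch in plain Java. It does not use the Classifier4J API at all: the class name StopWordLanguageGuesser, the method names, and the tiny word lists are all made up for illustration, and real stop-word lists would need to be much longer. It just counts how many words of the input appear in each language's stop-word list and picks the language with the most hits.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Classifier4J.
class StopWordLanguageGuesser {

    // Tiny hand-picked stop-word lists, for illustration only.
    private static final Map<String, Set<String>> STOP_WORDS = new LinkedHashMap<>();
    static {
        STOP_WORDS.put("en", new HashSet<>(Arrays.asList(
                "the", "and", "is", "of", "to", "in", "it", "that")));
        STOP_WORDS.put("fr", new HashSet<>(Arrays.asList(
                "le", "la", "les", "et", "est", "de", "un", "une", "que")));
        STOP_WORDS.put("de", new HashSet<>(Arrays.asList(
                "der", "die", "das", "und", "ist", "nicht", "ein", "zu")));
    }

    // Counts, per language, how many tokens of the text are stop words.
    static Map<String, Integer> countHits(String text) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (String lang : STOP_WORDS.keySet()) {
            hits.put(lang, 0);
        }
        // Split on anything that is not a letter; lower-case for matching.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        for (String token : tokens) {
            for (Map.Entry<String, Set<String>> e : STOP_WORDS.entrySet()) {
                if (e.getValue().contains(token)) {
                    hits.put(e.getKey(), hits.get(e.getKey()) + 1);
                }
            }
        }
        return hits;
    }

    // Returns the language code with the most stop-word hits,
    // or "unknown" when no stop word from any list was found.
    static String guess(String text) {
        String best = "unknown";
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : countHits(text).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Calling StopWordLanguageGuesser.guess(...) on a sentence returns one of the codes above or "unknown". With the Bayesian classifier the same idea would presumably mean one classifier per language, trained with that language's stop words as matches and the other languages' stop words as non-matches, but I have not tested how well that works.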