Thread: RE: [Classifier4j-devel] stuff
From: Nick L. <nl...@es...> - 2004-02-17 22:13:37
You need to train the classifier with both matches and non-matches.

Use the .addNonMatch(String) method to train the non-matches.

> -----Original Message-----
> From: karl wettin [mailto:we...@us...]
> Sent: Tuesday, 17 February 2004 2:16 AM
> To: cla...@li...
> Subject: [Classifier4j-devel] stuff
> Importance: Low
>
> Not too much action here, I guess.
>
> However, I tried to classify some text using the Bayesian Classifier
> to guess language. All responses are 0.5, so I guess it's me doing
> something wrong.
>
> The code looked something like this:
>
> {
>   SimpleWordsDataSource wds_en = new SimpleWordsDataSource();
>   Enumerator enum = enumerateWords(englishText);
>   while (enum.hasNext())
>     wds_en.addMatch((String)enum.nextElement());
>
>   SimpleWordsDataSource wds_sv = new SimpleWordsDataSource();
>   enum = enumerateWords(swedishText);
>   while (enum.hasNext())
>     wds_sv.addMatch((String)enum.nextElement());
>
>   BayesianClassifier c_en = new BayesianClassifier(wds_en);
>   System.out.println(c_en.classify("hello, my name is karl.")); // returns 0.5
>
>   BayesianClassifier c_sv = new BayesianClassifier(wds_sv);
>   System.out.println(c_sv.classify("hello, my name is karl.")); // returns 0.5
> }
>
> What do I do wrong?
>
> --
> karl
>
> _______________________________________________
> Classifier4j-devel mailing list
> Cla...@li...
> https://lists.sourceforge.net/lists/listinfo/classifier4j-devel
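
Editor's note: the advice above (give each classifier non-matches as well as matches) can be illustrated with a small self-contained sketch. This is not Classifier4J code. `ToyWordStore` and `TrainBothSidesDemo` are hypothetical stand-ins written for this note, and the smoothing and scoring formulas are assumptions chosen for simplicity, not the library's actual calculation. Only the method names `addMatch` and `addNonMatch` mirror the ones mentioned in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy word store; NOT the Classifier4J implementation.
class ToyWordStore {
    // word -> {times seen as match, times seen as non-match}
    private final Map<String, int[]> counts = new HashMap<>();

    void addMatch(String word)    { counts.computeIfAbsent(word, k -> new int[2])[0]++; }
    void addNonMatch(String word) { counts.computeIfAbsent(word, k -> new int[2])[1]++; }

    // Probability that a word indicates a match; unseen words are neutral (0.5).
    double probability(String word) {
        int[] c = counts.get(word);
        if (c == null) return 0.5;
        // Add-one smoothing (an assumption) keeps the value strictly between 0 and 1.
        return (c[0] + 1.0) / (c[0] + c[1] + 2.0);
    }

    // Combine per-word probabilities naive-Bayes style: p / (p + q).
    double classify(String text) {
        double p = 1.0, q = 1.0;
        for (String w : text.toLowerCase().split("[^\\p{L}]+")) {
            if (w.isEmpty()) continue;
            double pw = probability(w);
            p *= pw;
            q *= 1.0 - pw;
        }
        return p / (p + q);
    }
}

class TrainBothSidesDemo {
    // Train one store: words of 'matches' are matches, words of 'nonMatches' are non-matches.
    static ToyWordStore train(String matches, String nonMatches) {
        ToyWordStore store = new ToyWordStore();
        for (String w : matches.split("\\s+")) store.addMatch(w.toLowerCase());
        for (String w : nonMatches.split("\\s+")) store.addNonMatch(w.toLowerCase());
        return store;
    }

    public static void main(String[] args) {
        String english = "hello my name is karl and this is english text";
        String swedish = "hej jag heter karl och jag talar svenska";

        // Each language's store sees the other language as non-matches.
        ToyWordStore en = train(english, swedish);
        ToyWordStore sv = train(swedish, english);

        System.out.println(en.classify("hello, my name is karl.")); // above 0.5
        System.out.println(sv.classify("hello, my name is karl.")); // below 0.5
    }
}
```

In this toy, the English store scores the sample above 0.5 and the Swedish store scores it below 0.5, because each store has seen the other language's words as non-matches. With the real library, the equivalent step would presumably be calling .addNonMatch(String) on each data source with words from the other language, as suggested in the reply.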
From: Peter L. <pe...@le...> - 2004-02-17 22:55:13
It may be worth putting this type of question down as a FAQ item. It's been asked a few times.

On Wed, 18 Feb 2004 08:38:46 +1030, Nick Lothian wrote:
> You need to train the classifier with both matches and non-matches.
>
> Use the .addNonMatch(String) method to train the non-matches.
From: Nick L. <nl...@es...> - 2004-02-17 22:59:17
Yeah, I was just thinking that.

I was also trying to think of a better way to do it - the API isn't totally clean here.

Nick

> -----Original Message-----
> From: Peter Leschev [mailto:pe...@le...]
> Sent: Tuesday, 17 February 2004 9:21 AM
> To: cla...@li...
> Subject: RE: [Classifier4j-devel] stuff
> Importance: Low
>
> It may be worth putting this type of question down as a FAQ
> item. It's been asked a few times.
From: Nick L. <nl...@es...> - 2004-02-19 22:26:50
> > > However, I tried to classify some text using the Bayesian Classifier
> > > to guess language. All responses are 0.5, so I guess it's me doing
> > > something wrong.
>
> > You need to train the classifier with both matches and non-matches.
>
> What kind of non-matches should I fill it with? String of text from all
> other languages?

I guess so. I'm not sure how well it is going to work for this, though.

Usually we ignore the most common words in the language (stop words). In your case it might make more sense to have a vocabulary of nothing but stop words in each language, because that way you can pretty much guarantee that you'll get a match in the correct language.

Nick
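
Editor's note: the stop-word idea above can be sketched without any library. The class below is hypothetical and uses plain hit counting rather than a Bayesian classifier, so it only illustrates why a stop-word-only vocabulary tends to produce hits in the right language. The two word lists are tiny illustrative samples, not complete stop-word lists.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: guess a language by counting stop-word hits.
class StopWordLanguageGuess {
    // Tiny illustrative lists; real stop-word lists are much longer.
    static final Set<String> ENGLISH = new HashSet<>(Arrays.asList(
            "the", "is", "my", "and", "of", "to", "a", "in", "it", "you"));
    static final Set<String> SWEDISH = new HashSet<>(Arrays.asList(
            "och", "jag", "det", "att", "en", "som", "han", "inte", "med", "min"));

    // Count how many words of the text appear in the given stop-word set.
    static int hits(String text, Set<String> stopWords) {
        int n = 0;
        for (String w : text.toLowerCase().split("[^\\p{L}]+")) {
            if (stopWords.contains(w)) n++;
        }
        return n;
    }

    // Pick the language with more stop-word hits; ties are reported as unknown.
    static String guess(String text) {
        int en = hits(text, ENGLISH);
        int sv = hits(text, SWEDISH);
        if (en == sv) return "unknown";
        return en > sv ? "english" : "swedish";
    }

    public static void main(String[] args) {
        System.out.println(guess("hello, my name is karl."));           // english
        System.out.println(guess("hej, jag heter karl och det regnar")); // swedish
    }
}
```

Because nearly every sentence contains a few of its language's most common words, even short inputs usually score at least one hit. Inputs with no hits in either list come back as "unknown" rather than a guess.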
From: karl w. <we...@us...> - 2004-02-19 18:21:26
On Wed, 18 Feb 2004 08:38:46 +1030
Nick Lothian <nl...@es...> wrote:

> > However, I tried to classify some text using the Bayesian Classifier
> > to guess language. All responses are 0.5, so I guess it's me doing
> > something wrong.
>
> You need to train the classifier with both matches and non-matches.

What kind of non-matches should I fill it with? String of text from all other languages?

footnote: I'm sure that a simple N-gram classification could come up with better results per clock tick than a bayesian, but then this is just a test of your classifier.

--
karl
--
kalle
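
Editor's note: for readers unfamiliar with the N-gram approach mentioned in the footnote, here is a minimal sketch using character trigrams. The class name, the overlap score, and the sample texts are all assumptions made for illustration. The thread does not specify an N-gram method, and practical systems build profiles from far larger samples.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of character-trigram language guessing.
class TrigramLanguageGuess {
    // Count every run of three consecutive characters in the lower-cased text.
    static Map<String, Integer> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    // Overlap score: for each trigram in the text, add min(count in text, count in reference).
    static int overlap(Map<String, Integer> text, Map<String, Integer> reference) {
        int score = 0;
        for (Map.Entry<String, Integer> e : text.entrySet()) {
            score += Math.min(e.getValue(), reference.getOrDefault(e.getKey(), 0));
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Integer> en = profile("hello my name is karl and this is english text");
        Map<String, Integer> sv = profile("hej jag heter karl och jag talar svenska");

        Map<String, Integer> sample = profile("my name is");
        System.out.println("english overlap: " + overlap(sample, en));
        System.out.println("swedish overlap: " + overlap(sample, sv));
    }
}
```

The language whose reference profile gives the larger overlap is taken as the guess. Building a profile is a single pass over the text, which is the kind of low per-character cost the footnote alludes to.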