classifier4j-devel Mailing List for Classifier4J (Page 4)
Status: Beta
Brought to you by:
nicklothian
From: Nick L. <ni...@ma...> - 2006-05-07 12:29:22
|
Hi Nadja,

Have you tried pooling your database connections?

Nick

Nadja Senoucci wrote:
> Hello all,
>
> I am trying out Classifier4J as a possible tool for categorizing news messages. I have several thousand test files of varying length at the moment and 12 different categories. With that amount of data I have to use JDBCWordsDataSource (I naturally get "out of memory" errors with SimpleWordDataSource) or something similar. Also, I chose to use JDBCWordsDataSource over JDBMWordsDataSource mostly because I couldn't figure out how to properly use JDBMWordsDataSource (I can't find the source code for it and there doesn't seem to be much documentation that I can find for it either).
>
> Anyway, long story short: I keep getting the "net.sf.classifier4J.bayesian.WordsDataSourceException: Problem updating WordProbability" while still training some texts for my first category, and it seems that the underlying problem here is another exception: java.net.SocketException: "java.net.BindException: Address already in use: connect". The MySQL documentation tells me that this happens when an application is trying to open too many connections within a short time span.
>
> Now what I am basically doing code-wise is this (the code has been simplified so that it only includes necessary information):
>
>     Iterator iter = list.iterator(); // list is an ArrayList of filenames to train with for this category
>     while (iter.hasNext()) {
>         nextFile = (String) iter.next();
>         text = TextUtilities.getText(nextFile); // returns the contents of the file as plain text
>         tokenizedText = this.tokenizer.tokenize(text);
>         for (int i = 0; i < tokenizedText.length; i++) {
>             jdbcDataSource.addMatch(pool, tokenizedText[i]);
>         }
>     }
>
> I hope this piece of code will still be readable once I send the email. :)
>
> Some things seem to get entered into the database table before the exception occurs.
>
> I also tried using the classifier so I wouldn't have to add every single token but could train an entire message at once, but I still got the same exception and it seemed like no data at all made it to the database.
>
> Can anyone help me with this? I just can't figure out how to solve this problem. Wouldn't surprise me if it was some really stupid mistake on my part. :)
>
> Regards,
> Nadja
>
> _______________________________________________
> Classifier4j-devel mailing list
> Cla...@li...
> https://lists.sourceforge.net/lists/listinfo/classifier4j-devel |
From: Nadja S. <sen...@21...> - 2006-05-02 19:39:18
|
Hello all,

I am trying out Classifier4J as a possible tool for categorizing news messages. I have several thousand test files of varying length at the moment and 12 different categories. With that amount of data I have to use JDBCWordsDataSource (I naturally get "out of memory" errors with SimpleWordDataSource) or something similar. Also, I chose to use JDBCWordsDataSource over JDBMWordsDataSource mostly because I couldn't figure out how to properly use JDBMWordsDataSource (I can't find the source code for it and there doesn't seem to be much documentation that I can find for it either).

Anyway, long story short: I keep getting the "net.sf.classifier4J.bayesian.WordsDataSourceException: Problem updating WordProbability" while still training some texts for my first category, and it seems that the underlying problem here is another exception: java.net.SocketException: "java.net.BindException: Address already in use: connect". The MySQL documentation tells me that this happens when an application is trying to open too many connections within a short time span.

Now what I am basically doing code-wise is this (the code has been simplified so that it only includes necessary information):

    Iterator iter = list.iterator(); // list is an ArrayList of filenames to train with for this category
    while (iter.hasNext()) {
        nextFile = (String) iter.next();
        text = TextUtilities.getText(nextFile); // returns the contents of the file as plain text
        tokenizedText = this.tokenizer.tokenize(text);
        for (int i = 0; i < tokenizedText.length; i++) {
            jdbcDataSource.addMatch(pool, tokenizedText[i]);
        }
    }

I hope this piece of code will still be readable once I send the email. :)

Some things seem to get entered into the database table before the exception occurs.

I also tried using the classifier so I wouldn't have to add every single token but could train an entire message at once, but I still got the same exception and it seemed like no data at all made it to the database.

Can anyone help me with this? I just can't figure out how to solve this problem. Wouldn't surprise me if it was some really stupid mistake on my part. :)

Regards,
Nadja |
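[Editor's note: Nick's pooling suggestion attacks the connection churn directly; an application-level complement is to stop hitting the data source once per token occurrence and instead aggregate token counts in memory first, so each distinct word needs only one write. The sketch below shows just the aggregation step; the class and method names are illustrative, not Classifier4J API.]

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not Classifier4J API): count each token in memory,
// so the database is written to once per distinct word instead of once
// per token occurrence.
public class BatchedTraining {

    // Aggregate token occurrences into a word -> count map.
    public static Map<String, Integer> countTokens(String[] tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tokens = {"market", "rises", "market"};
        Map<String, Integer> counts = countTokens(tokens);
        System.out.println(counts.get("market")); // 2
        System.out.println(counts.get("rises"));  // 1
    }
}
```

With one batched write per map entry (for example a single PreparedStatement executed via addBatch/executeBatch over these counts, on one pooled connection), the number of database round trips drops from one per token to one per distinct word.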
From: Nick L. <ni...@ma...> - 2006-03-12 11:38:22
|
You could try running Classifier4J in .NET under IKVM (http://www.ikvm.net/). I'd imagine that it would work pretty well. Let me know if it works!

Nick

Ric...@er... wrote:
> cla...@li... wrote on 03/10/2006 07:48:19 AM:
> >> It isn't really possible to compare scores across categories to say that one category is the "best" category. All the Bayesian classifier will do is say if something matches the current category.
> >
> > I was wondering what you ended up doing on this -- I have a similar situation
>
> I'm actually using a port of Classifier4J for .NET called NClassifier, which is based on Classifier4J 0.51, so there is no working VectorClassifier implementation. I've given up for now and will re-evaluate when the NClassifier library catches up--no billable time available to port the updates myself.
>
> --Richard
>
> ----------------------------------------------
> This electronic mail message may contain information which is (a) LEGALLY PRIVILEGED, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM DISCLOSURE, and (b) intended only for the use of the Addressee(s) named herein. If you are not the Addressee(s), or the person responsible for delivering this to the Addressee(s), you are hereby notified that reading, copying, or distributing this message is prohibited. If you have received this electronic mail message in error, please contact us immediately at (281) 600-1000 and take the steps necessary to delete the message completely from your computer system. Thank you, Environmental Resources Management. Please visit ERM's web site: http://www.erm.com |
From: <Ric...@er...> - 2006-03-10 14:46:18
|
cla...@li... wrote on 03/10/2006 07:48:19 AM:

>> It isn't really possible to compare scores across categories to say that one category is the "best" category. All the Bayesian classifier will do is say if something matches the current category.
>
> I was wondering what you ended up doing on this -- I have a similar situation

I'm actually using a port of Classifier4J for .NET called NClassifier, which is based on Classifier4J 0.51, so there is no working VectorClassifier implementation. I've given up for now and will re-evaluate when the NClassifier library catches up--no billable time available to port the updates myself.

--Richard

----------------------------------------------
This electronic mail message may contain information which is (a) LEGALLY PRIVILEGED, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM DISCLOSURE, and (b) intended only for the use of the Addressee(s) named herein. If you are not the Addressee(s), or the person responsible for delivering this to the Addressee(s), you are hereby notified that reading, copying, or distributing this message is prohibited. If you have received this electronic mail message in error, please contact us immediately at (281) 600-1000 and take the steps necessary to delete the message completely from your computer system. Thank you, Environmental Resources Management. Please visit ERM's web site: http://www.erm.com |
From: Joe S. <sca...@gm...> - 2006-03-10 13:48:25
|
Richard -

I was wondering what you ended up doing on this -- I have a similar situation

joe

On 3/2/06, Nick Lothian <ni...@ma...> wrote:
> See inline
>
> Ric...@er... wrote:
> > Apologies in advance if this comes through in HTML, I'm stuck on Lotus Notes here at work.
> >
> > I have a bunch of legislative text, around 400,000 individual paragraphs, that have each been hand-categorized into one of five categories. Since I have a few hundred thousand still to go, I thought the Bayesian classifier could give me a leg up on this process. So I wrote a little trainer that does something like the following:
> >
> >     switch (existingcategory) {
> >         case "category1":
> >             classifier.TeachMatch("category1", mytext);
> >             classifier.TeachNonMatch("category2", mytext);
> >             classifier.TeachNonMatch("category3", mytext);
> >             classifier.TeachNonMatch("category4", mytext);
> >             classifier.TeachNonMatch("category5", mytext);
> >             break;
> >         case "category2":
> >             classifier.TeachNonMatch("category1", mytext);
> >             classifier.TeachMatch("category2", mytext);
> >             classifier.TeachNonMatch("category3", mytext);
> >             classifier.TeachNonMatch("category4", mytext);
> >             classifier.TeachNonMatch("category5", mytext);
> >             break;
> >         case "category3":
> >             classifier.TeachNonMatch("category1", mytext);
> >             classifier.TeachNonMatch("category2", mytext);
> >             classifier.TeachMatch("category3", mytext);
> >             classifier.TeachNonMatch("category4", mytext);
> >             classifier.TeachNonMatch("category5", mytext);
> >             break;
> >         case "category4":
> >             ...
> >     }
> >
> > The problem is, *one* of the categories is *much* more common than the others, so it gets more matches and fewer non-matches for almost *any* word. So, now when I send a new string through the trained classifier and compare the scores, that category almost always wins out, and in a big way (generally around 99% for it, 1% for the others).
>
> It isn't really possible to compare scores across categories to say that one category is the "best" category. All the Bayesian classifier will do is say if something matches the current category. As you've seen it does that well - you'll typically end up with a very high score (99%) or a very low score (1%) and not much in between.
>
> Perhaps you could classify the big category last, and only check it if none of the other ones find a match.
>
> > Am I training this classifier wrong, or is this a limitation of using Bayesian filters with more than two categories or with a corpus that is unevenly distributed among the categories? I thought maybe I should try the VectorClassifier instead, but I have *tens of thousands* of strings in each category that I need to train it on, and the docs state that you can't incrementally train it (which, I presume, means I would need to concatenate the entire training corpus into one string per category).
>
> That means just that the training interfaces aren't properly implemented (yet). I've attached an updatable HashMapTermVectorStorage that fixes this (I haven't tested it though) - it might give you something to start from.
>
> Nick
>
>     package net.sf.classifier4J.vector;
>
>     import java.io.Serializable;
>     import java.util.HashMap;
>     import java.util.Map;
>
>     public class MyHashMapTermVectorStorage implements TermVectorStorage, Serializable {
>         private static final long serialVersionUID = 1L;
>         private Map storage;
>
>         public MyHashMapTermVectorStorage(int amount) {
>             storage = new HashMap(amount);
>         }
>
>         public MyHashMapTermVectorStorage() {
>             storage = new HashMap();
>         }
>
>         /**
>          * @see net.sf.classifier4J.vector.TermVectorStorage#addTermVector(java.lang.String, net.sf.classifier4J.vector.TermVector)
>          */
>         public void addTermVector(String category, TermVector termVector) {
>             // modified: Abelssoft, Sven Abels, 16.03.2005:
>             TermVector old = (TermVector) storage.get(category);
>             if (old == null) {
>                 storage.put(category, termVector);
>             } else {
>                 old.add(termVector);
>                 storage.put(category, old);
>             }
>         }
>
>         /**
>          * @see net.sf.classifier4J.vector.TermVectorStorage#getTermVector(java.lang.String)
>          */
>         public TermVector getTermVector(String category) {
>             return (TermVector) storage.get(category);
>         }
>
>         public int size() {
>             if (storage == null) return 0;
>             return storage.size();
>         }
>     } |
From: Nick L. <ni...@ma...> - 2006-03-02 11:56:50
|
package net.sf.classifier4J.vector;

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class MyHashMapTermVectorStorage implements TermVectorStorage, Serializable {
    private static final long serialVersionUID = 1L;
    private Map storage;

    public MyHashMapTermVectorStorage(int amount) {
        storage = new HashMap(amount);
    }

    public MyHashMapTermVectorStorage() {
        storage = new HashMap();
    }

    /**
     * @see net.sf.classifier4J.vector.TermVectorStorage#addTermVector(java.lang.String, net.sf.classifier4J.vector.TermVector)
     */
    public void addTermVector(String category, TermVector termVector) {
        // modified: Abelssoft, Sven Abels, 16.03.2005:
        TermVector old = (TermVector) storage.get(category);
        if (old == null) {
            storage.put(category, termVector);
        } else {
            old.add(termVector);
            storage.put(category, old);
        }
    }

    /**
     * @see net.sf.classifier4J.vector.TermVectorStorage#getTermVector(java.lang.String)
     */
    public TermVector getTermVector(String category) {
        return (TermVector) storage.get(category);
    }

    public int size() {
        if (storage == null) return 0;
        return storage.size();
    }
} |
From: <Ric...@er...> - 2006-03-01 15:04:26
|
Apologies in advance if this comes through in HTML, I'm stuck on Lotus Notes here at work.

I have a bunch of legislative text, around 400,000 individual paragraphs, that have each been hand-categorized into one of five categories. Since I have a few hundred thousand still to go, I thought the Bayesian classifier could give me a leg up on this process. So I wrote a little trainer that does something like the following:

    switch (existingcategory) {
        case "category1":
            classifier.TeachMatch("category1", mytext);
            classifier.TeachNonMatch("category2", mytext);
            classifier.TeachNonMatch("category3", mytext);
            classifier.TeachNonMatch("category4", mytext);
            classifier.TeachNonMatch("category5", mytext);
            break;
        case "category2":
            classifier.TeachNonMatch("category1", mytext);
            classifier.TeachMatch("category2", mytext);
            classifier.TeachNonMatch("category3", mytext);
            classifier.TeachNonMatch("category4", mytext);
            classifier.TeachNonMatch("category5", mytext);
            break;
        case "category3":
            classifier.TeachNonMatch("category1", mytext);
            classifier.TeachNonMatch("category2", mytext);
            classifier.TeachMatch("category3", mytext);
            classifier.TeachNonMatch("category4", mytext);
            classifier.TeachNonMatch("category5", mytext);
            break;
        case "category4":
            ...
    }

The problem is, *one* of the categories is *much* more common than the others, so it gets more matches and fewer non-matches for almost *any* word. So, now when I send a new string through the trained classifier and compare the scores, that category almost always wins out, and in a big way (generally around 99% for it, 1% for the others).

Am I training this classifier wrong, or is this a limitation of using Bayesian filters with more than two categories or with a corpus that is unevenly distributed among the categories?

I thought maybe I should try the VectorClassifier instead, but I have *tens of thousands* of strings in each category that I need to train it on, and the docs state that you can't incrementally train it (which, I presume, means I would need to concatenate the entire training corpus into one string per category).

Any help would be greatly appreciated...

--
Richard S. Tallent
ERM (Beaumont, TX)
409-833-7755

----------------------------------------------
This electronic mail message may contain information which is (a) LEGALLY PRIVILEGED, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM DISCLOSURE, and (b) intended only for the use of the Addressee(s) named herein. If you are not the Addressee(s), or the person responsible for delivering this to the Addressee(s), you are hereby notified that reading, copying, or distributing this message is prohibited. If you have received this electronic mail message in error, please contact us immediately at (281) 600-1000 and take the steps necessary to delete the message completely from your computer system. Thank you, Environmental Resources Management. Please visit ERM's web site: http://www.erm.com |
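[Editor's note: the five-arm switch in Richard's message can collapse to a single loop -- teach a match for the document's own category and a non-match for every other one. The sketch below uses a stub classifier that merely records calls so the shape of the loop is visible; the method names mirror Richard's (NClassifier-style) snippet and are illustrative, not a real library API.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: replace the repetitive switch with one loop over the category
// list. The classifier here is a stub that records which calls were made.
public class TrainingLoop {

    static class RecordingClassifier {
        final List<String> calls = new ArrayList<>();
        void teachMatch(String category, String text) { calls.add("match:" + category); }
        void teachNonMatch(String category, String text) { calls.add("nonmatch:" + category); }
    }

    // Teach a match for the true category, a non-match for all others.
    static void train(RecordingClassifier classifier, String existingCategory,
                      String text, String[] categories) {
        for (String c : categories) {
            if (c.equals(existingCategory)) {
                classifier.teachMatch(c, text);
            } else {
                classifier.teachNonMatch(c, text);
            }
        }
    }

    public static void main(String[] args) {
        RecordingClassifier classifier = new RecordingClassifier();
        String[] categories = {"category1", "category2", "category3"};
        train(classifier, "category2", "some paragraph", categories);
        System.out.println(classifier.calls);
        // [nonmatch:category1, match:category2, nonmatch:category3]
    }
}
```

The loop does not fix the class-imbalance problem Richard describes, but it makes experiments (for example, sub-sampling non-matches for the dominant category) a one-line change instead of a five-arm edit.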
From: karl w. <we...@ho...> - 2006-02-07 18:33:49
|
On 7 Feb 2006, at 12.28, Nick Lothian wrote:
> karl wettin wrote:
>> I really like the simplicity of the C4J API, but think it's too bad there is no support for other than nominal values. Have you considered to add support for numeric values in the interfaces? I'd love to plug in the Weka J48 (and others) to the same API. The C4J classifiers could simply treat the numeric values as nominal. It would also be nice with more than one dimension of classes per classifier, i.e. rows and columns.
>>
>> How about that? Or is this outside the intended scope of C4J? Perhaps I should make my own facade that handles both Weka and C4J in a simple way?
>
> I'm quite interested in this, but I have to admit I don't understand the distinction between numeric & nominal values. Can you explain this some (to save me some googling-time...)?

Very short: nominal values are strings. Numeric values are integer/floating point values. They are both classes, but numerical values are easier to bend in either direction than a nominal value.

-- karl |
From: Nick L. <ni...@ma...> - 2006-02-07 11:28:09
|
karl wettin wrote:
> I really like the simplicity of the C4J API, but think it's too bad there is no support for other than nominal values. Have you considered to add support for numeric values in the interfaces? I'd love to plug in the Weka J48 (and others) to the same API. The C4J classifiers could simply treat the numeric values as nominal. It would also be nice with more than one dimension of classes per classifier, i.e. rows and columns.
>
> How about that? Or is this outside the intended scope of C4J? Perhaps I should make my own facade that handles both Weka and C4J in a simple way?

I'm quite interested in this, but I have to admit I don't understand the distinction between numeric & nominal values. Can you explain this some (to save me some googling-time...)?

Nick |
From: karl w. <we...@ho...> - 2006-02-07 03:39:38
|
I really like the simplicity of the C4J API, but think it's too bad there is no support for anything other than nominal values. Have you considered adding support for numeric values in the interfaces? I'd love to plug in the Weka J48 (and others) to the same API. The C4J classifiers could simply treat the numeric values as nominal. It would also be nice to have more than one dimension of classes per classifier, i.e. rows and columns.

How about that? Or is this outside the intended scope of C4J? Perhaps I should make my own facade that handles both Weka and C4J in a simple way?

-- karl |
From: karl w. <we...@ho...> - 2006-02-06 14:47:55
|
On 6 Feb 2006, at 15.31, Jeff Thorne wrote:
> I would like to analyze each user's post for various words and expressions before publishing their post to the DB. I was wondering if someone could shed some light on the best way to tackle this problem with Classifier4j or another api if doing so makes more sense?
>
> How would the performance be with classifier4J and which classifier4j datasource and classifier do you recommend we use.

I doubt you want to use C4J for this. I would probably build n-grams of the words and the text and weight them up, to make sure no one is trying to hide the profanities in other words or by misspelling them. The Lucene spell check library does this for you. And really fast.

An easier way out would be to simply match the text against the words:

    for (String profanity : profanities) {
        if (input.indexOf(profanity) >= 0) {  // >= 0: indexOf returns 0 for a match at the very start
            reportProfanity(input);
        }
    }

-- karl |
From: Joe S. <sca...@gm...> - 2005-12-10 00:23:38
|
Thank you.

On 12/9/05, Mike Heath <mh...@av...> wrote:
> Indeed there is:
> http://sourceforge.net/mailarchive/forum.php?forum_id=34026 |
From: Mike H. <mh...@av...> - 2005-12-09 21:21:52
|
Indeed there is: http://sourceforge.net/mailarchive/forum.php?forum_id=34026 On Fri, 2005-12-09 at 07:58 -0500, Joe Scanlon wrote: > |
From: Joe S. <sca...@gm...> - 2005-12-09 12:58:37
|
From: Scanlon, J. <Joe...@Li...> - 2005-07-11 13:09:35
|
subscribe Joe Scanlon Principal Software Engineer ACS Application Common Utilities Liberty Mutual Group Joe...@Li... ph: (603) 245-1934 fax: (603) 245-0715 cell: (603) 489-8231 |
From: karl w. <ka...@sn...> - 2005-02-26 12:29:42
|
On Fri 2005-02-04 at 19:59 +1030, Nick Lothian wrote:
> I've just released Classifier4J 0.6. This new release includes a rather nice (I think) new classifier (the VectorClassifier) based on

Rather nice indeed. I just got around to trying it out.

What do you think about extending the vector with hierarchies? The parent/child delta could be used as negative data for the parent, or something like that. Any instinctive thoughts? I could set off some hours for that.

-- karl |
From: Nick L. <ni...@ma...> - 2005-02-04 09:26:16
|
I've just released Classifier4J 0.6. This new release includes a rather nice (I think) new classifier (the VectorClassifier) based on the vector space search algorithm. This particular classifier is fast, doesn't require training for non-matches, and is very suitable for sorting data into various categories.

The build system is now totally based on Maven, and I've moved to a new CVS module (newbuild) to implement this.

Let me know if you find any bugs.

Nick |
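[Editor's note: the vector-space idea behind the announced VectorClassifier can be illustrated with a toy cosine similarity between term-frequency vectors. This is an illustration of the general algorithm, not Classifier4J's implementation.]

```java
import java.util.HashMap;
import java.util.Map;

// Toy vector-space similarity: turn each text into a term-frequency map
// and compare the two vectors by their cosine. 1.0 means identical word
// distributions; 0.0 means no words in common.
public class CosineSketch {

    // Build a term-frequency vector from whitespace-separated text.
    static Map<String, Integer> tf(String text) {
        Map<String, Integer> v = new HashMap<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            v.merge(w, 1, Integer::sum);
        }
        return v;
    }

    // Cosine of the angle between two sparse term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int x : b.values()) {
            nb += x * x;
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        System.out.println(cosine(tf("java classifier"), tf("java classifier"))); // 1.0
        System.out.println(cosine(tf("java"), tf("python")));                     // 0.0
    }
}
```

A vector-space classifier scores a document against each category's accumulated term vector this way, which is why it needs no non-match training: an unrelated document simply produces a low cosine.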
From: Mike H. <mh...@av...> - 2004-12-29 19:23:30
|
C4J relies on Naive Bayes (http://en.wikipedia.org/wiki/Naive_Bayes), which means that, in order to classify something, you need to teach it what each class is AND what each class is not. For comparison purposes as you've described in your message, I'm not sure that C4J is a good solution.

-Mike

On Sun, 2004-12-26 at 15:37, Colin Bell wrote:
> Hi all
>
> I would like to start with saying what an exciting piece of software C4J is, thanks to all those involved.
>
> I have written a bit of code to use C4J to compare documents (in this case stored in a JDBC database) to each other and find out how similar they are. I pick the document from which I am to compare, and then add each word of it to a SimpleWordsDataSource using a loop (wds.addMatch(wordList[i])). I then use BayesianClassifier(wds) to get the result of each document.
>
> Problem is that my results are obviously very poor (always 0.99, sometimes 0.5) because I don't have any non-matches. Does anyone have an idea on how I could do this? What could I possibly use as non-matches, or am I missing a trick?
>
> Many thanks
>
> Regards
>
> Colin |
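[Editor's note: a toy illustration of Mike's point -- not Classifier4J's actual code. With only match training, every seen word's probability is 1.0, which the classifier clamps to 0.99: exactly the scores Colin reports. A single non-match immediately changes the picture.]

```java
import java.util.HashMap;
import java.util.Map;

// Toy word-probability model showing why non-match training is required.
public class WhyNonMatches {

    // word -> {matchCount, nonMatchCount}
    static final Map<String, int[]> counts = new HashMap<>();

    static void teach(String word, boolean match) {
        int[] c = counts.computeIfAbsent(word, k -> new int[2]);
        c[match ? 0 : 1]++;
    }

    // P(match | word) = matches / (matches + nonMatches), clamped away
    // from the degenerate values 0 and 1.
    static double wordProbability(String word) {
        int[] c = counts.get(word);
        if (c == null) return 0.5; // never seen: neutral
        double p = (double) c[0] / (c[0] + c[1]);
        return Math.min(0.99, Math.max(0.01, p));
    }

    public static void main(String[] args) {
        teach("java", true);                          // only a match taught
        System.out.println(wordProbability("java"));  // 0.99 -- always "similar"
        teach("java", false);                         // one non-match
        System.out.println(wordProbability("java"));  // 0.5
    }
}
```

This is why Colin only ever sees 0.99 (every word has been taught as a match) or 0.5 (the word has never been seen at all).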
From: Colin B. <co...@ga...> - 2004-12-26 22:37:17
|
Hi all

I would like to start with saying what an exciting piece of software C4J is, thanks to all those involved.

I have written a bit of code to use C4J to compare documents (in this case stored in a JDBC database) to each other and find out how similar they are. I pick the document from which I am to compare, and then add each word of it to a SimpleWordsDataSource using a loop (wds.addMatch(wordList[i])). I then use BayesianClassifier(wds) to get the result of each document.

Problem is that my results are obviously very poor (always 0.99, sometimes 0.5) because I don't have any non-matches. Does anyone have an idea on how I could do this? What could I possibly use as non-matches, or am I missing a trick?

Many thanks

Regards

Colin |
From: Wayne S. <wds...@oa...> - 2004-12-13 17:05:54
|
Thanks. It makes more sense now.

-----Original Message-----
From: cla...@li... [mailto:cla...@li...] On Behalf Of cla...@li...
Sent: Sunday, December 12, 2004 11:12 PM
To: cla...@li...
Subject: Classifier4j-devel digest, Vol 1 #78 - 2 msgs

Today's Topics:

1. RE: What does this method do - normaliseSignificance() (Nick Lothian)

Message: 1
From: Nick Lothian <nic...@es...>
To: "'cla...@li...'" <cla...@li...>
Subject: RE: [Classifier4j-devel] What does this method do - normaliseSignificance()
Date: Mon, 13 Dec 2004 08:51:29 +1030

> On Fri, 2004-12-10 at 17:45, Wayne Snyder wrote:
> > I understand just about everything that's going on in this package, except for the following method:
> >
> > Class BayesianClassifier
> > protected static double normaliseSignificance(double sig)
> >
> > Could you please explain the role it plays.
>
> I am not a Classifier4J developer but I've used Classifier4J quite a bit and have done a lot of research on Naive Bayesian Classifiers.
>
> Stated simply, probabilities of 0 mess up a Naive Bayesian Classifier and probabilities of 1 don't change anything. It basically boils down to the fact that anything multiplied by 0 is 0 and multiplying by 1 doesn't change anything. BayesianClassifier.normaliseSignificance(double) simply removes the 1's and the 0's and replaces them with 0.99 and 0.01, respectively.
>
> For a good explanation of the magic that is Naive Bayesian Classification, check out:
> http://en.wikipedia.org/wiki/Naive_Bayesian_classifier
>
> -Mike

That's exactly what that method does.

Nick |
From: Nick L. <nic...@es...> - 2004-12-12 22:25:53
|
> On Fri, 2004-12-10 at 17:45, Wayne Snyder wrote:
> > I understand just about everything that's going on in this package, except for the following method:
> >
> > Class BayesianClassifier
> > protected static double normaliseSignificance(double sig)
> >
> > Could you please explain the role it plays.
>
> I am not a Classifier4J developer but I've used Classifier4J quite a bit and have done a lot of research on Naive Bayesian Classifiers.
>
> Stated simply, probabilities of 0 mess up a Naive Bayesian Classifier and probabilities of 1 don't change anything. It basically boils down to the fact that anything multiplied by 0 is 0 and multiplying by 1 doesn't change anything. BayesianClassifier.normaliseSignificance(double) simply removes the 1's and the 0's and replaces them with 0.99 and 0.01, respectively.
>
> For a good explanation of the magic that is Naive Bayesian Classification, check out:
> http://en.wikipedia.org/wiki/Naive_Bayesian_classifier
>
> -Mike

That's exactly what that method does.

Nick |
From: Mike H. <mh...@av...> - 2004-12-12 03:50:34
|
On Fri, 2004-12-10 at 17:45, Wayne Snyder wrote:
> I understand just about everything that's going on in this package, except for the following method:
>
> Class BayesianClassifier
> protected static double normaliseSignificance(double sig)
>
> Could you please explain the role it plays.

I am not a Classifier4J developer but I've used Classifier4J quite a bit and have done a lot of research on Naive Bayesian Classifiers.

Stated simply, probabilities of 0 mess up a Naive Bayesian Classifier and probabilities of 1 don't change anything. It basically boils down to the fact that anything multiplied by 0 is 0 and multiplying by 1 doesn't change anything. BayesianClassifier.normaliseSignificance(double) simply removes the 1's and the 0's and replaces them with 0.99 and 0.01, respectively.

For a good explanation of the magic that is Naive Bayesian Classification, check out:
http://en.wikipedia.org/wiki/Naive_Bayesian_classifier

-Mike |
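[Editor's note: a minimal sketch of the behaviour Mike describes -- clamp probabilities away from the degenerate values 0 and 1. This is a re-implementation for illustration, not Classifier4J's actual source.]

```java
// Sketch of the clamping Mike describes: a probability of exactly 1
// would dominate the product and 0 would zero it out, so both are
// replaced with near-extreme values instead.
public class NormaliseSketch {

    static double normaliseSignificance(double sig) {
        if (sig >= 1.0) return 0.99; // a 1 would make the word absolutely decisive
        if (sig <= 0.0) return 0.01; // a 0 would wipe out the whole product
        return sig;                  // everything in between passes through
    }

    public static void main(String[] args) {
        System.out.println(normaliseSignificance(1.0)); // 0.99
        System.out.println(normaliseSignificance(0.0)); // 0.01
        System.out.println(normaliseSignificance(0.5)); // 0.5
    }
}
```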
From: Wayne S. <wds...@oa...> - 2004-12-11 00:48:42
|
I understand just about everything that's going on in this package, except for the following method:

    Class BayesianClassifier
    protected static double normaliseSignificance(double sig)

Could you please explain the role it plays.

Thanks

Wayne |
From: Nick L. <nic...@es...> - 2004-11-28 22:53:55
|
BTW, you will need to train non-matches as well as matches in order to get sensible results.

Nick

-----Original Message-----
From: Nick Lothian [mailto:nic...@es...]
Sent: Monday, 29 November 2004 9:16 AM
To: cla...@li...
Subject: RE: [Classifier4j-devel] Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
Importance: Low

You need apache commons logging. See http://classifier4j.sourceforge.net/dependencies.html

Nick

-----Original Message-----
From: Wayne [mailto:des...@ho...]
Sent: Monday, 29 November 2004 9:17 AM
To: cla...@li...
Subject: [Classifier4j-devel] Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
Importance: Low

My Bayesian test program compiles fine but I get this error when I try to run it:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
        at net.sf.classifier4J.bayesian.WordProbability.calculateProbability(WordProbability.java:167)
        at net.sf.classifier4J.bayesian.WordProbability.setMatchingCount(WordProbability.java:138)
        at net.sf.classifier4J.bayesian.WordProbability.<init>(WordProbability.java:115)
        at net.sf.classifier4J.bayesian.SimpleWordsDataSource.addMatch(SimpleWordsDataSource.java:94)
        at testing.Test1.main(Test1.java:15)

I am using Eclipse 3.1M2 and have added the Classifier4J-0.51.jar as an external JAR library. This version of Eclipse uses JDK 5.0. Does anyone know what settings I need in Eclipse to run?

Here is the test code in my project:

    package testing;

    import net.sf.classifier4J.ClassifierException;
    import net.sf.classifier4J.IClassifier;
    import net.sf.classifier4J.bayesian.BayesianClassifier;
    import net.sf.classifier4J.bayesian.IWordsDataSource;
    import net.sf.classifier4J.bayesian.SimpleWordsDataSource;
    import net.sf.classifier4J.bayesian.WordsDataSourceException;

    public class Test1 {

        private static double dReturn;

        public static void main(String[] args) {
            IWordsDataSource wds = new SimpleWordsDataSource();
            try {
                wds.addMatch("Blah");
            } catch (WordsDataSourceException e) {
                e.printStackTrace();
            }
            IClassifier classifier = new BayesianClassifier(wds);
            try {
                dReturn = classifier.classify("Blah Happy Holidays");
            } catch (ClassifierException e1) {
                e1.printStackTrace();
            }
            System.out.println(dReturn);
        }
    }

Thanks

-Wayne |
From: Nick L. <nic...@es...> - 2004-11-28 22:50:08
|
You need apache commons logging. See http://classifier4j.sourceforge.net/dependencies.html

Nick

-----Original Message-----
From: Wayne [mailto:des...@ho...]
Sent: Monday, 29 November 2004 9:17 AM
To: cla...@li...
Subject: [Classifier4j-devel] Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
Importance: Low

My Bayesian test program compiles fine but I get this error when I try to run it:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
        at net.sf.classifier4J.bayesian.WordProbability.calculateProbability(WordProbability.java:167)
        at net.sf.classifier4J.bayesian.WordProbability.setMatchingCount(WordProbability.java:138)
        at net.sf.classifier4J.bayesian.WordProbability.<init>(WordProbability.java:115)
        at net.sf.classifier4J.bayesian.SimpleWordsDataSource.addMatch(SimpleWordsDataSource.java:94)
        at testing.Test1.main(Test1.java:15)

I am using Eclipse 3.1M2 and have added the Classifier4J-0.51.jar as an external JAR library. This version of Eclipse uses JDK 5.0. Does anyone know what settings I need in Eclipse to run?

Here is the test code in my project:

    package testing;

    import net.sf.classifier4J.ClassifierException;
    import net.sf.classifier4J.IClassifier;
    import net.sf.classifier4J.bayesian.BayesianClassifier;
    import net.sf.classifier4J.bayesian.IWordsDataSource;
    import net.sf.classifier4J.bayesian.SimpleWordsDataSource;
    import net.sf.classifier4J.bayesian.WordsDataSourceException;

    public class Test1 {

        private static double dReturn;

        public static void main(String[] args) {
            IWordsDataSource wds = new SimpleWordsDataSource();
            try {
                wds.addMatch("Blah");
            } catch (WordsDataSourceException e) {
                e.printStackTrace();
            }
            IClassifier classifier = new BayesianClassifier(wds);
            try {
                dReturn = classifier.classify("Blah Happy Holidays");
            } catch (ClassifierException e1) {
                e1.printStackTrace();
            }
            System.out.println(dReturn);
        }
    }

Thanks

-Wayne |