classifier4j-devel Mailing List for Classifier4J (Page 12)
Status: Beta
Brought to you by:
nicklothian
From: Nick L. <ni...@ma...> - 2003-07-17 05:46:51
|
I've committed the changes to JDBCWordsDataSource discussed in <http://sourceforge.net/mailarchive/forum.php?thread_id=2697458&forum_id=34026>. (For those of you using SourceForge's anonymous CVS, I would expect it to show up tomorrow.)

It now uses a single table:

CREATE TABLE word_probability (
    word VARCHAR(255) NOT NULL,
    category VARCHAR(20) NOT NULL,
    match_count INT DEFAULT 0 NOT NULL,
    nonmatch_count INT DEFAULT 0 NOT NULL,
    PRIMARY KEY(word, category)
)

I've attached a simple program that will process some files in some directories and then output the most significant words. (All the directories, db names etc. are hard coded, but it should show roughly how to use it.)

Nick
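To make the single-table layout above a little more concrete, here is a minimal in-memory sketch in Java of how rows keyed by (word, category) might be updated and read. It is not the JDBCWordsDataSource code: the class name `WordCountTableSketch` and its methods are hypothetical, and a `HashMap` stands in for the database so the example runs without a JDBC driver.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the word_probability table.
// Each map entry mirrors one row: key = (word, category),
// value = { match_count, nonmatch_count }.
class WordCountTableSketch {
    private final Map<String, int[]> rows = new HashMap<>();

    // Mirrors PRIMARY KEY(word, category); a newline separates the two parts.
    private static String key(String word, String category) {
        return category + '\n' + word;
    }

    // Find or create the row; new rows start at 0/0 like the column defaults.
    private int[] row(String word, String category) {
        return rows.computeIfAbsent(key(word, category), k -> new int[2]);
    }

    void addMatch(String word, String category) {
        row(word, category)[0]++;
    }

    void addNonMatch(String word, String category) {
        row(word, category)[1]++;
    }

    // One lookup returns both counts. Unknown words read as 0/0
    // without creating a row.
    int[] counts(String word, String category) {
        int[] found = rows.get(key(word, category));
        return found == null ? new int[] {0, 0} : found.clone();
    }
}
```

Because the category is part of the key, the same word can carry different counts in different categories, which is what lets several classifiers share one table.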
From: Nick L. <ni...@ma...> - 2003-07-17 05:10:30
|
Yes, you are right, I needed to add something like that. I've added it to all the files.

I think I'd prefer to keep the copyright like that at the moment, so that if someone wants something modified with it they only need to contact me, rather than needing to contact each and every person who has done something (like when Mozilla wanted to dual licence everything and had to try and find all the authors). I don't see something like that ever happening with this, though - I chose the Apache licence so it would allow Classifier4J to be used in commercial products as well as open sourced projects.

I think everyone should add their names in the @author JavaDoc tags, though - I'll add yours to the files in your last patch.
From: Peter L. <pe...@le...> - 2003-07-16 03:23:09
|
Heya Nick,

I noticed that most files don't have license information, while others refer to http://classifier4j.sourceforge.net/LICENCE.txt. Here's a modification of the Apache license (I'm assuming that's the correct license that you want to use - I found that on the Classifier4J summary page on SF) that we can use - If you're happy w it, I'll go ahead and make the changes to all the files.

Pete

Issues: Do we just have your name for the Copyright, or each author adds their name if they modify that particular file?

/*
 * ====================================================================
 *
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2003 Nick Lothian. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution, if
 *    any, must include the following acknowlegement:
 *       "This product includes software developed by the
 *        developers of Classifier4J (http://classifier4j.sf.net/)."
 *    Alternately, this acknowlegement may appear in the software itself,
 *    if and wherever such third-party acknowlegements normally appear.
 *
 * 4. The name "Classifier4J" must not be used to endorse or promote
 *    products derived from this software without prior written
 *    permission. For written permission, please contact
 *    http://sourceforge.net/users/nicklothian/.
 *
 * 5. Products derived from this software may not be called
 *    "Classifier4J", nor may "Classifier4J" appear in their names
 *    without prior written permission. For written permission, please
 *    contact http://sourceforge.net/users/nicklothian/.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 */
From: Peter L. <pe...@le...> - 2003-07-15 13:06:04
|
Heya,

> If you are using the anonymous access to CVS I think sourceforge says it may
> be up to 24 hours old (although I have seen it older at times) because they
> run the anon access off different CVS servers or something.

Ahh...

> Anyway, I've added you as a developer to the classifier4J project (pleschev,
> right?), so you should have read-only CVS access via the developer CVS
> servers:
>
> :extssh:ple...@cv...:/cvsroot/classifier4J
>
> Let me know if that doesn't work for you.

Brilliant! Worked like a charm, Thanks!

Pete
From: Nick L. <ni...@ma...> - 2003-07-15 10:44:34
|
Yes it is there (see http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/classifier4j/Classifier4J/src/test/net/sf/classifier4J/bayesian/).

If you are using the anonymous access to CVS I think SourceForge says it may be up to 24 hours old (although I have seen it older at times) because they run the anon access off different CVS servers or something.

Anyway, I've added you as a developer to the classifier4J project (pleschev, right?), so you should have read-only CVS access via the developer CVS servers:

:extssh:ple...@cv...:/cvsroot/classifier4J

Let me know if that doesn't work for you.

Nick
From: Peter L. <pe...@le...> - 2003-07-15 09:22:25
|
Heya,

could you please verify that you've added the directory src/test/net/sf/classifier4J/bayesian to cvs? You need to perform a 'cvs add src/test/net/sf/classifier4J/bayesian', and then add all the files in that directory (and then commit).

Thanks,
Pete
From: Nick L. <ni...@ma...> - 2003-07-14 11:43:12
|
Glad I'm not alone, then.

Those changes are checked in now.
From: Peter L. <pe...@le...> - 2003-07-14 01:30:49
|
Heya,

I found it very frustrating trying to produce a patch using SF's CVS service. Constantly getting error 0 or EOF meaning there are currently too many connections to the server.

Pete
From: Nick L. <ni...@ma...> - 2003-07-14 01:16:28
|
Thanks Pete,

I've applied that patch - now sourceforge CVS seems to be down, so I'll have to update that later.

----- Original Message -----
From: "Peter Leschev" <pe...@le...>
To: "Nick Lothian" <ni...@ma...>; <cla...@li...>
Sent: Sunday, July 13, 2003 12:46 PM
Subject: Re: [Classifier4j-devel] WordProbability Refactor...

> Heya,
>
> here are the files that I submitted to sf.net for [ 768157 ]
> WordProbability refactor - Now in the unified format...
>
> Pete
From: Nick L. <ni...@ma...> - 2003-07-09 13:29:43
|
Pete, that looks good, and I think the commons-lang dependency is fine.

I'm having a little trouble applying it, though. I don't do a lot of CVS/Patch work, so it might be something I'm doing wrong.

I'm using Eclipse, which has a nice apply-patch wizard, but unfortunately that will only process patches in unified format. Could I get you to redo the patch using that format?

Just send it to the list (and CC it to nick at mackmo dot com in case it won't accept attachments).

Nick
From: Peter L. <pe...@le...> - 2003-07-09 01:34:02
|
> > I would recommend changing the IWordsDataSource to
> > return WordProbability objects instead of double. This
> > would ensure that BayesianClassifier doesn't have to
> > know how to create WordProbability Objects, it just
> > gets them from the IWordsDataSource.
> >
> > It would be nice if WordProbability knew how to
> > calculate its own probability, given the number of
> > nonmatching & matching counts. This will reduce
> > duplication of code with new IWordsDataSource
> > implementations.
>
> Sounds pretty reasonable. I'm not very happy with the use of the
> WordProbability object at the moment, anyway.

Heya,

I've submitted a patch with this change.

http://sourceforge.net/tracker/index.php?func=detail&aid=768157&group_id=79523&atid=556879

Pete
From: Nick L. <nl...@es...> - 2003-07-02 08:41:57
|
> TABLE matching_words
> - word varchar
> - word_count int
>
> TABLE nonmatching_words
> - word varchar
> - word_count int
>
> I would recommend using something like:
>
> TABLE words
> - word varchar
> - nonmatching_count int
> - matching_count int
>
> This will allow you to obtain the required information
> during classification with one query per word instead
> of two. It also makes it easier to teach the classifier
> (you just increment the nonmatching_count or
> matching_count by one).

Mmm.. I actually experimented with that. I can't remember why I abandoned it - I'll look at my old code.

> > Any comments on the API -
> > net.sf.classifier4J.IClassifier
> > in particular?
>
> Hmmm there needs to be a way to teach the Classifier
> with new input. Not sure if that would be under
> BayesianClassifier or IClassifier or a
> ITeachableClassifier (which BayesianClassifier would
> extend). Really depends on what classifiers you want to
> implement in the future. I don't really mind...
> Otherwise IClassifier looks ok to me...

I think that the interface for training should be totally separate from the IClassifier hierarchy. There are just so many ways of doing the training - it probably depends on the backend as well.

I do agree that there is a need for a training API, though. I'd like to leave that for a bit until we understand the problem space better. In particular, I want to try a Vector Space Search classifier (see <http://www.mackmo.com/nick/blog/java/?permalink=LatentSemanticIndexing.txt>).

> I would recommend changing the IWordsDataSource to
> return WordProbability objects instead of double. This
> would ensure that BayesianClassifier doesn't have to
> know how to create WordProbability Objects, it just
> gets them from the IWordsDataSource.
>
> It would be nice if WordProbability knew how to
> calculate its own probability, given the number of
> nonmatching & matching counts. This will reduce
> duplication of code with new IWordsDataSource
> implementations.

Sounds pretty reasonable. I'm not very happy with the use of the WordProbability object at the moment, anyway.
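As an illustration of the point about keeping training separate from classification, here is a rough Java sketch. It assumes two unrelated interfaces, one for classifying and one for teaching, that a backend may implement together or separately. The names `ClassifierSketch`, `TrainerSketch` and `CountingBackendSketch` are hypothetical and are not the Classifier4J API, and the scoring is deliberately naive (not Bayesian); it only shows the shape of the split.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-only view: callers that only classify depend on this.
interface ClassifierSketch {
    // Returns a score between 0 and 1 for how well the input matches.
    double classify(String input);
}

// Hypothetical training view, kept outside the classifier hierarchy.
interface TrainerSketch {
    void teachMatch(String input);
    void teachNonMatch(String input);
}

// Toy backend that happens to implement both interfaces.
class CountingBackendSketch implements ClassifierSketch, TrainerSketch {
    // word -> { matching count, nonmatching count }
    private final Map<String, int[]> counts = new HashMap<>();

    private void teach(String input, int column) {
        for (String word : input.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.computeIfAbsent(word, k -> new int[2])[column]++;
            }
        }
    }

    public void teachMatch(String input) {
        teach(input, 0);
    }

    public void teachNonMatch(String input) {
        teach(input, 1);
    }

    // Naive score: share of known word occurrences that came from
    // matching text; 0.5 when no word in the input is known.
    public double classify(String input) {
        int match = 0;
        int total = 0;
        for (String word : input.toLowerCase().split("\\s+")) {
            int[] c = counts.get(word);
            if (c != null) {
                match += c[0];
                total += c[0] + c[1];
            }
        }
        return total == 0 ? 0.5 : (double) match / total;
    }
}
```

With this split, a classifier that has no natural training step could implement only the first interface, while a backend-specific trainer could be swapped without touching code that only calls `classify`.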
From: Peter L. <pe...@le...> - 2003-07-02 06:40:12
|
Heya,

> you are talking about the Source zip?.

Yep...

> I just zipped the source with WinZip,
> so it doesn't surprise me.

that explains it...

> 2) I'm reasonably familiar with Hibernate
> (in theory at least). I'm reluctant to
> replace the JDBC Data Source with a Hibernate
> one because I want to make Classifier4J very
> easy to drop into people's code without too
> many dependencies.

Fair enough....

> However, I'm not opposed to a
> HibernateWordsDataSource if
> you'd like to work on that.

Cool...

> Please be aware that there are
> performance problems at the moment
> when using a database backend,
> and I'm not convinced that a normal
> DB backend will ever be able to
> deliver sufficient performance on
> large documents.

I'd like to try and convince you otherwise! :) I think it'll be possible with a schema change & using hibernate's caching mechanisms...

> I think I'll need to have a table that
> contains the precalculated word probability
> for each word to get rid of the two-queries-per
> word issue

Having precalculated word probs in the database makes it difficult to teach the classifier new sentences...

Currently the schema is something like:

TABLE matching_words
- word varchar
- word_count int

TABLE nonmatching_words
- word varchar
- word_count int

I would recommend using something like:

TABLE words
- word varchar
- nonmatching_count int
- matching_count int

This will allow you to obtain the required information during classification with one query per word instead of two. It also makes it easier to teach the classifier (you just increment the nonmatching_count or matching_count by one).

> Any comments on the API -
> net.sf.classifier4J.IClassifier
> in particular?

Hmmm there needs to be a way to teach the Classifier with new input. Not sure if that would be under BayesianClassifier or IClassifier or a ITeachableClassifier (which BayesianClassifier would extend). Really depends on what classifiers you want to implement in the future. I don't really mind... Otherwise IClassifier looks ok to me...

I would recommend changing the IWordsDataSource to return WordProbability objects instead of double. This would ensure that BayesianClassifier doesn't have to know how to create WordProbability Objects, it just gets them from the IWordsDataSource.

It would be nice if WordProbability knew how to calculate its own probability, given the number of nonmatching & matching counts. This will reduce duplication of code with new IWordsDataSource implementations.

Pete
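The suggestion that a word-probability object could work out its own probability from the two counts might look roughly like the Java sketch below. This is not the Classifier4J WordProbability class. The class name `WordProbabilitySketch`, the formula matching / (matching + nonmatching), the clamping bounds of 0.01 and 0.99, and the neutral value of 0.5 for unseen words are all assumptions made for illustration; the thread does not specify them.

```java
// Hypothetical sketch of a value object that derives its probability
// from the counts a data source hands it.
class WordProbabilitySketch {
    // Assumed bounds so one word can never force a result of 0 or 1.
    static final double LOWER_BOUND = 0.01;
    static final double UPPER_BOUND = 0.99;
    // Assumed neutral value for a word with no recorded counts.
    static final double NEUTRAL = 0.5;

    private final String word;
    private final long matchingCount;
    private final long nonMatchingCount;

    WordProbabilitySketch(String word, long matchingCount, long nonMatchingCount) {
        if (matchingCount < 0 || nonMatchingCount < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        this.word = word;
        this.matchingCount = matchingCount;
        this.nonMatchingCount = nonMatchingCount;
    }

    String getWord() {
        return word;
    }

    // Assumed formula: the share of occurrences seen in matching text,
    // clamped to [LOWER_BOUND, UPPER_BOUND].
    double getProbability() {
        long total = matchingCount + nonMatchingCount;
        if (total == 0) {
            return NEUTRAL;
        }
        double raw = (double) matchingCount / total;
        return Math.max(LOWER_BOUND, Math.min(UPPER_BOUND, raw));
    }
}
```

With the calculation living in one place, a data source would only need to read the two counts for a word and construct the object, whichever storage backend it uses.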
From: Nick L. <nl...@es...> - 2003-07-02 00:06:48
|
Welcome! I'm very interested in having help.

1) I hadn't noticed the problems with the zip file. I'll have to check that - you are talking about the Source zip? I just zipped the source with WinZip, so it doesn't surprise me.

2) I'm reasonably familiar with Hibernate (in theory at least). I'm reluctant to replace the JDBC Data Source with a Hibernate one because I want to make Classifier4J very easy to drop into people's code without too many dependencies. However, I'm not opposed to a HibernateWordsDataSource if you'd like to work on that.

Please be aware that there are performance problems at the moment when using a database backend, and I'm not convinced that a normal DB backend will ever be able to deliver sufficient performance on large documents. I think I'll need to have a table that contains the precalculated word probability for each word to get rid of the two-queries-per-word issue (<http://classifier4j.sourceforge.net/task-list.html#net.sf.classifier4J.bayesian.JDBCWordsDataSource.methods>). I'm not stuck on the DB schema anyway, so if something else works better with Hibernate then go with it.

3) Yeah. I'm thinking an extra column on the *_words tables that joins to a classifier_types table or something.

Any comments on the API - net.sf.classifier4J.IClassifier (<http://classifier4j.sourceforge.net/apidocs/net/sf/classifier4J/IClassifier.html>) in particular?

Thanks for the feedback.

Nick
From: Peter L. <pe...@le...> - 2003-07-01 12:33:29
|
Heya,

I need a Bayesian classifier for a pet project that I'm going to be working on. Instead of reinventing the wheel, I thought I'd check out what was available....

I'd like to help out with this project (If you're interested that is!)...

I understand that 0.2 was an initial release, so I'm probably stating the obvious here, anyhow, here are a couple of suggestions:

- Deployment
  - The zip file has CVS entries in it. From memory, performing a zip from ant doesn't include the CVS entries by default.
  - Also would be nice if all the files were under "Classifier4J-0.2" instead of "."....

- Database
  - I would recommend using hibernate (http://hibernate.bluemars.net) instead of directly using JDBC. I've used this product with other projects and I wholeheartedly recommend it. This would allow classifier4j to get cached query results etc for free... Also keeps the code cleaner...
  - Currently the model only supports one classifier in the database, I would require a number of separate classifiers to be stored in the database.

Pete