From: Paolo <oo...@us...> - 2004-06-20 13:48:07

On Sun, Jun 20, 2004 at 09:02:52AM -0400, Bill Yerazunis wrote:
...
> > (I cloned the BlameTamar OSB code... maybe the unigram is the problem?

Hmm... 'BlameTamar'? When was it announced? On crm114-discuss? Reading the
answers in this thread, it seems that I didn't get a number of postings, or
that somehow I slipped off a list (perhaps crm114-discuss? - OK, if I get
this back I'm still on, otherwise I've been pushed out, or SF's mailman is
clogged once again).

-- paolo
GPG/PGP id:0x21426690 kfp:EDFB 0103 A8D8 4180 8AB5 D59E 9771 0F28 2142 6690

From: Piotr K. <pe...@lo...> - 2004-06-10 13:06:47

Hello,

I have successfully compiled and configured crm114 on my Sun Classic (32 MB
RAM) with OpenBSD 3.4 and qmail, following the instructions of Dave Voorhis
at http://www.armchair.mb.ca/~dave/crmqmail/. I already had qmail running
with the clamav antivirus and the perl qmail-scanner, and I configured
crm114 in addition. This worked well, but it seems to be too much for such
a small server: a "denial attack" was stopping my server. So I started
watching memory resources and limited the mailfilter to 1 MB, by writing in
the .qmail file:

  | /usr/bin/crm ./mailfilter.crm -w 1000000 > /dev/null; echo "Filtered by mailfilter"

With small files all was well and I kept CPU and swap usage at a "secure"
level (up to 80%). But with a 2.4 MB file (text plus an attached MS-Word
doc file) crm hangs during processing and I have to kill the process
manually. Can anybody tell me what to do? The server is really very
"small", but spam filtering is very desirable :)

Piotr
--
Piotr Kasztelowicz Pio...@lo...
http://www.am.torun.pl/~pekasz

From: <ben...@id...> - 2004-05-25 08:46:28

Dear Open Source developer,

I am doing a research project on "Fun and Software Development" in which I
kindly invite you to participate. You will find the online survey under
http://fasd.ethz.ch/qsf/. The questionnaire consists of 53 questions and
you will need about 15 minutes to complete it.

With the FASD project (Fun and Software Development) we want to define the
motivational significance of fun when software developers decide to engage
in Open Source projects. What is special about our research project is that
a similar survey is planned with software developers in commercial firms.
This procedure allows the immediate comparison between the involved
individuals and the conditions of production of these two development
models. Thus we hope to obtain substantial new insights into the phenomenon
of Open Source Development.

With many thanks for your participation,
Benno Luthiger

PS: The results of the survey will be published under
http://www.isu.unizh.ch/fuehrung/blprojects/FASD/. We have set up the
mailing list fa...@we... for this study. Please see
http://fasd.ethz.ch/qsf/mailinglist_en.html for registration to this
mailing list.

_______________________________________________________________________
Benno Luthiger
Swiss Federal Institute of Technology Zurich
8092 Zurich
Mail: benno.luthiger(at)id.ethz.ch
_______________________________________________________________________

From: Christian S. <si...@mi...> - 2004-05-12 17:16:24

Tim,

Tim Romero wrote:
>> <http://www.johnjohnston.org/crm114.html> would be appropriate. So far I
>> haven't found time to add this functionality, but I plan to do it sooner
>> or later. If you're interested in this it would probably go faster if
>> you're ready to help ;-)
>
> I'll be more than happy to help out.
...
>> The one thing that's currently possible is to do some evaluation runs on
>> test corpora such as the one from SpamAssassin
>> <http://spamassassin.org/publiccorpus/> which we used for our
>> experiments. If you want to do this, I will tell you what to do...
>
> That would be a great start. If you can give me enough information to,
> say, run it and replicate your results, I can probably take it from
> there.

I've now written some documentation on how to install and use the system.
It's terse but it should be enough to reproduce the SpamAssassin
experiments and also to do some others. If you run into problems or
something is missing, tell me.

I have some ideas for extensions in two directions:

1. Modifications to the classification model that would hopefully make it
   even better.
2. Some wrapper code so the system could be used as a real mail filter,
   not just for experiments.

I won't find the time to realize all this, so if you're ready to help in
one of these areas and know Java (or are willing to learn ;-) I'll be very
happy. And, of course, feel free to try things out; probably you'll even
find some bugs ;-)

Bye
Christian

--
------------ Christian Siefkes -----------------------------------------
| Email: chr...@si... | Web: http://www.siefkes.net/
| Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
-------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
How does one hate a country, or love one? ... I lack the trick of it. I
know people, I know towns, farms, hills and rivers and rocks, I know how
the sun at sunset in autumn falls on the side of a certain plowland in the
hills; but what is the sense of giving a boundary to all that, of giving
it a name and ceasing to love where the name ceases to apply? What is love
of one's country; is it hate of one's uncountry? Then it's not a good
thing.
		-- Ursula K. Le Guin, The Left Hand of Darkness

From: Tim R. <t3...@vg...> - 2004-05-10 01:27:03

Christian,

> Depends on what you want to do. As I said, it's currently not possible
> to actually use the system as a mail filter. That's nontrivial to add,
> because

I just want to use it to better understand the mathematics behind the
process and to test some pet theories of my own. I don't want to turn it
into an application.

> <http://www.johnjohnston.org/crm114.html> would be appropriate. So far I
> haven't found time to add this functionality, but I plan to do it sooner
> or later. If you're interested in this it would probably go faster if
> you're ready to help ;-)

I'll be more than happy to help out.

> The one thing that's currently possible is to do some evaluation runs on
> test corpora such as the one from SpamAssassin
> <http://spamassassin.org/publiccorpus/> which we used for our
> experiments. If you want to do this, I will tell you what to do...

That would be a great start. If you can give me enough information to,
say, run it and replicate your results, I can probably take it from there.

Tim

From: Christian S. <si...@mi...> - 2004-05-06 09:24:17

Tim,

> Thanks. Yours is a fascinating approach. I've downloaded your software
> and while I understand the core algorithm you are using I am a bit
> confused about quite a few areas. Can you give me a simple use example
> or point me to some documentation about the command line args and system
> properties settings you use?

Depends on what you want to do. As I said, it's currently not possible to
actually use the system as a mail filter. That's nontrivial to add,
because my system doesn't use memory-mapped files as CRM114 does, but
keeps the whole prediction model in main memory, so it's not feasible to
invoke it for each classification. Instead a client/daemon setup is
required to reduce start time, similar to spamd+spamc of SpamAssassin.
Also training is different, because my system does not only learn on
error, but also in some other cases. So a folder-based training similar to
<http://www.avtechpulse.com/opensource/crm114.html> resp.
<http://www.johnjohnston.org/crm114.html> would be appropriate. So far I
haven't found time to add this functionality, but I plan to do it sooner
or later. If you're interested in this it would probably go faster if
you're ready to help ;-)

The one thing that's currently possible is to do some evaluation runs on
test corpora such as the one from SpamAssassin
<http://spamassassin.org/publiccorpus/> which we used for our experiments.
If you want to do this, I will tell you what to do...

Bye
Christian
--
------------ Christian Siefkes -----------------------------------------
| Email: chr...@si... | Web: http://www.siefkes.net/
| Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
-------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
More than any time in history, mankind now faces a crossroads. One path
leads to despair and utter hopelessness, the other to total extinction.
Let us pray that we have the wisdom to choose correctly.
		-- Woody Allen

From: Tim R. <t3...@vg...> - 2004-05-06 01:39:28

Christian,

Thanks. Yours is a fascinating approach. I've downloaded your software and
while I understand the core algorithm you are using I am a bit confused
about quite a few areas. Can you give me a simple use example or point me
to some documentation about the command line args and system properties
settings you use?

Tim

> Depends. I used this combination schema (OSB) together with the Winnow
> classifier (see e.g. http://l2r.cs.uiuc.edu/~danr/Papers/spellJ.ps )
> which does not use predefined weights but instead determines the optimal
> weight of each feature individually. This classifier reaches a much
> higher accuracy than the current classification model of CRM114 --
> Fidelis, Shalen and I have written a paper on the results (but we have
> to wait whether it's accepted until we can make it public).
>
> Bill now plans to integrate something similar (but more complex and thus
> maybe even better) by adding a neural net to CRM114. My classifier is
> available at <http://www.inf.fu-berlin.de/inst/ag-db/software/ties/>,
> but currently it's only a testbed -- there is no code for actually using
> it in a live system.
>
> Bye
> Chris
>
> ------------ Christian Siefkes -----------------------------------------
> | Email: chr...@si... | Web: http://www.siefkes.net/
> | Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
> -------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
> A verbal contract isn't worth the paper it's written on.
> 		-- Samuel Goldwyn

From: Christian S. <si...@mi...> - 2004-05-03 15:45:37

Hi Tim,

On Mon, 3 May 2004 t3...@t3... wrote:
> >> 0 0 0 0
> >> 1 1 0 0
> >> 1 0 1 0
> >> 1 0 0 1
> >>
> >> I also tried with single tokens added, but that didn't improve
> >> accuracy.
> >
> > I also tried with single tokens but soon discarded because I didn't
> > get any improvement. Also, and this is only a guess, I think that
> > isolated
>
> I am trying to understand this, and it is a bit overwhelming.
> What weights are you using in the above system?

Depends. I used this combination schema (OSB) together with the Winnow
classifier (see e.g. http://l2r.cs.uiuc.edu/~danr/Papers/spellJ.ps ) which
does not use predefined weights but instead determines the optimal weight
of each feature individually. This classifier reaches a much higher
accuracy than the current classification model of CRM114 -- Fidelis,
Shalen and I have written a paper on the results (but we have to wait
whether it's accepted until we can make it public).

Bill now plans to integrate something similar (but more complex and thus
maybe even better) by adding a neural net to CRM114. My classifier is
available at <http://www.inf.fu-berlin.de/inst/ag-db/software/ties/>, but
currently it's only a testbed -- there is no code for actually using it in
a live system.

Bye
Chris

------------ Christian Siefkes -----------------------------------------
| Email: chr...@si... | Web: http://www.siefkes.net/
| Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
-------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
A verbal contract isn't worth the paper it's written on.
		-- Samuel Goldwyn

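
[The Winnow scheme discussed above can be sketched compactly. Below is a
minimal Python illustration of a Winnow-style learner with multiplicative
promotion/demotion and a "thick threshold", using the parameter values
quoted elsewhere in this thread (promotion 1.23, demotion 0.83, 5%
threshold thickness). All names here are hypothetical; this is a sketch of
the technique, not the actual TIES or CRM114 implementation.]

    # Minimal sketch of a Winnow-style classifier with a "thick threshold".
    # Parameters follow the values quoted in this thread (promotion 1.23,
    # demotion 0.83, 5% threshold thickness). Hypothetical illustration
    # only -- not the actual TIES or CRM114 code.

    class ThickThresholdWinnow:
        def __init__(self, promotion=1.23, demotion=0.83, thickness=0.05):
            self.promotion = promotion
            self.demotion = demotion
            self.thickness = thickness
            self.weights = {}  # feature -> weight, lazily initialized to 1.0

        def score(self, features):
            # Average weight of the active features, compared against a
            # fixed threshold of 1.0.
            total = sum(self.weights.get(f, 1.0) for f in features)
            return total / max(len(features), 1)

        def train(self, features, is_spam):
            s = self.score(features)
            # Update not only on outright errors, but whenever the score
            # falls inside the "thick" band around the threshold.
            if is_spam and s < 1.0 + self.thickness:
                factor = self.promotion
            elif not is_spam and s > 1.0 - self.thickness:
                factor = self.demotion
            else:
                return
            for f in features:
                self.weights[f] = self.weights.get(f, 1.0) * factor

[Updating inside the thick band, not just on mistakes, is what is meant
above by the system "not only learning on error".]
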
From: Fidelis A. <fi...@po...> - 2004-05-03 11:59:06

t3...@t3... wrote:
> Bill, Christian, Fidelis, et al.,
>
> >>> 0 0 0 0
> >>> 1 1 0 0
> >>> 1 0 1 0
> >>> 1 0 0 1
> >>>
> >>> I also tried with single tokens added, but that didn't improve
> >>> accuracy.
> >>
> >> I also tried with single tokens but soon discarded because I didn't
> >> get any improvement. Also, and this is only a guess, I think that
> >> isolated
>
> I am trying to understand this, and it is a bit overwhelming.
> What weights are you using in the above system?
>
> Tim

Hi Tim,

I've gotten the best accuracy, equivalent to default crm, with 24, 14, 7
and 4 for distances 0, 1, 2 and 3 respectively.

-- Fidelis Assis

From: <t3...@t3...> - 2004-05-03 07:13:07

Bill, Christian, Fidelis, et al.,

>> 0 0 0 0
>> 1 1 0 0
>> 1 0 1 0
>> 1 0 0 1
>>
>> I also tried with single tokens added, but that didn't improve accuracy.
>
> I also tried with single tokens but soon discarded because I didn't get
> any improvement. Also, and this is only a guess, I think that isolated

I am trying to understand this, and it is a bit overwhelming.
What weights are you using in the above system?

Tim

From: Fidelis A. <fi...@po...> - 2004-04-24 21:05:30

Shalen wrote:
>
> Since we are just considering clubbing of word positions (word pairs)
> any thoughts on reversing the weights orders?

Hi Shalen,

I tested with the weights in reverse order and the result was a surprise
to me: only less than 3% worse, on average! You have good intuition, man!

Errors/5000 msgs - only errors in last 500 messages:
---------------------------------------------------------------
Weights order | run 1 | run 2 | run 3 | run 4 | run 5 | Average
---------------------------------------------------------------
Reverse       |  73   |  68   |  70   |  69   |  81   |  72.2
---------------------------------------------------------------
Direct        |  62   |  72   |  65   |  80   |  73   |  70.4
---------------------------------------------------------------

The total errors were also very close:

---------------------------------------------------------------
Weights order | run 1 | run 2 | run 3 | run 4 | run 5 | Average
---------------------------------------------------------------
Reverse       | 1248  | 1257  | 1226  | 1215  | 1242  | 1237.6
---------------------------------------------------------------
Direct        | 1216  | 1253  | 1211  | 1212  | 1245  | 1227.4
---------------------------------------------------------------

These results are really counter-intuitive to me! Bill, any clue about
what happened here?

=======

I noticed that I made a mistake when counting the total errors, and the
table I sent in the first message where I mentioned WPCW is wrong - I
didn't add the errors in the last shuffle (around 130 errors) to both SBPH
and WPCW, in all runs. The right table is the following:

Errors/41470 msgs - all errors in a 10 shuffle set:
------------------------------------------------------------
Algorithm | run 1 | run 2 | run 3 | run 4 | run 5 | Average
------------------------------------------------------------
SBPH      | 1208  | 1220  | 1221  | 1201  | 1224  | 1214.8
------------------------------------------------------------
WPCW      | 1216  | 1253  | 1211  | 1212  | 1245  | 1227.4
------------------------------------------------------------

Sorry for the mistake. Anyway, it was luck that I made the same mistake
for both SBPH and WPCW, because so the comparison was not affected. :)

-- Fidelis Assis

From: Fidelis A. <fi...@po...> - 2004-04-22 14:50:36

Christian Siefkes wrote:
[...]
> Actually the first line is different, since unigrams (single tokens) are
> not used in this model:
>
> >> 0 0 0 0
> >> 1 1 0 0
> >> 1 0 1 0
> >> 1 0 0 1
>
> I also tried with single tokens added, but that didn't improve accuracy.

I also tried with single tokens but soon discarded them because I didn't
get any improvement. Also, and this is only a guess, I think that isolated
words and ordered bigrams are different in nature, belonging to different
layers, lexical versus somewhat more semantic, because of the extra
information - order and position - that the ordered bigrams have.

I even tried to think of isolated words as representing all the possible
bigrams besides the 4 inside the window, as if the window started at the
beginning of the text and ended at the present position, and so should
receive a weight equal to the sum of those extra bigrams, considering the
weights decaying exponentially, but that was a really wild guess...

-- Fidelis Assis

From: Fidelis A. <fi...@po...> - 2004-04-22 13:33:02

Bill Yerazunis wrote:
>    From: Fidelis Assis <fi...@po...>
>
>    combining them you can reconstruct the original signal. Is that OK?
>
>    In that sense, I think they are also similar to prime numbers,
>    because you can "combine" them to get all the others, or to any other
>    set of orthogonal functions, like in Fourier transform for instance.
>
>    By the way, that was exactly what I was looking for, "basic",
>    "prime", or "orthogonal", features. I leave it for you,
>    mathematicians, to give a proper name. I'm not a mathematician,
>    though I wanted to be one in my youth.
>
> "Orthogonal" !
>
> That's it! That's what's different!
>
> SBPH, Markovian, etc. use overlapping polynomials (those polynomials
> that you can add two together and get something like a third
> polynomial).
>
> Orthogonal (or orthonormal) basis functions do NOT have that property -
> from the orthogonal basis set of functions you _cannot_ add or subtract
> pairs to get any other member of the basis set - but if you already
> +have+ the basis set, you can add and subtract them to get all other
> possible functions.

Yes, I think you put it clearly!

> So, "Orthogonal" and "Feature" are both excellent words here. Next?

What about bigram, order and reference? The fifth word, w5, besides being
a word, of course, plays another important role: it is a reference within
the window that allows us to establish order and position. I guess that
considering order and position moves us up from just lexical analysis, as
in first order bayesian, towards semantic analysis.

Something like orthogonal bigrams with a reference word, orthogonal
ordered bigrams...

-- Fidelis Assis

From: Christian S. <si...@mi...> - 2004-04-22 11:28:23

On Thu, 22 Apr 2004, Bill Yerazunis wrote:
>    Actually the first line is different, since unigrams (single tokens)
>    are not used in this model:
>
>    > 0 0 0 0
>    > 1 1 0 0
>    > 1 0 1 0
>    > 1 0 0 1
>
>    I also tried with single tokens added, but that didn't improve
>    accuracy.
>
> Really !?!?!?  Now, *that* is so totally beyond my expectation it
> wouldn't have occurred to me to test that case!

Yes, I was surprised as well, but it's just what Fidelis had written: just
the sparse bigrams, no unigrams/tokens.

> In that case, may I suggest differential testing?  Pick one or two
> window lengths (or more), and mask in or out the various features
> to see how much each alone contributes, and how much each together?
...
> 1 0 0 0
> 1 0 1 0
> then
> 1 0 0 0
> 1 0 0 1

Testing such combinations is not possible right now because in my current
sparse bigram implementation (
http://www.inf.fu-berlin.de/inst/ag-db/software/ties/xref/de/fu_berlin/ties/classify/feature/WPCWTransformer.html
) you can't get "1 0 1" without also getting "1 1" etc. Shouldn't be very
hard to add, but not right now. We can do this for a follow-up paper; I
think we already have enough material for this one.

Bye
Chris

------------ Christian Siefkes -----------------------------------------
| Email: chr...@si... | Web: http://www.siefkes.net/
| Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
-------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
-- "If you kiss me, I'll turn into a beautiful princess."
-- "Look, I'm a computer programmer. I don't have time for girls, but a
   talking frog is very cool."

From: Bill Y. <ws...@me...> - 2004-04-22 11:13:11

   From: Christian Siefkes <si...@mi...>

   Yes, the features generated from "Model" are: "Field Model", "Random
   <skip> Model", "Markov <skip> <skip> Model" and one more with 3 skips.

   > Diagonalized Walsh-Hadamard is just a reshuffle of the rows, then
   > subtraction to renormalize, and yields something like:
   >
   > -3  0  0  0
   >  1  1  0  0
   >  1  0  1  0
   >  1  0  0  1
   >
   > which is pretty darn close to the WPCW matrix:
   >
   >  1  0  0  0
   >  1  1  0  0
   >  1  0  1  0
   >  1  0  0  1

   Actually the first line is different, since unigrams (single tokens)
   are not used in this model:

   > 0 0 0 0
   > 1 1 0 0
   > 1 0 1 0
   > 1 0 0 1

   I also tried with single tokens added, but that didn't improve
   accuracy.

Really !?!?!?  Now, *that* is so totally beyond my expectation it
wouldn't have occurred to me to test that case!

In that case, may I suggest differential testing? Pick one or two window
lengths (or more), and mask in or out the various features to see how much
each alone contributes, and how much each together?

Because there's a strong possibility of nonlinear interactions, I would
recommend that we don't do sparse testing - we should (if time and CPU are
acceptable) fully test each combination.

The basic question I'm looking at now is "what's the shape of the
combination hyperspace of all of the orthogonal separated bigram
combinations?"

Example: for window = 4, the test set would look like this (using the
above formatting):

1 0 0 0    # individual features alone.
then
1 1 0 0    # single bigrams
then
1 0 1 0
then
1 0 0 1
then
1 0 0 0    # features taken two at a time.
1 1 0 0
then
1 0 0 0
1 0 1 0
then
1 0 0 0
1 0 0 1
then
1 1 0 0
1 0 1 0
then
1 1 0 0
1 0 0 1
then
1 0 1 0
1 0 0 1
then
1 0 0 0    # features taken three at a time
1 1 0 0
1 0 1 0
then
1 0 0 0
1 1 0 0
1 0 0 1
then
1 0 0 0
1 0 1 0
1 0 0 1
then
1 1 0 0
1 0 1 0
1 0 0 1
then
1 0 0 0    # all four features
1 1 0 0
1 0 1 0
1 0 0 1

       -Bill Yerazunis

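
[The full test matrix Bill lists above is just every non-empty subset of
the four row masks. A small sketch of a harness helper that enumerates
them, assuming Python; the names are hypothetical, not CRM114 code:]

    # Sketch: enumerate every non-empty subset of the four feature
    # row-masks for a window of 4, as proposed for the differential tests
    # above. Hypothetical test-harness helper, not actual CRM114 code.
    from itertools import combinations

    ROW_MASKS = [
        (1, 0, 0, 0),  # unigram (single token)
        (1, 1, 0, 0),  # adjacent bigram
        (1, 0, 1, 0),  # bigram with one skip
        (1, 0, 0, 1),  # bigram with two skips
    ]

    def all_mask_combinations(masks=ROW_MASKS):
        """Yield each non-empty subset of row masks (15 for 4 rows)."""
        for r in range(1, len(masks) + 1):
            for subset in combinations(masks, r):
                yield subset

    for combo in all_mask_combinations():
        print(combo)  # one test configuration per line
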
From: Christian S. <si...@mi...> - 2004-04-22 09:11:02

On Wed, 21 Apr 2004, Bill Yerazunis wrote:
>    I suppose "Word Pairs Context Window" is clearer than "Word Pairs
>    with a Common Word", but both sound somewhat strange. I thought about
>    "sparse bigrams", since bigram is the scientific term for word pair
>    (but then we still need a handy acronym). What do you think, Fidelis?
>
> It's not just sparse bigrams; it's sparse separated bigrams.

Hm, then what do you think would be "sparse bigrams"? I meant sparse in
the sense "settled at widely spaced intervals; thinly scattered; not being
dense or close together" (from
http://dictionary.reference.com/search?q=sparse ), so "sparse separated"
would be a duplication in my opinion. They also have "A sparse matrix (or
vector, or array) is one in which most of the elements are zero", but that
usage wouldn't be appropriate here (you could use sparse in this sense for
a context window where most of the words are null/skipped, but not for
bigrams, because the two words of the bigram are always present). Well, of
course I'm not a native speaker either, so maybe I got this wrong...

> Question: in the WPCW implementation, do you care what the separation
> distance is? (i.e. does the feature "Random Model" appear in "Markov
> Random Field Model with Variable Weighting Schemas"? Or does the
> feature "Random <skip> Model" appear instead?)

Yes, the features generated from "Model" are: "Field Model", "Random
<skip> Model", "Markov <skip> <skip> Model" and one more with 3 skips.

> Diagonalized Walsh-Hadamard is just a reshuffle of the rows, then
> subtraction to renormalize, and yields something like:
>
> -3  0  0  0
>  1  1  0  0
>  1  0  1  0
>  1  0  0  1
>
> which is pretty darn close to the WPCW matrix:
>
>  1  0  0  0
>  1  1  0  0
>  1  0  1  0
>  1  0  0  1

Actually the first line is different, since unigrams (single tokens) are
not used in this model:

> 0 0 0 0
> 1 1 0 0
> 1 0 1 0
> 1 0 0 1

I also tried with single tokens added, but that didn't improve accuracy.

Bye
Christian

------------ Christian Siefkes -----------------------------------------
| Email: chr...@si... | Web: http://www.siefkes.net/
| Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
-------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
Wars don't determine who's right, only who's left.

From: Shalen <sch...@cs...> - 2004-04-22 01:35:47

Fidelis: Here's some work for you :-)

1. Can you write up this approach you tried, WPCW (Word Pair with Common
Word), in an elegant style, as if you were writing for a newbie? Don't
worry about theoretical justifications or citations (although write them
wherever you know them -- preferred, and I am tracking this thread); just
write a free flow of thoughts in simple language. This will form the
Testing or Approaches Tried section of the paper. If the results are good,
we then have to correlate and find a resemblance with a theoretical model,
and also reason for valid justification.

2. Can you try these weights and let us know all the results like in this
email:

Weights for Window 5: 24, 14, 7, 4

As you say:
> Closer pairs are given more credence, so the weights are based on the
> distance, instead of on the length like in SBPH/SM.

With this WPCW I am curious to know your results with:

308, 277, 272, 271 (according to our MWS)
and
1372, 1225, 1204, 1201 (according to our ES)

The effect of the word position w5 is very dominant here, but let us see
the results and we can think of diluting this effect (if required). Since
we are just considering clubbing of word positions (word pairs), any
thoughts on reversing the weights orders? These are just wild thoughts,
but I see the average time is not too much, so let us know the results.

Yes! You are right, we have not yet explored large window sizes. In our
previous paper, because of time and resource constraints, we could not go
beyond window 6. Maybe with this approach we can try windows of greater
size, but looking for an optimum window size is a direction I am looking
into / we should look into.

That's all for now for this post.

Thanks,
Sincerely,
Shalen

> > I've got a new result to add, and I'm quite excited about it:
> >
> > SBPH+Ultraconservative Winnow, 10% threshold, 1.6M features, LRU
> > pruning, single pass:
> >   719 (number of errors on all 10x4147 mails)
> >    38 (number of errors on the last 10x500 mails)
>
> Very interesting. That number, 38 errors, is really impressive! I have
> some results I think are interesting too, but with respect to speed. I
> tried to speed up classification by removing redundant features in SBPH
> and got a classification speedup of more than 2 times, with 1/4 of disk
> space, while keeping the same accuracy. Maybe there's a possibility to
> merge the results and have better accuracy with faster classification
> speed.
>
> SBPH generates 16 features for each word in the text, combining it with
> the 4 previous ones (5 words window):
>
> w1 w2 w3 w4 w5 =>
>
>  -  -  -  - w5
>  -  -  - w4 w5
>  -  - w3  - w5
>  -  - w3 w4 w5
>  - w2  -  - w5
>  - w2  - w4 w5
>  - w2 w3  - w5
>  - w2 w3 w4 w5
> w1  -  -  - w5
> w1  -  - w4 w5
> w1  - w3  - w5
> w1  - w3 w4 w5
> w1 w2  -  - w5
> w1 w2  - w4 w5
> w1 w2 w3  - w5
> w1 w2 w3 w4 w5
>
> Clearly, there are many features that are contained in others. Bill's
> intention, as far as I understood, was exactly that: to get some kind of
> non-linearity and obtain greater discriminating power.
>
> I tried a different approach, since accuracy is already good enough for
> practical purposes, and considered only word pairs with a common word
> inside the window, instead of all combinations of the first four, plus
> the last one. The idea behind this approach was to gain speed working
> only with kind of "prime" features inside the window (more independent
> features), hoping that the chain rule would, automatically, consider all
> combinations and produce the same final effect.
>
> So, I used only Word Pairs with a Common Word (WPCW) as below:
>
>  -  -  - w4 w5  - distance: 0
>  -  - w3  - w5  - distance: 1
>  - w2  -  - w5  - distance: 2
> w1  -  -  - w5  - distance: 3
>
> You can "or" them to get any of the 16 combinations in SBPH, except for
> the single word combination, which I chose not to use. Closer pairs are
> given more credence, so the weights are based on the distance, instead
> of on the length like in SBPH/SM.
>
> Test results:
>
> I used the same test methodology as described in Bill's Plateau papers:
>
> - 10 shuffles of 4147 messages from the SpamAssassin corpora, classified
>   as:
>   - 1397 spam (20030228_spam_2.tar.bz2)
>   - 2500 easy ham (20030228_easy_ham.tar.bz2)
>   - 250 hard ham (20030228_hard_ham.tar.bz2)
>
> Other information for the test:
>
> - Weights for WPCW: 24, 14, 7, 4
> - CSS file size for WPCW: 282.001 buckets ( 3.384.012 bytes)
> - CSS file size for SBPH: 1.048.577 buckets (12.582.924 bytes)
> - OS: Linux 2.4.20-30.9 - RedHat 9.0
> - CPU: AMD Athlon(TM) XP 2400
> - CRM version: 20040409-BlameMarys
>
> Also, there's room for exploring wider sliding windows, for better
> accuracy, since it's faster.
>
> Note for the more theoretically driven guys on the list: I found those
> weights empirically and it's really a pain to look for the best ones
> when you don't have a model. I might have just found a local minimum and
> would like to have a way to do a smarter search or, even better, to have

-----------------------------------------------------------------
"Future Belongs To Those Who Realize The Beauty Of Their Dreams"!
Home: http://www.cs.ucr.edu/~schhabra/
Shalendra Chhabra, Graduate Student in Computer Science,
University of California, Riverside
Riverside, CA, USA

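
[To make the WPCW/OSB construction quoted above concrete, here is a
minimal Python sketch of the pair-generation step, using the distance
weights (24, 14, 7, 4) reported by Fidelis. The function name and token
handling are hypothetical illustrations, not the actual CRM114 or TIES
code:]

    # Minimal sketch of WPCW/OSB feature generation for a 5-word window,
    # following the table above: each new word w5 is paired with each of
    # the up-to-4 preceding words, with "<skip>" marking skipped
    # positions. Hypothetical helper, not the real implementation.

    WEIGHTS_BY_DISTANCE = [24, 14, 7, 4]  # distances 0, 1, 2, 3

    def wpcw_features(tokens, window=5):
        """Yield (feature, weight) pairs for every position in the text."""
        for i in range(1, len(tokens)):
            w5 = tokens[i]
            # pair w5 with each of the preceding words inside the window
            for distance in range(min(i, window - 1)):
                left = tokens[i - 1 - distance]
                feature = " ".join([left] + ["<skip>"] * distance + [w5])
                yield feature, WEIGHTS_BY_DISTANCE[distance]

    text = "Markov Random Field Model with Variable Weighting Schemas"
    for feat, w in wpcw_features(text.split()):
        print(f"{w:2d}  {feat}")
    # For "Model" this prints: "Field Model", "Random <skip> Model",
    # "Markov <skip> <skip> Model" -- matching the examples in this thread.
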
From: Bill Y. <ws...@me...> - 2004-04-21 23:11:51

   From: Fidelis Assis <fi...@po...>

   combining them you can reconstruct the original signal. Is that OK?

   In that sense, I think they are also similar to prime numbers, because
   you can "combine" them to get all the others, or to any other set of
   orthogonal functions, like in Fourier transform for instance.

   By the way, that was exactly what I was looking for, "basic", "prime",
   or "orthogonal", features. I leave it for you, mathematicians, to give
   a proper name. I'm not a mathematician, though I wanted to be one in
   my youth.

"Orthogonal" !

That's it! That's what's different!

SBPH, Markovian, etc. use overlapping polynomials (those polynomials that
you can add two together and get something like a third polynomial).

Orthogonal (or orthonormal) basis functions do NOT have that property -
from the orthogonal basis set of functions you _cannot_ add or subtract
pairs to get any other member of the basis set - but if you already +have+
the basis set, you can add and subtract them to get all other possible
functions.

So, "Orthogonal" and "Feature" are both excellent words here. Next?

(darn, hit the wrong button - this is not Fidelis)

       -Bill Yerazunis

From: Fidelis A. <fi...@po...> - 2004-04-21 21:36:38

On Wed, 21 Apr 2004 11:42:13 -0400, Bill Yerazunis wrote:
>    From: Christian Siefkes <si...@mi...>
>
>    On Wed, 21 Apr 2004, Bill Yerazunis wrote:
>    > Last things first: Is the WPCW speedup due to fewer features
>    > generated, or a smaller .css file fitting into cache?
>
>    I'm quite sure it's mainly due to the number of generated features.
>    WPCW generates only 1/4 of the features of SBPH (for a window size of
>    5), and run time should be roughly linear to the number of features
>    because one feature is processed after the other. I don't think size
>    of the feature cache matters that much because the algorithm still
>    has to look up each feature (and if it finds it isn't there it has to
>    add it and discard another). I tried WPCW with 400 to 700K features,
>    and it took 26min for 400, 500, and 700K (strangely, for 600K it was
>    32min, but that was probably on another computer -- I'm using two for
>    my test runs).
>
> OK, so we can make it even faster by pushing the window size down.
> That would be very useful (guess why... :))

Hmmm, you mean small gadgets (PDAs, cell phones, etc.)?

>    > I'm going ahead and implementing the word-pairs features. However,
>    > I'm a bit troubled by the name <wpcw>
>    >
>    > I seem to recall that Walsh-Hadamard transforms looked sorta like
>    > those features (specifically, diagonalized Walsh-Hadamard).
>    >
>    > Does that make any sense? Currently I'm using the classify/learn
>    > keyword <walsh> to indicate "not <markov>" but I'm not sure if
>    > that's accurate enough or not... any preferable suggestions?
>    > Should I stay with <walsh> or use <wpcw> (Word Pairs Context
>    > Window)? Easy to change now... hard to change later.

Yes, much easier to change now :)

>    I suppose "Word Pairs Context Window" is clearer than "Word Pairs
>    with a Common Word", but both sound somewhat strange. I thought about
>    "sparse bigrams", since bigram is the scientific term for word pair
>    (but then we still need a handy acronym). What do you think, Fidelis?
>
> It's not just sparse bigrams; it's sparse separated bigrams.

Yes, we're getting closer :)

> Question: in the WPCW implementation, do you care what the separation
> distance is?

Yes, the polynomial coefficients are exactly those you defined, specific
for each position in the window, such that the sequence "w w" is different
from "w _ w". Also, they have different weights, based on the distance.

> (i.e. does the feature "Random Model" appear in "Markov Random Field
> Model with Variable Weighting Schemas"? Or does the feature "Random
> <skip> Model" appear instead?)
>
>    > But what further can be done? Well, clearly you can add layers to
>    > the perceptron network; a particularly nice implementation is the
>    > Hopcroft network which is a rectangular system that allows feedback
>    > from arbitrary summations back into the network.
>
>    Can you provide pointers to literature about Hopcroft networks and
>    diagonalized Walsh-Hadamard? Need to catch up before I can discuss
>    this...

Me too :)

> Well, if I recall correctly (no warranty!) the Hadamard matrix is a
[...]
> It's not exactly diagonalized Walsh-Hadamard, but it seems close enough
> that we should pay attention.

I googled for Walsh and Hadamard and found the address
http://www.ciphersbyritter.com/RES/WALHAD.HTM#Beer81. After browsing a
bit, I read one paragraph that probably explains the similarities you've
seen:

"Walsh functions are an orthogonal set of square-wave functions that arise
when dealing with digital data. The Walsh transform and inverse Walsh
transform are easy to calculate by hand, and can be very quickly done on
digital computers. Examples of the uses of Walsh transform include ... the
rapid solution of nonlinear differential equations."

If I understood well what you mean, the half "disjoint" word pairs in WPCW
would be similar to those orthogonal functions. By properly combining them
you can reconstruct the original signal. Is that OK?

In that sense, I think they are also similar to prime numbers, because you
can "combine" them to get all the others, or to any other set of
orthogonal functions, like in Fourier transform for instance.

By the way, that was exactly what I was looking for, "basic", "prime", or
"orthogonal", features. I leave it for you, mathematicians, to give a
proper name. I'm not a mathematician, though I wanted to be one in my
youth.

-- Fidelis Assis

From: Fidelis A. <fi...@po...> - 2004-04-21 20:31:10

On Wed, 21 Apr 2004 16:59:48 +0200 (MEST), Christian Siefkes wrote:
> On Wed, 21 Apr 2004, Bill Yerazunis wrote:
> > Last things first: Is the WPCW speedup due to fewer features
> > generated, or a smaller .css file fitting into cache?
>
> I'm quite sure it's mainly due to the number of generated features. WPCW
> generates only 1/4 of the features of SBPH (for a window size of 5), and
> run time should be roughly linear to the number of features because one
> feature is processed after the other. I don't think size of the feature
> cache matters that much because the algorithm still has to look up each
> feature (and if it finds it isn't there it has to add it and discard
> another). I tried WPCW with 400 to 700K features, and it took 26min for
> 400, 500, and 700K (strangely, for 600K it was 32min, but that was
> probably on another computer -- I'm using two for my test runs).

Agree, completely.

> > I'm going ahead and implementing the word-pairs features. However,
> > I'm a bit troubled by the name <wpcw>
> >
> > I seem to recall that Walsh-Hadamard transforms looked sorta like
> > those features (specifically, diagonalized Walsh-Hadamard).
> >
> > Does that make any sense? Currently I'm using the classify/learn
> > keyword <walsh> to indicate "not <markov>" but I'm not sure if that's
> > accurate enough or not... any preferable suggestions? Should I stay
> > with <walsh> or use <wpcw> (Word Pairs Context Window)? Easy to
> > change now... hard to change later.
>
> I suppose "Word Pairs Context Window" is clearer than "Word Pairs with a
> Common Word", but both sound somewhat strange.

I agree, they sound strange, at least until we get used to them ;)

> I thought about "sparse bigrams", since bigram is the scientific term
> for word pair (but then we still need a handy acronym). What do you
> think, Fidelis?

I like that, "sparse bigrams". Perhaps by adding something more to better
describe the idea we'll have a good name.

I started looking for "disjoint" word sequences inside the window that
could be combined to represent all possible features, that is, kind of
"prime" features - like "red", "hard", "round", etc. and not "red and
hard", "hard and round" to describe an apple - and ended up with something
like half disjoint word pairs, with varying distance.

Back to the name problem, I must admit that it is hard to think of a good
name, even in Portuguese! (I'm Brazilian.)

-- Fidelis Assis

From: Fidelis A. <fi...@po...> - 2004-04-21 19:51:22

On Wed, 21 Apr 2004 10:01:26 -0400, Bill Yerazunis wrote:
> Fidelis, Christian, Shalen:
>
> [excuse the rambling - I'm doing a brain dump so that I can clarify
> things to both myself and others]
>
> Last things first: Is the WPCW speedup due to fewer features
> generated, or a smaller .css file fitting into cache?

Classification time, as implemented in your code, is a direct function of
the number of features. So, the fewer features you have, the faster
classification you get.

> It does matter (in an engineering sense), so is there any way to
> determine this? If you ran with a 1-megaslot (12-megabyte) .css
> file, do you still have the same speedup in WPCW?

CSS file size also plays an important role, as I observed that WPCW was
faster with 282001 buckets than with 1048577, being faster than SBPH in
both cases. SBPH is also faster with smaller CSS files. There's a balance,
of course, because if microgroom starts to be invoked more often, you'll
get a slowdown.

> -----
>
> I'm going ahead and implementing the word-pairs features. However,
> I'm a bit troubled by the name <wpcw>

I'm also troubled by that name, but for other reasons: as a non-native
English speaker, it's hard for me to choose appropriate names in your
language. What I tried with that name was to describe the method in the
most obvious and simple way; after all, they are just word pairs with a
common word. But I recognise that there must be better names, closer to
the related scientific literature. With the help of the native speakers on
the list, as well as that of the more experienced and theoretically driven
guys, I hope we'll get a nicer name!

> I seem to recall that Walsh-Hadamard transforms looked sorta like
> those features (specifically, diagonalized Walsh-Hadamard).
>
> Does that make any sense?

None for me. I've never heard about Hadamard or Walsh before. Do you have
pointers?

> Currently I'm using the classify/learn
> keyword <walsh> to indicate "not <markov>" but I'm not sure if that's
> accurate enough or not... any preferable suggestions? Should I stay
> with <walsh> or use <wpcw> (Word Pairs Context Window)? Easy to
> change now... hard to change later.

Walsh?! Well, I prefer WPCW ;), but as I said, that may not be the best
one and I'm open to suggestions.

> In any case, since there _are_ people who need more blazing speed
> than they need absolute accuracy, <walsh> or <wpcw> or whatever will
> be in the next major release. :)

Nice!

-- Fidelis Assis

From: Fidelis A. <fi...@po...> - 2004-04-21 18:54:22

On Wed, 21 Apr 2004 14:08:44 +0200 (MEST), Christian Siefkes wrote:
> Hi Fidelis:
>
> On Mon, 19 Apr 2004, Fidelis Assis wrote:
> > > I'd really like to try your idea with my algorithm so I'm interested
> > > in your code as well. Would you allow me to incorporate it into my
> > > software <http://www.inf.fu-berlin.de/inst/ag-db/software/ties/>
> > > using the LGPL license? (I cannot use the GPL due to projects I'm
> > > involved with.)
> >
> > For sure! You can use it as you like. I'll be glad if it helps in any
> > way. Patch attached.
>
> Thanks! In the end I didn't use any of your code because it would have
> been harder to translate it to my setup and Java than to recode.

I suspected that it was going to be that way :)... and that the code would
be useful just for removing ambiguities in my first informal description.

> But I used your idea and that works just beautifully!!

Good, I'm really glad about that!

> Previously I had brought the number of errors from 19 to 18 by using a
> refined XML+header-aware tokenization pattern (more below). And by
> switching from SBPH with 1,600,000 features to your WPCW with 600,000
> features I got further down to 16! (For comparison: a full one-million
> bucket CSS file hosts about 2.5M features, your 280K buckets should
> translate into 700K features, so that's roughly comparable.)
>
> 600K vs. 1.6M features might seem somewhat unfair for SBPH, because WPCW
> generates just 1/4 of the features. But actually it isn't, since I also
> tried SBPH with more [2M or 2.4M] features and that didn't increase
> performance; indeed the peak was at 1.6M.
>
> Just as important, it brought run time over all 41,500 mails down to
> about 30min on an old Pentium III 1266MHz (invoked using nice, i.e. with
> low priority so as not to disturb the user working on the machine),
> while previously it was at least 90-120 minutes!

Wow! Your speedup was much greater than mine, and compatible with the
reduction in the number of features. Probably because I had to invoke crm
for each message from within a perl script, which introduced a large and
independent constant time.

> Shalen, Bill, and I are just writing a paper about the Winnow thing, and
> that's a great opportunity for presenting your WPCW as well. Since you
> invented this I would propose you to become a co-author as well. What do
> you think?

Co-author?! :)) I feel really honoured to receive such an invitation. Just
tell me what to do next!

> ----
> Here are the old and new results:
>
> SBPH+Winnow, 5% threshold thickness, 1.23 promotion, 0.83 demotion,
> simplified tokenization (1.6M features, LRU pruning, single pass):
>   592 (number of errors on all 10x4147 mails)
>    43 (number of errors on the last 10x1000 mails)
>    20 (number of errors on the last 10x500 mails)
>
> SBPH+Winnow on normalized mails (normalizemime), 5% threshold thickness,
> 1.23 promotion, 0.83 demotion, simplified tokenization (1.6M features,
> LRU pruning, single pass):
>   562 (number of errors on all 10x4147 mails)
>    39 (number of errors on the last 10x1000 mails)
>    19 (number of errors on the last 10x500 mails)
>
> SBPH+Winnow on normalized mails (normalizemime), 5% threshold thickness,
> 1.23 promotion, 0.83 demotion, XML+header-aware tokenization (1.6M
> features, LRU pruning, single pass):
>   531 (number of errors on all 10x4147 mails)
>    34 (number of errors on the last 10x1000 mails)
>    18 (number of errors on the last 10x500 mails)
>
> WPCW+Winnow on normalized mails (normalizemime), 5% threshold thickness,
> 1.23 promotion, 0.83 demotion, XML+header-aware tokenization (600K
> features, LRU pruning, single pass):
>   514 (number of errors on all 10x4147 mails)
>    33 (number of errors on the last 10x1000 mails)
>    16 (number of errors on the last 10x500 mails)
> ---

16 errors! That's really, really impressive, 4 times better than what we
have now with CRM! I'll download your code and try to use it in some
tests. BTW, do you have pointers to articles which explain Winnow? I'd
like to start reading about it.

-- Fidelis Assis

From: Bill Y. <ws...@me...> - 2004-04-21 15:42:23

   From: Christian Siefkes <si...@mi...>

   On Wed, 21 Apr 2004, Bill Yerazunis wrote:
   > Last things first: Is the WPCW speedup due to fewer features
   > generated, or a smaller .css file fitting into cache?

   I'm quite sure it's mainly due to the number of generated features.
   WPCW generates only 1/4 of the features of SBPH (for a window size of
   5), and run time should be roughly linear to the number of features
   because one feature is processed after the other. I don't think size
   of the feature cache matters that much because the algorithm still has
   to look up each feature (and if it finds it isn't there it has to add
   it and discard another). I tried WPCW with 400 to 700K features, and
   it took 26min for 400, 500, and 700K (strangely, for 600K it was
   32min, but that was probably on another computer -- I'm using two for
   my test runs).

OK, so we can make it even faster by pushing the window size down.
That would be very useful (guess why... :))

   > I'm going ahead and implementing the word-pairs features. However,
   > I'm a bit troubled by the name <wpcw>
   >
   > I seem to recall that Walsh-Hadamard transforms looked sorta like
   > those features (specifically, diagonalized Walsh-Hadamard).
   >
   > Does that make any sense? Currently I'm using the classify/learn
   > keyword <walsh> to indicate "not <markov>" but I'm not sure if
   > that's accurate enough or not... any preferable suggestions? Should
   > I stay with <walsh> or use <wpcw> (Word Pairs Context Window)? Easy
   > to change now... hard to change later.

   I suppose "Word Pairs Context Window" is clearer than "Word Pairs with
   a Common Word", but both sound somewhat strange. I thought about
   "sparse bigrams", since bigram is the scientific term for word pair
   (but then we still need a handy acronym). What do you think, Fidelis?

It's not just sparse bigrams; it's sparse separated bigrams.

Question: in the WPCW implementation, do you care what the separation
distance is? (i.e. does the feature "Random Model" appear in "Markov
Random Field Model with Variable Weighting Schemas"? Or does the feature
"Random <skip> Model" appear instead?)

   > But what further can be done? Well, clearly you can add layers to
   > the perceptron network; a particularly nice implementation is the
   > Hopcroft network which is a rectangular system that allows feedback
   > from arbitrary summations back into the network.

   Can you provide pointers to literature about Hopcroft networks and
   diagonalized Walsh-Hadamard? Need to catch up before I can discuss
   this...

Well, if I recall correctly (no warranty!) the Hadamard matrix is a
recursive definition that generates a provably orthogonal matrix of any
size 2^I. The recursive formula for a Hadamard matrix is:

   H(size=2n) = the_matrix {  H(size=n)   H(size=n)
                              H(size=n)  -H(size=n) }

and you start out with the definition

   H(size=1) ::= { 1 }

which makes a 2x2 Hadamard look like:

   1  1
   1 -1

and a 4x4 Hadamard looks like:

   1  1  1  1
   1 -1  1 -1
   1  1 -1 -1
   1 -1 -1  1

The Walsh variant (Walsh-Hadamard) redefines "-" (as used in the lower
left corner) as being

   '-' ::= 1 XOR x

and the Walsh-Hadamard 4x4 is then

   1 1 1 1
   1 0 1 0
   1 1 0 0
   1 0 0 1

Diagonalized Walsh-Hadamard is just a reshuffle of the rows, then
subtraction to renormalize, and yields something like:

   -3 0 0 0
    1 1 0 0
    1 0 1 0
    1 0 0 1

which is pretty darn close to the WPCW matrix:

   1 0 0 0
   1 1 0 0
   1 0 1 0
   1 0 0 1

Walsh-Hadamard fell out of favor with the signal-processing people around
the time Shalen was born, as that's when people got floating point
hardware and could do FFTs instead.

It's not exactly diagonalized Walsh-Hadamard, but it seems close enough
that we should pay attention.

   > If I drop down to 128 slots on the input of the Hopcroft, then
   > it's only 31 megs... close enough to the current 24 that I can
   > handwave it away. :)

   Aren't the current CSS files 12MB each (24MB for both spam+nonspam,
   but not for one)? So is this for the whole network or for a single
   class?

Oh, yes. Good point. Knew I talk to you guys for a reason.

It could go either way. I'm thinking I'd like to keep it separate, so we
would retain the "N-way-mix-and-match" decision capability which quite a
few users are using. Opinions?

       -Bill Yerazunis

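
[As a sanity check on the recursion above, here is a small Python sketch
that builds the Hadamard matrix and its Walsh variant exactly as Bill
defines them. This is a from-scratch illustration, not code from CRM114:]

    # Sketch: recursive Hadamard construction as described above, plus
    # the Walsh variant obtained by redefining -1 as 0 ('-' ::= 1 XOR x).
    # Illustration only, not CRM114 code.

    def hadamard(n):
        """Return the n x n Hadamard matrix, for n a power of 2."""
        if n == 1:
            return [[1]]
        h = hadamard(n // 2)
        top = [row + row for row in h]                   # [ H  H ]
        bottom = [row + [-x for x in row] for row in h]  # [ H -H ]
        return top + bottom

    def walsh_hadamard(n):
        """Walsh variant: map every -1 entry to 0."""
        return [[0 if x == -1 else x for x in row] for row in hadamard(n)]

    for row in walsh_hadamard(4):
        print(row)
    # [1, 1, 1, 1]
    # [1, 0, 1, 0]
    # [1, 1, 0, 0]
    # [1, 0, 0, 1]

[The printed rows match the Walsh-Hadamard 4x4 in the message above;
reshuffling them and renormalizing gives the "diagonalized" form that
resembles the WPCW matrix.]
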
From: Christian S. <si...@mi...> - 2004-04-21 15:34:47

On Wed, 21 Apr 2004, Christian Siefkes wrote:
> another). I tried WPCW with 400 to 700K features, and it took 26min for
> 400, 500, and 700K (strangely, for 600K it was 32min, but that was
> probably on another computer -- I'm using two for my test runs).

Yup. Just re-ran it and it took 26min as well.

Bye
Chris

------------ Christian Siefkes -----------------------------------------
| Email: chr...@si... | Web: http://www.siefkes.net/
| Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
-------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
Of course, in Perl culture, almost nothing is prohibited. My feeling is
that the rest of the world already has plenty of perfectly good
prohibitions, so why invent more?
		-- Larry Wall
