From: Christian S. <si...@mi...> - 2004-04-13 14:33:29
Hi,

On Sun, 11 Apr 2004, Bill Yerazunis wrote:
> Using 3-pass TUNE, with 1, 3, 13, 75, 531 as weights, I get (and yes,
> this is the best ever) 47 errors out of 5000. That handily beats the
> previous record of 54, which is superexponential weighting on a TUNE.
>
> Fascinating- the same set of weights is WORSE in single pass but better
> in 3-pass.

That's 58 vs. 60 errors in single pass? I suppose it's hard to conclude
anything from such a small difference, except that they perform very
similarly... Maybe you could look at another/extended set, e.g. the
last 1000 mails of each batch, to see whether the difference becomes
larger?

> Here's the detailed extended report.... and my laptop is literally too
> hot to touch near the CPU, so I will let it cool a bit before
> I run base7 (that being 1, 7, 49, 343, 2401).
...
> SBPH16EM - denominator 16, Super-Markovian entropic correction
>   0 1 1125
>   0 1 56
> SB16-24SM - denom 16, 24 megabyte 512-chain .css, Super-Markovian
>   0 1 1132
>   0 1 58
> 16-BCS - denom 16, 512-chain .css, Breyer-Chhabra-Siefkes weights
>   0 1 1135
>   0 1 60
...

I've got a new result to add, and I'm quite excited about it:

SBPH + Ultraconservative Winnow, 10% thick threshold, 1.6M features,
LRU pruning, single pass:

  719 (number of errors on all 10x4147 mails)
   38 (number of errors on the last 10x500 mails)

This is single-pass (as TOE), not multi-pass (as TUNE), so it reduces
the error rate by about one third compared to the 56 errors you
reported for Super-Markovian entropic correction!

I did not strictly use CRM114 for these results, but my own Java-based
package, which I'm developing mainly for information extraction
<http://en.wikipedia.org/wiki/Information_extraction> (I have now put
a preliminary version at
http://www.inf.fu-berlin.de/inst/ag-db/software/ties/ ).

I started using CRM for my work, but I also looked at other incremental
(i.e. single-pass) classification methods and finally developed a
combination of the Winnow algorithm (cf. e.g.
http://citeseer.ist.psu.edu/article/dagan97mistakedriven.html ) and the
"ultraconservative" algorithms introduced in
http://citeseer.ist.psu.edu/crammer01ultraconservative.html . I'm also
using the "thick threshold" heuristic of dagan97mistakedriven, i.e. I
train not only on errors but also on "almost-errors", when the scores
of other classes come only slightly below the score of the true class.

For feature preprocessing, I combined this classifier with (a primitive
but usable re-implementation of) CRM's sparse binary polynomial
hashing. For tokenization I used the same pattern as CRM.

For the reported result I used 1.6M features. CRM's CSS files contain
1M buckets by default, but if I can trust cssutil and cssdiff, a full
(.097%) CSS file stores about 2.5-2.6M hashed datums = features -- so
it should be a fair comparison, or did I get this wrong?

Another important difference is that I use LRU (least recently used)
pruning instead of microgrooming. While microgrooming more-or-less
randomly deletes a feature when the store is full (or so I understand),
I delete the least recently seen feature (all other features were
encountered after the victim).
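To make the above a bit more concrete, here is a minimal sketch in Java
of a Winnow update with a thick threshold. This is *not* the code from
my package; the class name and all parameter values are placeholders
for illustration. In the two-class case the "ultraconservative" part
largely reduces to touching only the features active in the
misclassified (or almost-misclassified) example:

import java.util.HashMap;
import java.util.Map;

/**
 * Minimal Winnow sketch with a "thick threshold": weights are updated
 * not only on outright errors but whenever the score falls inside a
 * margin around the decision threshold. All parameter values are
 * placeholders, not the settings used for the reported run.
 */
class ThickThresholdWinnow {
    private final Map<String, Double> weights = new HashMap<>();
    private final double promotion = 1.23; // multiplier when the score is too low
    private final double demotion = 0.83;  // multiplier when the score is too high
    private final double threshold = 1.0;  // average-weight decision threshold
    private final double thickness = 0.1;  // 10% margin ("thick threshold")

    /** Score = average weight of the active features (unseen ones count as 1.0). */
    double score(Iterable<String> features) {
        double sum = 0.0;
        int n = 0;
        for (String f : features) {
            sum += weights.getOrDefault(f, 1.0);
            n++;
        }
        return n == 0 ? 0.0 : sum / n;
    }

    /** Train on one example; returns true if the raw prediction was correct. */
    boolean train(Iterable<String> features, boolean isPositive) {
        double s = score(features);
        // Update on errors *and* on "almost-errors" inside the margin.
        if (isPositive && s < threshold * (1 + thickness)) {
            multiplyActive(features, promotion);
        } else if (!isPositive && s > threshold * (1 - thickness)) {
            multiplyActive(features, demotion);
        }
        return (s > threshold) == isPositive;
    }

    /** Only the features active in this example are touched. */
    private void multiplyActive(Iterable<String> features, double factor) {
        for (String f : features) {
            // Absent features implicitly have weight 1.0, so inserting
            // `factor` equals multiplying the default weight by it.
            weights.merge(f, factor, (old, x) -> old * x);
        }
    }
}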
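The SBPH re-implementation can be sketched like this: a window (of
width 5, as in CRM) slides over the token stream, and for each position
all 2^(W-1) = 16 subphrases are emitted in which the newest token is
always present and skipped slots are marked. Again only a sketch; the
class name and skip marker are made up, and the final hashing of the
feature strings to integers is omitted:

import java.util.ArrayList;
import java.util.List;

/**
 * Rough sketch of sparse binary polynomial hashing (SBPH): for each
 * token, emit every combination of the up to four preceding tokens in
 * the window, with the newest token always included and skipped slots
 * marked by a placeholder.
 */
class SparseBinaryPolynomialFeatures {
    private static final int WINDOW = 5;
    private static final String SKIP = "<skip>";

    static List<String> features(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int end = 0; end < tokens.size(); end++) {
            int start = Math.max(0, end - WINDOW + 1);
            int older = end - start; // optional slots before the newest token
            for (int mask = 0; mask < (1 << older); mask++) {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < older; i++) {
                    sb.append((mask & (1 << i)) != 0 ? tokens.get(start + i) : SKIP);
                    sb.append(' ');
                }
                sb.append(tokens.get(end)); // newest token is always part of the feature
                out.add(sb.toString());
            }
        }
        return out;
    }
}

For example, the window ending at "safety" in "purchase a little
temporary safety" yields 16 features, from the unigram "safety" up to
the full phrase "purchase a little temporary safety", with all the
skip-marked variants in between.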
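And the LRU pruning itself can be sketched with a plain
java.util.LinkedHashMap in access order; the capacity parameter would
be the 1.6M mentioned above, but this is an illustration, not my
actual feature store:

import java.util.LinkedHashMap;
import java.util.Map;

/**
 * LRU pruning of the feature store, sketched with a LinkedHashMap in
 * access order: when the store is full, the least recently *seen*
 * feature is evicted (every other feature was encountered after it).
 */
class LruFeatureStore extends LinkedHashMap<String, Double> {
    private final int capacity;

    LruFeatureStore(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() refreshes recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
        return size() > capacity; // evict the least recently used feature
    }
}

Since get() refreshes recency in access order, a feature counts as
"recently seen" whenever it occurs in a scored document, not only when
it is trained -- which is the property that distinguishes this from
(my understanding of) microgrooming.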
Bye
Christian

------------ Christian Siefkes -----------------------------------------
| Email: chr...@si... | Web: http://www.siefkes.net/
| Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
-------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
Those who would give up essential liberty, to purchase a little
temporary safety, deserve neither liberty nor safety.
        -- Benjamin Franklin