From: Christian S. <si...@mi...> - 2004-04-13 14:33:29
Hi,

On Sun, 11 Apr 2004, Bill Yerazunis wrote:
> Using 3-pass TUNE, with 1, 3, 13, 75, 531 as weights, I get (and yes,
> this is the best ever) 47 errors out of 5000. That handily beats the
> previous record of 54, which is superexponential weighting on a TUNE.
>
> Fascinating- the same set of weights is WORSE in single pass but better
> in 3-pass.

That's 58 vs. 60 errors in single pass? I suppose it's hard to conclude
anything from such a small difference, except that they perform very
similarly... Maybe you could look at another/extended set, e.g. the
last 1000 mails of each batch, to see whether the difference becomes
larger?

> Here's the detailed extended report.... and my laptop is literally too
> hot to touch near the CPU, so I will let it cool a bit before
> I run base7 (that being 1, 7, 49, 343, 2401).
...
> SBPH16EM - denominator 16, Super-Markovian entropic correction
>   0 1 1125
>   0 1 56
> SB16-24SM - denom 16, 24 megabyte 512-chain .css, Super-Markovian
>   0 1 1132
>   0 1 58
> 16-BCS - denom 16, 512-chain .css, Breyer-Chhabra-Siefkes weights
>   0 1 1135
>   0 1 60
...

I've got a new result to add, and I'm quite excited about it:

SBPH + Ultraconservative Winnow, 10% thick threshold, 1.6M features,
LRU pruning, single pass:

  719 (number of errors on all 10x4147 mails)
   38 (number of errors on the last 10x500 mails)

This is single-pass (as TOE), not multi-pass (as TUNE), so it reduces
the error rate by about one third compared to the 56 errors you
reported for Super-Markovian entropic correction!

I did not strictly use CRM114 for these results, but my own Java-based
package, which I'm developing mainly for information extraction
<http://en.wikipedia.org/wiki/Information_extraction> (I have now put
a preliminary version at
http://www.inf.fu-berlin.de/inst/ag-db/software/ties/ ).

I started using CRM for my work, but I also looked at other incremental
(i.e. single-pass) classification methods and finally developed a
combination of the Winnow algorithm (cf. e.g.
http://citeseer.ist.psu.edu/article/dagan97mistakedriven.html ) and the
"ultraconservative" algorithms introduced in
http://citeseer.ist.psu.edu/crammer01ultraconservative.html . I'm also
using the "thick threshold" heuristic of dagan97mistakedriven, i.e. I
train not only on errors but also on "almost-errors", when the scores
of other classes come only slightly below the score of the true class.

For feature preprocessing, I combined this classifier with (a primitive
but usable re-implementation of) CRM's sparse binary polynomial
hashing. For tokenization I used the same pattern as CRM.

For the reported result I used 1.6M features. CRM's CSS files contain
1M buckets by default, but if I can trust cssutil and cssdiff, a full
(.097%) CSS file stores about 2.5-2.6M hashed datums = features -- so
it should be a fair comparison, or did I get this wrong?

Another important difference is that I use LRU (least recently used)
pruning instead of microgrooming. While microgrooming more-or-less
randomly deletes a feature when the store is full (or so I understand),
I delete the least recently seen feature (all other features were
encountered after the victim).
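To make the above a bit more concrete, here is a minimal sketch in Java
of a Winnow update with a thick threshold. This is *not* the code from
my package; the class name and all parameter values are placeholders
for illustration. In the two-class case the "ultraconservative" part
largely reduces to touching only the features active in the
misclassified (or almost-misclassified) example:

import java.util.HashMap;
import java.util.Map;

/**
 * Minimal Winnow sketch with a "thick threshold": weights are updated
 * not only on outright errors but whenever the score falls inside a
 * margin around the decision threshold. All parameter values are
 * placeholders, not the settings used for the reported run.
 */
class ThickThresholdWinnow {
    private final Map<String, Double> weights = new HashMap<>();
    private final double promotion = 1.23; // multiplier when the score is too low
    private final double demotion = 0.83;  // multiplier when the score is too high
    private final double threshold = 1.0;  // average-weight decision threshold
    private final double thickness = 0.1;  // 10% margin ("thick threshold")

    /** Score = average weight of the active features (unseen ones count as 1.0). */
    double score(Iterable<String> features) {
        double sum = 0.0;
        int n = 0;
        for (String f : features) {
            sum += weights.getOrDefault(f, 1.0);
            n++;
        }
        return n == 0 ? 0.0 : sum / n;
    }

    /** Train on one example; returns true if the raw prediction was correct. */
    boolean train(Iterable<String> features, boolean isPositive) {
        double s = score(features);
        // Update on errors *and* on "almost-errors" inside the margin.
        if (isPositive && s < threshold * (1 + thickness)) {
            multiplyActive(features, promotion);
        } else if (!isPositive && s > threshold * (1 - thickness)) {
            multiplyActive(features, demotion);
        }
        return (s > threshold) == isPositive;
    }

    /** Only the features active in this example are touched. */
    private void multiplyActive(Iterable<String> features, double factor) {
        for (String f : features) {
            // Absent features implicitly have weight 1.0, so inserting
            // `factor` equals multiplying the default weight by it.
            weights.merge(f, factor, (old, x) -> old * x);
        }
    }
}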
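The SBPH re-implementation can be sketched like this: a window (of
width 5, as in CRM) slides over the token stream, and for each position
all 2^(W-1) = 16 subphrases are emitted in which the newest token is
always present and skipped slots are marked. Again only a sketch; the
class name and skip marker are made up, and the final hashing of the
feature strings to integers is omitted:

import java.util.ArrayList;
import java.util.List;

/**
 * Rough sketch of sparse binary polynomial hashing (SBPH): for each
 * token, emit every combination of the up to four preceding tokens in
 * the window, with the newest token always included and skipped slots
 * marked by a placeholder.
 */
class SparseBinaryPolynomialFeatures {
    private static final int WINDOW = 5;
    private static final String SKIP = "<skip>";

    static List<String> features(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int end = 0; end < tokens.size(); end++) {
            int start = Math.max(0, end - WINDOW + 1);
            int older = end - start; // optional slots before the newest token
            for (int mask = 0; mask < (1 << older); mask++) {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < older; i++) {
                    sb.append((mask & (1 << i)) != 0 ? tokens.get(start + i) : SKIP);
                    sb.append(' ');
                }
                sb.append(tokens.get(end)); // newest token is always part of the feature
                out.add(sb.toString());
            }
        }
        return out;
    }
}

For example, the window ending at "safety" in "purchase a little
temporary safety" yields 16 features, from the unigram "safety" up to
the full phrase "purchase a little temporary safety", with all the
skip-marked variants in between.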
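And the LRU pruning itself can be sketched with a plain
java.util.LinkedHashMap in access order; the capacity parameter would
be the 1.6M mentioned above, but this is an illustration, not my
actual feature store:

import java.util.LinkedHashMap;
import java.util.Map;

/**
 * LRU pruning of the feature store, sketched with a LinkedHashMap in
 * access order: when the store is full, the least recently *seen*
 * feature is evicted (every other feature was encountered after it).
 */
class LruFeatureStore extends LinkedHashMap<String, Double> {
    private final int capacity;

    LruFeatureStore(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() refreshes recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
        return size() > capacity; // evict the least recently used feature
    }
}

Since get() refreshes recency in access order, a feature counts as
"recently seen" whenever it occurs in a scored document, not only when
it is trained -- which is the property that distinguishes this from
(my understanding of) microgrooming.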
Bye
Christian

------------ Christian Siefkes -----------------------------------------
| Email: chr...@si... | Web: http://www.siefkes.net/
| Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
-------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
Those who would give up essential liberty, to purchase a little
temporary safety, deserve neither liberty nor safety.
        -- Benjamin Franklin