To clarify (I've done this on the list already, but for anyone
else reading this):
The "show clues" option on the web interface shows all of the clues
that were used in scoring the message. This does not mean
all *tokens*. Tokens with a probability between 0.4 and 0.6 are
not used in scoring (by default), and at most 150 clues (by default)
are used in total.
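To make the difference between tokens and clues concrete, here is a rough sketch of the selection rule described above. It is not the project's actual code: the function name and the `min_strength` and `max_clues` parameters are made up for illustration, and picking the strongest tokens first is an assumption about how the limit is applied.

```python
def select_clues(token_probs, min_strength=0.1, max_clues=150):
    """Pick the tokens that would count as clues (illustrative only).

    token_probs maps each token to its spam probability. Tokens whose
    probability lies within min_strength of 0.5 are skipped, and at most
    max_clues of the remaining tokens are kept. Both names are
    hypothetical, not real option names.
    """
    strong = [
        (token, prob)
        for token, prob in token_probs.items()
        if abs(prob - 0.5) >= min_strength
    ]
    # Assumption: the tokens furthest from 0.5 are kept first.
    strong.sort(key=lambda pair: abs(pair[1] - 0.5), reverse=True)
    return strong[:max_clues]


probs = {"viagra": 0.99, "the": 0.5, "meeting": 0.05, "maybe": 0.55}
clues = select_clues(probs)
```

Here "the" and "maybe" are tokens in the message but never become clues, which is why the clue list can be much shorter than the token list.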
What I think I'll do (for 1.1a1) is add a new option: "show
tokens", which shows all the tokens. I'll also update
the 'show clues' list to show not just the word and
probability, but also the number of (trained) ham & spam
messages each word has appeared in, à la the Outlook plug-in.
Just in case anyone else is thinking of implementing this:
I've finished writing this all up. There's now a button to show
clues and one to show tokens. Both of them show the word,
the prob, and the ham and spam count. The clues button
shows both the current prob & clues and the original ones, if
they are available.
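For anyone picturing the output, a row in either list could be modelled roughly like this. This is only a sketch: the `Row` type, the `format_rows` helper, and the column layout are invented for illustration and are not taken from the checked-in code, and the sample values are made up.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One displayed entry (hypothetical structure)."""
    word: str
    prob: float
    ham_count: int
    spam_count: int


def format_rows(rows):
    """Render rows as a fixed-width plain-text table."""
    lines = [f"{'word':<12}{'prob':>6}{'ham':>6}{'spam':>6}"]
    for r in rows:
        lines.append(
            f"{r.word:<12}{r.prob:>6.2f}{r.ham_count:>6}{r.spam_count:>6}"
        )
    return "\n".join(lines)


# Made-up sample values, just to show the shape of the output.
table = format_rows([Row("viagra", 0.99, 0, 42), Row("meeting", 0.05, 17, 1)])
```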
I'll check this in once the 1.0 branch is created. It'll appear in
the first 1.1 release.
Logged In: YES
user_id=552329
This is now in cvs.