I recently purged my inbox, which had tons of deleted messages and several unclassified messages. I moved the unclassified messages to where they belong, but I realized that this trains POPFile on all of these messages.
My understanding of how the 'experts' train Bayes filters is that they train only enough to classify messages properly, and try to avoid training beyond that point.
So this brings up a couple of questions:
Should there be an option for train-on-move that first checks whether the message would already be classified into the folder it was moved to, and trains on it only if it wouldn't be?
Should we create a simple way to tell POPFile to re-process an entire folder and move the messages to where they should be classified, ignoring the fact that it has seen them before and that this would be a reclassification?
The second one can be done with appropriate reconfiguration of the IMAP parameters, but I'm wondering if there should be a simple way to do this through the UI.
Thoughts?
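To make the first question concrete, here is a rough Python sketch of a "check before training" move handler. It is only an illustration: `TinyBayes` is a toy word-count classifier standing in for the real one, and `train_on_move` is a made-up name, not anything in POPFile's code.

```python
import math
from collections import Counter, defaultdict


class TinyBayes:
    """Toy word-count naive Bayes; a stand-in, not POPFile's classifier."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained

    def train(self, text, bucket):
        self.word_counts[bucket].update(text.lower().split())
        self.message_counts[bucket] += 1

    def classify(self, text):
        """Return the most likely bucket, or None if nothing is trained yet."""
        if not self.message_counts:
            return None
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_messages = sum(self.message_counts.values())
        best_bucket, best_score = None, float("-inf")
        for bucket, n_messages in self.message_counts.items():
            counts = self.word_counts[bucket]
            denom = sum(counts.values()) + len(vocab)
            score = math.log(n_messages / total_messages)
            for word in text.lower().split():
                # Add-one smoothing so unseen words do not zero out a bucket.
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


def train_on_move(classifier, text, target_bucket):
    """Train only if the message would not already land in target_bucket.

    Returns True if training happened, False if it was skipped.
    """
    if classifier.classify(text) == target_bucket:
        return False
    classifier.train(text, target_bucket)
    return True
```

With something like this, dragging a message into a bucket folder would add words to the corpus only when the current corpus gets that message wrong.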
That is the usual method the "experts" have used, but the actual benefit isn't really known, and I think some have given up on being really strict about it. I use that method probably only half the time. It's too time-consuming to check every message.
It used to be necessary because classification got really slow if the corpus got too big. Since everything is now in the database, it can be accessed very fast. For speed, the size of the corpus almost doesn't matter now.
According to my tests, Train Always (TA) is usually slightly more accurate than Train Only Errors (TOE). So more words in the corpus aren't going to hurt accuracy.
http://popfile.sourceforge.net/cgi-bin/wiki.pl?Glossary/TOE
Your idea still sounds good to me anyway. There isn't a need to reclassify all those messages, and even though it isn't hurting anything, it takes time to do all that reclassification and it adds lots of unneeded words to the corpus.
Actually, it sounds like there are three common training methods:
1. train always
2. train on errors
3. train on errors only if previous training wouldn't have eliminated this error
Currently we do #2, unless the message has been dropped from the history, in which case we do no training at all.
I'm wondering if either of the other two is better (and if so, which one).
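For comparison, the three methods can be sketched side by side in Python. This is a hedged illustration only: `KeywordClassifier` is a deliberately crude stand-in, and `Policy`, `apply_moves`, and the `(text, bucket_at_arrival, target_bucket)` tuples are names and shapes I made up, not POPFile's. I am also reading #3 as "re-check the message against the current corpus just before training".

```python
from collections import Counter
from enum import Enum


class Policy(Enum):
    TRAIN_ALWAYS = 1            # method 1
    TRAIN_ON_ERRORS = 2         # method 2
    TRAIN_ON_ERRORS_RECHECK = 3 # method 3


class KeywordClassifier:
    """Crude stand-in: each word votes for the bucket it was last trained with."""

    def __init__(self):
        self.words = {}  # word -> bucket

    def train(self, text, bucket):
        for word in text.lower().split():
            self.words[word] = bucket

    def classify(self, text):
        votes = Counter(
            self.words[w] for w in text.lower().split() if w in self.words
        )
        return votes.most_common(1)[0][0] if votes else "unclassified"


def apply_moves(classifier, moves, policy):
    """Process (text, bucket_at_arrival, target_bucket) moves; return train count."""
    trained = 0
    for text, bucket_at_arrival, target_bucket in moves:
        was_error = bucket_at_arrival != target_bucket
        if policy is Policy.TRAIN_ALWAYS:
            should_train = True
        elif policy is Policy.TRAIN_ON_ERRORS:
            should_train = was_error
        else:
            # Skip if earlier training in this batch already fixed the error.
            should_train = was_error and classifier.classify(text) != target_bucket
        if should_train:
            classifier.train(text, target_bucket)
            trained += 1
    return trained


moves = [
    ("cheap pills now", "unclassified", "spam"),
    ("cheap pills today", "unclassified", "spam"),
    ("project status report", "work", "work"),
]
for policy in Policy:
    print(policy.name, apply_moves(KeywordClassifier(), moves, policy))
```

The only difference between #2 and #3 here is the extra `classify` call at move time, so #3 costs one more classification per moved message in exchange for fewer training events.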
Hi David!
Sounds like you, too, keep your history around for as long as possible. Right?
On a typical installation, the history is only kept for two days. So if you clean up your inbox every few weeks, most of the messages will have expired and thus won't be reclassified when moved.
I understand your point, though. UI-wise, this could be done with a single button. But the functionality behind the button would be a little tricky. I doubt that simply resetting the UIDNEXT value to 1 for the INBOX would do the trick. Or would it?
Manni
David,
You did not say, but I gather that you do not have a separate folder for unclassified messages but rather leave them in INBOX. Am I correct? I am also guessing that the folder to which you moved the unclassified messages was one of your bucket folders.
If you have a folder for unclassified you don't have a need to move those messages except for the purpose of classifying them. This sounds like it would have accomplished what you had in mind. Another way to accomplish a move without training is to have a folder about which you never tell POPFile and move them to that folder.
Maybe I'm missing your point?
--
Jim
I'm sorry, I misread your message. Sure, you trained on those messages. I'm not sure that's really a problem but I see what you're saying.
--
Jim