
IMAP classification options

Extensions
David Lang
2004-12-09
2013-04-15
  • David Lang

    David Lang - 2004-12-09

    I recently purged my inbox, which had tons of deleted messages and several unclassified messages. I moved the unclassified messages to where they belong, but I realized that this trains POPFile on all of these messages.

    My understanding of how the 'experts' train Bayes filters is that they train them only enough to classify messages correctly, and try to avoid training beyond that point.

    So this raises a couple of questions:

    Should there be an option for train-on-move that first checks whether the message would already be classified into the folder it was moved to, and trains on it only if it wouldn't be?

    Should we create a simple way to tell POPFile to re-process an entire folder and move each message to the folder it should be classified into, ignoring the fact that it has seen the messages before (which would otherwise make this a reclassification)?

    The second one can be done by reconfiguring the IMAP parameters appropriately, but I'm wondering whether there should be a simple way to do this through the UI.

    Thoughts?

     
    • Texas Fett

      Texas Fett - 2004-12-09

      That is the usual method the "experts" have used, but the actual benefit isn't really known, and I think some have given up on being strict about it. I use that method probably only half the time. It's too time-consuming to check every message.

      It used to be necessary because if the corpus got too big, classification got really slow. Since everything is now in the database, it can be accessed very fast. For speed, the size of the corpus almost doesn't matter now.

      According to my tests, Train Always (TA) is usually slightly more accurate than Train Only Errors (TOE). So more words in the corpus aren't going to hurt accuracy.
      http://popfile.sourceforge.net/cgi-bin/wiki.pl?Glossary/TOE

      Your idea still sounds good to me anyway. There's no need to reclassify all those messages: even though it isn't hurting anything, the reclassification takes time and adds lots of unneeded words to the corpus.

       
      • David Lang

        David Lang - 2004-12-09

        Actually, it sounds like there are three common training methods:

        1. train always

        2. train on errors

        3. train on errors only if previous training wouldn't have eliminated this error

        Currently we do #2, unless the message has been dropped from the history, in which case we do no training at all.

        I'm wondering whether either of the other two is better (and if so, which one).
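        The three methods differ only in what they consult when a message is moved into a bucket. Below is a minimal Python sketch, not POPFile's code: `TrainingPolicy`, `should_train` and the plain-string bucket names are hypothetical. Method 2 needs the classification recorded in the history, which is why an expired history entry means no training. Method 3 is interpreted here as "train only if the classifier, with its current corpus, would still put the message in the wrong bucket", so it needs no history at all.

```python
from enum import Enum
from typing import Optional


class TrainingPolicy(Enum):
    TRAIN_ALWAYS = 1             # method 1: train on every moved message
    TRAIN_ON_ERRORS = 2          # method 2: train only if the original classification was wrong
    TRAIN_ON_UNFIXED_ERRORS = 3  # method 3: train only if current training still gets it wrong


def should_train(policy: TrainingPolicy,
                 destination: str,
                 original: Optional[str],
                 current: str) -> bool:
    """Decide whether moving a message into `destination` should train that bucket.

    original: bucket recorded in the history when the message arrived,
              or None if the history entry has already expired.
    current:  bucket the classifier would pick for the message right now.
    """
    if policy is TrainingPolicy.TRAIN_ALWAYS:
        return True
    if policy is TrainingPolicy.TRAIN_ON_ERRORS:
        # No history entry means there is nothing to compare against: skip training.
        if original is None:
            return False
        return original != destination
    # Method 3: only the current verdict matters, so an expired history is no obstacle.
    return current != destination
```

        For example, `should_train(TrainingPolicy.TRAIN_ON_UNFIXED_ERRORS, "work", None, "work")` returns `False`: earlier training already fixed the mistake, so the move adds nothing to the corpus.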

         
    • Manni

      Manni - 2004-12-09

      Hi David!

      Sounds like you, too, keep your history around for as long as possible. Right?

      On a typical installation, the history is only kept for two days. So if you clean up your inbox every few weeks, most of the messages will have expired and thus won't be reclassified when moved.
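      Here is a sketch of why an expired message is simply ignored when moved. It assumes a hypothetical history store keyed by message id, holding the bucket assigned and the time of classification. `expire_history`, `on_move` and this data layout are illustrative, not POPFile internals. The two-day `keep_days` default follows the typical setting mentioned above.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# Hypothetical history store: message id -> (bucket assigned, time classified).
History = Dict[str, Tuple[str, datetime]]


def expire_history(history: History, now: datetime, keep_days: int = 2) -> None:
    """Drop entries classified more than keep_days before `now`."""
    cutoff = now - timedelta(days=keep_days)
    # Collect ids first so the dict is not mutated while iterating over it.
    for msg_id in [m for m, (_, when) in history.items() if when < cutoff]:
        del history[msg_id]


def on_move(history: History, msg_id: str, destination: str) -> Optional[str]:
    """Return the bucket to retrain into, or None if the move triggers no training."""
    entry = history.get(msg_id)
    if entry is None:
        return None  # expired from the history: the move is ignored
    bucket, _ = entry
    return destination if bucket != destination else None
```

      With `now = datetime(2004, 12, 9)`, a message classified one day earlier survives `expire_history` and `on_move` returns its new bucket, while one classified ten days earlier is gone and `on_move` returns `None`.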

      I understand your point, though. UI-wise, this could be done with a single button, but the functionality behind the button would be a little tricky. I doubt that simply resetting the UIDNEXT value to 1 for the INBOX would do the trick. Or would it?
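      To see what resetting UIDNEXT would and wouldn't cover, here is a sketch that assumes the IMAP module remembers, per folder, the next UID it expects and only handles messages at or above it. `new_uids`, `reset_folder` and the `state` dictionary are hypothetical. IMAP UIDs are non-zero and ascending within a folder, so a stored value of 1 makes every message look new again.

```python
from typing import Dict, List


def new_uids(folder_uids: List[int], stored_uidnext: int) -> List[int]:
    """UIDs the filter would pick up: everything at or above the stored UIDNEXT."""
    return [uid for uid in sorted(folder_uids) if uid >= stored_uidnext]


def reset_folder(state: Dict[str, int], folder: str) -> None:
    """Hypothetical 'reprocess folder' button: forget how far we got in `folder`."""
    state[folder] = 1  # the lowest possible IMAP UID, so nothing is skipped
```

      With `state = {"INBOX": 105}` and UIDs `[98, 101, 104, 105, 106]`, only 105 and 106 are picked up; after `reset_folder(state, "INBOX")` all five are. This only covers selecting the messages. Whether they would then be classified fresh or treated as reclassifications depends on how the history matches them, which is the open question.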

      Manni

       
    • James E Lang

      James E Lang - 2004-12-18

      David,

      You did not say but I am gathering that you do not have a separate folder for unclassified but rather leave it in INBOX. Am I correct? I am also guessing that the folder to which you moved the unclassified messages was one of your bucket folders.

      If you have a folder for unclassified messages, you don't need to move those messages except to classify them, which sounds like it would have accomplished what you had in mind. Another way to move messages without training is to move them to a folder you have never told POPFile about.
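      The point is that a move trains only when the destination folder stands for a bucket. A minimal sketch, assuming a hypothetical folder-to-bucket mapping; the function name and all folder and bucket names are made up for illustration.

```python
from typing import Dict, Optional


def bucket_to_train(folder_to_bucket: Dict[str, str], destination: str) -> Optional[str]:
    """Return the bucket that a move into `destination` would train, or None.

    Folders POPFile has never been told about are absent from the mapping,
    so moving a message into one of them trains nothing.
    """
    return folder_to_bucket.get(destination)


# Hypothetical configuration: IMAP folder name -> bucket it is the output folder for.
folders = {"Work": "work", "Personal": "personal", "Spam": "spam"}
```

      Here `bucket_to_train(folders, "Work")` gives `"work"`, while an unmapped `"Archive"` folder gives `None`.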

      Maybe I'm missing your point?

      --
      Jim

       
      • James E Lang

        James E Lang - 2004-12-18

        I'm sorry, I misread your message. Sure, you trained on those messages. I'm not sure that's really a problem but I see what you're saying.

        --
        Jim

         
