JunkMatcher / Discussion / Help: Minimizing whitelist growth

Ward Clark - 2007-02-28

I've been running JunkMatcher 1.6.1 since August 2006, and I finally hit the 200-line limit on the whitelist, which causes Python to consume available CPU cycles.

As recommended, I trimmed my whitelist back to about 140 lines, manually adding numerous email addresses to my Address Book. While I was performing this tedious task, it occurred to me that I might benefit from changing my Junk folder scanning process.

Until now, I've been scanning my Junk folder, dragging legitimate messages into my Inbox, and reading them there. I delete a good percentage of these messages after reading them. As I understand it, dragging to my Inbox adds to JunkMatcher's whitelist.

My hope is to find a technique that minimizes additions to the whitelist. For example, while I'm scanning my Junk folder, I could open a legitimate message, add "sender" to my Address Book, and delete the message. My hope is that this would (1) not touch the whitelist, and (2) cause future messages from "sender" to show up in my Inbox.

I realize I could experiment to find out if this technique works. However, most of my falsely classified messages are incidental ones from senders who won't send another message for weeks or months.

Insight from experienced JunkMatcher users will be appreciated.

- WitLi - 2007-08-15
  
  are you aware that you can use parentheses () and pipes | to get an "or" in a regex?
  
  so, if your whitelist is full, you COULD compact it tenfold or more by putting addresses in parentheses:
  
  before:
  Tina@Tina.com
  Petra@Petra.com
  Peter@Peter.com
  
  now:
  (Tina@Tina.com|Petra@Petra.com|Peter@Peter.com)
  
  (yeah, some of those characters, like the dots, still have to be escaped)
  
  have fun!
  
  WitLi
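
The escaping mentioned above can be left to Python's standard `re.escape` rather than done by hand. The following is only a minimal sketch of that idea: the helper name `combine_addresses` is made up for illustration, and the `(?i)` prefix is borrowed from the patterns posted elsewhere in this thread. Whether the result fits within JunkMatcher's length limit is not checked here.

```python
import re

def combine_addresses(addresses):
    """Join literal addresses into one case-insensitive alternation.

    Hypothetical helper, not part of JunkMatcher.
    """
    # re.escape neutralises regex metacharacters such as the dots.
    escaped = [re.escape(address) for address in addresses]
    return "(?i)(?:" + "|".join(escaped) + ")"

pattern = combine_addresses(
    ["Tina@Tina.com", "Petra@Petra.com", "Peter@Peter.com"]
)
compiled = re.compile(pattern)

# The escaped dot only matches a literal dot, so look-alikes are rejected.
matches_real = compiled.fullmatch("tina@tina.com") is not None
matches_lookalike = compiled.fullmatch("tina@tinaXcom") is not None
```

The resulting `pattern` string is what would be pasted into a single whitelist line in place of the three separate entries.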
  
- Ross Barkman - 2008-02-22
  
  I've set up groups, like so:
  
  A.com: (?i).+@(?:replies\.admiral|airberlin|airberlinmail|allume|apani|asiarooms)\.com
  B.com: (?i).+@(?:noreply\.bebo|contact\.britishairways|comms\.bt)\.com
  C.com: (?i).+@(?:cclondon|cd\-wow|computerweekly|cvent|cw)\.com
  
  That keeps the expression length shorter - there is a maximum length (can't remember what it is), so not repeating ".com" helps.
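
Groups like these could also be generated from a plain list of domains instead of being typed by hand. This is a rough sketch under some assumptions: the function name `group_by_suffix` is hypothetical, and domains are bucketed by the first letter of the label just before the suffix, which seems to be how the groups above are arranged. It does not check the maximum expression length, since that limit is not stated in this thread.

```python
import re
from collections import defaultdict

def group_by_suffix(domains, suffix=".com"):
    """Build one pattern per first letter for domains sharing `suffix`.

    Hypothetical helper, not part of JunkMatcher.
    """
    buckets = defaultdict(list)
    for domain in domains:
        if not domain.lower().endswith(suffix):
            continue  # other suffixes would need their own groups
        stem = domain[: -len(suffix)]
        if not stem:
            continue
        # "replies.admiral" is filed under "A" for "admiral".
        letter = stem.split(".")[-1][0].upper()
        buckets[letter].append(re.escape(stem))
    return {
        letter + suffix: "(?i).+@(?:" + "|".join(sorted(stems)) + ")"
        + re.escape(suffix)
        for letter, stems in buckets.items()
    }

patterns = group_by_suffix(
    ["replies.admiral.com", "airberlin.com", "comms.bt.com", "cd-wow.com"]
)
```

Each key in `patterns` (for example `"A.com"`) names a group, and the value is the expression for that whitelist line, with the shared suffix written only once.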