[Tess-developers] TheSpamSecretary COOKBOOK.html,NONE,1.1 README,1.1,1.2 COOKBOOK,1.1,NONE
From: <kw...@us...> - 2003-01-08 05:34:02
Update of /cvsroot/tess/TheSpamSecretary
In directory sc8-pr-cvs1:/tmp/cvs-serv23739
Modified Files:
README
Added Files:
COOKBOOK.html
Removed Files:
COOKBOOK
Log Message:
Wrote the COOKBOOK.html.
--- NEW FILE: COOKBOOK.html ---
<html>
<head>
<title>
TheSpamSecretary Cookbook
</title>
</head>
<body>
<center>
<h3>
TheSpamSecretary Cookbook
</h3>
</center>
So, you don't delete your read email? <a href="#procmail">Or you use procmail</a>, or some other MDA that expects filter agents to pipe to stdout and return an error condition? Or maybe <a href="#maildir">you use maildir?</a><br>
<br>
No problem.<br>
<pre>TheSpamSecretary.py --help</pre>
will display the command options available. Most commands may be run in one invocation, like the "standard forward" of<br>
<pre>"|TheSpamSecretary.py --filter --addgood=<path to deleted mail> --addbad=<path to deleted spam> --delete=1"
</pre>
This will filter the email coming in on standard input and deliver it to either the inbox or the spam box. It will also parse the contents of the deleted mail and deleted spam boxes and truncate those files.<br>
<br>
But you're here because that doesn't cover your use case.<br>
<br>
For the purposes of TheSpamSecretary, mail usage falls into 4 categories:
<ol>
<li><a href="#normal">You usually read and delete your mail as you're done with it.</a>
<li><a href="#hoard">You NEVER delete ANY mail.</a>
<li><a href="#save">You ONLY delete spam.</a>
<li><a href="#mixed">You always delete spam and delete some mail.</a>
</ol>
<a name="normal"><b>1. You delete mail.</b></a><br>
This is the basic use case. Just let TheSpamSecretary consume your delete boxes and you're set. <br>
<br>
<a name="hoard"><b>2. You hoard ALL mail.</b></a><br>
It is important that you never delete any mail; if you do delete some, skip to #4. Otherwise you may start to mislabel words from valid mail as words that are generally only found in spam. This is often the case if you subscribe to commercial announcements from only a few companies. These often look a lot like spam, and if you delete them, you're at risk of not recognizing their words as valid. If this matches your use case, you're set - as long as you sort out your spam. <br>
<br>
Your forward should look like this: <br>
<pre>"|TheSpamSecretary.py --filter"
</pre>
On a regular basis you should update your dictionaries with a cronjob. What you need to do is delete and regenerate them. Something like: <br>
<pre>rm ~/.TheSpamSecretary.gooddict; TheSpamSecretary.py --addgood=<path to good box> --delete=0
rm ~/.TheSpamSecretary.baddict; TheSpamSecretary.py --addbad=<path to bad box> --delete=0
</pre>
Note that the good box path can be a directory, in which case ALL the contents of that directory will be parsed (recursively). Note also that you can run multiple adds if you want to specify multiple boxes, for example when you have other files in your mailbox directories that should not be parsed. Something like:
<pre>find ~/mail/notspam/ -name "*.mbox" -exec TheSpamSecretary.py --addgood={} --delete=0 \;
</pre>
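A minimal sketch of the regeneration step as a shell function that a cron job could call. The function name rebuild_dict and the TESS variable are made up for illustration; only the dictionary paths and the --addgood/--addbad/--delete options come from the commands above.<br>

```shell
#!/bin/sh
# Hypothetical wrapper: TESS and rebuild_dict are illustrative names,
# not part of TheSpamSecretary itself.
TESS="${TESS:-TheSpamSecretary.py}"

# Rebuild one dictionary from scratch: remove the old file, then re-add
# every box given. --delete=0 leaves the boxes themselves untouched.
# usage: rebuild_dict <dict file> <--addgood|--addbad> <box>...
rebuild_dict() {
    dict=$1
    flag=$2
    shift 2
    rm -f "$dict"
    for box in "$@"; do
        "$TESS" "$flag=$box" --delete=0 || return 1
    done
}
```

Called as, say, rebuild_dict ~/.TheSpamSecretary.gooddict --addgood ~/mail/Read ~/mail/Lists (both box paths are placeholders), it runs one add per box, the same way the find example does.<br>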
<a name="save"><b>3. You keep ALL good mail and delete SPAM</b></a><br>
Review #2 - the Mail hoarder - for the warning about NOT discarding ANY valid email. If this suits you, you have a couple of options: <br>
<ol TYPE=a>
<li>Do the same as #2, but truncate the spam file
<li>Constantly truncate/update spam and update the good dict as in #2
</ol>
In either case, you will be updating your good dict on a regular basis. Check out #2, but in general something like a cron job doing<br>
<pre>
rm ~/.TheSpamSecretary.gooddict; TheSpamSecretary.py --addgood=<path to good box> --delete=0
</pre>
For case a.: <br>
You will be updating your spam dict on a regular basis, but you will be truncating that file. Something like a cron job doing: <br>
<pre>
TheSpamSecretary.py --addbad=<path to bad box> --delete=1
</pre>
NOTE that you are NOT deleting the existing baddict, but that you ARE truncating the spam mailbox. <br>
For case b.: <br>
Your forward will look something like this: <br>
<pre>
"|TheSpamSecretary.py --filter --addbad=<path to bad box> --delete=1"
</pre>
This will add words from your VerifiedSpam box and truncate it every time you receive mail. You will also have to update your gooddict as noted above (on a regular basis). <br>
<br>
<a name="mixed"><b>4. You keep most mail</b></a><br>
This is the trickiest to define, as it depends a lot on how you handle your read mail. Assuming you discard spam, you should probably use the forward described in case #3, above: <br>
<pre>
"|TheSpamSecretary.py --filter --addbad=<path to bad box> --delete=1"
</pre>
Maintaining your gooddict is trickier. If you delete some mail, but move most of your read email to a "Read" box, you could go a couple of routes. If you have already accumulated a lot of email, you could just use the standard route of truncating both your good and bad deleted mboxes, and when you start, manually add all your current good email using <br>
<pre>TheSpamSecretary.py --addgood=<path to good box> --delete=0
</pre>
For each of your good boxes. This should probably be good enough to keep TeSS running smoothly indefinitely. If you're willing to do a little more work to keep up with valid email that has been moved to your 'Read' box, you could do something like <br>
<pre>cat <path to Read box> >> <path to Read.parsed> && TheSpamSecretary.py --addgood=<path to Read box> --delete=1
</pre>
This will append your Read box to a Read.parsed box, then TheSpamSecretary will truncate your Read box. Note that using the cat command may not work for all mailbox formats/imap server/whatever - you should certainly test this before truncating your Read box!<br>
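Here is a hedged sketch of that archive-then-truncate step as a shell function. The names archive_and_learn and TESS are invented for this example. It does no mailbox locking, so mail moved into the Read box between the cat and the truncate could be lost; treat it as a starting point, not a finished tool.<br>

```shell
#!/bin/sh
# Hypothetical helper: TESS and archive_and_learn are illustrative names.
TESS="${TESS:-TheSpamSecretary.py}"

# Append the Read box to an archive, then let TheSpamSecretary parse and
# truncate the Read box. The add only runs if the append succeeded.
# usage: archive_and_learn <Read box> <Read.parsed>
archive_and_learn() {
    box=$1
    archive=$2
    # Nothing to do for a missing or empty box.
    [ -s "$box" ] || return 0
    cat "$box" >> "$archive" || return 1
    "$TESS" --addgood="$box" --delete=1
}
```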
<br>
<center>
<font size=+2> <a name="procmail">So you use Procmail (or some other chaining MDA)</a> </font>
</center>
<br>
If you set your inboxpath in your .TheSpamSecretary.config file to -
<pre>inboxpath = -</pre>
Non spam mail will be piped to stdout instead of delivered to a file. If you set your spamboxpath in your .TheSpamSecretary.config file to -
<pre>spamboxpath = -</pre>
Spam mail will be piped to stdout as well, AND TheSpamSecretary will return an exit code of 1, as opposed to 0 for non-spam. If you can't allow TheSpamSecretary to exit with code 1, I recommend you specify a tempfile for spam delivery and rm -f the file before TheSpamSecretary and cat it after TheSpamSecretary is done.<br>
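To make the exit code contract concrete, here is a rough sketch of a wrapper that an MDA could pipe mail into. It assumes both inboxpath and spamboxpath are set to - in the config, so the message always comes back on stdout, and that exit code 1 means spam as described above. The names route_message and TESS are hypothetical, and the sketch does no mailbox locking.<br>

```shell
#!/bin/sh
# Hypothetical wrapper: TESS and route_message are illustrative names.
TESS="${TESS:-TheSpamSecretary.py}"

# Read one message on stdin, run the filter, and append the filter's
# output to the inbox (exit 0) or the spam box (exit 1).
# usage: route_message <inbox file> <spam file>
route_message() {
    inbox=$1
    spam=$2
    out=$(mktemp) || return 1
    "$TESS" --filter > "$out"
    status=$?
    case $status in
        0) cat "$out" >> "$inbox"; status=$? ;;
        1) cat "$out" >> "$spam"; status=$? ;;
    esac
    rm -f "$out"
    # Any other filter exit code is passed back to the caller unchanged.
    return "$status"
}
```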
<br>
<center>
<font size=+2> <a name="maildir">So you use maildir format</a> </font>
</center>
<br>
Hrm. Well, I haven't used maildir since around 1988, though I loved the format. It turns out that the clients I use and the imapd server I use do not support it. Yes, I know that ones that do are available.<br>
Here's the good news: TheSpamSecretary should work fine with maildir. Here's the bad news: I don't know how it works, exactly. Specifically, I don't know how maildir MDAs work. I assume they are like procmail in that they chain through stdout. You should read the Procmail notes if that is the case. I don't know how you deliver spam to a different maildir than non-spam, either. I'm hoping that the exit status will allow you to do that. I would appreciate feedback from ANYONE willing to test this stuff out!<br>
More good news: TheSpamSecretary handles maildir directories fine, from a reading perspective. If you specify a directory in your TheSpamSecretary commands:<br>
<pre>"|TheSpamSecretary.py --filter --addgood=/home/YOURNAME/mail/Deleted/ --addbad=/home/YOURNAME/mail/VerifiedSpam/ --delete=1"
</pre>
TheSpamSecretary will parse all the files (recursively) in those directories.<br>
<br>
<font color=red>WARNING!!!</font><br>
<b>--delete=1 WILL RECURSIVELY DELETE ALL FILES IN MAILDIR MODE.</b><br>
<font color=red>WARNING!!!</font><br>
<br>
That is, all the files in /home/YOURNAME/mail/Deleted/ and /home/YOURNAME/mail/VerifiedSpam/ <b>WILL BE DELETED. RECURSIVELY.</b> If you do something like soft link your home directory into your Deleted mail directory, TheSpamSecretary will delete all your files. ALL YOUR FILES. Hell, if you soft link / to your Deleted mail directory, TheSpamSecretary will delete YOUR WHOLE SYSTEM (as much as it can, and if your MDA runs as root for some reason (misconfigured, etc), it will DELETE YOUR WHOLE SYSTEM). Is that clear? DELETE YOUR FILES. It MAY BE that you would like to use --delete=0 and clean up the files to be deleted some other way. You should look at the various recipes above for notes on how you could manage your Deleted and VerifiedSpam directories. Probably you want to write a cron script that parses those directories and then deletes them using some maildir command that is smart about not DELETING ALL YOUR FILES because of a symlink.
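<br>
A minimal sketch of what such a cron script could look like, using plain find instead of a maildir-specific command. The names learn_and_purge and TESS are made up for this example. It relies on find not following symlinks by default, and it assumes nothing is delivered into the directory between the parse and the delete (anything that arrives in that window is deleted unparsed).<br>

```shell
#!/bin/sh
# Hypothetical helper: TESS and learn_and_purge are illustrative names.
TESS="${TESS:-TheSpamSecretary.py}"

# Parse a maildir with --delete=0, then remove only regular files below it.
# usage: learn_and_purge <--addgood|--addbad> <maildir>
learn_and_purge() {
    flag=$1
    dir=${2%/}   # strip one trailing slash; "/" becomes empty and is refused
    # Refuse a missing directory or one that is itself a symlink.
    [ -d "$dir" ] && [ ! -L "$dir" ] || return 1
    "$TESS" "$flag=$dir" --delete=0 || return 1
    # find does not descend into symlinked directories by default, and
    # -type f does not match the symlinks themselves.
    find "$dir" -type f -exec rm -f {} +
}
```

Something like learn_and_purge --addbad /home/YOURNAME/mail/VerifiedSpam/ would then stand in for --addbad with --delete=1.<br>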
</body>
</html>
Index: README
===================================================================
RCS file: /cvsroot/tess/TheSpamSecretary/README,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** README 7 Jan 2003 04:55:19 -0000 1.1
--- README 8 Jan 2003 05:33:59 -0000 1.2
***************
*** 7,11 ****
every time you receive new mail!
! If this does not match your usage pattern, check out the COOKBOOK
To install if you don't use procmail!!!:
--- 7,11 ----
every time you receive new mail!
! If this does not match your usage pattern, check out the COOKBOOK.html
To install if you don't use procmail!!!:
***************
*** 48,52 ****
again and change --delete=0 to --delete=1 . THIS WILL CONSUME YOUR Deleted
MAILBOX AND VerifiedSpam MAILBOX EVERY TIME YOU RECEIVE MAIL. If this is not
! the behavior you want, please check out the COOKBOOK.
If you did not have a store of deleted mail to be consumed, I recommend copying
--- 48,52 ----
again and change --delete=0 to --delete=1 . THIS WILL CONSUME YOUR Deleted
MAILBOX AND VerifiedSpam MAILBOX EVERY TIME YOU RECEIVE MAIL. If this is not
! the behavior you want, please check out the COOKBOOK.html.
If you did not have a store of deleted mail to be consumed, I recommend copying
--- COOKBOOK DELETED ---