#55 Create a unique but deterministic signature

open
nobody
None
5
2010-01-25
2010-01-25
Enrico Scholz
No

I am going to use 'dspam' from within a milter, where a message is often checked for multiple recipients. Currently, every recipient gets a different signature, and the message must be cloned for every recipient to add the corresponding signature. Things are even worse when two recipients are resolved to the same user by an external lookup: the message is then processed twice by 'dspam', but the user is given only one signature.

Hence, it would be nice if a message always produced the same signature. Returning a SHA-1-based HMAC (see RFC 2104) of the message, keyed with a configurable secret, would be perfect.
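A minimal sketch of the proposal, assuming the raw message bytes as the milter sees them are hashed as-is and the key comes from a hypothetical site-wide configuration setting. It uses only Python's standard `hmac` and `hashlib` modules. This is not how DSPAM generates signatures today; it only illustrates an HMAC-SHA1 (RFC 2104) value that is identical for every recipient of the same message.

```python
import hashlib
import hmac


def message_signature(raw_message: bytes, secret: bytes) -> str:
    """Deterministic signature: HMAC-SHA1 (RFC 2104) of the message.

    The same message and the same secret always give the same
    40-character hex string, so every recipient can be told one value.
    """
    return hmac.new(secret, raw_message, hashlib.sha1).hexdigest()


if __name__ == "__main__":
    # Hypothetical secret; in practice it would come from the configuration.
    secret = b"configurable-secret"
    raw = b"Subject: ...\r\n\r\nSome message\r\n"
    for rcpt in ("foo@example.com", "bar@example.com"):
        # Both recipients print the identical signature.
        print(rcpt, message_signature(raw, secret))
```

Using a keyed HMAC instead of a plain SHA-1 hash means that someone without the secret cannot compute a valid signature for an arbitrary message.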

Discussion

  • Stevan Bajic
    2010-02-19

    Hello Enrico,

    What issue do you expect to solve with a single signature per message? The current database schema cannot attach multiple UIDs to one signature.

    --
    Kind Regards from Switzerland,

    Stevan Bajić

     
  • Enrico Scholz
    2010-02-19

    I want users to be able to force dspam to relearn a message. Relearning requires knowing the signature, but because the signature differs for every recipient, it cannot be added to the e-mail headers. Hence, users have no way to relearn a message.

     
  • Stevan Bajic
    2010-02-19

    Why can't a DSPAM signature (as it is today) be added to the headers? In DSPAM each user normally has his own storage, and training/retraining with a signature switches the tokens for that user.

    If you want the signature to stay the same for a message, just use one DSPAM user to classify/process the message, then deliver from your milter to each user, adding the same DSPAM signature to the header, and set your training alias to run under the DSPAM user you used when classifying/processing the message. Alternatively, use something like shared groups in DSPAM.

    If I understand you correctly, your goal is to have just one signature per mail, and you don't mind if the data is stored multiple times (once per user) inside the DSPAM database, as long as the signature stays the same. Right? Adding something like that could be possible, but features such as the UID in the signature would then no longer work.
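    To make the storage point concrete, here is a toy in-memory model of what a deterministic signature implies. It is not DSPAM's actual schema, and the class and method names are hypothetical: because foo and bar would share one signature string, the stored data has to be keyed by the pair (user ID, signature) rather than the signature alone, and the user ID can no longer be recovered from the signature.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SignatureStore:
    """Toy per-user signature storage keyed by (uid, signature)."""

    rows: dict[tuple[int, str], bytes] = field(default_factory=dict)

    def store(self, uid: int, signature: str, data: bytes) -> None:
        # One row per user, even though the signature string is shared.
        self.rows[(uid, signature)] = data

    def lookup(self, uid: int, signature: str) -> Optional[bytes]:
        # The caller must already know the uid; it is not encoded in the signature.
        return self.rows.get((uid, signature))


if __name__ == "__main__":
    store = SignatureStore()
    sig = "effcdf6ae5eb2fa2d27416d5f184df9c259a7c79"  # same value for both users
    store.store(1, sig, b"data for foo")
    store.store(2, sig, b"data for bar")
    print(store.lookup(2, sig))  # b'data for bar'
```

    This is why a training alias would need to know which DSPAM user to retrain for: the signature alone no longer identifies the row.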

     
  • Enrico Scholz
    2010-02-19

    > Why can a DSPAM signature (as it is today) not be added into the headers?

    All local recipients which were given as RCPT: will get exactly the
    same e-mail (header + body). Per-user signatures violate this.

    > then deliver from your Milter to each user

    Milters do not deliver mail; they process it while the MTA receives
    it (e.g. 'dspam' within the milter classifies the mail before the MTA
    gives the final response to DATA ('250 OK' or a rejection due to spamminess)).

    > set your training alias to be executed under the DSPAM user you
    > used when classifying/processing the message. Or use something like
    > shared groups in DSPAM.

    AFAIR, one of dspam's basic ideas is that spam filtering should be
    applied per user.

    > If I understand you right then your goal is to have just one signature
    > per mail and you don't care if inside the DSPAM database the data is
    > saved multiple times (for each user once) as long as the signature
    > stays the same. Right?

    afaik, 'dspam' stores the set of tokens within an e-mail at a place
    which is associated with the signature. This set of tokens depends
    only on the e-mail but not the recipients, doesn't it?

    E.g. the set of tokens for the e-mail sent as

    | MAIL FROM:<postmaster@example.com>
    | 250 OK
    | RCPT TO:<foo@example.com>
    | 250 OK
    | RCPT TO:<bar@example.com>
    | 250 OK
    | DATA
    | 354 Start mail input
    | Subject: ...
    |
    | Some message
    | .
    | 250 OK

    will be the same for 'foo@example.com' and for 'bar@example.com'.

    Each element of this set of tokens will be inserted into a user
    specific database, and the spam/innocent counters will be incremented.

    For retraining, the signature is used to lookup the set of tokens and
    the counters in the user database will be reverted/corrected.

    Hence, there are two datasets: the tokens which are common for all
    recipients and the classification of the tokens which is user specific.
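    The two-dataset model described above can be sketched as a toy model. It follows the proposal as stated in this message (class and method names are hypothetical) and is not a description of DSPAM internals; the following reply explains why real DSPAM behaviour differs (whitelisting, training modes, pristine mode).

```python
from __future__ import annotations

from collections import defaultdict

SPAM, INNOCENT = 0, 1  # indices into a per-token [spam, innocent] counter pair


class TwoDatasetModel:
    """Shared token sets per signature, per-user counters (toy model)."""

    def __init__(self) -> None:
        # Common to all recipients: signature -> set of tokens in the e-mail.
        self.tokens_by_sig: dict[str, frozenset[str]] = {}
        # User specific: uid -> token -> [spam_count, innocent_count].
        self.counters: defaultdict[int, defaultdict[str, list[int]]] = defaultdict(
            lambda: defaultdict(lambda: [0, 0])
        )

    def train(self, uid: int, sig: str, tokens: list[str], is_spam: bool) -> None:
        # The first recipient stores the token set; later ones reuse it.
        self.tokens_by_sig.setdefault(sig, frozenset(tokens))
        for tok in self.tokens_by_sig[sig]:
            self.counters[uid][tok][SPAM if is_spam else INNOCENT] += 1

    def retrain(self, uid: int, sig: str, now_spam: bool) -> None:
        # Look up the shared tokens by signature, then move only this user's counts.
        src, dst = (INNOCENT, SPAM) if now_spam else (SPAM, INNOCENT)
        for tok in self.tokens_by_sig[sig]:
            pair = self.counters[uid][tok]
            pair[src] = max(0, pair[src] - 1)
            pair[dst] += 1


if __name__ == "__main__":
    m = TwoDatasetModel()
    m.train(1, "sig", ["some", "message"], is_spam=False)  # foo
    m.train(2, "sig", ["some", "message"], is_spam=False)  # bar
    m.retrain(2, "sig", now_spam=True)  # only bar reports spam
    print(m.counters[1]["some"], m.counters[2]["some"])  # [0, 1] [1, 0]
```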

     
  • Stevan Bajic
    2010-02-20

    > All local recipients which were given as RCPT: will get exactly the
    > same e-mail (header + body). Per-user signatures violate this.
    >
    That is not always the case. For example, every mail I receive has a "Delivered-To" header, and it is not the same for every recipient of a mail.
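    If per-recipient headers such as "Delivered-To" can be present, a deterministic signature would have to be computed over a canonical form of the message that leaves them out. Below is a sketch using Python's standard `email` package; the set of excluded headers is a hypothetical choice, and re-serialising the parsed message may normalise some formatting, which does no harm as long as it is applied identically to every copy.

```python
import email
import hashlib
import hmac
from email import policy

# Hypothetical set of headers that may differ between recipients.
PER_RECIPIENT_HEADERS = ("Delivered-To", "X-Original-To")


def canonical_bytes(raw: bytes) -> bytes:
    """Serialise the message without its per-recipient headers."""
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    for name in PER_RECIPIENT_HEADERS:
        del msg[name]  # removes every occurrence; no error if absent
    return msg.as_bytes()


def recipient_independent_signature(raw: bytes, secret: bytes) -> str:
    # HMAC-SHA1 over the canonical form instead of the raw bytes.
    return hmac.new(secret, canonical_bytes(raw), hashlib.sha1).hexdigest()


if __name__ == "__main__":
    secret = b"configurable-secret"  # hypothetical
    to_foo = b"Delivered-To: foo@example.com\nSubject: hi\n\nSome message\n"
    to_bar = b"Delivered-To: bar@example.com\nSubject: hi\n\nSome message\n"
    # Same signature despite the differing Delivered-To header.
    print(recipient_independent_signature(to_foo, secret)
          == recipient_independent_signature(to_bar, secret))  # True
```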

    > Milters do not deliver mail; they process it while the MTA receives
    > it (e.g. 'dspam' within the milter classifies the mail before the MTA
    > gives the final response to DATA ('250 OK' or a rejection due to spamminess)).
    >
    Okay

    > AFAIR, one of dspam's basic ideas is that spam filtering should be
    > applied per user.
    >
    Not many things in DSPAM are mandatory. You can, but you don't need to.

    > afaik, 'dspam' stores the set of tokens within an e-mail at a place
    > which is associated with the signature.
    >
    AND with a DSPAM user ID.

    > This set of tokens depends
    > only on the e-mail but not the recipients, doesn't it?
    >
    What do you mean by that? I don't understand. Can you rephrase it?

    > E.g. the set of tokens for the e-mail sent as
    > | MAIL FROM:<postmaster@example.com>
    > | 250 OK
    > | RCPT TO:<foo@example.com>
    > | 250 OK
    > | RCPT TO:<bar@example.com>
    > | 250 OK
    > | DATA
    > | 354 Start mail input
    > | Subject: ...
    > |
    > | Some message
    > | .
    > | 250 OK
    >
    > will be the same for 'foo@example.com' and for 'bar@example.com'.
    >
    No, it will not necessarily be the same. The reason it might differ is DSPAM's whitelisting feature. Most of the tokens will be the same, but whitelisting can cause a number of tokens to differ between foo and bar.

    > Each element of this set of tokens will be inserted into a user
    > specific database, and the spam/innocent counters will be incremented.
    >
    Definitely not. Assume the mail is HAM, that you run a training mode other than TEFT, that the mail is correctly classified as HAM for foo but as SPAM for bar, that foo does not retrain the message as SPAM, and that bar retrains it as HAM. Then only bar's tokens are modified; nothing changes in foo's token set.
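    A simplified model of this scenario, assuming only two of DSPAM's training modes: TEFT (train on everything) and TOE (train on error). Other modes such as TUM are not modelled, and the function name is hypothetical. It counts how often one user's token counters are written for a single message.

```python
from enum import Enum


class TrainingMode(Enum):
    TEFT = "teft"  # learn every message as it is classified
    TOE = "toe"    # learn only when a misclassification is retrained


def token_writes(mode: TrainingMode, misclassified: bool, retrained: bool) -> int:
    """How many times one user's token counters change for one message."""
    # Under TEFT the message is learned once at classification time.
    writes = 1 if mode is TrainingMode.TEFT else 0
    if misclassified and retrained:
        writes += 1  # the retraining corrects this user's counters
    return writes


if __name__ == "__main__":
    mode = TrainingMode.TOE
    print("foo:", token_writes(mode, misclassified=False, retrained=False))  # 0
    print("bar:", token_writes(mode, misclassified=True, retrained=True))    # 1
```

    Under TOE, foo's counters are never touched for this message, which is why the per-user token data diverges even though both users received the same mail.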

    > For retraining, the signature is used to lookup the set of tokens and
    > the counters in the user database will be reverted/corrected.
    >
    This is not true.

    1) One could run DSPAM in pristine mode; then the tokens are not saved in dspam_signature_data (assuming you use an SQL-based backend in DSPAM).

    2) If you don't run pristine mode, the degenerated mail can be found in dspam_signature_data. This is not necessarily the whole mail. Suppose you have set your database to allow only 4MB of data in dspam_signature_data and the whole mail was 8MB: when you retrain, DSPAM reads the degenerated mail from dspam_signature_data (only the first 4MB), TOKENIZES that data, and the resulting tokens are then switched/added in dspam_token_data.
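    The retraining path described in 1) and 2) can be sketched as a toy model. This is not DSPAM code: the whitespace tokenizer and function names are hypothetical, `None` stands for "nothing stored" (pristine mode), and the size limit is a parameter so that the truncation is easy to see with small inputs.

```python
from typing import Optional, Set

STORAGE_LIMIT = 4 * 1024 * 1024  # e.g. a 4MB limit on the stored signature data


def stored_signature_data(message: bytes, pristine: bool,
                          limit: int = STORAGE_LIMIT) -> Optional[bytes]:
    """What retraining will later find: nothing in pristine mode, else a prefix."""
    if pristine:
        return None
    # Only the first `limit` bytes survive; a token cut at the limit becomes partial.
    return message[:limit]


def tokenize(data: bytes) -> Set[str]:
    # Hypothetical whitespace tokenizer; DSPAM's real tokenizer is far richer.
    return set(data.decode("utf-8", errors="replace").split())


def tokens_for_retraining(stored: Optional[bytes]) -> Set[str]:
    """Retraining re-tokenizes whatever was stored, not the original mail."""
    if stored is None:
        raise LookupError("no signature data stored (pristine mode)")
    return tokenize(stored)


if __name__ == "__main__":
    mail = b"alpha beta gamma delta"
    stored = stored_signature_data(mail, pristine=False, limit=10)
    print(sorted(tokens_for_retraining(stored)))  # ['alpha', 'beta']
```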

    > Hence, there are two datasets: the tokens which are common for all
    > recipients and the classification of the tokens which is user specific.
    >
    This is not 100% true. You are forgetting pristine mode. And since it is an option you can turn on/off on a per-user basis (if you use the preference extension), you can't say with 100% certainty (from outside DSPAM) that users foo AND bar will both have their (common) dataset in dspam_signature_data.