Question on reporting

I recently subscribed to this list, not because I'm a pyzor user (yet),
but because I'm trying to understand how it works.

I was trying to find out how this email tagging and reporting system
works, in general, from the razor-agent mailing list, but they keep a
rather closed door.  That, together with the apparent attention paid to
the commercial product at the expense of their GPLed product, leaves
me nonplussed.

So here I am and here's my question:

From what I can find, reporting spam generally consists of
turning the BODY of the message into an MD5-type hash string and
reporting that signature.
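
To make that concrete, here is a minimal sketch in Python of hashing a
message BODY. It assumes the body is simply everything after the first
blank line and that no decoding or normalization is done; the function
name body_signature is made up for illustration, and this is not
necessarily what pyzor or razor actually do.

```python
import hashlib


def body_signature(raw_message: str) -> str:
    """Return the MD5 hex digest of the raw, undecoded message body.

    Assumption: the body is whatever follows the first blank line,
    which is where the headers end in an ordinary mail message.
    """
    # Treat CRLF and LF line endings alike so the split is reliable.
    normalized = raw_message.replace("\r\n", "\n")
    _headers, _separator, body = normalized.partition("\n\n")
    return hashlib.md5(body.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    sample = "From: someone@example.com\nSubject: hi\n\nBuy now!\n"
    print(body_signature(sample))
```

Two copies of the same spam that differ only in their headers would get
the same signature under this scheme.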

So I played with it a bit.  Of the 3780 spams in my archive, I found
90 that actually held my name and/or email address (or some part
thereof).  This was the only quick way I could see whether the actual
BODY had been customized for my delivery (aren't they thoughtful!).
This works out to ~2.4% of my received spam.
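
The check described above can be sketched roughly as follows, in
Python. The function name personalized_count and the marker strings are
hypothetical, and the match is a plain case-insensitive substring
search on the raw body. The last lines redo the arithmetic for 90 hits
out of 3780 messages.

```python
def personalized_count(bodies, markers):
    """Count bodies that contain any marker (case-insensitive substring)."""
    lowered = [marker.lower() for marker in markers]
    return sum(
        1 for body in bodies
        if any(marker in body.lower() for marker in lowered)
    )


def percentage(hits, total):
    """Return hits as a percentage of total (0.0 for an empty archive)."""
    return 100.0 * hits / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample bodies and markers, for illustration only.
    bodies = ["Dear user@example.com, act now", "Generic offer inside"]
    print(personalized_count(bodies, ["user@example.com", "user"]))
    # 90 hits out of 3780 messages:
    print(round(percentage(90, 3780), 1))
```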

I did not do anything to strip HTML, MIME-decode, uudecode, or
anything like that.  A message in any form would still hash to a unique
ID, and I wasn't trying to be that exact.

So, I guess my question is: how do you compensate when reporting spam
that has a unique tag included in the body, so that it still provides
some degree of spam identification that is worth sharing?
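
To illustrate the problem being asked about, here is a small Python
sketch. It only shows that two bodies which differ in nothing but a
per-recipient tag produce different MD5 digests; the template text and
the addresses are invented, and it says nothing about how pyzor or
razor deal with this.

```python
import hashlib

# Hypothetical spam template with one per-recipient field.
template = "Dear {recipient}, you have won a prize. Click here to claim it.\n"

body_a = template.format(recipient="alice@example.com")
body_b = template.format(recipient="bob@example.com")

digest_a = hashlib.md5(body_a.encode("utf-8")).hexdigest()
digest_b = hashlib.md5(body_b.encode("utf-8")).hexdigest()

print(digest_a)
print(digest_b)
# The two digests differ, so a report of one copy would not match the other.
print(digest_a == digest_b)
```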