POPFile - Automatic Email Classification / Discussion / Open Discussion: A way of achieving a "Corpus attack"?

Olivier Guillion - 2006-04-22

Hi,

There seems to be a way (not used yet, fortunately) for spammers to conduct a corpus attack, i.e. to cause a significant proportion of their messages to be misclassified as non-spam, so that reclassifying those messages corrupts the corpus.

This has been discussed off-topic in this thread:
http://sourceforge.net/forum/forum.php?thread_id=1479375&forum_id=213099

Here is the principle of such an attack:

- Most common English words are present in any reasonably well-trained corpus.

- At least 30% of these words have a higher "non-spam" word count than "spam" word count. This percentage still needs to be measured more accurately (see below).

- Any non-spam word repeated many times (say, 1000 times) in a message will cause that message to be classified as "non-spam". POPFile's accuracy on such messages would then drop to only about 100 - 30 = 70%, instead of the usual 99.9%.

- If the user reclassifies such a message, the chosen common English word gets a very high "spam" word count. If this happens often, it could corrupt the corpus: many common terms would become strong spam indicators, causing regular non-spam messages to be misclassified.
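To make the "word wall" mechanism concrete, here is a minimal sketch of a multinomial naive Bayes scorer in which every occurrence of a word counts. It is a simplified model, not POPFile's actual code: the two buckets, their word counts and the `UNSEEN` floor for never-seen words are all invented for illustration, and equal bucket priors are assumed.

```python
import math
from collections import Counter

# Hypothetical per-bucket word counts, invented for illustration (not real POPFile data).
CORPUS = {
    "spam": Counter({"viagra": 500, "cheap": 300, "time": 40, "water": 10}),
    "ham": Counter({"meeting": 400, "time": 200, "water": 60, "cheap": 5}),
}
UNSEEN = 1e-6  # assumed probability for a word a bucket has never seen


def word_prob(bucket, word):
    """P(word | bucket) estimated from raw counts, with a floor for unseen words."""
    counts = CORPUS[bucket]
    total = sum(counts.values())
    return counts[word] / total if counts[word] else UNSEEN


def classify(words):
    """Pick the bucket with the highest summed log-probability (equal priors assumed).

    Every occurrence counts, so a word repeated n times contributes n * log P(word | bucket).
    """
    tally = Counter(words)
    scores = {
        bucket: sum(n * math.log(word_prob(bucket, w)) for w, n in tally.items())
        for bucket in CORPUS
    }
    return max(scores, key=scores.get)


spam_body = ["viagra", "cheap"] * 5
print(classify(spam_body))                     # spam
print(classify(spam_body + ["water"] * 1000))  # ham: the repeated word dominates
```

Because "water" is slightly more frequent in the ham bucket, 1000 copies of it outweigh the few genuinely spammy words.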

One solution could be to make POPFile ignore occurrences of a single word in the same message beyond a given number (say, 10).
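A minimal sketch of that per-message cap, assuming the tokenizer hands the classifier a plain list of words. The function name and the cap of 10 are illustrative only; this is not POPFile's tokenizer.

```python
from collections import Counter

MAX_OCCURRENCES = 10  # the cap proposed above; the best value is an open tuning question


def capped_counts(words, cap=MAX_OCCURRENCES):
    """Count words in one message, ignoring occurrences of any word beyond `cap`."""
    return Counter({w: min(n, cap) for w, n in Counter(words).items()})


wall = ["water"] * 1000 + ["viagra", "cheap"] * 5
counts = capped_counts(wall)
print(counts["water"], counts["viagra"])  # 10 5
```

Fed into a scorer that sums count times log-probability, the wall of 1000 copies now weighs no more than ten, so the other words in the message get their say again. Applying the same cap when a message is reclassified would also limit how far one message can shift a word's counts.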

I don't have a program (yet) to check what percentage of common English words are considered non-spam in a given corpus, but knowing this number precisely would help.

So here is a list of 100 common English words. Please send a message containing this list to yourself. Then, in the POPFile history, count how many of the words fall into the "non-spam" bucket, and post your result here.
If somebody knows how to do this automatically with an SQL query, please say so :)
Any opinion, comment or question is also welcome!

For me, the result is:
- "spam": 52%
- "non-spam": 47%
- unknown in corpus: 1%

Here is the sample word list:
---------------
they one hot word what some other put use how
said each which their time way about many then them
write like these long make thing see two look more
day come number sound most people over know water than
call first who down side now find new work part
take get place made live where after back little only
round man year came show every good give under name
very through just sentence great think say help low line
differ turn cause much mean before move right boy old
too same tell set three want air well play end
---------------
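This is not the SQL query asked for, since POPFile's database schema isn't shown in this thread. It is a sketch of the same count, assuming the per-bucket word counts have already been exported into Python `Counter` objects. It treats a word as belonging to whichever bucket gives it the highest per-word probability, which is roughly what the colouring in the history view shows; that rule is an assumption, and POPFile's exact colouring logic may differ.

```python
from collections import Counter


def word_list_breakdown(word_list, corpus, spam_buckets=("spam",)):
    """Percentage of words whose most probable bucket is spam, non-spam, or none.

    `corpus` maps bucket name -> Counter of word counts (a hypothetical export,
    not POPFile's own data structure).
    """
    totals = {bucket: sum(counts.values()) for bucket, counts in corpus.items()}
    result = Counter()
    for word in word_list:
        # P(word | bucket) for every bucket that has seen the word at least once
        probs = {b: corpus[b][word] / totals[b] for b in corpus if corpus[b][word]}
        if not probs:
            result["unknown"] += 1
        elif max(probs, key=probs.get) in spam_buckets:
            result["spam"] += 1
        else:
            result["non-spam"] += 1
    return {label: 100.0 * n / len(word_list) for label, n in result.items()}


corpus = {
    "spam": Counter({"time": 5, "call": 50}),
    "ham": Counter({"time": 30, "water": 10}),
}
print(word_list_breakdown(["time", "call", "water", "sentence"], corpus))
# {'non-spam': 50.0, 'spam': 25.0, 'unknown': 25.0}
```

People with several spam buckets can list them all in `spam_buckets`.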

Thanks!

Olivier

- Brian Smith - 2006-04-22
  
  My corpus:
  
  "spam": 21%
  "non-spam": 78%
  unknown in corpus: 1%
  
  Brian
  
  - Wm - 2006-04-22
    
    Hi Olivier and Brian, just so we are all working on the same basis, would you explain exactly what you are doing to arrive at these percentages?
    
    Olivier sent me his list by e-mail and it arrived in my bulk bucket (which I use for wanted newsletters, informative broadcast e-mail, etc).
    
    My PF Decision chart shows it was a close call (922 vs 919) whether it went into the bulk or ok bucket (ok is for more personal stuff). My spam bucket rating was 883 and never featured in the decision chart.
    
    The point I am making is that the percentages aren't worth anything unless we each calculate them the same way. I'll have a look to see if I can come up with a sensible SQL query after posting this, that way we could all use the same thing.
    
    It is an interesting experiment and I am not trying to put Olivier off but my gut feeling at the moment is still that Olivier's list (and idea in general) is more appropriate for people that get more English language spam than English language normal e-mail.
    
    --
    Wm ...
    
    - Texas Fett - 2006-04-22
      
      >>more appropriate for people that get more English language spam than English language normal e-mail.<<
      
      Doesn't that include most everyone including those who speak only English? I certainly get more spam in English than I get legitimate mail.
      
      - Wm - 2006-04-22
        
        In reply to JosephC:
        
        Yes, most spam is in English; certainly most of the spam I receive is at least an *attempt* at English :) Some are less successful than others.
        
        My main ISP offers me the choice of Brightmail filtering (which I have chosen to use) so a lot of junk is removed before I see it.
        
        My secondary ISP classifies (because I asked them to) in the Subject: things they think spam.
        
        My tertiary ISP doesn't classify at all, but that is a private address.
        
        The point I am making is that words common to Olivier in spam are also very common to me in normal and wanted e-mail and if Brian's sample can be seen as analogous he finds Olivier's spam words common (i.e. not spam) too.
        
        Olivier is, I think, looking at this from a non-English-speaking POV (and I am not criticising him for that).
        
        --
        Wm ...
        
        
        Olivier Guillion - 2006-04-23
        
        Hi, WM,
        
        You wrote:
        > The point I am making is that words common to
        > Olivier in spam are also very common to me in
        > normal and wanted e-mail and if Brian's sample
        > can be seen as analogous he finds Olivier's spam
        > words common (i.e. not spam) too.
        > Olivier is, I think, looking at this from a
        > non-English speaking POV (and I am not criticising
        > him for that).
        
        I'm afraid you misunderstood the principle I tried to explain (maybe due to my poor English skills). Please read my explanations again, and feel free to ask questions if anything is unclear; I won't feel offended at all.
        To clarify: the more English words your corpus considers "non-spam", the easier it is for a spammer to mislead POPFile and get his message through.
        So native English speakers are much more vulnerable to this kind of attack than foreign speakers like myself (78% for Brian against 47% for me).
        
    - Olivier Guillion - 2006-04-23
      
      Hi, WM,
      
      you wrote:
      > Hi Olivier and Brian, just so we are all working
      > on the same basis would you explain exactly
      > what you are doing to arrive at these
      > percentages? <...>
      > The point I am making is that the percentages
      > aren't worth anything unless we each calculate
      > them the same way. I'll have a look to see if I
      > can come up with a sensible SQL query after
      > posting this, that way we could all use the same
      > thing.
      
      The procedure is simple. View the message content in the PF history list. Count how many of the words in the word list are colored as "spam", as any of your "non-spam" buckets, or not colored at all (unknown in your corpus).
      Post the results here.
      
      > It is an interesting experiment and I am not
      > trying to put Olivier off but my gut feeling at
      > the moment is still that Olivier's list (and
      > idea in general) is more appropriate for people
      > that get more English language spam than English
      > language normal e-mail.
      
      It's just the opposite. If you are an English speaker, you'll have more English words that are considered as "non-spam", so it will be much easier for a spammer to find a repeated word that makes his message considered as non-spam.
      
      Olivier
      
      - Texas Fett - 2006-04-23
        
        >>The way of proceeding is simple. View the message content in the PF history list. Count how many of the words in the word list are colored as "spam", or as any of your "non-spam" buckets, or not colored at all (unknown in your corpus).
        Post the results here.<<
        
        That doesn't seem like a very good measure to me. The number of spam/ham words gives some indication of what we are looking for, but what matters more in classification is the score of each word. If you have 10 words that are only slightly ham, they could easily be outweighed by one heavily spammy word.
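The point about per-word scores can be checked with a little arithmetic. This sketch uses a simplified two-bucket log-likelihood ratio rather than POPFile's exact scoring, and the word probabilities are invented for illustration.

```python
import math


def log_odds(p_spam, p_ham):
    """Log-likelihood ratio of one word: positive leans spam, negative leans ham."""
    return math.log(p_spam / p_ham)


# Hypothetical per-word probabilities (P(word | spam), P(word | ham)).
slightly_hammy = [(0.0010, 0.0012)] * 10  # ten words, each only mildly ham
strongly_spammy = [(0.0050, 0.00001)]     # one word that almost never appears in ham

total = sum(log_odds(s, h) for s, h in slightly_hammy + strongly_spammy)
print(round(total, 2))  # 4.39: positive, so the message still leans spam
```

Each mildly hammy word contributes only about -0.18, while the single spammy word contributes about +6.2, so counting colored words alone can badly misjudge the outcome.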
        
  - Olivier Guillion - 2006-04-23
    
    Hi, Brian,
    
    Thank you for providing your results, which are impressive indeed.
    This means such a spam message has a 78% chance of being misclassified as non-spam and getting through! Wow, I didn't expect such a score. :(
    
    Olivier
    
- John Graham-Cumming - 2006-04-23
  
  So what you are referring to is commonly called 'Bayesian Poisoning' and it's an entire area of research that was kicked off by me in 2004 at the MIT Spam Conference with my presentation "How to beat an adaptive spam filter".
  
  There are now many other papers on the topic which look at different possible attacks: you are describing an attack using common English words, but there are many others.
  
  If you are interested in this area then can I suggest the following reading list:
  
  2002, Graham, "Will filters kill spam?"
  2004, Graham-Cumming, "How to beat an adaptive spam filter"
  2004, Wittel and Wu, "On attacking statistical spam filters"
  2004, Stern, Mason and Shepherd, "A linguistics based attack on personalised statistical email classifiers"
  2005, Lowd and Meek, "Good word attacks on statistical spam filters"
  2006, Graham-Cumming, "Does Bayesian Poisoning exist?"
  
  John.
  
  - Olivier Guillion - 2006-04-23
    
    Thanks for these references, but unfortunately most of them can't be viewed for free. I just wanted to know whether my principle had already been explained :(
    
    Anyway, from what I could read, it was about "word salad", i.e. a whole set of random words added to a spam message.
    
    So, please forgive me if I am discussing a topic that has already been addressed often.
    
    Adding a set of random words (word salad) is absolutely not the same as repeating the *same* word (word wall). By repeating the same word many times (say, 10000), you reduce the filter's efficiency to the percentage of words in your list that are more likely spam in the user's corpus (about 52% for me, and 21% for Brian).
    
    Of course, the spammer can't be sure his message will pass through. But it maximizes his chances, which is more than enough.
    
    - Wm - 2006-04-23
      
      Olivier wrote:
      
      >Adding a set of random words (word salad) is absolutely not the same as repeating the *same* word (word wall). By repeating the same word many times (say, 10000), you reduce the filter's efficiency to the percentage of words in your list that are more likely spam in the user's corpus (about 52% for me, and 21% for Brian).<
      
      Olivier, you need to understand that 10K occurrences of 1 word still change only 1 word in a corpus.
      
      I have many thousands of words in each of my non-spam buckets. 100 words repeated many times will not change my corpus significantly.
      --
      Wm ...
      
- Olivier Guillion - 2006-04-23
  
  Hi, WM
  
  You wrote:
  
  >Olivier, you need to understand that 10K occurrences
  > of 1 word still changes only 1 word in a corpus.
  > I have many thousands of words in each of my
  > non-spam buckets. 100 words repeated many times
  > will not change my corpus significantly.
  
  Are you sure about this? What will happen to your messages when the 100 most common English words are considered strong spam indicators in your corpus?
  
  That's easy to find out. Make a backup of your corpus, then send yourself a message containing the list of words I posted here repeated 10000 times. Reclassify this message as spam, and... see what happens.
  A word of advice: be prepared to restore your corpus backup quickly ;)
  
  - Wm - 2006-04-23
    
    Olivier wrote:
    
    >Are you sure about this?<
    
    Yes
    
    >What will happen to your messages when the 100 most common English words are considered strong spam indicators in your corpus?<
    
    I will conclude the spammer has misunderstood my corpus. Your 100 common words are not mine, Olivier and I am getting bored with saying it.
    
    >Easy to know. Make a backup of your corpus, then send a message to yourself containing 10000 times the list of words I posted here. Reclassify this message as spam,<
    
    Why do you think I would reclassify the message? I am not an automaton. I think before I reclassify. Your e-mail containing your list arrived in my "bulk" bucket. I did not reclassify it.
    
    >and... see what happens then.
    A word of advice: be prepared to restore your corpus backup quickly ;)<
    
    Why?
    --
    Wm ...
    
- Olivier Guillion - 2006-04-23
  
  Hi, WM,
  
  You wrote:
  >> Are you sure about this?
  > Yes
  
  Then you're wrong, sorry.
  
  >> What will happen with your messages when the 100
  >> most common English words are considered as a
  >> strong spam indicator in your corpus?
  >I will conclude the spammer has misunderstood my
  >corpus. Your 100 common words are not mine,
  >Olivier and I am getting bored with saying it.
  
  You know, I'm starting to get tired of saying it too. I'm talking about the list of 100 words (the 100 most common words in the English language) I posted in my first message.
  
  >>Easy to know. Make a backup of your corpus,
  >>then send a message to yourself containing
  >>10000 times the list of words I posted here.
  >>Reclassify this message as spam,
  >Why do you think I would reclassify the message?
  >I am not an automaton. I think before I
  >reclassify. Your e-mail containing your list
  >arrived in my "bulk" bucket. I did not reclassify
  >it.
  
  OK, you don't seem to understand my point: you are confusing the list I sent for statistical purposes with the kind of mail a spammer could send you. I explained this in detail in my very first message, and I'm tired of explaining it again and again.
  
  I'm afraid the only way for me to make you understand would be that I play the role of a spammer, and with your consent, I make my spam pass through PF and corrupt your corpus, in only one or two messages.
  If you consider it is harmless and I'm wrong, then give me your agreement to try it (you can provide a temporary mailbox for the test).
  Back up your corpus first, just in case.
  
  >>and... see what happens then.
  >>A word of advice: be prepared to restore your corpus
  >>backup quickly ;)
  >Why?
  
  Because it will probably be deeply corrupted.
  
  Please read the answer of John Graham-Cumming, who seems to agree with me, above in this thread.
  http://sourceforge.net/forum/message.php?msg_id=3699435
  
  I do not pretend to know everything or to always be right, but the fact that I was able to test this method (see details here: http://sourceforge.net/forum/message.php?msg_id=3699338 ) and that the author of POPFile, who knows much, much more about anti-spam filters than I do, agrees with me reinforces my hypothesis.
  
  - Wm - 2006-04-23
    
    Olivier, first: I think you might have misunderstood what JGC wrote, my reading of what he said about what is possible is not the same as what you think is possible in terms of skewing a corpus.
    
    Second: yes, of course you may send me things that you think will corrupt my corpus, I am sure they won't corrupt my corpus because I won't reclassify them. i.e. they will have zero effect.
    --
    Wm ...
    
    - Olivier Guillion - 2006-04-23
      
      Hi, WM,
      
      You wrote:
      > Olivier, first: I think you might have
      > misunderstood what JGC wrote, my reading of what
      > he said about what is possible is not the same as
      > what you think is possible in terms of skewing a
      > corpus.
      >Second: yes, of course you may send me things
      >that you think will corrupt my corpus, I am sure
      >they won't corrupt my corpus because I won't
      >reclassify them. i.e. they will have zero effect.
      
      OK, I see. What I say doesn't work, but you won't play the game :)
      All this story becomes ridiculous. I prefer to stop arguing with you right now. As we say, "don't feed the trolls".
      If it is what you want to hear, you are right, I'm wrong.
      Now please let the people who are wrong discuss together and share their wrongness.
      Thanks.
      Olivier
      
- Texas Fett - 2006-04-24
  
  Olivier's test words in my corpus:
  
  "spam": 5%
  "non-spam": 95%
  unknown in corpus: 0%
  
  Five words were in one of my two spammy buckets; the others were spread across 7 of my other buckets. All of the words were already in the corpus.
  
  The words that were classified as spam were:
  live, call, through, line, and mean
  
  I am not sure if it means anything, but you might be interested in trying this. Create a file with those test words in it (and a blank line at the top of the file so PF thinks it finished the headers and is classifying the body). Put it in your POPFile directory and use bayes.pl on it. You can get it here:
  http://popfile.jciv.com/testcommonwords.msg
  
  My corpus classifies this fake message to my ezine bucket.
  
  Based on an actual spam, I created this message:
  http://popfile.jciv.com/testcorpusattack.msg
  
  It required repeating those 100 words only 4 times to classify the test spam as ezine.
  
  I then reclassified a message containing all 100 of the test words once to my main spam bucket. Still only 5 of the words in the list were considered spam words by the colorization.
  
  I then ran the testcorpusattack.msg (with the words still repeated 4 times) through bayes.pl and it was still classified as ezine.
  
  Using fakepop, I downloaded that test message to get it into POPFile's history. I reclassified it as spam. After reclassification, 7 of those common words are now spammy.
  
  Running testcommonwords.msg through bayes.pl again shows it still would classify as ezine.
  
  And for those who believe the number of times a word is repeated in a message does not matter for reclassification I ran a test.
  
  I created a word, "thisisatestword", and sent two messages using it. The first message contained the word 320 times. The second message contained the word once. The word's score after each was reclassified to a different bucket:
  
  Bucket     Frequency      Probability    Score
  joseph     0.0000089167   0.0036733536   1.7863894098
  spam       0.0024170280   0.9957259247   4.2194664899
  
  The total word counts of those two buckets were comparable: 132,000 and 112,000, making up 19% and 16% of the total words in my corpus. Since the two buckets' distinct word counts were not comparable in size (32,000 vs. 15,000), I chose another, smaller bucket with 12,000 distinct words but only 60,000 total words. Results for those are here:
  
  Bucket     Frequency      Probability    Score
  joseph     0.0000089167   0.0016744380   1.7863894098
  workspam   0.0053148200   0.9980517328   4.5616733794
  
  - Olivier Guillion - 2006-04-24
    
    Thanks for your test, Joseph!
    It's very instructive!
    
    However, please note that my basic idea is not for a spammer to mix the words together in a single message, because as soon as this word set gets an overall spam ratio of a bit more than 50% (which is already the case for me), no message gets through anymore.
    
    The idea is for the spammer to use only *one* of these words in each message, and to repeat this single word many times, say 1000 or 10000.
    With the 100 words in the test list, a spammer can send 100 different messages. Because 95% of these words are "ham" in your corpus, 95% of these spam messages will get through. And if you reclassify them, you risk corrupting your corpus.
    
    - Texas Fett - 2006-04-24
      
      I agree, the simplest case is for a spammer to use a single word repeated enough times to throw off POPFile. That can be overcome pretty well by limiting the number of times POPFile counts a word when reclassifying.
      
      But if the spammer uses all 100 common words, or any combination of them, they can achieve the same with fewer repetitions, and the initial messages would be more likely to get past most Bayesian filters. Eventually the filters would learn to catch them, but there are plenty more than 100 words to choose from. They could keep this up for a while, probably months, and be relatively successful. Remember, spammers don't need to reach 100% of their victims to be happy.
      
      I think of this problem as a smart word salad attack. Currently word salad has been totally random nonspam words or passages taken from articles. Many of the words in either choice are not in most people's corpus. The spammers are just relying on the sheer number of words to dilute the spammy ones. If they were smart they would be targeting these pretty much guaranteed (for now) common non-spammy words.
      
      - Olivier Guillion - 2006-04-24
        
        Hi, Joseph,
        
        >I agree, the simplest case is for a spammer to
        >use a single word repeated enough times to throw
        >off POPFile. That can be overcome pretty well by
        >limiting the number of times POPFile counts a
        >word when reclassifying.
        
        Right. I think it's what JGC is also thinking about.
        
        >But if the spammer uses all 100 common words or
        >any combination of some of them, they can achieve
        > the same with a smaller number of repetitions.
        >And the initial messages would be more likely to get
        >past most Bayesian filters. Eventually the
        >filters would learn to catch them, but there are
        >plenty more than 100 words to choose from. They
        >could keep this up for a short while, probably
        >months and be relatively successful. Remember
        >spammers don't need to reach 100% of their
        >victims to make them happy.
        
        It will only work for native English speakers (of the first 1000 words, less than 50% are "ham" in my corpus), and the chances of getting through are smaller. It might be a bit more difficult for Bayesian filters to learn, but it is possible, and harmless for the corpus: each word used becomes only slightly more "spammy" than before, until the list of common words gets a slight "spam" preference.
        
        >I think of this problem as a smart word salad
        >attack. Currently word salad has been totally
        >random nonspam words or passages taken from
        >articles. Many of the words in either choice are
        >not in most people's corpus. The spammers are
        >just relying on the sheer number of words to
        >dilute the spammy ones. If they were smart they
        >would be targeting these pretty much guaranteed
        >(for now) common non-spammy words.
        
        Right, but the "smart word salad" scheme has been discussed in some of the papers JGC referenced, and there seem to be a few ways to handle it (by reducing the importance of common words, if I understood correctly).
        
        Anyway, I think it would just be a new kind of spam, only a bit more difficult to get rid of than the others. In such cases, POPFile lets a few messages through, then fewer and fewer after you carefully reclassify them.
        
        Maybe it would be possible to further reduce the maximum number of occurrences in a message for any "usual" word (i.e. one in the list of the 1000 most common words), or even to cap the total count for this set of words (for example, after 100 usual words in the same message, no further usual words are counted until the end of the message)?
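A minimal sketch of the per-message budget for "usual" words, assuming messages are already tokenized into a list. `COMMON_WORDS` here is a tiny hypothetical excerpt standing in for a real 1000-word list, and the budget of 100 is the example value from the post.

```python
COMMON_WORDS = {"they", "one", "hot", "word", "what"}  # hypothetical excerpt of a 1000-word list
COMMON_BUDGET = 100  # per-message budget suggested above


def filter_common(words, common=COMMON_WORDS, budget=COMMON_BUDGET):
    """Keep every uncommon word, but only the first `budget` common-word occurrences."""
    kept, used = [], 0
    for w in words:
        if w in common:
            if used >= budget:
                continue  # budget exhausted: ignore this common word
            used += 1
        kept.append(w)
    return kept


msg = ["cheap"] + ["they", "word"] * 500 + ["pills"]
kept = filter_common(msg)
print(len(kept))  # 102: 100 common words plus "cheap" and "pills"
```

Unlike a per-word cap, this also blunts a mixture of many common words, which is the "smart word salad" variant discussed above.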
        
- Loren Pechtel - 2006-07-06
  
  I've seen the reverse of this in the real world.
  
  I found a message from a long-time correspondent in my spam bucket. It was cut & paste humor, nothing unusual about it and nothing that looked spammy about it.
  
  After some poking at Popfile I finally figured out what was up. It was a page of one-liner jokes about our president--the guy whose name matches a piece of female anatomy. Classifying porn spam had gotten "bush" a weight of about .6 towards spam.
  
  After reading the discussion here, I'm inclined to suggest a different solution to the problem. Don't cap the count or use a pseudoword; those approaches will react badly to log files. Instead, put a decreasing weight on a word for repeated occurrences.
  
  Also, when updating the corpus only count it as one hit no matter how many times it occured in the message.
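Both suggestions can be sketched in a few lines. The post doesn't say which decreasing weight to use, so the harmonic decay below (1 + 1/2 + ... + 1/n) is an assumption; any slowly growing function would do. The "count once per message" rule is implemented by adding each distinct word to the bucket's counts a single time. Neither function is part of POPFile.

```python
from collections import Counter


def decayed_weight(n):
    """Weight for a word seen n times: 1 + 1/2 + ... + 1/n, which grows roughly like ln(n)."""
    return sum(1.0 / k for k in range(1, n + 1))


def classification_weights(words):
    """Per-word weights to use in place of raw occurrence counts when scoring a message."""
    return {w: decayed_weight(n) for w, n in Counter(words).items()}


def training_update(bucket_counts, words):
    """Add each distinct word once when a message is (re)classified, however often it repeats."""
    bucket_counts.update(set(words))


log_message = ["error"] * 320 + ["disk"]
print(round(classification_weights(log_message)["error"], 2))  # 6.35 rather than 320

spam_counts = Counter()
training_update(spam_counts, log_message)
print(spam_counts["error"])  # 1
```

A log file full of repeated lines still registers those words more strongly than a single mention, but 320 repeats weigh only about 6 times as much as one.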
  
  - Texas Fett - 2006-07-07
    
    Log files are a good example of something to use magnets for. Even setting repeated words aside, some log files will contain large numbers of spammy words, or so many URLs that the message may look spammy. If you have a filter set up to email you when a user accesses a blocked site, the message may contain spammy words in the URL or the keywords that caused the block.
    
- Roger Shady - 2006-07-13
  
  I'm not familiar with this corpus attack, but I thought I would post the following anyway.
  I ran across it in one of the newsletters I get:
  
  "Blinding POPFile via a single-word attack: Olivier Guillion
  and John Graham-Cumming describe how the POPFile spam filter
  can be blinded with a single-word attack."
  http://www.virusbtn.com/sba/2006/07/sb200607-blinding
  
  Sorry, I'm not a paid subscriber to Virus Bulletin, but I thought I'd post this anyway.
  