#95 Regex patterns like "http://\d" do not match

open
Ruleminator (5)
9
2007-11-11
2007-09-06
Steve Vance
No

Lots of spam nowadays has URLs that consist entirely of numbers, like http://76.105.209.176/. I am trying to use Ruleminator to filter these out.
---
sample spam message body:
---
If you download music of other files, you're being tracked. The RIAA has
even sued children, are you next? Our software eliminates the trail that
leads to you. This software is made available free, so we can keep the
internet free and private: http://76.105.209.176/
---
Edit your Rule below
Description: Full numeric IP address

If all of the following conditions are met:
(01) Sender is not trusted
(02) Message content matches regex http://\(\d{1,3}\.){3}\d{1,3}

Consider the message to be spam
---
results of filtering:
---
List of potentially bad domains: 76.105.209.176
Bayesianato 95%
Ruleminator 100% 2 of 12 rules matched!
distrusted_cid_sender: unknown
distrusted_sender: 100%
pre_checked: unknown
full_numeric_ip_address: unknown <--- WRONG
numeric_ip_address_2: unknown
numeric_ip_address_5: unknown
numeric_ip_address_6: unknown
numeric_ip_address_7: 100% (note: "contains http://7")
numeric_ip_address_8: unknown
numeric_ip_address_9: unknown
unknown_pdf_sender: unknown
---
full_numeric_ip_address should be a 100% match, but is reported as "unknown" instead.

Discussion

  • Keno Albrecht

    Keno Albrecht - 2007-09-10

    Logged In: YES
    user_id=1217053
    Originator: NO

    Thanks for reporting.

    Try this one: (?is).*http://[1-9].*

    Works for me. The key is the "s" in "(?is)", which tells "." (and therefore ".*") to match line breaks as well.
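
To make the effect of the "s" flag concrete, here is a minimal sketch in Python rather than in Spamato itself. It assumes Ruleminator requires the pattern to cover the entire message body, which is what this thread suggests but does not confirm; `re.fullmatch` stands in for that behaviour, and the two-line `body` is a hypothetical message.

```python
import re

# Hypothetical two-line message body; the URL is on the second line.
body = "keep the internet free and private:\nhttp://76.105.209.176/\n"

# Without the "s" flag, "." stops at line breaks, so the leading ".*"
# cannot get past the first line and the whole-body match fails.
without_dotall = re.fullmatch(r"(?i).*http://[1-9].*", body)

# With "(?is)", "." also matches "\n", so the leading and trailing ".*"
# can absorb every line around the URL.
with_dotall = re.fullmatch(r"(?is).*http://[1-9].*", body)

print(without_dotall is not None)  # False
print(with_dotall is not None)     # True
```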

     
  • Keno Albrecht

    Keno Albrecht - 2007-09-14

    Logged In: YES
    user_id=1217053
    Originator: NO

    Does it work now for you :-)?

     
  • Konrad Schultz

    Konrad Schultz - 2007-09-14

    Logged In: YES
    user_id=157707
    Originator: NO

    I tried (mostly by trial and error) "Message content matches regex (?is).*http:\/\/(\d{1,3}\.){3}\d{1,3}.*", and that worked.
    Konrad
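
The difference between "find the pattern somewhere" and "the pattern must cover the whole body" can be sketched as follows. This is Python, not Spamato code. The assumption that Ruleminator matches against the whole body is inferred from this thread. The `bare` pattern is the reporter's rule written with a plain opening parenthesis, as in Konrad's version, and `body` is a shortened copy of the sample message.

```python
import re

# Shortened sample body from the report; the URL sits on the last line.
body = (
    "This software is made available free, so we can keep the\n"
    "internet free and private: http://76.105.209.176/\n"
)

bare = r"http://(\d{1,3}\.){3}\d{1,3}"
wrapped = r"(?is).*http:\/\/(\d{1,3}\.){3}\d{1,3}.*"

# A search-style match finds the URL anywhere in the body.
found_by_search = re.search(bare, body) is not None

# A whole-body match fails for the bare pattern, because the text
# before and after the URL is not covered by the pattern.
bare_whole = re.fullmatch(bare, body) is not None

# The wrapped pattern covers the surrounding text, line breaks included.
wrapped_whole = re.fullmatch(wrapped, body) is not None

print(found_by_search, bare_whole, wrapped_whole)  # True False True
```

The "\/" escapes in Konrad's pattern are harmless but not required; a plain "/" matches the same text.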

     
  • Steve Vance

    Steve Vance - 2007-09-17

    Logged In: YES
    user_id=1705931
    Originator: YES

    I think you are saying that "(?is)" is needed so that the regex will match something that is not on the first line of the message, and that the ".*" at the beginning and end is needed so that the rule matches even when the pattern itself does not cover the entire contents of the message.

    Of course, I think that both of these should be implicit. You could have, as part of every regex rule, a checkbox like this:

    [X] only match this regex if it is on the first line of the message body

    But, that would be silly. Who would want that?

    The important thing for people who are using Spamato would be documentation of this behavior; I am sure that it is not documented at the moment.

     
  • Keno Albrecht

    Keno Albrecht - 2007-09-17

    Logged In: YES
    user_id=1217053
    Originator: NO

    That's what I was trying to say, yes. And yes, the documentation is lousy :-(.

    Thanks for your suggestion. I think it would be best to implicitly wrap the user's pattern as "(?is).*[USER-PATTERN-HERE].*".
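
The implicit wrapping proposed above might look roughly like this. It is a sketch in Python, not Spamato's implementation: `wrap_user_pattern` and `body_matches` are hypothetical helper names, and `re.fullmatch` is assumed to mirror Ruleminator's whole-body matching. The extra non-capturing group "(?:...)" is an addition to the proposal, so that a user pattern containing "|" keeps its meaning after wrapping.

```python
import re


def wrap_user_pattern(user_pattern: str) -> str:
    """Wrap a user pattern so it may occur anywhere in a multi-line body.

    Hypothetical helper. The non-capturing group keeps alternations such
    as "a|b" intact; without it, ".*a|b.*" would split into ".*a" or "b.*".
    """
    return "(?is).*(?:" + user_pattern + ").*"


def body_matches(user_pattern: str, body: str) -> bool:
    """Return True if the wrapped pattern covers the whole message body."""
    return re.fullmatch(wrap_user_pattern(user_pattern), body) is not None


sample = "internet free and private:\nhttp://76.105.209.176/\n"
print(body_matches(r"http://(\d{1,3}\.){3}\d{1,3}", sample))  # True
print(body_matches(r"viagra|cialis", "Buy\nCIALIS\nnow"))     # True
print(body_matches(r"http://\d", "no links here\n"))          # False
```

A user pattern that carries its own leading inline flags, or that is already wrapped by hand, would need extra handling that this sketch does not attempt.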

     
  • Keno Albrecht

    Keno Albrecht - 2007-11-11
    • priority: 5 --> 9
     
