
addURLFilterRule: how to create the correct regex?

Anonymous
2019-02-14
2019-02-18
  • Anonymous

    Anonymous - 2019-02-14

    I'm having trouble creating the correct regex for this situation.
    The crawler should ignore URLs like:
    http://mysite.com/Category/ProductA/Search
    http://mysite.com/Category/ProductB/Search

    But it should crawl this URL:
    http://mysite.com/Search

    I tried this:
    $crawler->addURLFilterRule("/Category\/.\/Search/");
    and
    $crawler->addURLFilterRule("#/Category\/.\/Search/#");
    But the first two URLs are still crawled.
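    One way to see why both attempts fail is to run them through PHP's preg_match() outside the crawler. This is a minimal sketch that assumes the filter rule is evaluated as an ordinary PCRE match against the full URL; the $urls and $patterns arrays are just test data. In both patterns the single "." matches exactly one character, so it cannot cover "ProductA". The second pattern additionally requires a trailing "/" after "Search", which these URLs do not have.

    ```php
    <?php
    // Test URLs from the question: the first two should be filtered out,
    // the last one should still be crawled.
    $urls = [
        "http://mysite.com/Category/ProductA/Search",
        "http://mysite.com/Category/ProductB/Search",
        "http://mysite.com/Search",
    ];

    // The two patterns tried above ("\/" is kept literally in a double-quoted PHP string).
    $patterns = [
        "/Category\/.\/Search/",
        "#/Category\/.\/Search/#",
    ];

    $results = [];
    foreach ($patterns as $pattern) {
        foreach ($urls as $url) {
            // preg_match() returns 1 on a match, 0 on no match, false on error.
            $results[$pattern][$url] = preg_match($pattern, $url);
            printf("%-25s %-45s %s\n", $pattern, $url,
                $results[$pattern][$url] ? "match" : "no match");
        }
    }
    ```

    Every combination prints "no match", so none of the category URLs would be excluded by either rule.
    
    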

    What would be the correct regex to prevent this?

    Thanks in advance!

  • Anonymous

    Anonymous - 2019-02-15

    You're close. The dot matches exactly one character (any character except a line break), but the product name between the slashes has several characters, so you need a quantifier: ".+" matches one or more of them. Try this:
    (\/Category\/.+\/Search)
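    To use the suggested group with preg_match() it has to be wrapped in delimiters, e.g. "/". The sketch below checks it against the URLs from the question. Note that ".+" is greedy and can also span several path segments, so it would match a URL like the hypothetical ".../Category/ProductA/Sub/Search" too. If only a single segment between "Category" and "Search" should be skipped, "[^\/]+" (one or more non-slash characters) is a stricter alternative; that variant is an illustration, not something tested in this thread.

    ```php
    <?php
    // Suggested pattern, wrapped in "/" delimiters as preg_match() requires.
    $loose  = "/\/Category\/.+\/Search/";
    // Hypothetical stricter variant: exactly one path segment between
    // "Category" and "Search" ([^\/]+ = one or more non-slash characters).
    $strict = "/\/Category\/[^\/]+\/Search/";

    $urls = [
        "http://mysite.com/Category/ProductA/Search",
        "http://mysite.com/Category/ProductB/Search",
        "http://mysite.com/Search",
        "http://mysite.com/Category/ProductA/Sub/Search", // hypothetical: two segments in between
    ];

    foreach ($urls as $url) {
        // 1 = the URL matches the rule (would be filtered out), 0 = no match.
        printf("%-50s loose=%d strict=%d\n", $url,
            preg_match($loose, $url), preg_match($strict, $url));
    }
    ```

    Both patterns match the two category URLs and leave http://mysite.com/Search alone; they differ only on the two-segment URL.
    
    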

  • Anonymous

    Anonymous - 2019-02-15

    By the way, I haven't tested it against a crawler because I don't have one set up for those URLs, but here is my example:
    https://www.regexpal.com/?fam=107703

  • Anonymous

    Anonymous - 2019-02-15

    Thanks for your reply! I'll give it a try and let you know!

  • Anonymous

    Anonymous - 2019-02-18

    Hi,
    with your hint I got this command working: $crawler->addURLFilterRule("/\/Category\/.+\/Search/");
    Thanks for your help!
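    Because that rule uses "/" as the delimiter, every slash inside it has to be escaped. PCRE also accepts other delimiters such as "#", in which case the slashes need no escaping. The sketch below, which only exercises preg_match() and not PHPCrawl itself, suggests the two spellings behave the same on the URLs from the question; the variable names are illustrative.

    ```php
    <?php
    // Rule from the post above: "/" delimiters, so inner slashes are escaped.
    $escaped = "/\/Category\/.+\/Search/";
    // Equivalent rule using "#" delimiters: slashes need no escaping.
    $hashDelim = "#/Category/.+/Search#";

    $urls = [
        "http://mysite.com/Category/ProductA/Search",
        "http://mysite.com/Category/ProductB/Search",
        "http://mysite.com/Search",
    ];

    foreach ($urls as $url) {
        $a = preg_match($escaped, $url);
        $b = preg_match($hashDelim, $url);
        // Both patterns should report the same result for every URL.
        printf("%-45s escaped=%d hash=%d\n", $url, $a, $b);
    }

    // With a configured PHPCrawl instance the "#" version would be registered
    // the same way (not executed here, since it needs a crawler object):
    // $crawler->addURLFilterRule("#/Category/.+/Search#");
    ```
    
    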

