
addURLFilterRule: how to create the correct regex?

Help
Anonymous
2019-02-14
2019-02-18
  • Anonymous

    Anonymous - 2019-02-14

    I'm having trouble creating the correct regex for this situation.
    The crawler should ignore URLs like:
    http://mysite.com/Category/ProductA/Search
    http://mysite.com/Category/ProductB/Search

    But it should crawl this url:
    http://mysite.com/Search

    I tried this:
    $crawler->addURLFilterRule("/Category\/.\/Search/");
    and
    $crawler->addURLFilterRule("#/Category\/.\/Search/#");
    But the Category URLs above are still crawled.
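    A quick way to see why neither attempt filters anything is to run both patterns through PHP's preg_match() against the URLs above. This is a minimal sketch that tests only the regexes. It does not call the crawler and assumes the filter rule is an ordinary PCRE pattern applied to the full URL. In the first pattern the single . has to stand in for the whole product name. The second uses # delimiters, so its trailing / becomes part of the pattern and also demands a slash after Search.

    ```php
    <?php
    // URLs from the question.
    $urls = [
        'http://mysite.com/Category/ProductA/Search',
        'http://mysite.com/Category/ProductB/Search',
        'http://mysite.com/Search',
    ];

    // The two rules that were tried.
    $attempts = [
        '/Category\/.\/Search/',   // "." matches exactly one character
        '#/Category\/.\/Search/#', // one character, plus a required "/" after "Search"
    ];

    foreach ($attempts as $pattern) {
        foreach ($urls as $url) {
            // preg_match() returns 1 for a match, 0 for no match, false on error.
            $result = preg_match($pattern, $url) === 1 ? 'match' : 'no match';
            echo $pattern, '  ', $url, '  => ', $result, PHP_EOL;
        }
    }
    ```

    Every line ends in "no match": since nothing matches, nothing is filtered out, which is why the Category URLs keep being crawled.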

    What would be the correct regex to prevent this?

    Thanks in advance!

  • Anonymous

    Anonymous - 2019-02-15

    You're close. The dot matches exactly one character (any character except a line break), but the product part of the URL (ProductA, ProductB, ...) is several characters long, so you need .+ (one or more characters). Try this:
    (\/Category\/.+\/Search)
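    The suggestion can be checked the same way. This sketch wraps the pattern in / delimiters, which preg_match() requires, and uses the URLs from the question. Like the check above, it tests only the regex, not the crawler.

    ```php
    <?php
    // Pattern from the answer, wrapped in "/" delimiters for preg_match().
    // ".+" means "one or more of any character", so it spans "ProductA", "ProductB", etc.
    // The capturing parentheses are optional; they do not change what matches.
    $pattern = '/(\/Category\/.+\/Search)/';

    // URL => whether the pattern is expected to match.
    $urls = [
        'http://mysite.com/Category/ProductA/Search' => true,
        'http://mysite.com/Category/ProductB/Search' => true,
        'http://mysite.com/Search'                   => false,
    ];

    foreach ($urls as $url => $expected) {
        $matched = preg_match($pattern, $url) === 1;
        echo $url, ' => ', $matched ? 'match' : 'no match', PHP_EOL;
    }
    ```

    Note that .+ is greedy and can also cross / characters, so a hypothetical URL such as http://mysite.com/Category/Sub/ProductC/Search would be matched too. If only a single path segment should sit between Category and Search, [^\/]+ (one or more characters other than /) is the stricter choice.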

  • Anonymous

    Anonymous - 2019-02-15

    By the way, I didn't actually test it with the crawler because I don't have that set up, but here is my example:
    https://www.regexpal.com/?fam=107703

  • Anonymous

    Anonymous - 2019-02-15

    Thanks for your reply! I'll give it a try and let you know!

  • Anonymous

    Anonymous - 2019-02-18

    Hi,
    thanks to your hint I got this command working:
    $crawler->addURLFilterRule("/\/Category\/.+\/Search/");
    Thanks for your help!
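    The working rule can also be checked as a PHP string. Because \/ is not an escape sequence in a double-quoted PHP string, the backslashes reach the regex engine unchanged, so the double-quoted rule is identical to the single-quoted one. The sketch also shows an equivalent rule with # delimiters, where the slashes need no escaping. It tests only the pattern with preg_match(); it assumes that passing the same string to $crawler->addURLFilterRule() behaves the same way.

    ```php
    <?php
    // Rule string from the working addURLFilterRule() call.
    $rule = "/\/Category\/.+\/Search/";

    // "\/" is not a PHP escape sequence, so the string is unchanged.
    var_dump($rule === '/\/Category\/.+\/Search/'); // bool(true)

    // Same rule with "#" delimiters: "/" no longer needs escaping.
    $hashRule = '#/Category/.+/Search#';

    foreach (['http://mysite.com/Category/ProductA/Search', 'http://mysite.com/Search'] as $url) {
        // Both rules should agree: 1 = match (ignored), 0 = no match (crawled).
        printf("%s => %d / %d\n", $url, preg_match($rule, $url), preg_match($hashRule, $url));
    }
    ```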

