Set the start URL to an inner page but crawl all pages of that domain.

Help
Anonymous
2015-04-19
2015-04-20
  • Anonymous

    Anonymous - 2015-04-19

    Hello,

    Right now I have the crawler set up like this:

    $crawler->setURL("http://www.domain.com");

    $crawler->addReceiveContentType("#text/html#");

    $crawler->addURLFilterRule("#\.(jpg|gif|png|pdf|jpeg|css|js)$# i");

    So it starts from the index page of the domain and crawls all data on that domain.

    But what I want to do is start crawling from some inner page, like
    http://www.domain.com/something/page.html,
    and again crawl all data on that domain.

    If I set

    $crawler->setURL("http://www.domain.com/something/page.html");

    It will crawl only that one page, right?

    So, how can I crawl the entire domain while starting from some inner page?

     
  • Anonymous

    Anonymous - 2015-04-20

    And another question:

    How can I start scraping from the top domain, like "domain.com",
    but allow the crawler to go into all subdomains of that domain ("one.domain.com", "two.domain.com", etc.)?

     
  • Anonymous

    Anonymous - 2015-04-20

    Like this:

    $crawler->setFollowMode(1);

    Thanks a lot!
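
    For reference, the settings from this thread can be combined into one rough sketch. It assumes the PHPCrawl library is available at libs/PHPCrawler.class.php (an assumed path) and uses a hypothetical subclass name, MyCrawler. As far as the PHPCrawl documentation describes it, follow mode 1 keeps the crawler on the same domain, subdomains included, while the default mode 2 stays on the same host; in both cases links found on the start page are followed, so an inner start URL should not limit the crawl to that single page.

```php
<?php
// Assumed location of the PHPCrawl library; adjust to your installation.
include("libs/PHPCrawler.class.php");

// Hypothetical subclass: PHPCrawl calls handleDocumentInfo() for each received document.
class MyCrawler extends PHPCrawler
{
    function handleDocumentInfo($DocInfo)
    {
        // Print the URL and HTTP status code of every page the crawler visited.
        echo $DocInfo->url . " (" . $DocInfo->http_status_code . ")\n";
    }
}

$crawler = new MyCrawler();

// Start from an inner page instead of the index page.
$crawler->setURL("http://www.domain.com/something/page.html");

// Only receive HTML documents (method name as used in the post above).
$crawler->addReceiveContentType("#text/html#");

// Skip images, PDFs, stylesheets and scripts.
$crawler->addURLFilterRule("#\.(jpg|gif|png|pdf|jpeg|css|js)$# i");

// 1 = follow links within the same domain, including subdomains
// such as one.domain.com and two.domain.com.
$crawler->setFollowMode(1);

$crawler->go();
```

    Whether every page is actually reached still depends on the pages being linked, directly or indirectly, from the start URL.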

     
