
crawl only one level deep?

beatybeaty
2011-08-27
2013-05-19
  • beatybeaty

    beatybeaty - 2011-08-27

    How do I crawl a page, get the links off that page and only crawl those pages and then stop?

    I am using:
    $crawler->setFollowMode(0);  // follow to any domain
    and haven't called $crawler->setPageLimit() at all, since that appears to stop after an aggregate total number of pages has been crawled, which is not what I want.

    Is there a way to say "crawl one page, get its links, follow one level deep for each of those links and then stop"?

    Thanks.

     
  • Nobody/Anonymous

    Hi!

    Sorry for my late answer.

    This can't be done directly with phpcrawl (yet), but you may use the referer-information within the handlePageData-method to accomplish this:

    // ...
    function handlePageData(&$page_data) 
    {
      // Let the crawler stop if the referer isn't the entry-page anymore
      if ($page_data["referer_url"] != "" && $page_data["referer_url"] != "http://www.urltopage.de/")
        return -1;
    }
    // ...
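
The same referer idea might be extended to an arbitrary depth by remembering how many link hops each URL is from the entry page. The following is only a rough sketch: `DepthTracker` and `exceedsLimit` are hypothetical names that are not part of phpcrawl, and it assumes the crawler reports pages roughly in the order they were discovered and that every referer was itself reported earlier under exactly the same URL string.

```php
<?php
// Hypothetical helper (NOT part of phpcrawl): derives the link depth of
// each page from url/referer pairs alone.
class DepthTracker
{
    private $depths = array();
    private $maxDepth;

    public function __construct($entryUrl, $maxDepth)
    {
        $this->depths[$entryUrl] = 0;   // the entry page is depth 0
        $this->maxDepth = $maxDepth;
    }

    // Records the depth of $url and returns true if it lies beyond the limit.
    public function exceedsLimit($url, $refererUrl)
    {
        if (!isset($this->depths[$url])) {
            // Assumption: an unknown referer is treated like the entry page.
            $parentDepth = isset($this->depths[$refererUrl])
                ? $this->depths[$refererUrl]
                : 0;
            $this->depths[$url] = $parentDepth + 1;
        }
        return $this->depths[$url] > $this->maxDepth;
    }
}

// Simulated sequence of pages, allowing two levels below the entry page.
$tracker = new DepthTracker("http://www.urltopage.de/", 2);
var_dump($tracker->exceedsLimit("http://www.urltopage.de/", ""));                          // false (depth 0)
var_dump($tracker->exceedsLimit("http://www.urltopage.de/a", "http://www.urltopage.de/")); // false (depth 1)
var_dump($tracker->exceedsLimit("http://www.urltopage.de/b", "http://www.urltopage.de/a")); // false (depth 2)
var_dump($tracker->exceedsLimit("http://www.urltopage.de/c", "http://www.urltopage.de/b")); // true  (depth 3)
```

Inside handlePageData one could then return -1 as soon as `$tracker->exceedsLimit($page_data["url"], $page_data["referer_url"])` is true, assuming `$page_data["url"]` holds the URL of the page just received. As with the snippet above, this aborts the whole crawl at the first page beyond the limit, and that page has already been fetched by then.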
    
     
  • Anonymous

    Anonymous - 2013-05-19

    Could you please give an example for second-level depth? Thanks.

     





