
I am using the example project. How can I crawl an entire domain and return the content headers (e.g. text/xml)?

Help
Anonymous
2018-01-03
2018-01-04
  • Anonymous

    Anonymous - 2018-01-03

    I am using the example project. How can I crawl an entire domain and return the content headers (e.g. text/xml)?

     
    • Anonymous

      Anonymous - 2022-07-09
      Post awaiting moderation.
  • Anonymous

    Anonymous - 2018-01-04

    Mostly copy/pasted from the example project:

    <?php 
    
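    // It may take a while to crawl a site ... 
    set_time_limit(10000); 
    
    // Include the PHPCrawl main class 
    include("libs/PHPCrawler.class.php"); 
    
    // Extend the class and override the handleDocumentInfo() method  
    class MyCrawler extends PHPCrawler  
    { 
      function handleDocumentInfo($DocInfo)  
      { 
        // $DocInfo describes the page that was just crawled. get_headers()
        // requests that URL again and returns the raw response headers,
        // including the Content-Type line (e.g. "Content-Type: text/xml").
        echo "<pre>" . print_r(get_headers($DocInfo->url), TRUE) . "</pre>";
    
        flush(); 
      }  
    } 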
    } 
    
    // Now, create an instance of your class, define the behaviour 
    // of the crawler (see class-reference for more options and details) 
    // and start the crawling-process.  
    
    $crawler = new MyCrawler(); 
    
    // URL to crawl 
    $crawler->setURL("www.php.net"); 
    
    // Store and send cookie-data like a browser does 
    $crawler->enableCookieHandling(true); 
    
    // Set the traffic-limit to 1 MB (in bytes, 
    // for testing we don't want to "suck" the whole site) 
    $crawler->setTrafficLimit(1000 * 1024); 
    
    // That's enough, now here we go 
    $crawler->go(); 
    
    // At the end, after the process is finished, we print a short 
    // report (see method getProcessReport() for more information) 
    $report = $crawler->getProcessReport(); 
    
    if (PHP_SAPI == "cli") $lb = "\n"; 
    else $lb = "<br />"; 
    
    echo "Summary:".$lb; 
    echo "Links followed: ".$report->links_followed.$lb; 
    echo "Documents received: ".$report->files_received.$lb; 
    echo "Bytes received: ".$report->bytes_received." bytes".$lb; 
    echo "Process runtime: ".$report->process_runtime." sec".$lb;  
    ?>
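    A note on filtering for XML specifically: servers label XML as either text/xml or application/xml, often with a parameter such as "; charset=UTF-8", so comparing the header against the literal string "text/xml" misses some documents. The sketch below is plain PHP (no PHPCrawl needed; the function names baseMimeType() and isWantedType() are made up for this example, and PHP 7.1+ is assumed for the nullable type hints). It strips the parameters and compares case-insensitively, using invented sample values.

```php
<?php
// Sketch (assumes PHP 7.1+): decide whether a Content-Type value names one
// of the MIME types we want, e.g. text/xml. Parameters such as
// "; charset=UTF-8" are dropped and the comparison is case-insensitive.

// Returns the bare, lower-cased MIME type ("text/xml") from a Content-Type
// value ("text/xml; charset=UTF-8"); returns "" for null.
function baseMimeType(?string $contentType): string
{
    if ($contentType === null) {
        return '';
    }
    $parts = explode(';', $contentType, 2);
    return strtolower(trim($parts[0]));
}

// True if $contentType matches any MIME type listed in $wanted.
function isWantedType(?string $contentType, array $wanted): bool
{
    $base = baseMimeType($contentType);
    foreach ($wanted as $mime) {
        if ($base === strtolower($mime)) {
            return true;
        }
    }
    return false;
}

// XML is commonly served under either of these types.
$xmlTypes = ['text/xml', 'application/xml'];

// Made-up sample values, as a server might send them.
$samples = [
    'http://www.example.com/sitemap.xml' => 'text/xml; charset=UTF-8',
    'http://www.example.com/feed'        => 'Application/XML',
    'http://www.example.com/'            => 'text/html; charset=utf-8',
];

// Print only the URLs whose Content-Type is one of the XML types.
foreach ($samples as $url => $type) {
    if (isWantedType($type, $xmlTypes)) {
        echo $url, ' => ', baseMimeType($type), PHP_EOL;
    }
}
```

    Inside handleDocumentInfo() you could pass the crawler's own content type to isWantedType() instead of fetching the headers a second time; in PHPCrawl 0.8x the PHPCrawlerDocumentInfo property is, as far as I recall, $DocInfo->content_type, so check the class reference for your version. For the "entire domain" part of the question, as I understand the PHPCrawler class reference, setFollowMode(1) keeps the crawl on the root URL's domain (subdomains included), and the 1 MB setTrafficLimit() above has to be raised or set to 0 (no limit), otherwise the crawl stops early.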
    
     

    Last edit: Anonymous 2018-01-04
  • Anonymous

    Anonymous - 2018-01-04

    Does the code above still work? How do I get the data out of it?

     
  • Anonymous

    Anonymous - 2018-01-04
    echo "<pre>" . print_r(get_headers($DocInfo->url), TRUE) . "</pre>";
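    If you stay with get_headers(), keep in mind that it makes a second HTTP request for every page the crawler has already downloaded. With its second argument set to true it returns an associative array, but the header-name case is whatever the server sent, and after a redirect a header sent by several responses becomes an array. The helper below, finalContentType() (a hypothetical name, not part of PHP or PHPCrawl), pulls out the Content-Type of the final response. The demo uses invented sample data so it runs without network access.

```php
<?php
// Sketch: return the Content-Type of the final response from the array
// produced by get_headers($url, true). Handles two quirks:
//  - header-name case is whatever the server sent, so match case-insensitively;
//  - after a redirect, a header sent by several responses becomes an array,
//    and its last element belongs to the last response.
// finalContentType() is a hypothetical helper, not a PHP or PHPCrawl function.
function finalContentType(array $headers): ?string
{
    $found = null;
    foreach ($headers as $name => $value) {
        // Integer keys hold the status lines ("HTTP/1.1 200 OK"); skip them.
        if (is_string($name) && strcasecmp($name, 'Content-Type') === 0) {
            // Keep overwriting so a later (differently cased) key wins.
            $found = is_array($value) ? end($value) : $value;
        }
    }
    return $found === null ? null : (string) $found;
}

// Made-up data shaped like get_headers('http://www.example.com/feed', true)
// after one redirect; no network access needed.
$sample = [
    0              => 'HTTP/1.1 301 Moved Permanently',
    'Location'     => 'http://www.example.com/feed/',
    'Content-Type' => ['text/html; charset=iso-8859-1', 'text/xml; charset=UTF-8'],
    1              => 'HTTP/1.1 200 OK',
];

echo finalContentType($sample), PHP_EOL;

// Live use (performs a real HTTP request):
// $headers = get_headers('http://www.php.net/', true);
// if ($headers !== false) {
//     echo finalContentType($headers), PHP_EOL;
// }
```

    With a live URL, pass it the result of get_headers($url, true) only after checking that the call did not return false, which it does when the request fails.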
    
     
  • Anonymous

    Anonymous - 2018-01-04
    echo "<pre>" . print_r(get_headers($DocInfo->url), TRUE) . "</pre>";
    
     

    Last edit: Anonymous 2018-01-16
