
I am using the example project. How can I crawl an entire domain and return content headers, e.g. text/xml?

Help
Anonymous
2018-01-03
2018-01-04
  • Anonymous

    Anonymous - 2018-01-03

    I am using the example project. How can I crawl an entire domain and return the content headers, e.g. text/xml?

     
  • Anonymous

    Anonymous - 2018-01-04

    Mostly copy/pasted from the example given....

    <?php 
    
    // It may take a while to crawl a site ...
    set_time_limit(10000);
    
    // Include the phpcrawl-mainclass
    include("libs/PHPCrawler.class.php"); 
    
    // Extend the class and override the handleDocumentInfo()-method  
    class MyCrawler extends PHPCrawler  
    { 
      function handleDocumentInfo($DocInfo)  
      { 
        // Use the method's own parameter ($DocInfo); $PageInfo is undefined here
        echo "<pre>" . print_r(get_headers($DocInfo->url), TRUE) . "</pre>";
    
        flush(); 
      }  
    } 
    
    // Now, create an instance of your class, define the behaviour
    // of the crawler (see class-reference for more options and details) 
    // and start the crawling-process.  
    
    $crawler = new MyCrawler(); 
    
    // URL to crawl 
    $crawler->setURL("www.php.net"); 
    
    // Store and send cookie-data like a browser does 
    $crawler->enableCookieHandling(true); 
    
    // Set the traffic-limit to 1 MB (in bytes,
    // for testing we don't want to "suck" the whole site)
    $crawler->setTrafficLimit(1000 * 1024);
    
    // That's enough, now here we go
    $crawler->go(); 
    
    // At the end, after the process is finished, we print a short 
    // report (see method getProcessReport() for more information) 
    $report = $crawler->getProcessReport(); 
    
    if (PHP_SAPI == "cli") $lb = "\n"; 
    else $lb = "<br />"; 
    
    echo "Summary:".$lb; 
    echo "Links followed: ".$report->links_followed.$lb; 
    echo "Documents received: ".$report->files_received.$lb; 
    echo "Bytes received: ".$report->bytes_received." bytes".$lb; 
    echo "Process runtime: ".$report->process_runtime." sec".$lb;  
    ?>
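
get_headers() returns the raw response header lines, so the content type the question asks about (e.g. text/xml) still has to be picked out of that array. Here is a minimal sketch, assuming the default numerically indexed return format. The function name `contentTypeFromHeaders` and the sample header lines are hypothetical and not part of PHPCrawl:

```php
<?php
// Hypothetical helper (not part of PHPCrawl): extract the media type from
// header lines in the form get_headers($url) returns by default.
function contentTypeFromHeaders(array $headerLines): ?string
{
    $type = null;
    foreach ($headerLines as $line) {
        // Header names are case-insensitive. After redirects there can be
        // several Content-Type lines, so the last one wins.
        if (preg_match('/^Content-Type:\s*([^;\s]+)/i', $line, $m)) {
            $type = strtolower($m[1]);
        }
    }
    return $type; // null when no Content-Type header was present
}

// Sample header lines, made up for illustration.
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/xml; charset=UTF-8',
    'Content-Length: 1234',
];
echo contentTypeFromHeaders($sample), "\n"; // prints "text/xml"
```

Inside handleDocumentInfo() this could be fed with the result of get_headers($DocInfo->url). Note that get_headers() returns false on failure, so that case should be checked first, and that it sends its own HTTP request in addition to the one the crawler already made.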
    
     

    Last edit: Anonymous 2018-01-04
  • Anonymous

    Anonymous - 2018-01-04

    Does the code above still work? How do I get data from it?

     
  • Anonymous

    Anonymous - 2018-01-04
    echo "<pre>" . print_r(get_headers($DocInfo->url), TRUE) . "</pre>";
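
The echo line only prints each page's headers as it is crawled. To keep the data for later use, one option is to store a content type per URL and group the results once the crawl has finished. This is a rough sketch under that assumption. `groupByContentType` and the sample URLs are hypothetical, and the array would need to be filled in from handleDocumentInfo():

```php
<?php
// Hypothetical helper: turn a "url => content type" map into a
// "content type => list of URLs" map.
function groupByContentType(array $typesByUrl): array
{
    $grouped = [];
    foreach ($typesByUrl as $url => $type) {
        // URLs without a known content type are filed under "unknown".
        $grouped[$type ?? 'unknown'][] = $url;
    }
    ksort($grouped); // stable, alphabetical order of content types
    return $grouped;
}

// Sample data, made up for illustration; a real crawl would collect this
// map inside handleDocumentInfo() instead of echoing.
$collected = [
    'http://example.com/'            => 'text/html',
    'http://example.com/feed.xml'    => 'text/xml',
    'http://example.com/sitemap.xml' => 'text/xml',
    'http://example.com/broken'      => null,
];

$byType = groupByContentType($collected);
print_r($byType['text/xml']); // the two .xml URLs
```

Note that the example's setTrafficLimit() call stops the crawl after roughly 1 MB, so covering an entire domain means raising or removing that limit.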
    
     
  • Anonymous

    Anonymous - 2018-01-04
    echo "<pre>" . print_r(get_headers($DocInfo->url), TRUE) . "</pre>";
    
     

    Last edit: Anonymous 2018-01-16
