recursivity on php

Help
khadija
2015-07-29
2015-08-29
  • khadija

    khadija - 2015-07-29

    My script crawls a website given as input and displays all external links,
    but it stops with this error:

    Catchable fatal error: Argument 1 passed to MyCrawler::handleDocumentInfo() must be an instance of PHPCrawlerDocumentInfo, null given, called in C:\wamp\www\1\exter.php on line 49 and defined in C:\wamp\www\1\exter.php on line 14

    I don't know how I can resolve this problem. Please help!
    This is my code:

    <?php
    // Display external links in green and the others in blue.
    // It may take a while to spider a website ...
    set_time_limit(10000);

    // Include the phpcrawl main class
    include_once('../PHPCrawl_083/PHPCrawl_083/libs/PHPCrawler.class.php');
    include('2.php');

    // Extend the class and override the handleDocumentInfo()-method
    class MyCrawler extends PHPCrawler
    {
        function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
        {
            if (PHP_SAPI == "cli") {
                $lb = "\n";
            }
            else {
                $lb = "<br />";
                $home_url = parse_url($DocInfo->url, PHP_URL_HOST);
                $file = @file_get_contents($DocInfo->url);
                preg_match_all('/<a.*?href\s*=\s*["\']([^"\']+)[^>]*>.*?<\/a>/si', $file, $urls);

                // Output
                echo '<br/>';

                foreach ($urls as $url) {
                    for ($i = 0; $i < sizeof($url); $i++) {
                        $link_url = parse_url($url[$i]);

                        if (substr($url[$i], 0, 7) == "http://") {
                            echo '1'.$lb;
                            echo " Page requested: ".$DocInfo->url." (".$DocInfo->http_status_code.")".$lb;
                            echo '<br/>';
                            echo "<font color=green>".$url[$i].$lb."</font>";
                            echo '<br/>';
                            echo $lb;
                            flush();
                        }
                        else {
                            // This recursive call passes a string, not a
                            // PHPCrawlerDocumentInfo object -- the source of the error.
                            $this->handleDocumentInfo($url[$i]);
                        }
                    }
                }
            }
        }
    }

    $crawler = new MyCrawler();

    // URL to crawl
    $crawler->setURL("http://www.tunisie-web.org");

    // Only receive content of files with content-type "text/html"

    // Ignore links to pictures, don't even request pictures
    $crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i");

    // Store and send cookie-data like a browser does
    $crawler->enableCookieHandling(true);

    // Set the traffic-limit to 1 MB (in bytes,
    // for testing we don't want to "suck" the whole site)
    //$crawler->setMaxDepth(-1);

    // That's enough, now here we go
    $crawler->go();

    //httpwww.annuaire-ag.com
    ?>

     

    Last edit: khadija 2015-07-29
  • Anonymous

    Anonymous - 2015-07-29

    What happens if you print the array $url?
    It looks like one of the entries isn't correct.
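
    A quick way to see it, using a hypothetical two-link snippet in place of the fetched page: preg_match_all() with one capture group fills $urls[0] with the full <a>...</a> matches and $urls[1] with the captured hrefs, so looping over $urls itself mixes both.

    ```php
    <?php
    // Hypothetical two-link snippet standing in for the downloaded page:
    $file = '<a href="http://example.com/">ext</a> <a href="/about">int</a>';

    // Essentially the same pattern as in the script above:
    preg_match_all('/<a.*?href\s*=\s*["\']([^"\']+)[^>]*>.*?<\/a>/si', $file, $urls);

    // $urls[0] holds the complete <a ...>...</a> tags,
    // $urls[1] holds only the captured hrefs:
    print_r($urls[1]); // prints the two hrefs: http://example.com/ and /about
    ```

    Looping with "foreach ($urls as $url)" walks both sub-arrays, so $url[$i] is sometimes a whole tag instead of a URL; looping over $urls[1] alone gives just the links.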

     
  • khadija

    khadija - 2015-07-29

    The problem is: $url[$i] is a URL, but the argument of the function handleDocumentInfo() must be an instance of the class PHPCrawlerDocumentInfo.
    I don't know how to create an instance of PHPCrawlerDocumentInfo from $url[$i].
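
    You usually don't need a PHPCrawlerDocumentInfo object just to classify a link. A sketch in plain PHP (isExternal is a made-up helper name, not part of PHPCrawl): compare the link's host, extracted with parse_url(), against the host of the crawled site.

    ```php
    <?php
    // Made-up helper: true when $link points to a different host than $homeHost.
    function isExternal($link, $homeHost)
    {
        $host = parse_url($link, PHP_URL_HOST);
        // Relative links like "/contact" have no host, so they are internal.
        return $host !== null && strcasecmp($host, $homeHost) !== 0;
    }

    var_dump(isExternal('http://example.com/page', 'www.tunisie-web.org')); // bool(true)
    var_dump(isExternal('/contact', 'www.tunisie-web.org'));                // bool(false)
    ```

    This avoids calling handleDocumentInfo() recursively at all: the check works on the plain URL string.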

     
  • Uwe Hunfeld

    Uwe Hunfeld - 2015-08-29

    Hi khadija,

    sorry for my late answer!

    I just don't understand what you are trying to achieve by calling the handleDocumentInfo() method again recursively.

    Do you want the crawler to request that URL?

    Or am I missing something here?
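
    In case it helps: if the goal is only to list each page plus its external links, the crawler can do all the requesting itself via go(), and handleDocumentInfo() is used only for reporting. A sketch against PHPCrawl 0.8x, assuming the documented $DocInfo->links_found array (entries with a "url_rebuild" key) and setFollowMode(); please verify both against your version:

    ```php
    <?php
    include_once('../PHPCrawl_083/PHPCrawl_083/libs/PHPCrawler.class.php');

    class ExternalLinkLister extends PHPCrawler
    {
        function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
        {
            echo "Page requested: ".$DocInfo->url." (".$DocInfo->http_status_code.")<br />";

            $home = parse_url($DocInfo->url, PHP_URL_HOST);
            foreach ($DocInfo->links_found as $link) {
                $host = parse_url($link["url_rebuild"], PHP_URL_HOST);
                if ($host !== null && strcasecmp($host, $home) !== 0) {
                    // External link: different host than the crawled page.
                    echo "<font color=green>".$link["url_rebuild"]."</font><br />";
                }
            }
            flush();
        }
    }

    $crawler = new ExternalLinkLister();
    $crawler->setURL("http://www.tunisie-web.org");
    // Follow-mode 2 keeps the crawler on the start host, so external
    // links get listed but never requested.
    $crawler->setFollowMode(2);
    $crawler->go();
    ```

    No recursive call into handleDocumentInfo() is needed; the crawler invokes it once per page it downloads.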

     
