Oiranoinu - 2005-10-24

Hello!
I use your script and noticed a couple of things that would make it more convenient.

1. Recording the crawl time

Around line 248, add a "time" entry just before $page_data is returned:

<code>
    // Additional infos for the override-function handlePageData()
    $page_data["protocol"]=$protocol;
    $page_data["host"]=$host;
    $page_data["path"]=$path;
    $page_data["file"]=$file;
    $page_data["query"]=$query;
    $page_data["url"]=$protocol.$host.$path.$file.$query;

    // Record when this page was crawled (date() uses the current time by default)
    $page_data["time"]=date('Y/m/d H:i:s');

    return($page_data);
</code>

You can then read $page_data["time"] inside your override of handlePageData($page_data).
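As an illustration, here is a minimal sketch of what such an override might do with the new field: write one tab-separated log line per page. To keep it self-contained, the crawler class is left out and a plain function stands in for the body of handlePageData(). The name formatCrawlLogLine is hypothetical, and only the "time" and "url" keys from the patch above are assumed.

```php
<?php
// Hypothetical helper (not part of PHPCrawl): builds one log line from the
// $page_data array that the patched crawler passes to handlePageData().
function formatCrawlLogLine(array $page_data)
{
    // Fall back to "unknown" if the crawler was not patched to add "time".
    $time = isset($page_data["time"]) ? $page_data["time"] : "unknown";
    return $time . "\t" . $page_data["url"];
}

// Inside a PHPCrawler subclass, handlePageData() could then simply do:
//     echo formatCrawlLogLine($page_data) . "\n";
$page = array("url" => "http://www.example.com/", "time" => "2005/10/24 12:00:00");
echo formatCrawlLogLine($page) . "\n";
```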

2. Adding a delay with sleep()

To keep the crawler from being mistaken for a DoS attack, it would be better to pause between requests with sleep().
For example, Google's bot seems to request about one page per second.

Around line 390, at the end of the main loop:

<code>
      if ($content_found==false && $rd[0]!="" && $this->follow_redirects_till_content==true) {
        PHPCrawlerUtils::addToArray($rd, $this->urls_to_crawl, $this->urls_to_crawl[$key], $this->referers_to_urls_to_crawl);
      }
      // Pause for one second before requesting the next URL
      sleep(1);

    } // end of main-loop
</code>
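A flat sleep(1) adds a full second after every page, on top of the time already spent downloading it. A possible variation, sketched below under the assumption that only the spacing between the starts of requests matters, waits only for whatever is left of the interval. RequestThrottle and wait() are hypothetical names, not part of PHPCrawl.

```php
<?php
// Hypothetical helper: keeps at least $min_interval seconds between the
// starts of two consecutive requests, without re-waiting for download time.
class RequestThrottle
{
    private $min_interval;        // seconds, may be fractional (e.g. 0.5)
    private $last_request = null; // microtime(true) of the previous request

    public function __construct($min_interval = 1.0)
    {
        $this->min_interval = (float) $min_interval;
    }

    // Call right before each request; returns the seconds actually slept.
    public function wait()
    {
        $slept = 0.0;
        if ($this->last_request !== null) {
            $elapsed = microtime(true) - $this->last_request;
            $remaining = $this->min_interval - $elapsed;
            if ($remaining > 0) {
                usleep((int) round($remaining * 1000000));
                $slept = $remaining;
            }
        }
        $this->last_request = microtime(true);
        return $slept;
    }
}

// Create once before the main loop, then call $throttle->wait() at the top
// of each iteration instead of sleep(1) at the bottom.
$throttle = new RequestThrottle(1.0);
```

Calling wait() before the request rather than after it means the first page is fetched immediately and every later page is spaced by at least one second.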

Best regards!!