Hello,
I want to add every crawled link to a database. I inserted a database query into the handleDocumentInfo function, but it does not work.
Here is my code:
class MyCrawler extends PHPCrawler
{
    function handleDocumentInfo($DocInfo)
    {
        // Just detect linebreak for output ("\n" in CLI-mode, otherwise "<br />").
        if (PHP_SAPI == "cli") $lb = "\n";
        else $lb = "<br />";

        // Print the URL and the HTTP-status-Code
        echo "Page requested: ".$DocInfo->url." (".$DocInfo->http_status_code.")".$lb;

        $link = $DocInfo->url;
        $query = $bdd->exec("INSERT INTO liensites(lien) VALUES($link)");

        // Print the referring URL
        echo "Referer-page: ".$DocInfo->referer_url.$lb;

        // Print whether the content of the document was received or not
        if ($DocInfo->received == true)
            echo "Content received: ".$DocInfo->bytes_received." bytes".$lb;
        else
            echo "Content not received".$lb;

        // Now you should do something with the content of the actual
        // received page or file ($DocInfo->source), we skip it in this example
        echo $liens2;
        echo $lb;
        flush();
    }
}

$crawler = new MyCrawler();
$crawler->setURL("www.mysite.com");

// Only receive content of documents with content-type "text/html"
$crawler->addReceiveContentType("#text/html#");

// Ignore links to pictures, css-documents etc (prefilter)
$crawler->addURLFilterRule("#.(jpg|gif|png|pdf|jpeg|css|js)$# i");

$crawler->go();
Thank you
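
For anyone reading along, here is a rough sketch of one way the insert could be made to work. It rests on assumptions the post does not confirm: that $bdd is a PDO handle (the ->exec() call suggests it), and that the table is liensites with a text column lien as in the query above. LinkStore is a hypothetical helper name, and the in-memory SQLite database is only there so the snippet runs on its own. Two things probably matter most: a variable defined at file level is not visible inside a method, and a URL placed unquoted inside VALUES(...) is not valid SQL, so a prepared statement with a placeholder is used instead.

```php
<?php
// Hypothetical helper (not part of PHPCrawl): keeps the PDO handle in a
// property so that it is reachable from inside methods.
class LinkStore
{
    private $pdo;
    private $stmt;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
        // Throw exceptions on SQL errors so a failing INSERT is not silent.
        $this->pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        // The :lien placeholder lets PDO handle quoting of the URL.
        $this->stmt = $this->pdo->prepare(
            "INSERT INTO liensites (lien) VALUES (:lien)"
        );
    }

    public function save($url)
    {
        $this->stmt->execute(array(':lien' => $url));
    }
}

// Demo setup. Assumption: in-memory SQLite stands in for the real database,
// which the post does not name.
$bdd = new PDO('sqlite::memory:');
$bdd->exec("CREATE TABLE liensites (lien TEXT)");

$store = new LinkStore($bdd);
// A URL containing a quote, which would break a hand-built SQL string.
$store->save("http://www.mysite.com/page?a=1&b='x'");

$count = (int) $bdd->query("SELECT COUNT(*) FROM liensites")->fetchColumn();
echo "Rows stored: " . $count . "\n";
```

In the crawler itself, one option would be to give MyCrawler a property holding such an object (for example $this->store, set before calling go()) and to call $this->store->save($DocInfo->url) from handleDocumentInfo, instead of referring to a bare $bdd there.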