Insert data into database

  • Anonymous - 2015-05-19

    Hello,

    I want to add every crawled link to a database, so I added a database INSERT query to the handleDocumentInfo function, but it does not work.

    Here is my code:

    class MyCrawler extends PHPCrawler
    {
        function handleDocumentInfo($DocInfo)
        {
            // Just detect linebreak for output ("\n" in CLI-mode, otherwise "<br />")
            if (PHP_SAPI == "cli") $lb = "\n";
            else $lb = "<br />";

            // Print the URL and the HTTP-status-code
            echo "Page requested: ".$DocInfo->url." (".$DocInfo->http_status_code.")".$lb;

            $link = $DocInfo->url;
            $query = $bdd->exec("INSERT INTO liensites(lien) VALUES($link)");

            // Print the referring URL
            echo "Referer-page: ".$DocInfo->referer_url.$lb;

            // Print whether the content of the document was received or not
            if ($DocInfo->received == true)
                echo "Content received: ".$DocInfo->bytes_received." bytes".$lb;
            else
                echo "Content not received".$lb;

            // Now you should do something with the content of the actual
            // received page or file ($DocInfo->source), we skip it in this example
            echo $liens2;

            echo $lb;

            flush();
        }
    }
    
    $crawler = new MyCrawler();
    $crawler->setURL("www.mysite.com");
    

    // Only receive content of documents with content-type "text/html"
    $crawler->addReceiveContentType("#text/html#");

    // Ignore links to pictures, css-documents etc (prefilter)
    $crawler->addURLFilterRule("#\.(jpg|gif|png|pdf|jpeg|css|js)$# i");

    $crawler->go();
    

    Thank you
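
    One likely cause: $bdd is never defined inside handleDocumentInfo(), so the exec() call has no database connection to work with. Below is a minimal sketch of one way to hand a PDO connection into the crawler and insert each URL with a prepared statement. The include path, DSN, credentials and the setDb() helper are assumptions; only the liensites table and lien column come from the code above.

    <?php
    // Sketch only - the include path, DSN and credentials below are placeholders
    require_once("libs/PHPCrawler.class.php");

    class MyCrawler extends PHPCrawler
    {
        private $bdd; // PDO connection, set from the outside (hypothetical helper)

        public function setDb(PDO $bdd)
        {
            $this->bdd = $bdd;
        }

        function handleDocumentInfo($DocInfo)
        {
            // Insert the crawled URL with a prepared statement (handles quoting/escaping)
            $stmt = $this->bdd->prepare("INSERT INTO liensites (lien) VALUES (:lien)");
            $stmt->execute(array(":lien" => $DocInfo->url));
        }
    }

    // Hypothetical connection details - adjust to your own database
    $bdd = new PDO("mysql:host=localhost;dbname=mydb;charset=utf8", "user", "password");
    $bdd->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $crawler = new MyCrawler();
    $crawler->setDb($bdd);
    $crawler->setURL("www.mysite.com");
    $crawler->addReceiveContentType("#text/html#");
    $crawler->go();

    Passing the connection in explicitly (rather than relying on a global) is what makes it visible inside handleDocumentInfo(); the prepared statement also quotes the URL value, which the string-built INSERT in the original snippet does not.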

     
    • Anonymous - 2020-11-13
      Post awaiting moderation.
