Hi guys,
Thanks for the script. Question: I'm finding that when I try to insert the contents into my DB with ->goMultiProcessed() the script hangs. It works fine with $crawler->go();
TL;DR:
Any ideas on how to insert data into the DB when using $crawler->goMultiProcessed(10)?
Hi,
could you post your script or explain how you open the DB-connection and how you write
into the mysql-database?
Normally there shouldn't be any problems …
The usual connection details:

$username = "xxxxx";
$password = "xxxxxx";
$host = "localhost";
$database = "dbcccccc";

mysql_connect($host, $username, $password) or die("Cannot connect to the database.<br>" . mysql_error());
mysql_select_db($database) or die("Cannot select the database.<br>" . mysql_error());

$sqlx = "INSERT INTO table SET
         linksetid = '$linkuniq',
         ftimestamp = '$ntime',
         url = '$mylinks',
         anchor = '$anchor',
         level = '9',
         crawl_now = '2',
         ltype = '20'";

$queryx = mysql_query($sqlx) or die("Cannot query the database.<br>" . mysql_error());
Also, I'm having another error: the crawler comes across a URL like this… and the insert crashes because MySQL ends up trying to run the function embedded in the URL:
http://rover.ebay.com/ar/1/55242/1?lt=1&adtype=3&pubid=5574809736&toolid=10001&campid=5336248661&customid=&laction=_blank<ext=eBay+Items&u7v=1&a3h=1&def=u7v&ig=1&mpt='+Math.floor(Math.random()*999999999)+': ->2 - 20
Cannot query the database.
FUNCTION Math.floor does not exist
Hi again,
I mean: where/when in your script do you open the DB-connection, and where/when are you doing the insert-statement?
If you post your script, I'll take a look.
And for your second problem: you should probably escape your INSERT statement properly; this has nothing to do with phpcrawl itself (I guess).
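To illustrate that escaping advice: the single quote inside that eBay URL terminates the string literal in the query, which is why MySQL ends up complaining about Math.floor. With the old mysql extension used here, one way to escape the values is mysql_real_escape_string(); a rough sketch based on the INSERT from the earlier post (it assumes the connection is already open and the same variables and table name as above):

// Sketch: escape every value that comes from crawled content before
// building the INSERT statement (connection must already be open).
$sqlx = "INSERT INTO table SET
         linksetid  = '" . mysql_real_escape_string($linkuniq) . "',
         ftimestamp = '" . mysql_real_escape_string($ntime) . "',
         url        = '" . mysql_real_escape_string($mylinks) . "',
         anchor     = '" . mysql_real_escape_string($anchor) . "',
         level      = '9',
         crawl_now  = '2',
         ltype      = '20'";

mysql_query($sqlx) or die("Cannot query the database.<br>" . mysql_error());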
Thanks for your help, I realized this after posting. Silly me.
The insert statements go where marked below:
class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
  {
    $topanchor = $PageInfo->refering_linktext;
    $url = $PageInfo->url;

    {INSERT INTO DB HERE}

    // I also need the external links
    $linksfound = $PageInfo->links_found;
    foreach ($linksfound as $key => $value)
    {
      // get external links
      $mylinks = $value;
      $anchor = $value;

      {INSERT AGAIN INTO DB}
    } // end foreach
  } // end handler
} // end extended class
And where do you open the DB-connection (mysql_connect())?
I'm opening the connection using an include file at the top of the script. Another thing I've noticed: it does crawl in goMultiProcessed mode, but it takes ages when I include the insert into the DB as part of the script. If I comment out the insert statement, the script is fast. It also flies in single-process mode with the DB. DB + goMultiProcessed = very slow (see the note after the code below).
I'm on a dedicated box.
<?php
include("libs/PHPCrawler.class.php");
include("../../conn.php");

class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
  {
    $topanchor = $PageInfo->refering_linktext;
    $url = $PageInfo->url;

    {INSERT INTO DB HERE}

    // I also need the external links
    $linksfound = $PageInfo->links_found;
    foreach ($linksfound as $key => $value)
    {
      // get external links
      $mylinks = $value;
      $anchor = $value;

      {INSERT AGAIN INTO DB}
    } // end foreach
  } // end handler
} // end extended class
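A note on the slowdown described above: goMultiProcessed() forks child processes, so a mysql connection opened once at the top of the script (via conn.php) ends up shared by all children, which can cause exactly this kind of hanging or very slow inserting. If the PHPCrawler version in use offers the overridable initChildProcess() method, opening a fresh connection per process there is one possible way around it; a rough sketch (credentials and table name are placeholders):

<?php
include("libs/PHPCrawler.class.php");

class MyCrawler extends PHPCrawler
{
  // Called in every child process started by goMultiProcessed(),
  // assuming a PHPCrawler version that provides initChildProcess().
  function initChildProcess()
  {
    // Each process opens its own mysql link instead of sharing the parent's.
    mysql_connect("localhost", "username", "password", true);
    mysql_select_db("dbname");
  }

  function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
  {
    // "links" is a placeholder table name; escape crawled values before inserting.
    $url = mysql_real_escape_string($PageInfo->url);
    mysql_query("INSERT INTO links SET url = '$url'") or die(mysql_error());
  }
}
?>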
Hi!
I just made a quick test with the script listed below (it works the same way as yours).
I'm sorry, but I can't find any problem; there's no difference in the process runtime between
the script WITH the insert-statement and without it. Both take around 12 seconds over here.
(I used PHP 5.3.2 and MySQL 5.1.61 on an Ubuntu 10.04.1 system for testing.)
So again, sorry, but I don't know what the problem is with your script, server or mysql-database.
This is the script I used:
mysql_connect("localhost","root","passwd");
mysql_select_db("test");
class MyCrawler extends PHPCrawler
{
function handleDocumentInfo($DocInfo)
{
if (PHP_SAPI == "cli") $lb = "\n";
else $lb = "<br />";
echo "Page requested: ".$DocInfo->url." (".$DocInfo->http_status_code.")".$lb;
mysql_query("INSERT INTO test SET url = '".$DocInfo->url."';");
}
}
$crawler = new MyCrawler();
$crawler->setURL("anyurl.com");
$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i");
$crawler->setPageLimit(100);
$crawler->setWorkingDirectory("/dev/shm/");
$crawler->goMultiProcessed(10);
$report = $crawler->getProcessReport();
if (PHP_SAPI == "cli") $lb = "\n";
else $lb = "<br />";
echo "Summary:".$lb;
echo "Links followed: ".$report->links_followed.$lb;
echo "Documents received: ".$report->files_received.$lb;
echo "Bytes received: ".$report->bytes_received." bytes".$lb;
echo "Process runtime: ".$report->process_runtime." sec".$lb;
?>
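As a side note, the test script above assumes a table named test with a url column already exists in the test database. A minimal setup sketch, reusing the same credentials (the column length is a guess):

<?php
// One-off setup for the test script above.
mysql_connect("localhost", "root", "passwd");
mysql_select_db("test");
mysql_query("CREATE TABLE IF NOT EXISTS test (url VARCHAR(512))") or die(mysql_error());
?>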
Did you use PHP CLI (console) or a browser?
Hi, I can't figure out where to put the code, please help.