PHPCrawl / Forum / Help: Cannot follow 301 links, can anyone help?

I have spent a lot of time trying to solve this problem, without success. I stepped through the code to see what is wrong and found some code that confuses me. Can someone help explain it?

Summary:
In the file PHPCrawlerLinkFinder.class.php, the code in the function findRedirectLinkInHeader shows that the redirect links are added to the LinkCache, but PHPCrawl does not seem to process them in the loop in the function startChildProcessLoop in PHPCrawler.class.php.

Example of a redirect:
http://product.mobile.163.com/mobile/brand/000O00ED.html => http://product.mobile.163.com/Nokia/
Almost all links like "http://product.mobile.163.com/mobile/brand/000O00ED.html" return a 301 HTTP status code.

The link http://product.mobile.163.com/mobile/brand/000O00ED.html is found on http://product.mobile.163.com.
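To make the redirect above concrete, here is a minimal sketch of how a 301 response header carries its target. It is not PHPCrawl's findRedirectLinkInHeader; the function name parseRedirect is hypothetical, and the sample header is written by hand for illustration, not captured from the server.

```php
<?php
// Hypothetical helper, not part of PHPCrawl: pull the status code and the
// Location target out of a raw HTTP response header.
function parseRedirect(string $rawHeader): array
{
    $status = null;
    $location = null;

    // Status line, e.g. "HTTP/1.1 301 Moved Permanently".
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $rawHeader, $m)) {
        $status = (int) $m[1];
    }

    // Location header; header names are case-insensitive.
    if (preg_match('#^Location:\s*(\S+)#mi', $rawHeader, $m)) {
        $location = $m[1];
    }

    return ['status' => $status, 'location' => $location];
}

// Illustrative header written by hand, NOT a captured response.
$sampleHeader = "HTTP/1.1 301 Moved Permanently\r\n"
    . "Location: http://product.mobile.163.com/Nokia/\r\n"
    . "Content-Type: text/html\r\n\r\n";

$result = parseRedirect($sampleHeader);
echo $result['status'] . " => " . $result['location'] . "\n";
```

The target URL in the Location header is what would have to be queued and requested next for the redirect to count as followed.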
This is my code:

$crawler = new Crawler();
$crawler->setURL("http://product.mobile.163.com");
$crawler->addContentTypeReceiveRule("#text/html#");
$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png|js)$# i");
$crawler->addURLFollowRule("#product.mobile.163.com/mobile/brand/\w{8}\.html# i");
$crawler->addURLFollowRule("#product.mobile.163.com/Samsung/\w{8}/$# i");

PHPCrawl gets all the links matching "#product.mobile.163.com/mobile/brand/\w{8}\.html# i", but it does not find any links matching "#product.mobile.163.com/Samsung/\w{8}/$# i".
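As a check that does not depend on PHPCrawl at all, the sketch below runs the two follow rules against the example URLs with plain preg_match. The helper matchesAnyRule is hypothetical, and the third URL is a made-up analogue of the Nokia redirect target, not one taken from the site. Whether PHPCrawl applies follow rules to redirect targets in exactly this way is an assumption, not something confirmed in its source.

```php
<?php
// The two follow rules from the post.
$followRules = [
    '#product.mobile.163.com/mobile/brand/\w{8}\.html# i',
    '#product.mobile.163.com/Samsung/\w{8}/$# i',
];

// Hypothetical helper: true if at least one rule matches the URL.
function matchesAnyRule(string $url, array $rules): bool
{
    foreach ($rules as $rule) {
        if (preg_match($rule, $url) === 1) {
            return true;
        }
    }
    return false;
}

$urls = [
    // Brand link from the post (answers with a 301).
    'http://product.mobile.163.com/mobile/brand/000O00ED.html',
    // Redirect target from the post.
    'http://product.mobile.163.com/Nokia/',
    // Made-up analogue of the redirect target for another brand.
    'http://product.mobile.163.com/Samsung/',
];

foreach ($urls as $url) {
    echo $url . ' : ' . (matchesAnyRule($url, $followRules) ? 'match' : 'no match') . "\n";
}
```

Only the brand link matches a rule here. The redirect target http://product.mobile.163.com/Nokia/ matches neither, so if the follow rules are applied to redirect targets as well, that alone could keep them from being requested.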

Can someone tell me why PHPCrawl cannot follow the 301 links and fetch the pages they redirect to?

This is my complete code:

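set_time_limit(0);
include("libs/PHPCrawler.class.php");

class Crawler extends PHPCrawler
{
    // Called by PHPCrawl for every document it requests.
    function handleDocumentInfo($DocInfo)
    {
        // Plain newline on the command line, HTML line break otherwise.
        if (PHP_SAPI == "cli")
        {
            $lb = "\n";
        }
        else
        {
            $lb = "<br />";
        }
        echo $lb . "Page requested: " . $DocInfo->url;
        echo $lb . "Http status code: " . $DocInfo->http_status_code . $lb;
        flush();
    }
}

$crawler = new Crawler();
$crawler->setURL("http://product.mobile.163.com");
$crawler->addContentTypeReceiveRule("#text/html#");
$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png|js)$# i");
$crawler->addURLFollowRule("#product.mobile.163.com/mobile/brand/\w{8}\.html# i");
$crawler->addURLFollowRule("#product.mobile.163.com/Samsung/\w{8}/$# i");
$crawler->setFollowRedirects(true);
$crawler->setFollowRedirectsTillContent(true);

$crawler->go();
$report = $crawler->getProcessReport();

if (PHP_SAPI == "cli")
{
    $lb = "\n";
}
else
{
    $lb = "<br />";
}

// Print a summary of the crawl.
echo "Summary:".$lb;
echo "Links followed: ".$report->links_followed.$lb;
echo "Documents received: ".$report->files_received.$lb;
echo "Bytes received: ".$report->bytes_received." bytes".$lb;
echo "Process runtime: ".$report->process_runtime." sec".$lb;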

Last edit: Anonymous 2013-11-19

