How do I start crawling from a top domain like "domain.com"
but allow the crawler to go into all subdomains of that domain ("one.domain.com", "two.domain.com", etc.)?
Hello,
Right now I have the crawler set up like this:
$crawler->setURL("http://www.domain.com");
$crawler->addReceiveContentType("#text/html#");
$crawler->addURLFilterRule("#\.(jpg|gif|png|pdf|jpeg|css|js)$# i");
So it starts from the index page of the domain and crawls all data on that domain.
But what I want to do is start crawling from some inner page like
http://www.domain.com/something/page.html
and again crawl all data on that domain.
If I set
$crawler->setURL("http://www.domain.com/something/page.html");
It will crawl only that one page, right?
So, how can I crawl the entire domain but start from some inner page?
And another question:
How do I start crawling from a top domain like "domain.com"
but allow the crawler to go into all subdomains of that domain ("one.domain.com", "two.domain.com", etc.)?
Hi!
To your first question:
No, that will work. You can start from any page you want inside the domain,
so http://www.domain.com/something/page.html is just fine. The crawler follows the links it finds on that page, so it does not stop after one page.
To your second question:
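Simply set the follow mode to 1 with setFollowMode(1) (see http://phpcrawl.cuab.de/classreferences/PHPCrawler/method_detail_tpl_method_setFollowMode.htm). In that mode the crawler follows only links that lead to the same domain as the start URL, subdomains included. The default mode, 2, stays on the start URL's host.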
That's it.
Best Regards!
Like this:
$crawler->setFollowMode(1);
Thanks a lot!
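For anyone finding this thread later, here is a minimal sketch of how the pieces discussed above might fit together in one script. It is not an official example. The include path and the MyCrawler class name are assumptions, and handleDocumentInfo() is the override used by the 0.8x releases, so check the class reference for the version you have installed.

```php
<?php
// Assumption: PHPCrawl is unpacked in ./libs next to this script.
// Adjust the path to match your installation.
include("libs/PHPCrawler.class.php");

// Hypothetical subclass name; PHPCrawl expects you to extend PHPCrawler
// and override the handler that receives each fetched document.
class MyCrawler extends PHPCrawler
{
    // Called once for every document the crawler receives (0.8x API).
    function handleDocumentInfo($DocInfo)
    {
        echo $DocInfo->http_status_code . " " . $DocInfo->url . "\n";
    }
}

$crawler = new MyCrawler();

// Start from an inner page instead of the index page.
$crawler->setURL("http://www.domain.com/something/page.html");

// Only receive the content of HTML documents (method name as used in this thread).
$crawler->addReceiveContentType("#text/html#");

// Skip images, PDFs, stylesheets and scripts; note the escaped dot.
$crawler->addURLFilterRule("#\.(jpg|gif|png|pdf|jpeg|css|js)$# i");

// Follow mode 1: stay on the same domain, subdomains included.
$crawler->setFollowMode(1);

$crawler->go();
```

Run it from the command line with `php` so the output is printed as pages come in. On a large site you will probably also want some limit on the crawl, but that is outside what was asked here.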