I want to restrict the crawler to the current directory and its subdirectories, in other words: crawl all pages and subdirectories residing in the current directory and nothing else.
For example, I added the URL
www.example.com/pages/page4.html
URLs found:
www.example.com                       // I want to stop/restrict this
www.example.com/page1.html            // stop this
www.example.com/pages/sub/page1.html  // this should be crawled
www.example.com/pages/page3.html      // this should be crawled
How can I do that?
Thanks for the wonderful crawler :)
Hi!
It's easy: just take a look at the setFollowMode() method:
http://phpcrawl.cuab.de/classreferences/PHPCrawler/method_detail_tpl_method_setFollowMode.htm
In your case, use setFollowMode(3).
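For illustration, here is a minimal sketch of how that could look, assuming the usual PHPCrawl setup of extending PHPCrawler and overriding handleDocumentInfo(); the include path and the MyCrawler class name are placeholders:

<?php
// Minimal sketch: restrict the crawl to the entry URL's path.
include("libs/PHPCrawler.class.php"); // adjust to your install path

class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo($DocInfo)
  {
    // Print every URL that actually got crawled.
    echo $DocInfo->url . "\n";
  }
}

$crawler = new MyCrawler();
$crawler->setURL("www.example.com/pages/page4.html");

// Follow-mode 3: only follow links into the path of the entry URL,
// i.e. www.example.com/pages/ and below.
$crawler->setFollowMode(3);
$crawler->go();
?>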
Alternatively, you can set a follow-rule yourself with addURLFollowRule():
http://phpcrawl.cuab.de/classreferences/PHPCrawler/method_detail_tpl_method_addURLFollowRule.htm
for example addURLFollowRule("#www.example.com/pages/#") (the argument is a PREG regular expression).
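And the same sketch using a follow-rule instead of a follow-mode. Tightening the pattern (escaped dots, anchored at the scheme) is my own variation and assumes the rule is matched against the fully qualified URL; the looser original pattern from above works too:

// Alternative: same MyCrawler class as above, restricted by a
// regex follow-rule instead of a follow-mode.
$crawler = new MyCrawler();
$crawler->setURL("www.example.com/pages/page4.html");

// Only URLs matching this PREG pattern get followed. The escaped
// dots match literal dots; the ^ anchor assumes the rule is checked
// against the complete URL including the scheme.
$crawler->addURLFollowRule("#^http://www\.example\.com/pages/#");
$crawler->go();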