I have a question: how can I configure PHPCrawl for the following task?
I want to crawl one whole site, starting from a given URL, and find all URLs that point to external sites, and only those. I don't want to list internal links, only external ones. Then the whole process should stop (the crawler should not follow those external links, just list them).
Thanks for any reply.
The crawler returns all links found in a crawled page in the array PHPCrawlerDocumentInfo->links_found_url_descriptors.
This array contains ALL links found, both internal and external.
So you simply have to iterate over this array and pick out the external links (e.g. by matching each link with a regular expression against the hostname of the site you are crawling).
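To illustrate the filtering step, here is a minimal sketch. It is not part of PHPCrawl: filterExternalLinks() is a hypothetical helper, and it compares hosts with PHP's parse_url() instead of a regular expression. It assumes you have already collected the link URLs as plain strings; the commented usage at the bottom assumes each URL descriptor exposes the full URL as url_rebuild, so check that against your PHPCrawl version.

```php
<?php
// Hypothetical helper (not a PHPCrawl function): returns the URLs whose
// host differs from the host of the site being crawled.
function filterExternalLinks(array $urls, string $siteHost): array
{
    $siteHost = strtolower($siteHost);
    $external = [];

    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);

        // Relative or malformed URLs have no host to compare, so skip them.
        if (!is_string($host) || $host === '') {
            continue;
        }

        // Host names are case-insensitive, so compare in lower case.
        if (strtolower($host) !== $siteHost) {
            $external[$url] = true; // used as a key to drop duplicates
        }
    }

    return array_keys($external);
}

// Assumed usage inside your PHPCrawler subclass (property name url_rebuild
// is an assumption, verify it in your version):
//
// function handleDocumentInfo($DocInfo)
// {
//     $urls = [];
//     foreach ($DocInfo->links_found_url_descriptors as $descriptor) {
//         $urls[] = $descriptor->url_rebuild;
//     }
//     foreach (filterExternalLinks($urls, 'www.example.com') as $link) {
//         echo $link . "\n";
//     }
// }
```

Note that this sketch treats www.example.com and example.com as different hosts; if both belong to your site, normalise the host before comparing.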