Follow-mode 1, crawler doesn't follow some subdomains
Status: Beta
Brought to you by:
huni
A user reported that the crawler doesn't follow (some) subdomains if follow-mode is set to 1 (setFollowMode(1)).
Example:
Root-URL: www.foo.com, follow-mode 1.
Links to e.g. www.hamburg.foo.com don't get followed.
Also see this original forum-post:
https://sourceforge.net/p/phpcrawl/discussion/307696/thread/85b8d294/
Anonymous
Yes, I managed to work around this with the following change:
    // Filter URLs to other domains if wanted
    if ($this->general_follow_mode >= 1)
    {
      // Keep the URL only if its domain or its host equals the host of the starting URL
      $should_remove = !($url_parts["domain"] == $this->starting_url_parts["host"]
                         || $url_parts["host"] == $this->starting_url_parts["host"]);

      // Original check:
      // if ($url_parts["domain"] != $this->starting_url_parts["domain"]) return false;
      if ($should_remove)
      {
        return false;
      }
    }
in file PHPCrawlerURLFilter.class.php (line 174).
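For comparison, here is a standalone sketch of a suffix-based host check that would accept the www.hamburg.foo.com example from the report. It is not PHPCrawl code: the function name hostBelongsToRootDomain() is hypothetical, and it rests on the naive assumption that the root domain is simply the root host with a leading "www." removed. It does not consult a public-suffix list, so treat it as an illustration of the idea rather than a drop-in fix.

```php
<?php
// Hypothetical helper, NOT part of PHPCrawl.
// Returns true if $linkHost is the root domain itself or any subdomain of it.
function hostBelongsToRootDomain(string $linkHost, string $rootHost): bool
{
    $linkHost = strtolower($linkHost);
    $base = strtolower($rootHost);

    // Assumption: a leading "www." is not part of the domain to match against.
    if (strpos($base, 'www.') === 0) {
        $base = substr($base, 4);
    }

    // Exact match, e.g. "foo.com" against "foo.com".
    if ($linkHost === $base) {
        return true;
    }

    // Subdomain match: the host must end with ".<base>", so that
    // "notfoo.com" is not mistaken for a subdomain of "foo.com".
    $suffix = '.' . $base;
    return substr($linkHost, -strlen($suffix)) === $suffix;
}

var_dump(hostBelongsToRootDomain('www.hamburg.foo.com', 'www.foo.com'));
var_dump(hostBelongsToRootDomain('www.another-domain.com', 'www.foo.com'));
```

Comparing against a dot-prefixed suffix rather than a plain substring keeps unrelated hosts that merely end in the same letters from passing the check.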