I am having great success with my crawler based on your code. The only problem I am having right now is that I sometimes see URLs like this, which are invalid (404s):
I have looked at the source file where this link is supposedly being found and don't see any link like this…
Here is some additional debug output I captured while processing this URL:
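You could try calling $crawler->enableAggressiveLinkSearch(false) (see http://phpcrawl.cuab.de/classreferences/index.html). As far as the class reference describes it, the crawler then only looks for links inside HTML tags instead of everywhere in the page, which should cut down on false matches like this one.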
This is a known problem and is already on the list of known bugs (http://sourceforge.net/tracker/?func=detail&aid=3555300&group_id=89439&atid=590146) and will (hopefully) get fixed in the next version.
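For reference, here is a minimal sketch of where that call might go in a crawler setup. The include path, the class name MyCrawler, and the start URL are placeholders I made up for illustration, not something from your code; the 404 logging is only there to help check whether the bogus URLs go away.

```php
<?php
// Assumed location of the PHPCrawl main class; adjust to your installation.
require_once 'libs/PHPCrawler.class.php';

// Hypothetical subclass name, used only for this example.
class MyCrawler extends PHPCrawler
{
    // PHPCrawl calls this for every document it receives.
    function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
    {
        // Log 404s together with the page the link was found on,
        // so any remaining false matches can be traced to their source.
        if ($DocInfo->http_status_code == 404) {
            echo "404: " . $DocInfo->url
               . " (found on " . $DocInfo->referer_url . ")\n";
        }
    }
}

$crawler = new MyCrawler();
$crawler->setURL('http://www.example.com/'); // placeholder start URL

// Disable aggressive link search before starting the crawl.
$crawler->enableAggressiveLinkSearch(false);

$crawler->go();
```

Note that turning this off may also cause the crawler to miss some real links that only appear outside of tags (e.g. in scripts), so it is a trade-off until the bug itself is fixed.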