Hello,
PHPCrawl is very good at what it does, but I couldn't find any option (or a place to put a suitable regex) to make it follow the 'href' attribute of '<a>' anchor tags only. Is there a way to do this?
Big thank you to anyone who responds :)
Last edit: Anonymous 2014-12-17
Hi!
Did you try enableAggressiveLinkSearch(false) and setLinkExtractionTags(array("href"))?
Hello,
I have tried exactly that, but 'href' itself is not a tag; it is an attribute of a tag. PHPCrawl does everything right, in the sense that it really follows every 'href' it can find in the document (including <link href="..."> and others). All I want is for it to follow <a href="..."> only ;)
Last edit: Anonymous 2014-12-17
Ah ok, I see.
Unfortunately this can't be configured without modifying the PHPCrawl code itself.
But feel free to add this to the list of feature requests, for example as a new method "addLinkExtractionTags" (with the old method "setLinkExtractionTags" renamed to "setLinkExtractionAttributes").
Last edit: Anonymous 2014-12-17
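For anyone who needs this behaviour in the meantime, here is a minimal sketch of the filtering described above, done outside PHPCrawl with PHP's built-in DOM extension. It is not part of PHPCrawl's API: the function name extractAnchorHrefs and the sample HTML are made up for illustration, and wiring the result back into a crawl is left open. It only shows how to collect 'href' values from <a> tags while ignoring <link href="..."> and similar.

```php
<?php
// Hypothetical helper (not part of PHPCrawl): returns the href values
// found on <a> tags only, ignoring href attributes on any other tag.
function extractAnchorHrefs(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely well-formed; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Skip anchors without an href, e.g. <a name="...">.
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }
    return $hrefs;
}

// Sample document (made up): one <link href>, two <a href>, one <a> without href.
$html = '<html><head><link href="style.css" rel="stylesheet"></head>'
      . '<body><a href="/page1">One</a><a name="x">No href</a>'
      . '<a href="http://example.com/page2">Two</a></body></html>';

print_r(extractAnchorHrefs($html));
```

With the sample input, the returned array contains the two anchor URLs and leaves out style.css. The values come back exactly as written in the page, so relative links such as /page1 would still need to be resolved against the page URL before being followed.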