I've been using PHPCrawl for a couple of months and I really like it!
But I have one question. I'm crawling pages that have a select box which redirects you to another page. The pages/options in that select box aren't recognized by PHPCrawl.
Example:
I'm on this page: www.example.com/Shop/Product/sft001/SFT001
The page has a select box with this markup/JavaScript:
How can I make PHPCrawl aware that it should also visit the following pages?
www.example.com/Shop/Product/sft001/SFT002
www.example.com/Shop/Product/sft001/SFT003
www.example.com/Shop/Product/sft001/SFT004
I'm thinking about using this:
$crawler->setLinkExtractionTags(array("href","src","url","location","codebase","background","data","profile","action","open","value"));
But I'm afraid this will hurt performance, because the value attribute is used in many, many places...
For example, it's also used in the site header for the language and currency selectors...
Is this the way to go?
Thanks in advance for your reply!
Last edit: Anonymous 2015-07-29
Maybe you could use a regex to achieve this.
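A minimal sketch of the regex idea, outside of PHPCrawl itself. The actual select box markup from the question isn't shown, so the HTML below is hypothetical and assumes each option carries a site-relative product path in its value attribute. `extract_option_urls` is a made-up helper, not part of PHPCrawl.

```php
<?php
// Hypothetical markup: assumes each <option> holds a site-relative product path.
$html = <<<'HTML'
<select onchange="window.location = this.value;">
  <option value="/Shop/Product/sft001/SFT001" selected>SFT001</option>
  <option value="/Shop/Product/sft001/SFT002">SFT002</option>
  <option value="/Shop/Product/sft001/SFT003">SFT003</option>
  <option value="/Shop/Product/sft001/SFT004">SFT004</option>
</select>
HTML;

/**
 * Hypothetical helper: collect the value attributes of <option> elements
 * that point into the product tree and turn them into absolute URLs.
 */
function extract_option_urls(string $html, string $base): array
{
    // Capture the (single- or double-quoted) value of each <option> tag.
    preg_match_all('#<option\b[^>]*\bvalue\s*=\s*["\']([^"\']+)["\']#i', $html, $matches);

    $urls = [];
    foreach ($matches[1] as $value) {
        // Only keep values that look like product paths.
        if (strpos($value, '/Shop/Product/') === 0) {
            $urls[] = rtrim($base, '/') . $value;
        }
    }
    return array_values(array_unique($urls));
}

print_r(extract_option_urls($html, 'http://www.example.com'));
```

The extracted URLs would still have to be fed back into the crawl somehow; this sketch only shows the matching step.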
Hi!
Yes, your posted solution should work fine:
$crawler->setLinkExtractionTags(array("href","src","url","location","codebase","background","data","profile","action","open","value"));
...as you may have figured out in the meantime ;)
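On the performance worry: with "value" in the extraction tags, values from the header's language/currency selectors may also be picked up as links. The sketch below shows how a regex rule can narrow the candidates down to product pages. The candidate URLs and the `SFT\d+` product-code pattern are hypothetical, based only on the example URLs in the question. PHPCrawl's `addURLFollowRule()` and `addURLFilterRule()` take PCRE patterns of this form; here the matching is done with plain `preg_match()` so the effect is visible on its own.

```php
<?php
// Hypothetical link candidates: some come from the header's language and
// currency selectors once "value" is treated as a link attribute.
$candidates = [
    'http://www.example.com/Shop/Product/sft001/SFT002',
    'http://www.example.com/Shop/Product/sft001/SFT003',
    'http://www.example.com/Shop/Product/sft001/EUR',
    'http://www.example.com/nl',
];

// Hypothetical product-page pattern, in the PCRE form PHPCrawl's
// addURLFollowRule() expects.
$followRule = '#/Shop/Product/[^/]+/SFT\d+$#i';

// Keep only the candidates the rule matches.
$toFollow = array_values(array_filter(
    $candidates,
    function (string $url) use ($followRule): bool {
        return preg_match($followRule, $url) === 1;
    }
));

print_r($toFollow);
```

A follow rule this strict would, as far as I understand PHPCrawl's follow rules, also stop the crawler from following non-product links such as category pages, so excluding only the junk values with `addURLFilterRule()` may be the gentler option.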
Last edit: Uwe Hunfeld 2015-08-29
Thanks for your reply! This works, and it doesn't seem to have had any big impact on performance!
Last edit: Anonymous 2016-01-30