Hi, the library works well for most websites, but I cannot crawl this store: www.uncommongoods.com. The crawl always stops after crawling 3 URLs of this store.
I haven't tested it so far, but did you try increasing the stream timeout and the connection timeout?
(See the first entry in the FAQ: http://phpcrawl.cuab.de/faq.html)
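For readers unsure how the two timeouts differ, here is a minimal Python sketch using plain sockets. It is not phpcrawl code; the function name `fetch_with_timeouts` and its default values are made up for illustration. The connection timeout limits how long to wait for the TCP connection, while the stream timeout limits each wait for data once connected.

```python
import socket

def fetch_with_timeouts(host, port, request, connect_timeout=5.0, stream_timeout=2.0):
    """Send raw bytes and return the first chunk of the reply.

    Returns None if the server sends nothing within stream_timeout.
    (Hypothetical helper, for illustration only.)
    """
    # connect_timeout bounds the wait for the TCP connection to be established.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # stream_timeout bounds each blocking send/receive after connecting.
        sock.settimeout(stream_timeout)
        sock.sendall(request)
        try:
            return sock.recv(4096)
        except socket.timeout:
            # Connected fine, but the server never delivered any data.
            return None
    finally:
        sock.close()
```

If a server accepts connections quickly but is slow to answer, only the stream timeout needs raising; if the connection itself is slow to open, the connection timeout is the one that matters.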
Hi, here is another store that I am not able to crawl: http://www.bobbyberkhome.com/
It always stops at
http://www.bobbyberkhome.com/Controller/User_Session/CallTryCookie?crequest=33a4ac0204694e404647980bd1ada69ce6286bffd2d482614afcf67a6d68c2b2356d96f0c302544c ; the crawl stopped at the third link found.
Hi, I always set that in my crawl config,
but it doesn't work.
OK, I figured it out.
For the first page (www.uncommongoods.com), the problem is simply that an "Accept" directive is missing from the header phpcrawl sends. The hosting webserver doesn't deliver anything without this header (content-length: 0 bytes).
Please use the attached file/class as patch (simply put it in your "libs"-path of the phpcrawl package and overwrite the existing one).
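To illustrate what the missing directive looks like on the wire, here is a small Python sketch that builds a raw request with and without an "Accept" line. This is not the attached patch and not phpcrawl's actual header code; the function name `build_request`, the `User-Agent` value, and the choice of HTTP/1.0 are assumptions made for the example.

```python
def build_request(host, path="/", accept="*/*"):
    """Build a raw HTTP GET request as a string (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "User-Agent: example-crawler",  # hypothetical value
    ]
    # Some servers return an empty body when no Accept header is present,
    # so the header is added unless the caller explicitly disables it.
    if accept is not None:
        lines.append(f"Accept: {accept}")
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_request("www.uncommongoods.com"))
```

Calling `build_request(host, accept=None)` gives the kind of request that, per the description above, this server answers with 0 bytes of content.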
For the second page it's a little more complicated; I don't have a solution so far.
It starts with a request for http://www.bobbyberkhome.com/, followed by a redirect to some other pages (where an authorization cookie is sent) and then a redirect BACK to http://www.bobbyberkhome.com/. But http://www.bobbyberkhome.com/ has already been crawled at that point, so the crawler stops there.
Have to think about a solution.
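The behaviour described above can be reproduced in a small Python simulation. It is only a sketch of the problem, not phpcrawl's code: `fake_site`, the paths, and the `revisit_redirect_targets` switch are all hypothetical, and the switch shows just one conceivable direction, not a decided fix.

```python
def fake_site(url, cookies):
    """Hypothetical stand-in for the store: returns (redirect_target, links)."""
    if url == "/" and "session" not in cookies:
        return "/try-cookie", []          # first visit: bounce to the cookie check
    if url == "/try-cookie":
        cookies.add("session")            # cookie is set here
        return "/", []                    # ...then redirect BACK to the start page
    if url == "/":
        return None, ["/products", "/about"]  # real page, only served with cookie
    return None, []


def crawl(start, fetch, revisit_redirect_targets=False):
    """Breadth-first crawl that skips URLs it has already requested."""
    cookies, seen, queue, fetched = set(), set(), [start], []
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue                      # this is where the crawl dies
        seen.add(url)
        fetched.append(url)
        redirect, links = fetch(url, cookies)
        if redirect is not None:
            if revisit_redirect_targets:
                # Allow a redirect target to be requested again.
                # A real crawler would need a redirect cap to avoid loops.
                seen.discard(redirect)
            queue.append(redirect)
        queue.extend(links)
    return fetched


print(crawl("/", fake_site))
print(crawl("/", fake_site, revisit_redirect_targets=True))
```

With the default settings the start page is requested once, redirected away, and never requested again, so its links are never found. Allowing redirect targets to be revisited lets the second request (now carrying the cookie) reach the real page.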
I'm closing this bug report and will open two separate ones.
THANKS for the report!