Anonymous, 2018-11-14: Post awaiting moderation.
Hi all
I am encountering a problem using the crawler where for some sites, I get loads of the 'content not received' error message. Here are a couple of examples and some stats:
http://store.makro.co.uk/
Links followed: 4683, Content not received: 3823
http://www.brakesce.co.uk/
Links followed: 2429, Content not received: 2164
I was wondering if anyone had any idea how to stop this happening. I have crawled other sites using the same script/server, and it's been fine. Also, if you could tell me what usually triggers the message, i.e. what features of the script / server / website cause it, that would be great too.
Thanks
Hi!
Did you try to increase the stream-timeout and connection-timeout? Some slow sites (or servers) don't respond within the default timeout settings, maybe that's the reason.
And did you take a look at the error-code ($DocInfo->error)?
Also take a look at the FAQs (http://phpcrawl.cuab.de/faq.html, first point).
… sorry, it's $DocInfo->error_string, not $DocInfo->error.
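A minimal sketch putting both suggestions together: raising the two timeouts and logging $DocInfo->error_string for documents that weren't received. The 30-second values and the start URL are just examples; adjust for your setup.

```php
<?php
// Requires the PHPCrawl library (http://phpcrawl.cuab.de/)
include("libs/PHPCrawler.class.php");

class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo($DocInfo)
  {
    // Log documents whose content was not received, along with the error text
    if ($DocInfo->received == false)
    {
      echo "Not received: ".$DocInfo->url." (".$DocInfo->error_string.")\n";
    }
  }
}

$crawler = new MyCrawler();
$crawler->setURL("http://store.makro.co.uk/");

// Raise the timeouts for slow sites/servers
// (30 seconds is just an example value, not a recommendation)
$crawler->setConnectionTimeout(30);
$crawler->setStreamTimeout(30);

$crawler->go();
```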
"Did you try to increase the stream-timeout and connection-timout?"
Thanks, that did the trick.
Good to hear.
Maybe the default stream- and connection-timeouts should be increased in the next version.
Do you remember by how much you increased it?
Where can I find the crawled data on my system once crawling has finished?
I also can't find the files once crawling is done. Kindly help me with this.
Thank you
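In case it helps: as far as I know, PHPCrawl doesn't write crawled pages to disk by itself — each document is handed to your handleDocumentInfo() override and you have to store it yourself. A sketch of that (the "crawled/" directory and the md5 filename scheme are just example choices):

```php
<?php
// Requires the PHPCrawl library (http://phpcrawl.cuab.de/)
include("libs/PHPCrawler.class.php");

class StoringCrawler extends PHPCrawler
{
  function handleDocumentInfo($DocInfo)
  {
    // Only store documents that actually arrived
    if ($DocInfo->received == true)
    {
      // "crawled/" is an example directory - create it first
      // and pick any naming scheme you like
      $file = "crawled/".md5($DocInfo->url).".html";
      file_put_contents($file, $DocInfo->source);
    }
  }
}
```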