I was able to specify a proxy for each child process, but I am wondering whether the crawler can be modified to use a different proxy for each page rather than one for the entire child process. The idea is to keep a pool of proxies and pick a random one from it for each page request.
Also, is there any way to handle the exception that is thrown when the crawler cannot connect to the proxy? As far as I can see, if a child cannot connect to the proxy for any page, it throws an exception and aborts. I would like to control this behavior so that the child either skips the page and moves on to the next one, or loads the page without the proxy.
Last edit: Anonymous 2015-05-25
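To make the requested behavior concrete, here is a minimal sketch in plain PHP with cURL. It is not phpcrawl code and does not show how to patch the crawler itself. The names `fetchWithProxyPool` and `curlFetch` are hypothetical, and the proxy addresses in the usage comment are placeholders. For each page it tries the pooled proxies in random order, then optionally retries without a proxy, and returns null so the caller can skip the page.

```php
<?php
// Hypothetical helper, not part of phpcrawl: try proxies from a pool in
// random order, optionally fall back to a direct request, and return null
// if everything fails so the caller can skip the page.
function fetchWithProxyPool(string $url, array $proxies, callable $fetch, bool $allowDirect = true): ?string
{
    shuffle($proxies);          // random order, so each page starts with a random proxy
    if ($allowDirect) {
        $proxies[] = null;      // null stands for "no proxy", tried last
    }
    foreach ($proxies as $proxy) {
        try {
            return $fetch($url, $proxy);
        } catch (RuntimeException $e) {
            continue;           // this attempt failed, move on to the next candidate
        }
    }
    return null;                // nothing worked: skip this page
}

// One possible fetcher, based on PHP's cURL extension.
// $proxy is "host:port", or null for a direct connection.
function curlFetch(string $url, ?string $proxy): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (placeholder addresses):
// $pool = ['203.0.113.10:8080', '203.0.113.11:8080'];
// $html = fetchWithProxyPool('http://example.com/', $pool, 'curlFetch');
// if ($html === null) { /* skip this page */ }
```

Passing the fetcher in as a callable keeps the pool logic separate from the HTTP client, so the same loop could wrap whatever request code the crawler already uses.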
I use HAProxy and load-balance the connections coming in from phpcrawl across my proxies that way. Maybe you'd have luck with that method.
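A setup along those lines might look roughly like the fragment below. This is only a sketch under assumptions, not the poster's actual configuration: the listen address, server names, and upstream proxy addresses are placeholders. HAProxy runs in TCP mode and hands each incoming connection to the next upstream proxy in round-robin order, and `check` takes unreachable proxies out of rotation.

```
defaults
    mode tcp
    timeout connect 5s
    timeout client  60s
    timeout server  60s

listen proxy_pool
    bind 127.0.0.1:3128
    balance roundrobin
    option redispatch
    retries 2
    # Placeholder upstream proxies; replace with your own
    server proxy1 203.0.113.10:8080 check
    server proxy2 203.0.113.11:8080 check
```

The crawler would then be pointed at 127.0.0.1:3128 as its only proxy, for example through phpcrawl's `setProxy()` if I remember the method correctly. Balancing happens per connection, so pages only rotate across proxies if the crawler opens a new connection for each request.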
This is a good feature that I'm also looking for.