Does PHPcrawl do anything to minimize overload on one domain when multiprocessing?
Example: allowing only one thread per domain, with a 3-second delay between URLs.
- Lars
Hi Lars,
to make it short: no.
If you use the multiprocessing mode "MPMODE_CHILDS_EXECUTES_USERCODE", you can simply put a sleep(3) in your handleDocumentInfo code, and every child process will wait 3 seconds after each request (because your code is executed directly in the child process and not in the main process).
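A minimal sketch of that sleep(3) approach, in case it helps. The class name PoliteCrawler, the include path, and the process count of 3 are only assumptions for illustration; the PHPCrawl calls follow the library's documented example as far as I know. Note that this pauses each child process on its own, so with 3 processes one domain can still receive up to 3 requests every 3 seconds. It is not a true one-request-per-domain limit.

```php
<?php
// Assumption: PHPCrawl is unpacked into ./libs next to this script; adjust the path.
require_once 'libs/PHPCrawler.class.php';

// Hypothetical subclass name, chosen for this example only.
class PoliteCrawler extends PHPCrawler
{
    // PHPCrawl calls this once for every document it has received.
    // In MPMODE_CHILDS_EXECUTES_USERCODE it runs inside the child process,
    // so sleep() pauses only that child before it requests its next URL.
    public function handleDocumentInfo($DocInfo)
    {
        echo $DocInfo->url . ' (' . $DocInfo->http_status_code . ")\n";
        sleep(3); // wait 3 seconds after each request of this process
    }
}

// Start the crawl only when a start URL is given on the command line,
// e.g.: php polite_crawl.php www.example.com
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $crawler = new PoliteCrawler();
    $crawler->setURL($argv[1]);
    // 3 child processes, each executing handleDocumentInfo() itself.
    $crawler->goMultiProcessed(
        3,
        PHPCrawlerMultiProcessModes::MPMODE_CHILDS_EXECUTES_USERCODE
    );
}
```

Multiprocessing only works from the command line on a Unix-like system, and as far as I know it needs the PCNTL, POSIX, and semaphore extensions plus PDO SQLite.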
But I don't recommend this multiprocessing mode unless you are aware that you are dealing with parallel computing; there are a lot of things and pitfalls you have to take care of.
Just stay with the "MPMODE_PARENT_EXECUTES_USERCODE" mode; there you don't have to take care of any of these things.
In the next version there will be an option to set a delay for every mode.
And in the version after that there will (hopefully) be an option to set a requests-per-second/minute limit.