Hello!!
I want to use PHPCrawl to fetch the content of web pages. It works properly for websites with fewer than about 150 pages, but when a website has more than that, PHPCrawl breaks and gives me the error message "504 Gateway Time-out".
I searched this forum for similar problems and found the recommendation to adjust the parameters setStreamTimeout(5) and setConnectionTimeout(10). I also tried different values such as 50 and 100, but the problem remains.
I know the loop inside the go() method is executing successfully, because I am saving the retrieved data in a table; but, as I said, after approximately 150 cycles it breaks down with that error.
Could you please suggest a solution?
Last edit: Anonymous 2013-11-21
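For reference, a minimal configuration sketch in the spirit of the setup described above. The class name, callback body, and URL are placeholders; the timeout values are the ones mentioned in the post, and the `set_time_limit(0)` line reflects that a 504 usually comes from the web server or an intermediate proxy rather than from PHPCrawl itself:

```php
<?php
// Sketch only -- assumes a standard PHPCrawl installation; adjust the
// include path, URL, and limits to your environment.
require_once("libs/PHPCrawler.class.php");

class MyCrawler extends PHPCrawler
{
    // Called by PHPCrawl once for every received document
    function handleDocumentInfo($DocInfo)
    {
        // ... store $DocInfo->url / $DocInfo->content in your table ...
    }
}

$crawler = new MyCrawler();
$crawler->setURL("http://www.example.com/");   // placeholder URL
$crawler->setConnectionTimeout(10); // seconds to wait for a connection
$crawler->setStreamTimeout(5);      // seconds to wait for data on the stream

// Lift PHP's own execution limit for long crawls; note this does not
// affect timeouts enforced by the web server or a proxy in front of it.
set_time_limit(0);

$crawler->go();
```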
Hi!
What gives you this message? PHPCrawl itself, the server PHPCrawl is running on, the proxy server, or the server hosting the website you want to crawl?
And are you using a proxy (the setProxy() method)?
If so: what happens if you try to crawl the website without the proxy?
It's hard to say from here which component in such a constellation is causing the problem.
Hi!!
Thanks a lot for your answer. I have found the problem: I just run the crawler process from the command line, and it works properly.
Again thanks for your support!!!
Ah ok, good to hear!
Hi again!!
Now I need to execute the script that runs the crawler from a PHP class. I have tried the exec(), passthru() and system() functions, but it does not work.
To clarify a little bit: I receive some parameters from the browser in a PHP class, and from this class I need to execute the script that runs the crawler. The crawler must be executed in a CLI environment.
Hi everybody!!!
I have just found the answer to my last question. The procedure is as follows:
First I created a PHP script that runs PHPCrawl (I call it crawlerCli.php), and then from my PHP class I use something like this:
$command = "php crawlerCli.php parameter-1 ... param-n > /dev/null 2>&1 &";
$v = popen($command, 'w');
pclose($v);
Regards and thank you very much for all your support!!!!
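A slightly hardened sketch of the same launch pattern. The script name crawlerCli.php is the one from the post; the argument values are invented examples. Two details matter here: every argument should go through escapeshellarg() so URLs and spaces survive the shell, and the redirection must come *before* the trailing `&` (writing `& > /dev/null` backgrounds the command first and leaves the redirection attached to an empty command, so the crawler's output can still tie the child to the web server process):

```php
<?php
// Sketch: detach a CLI PHP script from a web request via popen().
// 'crawlerCli.php' is the script name from the post above;
// the argument values below are placeholders.
$args = array("http://www.example.com/", "50");

// Quote every argument so URLs, spaces, and shell metacharacters survive
$quoted = array_map('escapeshellarg', $args);

// Discard stdout and stderr, then background the process with '&'
$command = "php crawlerCli.php " . implode(" ", $quoted) . " > /dev/null 2>&1 &";

// popen() returns immediately because the shell backgrounds the child
pclose(popen($command, 'w'));
```

With this ordering, popen() returns as soon as the shell has forked the crawler, so the web request that triggered it is not blocked for the duration of the crawl.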