Hi you lovely people :)
Is it feasible to run PHPCrawl from the command line? I.e., is it "realistic" that it can crawl 300,000 pages or so if I run it from the command line, or will it stop at some point (as it most surely would via a browser ;-))?
How would I run it from the command line to ensure it stays as stable as possible?
I'm planning to run it in a Linux environment where I have full server control. Would it just be:
$ php my_crawler.php
... or are there any configuration aspects I should bear in mind for optimal results?
Again, thanks & sorry if it's a dumb question :-)
Hi!
Normally you should always run PHPCrawl from the command line (CLI).
And yes, simply run your script/project by executing "$ php my_crawler_script.php".
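In case it helps, here is a bare-bones sketch of such a script, roughly following the quickstart example from the docs (the include path, class name and start URL are just placeholders):

<?php
// Include the PHPCrawl main class (the path depends on where you unpacked the library).
include("libs/PHPCrawler.class.php");

// Extend PHPCrawler and override handleDocumentInfo() to process each received page.
class MyCrawler extends PHPCrawler
{
    function handleDocumentInfo($DocInfo)
    {
        // Print the URL and HTTP status of every page and flush,
        // so the output shows up immediately on the command line.
        echo $DocInfo->url." (".$DocInfo->http_status_code.")\n";
        flush();
    }
}

$crawler = new MyCrawler();
$crawler->setURL("http://www.example.com/");
$crawler->go();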
If you want to spider huge websites (containing 300,000 pages or more), you should switch to the internal SQLite cache.
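The call for that would look roughly like this, assuming $crawler is your PHPCrawler instance (as in the sketch above):

// Keep the URL queue in an SQLite database on disk instead of in RAM,
// so memory usage stays flat even on very large crawls.
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);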
(as described here: http://phpcrawl.cuab.de/spidering_huge_websites.html).
By running your script from the command line and using this type of caching, it shouldn't be a problem to spider even very large websites.
Best regards!
Perfect - thank you very much for your help! :)
I didn't have SQLite installed, but managed to Google my way to a solution.
It's running like a charm now :-)
I have one last question regarding the CLI: I started the process via Terminal / SSH (I'm on a Mac), but my girlfriend shut down my computer, and that apparently stopped it from running.
How can I get it to run even when I'm not connected?
Thanks :))
Google to the rescue, answering my own question: just put nohup in front of the command and then kill the PID when you're done. :)
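For anyone finding this later, a rough sketch of what that looks like in practice (the log file name is just an example):

$ nohup php my_crawler.php > crawler.log 2>&1 &   # keeps running after the SSH session closes
$ ps aux | grep my_crawler.php                    # find the PID later
$ kill <PID>                                      # stop the crawler when you're done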