How to run PHPCrawl from Commandline?

Help
2012-05-28
2013-04-09
  • brundleseth

    brundleseth - 2012-05-28

    Hi you lovely people :)

    Is it feasible to run PHPCrawl from the command line? I.e., is it "realistic" that it can crawl 300,000 pages or so if I run it from the command line, or will it somehow stop (as it 100% surely would via a browser ;-))?

    How would I run it from the command line to ensure it stays as stable as possible?

    I'm planning to run it from a Linux environment where I have full server control. Would it just be:

    $ php my_crawler.php

    … or are there any configuration aspects I should bear in mind for optimal results?

    Again, thanks & sorry if it's a dumb question :-)

     
  • Uwe Hunfeld

    Uwe Hunfeld - 2012-05-30

    Hi!

    Normally you should always run PHPCrawl from the command line (CLI).
    And yes, simply run your script/project by executing "$ php my_crawler_script.php".

    If you want to spider huge websites (containing 300,000 pages or more),
    you should switch to the internal SQLite cache by setting:

    $crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);

    (as described here: http://phpcrawl.cuab.de/spidering_huge_websites.html).

    By running your script from the command line and using this type of caching, it shouldn't be a problem to spider
    even very large websites.

    Best regards!
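
    Before switching to the SQLite cache, it may help to check that PHP's SQLite support is actually loaded on the CLI. The sketch below assumes the cache goes through the PDO SQLite driver and that the module shows up as "pdo_sqlite" in the output of "php -m"; both are assumptions worth checking against the page linked above. The helper name has_module is made up for this example.

```shell
#!/bin/sh
# has_module NAME: succeed if NAME appears as a whole line (case-insensitive)
# in a module list read from stdin, e.g. the output of `php -m`.
# (Hypothetical helper, not part of PHP or PHPCrawl.)
has_module() {
    grep -qixF -- "$1"
}

# Only run the check if a php binary is on the PATH.
if command -v php >/dev/null 2>&1; then
    # "pdo_sqlite" is the assumed module name for the SQLite driver.
    if php -m | has_module pdo_sqlite; then
        echo "pdo_sqlite is loaded"
    else
        echo "pdo_sqlite is missing"
    fi
fi
```

    If the module is reported as missing, installing your distribution's PHP SQLite package is usually the next step.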

     
  • brundleseth

    brundleseth - 2012-05-30

    Perfect - thank you very much for your help! :)

    I did not have SQLite installed, but managed to google my way to that.

    It's running like a charm now :-)

    I have one last question regarding the CLI: I started the process via Terminal / SSH (I'm on a Mac), but my girlfriend closed my computer, and that apparently stopped it from running.

    How can I get it to run even when I'm not connected?

    Thanks :))

     
  • brundleseth

    brundleseth - 2012-05-30

    Google to the rescue, answer to my own question: just put nohup in front of the command, and kill the PID when you're done. :)

     
