PHP Memory Limit Error WITH SQLite Caching On

  • josepilove

    josepilove - 2012-10-18

    So I have this set:

    but I am still getting an error where PHP runs out of memory. It seems like SQLite isn't being used…I am crawling a huge site and I need to not use PHP memory for this.

    Any advice? What am I missing?
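The snippet showing the setting didn't survive the forum formatting. For reference, in phpcrawl 0.8x the SQLite URL-cache is normally enabled with the setUrlCacheType() directive, roughly like this (a hedged sketch, not the poster's actual code; the include path and URL are placeholders):

```php
<?php
// Sketch: switch phpcrawl's URL-cache from RAM to SQLite so the
// queue of found URLs lives on disk instead of in PHP memory.
// Requires the phpcrawl library and PHP's SQLite support.
include("libs/PHPCrawler.class.php"); // placeholder path

$crawler = new PHPCrawler();
$crawler->setURL("http://www.example.com/"); // placeholder URL
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);
$crawler->go();
```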

  • Nobody/Anonymous


    Could you please post your complete crawler setup?
    Then i'll take a look at it.

    And what's the error message you get?


  • josepilove

    josepilove - 2012-10-18
    $url = "";

    class MyCrawler extends PHPCrawler
    {
        function handleDocumentInfo($DocInfo)
        {
            $url = "";
            $fp = fopen('text.csv', 'a');
            $url_ = str_replace($url, '', $DocInfo->url);
            $level = substr_count($url_, '/');
            $level = $level - 1;
            $input = array();
            array_push($input, $DocInfo->http_status_code);
            array_push($input, $level + 1);
            for ($i = 0; $i <= $level; $i++)
            {
                array_push($input, '');
            }
            array_push($input, $url_);
            fputcsv($fp, $input);
            fclose($fp);
            echo $url_."\n";
        }
    }

    $crawler = new MyCrawler();
    $crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i");

    if (isset($url))
    {
        echo "Crawling ".$url.", this might take a while...";
        $crawler->setURL($url); // these two calls are needed to actually start the crawl
        $crawler->go();
    }

    $report = $crawler->getProcessReport();

    if (PHP_SAPI == "cli") $lb = "\n";
    else $lb = "<br />";

    echo "Summary:".$lb;
    echo "Links followed: ".$report->links_followed.$lb;
    echo "Documents received: ".$report->files_received.$lb;
    echo "Bytes received: ".$report->bytes_received." bytes".$lb;
    echo "Process runtime: ".$report->process_runtime." sec".$lb;

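As a side note, the depth bookkeeping in handleDocumentInfo() can be checked in isolation with plain PHP (csv_row_for() is a helper name invented for this sketch, not part of the script above):

```php
<?php
// Isolated version of the row-building logic: strip the base URL,
// count slashes to get the depth, pad the CSV row with that many
// blank cells, then append the relative URL.
function csv_row_for($base, $url, $status)
{
    $rel = str_replace($base, '', $url);
    $level = substr_count($rel, '/') - 1;
    $row = array($status, $level + 1);
    for ($i = 0; $i <= $level; $i++) {
        $row[] = '';
    }
    $row[] = $rel;
    return $row;
}

// "/a/b/" contains 3 slashes, so the row gets 3 blank padding cells
print_r(csv_row_for("http://example.com", "http://example.com/a/b/", 200));
```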
  • josepilove

    josepilove - 2012-10-18

    Error: PHP Fatal error: Allowed memory size of 512008042 bytes exhausted.

  • Nobody/Anonymous

    Your setup looks ok, that's strange.

    I'll take a closer look at the problem and the internals tomorrow (i guess), have no time right now.

    Best regards!

  • josepilove

    josepilove - 2012-10-18


    FWIW, I am running this on OS X 10.8.2.

  • Nobody/Anonymous

    Hi josepilove,

    ok, i just took a closer look at your problem and indeed detected a (small) memory leak in phpcrawl (with SQLite caching enabled).
    But it really doesn't eat that much memory over here (Ubuntu, PHP 5.3.10), so how did you get it to claim 512 MB of memory?
    When does your script reach that limit, i.e. after what number of crawled pages?

    And could you do me a favour and add "echo memory_get_usage()" to your handleDocumentInfo method and post its output for the first 20 pages or so?
    That way i can compare it with my results and see if there's an even bigger leak under OS X (i used "" for testing btw.)

    public function handleDocumentInfo($DocInfo)
    {
      // ...
      echo memory_get_usage()."\n";
    }

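For what it's worth, here is a standalone illustration (not from the thread) of how those readings betray a leak: a leaky run shows memory_get_usage() climbing page after page, while a healthy run stays roughly flat. The leak is simulated here by deliberately retaining 4 KB per iteration:

```php
<?php
// Simulated "leaky" run: each iteration retains 4 KB, so the
// memory_get_usage() readings climb steadily -- the pattern to
// watch for in the real handleDocumentInfo() output.
$leaky = array();
$first = memory_get_usage();
for ($page = 1; $page <= 20; $page++) {
    $leaky[] = str_repeat('x', 4096); // pretend per-page leak
    echo "page ".$page.": ".memory_get_usage()." bytes\n";
}
$growth = memory_get_usage() - $first;
echo "growth over 20 pages: ".$growth." bytes\n";
```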

  • josepilove

    josepilove - 2012-10-22


    Here is the output for the first 20.


  • Nobody/Anonymous


    looks similar over here. Are you using PHP 5.3.1x?
    This is a difficult one; it seems that something changed since PHP 5.3.? regarding memory management (garbage collection) that's causing the leak.
    I tried to find the culprit in phpcrawl, but no luck so far. It's a little difficult, as i said, since there's no real memory profiler for PHP applications out there, but i'm gonna find it and i'll let you know.

    Thanks for the report by the way!

  • josepilove

    josepilove - 2012-10-22

    I'm using PHP 5.3.15.

    Again, thanks so much. Let me know if there is anything else I can do to help.

  • Nobody/Anonymous

    Hey josepilove,

    i think i got it (took me the whole damn night ;) )

    Unfortunately there seems to be a bug (memory leak) in PHP's stream_socket_client() function (that's used by phpcrawl since 0.81), as described here:

    I don't know if and since what version they fixed it (it's a little confusing there in the report).

    Could you please run the test script from the mentioned bug report on your machine to see if your version of PHP is affected by the leak?

    Thank you very much for your help and the report in general!!

    PS: You may use phpcrawl 0.80 until the problem gets solved (somehow), it doesn't use that leaking function.
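The test script itself isn't quoted in the thread; a minimal probe in the same spirit might look like this (an assumption, not the actual bug-report script): repeatedly open and close a stream_socket_client() connection and print the memory reading each time. Flat readings mean this PHP build isn't affected.

```php
<?php
// Hedged sketch of a stream_socket_client() leak probe -- NOT the
// actual script from the bug report. If the readings stay flat,
// the function isn't leaking on this PHP build.
$readings = array();
for ($i = 0; $i < 5; $i++) {
    // connection failures are fine; we only watch memory behaviour
    $fp = @stream_socket_client("tcp://127.0.0.1:9", $errno, $errstr, 1);
    if ($fp !== false) {
        fclose($fp);
    }
    $readings[] = round(memory_get_usage() / 1024);
    echo "memory: ".end($readings)."kb\n";
}
```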

  • josepilove

    josepilove - 2012-10-23

    here is the output of that test-script:

    memory: 618kb
    memory: 619kb
    memory: 619kb
    memory: 619kb
    memory: 619kb

    I'm going to give 0.80 a try.

    Thanks for all of your help!

  • Nobody/Anonymous

    Hm, seems that your version of PHP is not affected by this leak.
    Let me know if v 0.80 works for you.

  • josepilove

    josepilove - 2012-10-24

    still running into the memory limit error.

    I am trying to crawl, but I still only get through about 600-800 pages.

  • Nobody/Anonymous

    Shit ;)

    Looks like another memory leak somewhere (or a problem with SQLite on your system).
    It's really strange that this occurs after 600-800 pages; it never happened here and i never heard about something
    similar before.

    I'll take a closer look at it in a few days, i'm away for some days now.
    Hopefully i can detect something when testing the crawler on that page.

  • josepilove

    josepilove - 2012-10-26

    Last night I tried the test interface and it just finished. Here is the output:

    Process finished! Links followed: 26761  Kb received: 1229073  Data throughput kb/s: 30 
    Files received: 14847  Time in sec: 40616.46  Max memory-usage in KB: 340224.00

  • Nobody/Anonymous


    sorry again for my (real) late answer, but i just couldn't find out anything regarding another memory leak.

    BUT: I just noticed that you let the crawler receive EVERY type of document in your script.
    Maybe along the way the crawler tries to receive a huge file into local memory and hits the memory limit with that.

    You should add $crawler->addContentTypeReceiveRule("#text/html#"); to your directives; this lets the crawler
    ONLY receive html documents.

    Do you know the exact URL the crawler tried to process before it reached the memory limit?
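Dropped into the setup posted earlier, the suggested directive would sit next to the existing filter rule (a sketch; MyCrawler and the image-URL filter come from the thread, the content-type rule is the addition):

```php
<?php
// Sketch: restrict downloads to HTML so a stray large binary
// (video, archive, PDF) can't be pulled whole into PHP memory.
$crawler = new MyCrawler();
$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i");
// only documents whose Content-Type matches text/html get received
$crawler->addContentTypeReceiveRule("#text/html#");
```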


