
#21 Memory

Milestone: 2.1.0
Status: closed
Labels: None
Priority: 5
Updated: 2025-09-06
Created: 2009-10-08
Creator: Anonymous
Private: No

The tool often runs out of heap memory as it is scraping.
Even when I increase the virtual memory to 1.4 G, it's still a problem.

Any way to reduce memory issues / do garbage cleanup along the way?
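
Before tuning anything else, it is worth confirming that an increased heap setting (presumably passed via `-Xmx` on the `java` command line) actually reached the JVM. A minimal standalone sketch, not part of Web-Harvest itself (the class name is mine), prints the configured ceiling:

```java
public class MaxHeap {
    public static void main(String[] args) {
        // maxMemory() reports the ceiling the JVM will attempt to use,
        // i.e. the effective -Xmx value, converted here to megabytes.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap: " + maxMb + " MB");
    }
}
```

If this prints far less than the value you passed, the flag never reached the scraper's JVM (e.g. it was set on a wrapper script instead).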

Discussion

  • Thomas Schönbeck

    Indeed, I have a similar problem. When saving too many variables, or saving the html-to-xml result in a variable for faster processing, memory usage rises quickly. Java VisualVM shows high memory usage in char[], int[], java.lang.String, and java.lang.String[] objects, as well as high instance counts per object. If I add NULL references for the created variables and a System.gc() call in a <script> tag, memory usage can be held at a lower level, but performance suffers.

     
  • Piotr Dyraga

    Piotr Dyraga - 2012-11-15
    • milestone: --> Backlog
     
  • Piotr Dyraga

    Piotr Dyraga - 2012-11-16
    • assigned_to: Piotr Dyraga
     
  • Piotr Dyraga

    Piotr Dyraga - 2012-11-16

    Thank you for your feedback. We are currently working on the 2.1.0rc1 version, which we hope to release in a couple of days. This version includes a huge number of changes in the Web-Harvest core architecture. I did some tests scraping flickr for about an hour: heap size grew to 350 MB, while its actual utilization was about 280 MB. I think this is quite a good result ;)

    As I mentioned, 2.1.0rc1 is not yet officially released (it's a matter of days), but you can obtain it by checking out the 2.1-release branch.

    I am closing this bug and marking it as resolved in the 2.1.0rc1 version. If you find that this version does not perform well enough, please file a new bug, including your configuration.

    Attachment: Heap space

     

    Last edit: Piotr Dyraga 2012-11-16
  • Piotr Dyraga

    Piotr Dyraga - 2012-11-16
    • status: open --> closed
    • milestone: Backlog --> 2.1.0rc1-RELEASE
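
The workaround described in the first comment (nulling out references to scraped data, then hinting the garbage collector with System.gc()) can be sketched outside Web-Harvest as a plain Java program. The class name and allocation sizes here are illustrative assumptions, not taken from the tool:

```java
public class HeapCheck {
    // Report current heap usage (used = total - free), in megabytes.
    static long usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Simulate holding many large scraped documents in variables.
        String[] pages = new String[50];
        for (int i = 0; i < pages.length; i++) {
            pages[i] = new String(new char[1_000_000]); // ~2 MB of char data each
        }
        System.out.println("used before release: " + usedMb() + " MB");

        // The workaround: drop the references, then hint the collector.
        for (int i = 0; i < pages.length; i++) {
            pages[i] = null;
        }
        System.gc(); // only a hint; the JVM may ignore or defer it

        System.out.println("used after release: " + usedMb() + " MB");
    }
}
```

As the comment notes, forcing collection this way trades throughput for a lower steady-state footprint, which is why it helped memory but hurt performance.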
     
