Indeed, I have a similar problem. When saving too many vars, or saving the html-to-xml result in a var for faster processing, memory usage rises fast. Java VisualVM shows high memory usage in char[], int[], java.lang.String and java.lang.String[] objects, along with high instance counts for each of these types. Setting the created vars to null and calling System.gc() in a <script> tag keeps memory usage at a lower level, but performance suffers.
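For readers who want to reproduce the effect outside a scraper configuration, here is a minimal plain-Java sketch of the workaround described above: drop the last reference to a large value, hint a collection, and compare used heap before and after. It does not use any Web-Harvest API; the class name, helper methods and generated strings are hypothetical stand-ins for scraped variables. Note that System.gc() is only a request, so the numbers will vary between runs and JVMs.

```java
import java.util.ArrayList;
import java.util.List;

public class ReleaseReferencesSketch {

    /** Approximate used heap in bytes: currently allocated heap minus its free part. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Builds many strings as a stand-in for a large scraped variable (hypothetical data). */
    static List<String> buildLargeVariable(int count) {
        List<String> values = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            values.add("scraped-value-" + i);
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> pageContent = buildLargeVariable(500_000);
        long before = usedHeapBytes();
        System.out.println("held items: " + pageContent.size());

        pageContent = null; // drop the only reference so the list becomes collectable
        System.gc();        // a hint only; the JVM is free to ignore or delay it
        long after = usedHeapBytes();

        System.out.printf("used before: %d KB, used after: %d KB%n",
                before / 1024, after / 1024);
    }
}
```

Forcing a full collection on every iteration is expensive, which would be consistent with the slowdown reported above.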
Thank you for your feedback. We are currently working on version 2.1.0rc1, which we hope to release in a couple of days. This version includes a huge number of changes to the Web-Harvest core architecture. I ran some tests scraping Flickr for about an hour. The heap size grew to 350 MB, while its actual utilization was about 280 MB. I think this is quite a good result ;)
As I mentioned, 2.1.0rc1 is not yet officially released (it's a matter of days), but you can obtain it by checking out the 2.1-release branch.
I am closing this bug and marking it as resolved in version 2.1.0rc1. If you find that this version does not perform well enough, please file a new bug and include your configuration.
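When filing a follow-up report, it can help to include heap figures comparable to the ones quoted above. The sketch below reads them from the standard java.lang.management API. It assumes that "heap size" corresponds to the committed heap and "utilization" to the used heap, which is an interpretation and not something stated in this thread; the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {

    /** Takes one snapshot of the JVM heap figures. */
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Converts bytes to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        // committed: memory the JVM has reserved for the heap (assumed "heap size")
        // used: memory occupied by objects, live or not yet collected (assumed "utilization")
        System.out.printf("heap size (committed): %d MB, used: %d MB%n",
                toMegabytes(heap.getCommitted()), toMegabytes(heap.getUsed()));
    }
}
```

Calling this periodically during a long scrape gives a rough growth curve without attaching a profiler.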
Last edit: Piotr Dyraga 2012-11-16