WebHarvest - web data extraction tool

Memory leak problem

  1. 2007-08-13 14:53:12 UTC
    Hello,

    I have been using WebHarvest successfully for scraping for some time.
    I recently discovered a problem when scraping multiple pages and processing them with XPath expressions.

    I did some basic profiling, and it appears that the class

    org.webharvest.runtime.variables.NodeVariable is the one whose memory usage grows with every downloaded page, eventually causing an OutOfMemoryError in the scraping process.

    Maybe there is a way to fix this issue.

    Kind regards,
    Mile