
Memory leak problem

Help
Mile Rosu
2007-08-13
2012-09-04
  • Mile Rosu

    Mile Rosu - 2007-08-13

    Hello,

    I have been using Web-Harvest successfully for scraping for some time.
    I recently discovered an issue when scraping multiple pages and processing them with XPath expressions.

    I did some basic profiling, and apparently the class org.webharvest.runtime.variables.NodeVariable is the one whose memory footprint grows with every downloaded page, eventually causing an OutOfMemory exception in the scraping process.

    Maybe there is a way to fix this issue.

    Kind regards,
    Mile

     
    • steve_scraper

      steve_scraper - 2007-08-13

      Mile,
      I had some OutOfMemory exceptions too. Please see this post: "Way to ignore tags?" http://sourceforge.net/forum/forum.php?thread_id=1797536&forum_id=591299
      See Vladimir's suggestions. If you are using Java 1.4, try upgrading to 1.6 (it helped me).

      Steve

       
    • Mile Rosu

      Mile Rosu - 2007-08-13

      Hi Steve,

      I have read the thread, and I am already using Java 1.6 rev 2. It is with this VM that I have the problem.
      The workaround I used was to split the scraping into batches.

      Kind regards,
      Mile
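
      A minimal sketch of what such a batching workaround might look like, not the code actually used in this thread. The class name BatchedScrape, its methods, and the example URLs are hypothetical; the callback stands in for whatever runs one scraping pass (for instance, a freshly created scraper per batch), so that objects held by one batch can become unreachable before the next one starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: splits a list of work items (e.g. page URLs) into
// fixed-size batches and hands each batch to a caller-supplied scrape step.
final class BatchedScrape {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Runs scrapeBatch once per batch and returns the number of batches.
     * Assumption: scrapeBatch builds its own scraper state and does not keep
     * references to it after returning, so it can be garbage-collected.
     */
    static <T> int runInBatches(List<T> items, int batchSize,
                                Consumer<List<T>> scrapeBatch) {
        int count = 0;
        for (List<T> batch : partition(items, batchSize)) {
            scrapeBatch.accept(batch);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            urls.add("http://example.com/page" + i); // placeholder URLs
        }
        // Placeholder scrape step: a real one would run the scraper here.
        int batches = runInBatches(urls, 2,
                batch -> System.out.println("Scraping batch: " + batch));
        System.out.println(batches + " batches processed");
    }
}
```

      The batch size is a tuning knob: smaller batches keep peak memory lower at the cost of more start-up overhead per pass. Running each batch in a separate JVM process would be a stricter variant of the same idea.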

       
