#76 HtmlTable, HtmlTableRow memory leak problem.

closed
None
5
2012-10-21
2003-04-14
No

HtmlTable and HtmlTableRow have a memory leak.

Once you call the getRows or getCells method, the members of the
returned lists (such as HtmlTableDataCell) are never freed.

For example:


WebClient webClient = new WebClient();
for (int i = 0; i < urls.length; i++) {
    HtmlPage page = (HtmlPage) webClient.getPage(urls[i]);

    HtmlTable table = (HtmlTable) page.getHtmlElementById("table1");
    List rows = table.getRows();
    Iterator rowIterator = rows.iterator();
    while (rowIterator.hasNext()) {
        HtmlTableRow row = (HtmlTableRow) rowIterator.next();
        System.out.println("Found row");
        List cells = row.getCells();
        Iterator cellIterator = cells.iterator();
        while (cellIterator.hasNext()) {
            HtmlTableCell cell = (HtmlTableCell) cellIterator.next();
            System.out.println(" Found cell: " + cell.asText());
        }
        // Drop every local reference to the row and its cells.
        cellIterator = null;
        cells = null;
        row = null;
    }
    // Drop every local reference to the table and the page,
    // then ask for a collection (System.gc() is only a hint).
    rowIterator = null;
    rows = null;
    table = null;
    page = null;
    System.gc();
}


(Here urls[] is an array of URLs, and every page has a
table with the id "table1".)

If you run this code and watch each garbage collection,
you will see that HtmlTableDataCell and HtmlTableRow instances are
never freed (use a tool such as JProbe, which can display
the instance count and memory usage of each class).
In the end, the instance count of HtmlTableRow is the
cumulative number of rows in all tables on all pages, and
the count of HtmlTableDataCell is the cumulative number of
cells in all tables on all pages.
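
Without a profiler, the same growth can be watched roughly from inside the program. The following is only a sketch and does not use HtmlUnit: the class HeapGrowthDemo, the helper usedHeapBytes and the 1 MB byte arrays are hypothetical stand-ins for pages that stay reachable. It relies only on the standard Runtime methods.

```java
import java.util.ArrayList;
import java.util.List;

class HeapGrowthDemo {
    /** Approximate number of bytes currently in use on the Java heap. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Stand-in for the leak: a long-lived list keeps every "page" reachable,
        // so the collector can never reclaim them.
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            retained.add(new byte[1024 * 1024]); // hypothetical 1 MB of page data
            System.gc(); // only a hint; the JVM may ignore it
            System.out.println("iteration " + i + ": ~"
                    + usedHeapBytes() / 1024 + " KB in use");
        }
    }
}
```

If the reported figure keeps climbing across iterations even after the gc hint, something is still holding references to the earlier objects. A profiler is still needed to see which classes are involved.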

Discussion

  • Noboru Sinohara - 2003-04-14

    The attached file is source code illustrating this problem.
    The program tries to retrieve the same page 62 times.
    The page has a table with 101 rows and 8 cells in each row.

    The result of the execution is shown below.
    The program failed in the middle of the 25th iteration with an
    OutOfMemoryError.
    The instance count of HtmlTableRow equals 101 * 25 = 2,525.

    Runtime Heap Summary: jp.co.indb.sinopa.htmlunitexample.example1

    Runtime Instance List

    Package                                   Class                      Count             Memory
    -------                                   -----                      -----             ------
    Total                                                                25,948 (100.0%)   812.176 (100.0%)
    com.gargoylesoftware.htmlunit.html        HtmlTableDataCell          20,200 (77.8%)    646.4 (79.6%)
    java.util                                 ArrayList                  2,550 (9.8%)      61.2 (7.5%)
    com.gargoylesoftware.htmlunit.html        HtmlTableRow               2,525 (9.7%)      80.8 (9.9%)
                                              char[ ]                    192 (0.7%)        7.016 (0.9%)
    java.lang                                 String                     189 (0.7%)        4.536 (0.6%)
    com.gargoylesoftware.htmlunit.html        SimpleHtmlElementCreator   66 (0.3%)         1.056 (0.1%)
    java.net                                  URL                        62 (0.2%)         3.472 (0.4%)
    java.util                                 HashMap                    52 (0.2%)         2.08 (0.3%)
    com.gargoylesoftware.htmlunit.html        HtmlPage$MyParser          25 (0.1%)         3 (0.4%)
    com.gargoylesoftware.htmlunit             ScriptFilter               25 (0.1%)         1 (0.1%)
    com.gargoylesoftware.htmlunit.javascript  JavaScriptEngine$PageInfo  25 (0.1%)         0.6 (0.1%)
    java.beans                                PropertyChangeSupport      25 (0.1%)         0.6 (0.1%)
    com.gargoylesoftware.htmlunit.html        TableElementCreator        3 (0.0%)          0.024 (0.0%)
    java.lang                                 Class                      3 (0.0%)          0.168 (0.0%)
    com.gargoylesoftware.htmlunit.javascript  StrictErrorReporter        1 (0.0%)          0.016 (0.0%)
    jp.co.indb.sinopa.htmlunitexample         example1                   1 (0.0%)          0.008 (0.0%)
    com.gargoylesoftware.htmlunit             WebClient                  1 (0.0%)          0.072 (0.0%)
    java.util                                 TreeMap                    1 (0.0%)          0.04 (0.0%)
    com.gargoylesoftware.htmlunit.html        HtmlInputElementCreator    1 (0.0%)          0.008 (0.0%)
                                              Object[ ]                  1 (0.0%)          0.08 (0.0%)

    Report Date: 2003/04/15 1:17:32

     
  • Noboru Sinohara - 2003-04-14

    test source

     
  • Noboru Sinohara - 2003-04-15

    patch for 1.2.2

     
  • Noboru Sinohara - 2003-04-15

    I made a patch which resolves (only) this problem.
    The added features are:

    1. A "dispose" method on HtmlTable and HtmlTableRow, which
      clears the private List variable.
    2. A "deregisterPage" method on ScriptEngine and
      JavaScriptEngine, which deregisters a page from PageInfo.

    By calling these methods at the appropriate time, you can free
    the memory.
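
The patch itself is only attached, not quoted, so the following is just a sketch of the general "dispose" idea and not the patch's code. CachedTable is a hypothetical stand-in for a class that caches child objects in a private list; its method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class DisposeSketch {
    /** Hypothetical stand-in for a table that caches its rows in a private list. */
    static final class CachedTable {
        private final List<String> rows = new ArrayList<>();

        void addRow(String row) {
            rows.add(row);
        }

        int rowCount() {
            return rows.size();
        }

        /** Clears the cached list so it no longer keeps the rows reachable. */
        void dispose() {
            rows.clear();
        }
    }

    public static void main(String[] args) {
        CachedTable table = new CachedTable();
        table.addRow("row 1");
        table.addRow("row 2");
        System.out.println("rows before dispose: " + table.rowCount());

        // The caller must remember to do this once it is done with the table.
        table.dispose();
        System.out.println("rows after dispose: " + table.rowCount());
    }
}
```

The drawback of this approach is that every caller has to release the objects explicitly. If one call is forgotten, the leak is back.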

     
  • Noboru Sinohara - 2003-04-15

    The attached file is the test source, revised to use the newly
    added methods.
    This code does not exhaust memory.

     
  • Noboru Sinohara - 2003-04-15

    test source revised to use added new methods

     
  • Mike Bowler - 2003-05-04

    The core problem is that HtmlPage objects were never being
    garbage collected and those pages were hanging onto the
    various table objects.

    Changed the JavaScriptEngine to use weak references for
    HtmlPage objects which will allow those pages to be garbage
    collected.
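
The actual change to JavaScriptEngine is not shown in this ticket, so the following is only a sketch of the weak-reference idea under stated assumptions. Page, PageInfo and PageRegistry are hypothetical stand-ins, and java.util.WeakHashMap is just one way to hold the page weakly (java.lang.ref.WeakReference would be another).

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakPageRegistryDemo {
    /** Hypothetical stand-in for a page object. */
    static final class Page {
        final String url;

        Page(String url) {
            this.url = url;
        }
    }

    /** Hypothetical per-page bookkeeping; it must not refer back to its Page. */
    static final class PageInfo {
        final String note;

        PageInfo(String note) {
            this.note = note;
        }
    }

    /** Registry whose keys are held weakly, so it never keeps a page alive. */
    static final class PageRegistry {
        private final Map<Page, PageInfo> infos = new WeakHashMap<>();

        void register(Page page, PageInfo info) {
            infos.put(page, info);
        }

        PageInfo lookup(Page page) {
            return infos.get(page);
        }

        int size() {
            return infos.size();
        }
    }

    public static void main(String[] args) {
        PageRegistry registry = new PageRegistry();
        Page page = new Page("http://example.invalid/table1");
        registry.register(page, new PageInfo("loaded"));
        System.out.println("note while page is in use: " + registry.lookup(page).note);

        // Once the last strong reference is gone, the entry may be dropped.
        page = null;
        System.gc(); // only a hint, so the size printed below is not guaranteed
        System.out.println("entries after gc hint: " + registry.size());
    }
}
```

With weak keys, callers do not need an explicit deregister step: when nothing else refers to a page, the registry entry stops keeping it alive. The value must not hold a strong reference back to the key, or the entry will never be cleared.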

     
