HtmlTable and HtmlTableRow have a memory leak problem.
Once you call the getRows() or getCells() method, the list
members they create (the HtmlTableRow and HtmlTableDataCell
objects) are never freed.
For example:
WebClient webClient = new WebClient();
for( int i = 0; i < urls.length; i++ ) {
    HtmlPage page = (HtmlPage)webClient.getPage( urls[i] );
    HtmlTable table = (HtmlTable)page.getHtmlElementById("table1");
    List rows = table.getRows();
    Iterator rowIterator = rows.iterator();
    while( rowIterator.hasNext() ) {
        HtmlTableRow row = (HtmlTableRow)rowIterator.next();
        System.out.println("Found row");
        List cells = row.getCells();
        Iterator cellIterator = cells.iterator();
        while( cellIterator.hasNext() ) {
            HtmlTableCell cell = (HtmlTableCell)cellIterator.next();
            System.out.println("  Found cell: " + cell.asText());
        }
        // Null out every local reference so they cannot be what
        // keeps the rows and cells alive.
        cellIterator = null;
        cells = null;
        row = null;
    }
    rowIterator = null;
    rows = null;
    table = null;
    page = null;
    System.gc();
}
( Here urls[] is an array of URLs, and every page contains a
table with id "table1". )
If you execute this code and watch each garbage collection,
you will see that the HtmlTableDataCell and HtmlTableRow
instances are never freed (use a tool such as JProbe, which
can display per-class instance counts and memory usage).
In the end, the instance count of HtmlTableRow equals the
cumulative number of rows of all tables in all pages, and
the instance count of HtmlTableDataCell equals the
cumulative number of cells of all tables in all pages.
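For context, the kind of code that produces this behavior is a parent element that lazily builds a List of child objects, caches it in a private field, and never releases it. The sketch below is a hypothetical illustration of that pattern, not HtmlUnit's actual source:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the caching pattern, NOT HtmlUnit's code:
// the parent keeps a strong reference to every child it ever created, so
// the children stay reachable for as long as the parent (or anything that
// holds the parent, e.g. a long-lived engine) stays reachable.
class CachingTable {
    private List cachedRows;  // populated once, never cleared

    public List getRows() {
        if (cachedRows == null) {
            cachedRows = new ArrayList();
            for (int i = 0; i < 101; i++) {
                cachedRows.add(new Object()); // stands in for HtmlTableRow
            }
        }
        return cachedRows; // the same strongly-held list on every call
    }
}
```

Nulling out the caller's local variables, as the example above does, has no effect: the cache field inside the parent still holds the list.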
Logged In: YES
user_id=756657
The attached file is source code illustrating this problem.
The program tries to retrieve the same page 62 times; the
page contains a table with 101 rows of 8 cells each.
The result of the execution is shown below: the program
failed in the middle of the 25th iteration with an
OutOfMemoryError.
The instance count of HtmlTableRow equals 101 * 25 = 2,525.
Runtime Heap Summary: jp.co.indb.sinopa.htmlunitexample.example1
Runtime Instance List

Package                                    Class                       Instances        Memory
com.gargoylesoftware.htmlunit.html         HtmlTableDataCell           20,200 (77.8%)   646.4 (79.6%)
java.util                                  ArrayList                    2,550  (9.8%)    61.2  (7.5%)
com.gargoylesoftware.htmlunit.html         HtmlTableRow                 2,525  (9.7%)    80.8  (9.9%)
                                           char[]                         192  (0.7%)     7.016 (0.9%)
java.lang                                  String                         189  (0.7%)     4.536 (0.6%)
com.gargoylesoftware.htmlunit.html         SimpleHtmlElementCreator        66  (0.3%)     1.056 (0.1%)
java.net                                   URL                             62  (0.2%)     3.472 (0.4%)
java.util                                  HashMap                         52  (0.2%)     2.08  (0.3%)
com.gargoylesoftware.htmlunit.html         HtmlPage$MyParser               25  (0.1%)     3     (0.4%)
com.gargoylesoftware.htmlunit              ScriptFilter                    25  (0.1%)     1     (0.1%)
com.gargoylesoftware.htmlunit.javascript   JavaScriptEngine$PageInfo       25  (0.1%)     0.6   (0.1%)
java.beans                                 PropertyChangeSupport           25  (0.1%)     0.6   (0.1%)
com.gargoylesoftware.htmlunit.html         TableElementCreator              3  (0.0%)     0.024 (0.0%)
java.lang                                  Class                            3  (0.0%)     0.168 (0.0%)
com.gargoylesoftware.htmlunit.javascript   StrictErrorReporter              1  (0.0%)     0.016 (0.0%)
jp.co.indb.sinopa.htmlunitexample          example1                         1  (0.0%)     0.008 (0.0%)
com.gargoylesoftware.htmlunit              WebClient                        1  (0.0%)     0.072 (0.0%)
java.util                                  TreeMap                          1  (0.0%)     0.04  (0.0%)
com.gargoylesoftware.htmlunit.html         HtmlInputElementCreator          1  (0.0%)     0.008 (0.0%)
                                           Object[]                         1  (0.0%)     0.08  (0.0%)

Report Date: 2003/04/15 1:17:32
test source
patch for 1.2.2
Logged In: YES
user_id=756657
I made a patch which resolves (only) this problem.
The added features are:
- a method that clears the private List member, and
- a method on JavaScriptEngine which deregisters a page
  from PageInfo.
By calling these methods at the appropriate time, you can
free the memory.
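The idea behind the patch can be sketched with a self-contained example. The class and method names below are illustrative placeholders, not the names the patch actually adds (those are in the attached file): the caching object gets a way to drop its cached list, and the engine gets a way to deregister a page from its bookkeeping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the explicit-cleanup approach; names hypothetical.
class ReleasableTable {
    private List cachedRows;

    public List getRows() {
        if (cachedRows == null) {
            cachedRows = new ArrayList();
            cachedRows.add(new Object()); // stands in for a row
        }
        return cachedRows;
    }

    // Explicitly drop the cached list so its elements become unreachable.
    public void releaseRows() {
        cachedRows = null;
    }
}

// Analogue of deregistering a page from the engine's PageInfo bookkeeping.
class Engine {
    private final Map pageInfo = new HashMap();

    public void register(Object page)   { pageInfo.put(page, "info"); }
    public void deregister(Object page) { pageInfo.remove(page); }
    public int  trackedPages()          { return pageInfo.size(); }
}
```

The cost of this design is that every caller must remember to invoke the cleanup methods at the right time; forgetting a call reintroduces the leak.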
Logged In: YES
user_id=756657
The attached file is the test source revised to use the
newly added methods.
This code does not exhaust memory.
test source revised to use added new methods
Logged In: YES
user_id=46756
The core problem is that HtmlPage objects were never being
garbage collected, and those pages were hanging onto the
various table objects.
Changed the JavaScriptEngine to use weak references for
HtmlPage objects, which allows those pages to be garbage
collected.
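The fix can be illustrated with java.lang.ref.WeakReference: a weak reference does not keep its referent alive, so bookkeeping held through weak references no longer prevents collection, and a page (together with the table objects it owns) becomes collectable as soon as client code drops its last strong reference. A minimal sketch, assuming a simple registry rather than HtmlUnit's actual PageInfo structure:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the weak-reference fix; this registry is a stand-in for the
// JavaScriptEngine's page bookkeeping, not the real implementation.
class WeakPageRegistry {
    private final List refs = new ArrayList(); // holds WeakReference entries

    public void register(Object page) {
        refs.add(new WeakReference(page));
    }

    // Count pages that are still strongly reachable elsewhere; a weak
    // reference's get() returns null once its referent has been collected.
    public int livePages() {
        int live = 0;
        for (Iterator it = refs.iterator(); it.hasNext(); ) {
            if (((WeakReference) it.next()).get() != null) {
                live++;
            }
        }
        return live;
    }
}
```

Because the registry holds only WeakReferences, dropping the last strong reference to a page makes it eligible for collection even though it is still registered; exactly when get() starts returning null depends on when the garbage collector runs.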