getParentElement hangs
The following sample code hangs at getParentElement. If the document is small enough (say 1 KB), the code works fine.
@Test
public void testGetParentElement() throws Exception {
    // Build about 1 MB of HTML made of unterminated <p> and <font> elements
    final StringBuilder sb = new StringBuilder();
    final int size = 1024 * 1024;
    while (sb.length() < size) {
        sb.append("<p><font>test<br>");
    }
    final String html = sb.toString();

    // Parse the whole document up front, then walk every tag
    final Source source = new Source(html);
    source.fullSequentialParse();
    final List<Tag> tags = source.getAllTags();
    for (Tag tag : tags) {
        // The hang is observed on this call
        Element parentElement = tag.getElement().getParentElement();
        System.out.println("Tag: [ " + tag.toString() + "]");
        if (parentElement == null) {
            System.out.println("parentElement is null");
        }
    }
}
The program works fine for me. It most likely appears to hang because the JVM has not been given enough heap space: as memory runs low, the garbage collector ends up running almost continuously. If you allocate at least 100 MB of heap space it should work fine.
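The heap limit is normally raised with the standard JVM option -Xmx (for example -Xmx100m, following the estimate above). As a minimal sketch, a test could also report how much heap the JVM is actually allowed to use before it builds the large document. HeapCheck and its method names are hypothetical and not part of the library.

```java
// Hypothetical helper, not part of the Jericho library.
class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB.
    // Note: some collectors report slightly less than the -Xmx value.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True if the reported maximum heap is at least requiredMiB.
    static boolean hasAtLeast(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }
}
```

Calling HeapCheck.maxHeapMiB() at the start of the test and printing the result makes it easier to tell a genuine hang from a run that is simply short of memory.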
getParentElement() and the other functions in the library that deal with the nesting of elements use recursive algorithms, so memory issues are not surprising when they are run on an HTML file designed to contain a massive number of unterminated elements.
Thank you for the quick response. I have added conditions that check for degenerate cases to avoid these memory issues.
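The thread does not say what those conditions were. One possible form, shown only as a rough sketch, is to count start tags that have no matching end tag before calling any nesting-related method, and to skip those calls above some threshold. UnclosedTagGuard, its methods, and the threshold are all hypothetical, and the check uses plain regular expressions rather than the Jericho API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-check, not part of the Jericho library.
class UnclosedTagGuard {

    // Number of start tags of tagName that have no matching end tag
    // (a simple count; it ignores ordering and nesting).
    static int countUnclosed(String html, String tagName) {
        String name = Pattern.quote(tagName);
        int starts = count(html, "<" + name + "(\\s[^>]*)?>");
        int ends = count(html, "</" + name + "\\s*>");
        return Math.max(0, starts - ends);
    }

    // True if the document has more unclosed tagName tags than maxUnclosed.
    static boolean looksDegenerate(String html, String tagName, int maxUnclosed) {
        return countUnclosed(html, tagName) > maxUnclosed;
    }

    // Counts case-insensitive matches of regex in text.
    private static int count(String text, String regex) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }
}
```

For the sample document in this thread, every <font> start tag is unclosed, so a call such as looksDegenerate(html, "font", 1000) would flag it before getParentElement() is ever reached. The threshold of 1000 is an arbitrary illustration.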