A small example. The website belongs to the Turkish government.
import java.io.IOException;
import java.util.List;

import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

try (WebClient client = new WebClient()) {
    client.setJavaScriptTimeout(10000);
    HtmlPage page = client.getPage("http://ssd.dhmi.gov.tr/page.aspx?mn=388");

    // Links to the airport files
    List<HtmlAnchor> anchors = page.getByXPath("//div[@id='dvPage']/ul/li/a");
    for (HtmlAnchor htmlAnchor : anchors) {
        String name = htmlAnchor.getTextContent();
        System.out.println("[" + name + "] requesting file");
        Page p = htmlAnchor.click();
        System.out.println("[" + name + "] waiting for server response");
        client.waitForBackgroundJavaScript(10000);
        System.out.println("[" + name + "] request acknowledged!");
    }
} catch (IOException e) {
    e.printStackTrace();
}
Notice how it freezes indefinitely(!) when it hits "LTBA", but not on the other entries, which run the same script.
Also note that setJavaScriptTimeout has no effect.
OK, I'm able to reproduce your problem. The reason is that the document is returned as XML. HtmlUnit parses this, and that parsing does not scale; for large documents the parser runs quite a bit longer :-)
Will have a look...
Did some more analysis. It looks really strange, because parsing the single document on its own is fast. Maybe there is a memory leak somewhere.
Thank you for looking into this.
I also tried some in-depth parsing with specialised DOM and SAX parsers for those file types, and even though the files can be immensely large, parsing takes less than a second for me. A minimal sketch of that kind of standalone check is shown below.
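This is only a rough illustration of the measurement, not the exact code I used; the class name SaxTiming and the file name "airport-data.xml" are placeholders for one of the downloaded XML files.

import java.io.File;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.helpers.DefaultHandler;

// Rough timing check: run a plain SAX parser over one of the downloaded
// XML files and print how long the parse alone takes.
public class SaxTiming {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        long start = System.nanoTime();
        // "airport-data.xml" is a placeholder for one of the files served by the site.
        parser.parse(new File("airport-data.xml"), new DefaultHandler());
        System.out.printf("parsed in %.3f s%n", (System.nanoTime() - start) / 1e9);
    }
}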
Following your first answer, replacing the PageCreator and simply returning a TextPage for text/xml solves the problem right away.
There is no need to parse a file that a normal browser would just download in this situation. Roughly, the workaround looks like the sketch below.
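This is a sketch only; the class name XmlAsTextPageCreator is mine, and the import prefix depends on the HtmlUnit version in use.

import java.io.IOException;

import com.gargoylesoftware.htmlunit.DefaultPageCreator;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.TextPage;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.WebWindow;

// Workaround sketch: hand text/xml responses back as a plain TextPage so
// HtmlUnit skips building an XML DOM for the (potentially huge) document.
public class XmlAsTextPageCreator extends DefaultPageCreator {

    @Override
    public Page createPage(WebResponse webResponse, WebWindow webWindow) throws IOException {
        if ("text/xml".equalsIgnoreCase(webResponse.getContentType())) {
            TextPage page = new TextPage(webResponse, webWindow);
            webWindow.setEnclosedPage(page);
            return page;
        }
        // Everything else keeps the default behaviour.
        return super.createPage(webResponse, webWindow);
    }
}

It is installed on the client before fetching any pages:

client.setPageCreator(new XmlAsTextPageCreator());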
This solves the problem for me, at least, even though it does not answer the initial question.
I will leave it to you now.
Thank you for your response and your commitment.
Best regards