From: Ing. J. N. <ji...@se...> - 2019-01-06 16:45:53
Hello,

I have used HtmlUnit several times, mainly for automatically collecting data from various web pages. So far so good. Now I have run into some problems, and I cannot tell when they started. The main problem seems to be that page loading is not deterministic. Here is my very simple code:

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import java.io.IOException;

    public class HtmlUnitExample {
        public static void main(String[] argv) {
            // The browser version doesn't matter; the behaviour is the same.
            WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52);
            webClient.getOptions().setJavaScriptEnabled(true);
            // Don't abort on script errors or on non-2xx responses.
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
            webClient.getOptions().setRedirectEnabled(true);
            webClient.setAjaxController(new NicelyResynchronizingAjaxController());
            webClient.getOptions().setCssEnabled(false);
            webClient.getOptions().setUseInsecureSSL(true);
            try {
                HtmlPage page = webClient.getPage("http://myzuka.club");
                String asXml = page.asXml();
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
    }

When I run it on my PC at work, everything is OK: some warnings and some JavaScript exceptions, and the page loads in less than two seconds. But when I run the same code on my PC at home, the results vary:

- About 20% of attempts end with Java running out of heap space!
- About 40% of attempts get stuck at the getPage line: the debug output stops, but the program does not terminate. Hitting pause does nothing.
- The rest of the attempts do return the page, but only after many tens of seconds, maybe even minutes.

As far as operating system, memory, and speed are concerned, the two PCs are similar: Windows 7, 16 GB. The code is run in Oracle's JDeveloper (Java 1.8) with the latest HtmlUnit (2.33).

The example page (http://myzuka.club) loads fine in all browsers, both at work and at home. As far as I know, my home internet provider places no restrictions on my connection. One thing caught my eye: when I run the code at home, I see this logged line:

    INFO: statusCode=[403] contentType=[text/html]

I have never seen it at work. Does anybody have an idea what could be causing the problems at home?

Thanks,
JN, Czech rep.
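
One way to check whether the 403 seen at home depends on HtmlUnit at all might be to repeat the request with plain java.net and compare the status codes from both networks. The following is only a sketch under assumptions: StatusProbe, fetchStatus, and isClientError are hypothetical names (not part of HtmlUnit), the 10-second timeouts are arbitrary, and the Firefox User-Agent string is an illustrative value that is not necessarily identical to what HtmlUnit sends.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical diagnostic helper (not part of HtmlUnit): requests a URL with
// plain java.net and reports the HTTP status for different User-Agent headers.
class StatusProbe {

    // True for 4xx codes, i.e. the server refused or rejected the request.
    static boolean isClientError(int statusCode) {
        return statusCode >= 400 && statusCode < 500;
    }

    // Sends a single GET and returns the status code.
    // A null userAgent leaves Java's default User-Agent header in place.
    static int fetchStatus(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(10_000); // assumed value, in milliseconds
            conn.setReadTimeout(10_000);    // assumed value, in milliseconds
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "http://myzuka.club";
        String[] agents = {
            null, // Java's default User-Agent
            // Assumed browser-like value, for illustration only.
            "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0"
        };
        for (String agent : agents) {
            int status = fetchStatus(url, agent);
            System.out.println("User-Agent=" + agent + " -> status " + status
                    + (isClientError(status) ? " (client error)" : ""));
        }
    }
}
```

If this also reports 403 only from the home connection, the difference probably lies in how the server treats that network or client rather than in HtmlUnit itself; if it does not, the cause is more likely somewhere in the HtmlUnit run.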