From: EricWong <ykw...@ya...> - 2017-07-10 14:28:43
|
<http://htmlunit.10904.n7.nabble.com/file/n42303/ECMWF.png> Hello. I am trying to load a page with HtmlUnit: https://www.ecmwf.int/en/forecasts/charts/catalogue/medium-mslp-wind850?time=2017070900,0,2017070900&projection=classical_europe (It is a page of a major European meteorological agency.) I try to get the complete HTML content with "HtmlPage.asXml()". However, the image tag img class="chart-image" id="map_1_image" src="..." does not appear in the result, even after waiting for a while. The whole page loads successfully in both Chrome and Firefox. The attached screenshot shows the page loaded in Chrome with the F12 panel open. This page does not require any user click; typing the URL and waiting for the ajax to load is enough. (If necessary for testing, please substitute the YYYYMMDD component of the URL with the previous day's date. E.g. if today is 15 Jul 2017, please use 20170714.) How can the page be loaded completely with HtmlUnit? Thanks. -- View this message in context: http://htmlunit.10904.n7.nabble.com/Failing-to-load-the-complete-html-content-of-a-page-with-ajax-tp42303.html Sent from the HtmlUnit - General mailing list archive at Nabble.com. |
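For anyone reproducing this, the date substitution described above can be done in code. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the trailing "00" after YYYYMMDD in the URL is a fixed suffix, which the message does not confirm.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class ChartUrl {
    private static final String BASE =
        "https://www.ecmwf.int/en/forecasts/charts/catalogue/medium-mslp-wind850";

    /** Builds the chart URL for the day before {@code today}. */
    static String forPreviousDay(LocalDate today) {
        // BASIC_ISO_DATE formats a LocalDate as YYYYMMDD, e.g. 20170714.
        String day = today.minusDays(1).format(DateTimeFormatter.BASIC_ISO_DATE);
        // Assumption: the "00" following the date in the original URL never changes.
        String stamp = day + "00";
        return BASE + "?time=" + stamp + ",0," + stamp + "&projection=classical_europe";
    }

    public static void main(String[] args) {
        // Prints the URL for yesterday relative to the system clock.
        System.out.println(forPreviousDay(LocalDate.now()));
    }
}
```

The resulting string can be passed to whatever call loads the page.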
From: albu77 <alb...@gm...> - 2017-07-10 15:03:31
|
I used HtmlUnit in the past and it did not load images. But you can have a look at this link anyway, because it now depends on the version you are using: https://stackoverflow.com/questions/3425697/does-htmlunit-load-images-when-it-browses-page |
From: EricWong <ykw...@ya...> - 2017-07-11 01:36:33
|
Thanks for your reply. I have tried "webClient.getOptions().setDownloadImages(true);" but it does not work. I am using the latest version, 2.27. I do not need HtmlUnit to download any image from the page. I just want to get the complete HTML, as shown in the F12 panel of Chrome. |
From: albu77 <alb...@gm...> - 2017-07-11 05:05:33
|
And can you show what you get from your page save? |
From: EricWong <ykw...@ya...> - 2017-07-11 06:28:58
|
I have now uploaded a text file with the result of HtmlPage.asXml(): AsXmlResult.txt <http://htmlunit.10904.n7.nabble.com/file/n42308/AsXmlResult.txt> As it shows, the image tag described above and visible in the screenshot is not present. |
From: albu77 <alb...@gm...> - 2017-07-11 07:49:47
|
The chart-controls div is not showing either. I think you are not getting the page at the right time: there should be an ajax call with a DOM append operation when the call succeeds. If I were you I would dig in this direction. Perhaps have a look at this link: <https://stackoverflow.com/questions/19551043/process-ajax-request-in-htmlunit> It's a long time since I've used HtmlUnit, but looking in my sources I found that I used my own class: public class MyWebClient extends WebClient ...and also if(ajaxSynchrone){ webClient.setAjaxController(new NicelyResynchronizingAjaxController()); } There is nothing more I can tell you; it was too long ago and I have no way of building a solution now. Good luck... |
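One generic way to avoid reading the page "at the wrong time" is to poll until the expected content appears or a timeout expires. The helper below is a library-independent sketch, not something from HtmlUnit; the class name, method name and parameters are all invented for illustration.

```java
import java.util.function.BooleanSupplier;

final class Poll {
    /**
     * Re-checks {@code condition} every {@code intervalMillis} until it is true
     * or {@code timeoutMillis} has elapsed. Returns true if the condition was met.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

With HtmlUnit, the condition would presumably be a check that the element with id "map_1_image" is present in the page, evaluated before calling asXml(); how to express that check depends on the version in use.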
From: EricWong <ykw...@ya...> - 2017-07-11 09:25:11
|
Thanks for the information. The problem is solved. My program already included this line of code: webClient.setAjaxController(new NicelyResynchronizingAjaxController()); Following your advice, I focused on this line. I commented it out: //webClient.setAjaxController(new NicelyResynchronizingAjaxController()); and the complete HTML page now loads successfully. In your program, you decide whether to use it with: if(ajaxSynchrone) ... Could you say a little about how you determine whether "ajaxSynchrone" is true or false? |
From: albu77 <alb...@gm...> - 2017-07-11 09:44:57
|
When I created my WebClient factory, I checked for that: I pass asynchrone as a parameter and it is set to true, so it's strange. One more thing to say is that I set the browser version of the WebClient to BrowserVersion.FIREFOX_24. And last but not least, I also added some code that I can call:

    webClient.attendPourJavascriptSaufTimers(pageAffichageLicence, AttentePourJavascript.BEAUCOUP.getTempo());
    webClient.waitForBackgroundJavaScript(AttentePourJavascript.DIX_SECONDES.getTempo());

These two methods let any background JavaScript execute for a given time; in some cases the time is long, in others shorter. The first method kills any timers running on the page:

    // "Wait for JavaScript, except timers": cancel recent timers, then wait.
    public int attendPourJavascriptSaufTimers(HtmlPage page, long tempo) {
        String texteDuScript = ScriptAExecuter.ANNULE_LES_TIMERS.getScript();
        // Run the timer-cancelling script in the page; its result is not used.
        page.executeJavaScript(texteDuScript);
        // Returns the number of background JavaScript jobs still pending.
        return this.waitForBackgroundJavaScript(tempo);
    }

    public enum ScriptAExecuter {
        // "Cancel the timers": creates a dummy interval to learn the newest
        // timer id, then clears up to the last 10 ids, the dummy included.
        ANNULE_LES_TIMERS(" limit= 10; \r\n var np, n= setInterval(function(){},100000); \r\n np= Math.max(0, n-limit);\r\n while(n> np){\r\n clearInterval(n--);\r\n }");

        private final String script;

        ScriptAExecuter(String script) { this.script = script; }

        public String getScript() { return script; }
    }

As I said, it was a long time ago, so I don't even remember the why and how of this code, but what I know is that it's still in production and running well with HtmlUnit 2.14. I hope it helps. |
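To make the ANNULE_LES_TIMERS script above easier to follow, here is its id arithmetic restated in plain Java. It is a sketch under one stated assumption, namely that the page hands out timer ids as consecutive increasing integers (the script itself relies on this); the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class TimerIds {
    /**
     * Mirrors the script's loop: starting from the newest timer id, list the
     * ids that would be passed to clearInterval, at most {@code limit} of them.
     */
    static List<Integer> idsToClear(int newestId, int limit) {
        // Same as np = Math.max(0, n - limit) in the script.
        int lowest = Math.max(0, newestId - limit);
        List<Integer> ids = new ArrayList<>();
        // Same as while (n > np) { clearInterval(n--); }
        for (int id = newestId; id > lowest; id--) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // If the dummy interval received id 25, ids 25 down to 16 are cleared.
        System.out.println(idsToClear(25, 10));
    }
}
```

Note that the newest id belongs to the dummy interval the script creates for itself, so at most nine pre-existing timers are cancelled per run.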
From: EricWong <ykw...@ya...> - 2017-07-11 13:47:50
|
Thanks for sharing your source code. The programming technique is quite advanced and not easy to understand, but it's quite interesting. Thanks. |