From: Albu G. <alb...@gm...> - 2017-07-13 16:48:30
You are really testing my memory, man...

My idea is that some timers are set in the page (auto refresh, update, or similar), and as explained here:

http://www.webdeveloper.com/forum/showthread.php?233448-Is-there-a-way-to-find-if-any-intervals-are-still-open

"You cannot reliably tell if there are any unnamed intervals running, but you can shut down any that are open."

In my previous answer you can see a call to a method named attendPourJavascriptSaufTimers (roughly "wait for JavaScript except timers"), for example in:

    // add a fake submit button to be able to submit the form (I translated from French)
    loginForm.appendChild(fauxBouton);
    pageEnCours = fauxBouton.click();
    // Original call, but I had trouble with it:
    // webClient.waitForBackgroundJavaScript(AttentePourJavascript.CINQ_SECONDES.getTempo());
    // So instead: waits up to 5 seconds, but can return earlier if nothing is running
    webClient.attendPourJavascriptSaufTimers(pageEnCours, AttentePourJavascript.CINQ_SECONDES.getTempo());
    print.save(NomsFichiersPagesSauvegardees.APRES_LOGGING.getUrl(), pageEnCours.asXml(), original);

What this method does:

    public int attendPourJavascriptSaufTimers(HtmlPage page, long tempo) {
        // The scripts are described in an enumeration
        String texteDuScript = ScriptAExecuter.ANNULE_LES_TIMERS.getScript();
        // Cancel the page's timers first...
        Object result = page.executeJavaScript(texteDuScript).getJavaScriptResult();
        // ...then wait for whatever background JavaScript is left
        int retour = this.waitForBackgroundJavaScript(tempo);
        return retour;
    }

The script executed (ANNULE_LES_TIMERS) is the following:

    var limit = 10;
    // A new interval gets the highest id so far; clear it and the `limit` ids below it
    var np, n = setInterval(function(){}, 100000);
    np = Math.max(0, n - limit);
    while (n > np) {
        clearInterval(n--);
    }

I wrote all this because I was running into problems like yours, not getting all the page content I should, so my advice is to follow my track a little, even if I don't remember all the details.

I think you can also check whether intervals are set on the website you are scraping by using the DevTools console of your browser.

I remember doing these back-and-forth sessions between DevTools and htmlunit; you really have to understand completely what is running on the site if you want to mimic it.

On 13/07/2017 at 17:36, Xue-Feng Yang wrote:
> I made more experiments on the issue. I added the following
>
> webClient.getOptions().setUseInsecureSSL(true);
> webClient.getCookieManager().setCookiesEnabled(true);
> webClient.setAjaxController(new NicelyResynchronizingAjaxController());
>
> JavaScriptJobManager manager = htmlPage.getEnclosingWindow().getJobManager();
> int count = 0;
> while (manager.getJobCount() > 0) {
>     System.out.println(count + "@" + manager.getJobCount());
>     webClient.waitForBackgroundJavaScript(10000);
>     count++;
> }
>
> Then I went to sleep. It's been running for a few hours. The job count
> has changed from 20 to 3 and stayed at 3.
>
> Any thoughts?
>
> Thanks
>
> On Wed, Jul 12, 2017 at 10:56 PM, Xue-Feng Yang <no...@gm...> wrote:
>
> Hi, I used htmlunit for getting some other web pages. It works great.
>
> However, when I tried
> https://weather.com/weather/monthly/l/27560:4:US , I got
> something not correct.
>
> Here is a summary of my system:
>
> OS: win 10
> Java: jdk1.8.0_131
> htmlunit: htmlunit-2.27-bin
>
> Attached are three pictures.
>
> eclipse-debug gives the result htmlunit got. The main code is as
> follows:
>
> webClient = new WebClient(BrowserVersion.FIREFOX_45);
> webClient.getOptions().setTimeout(600 * 1000);
> webClient.waitForBackgroundJavaScript(600 * 1000);
> webClient.getOptions().setRedirectEnabled(true);
> webClient.getOptions().setJavaScriptEnabled(true);
>
> webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
> webClient.getOptions().setThrowExceptionOnScriptError(false);
> webClient.getOptions().setCssEnabled(false);
>
> htmlPage = webClient.getPage(_url);
> page = htmlPage.asXml();
>
> view-source is the source page from Firefox.
>
> inspector is the debug tree from Firefox's debugger.
>
> It shows that only the Firefox debugger has the right HTML tree.
>
> My question is how to get the HTML tree by use of htmlunit?
>
> Thanks,
>
> Xuefeng
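
A note on the polling loop quoted above: it cannot exit while the job count stays above zero, and a count that settles at 3 would be consistent with repeating timers on the page. Below is a minimal sketch of a bounded version, assuming it is acceptable to give up after a fixed number of rounds. BoundedJobWait, waitForJobs, stepMillis and maxRounds are hypothetical names, and the two functional parameters stand in for manager::getJobCount and webClient::waitForBackgroundJavaScript so that the sketch compiles without htmlunit.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;
import java.util.function.LongToIntFunction;

// Hypothetical helper, not part of htmlunit.
final class BoundedJobWait {

    /**
     * Polls until no jobs remain or maxRounds wait steps have been spent.
     * Returns the last observed job count (0 means everything finished).
     */
    static int waitForJobs(IntSupplier jobCount, LongToIntFunction waitStep,
                           long stepMillis, int maxRounds) {
        int remaining = jobCount.getAsInt();
        for (int round = 0; round < maxRounds && remaining > 0; round++) {
            waitStep.applyAsInt(stepMillis);   // stands in for waitForBackgroundJavaScript
            remaining = jobCount.getAsInt();   // stands in for getJobCount
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Simulated page: three jobs, one of which never finishes (a repeating timer).
        AtomicInteger jobs = new AtomicInteger(3);
        LongToIntFunction fakeWait = ms -> jobs.updateAndGet(n -> Math.max(1, n - 1));
        int left = waitForJobs(jobs::get, fakeWait, 1000L, 5);
        System.out.println("jobs still pending: " + left);
    }
}
```

With htmlunit on the classpath the call would look like BoundedJobWait.waitForJobs(manager::getJobCount, webClient::waitForBackgroundJavaScript, 1000, 10). A non-zero result suggests that timers are still pending and may need cancelling as described in the reply.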