From: Xue-Feng Y. <xy...@no...> - 2017-07-14 02:36:46
I have made more progress. In my tests, BrowserVersion Chrome and Firefox 52 (or the best supported) bring the job count down to 2 within 5 minutes, and the data I need is then in webClient. The others (Firefox 45, Edge, IE) can't get the job count down to 2 within 5 minutes, and I can't see the data in webClient while the job count is still 3 or more. It's worth pointing out that every real browser I tested shows the data in less than one minute.

Any more suggestions?

On Thu, Jul 13, 2017 at 12:48 PM, Albu Gmail <alb...@gm...> wrote:

> You are really testing my memory, man...
>
> The idea (my idea) is that there are some timers set in the page (auto
> refresh, update or so...), and as explained here:
> http://www.webdeveloper.com/forum/showthread.php?233448-Is-there-a-way-to-find-if-any-intervals-are-still-open
>
> *You cannot reliably tell if there are any unnamed intervals running, but
> you **can** shut down any that are open.*
>
> In my previous answer you can see a call to a method named
> attendPourJavascriptSaufTimers, for example in:
>
>     // add a fake submit button to be able to submit the form
>     // (I translated from French)
>     loginForm.appendChild(fauxBouton);
>     pageEnCours = fauxBouton.click();
>
>     // Original call, but I got trouble with it:
>     // webClient.waitForBackgroundJavaScript(AttentePourJavascript.CINQ_SECONDES.getTempo());
>     // so:
>     webClient.attendPourJavascriptSaufTimers(pageEnCours,
>             AttentePourJavascript.CINQ_SECONDES.getTempo());
>     print.save(NomsFichiersPagesSauvegardees.APRES_LOGGING.getUrl(),
>             pageEnCours.asXml(), original);
>     // Waiting for 5 seconds but could return before if nothing is running
>
> *What this method does:*
>
>     public int attendPourJavascriptSaufTimers(HtmlPage page, long tempo) {
>         // Use an enumeration where the scripts are described
>         String texteDuScript = ScriptAExecuter.ANNULE_LES_TIMERS.getScript();
>         Object result = page.executeJavaScript(texteDuScript).getJavaScriptResult();
>         int retour = this.waitForBackgroundJavaScript(tempo);
>         return retour;
>     }
>
> The script executed (ANNULE_LES_TIMERS) is the following:
>
>     limit = 10;
>     var np, n = setInterval(function(){}, 100000);
>     np = Math.max(0, n - limit);
>     while (n > np) {
>         clearInterval(n--);
>     }
>
> *If I wrote all this stuff, it was because I was running into problems
> like you are, not getting all the page content I should, so my advice is
> to follow my track a little bit... even if I don't remember all the
> details.*
> *I think you can also check whether the website you are scraping sets
> intervals by using the DevTools console of your browser.*
> *I remember having done these back-and-forth sessions between DevTools and
> htmlunit; you really have to understand completely what's running on the
> site if you want to mimic it.*
>
> On 13/07/2017 at 17:36, Xue-Feng Yang wrote:
>
> I ran more experiments on the issue. I added the following:
>
>     webClient.getOptions().setUseInsecureSSL(true);
>     webClient.getCookieManager().setCookiesEnabled(true);
>     webClient.setAjaxController(new NicelyResynchronizingAjaxController());
>
>     JavaScriptJobManager manager = htmlPage.getEnclosingWindow().getJobManager();
>     int count = 0;
>     while (manager.getJobCount() > 0) {
>         System.out.println(count + "@" + manager.getJobCount());
>         webClient.waitForBackgroundJavaScript(10000);
>         count++;
>     }
>
> Then I went to sleep. It has been running for a few hours. The job count
> went from 20 to 3 and has stayed at 3.
>
> Any thoughts?
>
> Thanks
>
> On Wed, Jul 12, 2017 at 10:56 PM, Xue-Feng Yang <no...@gm...> wrote:
>
>> Hi, I used htmlunit to fetch some other web pages. It works great.
>>
>> However, when I tried https://weather.com/weather/monthly/l/27560:4:US ,
>> I got an incorrect result.
>>
>> Here is a summary of my system:
>>
>> OS: win 10
>> Java: jdk1.8.0_131
>> htmlunit: htmlunit-2.27-bin
>>
>> Attached are three pictures.
>>
>> eclipse-debug shows the result htmlunit got. The main code is as follows:
>>
>>     webClient = new WebClient(BrowserVersion.FIREFOX_45);
>>     webClient.getOptions().setTimeout(600 * 1000);
>>     webClient.waitForBackgroundJavaScript(600 * 1000);
>>     webClient.getOptions().setRedirectEnabled(true);
>>     webClient.getOptions().setJavaScriptEnabled(true);
>>     webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
>>     webClient.getOptions().setThrowExceptionOnScriptError(false);
>>     webClient.getOptions().setCssEnabled(false);
>>
>>     htmlPage = webClient.getPage(_url);
>>     page = htmlPage.asXml();
>>
>> view-source is the source page from Firefox.
>>
>> inspector is the debug tree from Firefox's debugger.
>>
>> It shows that only the Firefox debugger has the right HTML tree.
>>
>> My question is: how can I get that HTML tree using htmlunit?
>>
>> Thanks,
>>
>> Xuefeng
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Htmlunit-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlunit-user
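
A note on the polling loop in this thread: `while (manager.getJobCount() > 0)` never exits if the page keeps a few timers alive, which matches the job count staying at 3. Below is a minimal sketch of a bounded variant in plain Java. `BoundedJobWait`, `waitForJobs`, and the parameter names are hypothetical, chosen for illustration, and are not part of HtmlUnit. The job counter and the wait step are passed in as lambdas, so the sketch assumes nothing about the library beyond the two calls already shown in the thread.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not an HtmlUnit class.
final class BoundedJobWait {

    private BoundedJobWait() {
    }

    /**
     * Runs waitStep repeatedly until jobCount reports at most targetCount
     * or timeoutMillis has elapsed, whichever comes first.
     *
     * @return the last observed job count, so the caller can tell whether
     *         the target was reached or the wait timed out
     */
    static int waitForJobs(IntSupplier jobCount, Runnable waitStep,
                           int targetCount, long timeoutMillis) {
        // nanoTime is monotonic, so the deadline is immune to clock changes
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int remaining = jobCount.getAsInt();
        while (remaining > targetCount && System.nanoTime() < deadline) {
            waitStep.run();                  // e.g. one background-JS wait
            remaining = jobCount.getAsInt(); // re-read after each step
        }
        return remaining;
    }
}
```

With the objects from the thread, a call might look like `BoundedJobWait.waitForJobs(manager::getJobCount, () -> webClient.waitForBackgroundJavaScript(10000), 2, 5 * 60 * 1000L)`. The threshold of 2 comes from the observation above that the data was present once the count reached 2; other pages may need a different value.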