From: Xue-Feng Y. <xy...@no...> - 2017-07-13 17:58:51
Yes, I use it to download data with different parameters. Thanks.

On Thu, Jul 13, 2017 at 1:48 PM, Albu Gmail <alb...@gm...> wrote:

> I don't understand what you mean by "load a few hundred remote pages".
> htmlunit is used to interact with pages; it's a silent browser. Do you
> interact with hundreds of pages?
>
> On 13/07/2017 at 19:44, Xue-Feng Yang wrote:
>
> Thanks. It's a somewhat complicated solution, since I need to load a few
> hundred remote pages. I'll try this later if my current method doesn't
> work.
>
> On Thu, Jul 13, 2017 at 12:48 PM, Albu Gmail <alb...@gm...> wrote:
>
>> You are really testing my memory, man...
>>
>> The idea (my idea) is that there are some timers set in the page (auto
>> refresh, update, or so...), and as explained here:
>> http://www.webdeveloper.com/forum/showthread.php?233448-Is-there-a-way-to-find-if-any-intervals-are-still-open
>>
>> *You cannot reliably tell if there are any unnamed intervals running, but
>> you can shut down any that are open.*
>>
>> In my previous answer you can see a call to a method called
>> attendPourJavascriptSaufTimers, for example in:
>>
>>     // add a fake submit button to be able to submit the form
>>     // (I translated from French)
>>     loginForm.appendChild(fauxBouton);
>>     pageEnCours = fauxBouton.click();
>>
>>     // Original call, but I had trouble with it, so it is replaced below:
>>     // webClient.waitForBackgroundJavaScript(AttentePourJavascript.CINQ_SECONDES.getTempo());
>>     webClient.attendPourJavascriptSaufTimers(pageEnCours,
>>         AttentePourJavascript.CINQ_SECONDES.getTempo());
>>     print.save(NomsFichiersPagesSauvegardees.APRES_LOGGING.getUrl(),
>>         pageEnCours.asXml(), original);
>>     // Waiting for 5 seconds, but it could return earlier if nothing is running
>>
>> What this method is doing:
>>
>>     public int attendPourJavascriptSaufTimers(HtmlPage page, long tempo) {
>>         // Use an enumeration where the scripts are described
>>         String texteDuScript = ScriptAExecuter.ANNULE_LES_TIMERS.getScript();
>>         Object result = page.executeJavaScript(texteDuScript).getJavaScriptResult();
>>         int retour = this.waitForBackgroundJavaScript(tempo);
>>         return retour;
>>     }
>>
>> The script executed (ANNULE_LES_TIMERS) is the following:
>>
>>     limit = 10;
>>     var np, n = setInterval(function(){}, 100000);
>>     np = Math.max(0, n - limit);
>>     while (n > np) {
>>         clearInterval(n--);
>>     }
>>
>> If I wrote all this stuff, it was because I was running into problems
>> like yours, not getting all the page content I should. So my advice is
>> to follow my track a little bit, even if I don't remember all the
>> details.
>>
>> I think you can also see whether there are intervals set by the website
>> you are scraping with the DevTools console of your browser.
>>
>> I remember going back and forth between DevTools and htmlunit. You
>> really have to understand completely what's running on the site if you
>> want to mimic it.
>>
>> On 13/07/2017 at 17:36, Xue-Feng Yang wrote:
>>
>> I ran more experiments on the issue. I added the following:
>>
>>     webClient.getOptions().setUseInsecureSSL(true);
>>     webClient.getCookieManager().setCookiesEnabled(true);
>>     webClient.setAjaxController(new NicelyResynchronizingAjaxController());
>>
>>     JavaScriptJobManager manager = htmlPage.getEnclosingWindow().getJobManager();
>>     int count = 0;
>>     while (manager.getJobCount() > 0) {
>>         System.out.println(count + "@" + manager.getJobCount());
>>         webClient.waitForBackgroundJavaScript(10000);
>>         count++;
>>     }
>>
>> Then I went to sleep. It has been running for a few hours. The job count
>> went from 20 down to 3 and stayed at 3.
>>
>> Any thoughts?
>>
>> Thanks
>>
>> On Wed, Jul 12, 2017 at 10:56 PM, Xue-Feng Yang <no...@gm...> wrote:
>>
>>> Hi, I have used htmlunit to get some other web pages. It works great.
>>>
>>> However, when I tried https://weather.com/weather/monthly/l/27560:4:US
>>> I got an incorrect result.
>>>
>>> Here is a summary of my system:
>>>
>>> OS: win 10
>>> Java: jdk1.8.0_131
>>> htmlunit: htmlunit-2.27-bin
>>>
>>> Attached are three pictures.
>>>
>>> eclipse-debug shows the result htmlunit got. The main code is as follows:
>>>
>>>     webClient = new WebClient(BrowserVersion.FIREFOX_45);
>>>     webClient.getOptions().setTimeout(600 * 1000);
>>>     webClient.waitForBackgroundJavaScript(600 * 1000);
>>>     webClient.getOptions().setRedirectEnabled(true);
>>>     webClient.getOptions().setJavaScriptEnabled(true);
>>>     webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
>>>     webClient.getOptions().setThrowExceptionOnScriptError(false);
>>>     webClient.getOptions().setCssEnabled(false);
>>>
>>>     htmlPage = webClient.getPage(_url);
>>>     page = htmlPage.asXml();
>>>
>>> view-source is the page source from Firefox.
>>>
>>> inspector is the debug tree from Firefox's debugger.
>>>
>>> Only the Firefox debugger shows the correct HTML tree.
>>>
>>> My question is: how do I get that HTML tree using htmlunit?
>>>
>>> Thanks,
>>>
>>> Xuefeng
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Htmlunit-user mailing list
>> Htm...@li...
>> https://lists.sourceforge.net/lists/listinfo/htmlunit-user
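A note on the `while (manager.getJobCount() > 0)` loop from the 17:36 message: if a page keeps recurring timers alive, the job count may never reach zero, which would match the "stayed at 3" observation. One possible mitigation is to give the loop a time budget. The sketch below is only an illustration under stated assumptions: `BoundedJobWait`, `waitForJobs`, and the simulated counter are hypothetical names, and a plain `IntSupplier` stands in for the job manager so the example runs without htmlunit on the classpath.

```java
import java.util.function.IntSupplier;

public class BoundedJobWait {

    /**
     * Polls jobCount until it reports 0 or the time budget runs out.
     * Returns the last observed count (0 means all jobs finished).
     */
    public static int waitForJobs(IntSupplier jobCount, long maxWaitMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        int remaining = jobCount.getAsInt();
        while (remaining > 0 && System.nanoTime() < deadline) {
            Thread.sleep(pollMillis);
            remaining = jobCount.getAsInt();
        }
        return remaining;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated page (an assumption for the demo): the count starts at 20
        // and drains to 3 recurring timers that never finish.
        int[] simulated = {20};
        IntSupplier stuckAtThree = () -> simulated[0] > 3 ? simulated[0]-- : 3;

        int left = waitForJobs(stuckAtThree, 1000, 5);
        System.out.println("jobs still pending after the time budget: " + left);
    }
}
```

With htmlunit, the supplier could presumably be `manager::getJobCount`, using the `getJobCount()` call already shown in the thread. When the returned value is non-zero, the caller can decide whether to take `asXml()` anyway or to cancel the page's timers first, as suggested in the 12:48 reply.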