Re: [Htmlunit-user] not work for a page in weather.com

I have made more progress. After testing, the Chrome and FireFox52 (or the best
supported) BrowserVersions reduce the job count to 2 within 5 minutes, and the
data I need is then in webClient. The others (FireFox45, Edge, IE) cannot bring
the job count down to 2 within 5 minutes, and I can't see the data in webClient
when the job count only drops to 3 or more.

It's worth pointing out that all the real browsers I tested show the data in
less than one minute.
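The experiment above boils down to one question: does the job count fall to a target (2) within a time budget (5 minutes)? A minimal sketch, assuming you want the polling bounded by total time instead of looping until the count reaches 0 (which never happens when a page keeps recurring timers alive). `BoundedJobWait` and `waitUntilAtMost` are hypothetical names; with HtmlUnit the counter could be `manager::getJobCount` and the wait step `ms -> webClient.waitForBackgroundJavaScript(ms)`, but that wiring is an assumption and the demo below uses a simulated counter instead.

```java
import java.util.function.IntSupplier;
import java.util.function.LongConsumer;

public class BoundedJobWait {

    /**
     * Polls a job counter until it is at most {@code threshold} or
     * {@code timeoutMillis} has elapsed, calling {@code waitStep} between checks.
     * Returns the last observed count, so the caller can decide what to do
     * when recurring jobs never finish.
     */
    static int waitUntilAtMost(IntSupplier jobCount, int threshold,
                               long timeoutMillis, long stepMillis,
                               LongConsumer waitStep) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        int current = jobCount.getAsInt();
        while (current > threshold && System.currentTimeMillis() < deadline) {
            long remaining = deadline - System.currentTimeMillis();
            // never wait past the deadline
            waitStep.accept(Math.max(0, Math.min(stepMillis, remaining)));
            current = jobCount.getAsInt();
        }
        return current;
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Simulated page: 20 jobs that drain down to 3 recurring timers and stay there.
        int[] jobs = {20};
        LongConsumer simulatedWait = ms -> {
            sleepQuietly(ms);
            jobs[0] = Math.max(3, jobs[0] - 4);
        };
        // Target of 2 is never reached, so this returns when the 200 ms budget runs out.
        int left = waitUntilAtMost(() -> jobs[0], 2, 200, 10, simulatedWait);
        System.out.println("jobs left after the deadline: " + left);
    }
}
```

Returning the remaining count rather than a boolean lets the caller decide whether 3 leftover jobs (for example, auto-refresh intervals) are acceptable before reading the page.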

Any more suggestions?

On Thu, Jul 13, 2017 at 12:48 PM, Albu Gmail <alb...@gm...> wrote:

> You are really testing my memory, man...
>
> The idea (my idea) is that there are some timers set in the page (auto
> refresh, update or so...), and, as explained here:
> http://www.webdeveloper.com/forum/showthread.php?233448-Is-there-a-way-to-find-if-any-intervals-are-still-open
>
> *You cannot reliably tell if there are any unnamed intervals running, but
> you **can** shut down any that are open.*
>
> In a previous answer you can see a call to a method called
> attendPourJavascriptSaufTimers, for example in:
>
>     // Add a fake submit button to be able to submit the form
>     // (comment translated from French)
>     loginForm.appendChild(fauxBouton);
>     pageEnCours = fauxBouton.click();
>
>     // Original call, but it gave me trouble, so I replaced it:
>     // webClient.waitForBackgroundJavaScript(AttentePourJavascript.CINQ_SECONDES.getTempo());
>     // Waits up to 5 seconds, but can return earlier if nothing is running
>     webClient.attendPourJavascriptSaufTimers(pageEnCours,
>             AttentePourJavascript.CINQ_SECONDES.getTempo());
>     print.save(NomsFichiersPagesSauvegardees.APRES_LOGGING.getUrl(),
>             pageEnCours.asXml(), original);
>
> *What this method does:*
>
> public int attendPourJavascriptSaufTimers(HtmlPage page, long tempo) {
>     // The scripts are described in an enumeration (ScriptAExecuter)
>     String texteDuScript = ScriptAExecuter.ANNULE_LES_TIMERS.getScript();
>     // First cancel the page's intervals...
>     page.executeJavaScript(texteDuScript);
>     // ...then wait up to `tempo` ms for the remaining background JavaScript;
>     // returns the number of jobs still pending
>     return this.waitForBackgroundJavaScript(tempo);
> }
> The script executed (ANNULE_LES_TIMERS) is the following:
>
>     var limit = 10;
>     // Create a dummy interval: its ID n is the newest one handed out
>     var np, n = setInterval(function(){}, 100000);
>     np = Math.max(0, n - limit);
>     // Clear the `limit` most recent IDs, the dummy included
>     while (n > np) {
>         clearInterval(n--);
>     }
>
> *If I wrote all this stuff, it was because I was running into problems like
> yours: not getting all the page content I should. So my advice is to follow
> my track a little... even if I don't remember all the details.*
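To see exactly which timers the ANNULE_LES_TIMERS script cancels, here is a Java port of its loop run against a toy timer registry. It assumes, as the script itself does, that interval IDs come from an increasing counter; `IntervalCanceller`, `TimerRegistry`, `cancelRecentIntervals` and the timer labels are hypothetical names for illustration, not HtmlUnit API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class IntervalCanceller {

    /** Toy stand-in for a page's timers: IDs come from an increasing counter (assumption). */
    static final class TimerRegistry {
        private final TreeMap<Integer, String> active = new TreeMap<>();
        private int nextId = 1;

        int setInterval(String label) {
            int id = nextId++;
            active.put(id, label);
            return id;
        }

        void clearInterval(int id) {
            active.remove(id); // unknown IDs are ignored, like clearInterval in a browser
        }

        List<String> activeLabels() {
            return new ArrayList<>(active.values()); // in ID order
        }
    }

    /** Same loop as the ANNULE_LES_TIMERS script: clear the `limit` newest IDs. */
    static void cancelRecentIntervals(TimerRegistry timers, int limit) {
        int n = timers.setInterval("probe"); // newest ID shows where the counter is
        int np = Math.max(0, n - limit);
        while (n > np) {
            timers.clearInterval(n--); // clears the probe first, then older IDs
        }
    }

    public static void main(String[] args) {
        TimerRegistry timers = new TimerRegistry();
        for (int i = 1; i <= 12; i++) {
            timers.setInterval("timer-" + i);
        }
        cancelRecentIntervals(timers, 10);
        System.out.println(timers.activeLabels()); // prints [timer-1, timer-2, timer-3]
    }
}
```

Only the `limit` newest IDs are cleared, and the probe itself uses one of them, so with `limit = 10` just the nine most recent real timers go away; in the demo, timer-1 to timer-3 survive. If a page registers its recurring timers early, a larger `limit` may be needed.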
> *I also think you can check whether intervals are set on the website you
> are scraping, using the DevTools console of your browser.*
> *I remember going back and forth between DevTools and HtmlUnit; you really
> have to understand completely what's running on the site if you want to
> mimic it.*
>
>
> On 13/07/2017 at 17:36, Xue-Feng Yang wrote:
>
> I ran more experiments on the issue. I added the following:
>
> webClient.getOptions().setUseInsecureSSL(true);
> webClient.getCookieManager().setCookiesEnabled(true);
> webClient.setAjaxController(new NicelyResynchronizingAjaxController());
>
> JavaScriptJobManager manager = htmlPage.getEnclosingWindow().getJobManager();
> int count = 0;
> while (manager.getJobCount() > 0) {
>     System.out.println(count + "@" + manager.getJobCount());
>     webClient.waitForBackgroundJavaScript(10000);
>     count++;
> }
>
> Then I went to sleep. It has been running for a few hours. The job count
> went from 20 down to 3 and has stayed at 3.
>
> Any thoughts?
>
> Thanks
>
> On Wed, Jul 12, 2017 at 10:56 PM, Xue-Feng Yang <no...@gm...> wrote:
>
>>
>> Hi, I have used HtmlUnit to fetch some other web pages, and it works great.
>>
>> However, when I tried https://weather.com/weather/monthly/l/27560:4:US,
>> I got an incorrect result.
>>
>> Here is a summary of my system:
>>
>> OS: win 10
>> Java: jdk1.8.0_131
>> htmlunit: htmlunit-2.27-bin
>>
>> Attached are three pictures.
>>
>> eclipse-debug shows the result HtmlUnit returned. The main code is as follows:
>>
>>         webClient = new WebClient(BrowserVersion.FIREFOX_45);
>>         webClient.getOptions().setTimeout(600 * 1000);
>>         webClient.getOptions().setRedirectEnabled(true);
>>         webClient.getOptions().setJavaScriptEnabled(true);
>>         webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
>>         webClient.getOptions().setThrowExceptionOnScriptError(false);
>>         webClient.getOptions().setCssEnabled(false);
>>
>>         htmlPage = webClient.getPage(_url);
>>         // Wait for background JavaScript after the page is loaded;
>>         // called before getPage() there is nothing to wait for yet
>>         webClient.waitForBackgroundJavaScript(600 * 1000);
>>         page = htmlPage.asXml();
>>
>> view-source is the source page from Firefox.
>>
>> inspector is the tree shown in Firefox's debugger.
>>
>> It shows that only the Firefox debugger has the right HTML tree.
>>
>> My question is: how can I get that HTML tree using HtmlUnit?
>>
>> Thanks,
>>
>> Xuefeng
>>
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>
>
>
> _______________________________________________
> Htmlunit-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlunit-user