From: Rich G. <ri...@um...> - 2015-08-12 20:47:51
I am trying to get the populated HTML of another site but it is not loading, despite putting the wait time to 30 seconds. The address is

http://alisondb.legislature.state.al.us/Alison/SESSBillResult.aspx?BILL=HB1&WIN_TYPE=SELECTED_STATUS

Are you all able to get page.asXml(); to produce populated html for this address? I've updated to 2.18 and I've tried putting the waitForBackgroundJavaScript in multiple places without success.

My code is:

    public String getWebsiteTextWithJavaScript(String url) {
        WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_6);
        HtmlPage page = null;
        try {
            webClient.waitForBackgroundJavaScript(30000);
            page = webClient.getPage(url);
        } catch (FailingHttpStatusCodeException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (MalformedURLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        //
        //
        // Thread.sleep(10000);
        webClient.waitForBackgroundJavaScript(30000);
        String text = page.asXml();
        webClient.waitForBackgroundJavaScript(30000);
        page.cleanUp();
        webClient.closeAllWindows();
        return text;
    }

On Thu, Mar 5, 2015 at 8:21 PM, Rich Goldman <ri...@um...> wrote:

> Sorry, ignore this...another newbie error.
> -Rich
>
> On Thu, Mar 5, 2015 at 1:09 PM, Rich Goldman <ri...@um...> wrote:
>
>> One followup:
>>
>> I'm trying to get the HTML (mainly the link urls) included in some of the
>> agendas but it's not coming through using:
>>
>> WebClient webClient = new WebClient(BrowserVersion.CHROME);
>> HtmlPage page = webClient.getPage("http://www.house.leg.state.mn.us/schedules/schedule.aspx#03/04/2015");
>> Thread.sleep(10000);
>> System.out.println(page.asXml());
>>
>> Is there something I need to do in order to get/keep the html rendering?
>>
>> I get (3/4/2015) HF0416-A15-0112.pdf instead of that text with the link
>> to the pdf file...
>> -Rich
>>
>> On Wed, Mar 4, 2015 at 11:45 AM, Ahmed Ashour <asa...@ya...> wrote:
>>
>>> Hi Rich,
>>>
>>> Well, waitForBackgroundJavaScript is actually better than
>>> Thread.sleep(), and it works.
>>>
>>> It didn't work for you before because you used it 'before'
>>> webClient.getPage(); however, it should be 'after', to allow
>>> JavaScript/AJAX to run.
>>>
>>> Hope that clarifies,
>>>
>>> Ahmed
>>> ------------------------------
>>> *From:* Rich Goldman <ri...@um...>
>>> *To:* htm...@li...
>>> *Sent:* Wednesday, March 4, 2015 4:56 PM
>>> *Subject:* Re: [Htmlunit-user] Help Extracting Schedule from a Website
>>>
>>> I think I was confused between using Thread.sleep(10000)
>>> and webClient.waitForBackgroundJavaScript(10000).
>>>
>>> Thanks again.
>>> -Rich
>>>
>>> On Wed, Mar 4, 2015 at 10:21 AM, Alain BUFERNE <alb...@gm...>
>>> wrote:
>>>
>>> By using HtmlUnit, you generally just program what a normal human being
>>> would do to use the website. Since you just need the information sent by
>>> the server in response to clicking this, selecting that, you don't need
>>> to execute JS code.
>>>
>>> 2015-03-04 7:05 GMT+01:00 Rich Goldman <ri...@um...>:
>>>
>>> Doing a bit more digging, it seems the javascript functions for
>>> populating the agenda items are in:
>>> http://www.house.leg.state.mn.us/schedules/ScheduleElements0.js?v=1.12
>>>
>>> I don't know enough javascript to know how to execute these functions
>>> appropriately though.
>>> -Rich
>>>
>>> On Wed, Mar 4, 2015 at 12:41 AM, Rich Goldman <ri...@um...> wrote:
>>>
>>> I'm trying to get the schedule information posted at:
>>>
>>> http://www.house.leg.state.mn.us/schedules/schedule.aspx#03/06/2015
>>>
>>> The content is loaded dynamically (presumably via AJAX) and I've tried
>>> the following code:
>>>
>>> final WebClient webClient = new WebClient(BrowserVersion.CHROME);
>>> webClient.waitForBackgroundJavaScript(10000);
>>> final HtmlPage page = webClient
>>>     .getPage("http://www.house.leg.state.mn.us/schedules/schedule.aspx#03/06/2015");
>>> String javaScriptCode = "SchedJSx.Init();";
>>>
>>> ScriptResult result = page.executeJavaScript(javaScriptCode);
>>> result.getJavaScriptResult();
>>> System.out.println("result: " + result.getJavaScriptResult());
>>>
>>> I can get some of the dynamic content:
>>> Friday, March 06, 2015
>>> 10:30 AM
>>> Health and Human Services Reform
>>> Chair: Rep. Tara Mack
>>> Location: Basement State Office Building
>>> Note:
>>> ***Additional bills may be added
>>>
>>> but not the agenda/bill list.
>>>
>>> I feel like I'm missing something simple that I'm not aware of as a
>>> newbie. I would appreciate a skilled HtmlUnit user looking at the source
>>> code of the source website and pointing out what I'm missing so I can
>>> extract the agenda for this meeting as well.
>>>
>>> Thanks for any help you can provide.
>>> -Rich
>>>
>>> ------------------------------------------------------------------------------
>>> Dive into the World of Parallel Programming The Go Parallel Website,
>>> sponsored
>>> by Intel and developed in partnership with Slashdot Media, is your hub
>>> for all
>>> things parallel software development, from weekly thought leadership
>>> blogs to
>>> news, videos, case studies, tutorials and more. Take a look and join the
>>> conversation now.
>>> http://goparallel.sourceforge.net/
>>> _______________________________________________
>>> Htmlunit-user mailing list
>>> Htm...@li...
>>> https://lists.sourceforge.net/lists/listinfo/htmlunit-user
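
For anyone reading this thread later: the point quoted above (call waitForBackgroundJavaScript 'after' getPage(), not 'before') can be illustrated with a small toy model. This is only a sketch and does not use HtmlUnit at all. ToyClient, WaitOrderDemo, the example URL and the "HB1 status" text are made-up names for illustration, and the single-threaded job queue is an assumption that simplifies how background JavaScript actually runs. It only shows why a wait issued before the page load has nothing to wait for.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy stand-in for a browser client. NOT HtmlUnit's API or implementation. */
class ToyClient {
    // Jobs that the "page scripts" have scheduled but not yet run.
    private final Queue<Runnable> pendingJobs = new ArrayDeque<>();
    private final StringBuilder dom = new StringBuilder();

    /** Loads the static markup and queues the simulated AJAX job that fills it in. */
    void getPage(String url) {
        dom.setLength(0);
        dom.append("<div id='agenda'>");
        pendingJobs.add(() -> dom.append("HB1 status"));
    }

    /** Runs every job queued so far and returns how many were run. */
    int waitForBackgroundJavaScript() {
        int ran = 0;
        while (!pendingJobs.isEmpty()) {
            pendingJobs.poll().run();
            ran++;
        }
        return ran;
    }

    /** Serializes the current state of the toy DOM. */
    String asXml() {
        return dom + "</div>";
    }
}

class WaitOrderDemo {
    /** Waits first: no jobs exist yet, so the wait is a no-op. */
    static String waitBeforeLoad(String url) {
        ToyClient client = new ToyClient();
        client.waitForBackgroundJavaScript();
        client.getPage(url);
        return client.asXml();
    }

    /** Loads first, then waits: the queued job runs before the DOM is read. */
    static String waitAfterLoad(String url) {
        ToyClient client = new ToyClient();
        client.getPage(url);
        client.waitForBackgroundJavaScript();
        return client.asXml();
    }

    public static void main(String[] args) {
        String url = "http://example.invalid/"; // hypothetical address
        System.out.println("wait before load: " + waitBeforeLoad(url));
        System.out.println("wait after load:  " + waitAfterLoad(url));
    }
}
```

In the toy model the first call returns the empty container and the second returns the filled one. With the real library, scripts run on background threads, so the timing is less clear-cut than in this sketch.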