From: Rich G. <ri...@um...> - 2015-03-05 18:09:53
One followup: I'm trying to get the HTML (mainly the link URLs) included in
some of the agendas, but it's not coming through using:

    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    HtmlPage page = webClient.getPage("http://www.house.leg.state.mn.us/schedules/schedule.aspx#03/04/2015");
    Thread.sleep(10000);
    System.out.println(page.asXml());

Is there something I need to do in order to get/keep the HTML rendering? I get

    (3/4/2015) HF0416-A15-0112.pdf

instead of that text with the link to the PDF file...

-Rich

On Wed, Mar 4, 2015 at 11:45 AM, Ahmed Ashour <asa...@ya...> wrote:
> Hi Rich,
>
> Well, waitForBackgroundJavaScript() is actually better than Thread.sleep(),
> and it works.
>
> It didn't work for you before because you used it 'before'
> webClient.getPage(); it should be called 'after', to allow JavaScript/AJAX
> to run.
>
> Hope that clarifies,
>
> Ahmed
> ------------------------------
> *From:* Rich Goldman <ri...@um...>
> *To:* htm...@li...
> *Sent:* Wednesday, March 4, 2015 4:56 PM
> *Subject:* Re: [Htmlunit-user] Help Extracting Schedule from a Website
>
> I think I was confused between using Thread.sleep(10000)
> and webClient.waitForBackgroundJavaScript(10000).
>
> Thanks again.
> -Rich
>
> On Wed, Mar 4, 2015 at 10:21 AM, Alain BUFERNE <alb...@gm...> wrote:
>
> With HtmlUnit, you generally just program what a normal human being
> would do to use the website. Since you only need the information sent by the
> server in response to clicking this or selecting that, you don't need to
> execute the JS code yourself.
>
> 2015-03-04 7:05 GMT+01:00 Rich Goldman <ri...@um...>:
>
> Doing a bit more digging, it seems the JavaScript functions for populating
> the agenda items are in:
> http://www.house.leg.state.mn.us/schedules/ScheduleElements0.js?v=1.12
>
> I don't know enough JavaScript to know how to execute these functions
> appropriately though.
> -Rich
>
> On Wed, Mar 4, 2015 at 12:41 AM, Rich Goldman <ri...@um...> wrote:
>
> I'm trying to get the schedule information posted at:
>
> http://www.house.leg.state.mn.us/schedules/schedule.aspx#03/06/2015
>
> The content is loaded dynamically (presumably via AJAX) and I've tried the
> following code:
>
>     final WebClient webClient = new WebClient(BrowserVersion.CHROME);
>     webClient.waitForBackgroundJavaScript(10000);
>     final HtmlPage page = webClient.getPage("http://www.house.leg.state.mn.us/schedules/schedule.aspx#03/06/2015");
>     String javaScriptCode = "SchedJSx.Init();";
>
>     ScriptResult result = page.executeJavaScript(javaScriptCode);
>     result.getJavaScriptResult();
>     System.out.println("result: " + result.getJavaScriptResult());
>
> I can get some of the dynamic content:
>
>     Friday, March 06, 2015
>     10:30 AM
>     Health and Human Services Reform
>     Chair: Rep. Tara Mack
>     Location: Basement State Office Building
>     Note:
>     ***Additional bills may be added
>
> but not the agenda/bill list.
>
> I feel like I'm missing something simple that I'm not aware of as a
> newbie. I would appreciate a skilled HtmlUnit user looking at the source
> code of the source website and pointing out what I'm missing so I can
> extract the agenda for this meeting as well.
>
> Thanks for any help you can provide.
> -Rich
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website, sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for all
> things parallel software development, from weekly thought leadership blogs to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Htmlunit-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlunit-user
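
A note on the ordering Ahmed describes in the thread above: a wait-for-background call only has something to wait for once the page load has scheduled its background work. The sketch below is a toy, JDK-only model of that behaviour and is not HtmlUnit code. The class WaitOrderDemo and its methods load() and waitForJobs() are hypothetical stand-ins for webClient.getPage() and webClient.waitForBackgroundJavaScript(); the 200 ms delay is an assumed stand-in for AJAX latency, and the sample agenda string is simply the file name quoted in the thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model (NOT HtmlUnit): a "page" whose agenda is filled in by a background job. */
public class WaitOrderDemo {
    private final ExecutorService jobs = Executors.newSingleThreadExecutor();
    private final AtomicInteger pending = new AtomicInteger();
    private volatile String agenda = "";

    /** Hypothetical stand-in for getPage(): returns at once, content arrives later. */
    void load() {
        pending.incrementAndGet();
        jobs.execute(() -> {
            try {
                Thread.sleep(200); // assumed AJAX latency
                agenda = "HF0416-A15-0112.pdf"; // sample value quoted in the thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                pending.decrementAndGet();
            }
        });
    }

    /**
     * Hypothetical stand-in for a wait-for-background call: blocks until no
     * job is pending or the timeout expires, then returns the pending count.
     */
    int waitForJobs(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (pending.get() > 0 && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return pending.get();
    }

    /** Wrong order: nothing is pending yet, so the wait returns immediately. */
    public static String waitThenLoad() {
        WaitOrderDemo page = new WaitOrderDemo();
        page.waitForJobs(2000);
        page.load();
        String seen = page.agenda; // read before the background job has finished
        page.jobs.shutdown();
        return seen;
    }

    /** Right order: the load schedules the job, then the wait lets it finish. */
    public static String loadThenWait() {
        WaitOrderDemo page = new WaitOrderDemo();
        page.load();
        page.waitForJobs(2000);
        String seen = page.agenda;
        page.jobs.shutdown();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("wait, then load: [" + waitThenLoad() + "]");
        System.out.println("load, then wait: [" + loadThenWait() + "]");
    }
}
```

In this model the first variant reads the agenda while the job is still sleeping, and the second reads it only after the pending count has dropped to zero. That mirrors why moving the wait call after the page load was the fix suggested in the thread.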