From: May, B. L. <bl...@cu...> - 2016-07-27 02:17:07
|
Hey all! First time using this library, just trying to get a page that requires javascript... Code snippet:

    WebClient webClient = new WebClient(BrowserVersion.CHROME);

    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setCssEnabled(false);
    webClient.getOptions().setRedirectEnabled(true);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setPrintContentOnFailingStatusCode(true);
    webClient.getOptions().setThrowExceptionOnScriptError(false);

    String link = "http://nycprop.nyc.gov/nycproperty/statements/flk/jsp/stmtassessflk.jsp?statementId=104487723";
    HtmlPage page = webClient.getPage(link);
    webClient.waitForBackgroundJavaScript(30 * 1000);
    String pageAsText = page.asText();
    System.out.println(pageAsText);

And I am getting a HUGE stackoverflow exception. If you try and view the URL in a browser with JS disabled you get a message about needing to turn it on, but there's barely any javascript in the page source from what I can see...

TIA! |
From: Ahmed A. <asa...@ya...> - 2016-07-27 13:03:01
|
Hi May,

Which version do you use?

With latest snapshot (a release is imminent), there are no errors.

Ahmed |
From: May, B. L. <bl...@cu...> - 2016-07-28 11:45:20
Attachments:
Puller.java
|
Hey Ahmed,

I recompiled today using the newly-released 2.23 and it still doesn’t work.

If you run the code snippet from my first message by itself, it works, but the content of the page is an error message. If you run the full code, which goes through several prior pages that save session variables somewhere on the server side and then actually attempts to load the real results page, it still throws a stack overflow. The full code (with as much extraneous code and as many dependencies removed as possible) is attached. It’s ugly but this is a one-time scrape so it doesn’t have to be pretty! :)

-Ben |
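A note on the stack overflow itself: one general JVM workaround (not confirmed anywhere in this thread as the fix for this page) is to run the deeply recursive work on a thread that requests a larger stack. The sketch below uses only the standard library; `BigStackRunner`, `callWithStack` and `depth` are hypothetical names made up for illustration, and the recursion merely stands in for whatever deep call chain is blowing the stack. The JVM treats the requested stack size as a hint, so some platforms may ignore it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;

public class BigStackRunner {

    // Hypothetical helper (not part of HtmlUnit): runs the task on a dedicated
    // thread that requests the given stack size, then returns the task's result.
    // If the task fails (including with a StackOverflowError), FutureTask
    // captures the failure and get() rethrows it wrapped in an ExecutionException.
    static <T> T callWithStack(Callable<T> task, long stackBytes) throws Exception {
        FutureTask<T> future = new FutureTask<>(task);
        Thread worker = new Thread(null, future, "big-stack-worker", stackBytes);
        worker.start();
        return future.get();
    }

    // Stand-in for a deep call chain: recurses 'remaining' levels and counts them.
    static int depth(int remaining) {
        return remaining == 0 ? 0 : 1 + depth(remaining - 1);
    }

    public static void main(String[] args) throws Exception {
        // Request a 256 MB stack for the worker thread.
        int reached = callWithStack(() -> depth(100_000), 256L * 1024 * 1024);
        System.out.println("recursion depth reached: " + reached);
    }
}
```

In the scraper, the lambda would wrap the call that overflows (for example the `webClient.getPage(link)` call from the first message) instead of `depth`. If the overflow comes from genuinely unbounded recursion, a bigger stack only delays it, so this is mainly useful as a diagnostic.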
From: May, B. L. <bl...@cu...> - 2016-07-29 22:31:28
|
Bump! Still need to figure this out… |
From: Ronald B. <rb...@rb...> - 2016-07-30 12:23:37
|
Hi Ben,

your code confuses me a bit. HtmlUnit is designed more or less as a high-level API; usually there is no need to work with something like HttpResponse and HttpEntity. To make the idea a bit clearer, you can do the following:

1. Set up your 'browser'

    WebClient webClient = new WebClient();
    webClient.getOptions().setThrowExceptionOnScriptError(false);

Because the web site you want to visit throws some JavaScript errors (check the browser log of a real browser), you have to set this option. Otherwise HtmlUnit will stop working at the first JS exception.

2. Open the page and fill the form

    String startURL = "http://webapps.nyc.gov:8084/CICS/fin1/find001i";
    HtmlPage page = webClient.getPage(startURL);

    HtmlSelect borough = (HtmlSelect) page.getElementByName("FBORO");
    HtmlOption option = borough.getOptionByText("Manhattan");
    option.setSelected(true);

    HtmlTextInput houseNum = (HtmlTextInput) page.getElementByName("FHOUSENUM");
    houseNum.setValueAttribute("100");

    HtmlTextInput streetName = (HtmlTextInput) page.getElementByName("FSTNAME");
    streetName.setValueAttribute("a");

As you can see, you can use the controls on the page directly. There are a bunch of different ways to find the controls (by id, name, XPath, CSS selector...).

3. It's time to trigger some action

    HtmlSubmitInput search = (HtmlSubmitInput) page.getElementByName("DFH_ENTER");
    HtmlPage result = search.click();

It is that simple - no need to interact with all the HTTP stuff.

4. OK, done with our first page

    System.out.println(result.asXml());

Hope that helps
RBRi

--------------------------
WETATOR
Smart web application testing
http://www.wetator.org |