From: Vasudevan C. <vco...@gm...> - 2020-06-04 09:41:05
Hi Damon,

HTMLUnit is tightly coupled to the Rhino JS engine, so changing to a different engine would be a tedious task. Selenium uses an HTMLUnit driver, so as an alternative see whether you can switch to PhantomJS instead of HTMLUnit.

Regards
Vasu

On Thu, 4 Jun 2020 at 13:38, Damon Goodyear <dam...@ho...> wrote:

> Hi,
>
> Thanks Vasudevan for your reply.
>
> In fact I did try that too without success - I just get the top and bottom
> of the page and not the numbers (the important bit) in between.
>
> While getting ready for this morning I did wonder if changing the
> javascript engine might work. If I understand correctly HTMLUnit uses
> Rhino. Presumably there are other engines available - is it possible to
> change to a different engine or would that involve a code change?
>
> Thanks,
>
>
> *Damon Goodyear *
> CONFIDENTIALITY and NON-DISCLOSURE OF EMAIL ADDRESS: This email, including
> its content and the address of the sender, are provided for the use of the
> recipient only and for the purposes of the subject matter under
> discussion. Notwithstanding any other consent that may have been given
> neither the content of this email nor the address of the sender may be
> disclosed to third parties, including within the same undertaking, without
> the prior written permission of the sender. Where this email has been sent
> to a general or non-personal email address permission is granted for a
> reply to be made, on the subject matter only, from the undertaking to whom
> it is addressed.
>
> ------------------------------
> *From:* Vasudevan Comandur <vco...@gm...>
> *Sent:* 04 June 2020 08:33
> *To:* htm...@li... <htm...@li...>
> *Subject:* Re: [Htmlunit-user] Exception thrown by webClient.getPage
>
> Hi,
>
> See if you can extract the necessary data from the HTML response by
> disabling the Javascript in WebClient.
>
> Regards
> Vasu
>
> On Thu, 4 Jun 2020 at 02:35, Damon Goodyear <dam...@ho...>
> wrote:
>
> Hi,
>
> I am new to HTMLUnit and (re)new to seeking help from mailing lists like
> this - the last time I tried was about 15 years ago and things seem to have
> moved on. I hope my question is useful to you and I hope, even more, that
> your answer is useful to me. I hope I am not committing the newbie sin of
> asking about a well-known issue/non-issue.
>
> I have encountered a problem from the beginning with HTMLUnit. I have
> been trying to use HTMLUnit to download information from the following
> URLs-
>
> https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals
>
> https://www.londonstockexchange.com/live-markets/market-data-dashboard/price-explorer?page=1
>
> https://www.londonstockexchange.com/stock/OPM/1pm-plc/company-page
>
>
> These are listed in order of importance to me.
>
> The reason for this is that all this information has changed format in the
> last week or so and has become much harder to unpick.
>
> If I try the following code--
>
> WebClient wc = new WebClient(BrowserVersion.CHROME);
>
> // LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
> // java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
> // java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
>
> try {
>     HtmlPage page = wc.getPage("https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals");
>     String s = page.asText();
>     System.out.print(s);
>     wc.close();
> } catch etc...
>
> ... then I get a bucket load of javascript warnings that I can suppress by
> uncommenting the commented lines above, followed by an exception I do not
> understand and cannot find any help on, here or in the wider internet.
>
> The exception starts with the lines--
>
> EcmaError: lineNumber=[1] column=[0] lineSource=[<no source>]
> name=[ReferenceError] sourceName=[https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)]
> message=[ReferenceError: Assignment to undefined "regeneratorRuntime" in
> strict mode (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)]
>
> com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment
> to undefined "regeneratorRuntime" in strict mode (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)
>
> at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:891)
>
> I can send the full, very long stack trace if that would help.
>
> Are you able to assist - either by fixing the bug (if it is one) or
> advising me how to get around this?
>
> Thanks,
>
>
> *Damon Goodyear *
> _______________________________________________
> Htmlunit-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlunit-user
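
For anyone who wants to try the java.util.logging lines from the quoted code in isolation, here is a minimal sketch using only the JDK. The logger names are the ones from Damon's message; the class QuietLogging and its silence method are hypothetical helpers made up for illustration and are not part of HTMLUnit. It assumes the warnings actually reach java.util.logging: HTMLUnit logs through Apache Commons Logging, which may be routed to a different backend depending on the classpath, in which case these calls have no effect. It only hides log output and does nothing about the ScriptException itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper (not part of HTMLUnit): turns java.util.logging loggers off by name. */
final class QuietLogging {
    // Keep strong references to the configured loggers. The LogManager only
    // holds named loggers weakly, so a logger that is configured and then
    // dropped may be garbage-collected and later recreated with default settings.
    private static final List<Logger> PINNED = new ArrayList<>();

    private QuietLogging() {}

    /** Sets the named logger to Level.OFF, pins it, and returns it. */
    static synchronized Logger silence(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.OFF);
        PINNED.add(logger);
        return logger;
    }

    public static void main(String[] args) {
        // Logger names taken from the commented-out lines in the quoted code.
        silence("com.gargoylesoftware");
        silence("org.apache.commons.httpclient");

        // A child logger with no level of its own inherits OFF from its
        // parent, so this message is dropped rather than printed.
        Logger child = Logger.getLogger("com.gargoylesoftware.htmlunit");
        child.warning("this warning should not appear");
    }
}
```

Calling QuietLogging.silence(...) before creating the WebClient would be the natural place for it, since the child loggers then inherit the OFF level from the start.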