From: Vasudevan C. <vco...@gm...> - 2020-06-04 07:33:24
Hi,

See if you can extract the necessary data from the HTML response by
disabling the Javascript in WebClient.

Regards
Vasu

On Thu, 4 Jun 2020 at 02:35, Damon Goodyear <dam...@ho...> wrote:

> Hi,
>
> I am new to HTMLUnit and (re)new to seeking help from mailing lists like
> this - the last time I tried was about 15 years ago and things seem to have
> moved on. I hope my question is useful to you and I hope, even more, that
> your answer is useful to me. I hope I am not committing the newbie sin of
> asking a well known issue/non-issue.
>
> I have encountered a problem from the beginning with HTMLUnit. I have
> been trying to use HTMLUnit to download information from the following
> URLs-
>
> https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals
>
> https://www.londonstockexchange.com/live-markets/market-data-dashboard/price-explorer?page=1
>
> https://www.londonstockexchange.com/stock/OPM/1pm-plc/company-page
>
> These are listed in order of importance to me.
>
> The reason for this is that all this information has changed format in the
> last week or so and has become much harder to unpick.
>
> If I try the following code--
>
>     WebClient wc = new WebClient(BrowserVersion.CHROME);
>
>     // LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
>     // java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
>     // java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
>
>     try {
>         HtmlPage page = wc.getPage("https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals");
>         String s = page.asText();
>         System.out.print(s);
>         wc.close();
>     } catch etc...
>
> ... then I get a bucket load of javascript warnings that I can suppress by
> uncommenting the commented lines above, followed by an exception I do not
> understand and cannot find any help on, here or in the wider internet.
>
> The exception starts with the lines--
>
>     EcmaError: lineNumber=[1] column=[0] lineSource=[<no source>] name=[ReferenceError] sourceName=[https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)] message=[ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)]
>
>     com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)
>
>         at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:891)
>
> I can send the full, very long stack trace if that would help.
>
> Are you able to assist, either by fixing the bug (if it is one) or by
> advising me how to get around this?
>
> Thanks,
>
> Damon Goodyear
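
For reference, the suggestion at the top of this message (disabling JavaScript in WebClient) might look roughly like the sketch below. It assumes HtmlUnit 2.x, i.e. the com.gargoylesoftware.htmlunit package seen in the stack trace, is on the classpath. The class name ScriptlessFetch and the helper newScriptlessClient are made up for illustration. Whether the figures you need are present in the static HTML is not known; if the site builds them with JavaScript, they will not appear in the output.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ScriptlessFetch {

    // Hypothetical helper (not part of HtmlUnit): builds a WebClient
    // that never executes the page's scripts.
    static WebClient newScriptlessClient() {
        WebClient wc = new WebClient(BrowserVersion.CHROME);
        // With JavaScript off, polyfills-es5.*.js is never run, so the
        // "regeneratorRuntime" ReferenceError cannot be raised.
        wc.getOptions().setJavaScriptEnabled(false);
        // CSS is not needed just to read the markup.
        wc.getOptions().setCssEnabled(false);
        // Only matters if JavaScript is switched back on later: script
        // errors are then logged instead of thrown as ScriptException.
        wc.getOptions().setThrowExceptionOnScriptError(false);
        return wc;
    }

    public static void main(String[] args) throws Exception {
        String url = "https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals";
        // try-with-resources closes the client even if getPage fails.
        try (WebClient wc = newScriptlessClient()) {
            HtmlPage page = wc.getPage(url);
            // Print the raw markup so it can be checked for the data.
            System.out.println(page.asXml());
        }
    }
}
```

If the markup turns out to be an empty shell, one option is to set setJavaScriptEnabled(true) again and leave setThrowExceptionOnScriptError(false) in place, so that the failing polyfill script is reported in the log rather than aborting getPage. That does not guarantee the rest of the page's scripts will run correctly.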