From: Vasudevan C. <vco...@gm...> - 2020-06-04 11:19:20
Hi Damon,
Can you trace the request/response using a proxy tool?
The payload that you are interested in capturing most likely arrives as a
JSON object. Once you can send the corresponding request to the portal
yourself, you can process the JSON payload directly.
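
To illustrate the idea, here is a minimal sketch of replaying such a request from plain Java once the proxy trace has revealed it. It assumes Java 11 or later (for java.net.http), and that the data comes from a simple GET returning JSON. The class name, method names and the endpoint passed on the command line are placeholders, not anything taken from the portal. The real request may well need extra headers, cookies or a POST body copied from the trace.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: replays a request discovered with a proxy tool.
class JsonEndpointSketch {

    // Builds a GET request that asks for JSON; no network traffic happens here.
    static HttpRequest buildRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Sends the request and returns the raw response body as a string.
    static String fetchBody(String endpoint) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(endpoint), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // The endpoint must be supplied by the caller; it is whatever URL
        // the proxy trace shows the page requesting its data from.
        if (args.length == 0) {
            System.out.println("usage: java JsonEndpointSketch <endpoint-url>");
            return;
        }
        System.out.println(fetchBody(args[0]));
    }
}
```

The returned string can then be handed to whichever JSON library you prefer. Keeping the request construction separate from the sending makes it easy to add the headers seen in the trace one at a time.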
Regards
Vasu
On Thu, 4 Jun 2020 at 16:40, Damon Goodyear <dam...@ho...>
wrote:
> Hi,
>
> Thanks again.
>
> Switching to PhantomJS had occurred to me and I had started to look at
> that as an option - it seems clear that the guts of the content of the
> pages I am trying to access are generated through a script (or scripts)
> that crash HTMLUnit. I have a feeling this might be intentional.
>
> It's a shame Rhino is so tightly bound into HTMLUnit. I wonder if the "couplings"
> could be loosened sufficiently to make the script engine a parameter in
> much the same way that the browser is a parameter through BrowserVersion.
>
> Regards,
>
>
> *Damon Goodyear *
> CONFIDENTIALITY and NON-DISCLOSURE OF EMAIL ADDRESS: This email, including
> its content and the address of the sender, are provided for the use of the
> recipient only and for the purposes of the subject matter under
> discussion. Notwithstanding any other consent that may have been given
> neither the content of this email nor the address of the sender may be
> disclosed to third parties, including within the same undertaking, without
> the prior written permission of the sender. Where this email has been sent
> to a general or non-personal email address permission is granted for a
> reply to be made, on the subject matter only, from the undertaking to whom
> it is addressed.
>
> ------------------------------
> *From:* Vasudevan Comandur <vco...@gm...>
> *Sent:* 04 June 2020 10:40
> *To:* htm...@li... <
> htm...@li...>
> *Subject:* Re: [Htmlunit-user] Exception thrown by webClient.getPage
>
> Hi Damon,
>
> HTMLUnit is tightly coupled with the Rhino JS engine. Changing to a
> different engine is a tedious task.
> Selenium can use an HTMLUnit driver; see if you can use PhantomJS there
> instead.
>
> As an alternative, you can switch to using PhantomJS instead of HTMLUnit.
>
> Regards
> Vasu
>
> On Thu, 4 Jun 2020 at 13:38, Damon Goodyear <dam...@ho...>
> wrote:
>
> Hi,
>
> Thanks Vasudevan for your reply.
>
> In fact I did try that too without success - I just get the top and bottom
> of the page and not the numbers (the important bit) in between.
>
> While getting ready for this morning I did wonder if changing the
> javascript engine might work. If I understand correctly HTMLUnit uses
> Rhino. Presumably there are other engines available - is it possible to
> change to a different engine or would that involve a code change?
>
> Thanks,
>
>
> *Damon Goodyear *
>
> ------------------------------
> *From:* Vasudevan Comandur <vco...@gm...>
> *Sent:* 04 June 2020 08:33
> *To:* htm...@li... <
> htm...@li...>
> *Subject:* Re: [Htmlunit-user] Exception thrown by webClient.getPage
>
> Hi,
>
> See if you can extract the necessary data from the HTML response by
> disabling JavaScript in the WebClient.
>
> Regards
> Vasu
>
> On Thu, 4 Jun 2020 at 02:35, Damon Goodyear <dam...@ho...>
> wrote:
>
> Hi,
>
> I am new to HTMLUnit and (re)new to seeking help from mailing lists like
> this - the last time I tried was about 15 years ago and things seem to have
> moved on. I hope my question is useful to you and I hope, even more, that
> your answer is useful to me. I hope I am not committing the newbie sin of
> asking about a well-known issue/non-issue.
>
> I have encountered a problem from the beginning with HTMLUnit. I have
> been trying to use HTMLUnit to download information from the following
> URLs-
>
> https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals
>
> https://www.londonstockexchange.com/live-markets/market-data-dashboard/price-explorer?page=1
>
> https://www.londonstockexchange.com/stock/OPM/1pm-plc/company-page
>
>
> These are listed in order of importance to me.
>
> The reason for this is that all this information has changed format in the
> last week or so and has become much harder to unpick.
>
> If I try the following code--
>
> import com.gargoylesoftware.htmlunit.BrowserVersion;
> import com.gargoylesoftware.htmlunit.WebClient;
> import com.gargoylesoftware.htmlunit.html.HtmlPage;
>
> // Logging suppression, left commented out:
> // LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log",
> //     "org.apache.commons.logging.impl.NoOpLog");
> // java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
> // java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
>
> // try-with-resources closes the WebClient even if getPage throws
> try (WebClient wc = new WebClient(BrowserVersion.CHROME)) {
>     HtmlPage page = wc.getPage(
>         "https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals");
>     String s = page.asText();
>     System.out.print(s);
> } catch etc...
>
> ... then I get a bucket load of javascript warnings that I can suppress by
> uncommenting the commented lines above, followed by an exception I do not
> understand and cannot find any help on, here or in the wider internet.
>
> The exception starts with the lines--
>
> EcmaError: lineNumber=[1] column=[0] lineSource=[<no source>]
> name=[ReferenceError] sourceName=[https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)]
> message=[ReferenceError: Assignment to undefined "regeneratorRuntime" in
> strict mode (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)]
>
> com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment
> to undefined "regeneratorRuntime" in strict mode
> (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)
>
> at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:891)
>
> I can send the full, very long stack trace if that would help.
>
> Are you able to assist, either by fixing the bug (if it is one) or by
> advising me how to get around this?
>
> Thanks,
>
>
> *Damon Goodyear *
> _______________________________________________
> Htmlunit-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlunit-user
>