From: Tommaso M. <tom...@gm...> - 2020-06-10 06:37:40
Dear Community,

first of all, thank you for your work.

I developed a simple application in Java using version 2.40, Java 8 and Eclipse 2019 on a Windows 7 PC. The application checks some information on a website: it first performs a login and then goes to the next page.

Everything works fine when it runs from Eclipse, and it still works if I generate the executable jar and launch it on the same machine. However, when I run the executable jar on two other PCs (Windows 10 and Windows Server 2012), the application fails because the login form element of the login page cannot be found. I was able to install Eclipse on one of those PCs (Windows 10), and again everything works fine from Eclipse, but the executable jar generated in the new environment on the same Windows 10 PC has the same issue.

I tried to follow the instructions found on the mailing list, including all the libraries needed; I tried changing the browser version (Chrome and IE); and I tried the different options Eclipse provides for generating the executable.

To understand what the login HtmlPage contains when it fails, I tried calling the method getPageXml as the first action and writing the result to a local HTML file. At that point everything fails and I get these errors:

INFORMATION: CanvasRenderingContext2D.isPointInPath() not yet implemented
giu 10, 2020 8:34:42 AM com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor killThread
WARNING: Event loop thread JS executor for com.gargoylesoftware.htmlunit.WebClient@72acb2 still alive at 1591770882648
giu 10, 2020 8:34:42 AM com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor killThread
WARNING: Event loop thread will be stopped
giu 10, 2020 8:34:42 AM com.akabana.AntonioliWebScraper.AntonioliWebScrapeMain main
SEVERE: An error occourred: null

Do you have any idea what could cause the issue?

Thank you. Kind regards,
Tommaso
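The last log line above ends in "null", which is what typically appears when a handler logs only `e.getMessage()` and the exception carries no message. As a hedged sketch of how more detail might be captured, the plain-JDK helper below returns the exception class together with its full stack trace, and checks whether a class is visible on the runtime classpath of the jar. The class name `JarDiagnostics` and its methods are hypothetical; they are not part of HtmlUnit or of the poster's application.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper; all names here are illustrative only. */
public class JarDiagnostics {

    private static final Logger LOG = Logger.getLogger(JarDiagnostics.class.getName());

    /** Returns the exception class, message (if any) and full stack trace as text. */
    public static String describe(Throwable t) {
        StringWriter out = new StringWriter();
        t.printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }

    /** Reports whether the named class can be loaded from the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, JarDiagnostics.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the log output above.
        System.out.println("WebClient on classpath: "
                + isOnClasspath("com.gargoylesoftware.htmlunit.WebClient"));
        try {
            // Stands in for the failing scraping code.
            throw new NullPointerException();
        } catch (Exception e) {
            // Passing the Throwable itself makes the logger print the stack trace;
            // logging only e.getMessage() would print "null" here.
            LOG.log(Level.SEVERE, "An error occurred", e);
        }
    }
}
```

Running the same check inside the exported jar on the failing machine would show whether a library is missing there, and the stack trace would show where the `null` comes from.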
From: Damon G. <dam...@ho...> - 2020-06-07 18:45:22
Hi,

As anticipated, working through your suggestion is creating more overhead (i.e. learning some basics) than I can manage in a reasonable amount of time. Luckily I have found some help in the form of a book by Kevin Sahin, available here: https://www.scrapingbee.com/java-webscraping-book/.

It will take me a little while to try to solve my problem using your suggestion and the help available in this book, which is written at just about my level, if not a bit above it. My guess is that by the time I have finished I will either have solved my problem or will have a different (set of) problem(s).

So for now I thank you for your help and will get on with what I need to do. BTW, if you have any suggestions to supplement this book (which contains some substantial code examples using HTMLUnit), I'm ready to hear them.

Thanks again,

Damon Goodyear

CONFIDENTIALITY and NON-DISCLOSURE OF EMAIL ADDRESS: This email, including its content and the address of the sender, are provided for the use of the recipient only and for the purposes of the subject matter under discussion. Notwithstanding any other consent that may have been given neither the content of this email nor the address of the sender may be disclosed to third parties, including within the same undertaking, without the prior written permission of the sender. Where this email has been sent to a general or non-personal email address permission is granted for a reply to be made, on the subject matter only, from the undertaking to whom it is addressed.

________________________________
From: Vasudevan Comandur <vco...@gm...>
Sent: 04 June 2020 12:52
To: htm...@li... <htm...@li...>
Subject: Re: [Htmlunit-user] Exception thrown by webClient.getPage

Hi,

If you are using the Chrome browser, it has an option to trace the network traffic under More tools -> Developer tools. You can see each request/response there; look for the request that returns the response you are interested in.

Call that request from your HTMLUnit program, specifying the same parameters, and see if you get the same response from the server.

Regards
Vasu

_______________________________________________
Htmlunit-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlunit-user
From: Ronald B. <rb...@rb...> - 2020-06-06 15:29:02
Hi all,

it is a pleasure to announce the availability of HtmlUnit 2.41.0.

The main enhancements are:
- Chrome updated to Chrome 83
- Firefox updated to Firefox 77
- many fixes in Rhino
- the usual bunch of fixes

The full list of changes can be found in [1].

Thanks to all the contributors.

Happy Testing/Scraping!
The HtmlUnit team

[1] http://htmlunit.sourceforge.net/changes-report.html#a2.41.0
From: Damon G. <dam...@ho...> - 2020-06-04 12:03:01
Hi,

OK, thanks for that. As a newbie ( 🙂 ) it'll take me a little while to get my head around your suggestion, so there may be some delay in getting back to you with a substantive reply.

Thanks again,

Damon Goodyear
From: Vasudevan C. <vco...@gm...> - 2020-06-04 11:52:59
Hi,

If you are using the Chrome browser, it has an option to trace the network traffic under More tools -> Developer tools. You can see each request/response there; look for the request that returns the response you are interested in.

Call that request from your HTMLUnit program, specifying the same parameters, and see if you get the same response from the server.

Regards
Vasu
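The suggestion above amounts to finding, in the browser's network trace, the request whose response carries the data, and then sending the same request from your own program. A minimal sketch of that idea follows. It uses only the JDK's `HttpURLConnection` rather than HtmlUnit, so that it stays self-contained. The class name `ReplayRequest`, the endpoint URL and the header values are placeholders; the real ones would have to be copied from the trace.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that replays a request seen in a browser network trace. */
public class ReplayRequest {

    /** Prepares (but does not yet send) a GET request carrying the given headers. */
    public static HttpURLConnection prepare(String url, Map<String, String> headers)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            conn.setRequestProperty(h.getKey(), h.getValue());
        }
        return conn;
    }

    /** Sends the request and returns the response body decoded as UTF-8 text. */
    public static String readBody(HttpURLConnection conn) throws IOException {
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values: copy the real URL and headers from the network trace.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "application/json");
        headers.put("User-Agent", "Mozilla/5.0");
        HttpURLConnection conn = prepare("https://example.com/api/data", headers);
        System.out.println(readBody(conn));
    }
}
```

If the server answers this request differently from how it answers the browser, comparing the remaining headers and cookies in the trace is usually the next step.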
From: Damon G. <dam...@ho...> - 2020-06-04 11:40:21
Hi,

Umm... I'm not sure I know how to do that (i.e. trace through a proxy tool). I have certainly seen JSON mentioned in the warnings given by the getPage function mentioned in my original post.

I don't suppose you could explain your suggestion a little more, so that I could then try it?

Thanks,

Damon Goodyear
From: Vasudevan C. <vco...@gm...> - 2020-06-04 11:19:20
Hi Damon,

Can you trace the request/response using a proxy tool? The payload that you are interested in capturing most likely comes as a JSON object. You can process the JSON payload once you can send the corresponding request to the portal.

Regards
Vasu

On Thu, 4 Jun 2020 at 16:40, Damon Goodyear <dam...@ho...> wrote:

> Hi,
>
> Thanks again.
>
> Switching to PhantomJS had occurred to me and I had started to look at that as an option - it seems clear that the guts of the content of the pages I am trying to access are generated through a script (or scripts) that crashes HTMLUnit. I have a feeling this might be intentional.
>
> It's a shame Rhino is so tightly bound into HTMLUnit. I wonder if the "couplings" could be loosened sufficiently to make the script engine a parameter, in much the same way that the browser is a parameter through BrowserVersion.
>
> Regards,
>
> *Damon Goodyear*
>
> ------------------------------
> *From:* Vasudevan Comandur <vco...@gm...>
> *Sent:* 04 June 2020 10:40
> *To:* htm...@li... <htm...@li...>
> *Subject:* Re: [Htmlunit-user] Exception thrown by webClient.getPage
>
> Hi Damon,
>
> HTMLUnit is tightly coupled with the Rhino JS engine. Changing to a different engine is a tedious task.
> Selenium uses an HTMLUnit driver; see if you can use PhantomJS instead.
>
> As an alternative, you can switch to PhantomJS instead of HTMLUnit.
>
> Regards
> Vasu
>
> On Thu, 4 Jun 2020 at 13:38, Damon Goodyear <dam...@ho...> wrote:
>
> Hi,
>
> Thanks Vasudevan for your reply.
>
> In fact I did try that too without success - I just get the top and bottom of the page and not the numbers (the important bit) in between.
>
> While getting ready for this morning I did wonder if changing the javascript engine might work. If I understand correctly HTMLUnit uses Rhino. Presumably there are other engines available - is it possible to change to a different engine or would that involve a code change?
>
> Thanks,
>
> *Damon Goodyear*
>
> ------------------------------
> *From:* Vasudevan Comandur <vco...@gm...>
> *Sent:* 04 June 2020 08:33
> *To:* htm...@li... <htm...@li...>
> *Subject:* Re: [Htmlunit-user] Exception thrown by webClient.getPage
>
> Hi,
>
> See if you can extract the necessary data from the HTML response by disabling JavaScript in the WebClient.
>
> Regards
> Vasu
>
> On Thu, 4 Jun 2020 at 02:35, Damon Goodyear <dam...@ho...> wrote:
>
> Hi,
>
> I am new to HTMLUnit and (re)new to seeking help from mailing lists like this - the last time I tried was about 15 years ago and things seem to have moved on. I hope my question is useful to you and I hope, even more, that your answer is useful to me. I hope I am not committing the newbie sin of asking about a well known issue/non-issue.
>
> I have encountered a problem from the beginning with HTMLUnit. I have been trying to use HTMLUnit to download information from the following URLs-
>
> https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals
> https://www.londonstockexchange.com/live-markets/market-data-dashboard/price-explorer?page=1
> https://www.londonstockexchange.com/stock/OPM/1pm-plc/company-page
>
> These are listed in order of importance to me.
>
> The reason for this is that all this information has changed format in the last week or so and has become much harder to unpick.
>
> If I try the following code--
>
> WebClient wc = new WebClient(BrowserVersion.CHROME);
> // LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
> // java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
> // java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
> try {
>     HtmlPage page = wc.getPage("https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals");
>     String s = page.asText();
>     System.out.print(s);
>     wc.close();
> } catch etc...
>
> ... then I get a bucket load of javascript warnings that I can suppress by uncommenting the commented lines above, followed by an exception I do not understand and cannot find any help on, here or in the wider internet.
>
> The exception starts with the lines--
>
> EcmaError: lineNumber=[1] column=[0] lineSource=[<no source>] name=[ReferenceError] sourceName=[https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)] message=[ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)]
> com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)
>     at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:891)
>
> I can send the full, very long, stack trace if that would help.
>
> Are you able to assist - either by fixing the bug (if it is one) or advising me how to get around this?
>
> Thanks,
>
> *Damon Goodyear*
From: Damon G. <dam...@ho...> - 2020-06-04 11:09:42
Hi,

Thanks again.

Switching to PhantomJS had occurred to me and I had started to look at that as an option - it seems clear that the guts of the content of the pages I am trying to access are generated through a script (or scripts) that crashes HTMLUnit. I have a feeling this might be intentional.

It's a shame Rhino is so tightly bound into HTMLUnit. I wonder if the "couplings" could be loosened sufficiently to make the script engine a parameter, in much the same way that the browser is a parameter through BrowserVersion.

Regards,

Damon Goodyear
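The original post in this thread comments out lines that switch off `java.util.logging` output for whole logger namespaces. A minimal sketch of that technique is below. It keeps a strong reference to each logger, because the JDK's `LogManager` holds loggers only weakly, and a level set on an otherwise unreferenced logger can be lost when that logger is garbage-collected. The class name `QuietLogging` is hypothetical; the logger names are the ones from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper for silencing java.util.logging namespaces. */
public class QuietLogging {

    // Strong references keep the configured loggers from being garbage-collected.
    private static final List<Logger> PINNED = new ArrayList<>();

    /** Turns off java.util.logging output for the namespace and its children. */
    public static Logger silence(String namespace) {
        Logger logger = Logger.getLogger(namespace);
        logger.setLevel(Level.OFF);
        PINNED.add(logger);
        return logger;
    }

    public static void main(String[] args) {
        silence("com.gargoylesoftware");
        silence("org.apache.commons.httpclient");

        // Child loggers inherit the effective level unless they set their own.
        Logger child = Logger.getLogger("com.gargoylesoftware.htmlunit.javascript");
        System.out.println(child.isLoggable(Level.WARNING));
    }
}
```

This only affects output that goes through `java.util.logging`, and it hides warnings rather than fixing them, so the script exception described in the thread would still be thrown.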
From: Vasudevan C. <vco...@gm...> - 2020-06-04 09:41:05
|
Hi Damon,

HTMLUnit is tightly coupled to the Rhino JS engine, so changing to a different engine would be a tedious task. Selenium has an HTMLUnit driver; see whether you can use PhantomJS with Selenium instead. That is, as an alternative you can switch from HTMLUnit to PhantomJS.

Regards
Vasu |
From: Damon G. <dam...@ho...> - 2020-06-04 08:08:26
|
Hi,

Thanks Vasudevan for your reply.

In fact I did try that too without success - I just get the top and bottom of the page and not the numbers (the important bit) in between.

While getting ready for this morning I did wonder if changing the javascript engine might work. If I understand correctly HTMLUnit uses Rhino. Presumably there are other engines available - is it possible to change to a different engine or would that involve a code change?

Thanks,
Damon Goodyear |
From: Vasudevan C. <vco...@gm...> - 2020-06-04 07:33:24
|
Hi,

See if you can extract the necessary data from the HTML response by disabling JavaScript in the WebClient.

Regards
Vasu |
From: Damon G. <dam...@ho...> - 2020-06-03 21:05:08
|
Hi,

I am new to HTMLUnit and (re)new to seeking help from mailing lists like this - the last time I tried was about 15 years ago and things seem to have moved on. I hope my question is useful to you and I hope, even more, that your answer is useful to me. I hope I am not committing the newbie sin of asking about a well-known issue/non-issue.

I have encountered a problem from the beginning with HTMLUnit. I have been trying to use HTMLUnit to download information from the following URLs-

https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals
https://www.londonstockexchange.com/live-markets/market-data-dashboard/price-explorer?page=1
https://www.londonstockexchange.com/stock/OPM/1pm-plc/company-page

These are listed in order of importance to me.

The reason for this is that all this information has changed format in the last week or so and has become much harder to unpick.

If I try the following code--

WebClient wc = new WebClient(BrowserVersion.CHROME);
// LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
// java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
// java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
try {
    HtmlPage page = wc.getPage("https://www.londonstockexchange.com/stock/OPM/1pm-plc/fundamentals");
    String s = page.asText();
    System.out.print(s);
    wc.close();
} catch etc...

... then I get a bucket load of JavaScript warnings that I can suppress by uncommenting the commented lines above, followed by an exception I do not understand and cannot find any help on, here or in the wider internet.

The exception starts with the lines--

EcmaError: lineNumber=[1] column=[0] lineSource=[<no source>] name=[ReferenceError] sourceName=[https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)] message=[ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)]
com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (https://www.londonstockexchange.com:443/polyfills-es5.463681aba2540d60831f.js#1(Function)#1)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:891)

I can send the full, very long stack trace if that would help.

Are you able to assist - either by fixing the bug (if it is one) or advising me how to get around this?

Thanks,
Damon Goodyear |
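The commented-out java.util.logging lines in the message above set a level on a single logger named "com.gargoylesoftware". A minimal sketch of why that one call can quieten a whole package: java.util.logging loggers inherit their effective level from the nearest named ancestor. The class name QuietHtmlUnitLogging and the child logger name are made up for illustration, and whether HtmlUnit's warnings actually pass through java.util.logging depends on how commons-logging is set up on the classpath, which is an assumption here.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietHtmlUnitLogging {
    // Keep a strong reference: java.util.logging only holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can be lost after GC.
    private static final Logger HTMLUNIT_ROOT = Logger.getLogger("com.gargoylesoftware");

    // Same call as the commented-out line in the original message.
    static void silence() {
        HTMLUNIT_ROOT.setLevel(Level.OFF);
    }

    // Reports whether a record of the given level would be logged by the named logger.
    static boolean wouldLog(String loggerName, Level level) {
        return Logger.getLogger(loggerName).isLoggable(level);
    }

    public static void main(String[] args) {
        // Hypothetical child logger name below the silenced parent.
        String child = "com.gargoylesoftware.htmlunit.javascript";
        silence();
        System.out.println(child + " logs WARNING: " + wouldLog(child, Level.WARNING));
    }
}
```

Silencing the loggers only hides the warnings; it does not affect the ScriptException itself, which is thrown regardless of the log level.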
From: Ronald B. <rb...@rb...> - 2020-05-02 15:54:26
|
Hi all,

it is a pleasure to announce the availability of HtmlUnit 2.40.0.

The main enhancements are:
- Chrome updated to Chrome 81
- Firefox updated to Firefox 75
- new WebClientConfiguration option ConnectionTimeToLive
- major improvements for the focus and active element handling
- label tag got many fixes
- respect Content-Security-Policy: frame-ancestors and X-Frame-Options: DENY when loading frame content
- new method AttachmentHandler.handleAttachment(WebResponse) added. By implementing your own AttachmentHandler you can now process the attachment response in your own code without opening a new WebClient window.

The full list of changes can be found in [1].

Thanks to all the contributors.

Happy Testing/Scraping!
The HtmlUnit team

[1] http://htmlunit.sourceforge.net/changes-report.html#a2.40.0 |
From: Ronald B. <rb...@rb...> - 2020-04-05 12:03:02
|
Hi all,

it is a pleasure to announce the availability of HtmlUnit 2.39.0.

The main enhancements are:
- the latest Firefox is now supported (currently Firefox 74)
- CanvasRenderingContext2D again got many improvements
- FrameContentHandler added to support stopping the loading of frame content
- the usual bunch of bugfixes
- BrowserVersion FIREFOX_60 is deprecated

The full list of changes can be found in [1].

Thanks to all the contributors.

Happy Testing/Scraping!
The HtmlUnit team

[1] http://htmlunit.sourceforge.net/changes-report.html#a2.39.0 |
From: Ronald B. <rb...@rb...> - 2020-03-22 13:08:21
|
Hi Oscar,

I will try a simple answer: if you want to automate a web page, it helps to know as much as possible about all the strange technologies that work together to bring the content to your screen, at least

* html
* css
* javascript
* xml

Usually I use the developer tools from Firefox (or Chrome if you like to sponsor this company ;-) to analyze the page a bit. You can start with a right click on the content you are interested in and select "Inspect Element". Then you have to decide on a way to locate the element. The different options are introduced at http://htmlunit.sourceforge.net/gettingStarted.html under the "Finding a specific element" topic.

Hope that helps
RBRi

On Sat, 21 Mar 2020 01:27:07 -0500 Oscar Bastidas wrote:
>
>Hi Ronald,
>
>I had one other quick question if it's not too much trouble: would you
>please tell me how you knew to search for "#Canonical-SMILES" in the code
>you sent me? Since the word "SMILES" is nowhere to be found in the source
>HTML, I was curious as to how you knew to search specifically for
>"#Canonical-SMILES" in the actual Java code (knowing this would help me
>scrape for other strings on the dynamic webpage).
>
>Lastly, are there any resources you specifically recommend or liked for
>learning how to do the kind of HTMLUnit webscraping you've helped me with
>here? If it's just reading general tutorials online, that's ok, it's where
>I'm starting now. Thanks again.
>
>Oscar B. |
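To make the "inspect the element, then pick a locator" advice above concrete, here is a minimal sketch of how the selector "#Canonical-SMILES .section-content .section-content-item p" maps onto a DOM tree: an element with that id, and below it nested elements carrying those class names. It uses only the JDK's XPath support on a made-up, well-formed fragment; the markup is hypothetical and much simpler than the real page, and the class name SelectorSketch is invented for this example. In HtmlUnit itself the lookup would go through page.querySelectorAll as in the code from this thread.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class SelectorSketch {
    // Hypothetical fragment shaped like what the CSS selector implies;
    // NOT the actual markup of the PubChem page.
    static final String SAMPLE =
          "<section id='Canonical-SMILES'>"
        + "<h4>Canonical SMILES</h4>"
        + "<div class='section-content'>"
        + "<div class='section-content-item'><p>COC1=CC2=C(C=C1)NC3=C2CCNC3</p></div>"
        + "</div></section>";

    // XPath test for "any element whose class attribute contains this token",
    // the usual translation of a CSS class selector such as .section-content.
    private static String hasClass(String name) {
        return "*[contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')]";
    }

    // Returns the text of the first matching <p>, or "" if nothing matches.
    static String firstMatchText(String xml) {
        // '#Canonical-SMILES' becomes an id test; each space in the CSS
        // selector (descendant combinator) becomes '//' in XPath.
        String expr = "//*[@id='Canonical-SMILES']//" + hasClass("section-content")
                + "//" + hasClass("section-content-item") + "//p";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatchText(SAMPLE));
    }
}
```

The id and class names come from the rendered DOM shown by the browser's inspector, not from the raw HTML source, which is why the string "SMILES" may be missing when viewing the page source.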
From: Ronald B. <rb...@rb...> - 2020-03-22 12:55:09
|
Hi Oscar,

that is not so uncommon. From time to time providers change their web pages by using another js library or just a newer version. In your case it looks like the page's js code has changed and now uses the fetch API. This API is not supported by HtmlUnit at the moment.

With a bit of luck you can ignore this. Add

webClient.getOptions().setThrowExceptionOnScriptError(false);

directly after the webClient creation.

Hope that helps. If not, please open an issue at GitHub and I will have a deeper look.

RBRi |
From: Oscar B. <oba...@um...> - 2020-03-21 06:29:28
|
Hello,

I have been trying to build a web scraping tool to obtain a string from a dynamically-loaded webpage.

My objective was to obtain the string COC1=CC2=C(C=C1)NC3=C2CCNC3 from the section titled "Canonical SMILES" from the following website:
https://pubchem.ncbi.nlm.nih.gov/compound/1868

Previously, I had working HTMLUnit code to access the above (thanks Ronald), but now the code is not working! Whereas before I would get printouts to the screen of my target information (COC1=CC2=C(C=C1)NC3=C2CCNC3 from the "Canonical Smiles" section of the above website - this information is displayed dynamically on the website), now instead, HTMLUnit returns the following error:

*SEVERE: ReferenceError: "fetch" is not defined*

Looking this error up, it seems this "fetch" has something to do with requests and responses across the network when accessing the webpage. In short, it seems like it's something implemented on the end of the owner of the web page, not something I can inadvertently modify, and viewing the webpage on a normal browser, the website looks fine, just as it always has in the past.

Would someone please tell me what is going on here? The code was working perfectly one minute, but then yielded the above fetch error the next.

Here is the main method of my previously functional code:

public static void main(String[] args) throws Exception {
    String uri = "https://pubchem.ncbi.nlm.nih.gov/compound/1868";

    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // do not stop on js errors
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        // do not log js errors (usually using this is a bad idea, at least
        // if you are hunting for problems).
        webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());

        HtmlPage page = webClient.getPage(uri);
        webClient.waitForBackgroundJavaScriptStartingBefore(10_000);

        final DomNodeList<DomNode> divs = page.querySelectorAll("#Canonical-SMILES .section-content .section-content-item p");
        for (DomNode div : divs) {
            System.out.println("----------------");
            System.out.println(div.asXml());
            System.out.println("----------------");
            System.out.println(div.asText());
            System.out.println("----------------");
        }
    }
}

Oscar B. |
From: Ronald B. <rb...@rb...> - 2020-03-15 14:22:49
|
Hi Oscar,

this code works for me

    public static void main(String[] args) throws Exception {
        String uri = "https://pubchem.ncbi.nlm.nih.gov/compound/1868";

        try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // do not stop on js errors
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            // do not log js errors (usually using this is a bad idea, at least
            // if you are hunting for problems).
            webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());

            HtmlPage page = webClient.getPage(uri);
            webClient.waitForBackgroundJavaScriptStartingBefore(10_000);

            final DomNodeList<DomNode> divs = page.querySelectorAll("#Canonical-SMILES .section-content .section-content-item p");
            for (DomNode div : divs) {
                System.out.println("----------------");
                System.out.println(div.asXml());
                System.out.println("----------------");
                System.out.println(div.asText());
                System.out.println("----------------");
            }
        }
    }

RBRi

On Tue, 10 Mar 2020 12:54:31 -0500 Oscar Bastidas wrote:
>
>Hello,
>
>I am trying to obtain a string that appears on a webpage when the
>webpage loads in my browser, but when I look at the HTML code of the
>webpage in question, I do not see the string at all (it is nowhere to
>be found in the HTML code).
>
>Here is the URL:
>https://pubchem.ncbi.nlm.nih.gov/compound/1868
>
>and here is my target string:
>COC1=CC2=C(C=C1)NC3=C2CCNC3
>
>The above target string is found under the heading "2.1.4 Canonical
>SMILES" (this heading doesn't appear in the HTML code either).
>
>Could someone please tell me if this is a special case that cannot be
>scraped? Thanks.
>
>Oscar B. |
From: Oscar B. <oba...@um...> - 2020-03-10 17:56:41
|
Hello,

I am trying to obtain a string that appears on a webpage when the webpage loads in my browser, but when I look at the HTML code of the webpage in question, I do not see the string at all (it is nowhere to be found in the HTML code).

Here is the URL:
https://pubchem.ncbi.nlm.nih.gov/compound/1868

and here is my target string:
COC1=CC2=C(C=C1)NC3=C2CCNC3

The above target string is found under the heading "2.1.4 Canonical SMILES" (this heading doesn't appear in the HTML code either).

Could someone please tell me if this is a special case that cannot be scraped? Thanks.

Oscar B. |
From: Ronald B. <rb...@rb...> - 2020-03-08 12:38:52
|
Hi all,

it is a pleasure to announce the availability of HtmlUnit 2.38.0.

The main enhancements are:
- Bugfixes (as always)
- many improvements done for the CanvasRenderingContext2D
- CHROME 80
- support for disabling WebSocket added (if disabled the jetty dependencies are not required)

The full list of changes can be found in [1]

Thanks to all the contributors.

Happy Testing/Scraping!
The HtmlUnit team

[1] http://htmlunit.sourceforge.net/changes-report.html#a2.38.0 |
From: Vasudevan C. <vco...@gm...> - 2020-03-06 17:19:11
|
Hi,

Figured out the issue. The host was redirecting to a new URL, which resulted in the format getting corrupted.

Regards
Vasu

On Fri, 6 Mar 2020 at 21:19, <rb...@rb...> wrote:

> On 6 March 2020 15:46:38 CET, Vasudevan Comandur <vco...@gm...> wrote:
>>
>> Hi,
>>
>> I am sending a query to the JobDiva portal using a POST request. I am
>> using the HtmlUnit NameValuePair class to construct the key/value pairs
>> of the POST request.
>>
>> I construct a NameValuePair with key "sfboolean" and value
>> "(java&&&&OR&&&&HTML)". When I analyzed the request through Charles,
>> the data was going out as "sfboolean" with "(java" as its value, and the
>> rest of the value was going out as keys. Copying the POST data below for
>> your reference.
>>
>> sfcriteria      JAVA~0~0.0~0~0~HTML~1~0.0~0~0~
>> sfsalaryper
>> sfnotsalaryper  Y
>> sfdatefrom      03/06/2020
>> sfboolean       (java
>> OR
>> HTML)
>>
>> Am I doing something wrong?
>> I appreciate your help in advance.
>>
>> Regards
>> Vasu
>>
> Hi Vasu
> please open an issue at GitHub. Otherwise I'm not able to track all this. |
From: <rb...@rb...> - 2020-03-06 15:50:10
|
On 6 March 2020 15:46:38 CET, Vasudevan Comandur <vco...@gm...> wrote:
>Hi,
>
>I am sending a query to the JobDiva portal using a POST request. I am
>using the HtmlUnit NameValuePair class to construct the key/value pairs
>of the POST request.
>
>I construct a NameValuePair with key "sfboolean" and value
>"(java&&&&OR&&&&HTML)". When I analyzed the request through Charles,
>the data was going out as "sfboolean" with "(java" as its value, and the
>rest of the value was going out as keys. Copying the POST data below for
>your reference.
>
>sfcriteria      JAVA~0~0.0~0~0~HTML~1~0.0~0~0~
>sfsalaryper
>sfnotsalaryper  Y
>sfdatefrom      03/06/2020
>sfboolean       (java
>OR
>HTML)
>
>Am I doing something wrong?
>I appreciate your help in advance.
>
>Regards
>Vasu

Hi Vasu

please open an issue at GitHub. Otherwise I'm not able to track all this. |
From: Vasudevan C. <vco...@gm...> - 2020-03-06 14:47:08
|
Hi,

I am sending a query to the JobDiva portal using a POST request. I am using the HtmlUnit NameValuePair class to construct the key/value pairs of the POST request.

I construct a NameValuePair with key "sfboolean" and value "(java&&&&OR&&&&HTML)". When I analyzed the request through Charles, the data was going out as "sfboolean" with "(java" as its value, and the rest of the value was going out as keys. Copying the POST data below for your reference.

sfcriteria      JAVA~0~0.0~0~0~HTML~1~0.0~0~0~
sfsalaryper
sfnotsalaryper  Y
sfdatefrom      03/06/2020
sfboolean       (java
OR
HTML)

Am I doing something wrong?
I appreciate your help in advance.

Regards
Vasu |
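For reference, the symptom described in this message is what a form-encoded body looks like when the `&` characters inside a value arrive unencoded. The sketch below uses only the Java standard library (Java 10 or later, no HtmlUnit) and assumes the request body is `application/x-www-form-urlencoded`; the class `FormEncodingDemo` and its two helper methods are hypothetical names made up for this illustration, and the splitting logic is a simplified stand-in for whatever parser the receiving side uses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FormEncodingDemo {

    /** Percent-encodes a single form value (hypothetical helper). */
    public static String encodeValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /**
     * Splits a raw body on '&', roughly as a form parser would,
     * and drops the empty pieces produced by consecutive '&'.
     */
    public static List<String> splitPairs(String body) {
        List<String> pairs = new ArrayList<>();
        for (String piece : body.split("&")) {
            if (!piece.isEmpty()) {
                pairs.add(piece);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String value = "(java&&&&OR&&&&HTML)";

        // Unencoded: the '&' characters act as pair separators, so the
        // value is cut after "(java" and the rest look like extra keys.
        System.out.println(splitPairs("sfboolean=" + value));

        // Encoded: every '&' becomes %26 and the value stays in one piece.
        System.out.println(splitPairs("sfboolean=" + encodeValue(value)));
    }
}
```

The first line printed has three pieces (`sfboolean=(java`, `OR`, `HTML)`), matching the Charles capture above; the second has a single piece. Comparing the bytes on the wire against the encoded form is one way to tell whether a value was encoded before sending or was altered somewhere along the way.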
From: Vasudevan C. <vco...@gm...> - 2020-02-14 18:06:25
|
All,

I need help getting the content of a new page loaded within an IFrame.

1. Initiate the URL to get the page rendered.
2. There is an IFrame in the HTML page which has input fields and a button.
3. I was able to get the content of the IFrame HTML, populate the input fields and click the button.
4. The return value of the click is an HTML page object.
5. How do I get the new/updated content from the IFrame?

I appreciate your help in advance.

Regards
Vasu |
From: <rb...@rb...> - 2020-02-14 14:57:06
|
On 14 February 2020 13:50:39 CET, Vasudevan Comandur <vco...@gm...> wrote:
>All,
>
>I was entering the username and password into the respective fields
>and clicked the button. The password was encrypted (by the HtmlUnit JS
>engine) and the host rejected it with the message "Invalid login
>credentials".
>
>The login is successful from the browser, however. The encrypted
>password from the browser is longer than the value generated by
>HtmlUnit.
>
>It looks like the execution of the JS for password encryption in
>HtmlUnit is having some issues.
>
>I am copying the encrypted password from HtmlUnit and from the actual
>browser.
>
>HtmlUnit Value:
>
>v="0384l01600l11696l23456l312416l425088l525600l651200l7720l8880l9912l10848l11896l12720l13832l14848l151568l16800l17720l18896l19768l201616l21848l22720l23912l24912l251584l26912l27848l28880l29800l30896l31768l321600l33800l34816l35~0"
>
>Browser Value:
>v="0808l0816l11824l23136l36784l413312l528160l649152l7720l81616l91600l10768l111616l12720l13832l14800l15800l16800l17720l18912l191600l20896l21880l22720l23864l241600l251600l26848l27784l281552l29832l30912l31832l32768l331632l34864l35~0688l01744l13680l24096l36912l429696l553248l6117760l71616l81760l91840l101616l11"
>
>I tried simulating all browser versions supported by HtmlUnit and it
>always fails.
>
>I appreciate your help or any pointers to solve this issue.
>
>Regards
>Vasu

Any hint about the JS functions used would be helpful. |