From: Ahmed A. <asa...@ya...> - 2017-05-03 06:51:15
Hi Vasu,

The below is only the client side, and the server side may be sending a different encoding than it states.

Please isolate a minimal case with the server-side component using a Servlet, as in https://sourceforge.net/p/htmlunit/code/HEAD/tree/trunk/htmlunit/src/test/java/com/gargoylesoftware/htmlunit/WebResponseTest.java#l233

Ahmed

From: Vasudevan Comandur <vco...@gm...>
To: Ahmed Ashour <asa...@ya...>; "htm...@li..." <htm...@li...>
Sent: Tuesday, May 2, 2017 10:30 PM
Subject: Re: [Htmlunit-user] Clarification Requried

Hi Ahmed,

I changed the HTTP Accept-Encoding header to "deflate", and HTMLUnit 2.23 read the content. However, when I left Accept-Encoding at its default of "gzip, deflate", it did not give me the content.

Let me know if you need anything else from me.

Response header from host when deflate was set:

HTTP/1.1 200 OK
Server: AtyponWS/7.1
Last-Modified: Mon, 01 May 2017 22:55:39 GMT
Expires: Thu, 19 Oct 2017 05:40:28 GMT
Cache-Control: public
Vary: User-Agent,Accept-Encoding
Content-Type: text/css; charset=UTF-8
Date: Tue, 02 May 2017 20:23:20 GMT
Content-Encoding: deflate
Transfer-Encoding: chunked

Response header from host when gzip was set:

HTTP/1.1 200 OK
Server: AtyponWS/7.1
Content-Encoding: gzip
Last-Modified: Mon, 01 May 2017 22:55:39 GMT
Expires: Thu, 19 Oct 2017 05:40:28 GMT
Cache-Control: public
Vary: User-Agent,Accept-Encoding
Content-Type: text/css; charset=UTF-8
Transfer-Encoding: chunked
Date: Tue, 02 May 2017 11:34:24 GMT

Regards
Vasu

On 3 May 2017 at 00:59, Ahmed Ashour <asa...@ya...> wrote:

Hi Vasu,

Please use the latest version, if not the latest build, and post your complete code.

Ahmed

From: Vasudevan Comandur <vco...@gm...>
To: "htmlunit-user@lists.sourceforge.net" <htmlunit-user@lists.sourceforge.net>
Sent: Tuesday, May 2, 2017 9:13 PM
Subject: [Htmlunit-user] Clarification Requried

Hi,

I am using HTMLUnit 2.23.
I received a response from the site whose page object was an instance of TextPage. I tried to get the content using the getContent() method, but I got no data. The response code was 200 and the content type was text/css. Am I missing something?

The site I am scraping is http://adh.sagepub.com
The CSS data I was trying to read is http://journals.sagepub.com/pb/css/t1493676764000-v1493676764000/head_1_6_7.css

I appreciate your help in advance.

Regards
Vasu

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Htmlunit-user mailing list
Htmlunit-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/htmlunit-user
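
Ahmed's point that the server may be sending a different encoding than it states can be checked on the raw bytes, independently of HtmlUnit. The following is a minimal sketch using only java.util.zip. The class EncodingSniffer and its methods are hypothetical names made up for this illustration and are not part of HtmlUnit. The check on the first two bytes is a heuristic, and main only simulates a mismatch; it says nothing about what the sagepub server actually sent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper for illustration; it is not part of HtmlUnit.
final class EncodingSniffer {

    // Heuristic guess at the real encoding from the first two bytes of the body.
    static String sniff(byte[] body) {
        if (body.length < 2) {
            return "unknown";
        }
        int b0 = body[0] & 0xff;
        int b1 = body[1] & 0xff;
        if (b0 == 0x1f && b1 == 0x8b) {
            return "gzip"; // gzip magic number (RFC 1952)
        }
        // zlib header (RFC 1950): method 8, window nibble <= 7, 16-bit header divisible by 31
        if ((b0 & 0x0f) == 8 && (b0 >> 4) <= 7 && ((b0 << 8) | b1) % 31 == 0) {
            return "deflate";
        }
        return "unknown"; // plain text, raw deflate without a zlib header, or something else
    }

    // Decodes the body with the given encoding; any other value returns the bytes unchanged.
    static byte[] decode(byte[] body, String encoding) {
        try {
            InputStream in = new ByteArrayInputStream(body);
            if ("gzip".equals(encoding)) {
                in = new GZIPInputStream(in);
            } else if ("deflate".equals(encoding)) {
                in = new InflaterInputStream(in);
            } else {
                return body;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test-data helper: compresses with gzip, or with zlib-wrapped deflate otherwise.
    static byte[] compress(byte[] plain, boolean gzip) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (OutputStream z = gzip ? new GZIPOutputStream(out) : new DeflaterOutputStream(out)) {
                z.write(plain);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] css = "body { color: red; }".getBytes(StandardCharsets.UTF_8);
        String declared = "gzip";            // what the simulated Content-Encoding header claims
        byte[] body = compress(css, false);  // what the simulated server actually sends
        String actual = sniff(body);
        System.out.println("declared=" + declared + ", sniffed=" + actual);
        System.out.println(new String(decode(body, actual), StandardCharsets.UTF_8));
    }
}
```

In a servlet-based minimal case like the one Ahmed asks for, the same comparison between the declared Content-Encoding and the sniffed value could be applied to the raw response bytes captured from the server.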