htmlunit-user Mailing List for HtmlUnit (Page 42)


Hi David,

InputStream is = page.getWebResponse().getContentAsStream();

Then save it to an OutputStream (not a Writer, since a PDF is binary data).

You can use commons-io:

OutputStream out = new FileOutputStream("hello.pdf");
IOUtils.copy(is, out);
out.close();

Yours,
Ahmed
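
For readers without commons-io, here is a minimal sketch of the same save step using only the JDK. It swaps IOUtils for java.nio.file.Files.copy, and the class name SaveBinaryResponse and the sample bytes are made up for illustration. In real use the InputStream would be the one returned by page.getWebResponse().getContentAsStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SaveBinaryResponse {

    /** Copies the stream to the target file byte for byte, with no character decoding. */
    public static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) { // closes the stream when the copy is done
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the response body; includes non-text bytes on purpose.
        byte[] sample = {'%', 'P', 'D', 'F', '-', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        Path target = Files.createTempFile("hello", ".pdf");
        long written = save(new ByteArrayInputStream(sample), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

Because nothing is decoded into characters on the way, bytes outside the text range reach the file unchanged.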

________________________________
 From: David Michael Gang <mic...@gm...>
To: htm...@li... 
Sent: Thursday, January 2, 2014 3:56 PM
Subject: Re: [Htmlunit-user] Htmlunit-user Digest, Vol 92, Issue 1

Hi Ahmed,

Thanks for your reply.
This solves half of the problem.

The immediate refresh handler redirects me automatically to the page which sends me the header "application/force-download".

The question is how to emulate the browser behavior so that I get the PDF automatically.
Here is the code:

package test;

import java.io.IOException;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.ImmediateRefreshHandler;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;

public class ForceDownload {

    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient client = new WebClient();
        client.setRefreshHandler(new ImmediateRefreshHandler());

        final String downloadUrl = "http://archivoespañoldearte.revistas.csic.es/index.php/aea/article/download/552/549";
        final Page page = client.getPage(downloadUrl);

        System.out.println(page.getWebResponse().getContentType());

    }

}

I get the output:
application/force-download

How can I get to the PDF?

Thanks,

David
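
Since "application/force-download" says nothing about the real format, one way to confirm that the body is the PDF is to look at its first bytes: a PDF file starts with "%PDF-". The following is a small sketch under that assumption. The class name PdfSniffer is hypothetical, and the stream passed in would be the response body (for example from page.getWebResponse().getContentAsStream()).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfSniffer {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns true if the stream begins with the PDF header "%PDF-".
     * Note: this consumes the first bytes of the stream.
     */
    public static boolean startsWithPdfHeader(InputStream in) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // stream ended before a full header was read
            }
            read += n;
        }
        return Arrays.equals(head, PDF_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical bodies, standing in for real responses.
        byte[] pdfLike = "%PDF-1.4\n...".getBytes(StandardCharsets.US_ASCII);
        byte[] htmlLike = "<html><body>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithPdfHeader(new ByteArrayInputStream(pdfLike)));
        System.out.println(startsWithPdfHeader(new ByteArrayInputStream(htmlLike)));
    }
}
```

Because the check consumes bytes, it is simplest to run it on a fresh stream or on the saved file rather than on the stream that is about to be written to disk.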

Message: 2
>Date: Thu, 26 Dec 2013 07:17:24 -0800 (PST)
>From: Ahmed Ashour <asa...@ya...>
>Subject: Re: [Htmlunit-user] deal with application/force-download
>To: "htm...@li..."
>        <htm...@li...>
>Message-ID:
>        <138...@we...>
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hi David,
>
>The page refreshes in 2 seconds and forwards to the PDF location.
>
>You can try:
>
>    WebClient webClient = new WebClient();
>    webClient.setRefreshHandler(new ImmediateRefreshHandler());
>    Page page = webClient.getPage("http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/viewDownloadInterstitial/552/549");
>
>
>Ahmed
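
To make the "refreshes in 2 seconds" part concrete: a refresh of this kind is usually declared as a meta refresh value of the form "2; url=...", which is the information a refresh handler acts on. The sketch below parses such a value with plain Java. It is an illustration only, not HtmlUnit's own parser. The class name MetaRefresh and the example.org URL are hypothetical, and the exact refresh value served by the site is not shown in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaRefresh {

    // "<seconds>" optionally followed by "; url=<target>", case-insensitive.
    private static final Pattern REFRESH = Pattern.compile(
            "^\\s*(\\d+)\\s*(?:;\\s*url\\s*=\\s*(\\S+))?\\s*$", Pattern.CASE_INSENSITIVE);

    public final int delaySeconds;
    public final String url; // null when the value only carries a delay

    private MetaRefresh(int delaySeconds, String url) {
        this.delaySeconds = delaySeconds;
        this.url = url;
    }

    /** Parses a value such as "2; url=http://example.org/file.pdf". */
    public static MetaRefresh parse(String content) {
        Matcher m = REFRESH.matcher(content);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a refresh value: " + content);
        }
        return new MetaRefresh(Integer.parseInt(m.group(1)), m.group(2));
    }

    public static void main(String[] args) {
        // Hypothetical value; the real page's refresh target is not quoted in the thread.
        MetaRefresh r = parse("2; url=http://example.org/download/552/549");
        System.out.println(r.delaySeconds + " second(s), then " + r.url);
    }
}
```

An immediate refresh handler ignores the delay and requests the target right away, which is why the client ends up on the download response without waiting.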
>
>________________________________
> From: David Michael Gang <mic...@gm...>
>To: htm...@li...
>Sent: Monday, December 23, 2013 12:24 PM
>Subject: [Htmlunit-user] deal with application/force-download
>
>
>
>Hi all,
>
>I have the following url:
>http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/view/552/549
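
The host name in this URL is the ASCII (Punycode) form of an internationalized domain name, which is why it appears in two spellings in this thread. As a small sketch, the JDK's java.net.IDN class converts between the two forms. The class name IdnHost and the helper toAsciiHost are hypothetical. The ñ is written as \u00f1 so the source file does not depend on its encoding.

```java
import java.net.IDN;

public class IdnHost {

    /** Converts a Unicode host name to its ASCII-compatible (Punycode) form. */
    public static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) {
        String unicodeHost = "archivoespa\u00f1oldearte.revistas.csic.es";
        String asciiHost = toAsciiHost(unicodeHost);
        System.out.println(asciiHost);
        // prints xn--archivoespaoldearte-53b.revistas.csic.es

        // Converting back yields the original Unicode spelling.
        System.out.println(IDN.toUnicode(asciiHost).equals(unicodeHost));
    }
}
```

Using the ASCII form in the URL string sidesteps any doubt about how the client or the source-file encoding handles the non-ASCII character.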
>
>In Firefox or IE8 the page refreshes and a PDF is downloaded.
>With HtmlUnit I tried the following.
>When I request the top page, it returns an HTML page and not the PDF.
>
>Even when I request the download page directly, it does not download the PDF.
>
>
>package test;
>
>import java.io.IOException;
>import java.net.MalformedURLException;
>
>import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
>import com.gargoylesoftware.htmlunit.Page;
>import com.gargoylesoftware.htmlunit.WebClient;
>import com.gargoylesoftware.htmlunit.html.HtmlPage;
>
>public class ForceDownload {
>
>    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
>        WebClient client = new WebClient();
>        System.out.println("get to top page");
>        final String topUrl = "http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/view/552/549";
>        final Page topPage = client.getPage(topUrl);
>        if(topPage.isHtmlPage()) {
>            System.out.println("topPage is htmlPage");
>            System.out.println("source of top page is "+((HtmlPage) topPage).asXml());
>        }
>
>        System.out.println("get to download page directly");
>
>        final String downloadUrl = "http://archivoespañoldearte.revistas.csic.es/index.php/aea/article/download/552/549";
>
>        final Page page = client.getPage(downloadUrl);
>        System.out.println(page.getWebResponse().getContentType());
>
>
>    }
>
>}
>
>This is the output of the script
>get to top page
>topPage is htmlPage
>source of top page is <?xml version="1.0" encoding="UTF-8"?>
><html xmlns="http://www.w3.org/1999/xhtml">
>  <head>
>    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
>    <title>
>      Vasallo Toranzo
>    </title>
>    <link rel="stylesheet" href="http://xn--archivoespaoldearte-53b.revistas.csic.es/styles/common.css" type="text/css"/>
>    <link rel="stylesheet" href="http://xn--archivoespaoldearte-53b.revistas.csic.es/styles/articleView.css" type="text/css"/>
>    <link rel="icon" href="http://xn--archivoespaoldearte-53b.revistas.csic.es/favicon.ico" type="image/x-icon"/>
>    <script type="text/javascript" src="http://xn--archivoespaoldearte-53b.revistas.csic.es/js/general.js">
>    </script>
>    <!-- Add javascript required for font sizer -->    <script type="text/javascript" src="http://xn--archivoespaoldearte-53b.revistas.csic.es/js/sizer.js">
>    </script>
>    <!-- Add stylesheets for the font sizer -->    <link rel="alternate stylesheet" title="Pequeña" href="http://xn--archivoespaoldearte-53b.revistas.csic.es/styles/fontSmall.css" type="text/css" disabled="disabled"/>
>    <link rel="stylesheet" title="Mediana" href="http://xn--archivoespaoldearte-53b.revistas.csic.es/styles/fontMedium.css" type="text/css"/>
>    <link rel="alternate stylesheet" title="Grande" href="http://xn--archivoespaoldearte-53b.revistas.csic.es/styles/fontLarge.css" type="text/css" disabled="disabled"/>
>  </head>
>  <frameset cols="220,*" style="border: 0;">
>    <!-- cols="*,180"-->    <frame src="http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/viewRST/552/549" noresize="noresize" frameborder="0" scrolling="auto"/>
>    <frame src="http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/viewDownloadInterstitial/552/549" frameborder="0"/>
>    <noframes>
>
>&lt;body&gt;
>    &lt;table width="100%"&gt;
>        &lt;tr&gt;
>            &lt;td align="center"&gt;
>                Esta página usa marcos. &lt;a href="http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/viewDownloadInterstitial/552/549"&gt;Haga click aquí&lt;/a&gt; para ir a la versión sin marcos.
>            &lt;/td&gt;
>        &lt;/tr&gt;
>    &lt;/table&gt;
>&lt;/body&gt;
>
>
>    </noframes>
>  </frameset>
></html>
>
>get to download page directly
>application/force-download
>
>
>
>How can I solve this challenge?
>
>How can I tell HtmlUnit to download the file directly?
>
>
>Thanks,
>
>David
>
>

_______________________________________________
Htmlunit-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlunit-user


Java GUI-Less browser, supporting JavaScript, to run against web pages

htmlunit-user — Discussion of the use of HtmlUnit