From: David M. G. <mic...@gm...> - 2014-01-12 14:26:49
Hi all,

I made an error in my analysis; the real cause is different. The HTML contains a frame:

<frame src="http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/viewDownloadInterstitial/557/554" frameborder="0"/>

The page loads this frame, and it is the frame that triggers the forced download. Following the HtmlUnit frame how-to (http://htmlunit.sourceforge.net/frame-howto.html) solved the issue.

Thanks,
David

On Mon, Jan 6, 2014 at 10:19 AM, David Michael Gang <mic...@gm...> wrote:
> Hi all,
>
> I have the following problem. When I fetch this page with a browser, I get a PDF:
>
> http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/view/557/554
>
> Through HtmlUnit I just get an HTML page.
>
> Here is the program:
>
> package test;
>
> import java.io.File;
> import java.io.IOException;
> import java.io.InputStream;
> import java.net.MalformedURLException;
>
> import org.apache.commons.io.FileUtils;
>
> import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
> import com.gargoylesoftware.htmlunit.ImmediateRefreshHandler;
> import com.gargoylesoftware.htmlunit.Page;
> import com.gargoylesoftware.htmlunit.WebClient;
>
> public class ForceDownload {
>
>     public static void main(String[] args) throws
>             FailingHttpStatusCodeException, MalformedURLException, IOException {
>         WebClient client = new WebClient();
>         // Follow refresh directives immediately, ignoring any requested delay.
>         client.setRefreshHandler(new ImmediateRefreshHandler());
>
>         final String downloadUrl =
>                 "http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/view/557/554";
>         final Page page = client.getPage(downloadUrl);
>
>         System.out.println(page.getWebResponse().getContentType());
>
>         // Save the response body of the top-level page.
>         final InputStream is = page.getWebResponse().getContentAsStream();
>         FileUtils.copyInputStreamToFile(is, new File("file.pdf"));
>     }
> }
>
> The output is the HTML file. I already tried setting an ImmediateRefreshHandler, but it did not help.
> Trying to understand why, I watched the traffic in the Firefox web developer tools and saw that
> the browser sends keep-alive signals.
> How can I refresh the page, or wait on the keep-alive, until I get the PDF?
>
> Thanks,
> David
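HtmlUnit loads each frame into its own window, so the Page returned by getPage() is only the outer HTML, while the PDF response ends up in a frame window. Below is a minimal sketch of that frame-based approach applied to the program above, assuming an HtmlUnit 2.x release from this period (package com.gargoylesoftware.htmlunit, where HtmlPage.getFrames() and WebClient.closeAllWindows() exist) and Commons IO on the classpath. The class name ForceDownloadViaFrame and the helpers isPdfContentType and findPdf are hypothetical, and searching every frame for the first PDF response is an illustrative choice, not necessarily what the how-to or the original poster did.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.io.FileUtils;

import com.gargoylesoftware.htmlunit.ImmediateRefreshHandler;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.FrameWindow;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ForceDownloadViaFrame {

    /** Hypothetical helper: true if a Content-Type value names a PDF, ignoring case and parameters. */
    public static boolean isPdfContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=..." before comparing the MIME type.
        int semicolon = contentType.indexOf(';');
        String mimeType = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return mimeType.trim().equalsIgnoreCase("application/pdf");
    }

    /**
     * Hypothetical helper: depth-first search of a page and the pages loaded into
     * its frames; returns the first one whose response is a PDF, or null if none is.
     */
    public static Page findPdf(Page page) {
        if (page == null) {
            return null;
        }
        if (isPdfContentType(page.getWebResponse().getContentType())) {
            return page;
        }
        if (page instanceof HtmlPage) {
            // Each <frame>/<iframe> has its own FrameWindow holding a separately loaded page.
            for (FrameWindow frame : ((HtmlPage) page).getFrames()) {
                Page found = findPdf(frame.getEnclosedPage());
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        final String url =
                "http://xn--archivoespaoldearte-53b.revistas.csic.es/index.php/aea/article/view/557/554";
        WebClient client = new WebClient();
        client.setRefreshHandler(new ImmediateRefreshHandler());
        try {
            Page top = client.getPage(url);
            Page pdf = findPdf(top);
            if (pdf == null) {
                System.err.println("No PDF found in the page or its frames");
                return;
            }
            InputStream is = pdf.getWebResponse().getContentAsStream();
            // copyInputStreamToFile closes the input stream when it finishes.
            FileUtils.copyInputStreamToFile(is, new File("file.pdf"));
        } finally {
            client.closeAllWindows();
        }
    }
}
```

The recursion also covers nested framesets. When the page is known to have exactly one frame, ((HtmlPage) top).getFrames().get(0).getEnclosedPage() would be the shorter form.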