From: David M. G. <mic...@gm...> - 2014-02-10 15:04:48
Hi all,

I have the following challenging question :-)

The following URL

http://www.unioviedo.es/reunido/index.php/EBL/article/view/9962/9779

embeds a PDF using the JavaScript library PDFObject (http://pdfobject.com/). If it were an ordinary frame, I could simply detect that the frame contains an UnexpectedPage of type PDF and fetch the PDF with PDFBox.

I have the following code:

    package test;

    import java.io.IOException;
    import java.net.MalformedURLException;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.ImmediateRefreshHandler;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public class PdfObject {
        public static void main(String[] args)
                throws FailingHttpStatusCodeException, MalformedURLException, IOException {
            String url = "http://www.unioviedo.es/reunido/index.php/EBL/article/view/9962/9779";
            WebClient client = new WebClient(BrowserVersion.FIREFOX_24);
            client.setRefreshHandler(new ImmediateRefreshHandler());
            final HtmlPage page = client.getPage(url);
            // Wait for background JavaScript tasks scheduled to start within the next 5 seconds
            client.waitForBackgroundJavaScriptStartingBefore(5000);
            // Dump the resulting DOM to check whether the PDF was embedded
            System.out.println(page.asXml());
        }
    }

As the output shows, the PDF object is not initialized. Maybe this is because of how HtmlUnit implements navigator.mimeTypes, which the following PDFObject function relies on:

    //Detects unbranded PDF support
    var hasGeneric = function (){
        var plugin = navigator.mimeTypes["application/pdf"];
        return (plugin && plugin.enabledPlugin);
    };

If this returns false, the PDF object is not initialized.

How can I get PDFObject to work with HtmlUnit?

Thanks,
David
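
One possible workaround, sketched here under assumptions and not a confirmed fix for the page above: skip PDFObject's plugin detection and read the PDF URL straight from the page source. The sketch assumes the page calls PDFObject in the 1.x style, new PDFObject({ url: "..." }), with the URL as a string literal. The class name PdfUrlExtractor, the method extractPdfUrl and the sample markup are hypothetical. Only the Java standard library is used.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the PDF URL out of a PDFObject call found in page markup. */
final class PdfUrlExtractor {

    // Assumed call shape: new PDFObject({ ..., url: "..." }), single or double quotes.
    private static final Pattern PDFOBJECT_URL = Pattern.compile(
            "new\\s+PDFObject\\s*\\(\\s*\\{[^}]*?\\burl\\s*:\\s*([\"'])(.*?)\\1");

    /** Returns the first URL passed to PDFObject, or empty if no such call is found. */
    static Optional<String> extractPdfUrl(String html) {
        Matcher m = PDFOBJECT_URL.matcher(html);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up markup for illustration; the real page may look different.
        String html = "<script>var ok = new PDFObject({ url: "
                + "\"http://example.org/paper.pdf\" }).embed(\"viewer\");</script>";
        // prints "http://example.org/paper.pdf"
        System.out.println(extractPdfUrl(html).orElse("no PDFObject call found"));
    }
}
```

The markup could come from page.asXml() or from the raw response body. The extracted URL could then be requested with WebClient.getPage, which should give the UnexpectedPage described above for the ordinary-frame case. A relative URL would first need to be resolved against the page URL.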