From: Ahmed A. <asa...@ya...> - 2013-12-17 07:38:00
Hi David,

You can use webClient.setRefreshHandler(); see, for example:
http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/ImmediateRefreshHandler.html

Ahmed

________________________________
From: David Michael Gang <mic...@gm...>
To: htm...@li...
Sent: Tuesday, December 17, 2013 10:28 AM
Subject: [Htmlunit-user] How to identify if page gets refreshed and how to wait for it

Hi all,

For the following link:
http://www.scielo.org.co/scielo.php?script=sci_arttext&pid=S0120-99572012000600002
I want to get the address of the PDF. When I click the PDF link with the text "Article in pdf format", I land on a refresh page. How do I know that I am now on a refresh page, and how many seconds will I have to wait? Alternatively, how can I get the PDF link without waiting?

I tried to find a solution on my own. I wanted to parse the source of the redirect page and get the following tag (which you see with "view source"):

    <meta name="added" content="7;URL=http://www.scielo.org.co/pdf/rcg/v27s2/v27s2a02.pdf" http-equiv="refresh">

When I try to do the same with HtmlUnit:

    public static void main(String[] args)
            throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient();
        List<String> urls = ImmutableList.of(
            "http://www.scielo.org.co/scielo.php?script=sci_arttext&pid=S0120-99572012000600002");
        for (String url : urls) {
            HtmlPage page = webClient.getPage(url);
            List<HtmlAnchor> anchors = page.getAnchors();
            for (HtmlAnchor anchor : anchors) {
                String linkText = anchor.getTextContent();
                if (linkText.contains("pdf")) {
                    Page pdfPage = anchor.click();
                    if (pdfPage.isHtmlPage()) {
                        HtmlPage p = (HtmlPage) pdfPage;
                        System.out.println(p.asXml());
                    }
                }
            }
        }
    }

I don't get this tag in the resulting XML.

In summary, I have the following questions:

1. How do I know that I am now on a refresh page, and how many seconds will I have to wait?
2. After identifying that I am on a refresh page, how do I wait and refresh after this number of seconds?
3. How can I get the PDF link without waiting?
4. Where did the <meta name="added" content="7;URL=http://www.scielo.org.co/pdf/rcg/v27s2/v27s2a02.pdf" http-equiv="refresh"> tag disappear to?

Thanks,
David
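
For the questions about the delay and the target address, here is a minimal sketch of how the content value of such a refresh tag could be split into its two parts. It uses only the Java standard library. The class name MetaRefresh and its parse method are hypothetical helpers made up for illustration, not part of HtmlUnit, and the sketch assumes the simple "seconds;URL=target" form shown in the message (no quoting, no spaces around the "=").

```java
import java.util.Locale;

/** Hypothetical helper (not part of HtmlUnit): splits a meta-refresh content value. */
final class MetaRefresh {
    final int seconds;   // delay before the browser would navigate
    final String url;    // target URL, or null when only a delay is given

    private MetaRefresh(int seconds, String url) {
        this.seconds = seconds;
        this.url = url;
    }

    /** Parses values such as "7;URL=http://example.org/a.pdf" or "5". */
    static MetaRefresh parse(String content) {
        String value = content.trim();
        int sep = value.indexOf(';');
        String delayPart = (sep < 0 ? value : value.substring(0, sep)).trim();
        // Throws NumberFormatException if the delay is not a plain integer.
        int seconds = Integer.parseInt(delayPart);

        if (sep < 0) {
            return new MetaRefresh(seconds, null);
        }
        String rest = value.substring(sep + 1).trim();
        // The "URL=" prefix is matched case-insensitively.
        if (rest.toLowerCase(Locale.ROOT).startsWith("url=")) {
            rest = rest.substring(4).trim();
        }
        return new MetaRefresh(seconds, rest.isEmpty() ? null : rest);
    }

    public static void main(String[] args) {
        MetaRefresh r = parse("7;URL=http://www.scielo.org.co/pdf/rcg/v27s2/v27s2a02.pdf");
        // prints "7 s -> http://www.scielo.org.co/pdf/rcg/v27s2/v27s2a02.pdf"
        System.out.println(r.seconds + " s -> " + r.url);
    }
}
```

This only helps once the content string is in hand. Within HtmlUnit, a RefreshHandler registered through webClient.setRefreshHandler() should, as far as I know, be handed the refresh URL and the delay in seconds directly, so manual parsing may not be needed there.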