From: Ahmed A. <asa...@ya...> - 2013-12-17 10:27:20
Hi David,

It seems you need to get the top page, something like:

    public static void main(String[] args)
            throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient();
        webClient.setRefreshHandler(new ImmediateRefreshHandler());
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        String url = "http://www.scielo.org.co/scielo.php?script=sci_arttext&pid=S0120-99572012000600002";
        HtmlPage page = webClient.getPage(url);
        List<HtmlAnchor> anchors = page.getAnchors();
        for (HtmlAnchor anchor : anchors) {
            String linkText = anchor.getTextContent();
            if (linkText.contains("pdf")) {
                System.out.println("clicking on anchor:" + anchor);
                Page pdfPage = anchor.click();
                System.out.println("URL 1 " + pdfPage.getUrl());
                // Wait for the delayed JavaScript triggered by the click to run.
                webClient.waitForBackgroundJavaScriptStartingBefore(10000);
                System.out.println("URL 2 " + pdfPage.getUrl());
                // Ask the top-level window for the page it holds now.
                System.out.println("URL 3 " + webClient.getTopLevelWindows().get(0).getEnclosedPage().getUrl());
                if (pdfPage.isHtmlPage()) {
                    HtmlPage p = (HtmlPage) pdfPage;
                } else {
                    System.out.println("Page is pdf");
                    System.out.println(pdfPage);
                }
            }
        }
    }

Yours,
Ahmed

________________________________
From: David Michael Gang <mic...@gm...>
To: htm...@li...
Sent: Tuesday, December 17, 2013 1:06 PM
Subject: Re: [Htmlunit-user] How to identify if page gets refreshed and how to wait for it

Hi,

It seems that there is a more basic issue. The page
http://www.scielo.org.co/scielo.php?script=sci_arttext&pid=S0120-99572012000600002
has a link to the PDF of the article. When I click the link, I don't get the new page.
Here is the code:

    package test;

    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.util.List;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.ImmediateRefreshHandler;
    import com.gargoylesoftware.htmlunit.NiceRefreshHandler;
    import com.gargoylesoftware.htmlunit.Page;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.javascript.configuration.WebBrowser;
    import com.google.common.collect.ImmutableList;

    public class Test {
        public static void main(String[] args)
                throws FailingHttpStatusCodeException, MalformedURLException, IOException {
            WebClient webClient = new WebClient();
            webClient.setRefreshHandler(new ImmediateRefreshHandler());
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            List<String> urls = ImmutableList.of("http://www.scielo.org.co/scielo.php?script=sci_arttext&pid=S0120-99572012000600002");
            for (String url : urls) {
                HtmlPage page = webClient.getPage(url);
                List<HtmlAnchor> anchors = page.getAnchors();
                for (HtmlAnchor anchor : anchors) {
                    String linkText = anchor.getTextContent();
                    if (linkText.contains("pdf")) {
                        System.out.println("clicking on anchor:" + anchor);
                        Page pdfPage = anchor.click();
                        webClient.waitForBackgroundJavaScriptStartingBefore(1000);
                        if (pdfPage.isHtmlPage()) {
                            HtmlPage p = (HtmlPage) pdfPage;
                            System.out.println(p.asText());
                        } else {
                            System.out.println("Page is pdf");
                            System.out.println(pdfPage);
                        }
                    }
                }
            }
        }
    }

Here is the output:

17/12/2013 12:04:40 com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
17/12/2013 12:04:40 com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Unexpected call to method or property access] sourceName=[http://www.scielo.org.co/applications/scielo-org/js/jquery-1.4.2.min.js] line=[35] lineSource=[null] lineOffset=[0]
17/12/2013 12:04:40 com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://www.scielo.org.co/applications/scielo-org/js/jquery-1.4.2.min.js] line=[16] lineSource=[null] lineOffset=[0]
17/12/2013 12:04:41 org.apache.http.impl.client.DefaultHttpClient tryExecute
INFO: I/O exception (org.apache.http.NoHttpResponseException) caught when processing request: The target server failed to respond
17/12/2013 12:04:41 org.apache.http.impl.client.DefaultHttpClient tryExecute
INFO: Retrying request
17/12/2013 12:04:42 com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
17/12/2013 12:04:43 com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
17/12/2013 12:04:43 com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
17/12/2013 12:04:44 com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
17/12/2013 12:04:44 com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://s7.addthis.com/static/r07/core113.js] line=[2] lineSource=[null] lineOffset=[0]
17/12/2013 12:04:46 com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://s7.addthis.com/static/r07/widget118.css' [1:5310] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://s7.addthis.com/static/r07/widget118.css' [1:5310] Ignoring the following declarations in this rule.
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://s7.addthis.com/static/r07/widget118.css' [1:5383] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://s7.addthis.com/static/r07/widget118.css' [1:5383] Ignoring the following declarations in this rule.
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://s7.addthis.com/static/r07/widget118.css' [1:62894] Error in expression. (Invalid token "#0d98fb". Was expecting one of: <S>, <NUMBER>, <IDENT>, <STRING>, <PLUS>, <COMMA>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <PERCENTAGE>, <URI>, "-", "=", ")".)
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://s7.addthis.com/static/r07/widget118.css' [1:62911] Error in style rule. (Invalid token "background-image". Was expecting one of: <EOF>, "}", ";".)
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://s7.addthis.com/static/r07/widget118.css' [1:62911] Ignoring the following declarations in this rule.
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://s7.addthis.com/static/r07/widget118.css' [1:63424] Error in expression. (Invalid token "#0a85dd". Was expecting one of: <S>, <NUMBER>, <IDENT>, <STRING>, <PLUS>, <COMMA>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <PERCENTAGE>, <URI>, "-", "=", ")".)
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://s7.addthis.com/static/r07/widget118.css' [1:63441] Error in style rule. (Invalid token "background-image". Was expecting one of: <EOF>, "}", ";".)
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://s7.addthis.com/static/r07/widget118.css' [1:63441] Ignoring the following declarations in this rule.
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://s7.addthis.com/static/r07/widget118.css' [1:83274] Error in @media rule. (Invalid token "and". Was expecting one of: <S>, <LBRACE>, <COMMA>.)
17/12/2013 12:04:47 com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://s7.addthis.com/static/r07/widget118.css' [1:83274] Ignoring the whole rule.
17/12/2013 12:04:51 com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
17/12/2013 12:04:51 com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
17/12/2013 12:04:51 com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
17/12/2013 12:04:51 com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
17/12/2013 12:04:51 com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
17/12/2013 12:04:51 com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
17/12/2013 12:04:53 com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
17/12/2013 12:04:53 com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[An invalid or illegal selector was specified (selector: '.box:eq(0)' error: Invalid selector: *.box:eq(0)).] sourceName=[http://www.scielo.org.co/applications/scielo-org/js/jquery-1.4.2.min.js] line=[91] lineSource=[null] lineOffset=[0]
17/12/2013 12:04:53 com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[An invalid or illegal selector was specified (selector: '.box:last' error: Invalid selector: *.box:last).]
sourceName=[http://www.scielo.org.co/applications/scielo-org/js/jquery-1.4.2.min.js] line=[91] lineSource=[null] lineOffset=[0]
clicking on anchor:HtmlAnchor[<a href="javascript:%20void(0);%20" onclick="setTimeout("window.open('http://www.scielo.org.co/scielo.php?script=sci_pdf&pid=S0120-99572012000600002&lng=en&nrm=iso&tlng=es ','_self')", 3000);">]
Revista Colombiana de Gastroenterologia - I. Epidemiolog?a Services on Demand Article Article in pdf format Article in xml format Article references How to cite this article Automatic translation Send this article by e-mail Indicators Related links Bookmark ...

What am I doing wrong? The log is included in the output above.
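
A note on the anchor printed in the output: its onclick handler does not navigate right away. It schedules window.open(..., '_self') through setTimeout with a delay of 3000 ms, so a wait of 1000 ms returns before the navigation fires, and the Page returned by click() is still the article page. As a rough alternative that nobody in this thread proposed, the target URL could be read straight out of the onclick attribute and then fetched directly (for example with webClient.getPage). The sketch below uses only java.util.regex. The class name OnclickUrlExtractor and the method extractWindowOpenUrl are hypothetical names made up for illustration, and the pattern assumes the exact window.open('...') shape shown in the output above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit.
public class OnclickUrlExtractor {

    // Matches window.open('<url>' and captures the text between the single quotes.
    // Assumption: the URL is the first argument and is single-quoted, as in the thread's output.
    private static final Pattern WINDOW_OPEN =
            Pattern.compile("window\\.open\\(\\s*'([^']*)'");

    /** Returns the first window.open target found in the onclick text, trimmed, if any. */
    public static Optional<String> extractWindowOpenUrl(String onclick) {
        if (onclick == null) {
            return Optional.empty();
        }
        Matcher m = WINDOW_OPEN.matcher(onclick);
        // trim() drops the stray space that precedes the closing quote in the original attribute.
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // The onclick value as printed for the anchor in the output above.
        String onclick = "setTimeout(\"window.open('http://www.scielo.org.co/scielo.php"
                + "?script=sci_pdf&pid=S0120-99572012000600002&lng=en&nrm=iso&tlng=es ','_self')\", 3000);";
        System.out.println(extractWindowOpenUrl(onclick).orElse("no window.open target found"));
    }
}
```

With HtmlUnit, the input string would come from anchor.getAttribute("onclick"). This avoids depending on timer delays, but it is brittle: it breaks as soon as the site changes the shape of the handler, so waiting for the background JavaScript and reading the top-level window's enclosed page, as suggested in the reply, stays the more general approach.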