From: Felipe S. <fel...@gm...> - 2018-06-06 16:12:36
Hi,

I'm new to HtmlUnit and am trying to parse the content of a div that is loaded by JavaScript.

My code:

    WebClient client = new WebClient(BrowserVersion.CHROME);
    HtmlPage pagina = client.getPage("https://www.rico.com.vc/renda-fixa/cdb");
    client.getOptions().setThrowExceptionOnScriptError(false);
    client.getOptions().setJavaScriptEnabled(true);
    client.setAjaxController(new NicelyResynchronizingAjaxController());
    client.waitForBackgroundJavaScript(60000);
    System.out.println(pagina.asText());

But I'm getting these errors:

INFORMAÇÕES: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Rico.com.vc</title>
<link href='https://fonts.googleapis.com/css?family=Montserrat:400,700|Open+Sans:400,700&subset=latin,latin-ext' rel='stylesheet' type='text/css'>
<style type="text/css">
body { margin:0; padding:0; font-family:'Trebuchet Ms', Arial, Helvetica; font-size:12px; }
.text-center { text-align: center; }
.uppercase { text-transform: uppercase; }
.font-montserrat { font-family: 'Montserrat'; }
.font-open { font-family: 'Open Sans'; }
.font-size-1 { font-size: 16px; }
.font-size-2 { font-size: 28px; }
.padding-bottom-1 { padding-bottom: 10px; }
.padding-bottom-2 { padding-bottom: 30px; }
.msg-error { margin-top: -145px; padding-left: 235px; font-size: 60px; color: #FFF; text-shadow: -1px -1px 0 #F18719, 1px -1px 0 #F18719, -1px 1px 0 #F18719, 1px 1px 0 #F18719; }
.margin-top-1 { margin-top: 145px; }
.font-grey { color: grey; }
.font-bold { font-weight: bold; }
.img_logo { width: 50%; }
.img { width: 110%; padding: 20px 0 30px 0; }
.content { width: 500px; text-align: center; margin: 20px auto; }
.button-wrapper { background-color: #EF8A32; padding: 15px 15px; border: 1px solid #EF8A32; border-radius: 4px; }
.home-link { color: white; text-decoration: none; font-weight: bold; }
</style>
</head>
<body onload="document_onload()">
<script type="text/javascript">
function document_onload() { lblErro.innerHTML = ""; }
</script>
<div class="content">
<a href="/"><img src="//www.rico.com.vc/rico-base/Rico_logo.jpg" class="img_logo" border="0" alt="Logo Rico" /></a>
<img src="//www.rico.com.vc/dashboard/img/404.png" class="img" alt="Computador com mensagem de erro" />
<div>
<p class="msg-error">404</p>
</div>
</div>
<div class="text-center">
<p class="uppercase font-montserrat font-size-2 font-grey font-bold padding-bottom-1 margin-top-1">A página solicitada não foi encontrada.</p>
<p class="font-open font-size-1 font-grey padding-bottom-2">Caso o problema persista, favor entrar em contato com<br /> nossa central de <a href="https://www.rico.com.vc/servicos/atendimento/contato" class="font-grey font-bold">atendimento.</a></p>
<button class="button-wrapper" type="button">
<a href="/" class="home-link uppercase" title="Voltar para a Home">Ir para a Página Inicial ></a>
</button>
</div>
</body>
</html>

jun 06, 2018 12:45:12 PM com.gargoylesoftware.htmlunit.javascript.DefaultJavaScriptErrorListener loadScriptError
GRAVE: Error loading JavaScript from [https://www.rico.com.vc:443/WebResource.axd?d=p-e1U0PJjdGCHIHBWiD1_mnNyd8XXQ5baJIt17nqS_Wf552pOyyqkjGu6pxXAZ0QL3vedCpP0awH9-IXEKTmPIHCFcY_2PgSBqh3-Kt13gLbD5Wx8QQ_xVePbKJbgc7Nt5QmnNlvk1_kvJEpqvYH5nDIR3o1&t=636177466400000000].
com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 404 Not Found for https://www.rico.com.vc:443/WebResource.axd?d=p-e1U0PJjdGCHIHBWiD1_mnNyd8XXQ5baJIt17nqS_Wf552pOyyqkjGu6pxXAZ0QL3vedCpP0awH9-IXEKTmPIHCFcY_2PgSBqh3-Kt13gLbD5Wx8QQ_xVePbKJbgc7Nt5QmnNlvk1_kvJEpqvYH5nDIR3o1&t=636177466400000000
    at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:590)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadJavaScriptFromUrl(HtmlPage.java:1034)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:975)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:371)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:246)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:267)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:805)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:761)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1236)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1136)
    at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:226)
    at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:345)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3189)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2141)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:945)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:521)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:472)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1004)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:253)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:195)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:267)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:158)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:529)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:398)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:315)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:463)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:448)
    at br.com.controller.RoboRico.capturarConteudo(RoboRico.java:26)
    at org.Robo.Main.main(Main.java:14)
Exception in thread "main" com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 404 Not Found for https://www.rico.com.vc:443/WebResource.axd?d=p-e1U0PJjdGCHIHBWiD1_mnNyd8XXQ5baJIt17nqS_Wf552pOyyqkjGu6pxXAZ0QL3vedCpP0awH9-IXEKTmPIHCFcY_2PgSBqh3-Kt13gLbD5Wx8QQ_xVePbKJbgc7Nt5QmnNlvk1_kvJEpqvYH5nDIR3o1&t=636177466400000000
    at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:590)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadJavaScriptFromUrl(HtmlPage.java:1034)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:975)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:371)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:246)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:267)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:805)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:761)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1236)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1136)
    at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:226)
    at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:345)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3189)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2141)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:945)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:521)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:472)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1004)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:253)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:195)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:267)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:158)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:529)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:398)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:315)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:463)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:448)
    at br.com.controller.RoboRico.capturarConteudo(RoboRico.java:26)
    at org.Robo.Main.main(Main.java:14)

Thanks.