From: John S. <ko...@ea...> - 2016-07-30 07:44:40
Hi, I have been using HtmlUnit for about a year to download data and documents from websites. One of those sites recently moved its interface to HTTPS and a more "mobile" layout, and since then I have not been able to get the search result data.

I connected Fiddler to help. I can see the initial page open, set the name in the search field, and see the page update. Then, when the Java code clicks the search button, I don't get the results in HtmlUnit: if I print the page to the terminal, I get a page that says "loading", which I don't see in Fiddler. But in Fiddler I can see the search result page being returned (twice). As a test, I copied that response text from Fiddler, saved it to a file.html, and opened it; it matches what I see in a browser.

I think I have Googled, read, and tried everything except a working answer. I don't know whether this is a complex HtmlUnit/JavaScript/HTTPS problem or a simple beginner HtmlUnit programming issue. It's frustrating to have the data right there in Fiddler and not be able to get it into Java. I have tried a number of webClient.getOptions() settings and waitForBackgroundJavaScript calls in different combinations.

I would be thankful for any pointers. I thought I would check first for beginner mistakes and then post the more voluminous errors and Fiddler results afterward if needed, because it looks like I get the same errors on the way to the page with the search name entered as I do when trying to get to the search results. Sorry if the below is a bit verbose; I tried to cut it down to a minimal complete test case.
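Since fixed calls to waitForBackgroundJavaScript did not help, one alternative that may be worth trying (an assumption, not something confirmed for this site) is to poll: re-check the page every half second or so until the "loading" placeholder is gone, up to a time limit. This is similar to the loop the HtmlUnit FAQ suggests for AJAX pages. Below is a minimal pure-Java sketch of such a loop; the names PollUntil and waitFor are hypothetical helpers, not part of HtmlUnit.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: re-checks a condition until it holds or a time limit passes. */
final class PollUntil {

    private PollUntil() {
    }

    /**
     * Returns true as soon as the condition is true, or false if it is still
     * false once timeoutMillis has elapsed. The condition is re-checked every
     * intervalMillis; if the thread is interrupted, the interrupt flag is
     * restored and false is returned.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        // System.nanoTime() is monotonic, so wall-clock changes do not affect the deadline
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: condition never became true in time
            }
            try {
                // Sleeping the calling thread leaves time for work on other threads
                // (for example, background JavaScript) to make progress
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With HtmlUnit, the condition would check the page produced by the click in the code further down, for example `HtmlPage results = anchor.click();` followed by `PollUntil.waitFor(() -> !results.asText().contains("loading"), 30000, 500)` (`results` is a hypothetical variable name). Note that `click()` returns the resulting page, which may be a different object from `page1`; `webClient.getCurrentWindow().getEnclosedPage()` returns whatever page the window is currently showing.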
Thanks,
John

About me: a 1970s/80s microprocessor assembler programmer, a bit over my head dabbling in Java/HtmlUnit.

Environment:
htmlunit-2.22-bin.zip
Product Version: NetBeans IDE 8.0.2 (Build 201411181905)
Updates: NetBeans IDE is updated to version NetBeans 8.0.2 Patch 2
Java: 1.8.0_60; Java HotSpot(TM) 64-Bit Server VM 25.60-b23
Runtime: Java(TM) SE Runtime Environment 1.8.0_60-b27
System: Windows 7 version 6.1 running on amd64; Cp1252; en_US (nb)
Telerik Fiddler Web Debugger (v4.6.2.32002)

I have a number of errors that pop up. Most are CSS errors that show up when the log level is "ALL" and are not "SEVERE". The SEVERE errors all involve an "illegal selector" and look like this example:

SEVERE: runtimeError: message=[An invalid or illegal selector was specified (selector: '[id='sizzle-1469854857076'] :mobile-panel' error: Invalid selector: [id="sizzle-1469854857076"] :mobile-panel).] sourceName=[https://crrecords.slocounty.ca.gov/SLOWeb/resources/jquery-1.11.0.js;jsessionid=F228FB048D9BE7AAFED5E8B5B8725160] line=[865] lineSource=[null] lineOffset=[0]

Actual code from NetBeans:

package slodeeddatascraper;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import java.io.File;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SLODeedDataScraper {

    public static void main(String[] args) {
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.SEVERE); // levels tried: ALL, SEVERE, OFF
        // 127.0.0.1:8888 routes the traffic through the Fiddler proxy
        try (final WebClient webClient = new WebClient(BrowserVersion.CHROME, "127.0.0.1", 8888)) {
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setTimeout(40000); // increased from 2000 to 40000
            webClient.getOptions().setUseInsecureSSL(true);
            webClient.getCookieManager().setCookiesEnabled(true);
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getOptions().setPrintContentOnFailingStatusCode(false);
            webClient.setCssErrorHandler(new SilentCssErrorHandler());
            webClient.setAjaxController(new NicelyResynchronizingAjaxController());
            try {
                HtmlPage page1 = webClient.getPage("https://crrecords.slocounty.ca.gov/SLOWeb/search/DOCSEARCH215S1");
                webClient.waitForBackgroundJavaScriptStartingBefore(10000);
                webClient.waitForBackgroundJavaScript(100000);
                HtmlForm form = page1.getForms().get(0);

                // Change the value of the text field to the name to search for
                final HtmlTextInput textField = form.getInputByName("field_BothNamesID");
                textField.setValueAttribute("Snyder");
                webClient.waitForBackgroundJavaScriptStartingBefore(10000);
                webClient.waitForBackgroundJavaScript(100000);

                // Save the page as a file in the dir above NetBeans
                page1.save(new File("..//../myfile2.html"));
                System.out.println("Test 20.00 tests good name placed in field ");

                // Web page Search button markup:
                // <a class="self-service-right ui-link ui-btn ui-btn-b ui-icon-search ui-btn-icon-right ui-btn-inline ui-shadow ui-corner-all"
                //    id="searchButton"
                //    data-role="button" data-inline="true" data-theme="b" data-icon="search" data-iconpos="right"
                //    href="/SLOWeb/searchResults/DOCSEARCH215S1"
                //    role="button"> Search </a>
                System.out.println("Test 40.02 HtmlAnchor found:" + form.getFirstByXPath(".//a[@id='searchButton']"));
                HtmlAnchor anchor = form.getFirstByXPath(".//a[@id='searchButton']");
                anchor.click();
                webClient.waitForBackgroundJavaScriptStartingBefore(10000);
                webClient.waitForBackgroundJavaScript(100000);
                System.out.println("Test 60.09 Done waitForBackgroundJavaScript...");

                // Check for results of the search
                System.out.println(page1.asXml());
                // Save the page as a file in the dir above NetBeans
                page1.save(new File("..//../myfile4.html"));
            } catch (IOException ex) {
                Logger.getLogger(SLODeedDataScraper.class.getName()).log(Level.SEVERE, null, ex);
            } catch (FailingHttpStatusCodeException ex) {
                Logger.getLogger(SLODeedDataScraper.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }
}
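The code above quiets HtmlUnit's logging with a single Logger.getLogger("com.gargoylesoftware").setLevel(...) call, which assumes HtmlUnit's output is going through java.util.logging. One detail that could make the errors reappear later in a run: the Logger.getLogger Javadoc notes that the LogManager may keep only a weak reference to a logger, so if nothing else holds it, the logger and its configured level can be garbage-collected. A minimal sketch that keeps a strong reference in a static field follows; HtmlUnitLogConfig is a hypothetical class name.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical holder that keeps HtmlUnit's parent logger (and its level) alive. */
final class HtmlUnitLogConfig {

    // A static field is a strong reference, so this Logger cannot be collected
    // while the class is loaded; Logger.getLogger() alone may be held only weakly.
    private static final Logger HTMLUNIT_LOGGER = Logger.getLogger("com.gargoylesoftware");

    private HtmlUnitLogConfig() {
    }

    /** Sets the level; child loggers with no level of their own inherit it. */
    static void setLevel(Level level) {
        HTMLUNIT_LOGGER.setLevel(level);
    }

    /** Returns the level configured on the com.gargoylesoftware logger. */
    static Level currentLevel() {
        return HTMLUNIT_LOGGER.getLevel();
    }
}
```

Calling HtmlUnitLogConfig.setLevel(Level.SEVERE) at the top of main would replace the original one-liner. Note that Level.SEVERE still lets the SEVERE selector messages through; Level.OFF would suppress those as well.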