From: navneet s. <nav...@gm...> - 2014-06-23 08:04:41
Hi,

I have a requirement of crawling a few sites and extracting product information such as name, price, and image. Using Java and the Apache HTTP library, I can access static websites where the content is delivered in one call from the server. But I am not getting any useful response when using the same approach for sites that rely on JavaScript/AJAX and framesets, for example www.kmart.com.

Can you please tell me how I can use your API to get the final content of a page, i.e. the content that is actually shown by the browser? From the documentation you have published I believe it is possible, but I am not getting the correct results. Here is the code I am using:

    import java.io.IOException;
    import java.util.List;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlDivision;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    final WebClient webClient = new WebClient(BrowserVersion.CHROME);
    try {
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setCssEnabled(false);
        webClient.getOptions().setRedirectEnabled(true);
        // re-synchronize AJAX calls so their responses end up in the page
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());

        final HtmlPage page = webClient.getPage("http://www.kmart.com/");
        // wait (up to 100 s) for background JavaScript started by the page
        webClient.waitForBackgroundJavaScript(100000);

        System.out.println("is page ready::" + page.getReadyState());
        System.out.println(page.asText());

        // get the list of all divs
        final List<?> divs = page.getByXPath("//div");
        // get the div whose id is 'gnf_header'
        final HtmlDivision div =
            (HtmlDivision) page.getByXPath("//div[@id='gnf_header']").get(0);

        webClient.closeAllWindows();
    } catch (IOException | FailingHttpStatusCodeException ex) {
        Logger.getLogger(HtmlUnitForKmart.class.getName()).log(Level.SEVERE, null, ex);
    }

Thanks for your response.
Navneet Sharma