From: Ahmed A. <asa...@ya...> - 2016-03-04 09:42:35
Hi,

As hinted earlier, you need to add "//" before span. The below code prints something:

    public static void main(String[] args) throws Exception {
        try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            String url = "http://localhost:8080/snippet.html";
            HtmlPage page = webClient.getPage(url);
            List<DomNode> nodeProduct = (List<DomNode>) page.getByXPath("//*[@data-selenium='itemDetail']");
            if (nodeProduct.size() > 0) {
                for (DomNode e : nodeProduct) {
                    List<DomNode> b = (List<DomNode>) e.getByXPath("//span[@itemprop='brand']");
                    System.out.println(b);
                }
            }
        }
    }

From: Stephen Paulsen <st...@lo...>
To: Ahmed Ashour <asa...@ya...>; htm...@li...
Sent: Thursday, March 3, 2016 7:45 PM
Subject: Re: [Htmlunit-user] Nested getByXPath Has Me All Confused

Hi, Ahmed.

Attached is a ZIP file which includes 3 text files:

vendor.html
snippet.html
analyzeResults.txt

I've obscured the obvious information about the vendor. You can see in the full vendor.html that there is a lot going on. I have been able to separate out the 24 snippets that I need with data-selenium='itemDetail'. However, even though the documentation and your note indicate that the //div... path should work, it does not. I've not yet tried the "contains" construction of the parameter, but I do not think that would explain why the search path doesn't work as is.

When I apply the itemprop='brand' to the snippet, I get zero results. When I apply the //span to the snippet alone, I get *all* 24 brand listings from the complete page, even though I am asking only about the specific element in e.

The point is to scrape the brand, name, and price from all 24 results returned by the search. The analyzeResults.txt is the Java I have been using. You can see some of the variations I have used in constructing the search for the brand. Until that works, I have given up on the searches for the related product name and price.

Your thoughts?

Thanks!

~ Steve

-----
Stephen M. Paulsen
Lowing Light & Grip
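For reference, the behaviour Steve describes matches how XPath 1.0 resolves location paths.

- An expression starting with "//" is absolute. It is evaluated from the document root no matter which node it is called on.
- "span[...]" on its own looks only at direct children of the context node.
- ".//span[...]" looks only at descendants of the context node.

The sketch below illustrates this with the JDK's own javax.xml.xpath API instead of HtmlUnit, so it runs without a browser or server. The sample markup, the brand names, and the class and method names (NestedXPathDemo, brandsPerItem) are hypothetical stand-ins for the attached vendor page.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NestedXPathDemo {

    // Hypothetical stand-in for the vendor page: two result items, each holding
    // one brand span nested one level below the itemDetail div.
    static final String SAMPLE =
          "<html><body>"
        + "<div data-selenium='itemDetail'><h3><span itemprop='brand'>Acme</span></h3></div>"
        + "<div data-selenium='itemDetail'><h3><span itemprop='brand'>Globex</span></h3></div>"
        + "</body></html>";

    // For every itemDetail element, evaluates innerXPath with that element as the
    // context node and collects the text of each match (one list per item).
    static List<List<String>> brandsPerItem(String xml, String innerXPath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate(
                "//*[@data-selenium='itemDetail']", doc, XPathConstants.NODESET);
            List<List<String>> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // The item is passed as the context node for the inner expression.
                NodeList hits = (NodeList) xpath.evaluate(innerXPath, item, XPathConstants.NODESET);
                List<String> texts = new ArrayList<>();
                for (int j = 0; j < hits.getLength(); j++) {
                    texts.add(hits.item(j).getTextContent());
                }
                result.add(texts);
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException("XPath demo failed", ex);
        }
    }

    public static void main(String[] args) {
        // Absolute path: "//" restarts at the document root, so every item sees every brand.
        System.out.println(brandsPerItem(SAMPLE, "//span[@itemprop='brand']"));  // [[Acme, Globex], [Acme, Globex]]
        // Relative child step: the span is not a direct child of the div, so nothing matches.
        System.out.println(brandsPerItem(SAMPLE, "span[@itemprop='brand']"));    // [[], []]
        // Relative descendant path: ".//" searches only below the current item.
        System.out.println(brandsPerItem(SAMPLE, ".//span[@itemprop='brand']")); // [[Acme], [Globex]]
    }
}
```

With the absolute path, each item matches both brands. With the bare child step it matches nothing. With ".//" each item matches only its own brand.

HtmlUnit's DomNode.getByXPath should evaluate the expression with that node as the context node. Assuming it does, the corresponding change in Ahmed's loop would be e.getByXPath(".//span[@itemprop='brand']"). This has not been tried against the attached files.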