From: Stephen P. <st...@lo...> - 2016-03-03 04:03:14
Hello, HtmlUnit Users.

I am not even sure how to ask this question. Tell me if I am leaving out any important information. Here it goes.

I am using HtmlUnit to scrape information from various web sites in order to see what my competitors are charging for products we also sell. On one vendor's site, I am able to get the page I expect with the standard

    this.page = this.webClient.getPage(myURL);

Digging into the structure of the page, I am able to identify the 24 products that come back from my search simulated by the getPage

    List<DomNode> nodeProduct = (List<DomNode>) this.page.getByXPath("//*[@data-selenium='itemDetail']");

I can confirm that the nodeProduct list has 24 elements in it. I do not know why I have to use the wildcard in the XPath. It *should* be //div[@data-selenium='itemDetail'], but that always returns zero entries in the List.

For my next trick, I start to iterate through the list to examine each individual entry in the list:

    for (DomNode e : nodeProduct) { stuff }

By way of debugging, I include in "stuff" this line:

    System.out.println(e.asXml());

This shows me that I do, in fact, have one of the 24 possible products for which I searched. It compares to the HTML source from my browser correctly. For purposes of asking this question, this is the HTML in question:

    <span itemprop="brand">Kino Flo</span>

When I try to get HtmlUnit to tell me what the brand name is of the product in question, I have tried several options. I started with this because it seemed to make the most sense, and appears to be what all the documentation and examples indicate will work:

    List<DomNode> b = (List<DomNode>) e.getByXPath("span[@itemprop='brand']");

No matter what I have tried, the list of b DomNodes either comes back as length=zero, or false. Yes, the span element I am looking for is buried beneath several other divs. According to the documentation I have read about XPath, this search *should* find it even if buried. I may be misunderstanding the requirements.

I come to you on the mailing list after having tried DomNode, DomElement, HtmlElement, wildcards, fully qualified paths, and everything else I can think of or find in an example. What am I misunderstanding?

Thank you!

~ Steve

-----
Stephen M. Paulsen
Lowing Light & Grip
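
For reference, the XPath 1.0 rule behind the "buried span" expectation can be checked in isolation. The sketch below is not HtmlUnit code: it uses the JDK's own javax.xml.xpath engine, and everything except the brand span is an assumption made for illustration (the class name RelativeXPathDemo, the helper countMatches, and the invented wrapper divs). It shows that a relative step such as span[...] only looks at direct children of the context node, while .//span[...] searches all descendants. HtmlUnit's getByXPath is expected to follow the same XPath 1.0 rules, but that is not verified here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {

    // Hypothetical markup: only the brand span comes from the post;
    // the nesting of wrapper divs is invented for this sketch.
    static final String ITEM =
          "<div data-selenium='itemDetail'>"
        + "<div><div><span itemprop='brand'>Kino Flo</span></div></div>"
        + "</div>";

    // Evaluates the expression with the outer itemDetail div as the
    // context node and returns how many nodes it selects.
    static int countMatches(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(ITEM)));
            Node item = doc.getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                expression, item, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception ex) {
            // Wrapped so callers do not have to handle checked exceptions.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Child axis only: the span is three levels down, so nothing matches.
        System.out.println(countMatches("span[@itemprop='brand']"));
        // Descendant search relative to the context node finds the span.
        System.out.println(countMatches(".//span[@itemprop='brand']"));
    }
}
```

Under these assumptions the first expression selects nothing and the second selects the one nested span. Note that a leading // without the dot is evaluated from the document root rather than from the context node, so inside a loop over the 24 items it would match spans in every item, not just the current one.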