From: Ahmed A. <asa...@ya...> - 2016-03-04 09:42:35
|
Hi,

As hinted earlier, you need to add "//" before span.

The code below prints something:

    public static void main(String[] args) throws Exception {
        try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {

            String url = "http://localhost:8080/snippet.html";
            HtmlPage page = webClient.getPage(url);
            List<DomNode> nodeProduct = (List<DomNode>) page.getByXPath("//*[@data-selenium='itemDetail']");

            if (nodeProduct.size() > 0) {
                for (DomNode e : nodeProduct) {
                    List<DomNode> b = (List<DomNode>) e.getByXPath("//span[@itemprop='brand']");
                    System.out.println(b);
                }
            }
        }
    }

From: Stephen Paulsen <st...@lo...>
To: Ahmed Ashour <asa...@ya...>; htm...@li...
Sent: Thursday, March 3, 2016 7:45 PM
Subject: Re: [Htmlunit-user] Nested getByXPath Has Me All Confused

Hi, Ahmed.

Attached is a ZIP file which includes 3 text files:

    vendor.html
    snippet.html
    analyzeResults.txt

I've obscured the obvious information about the vendor.

You can see in the full vendor.html that there is a lot going on. I have been able to separate out the 24 snippets that I need with the data-selenium='itemDetail'. However, even though the documentation, and your note, indicate the //div... should work, it does not. I've not yet tried the "contains" construction of the parameter, but I do not think that would explain why the search path doesn't work as is.

When I apply the itemprop='brand' to the snippet, I get zero results. When I apply the //span to the snippet alone, I get *all* 24 brand listings from the complete page, even though I am asking only about the specific element in e.

The point is to scrape the brand, name, and price from all 24 results returned by the search.

The analyzeResults.txt is the Java I have been using. You can see some of the variations I have used in constructing the search for the brand. Until that works, I have given up on the searches for the related product name and price.

Your thoughts?

Thanks!

~ Steve

-----
Stephen M. Paulsen
Lowing Light & Grip
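The observation above (a "//span" query on one element returning brands from the whole page) matches standard XPath 1.0 behaviour: an expression that starts with "//" is evaluated from the root of the document that contains the context node, while ".//" searches only below the context node. The sketch below illustrates this with the JDK's built-in javax.xml.xpath API instead of HtmlUnit, so that it runs without extra libraries. The two-product page and the class and method names are hypothetical stand-ins, not taken from the vendor site.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathScopeDemo {
    // Hypothetical two-product page standing in for the vendor listing.
    static final String PAGE =
        "<html><body>"
        + "<div data-selenium='itemDetail'><span itemprop='brand'>Kino Flo</span></div>"
        + "<div data-selenium='itemDetail'><span itemprop='brand'>Canon</span></div>"
        + "</body></html>";

    /** Evaluates expr with the FIRST itemDetail div as context node and returns the match count. */
    static int countFromFirstItem(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node first = (Node) xpath.evaluate(
            "//*[@data-selenium='itemDetail']", doc, XPathConstants.NODE);
        NodeList hits = (NodeList) xpath.evaluate(expr, first, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // "//" restarts at the document root, so both products match.
        System.out.println(countFromFirstItem("//span[@itemprop='brand']"));   // 2
        // ".//" stays below the context node, so only its own brand matches.
        System.out.println(countFromFirstItem(".//span[@itemprop='brand']"));  // 1
    }
}
```

The same expression strings can be passed to getByXPath on a DomNode; whether HtmlUnit 2.20 behaves identically on the live page is not something this sketch verifies.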
From: Stephen P. <st...@lo...> - 2016-03-04 18:18:16
|
Hi, Ahmed.

That's all well and good, but when you run against the full HTML of the whole page, or against the live site, this is what I get as output:

    /*
     *
     *
     *
     */
    package hutesting;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.DomNode;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import java.util.List;

    /**
     *
     * @author spaulsen
     */
    public class HUTesting {

        /**
         * @param args the command line arguments
         * @throws java.lang.Exception
         */
        public static void main(String[] args) throws Exception {
            try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {

                // String url = "http://localhost:8888/vendor.html";
                String url = "http://www.bhphotovideo.com/c/search?Ntt=kas-";
                // Yes, this is a live site. Be nice.
                HtmlPage page = webClient.getPage(url);
                List<DomNode> nodeProduct = (List<DomNode>) page.getByXPath("//*[@data-selenium='itemDetail']");

                if (nodeProduct.size() > 0) {
                    for (DomNode e : nodeProduct) {
                        List<DomNode> b = (List<DomNode>) e.getByXPath("//span[@itemprop='brand']");
                        System.out.println(b);
                    }
                }
            }
        }
    }

Output:

    run:
    Mar 04, 2016 1:06:55 PM com.gargoylesoftware.htmlunit.html.HtmlPage loadExternalJavaScriptFile ( Irrelevant and can be ignored )
    Mar 04, 2016 1:06:55 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
    Mar 04, 2016 1:06:56 PM com.gargoylesoftware.htmlunit.javascript.host.css.CSSStyleSheet pixelValue
    Mar 04, 2016 1:06:57 PM com.gargoylesoftware.htmlunit.javascript.host.css.CSSStyleSheet pixelValue
    Mar 04, 2016 1:06:57 PM com.gargoylesoftware.htmlunit.javascript.host.css.CSSStyleSheet pixelValue
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    []
    BUILD SUCCESSFUL (total time: 6 seconds)

-----
Stephen M. Paulsen
Lowing Light & Grip
From: Ahmed A. <asa...@ya...> - 2016-03-04 21:25:11
|
Hi Stephen,

'brand' is a descendant of 'itemHeading', not of 'itemDetail'.

The code below works with the latest version (with a workaround for the failing JavaScript; a bug report should be created for this).

    public static void main(String[] args) throws Exception {
        try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {

            String url = "http://www.bhphotovideo.com/c/search?Ntt=kas-";
            // Yes, this is a live site. Be nice.
            HtmlPage page = webClient.getPage(url);
            List<DomNode> nodeProduct = (List<DomNode>) page.getByXPath("//*[@data-selenium='itemHeading']");

            if (nodeProduct.size() > 0) {
                for (DomNode e : nodeProduct) {
                    System.out.println(e.asXml());
                    List<DomNode> b = (List<DomNode>) page.getByXPath("//span[@itemprop='brand']");
                    System.out.println(b);
                }
            }
        }
    }
From: Stephen P. <st...@lo...> - 2016-03-07 21:32:02
|
Thank you for all your help. I'm still having trouble.

Two things are on my mind:

Thing 1 - The reason I grabbed itemDetail is that I am looking for three pieces of information that are all enclosed in that tree:

    <div data-selenium="itemDetail">
      <div data-selenium="img-zone"> ... Stuff I don't care about ... </div>
      <div data-selenium="itemInfo-zone">
        <div data-selenium="itemHeader">
          <h3 data-selenium="itemHeading">
            <a href="blah blah blah">
              <span itemprop="brand"> I WANT THIS BRAND </span>
              <span itemprop="name"> I WANT THIS NAME </span>
            </a>
          </h3>
          <p> ... More stuff I don't care about ... </p>
          <p> ... Even More Stuff I Don't Care About ... </p>
        </div> <!-- itemHeader -->
        <div data-selenium="highlights"> ... Yet Even More Stuff I Don't Care About ... </div>
        <div data-selenium="itemSection"> ... Whatever ... </div>
      </div> <!-- itemInfo-zone -->
      <div data-selenium="conversion-zone">
        <div data-selenium="price-zone">
          <div data-selenium="prices"> ... Don't care ...</div>
          <div data-selenium="addToCartPrice">
            <p data-selenium="finalPrice">
              <span data-selenium="youpayPrice"> ... Yawn ... </span>
              <span data-selenium="price"> THIS IS THE PRICE I WANT </span>
            </p> <!-- finalPrice -->
          </div> <!-- addToCartPrice -->
        </div> <!-- price-zone -->
      </div> <!-- conversion-zone -->
    </div> <!-- itemDetail -->

NOTE: For purposes of brevity, I have removed the additional attributes of the elements. Since I am climbing this learning curve, tell me if that is interfering with my use of XPath. You can see from the code that I am asking XPath to search for only those attributes which I think are the way to the information I want. Tell me if I should be using a different syntax, or even a different algorithm for this.

Thing 2 - I applied the changes Ahmed suggested from the last reply.
While I can get all 24 sections of itemHeading on the page, when I use the syntax he specified to get my brand information, I get an empty list:

SOURCE CODE:

    public static void main(String[] args) throws Exception{
        try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
            System.setProperty("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

            // String url = "http://localhost:8888/vendor.html";
            String url = "http://www.bhphotovideo.com/c/search?Ntt=kas-";
            HtmlPage page = webClient.getPage(url);
            // List<DomNode> nodeProduct = (List<DomNode>) page.getByXPath("//*[@data-selenium='itemDetail']");
            List<DomNode> nodeProduct = (List<DomNode>) page.getByXPath("//*[@data-selenium='itemHeading']");

            if (nodeProduct.size() > 0) {
                for (DomNode e : nodeProduct) {
                    System.out.println("***** Contents of e");
                    System.out.println(e.asXml());
                    System.out.println("***** End Contents of e");
                    List<DomNode> b = (List<DomNode>) page.getByXPath("//span[@itemprop='brand']");
                    System.out.println("--- Contents of b");
                    System.out.println(b);
                    System.out.println("--- End Contents of b");
                }
            }
        }
    }

OUTPUT:

    run:
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/258616-REG/Kino_Flo_KAS_D2_C_KAS_D2C_Diva_Lite_200_Travel.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-D2-C Diva-Lite 200 Travel Case </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/1043631-REG/canon_9785b002_cn7x17_kas_s_cine_servo.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Canon </span>
        <span itemprop="name"> CN7x17 KAS S Cine-Servo 17-120mm T2.95 (PL Mount) </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/1043629-REG/canon_9785b001_cn7x17_kas_s_cine_servo.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Canon </span>
        <span itemprop="name"> CN7x17 KAS S Cine-Servo 17-120mm T2.95 (EF Mount) </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/258606-REG/Kino_Flo_KAS_CL6_Compact_Carry_Case_for.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-CL6 6-Lamp Carry Case </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/505449-REG/Kino_Flo_KAS_D42_KAS_D42_Diva_Lite_400_Flight.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-D42 Diva-Lite 400 Wheeled Flight Case - for Two each Kino-Flo Diva-Lite 400 Fixtures, Stands, Mounts, Flozier and Lamp Cases </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/884961-REG/kino_flo_kas_ce2_c_clamshell_travel_case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-CE2-C Clamshell Travel Case (Yellow) </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/429918-REG/Kino_Flo_KAS_24S_KAS_24S_Small_Telescoping_Shipping.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-24S Telescoping Shipping Case, Small - for up to three Kino-Flo 2.0' Fixtures </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/258673-REG/Kino_Flo_KAS_41_KAS_41_Shipping_Case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-41 Telescoping Shipping Case - for 1 Kino-Flo 4' Bank System </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/258605-REG/Kino_Flo_KAS_D4_C_KAS_D4_C_Diva_Lite_400_Travel.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-D4-C Diva-Lite 400 Travel Case - for Kino Flo Diva-Lite 400 Lighting Kit </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/656771-REG/Kino_Flo_KAS_GAF2_KAS_GAF2_Gaffer_Kit_Ship.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-GAF2 Gaffer Kit Ship Case </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/884959-REG/kino_flo_kas_ce2_flight_case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-CE2 Flight Case (Yellow) </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/672726-REG/Kino_Flo_KAS_VH2_KAS_VH2_Vista_Single_Louver.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-VH2 Vista Single Louver Carry Case </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/507360-REG/Kino_Flo_KAS_V31_Y_KAS_V31_Y_Yoke_Shipping_Case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-V31-Y Yoke Shipping Case - for VistaBeam 300 Fluorescent Fixture with Yoke Mount </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/507361-REG/Kino_Flo_KAS_V61_Y_KAS_V61_Y_Yoke_Shipping_Case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-V61-Y Yoke Shipping Case - for VistaBeam 600 Fluorescent Fixture with Yoke Mount </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/884962-REG/kino_flo_kas_ce2_y_yoke_ship_case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-CE2-Y Yoke Ship Case (Black) </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/507295-REG/Kino_Flo_KAS_D22_KAS_D22_Diva_Lite_200_Flight.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-D22 Diva-Lite 200 Flight Case - for Two each Kino-Flo Diva-Lite 200 Fixtures, Stands, Mounts, Flozier and Lamp Cases </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/840408-REG/Kino_Flo_KAS_B4_C_Clamshell_Travel_Case_for.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-B4-C Clamshell Travel Case for One BarFly 400D Kit (Black) </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/580907-REG/Kino_Flo_KAS_B41_KAS_B41_BarFly_400_Ship.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-B41 BarFly 400 Ship Case </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/858258-REG/Kino_Flo_KAS_D4_CS_KAS_D4_CS_Diva_Lite_401_Travel.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-D4-CS Diva-Lite 401 Travel Case </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/656769-REG/Kino_Flo_KAS_INT2_KAS_INT2_Interview_Ship_Case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-INT2 Interview Ship Case </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/656770-REG/Kino_Flo_KAS_INT3_KAS_INT3_Interview_Ship_Case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-INT3 Interview Ship Case </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/434699-REG/Kino_Flo_KAS_V31_KAS_V31_Center_Shipping_Case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-V31 Center Shipping Case - for VistaBeam 300 Fluorescent Fixture with Center Mount </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/434700-REG/Kino_Flo_KAS_V61_KAS_V61_Center_Shipping_Case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-V61 Center Shipping Case - for VistaBeam 600 Fluorescent Fixture with Center Mount </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    ***** Contents of e
    <h3 data-selenium="itemHeading" class="bold fourteen">
      <a href="http://www.bhphotovideo.com/c/product/434701-REG/Kino_Flo_KAS_V62_KAS_V62_Center_Shipping_Case.html" class="c5" data-selenium="itemHeadingLink" itemprop="url">
        <span itemprop="brand"> Kino Flo </span>
        <span itemprop="name"> KAS-V62 Center Shipping Case - for Two VistaBeam 600 Fluorescent Fixtures with Center Mount </span>
      </a>
    </h3>
    ***** End Contents of e
    --- Contents of b
    []
    --- End Contents of b
    BUILD SUCCESSFUL (total time: 6 seconds)

You can see why I am all confused.

I have confirmed in my settings that I am using version 2.20.

Thank you for all your help.

~ Steve

-----
Stephen M. Paulsen
Lowing Light & Grip
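The thread does not establish why "//span[@itemprop='brand']" came back empty while the spans are clearly visible in the dump. One way to narrow such a case down is to relax the expression step by step and see which part stops matching. The sketch below does this with the JDK's javax.xml.xpath API on a hypothetical XHTML fragment. The default namespace on the html element is purely an assumption chosen to show one situation where, in standard XPath 1.0, a plain element name fails while a wildcard succeeds; it is not a confirmed diagnosis of the vendor page or of HtmlUnit 2.20.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathNarrowingDemo {
    // Hypothetical fragment; the xmlns declaration is an assumption made for illustration.
    static final String PAGE =
        "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
        + "<h3 data-selenium='itemHeading'><span itemprop='brand'>Canon</span></h3>"
        + "</body></html>";

    /** Counts the nodes matched by expr in a namespace-aware parse of PAGE. */
    static int count(String expr) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed name test: matches only elements in no namespace.
        System.out.println(count("//span[@itemprop='brand']"));                      // 0
        // Wildcard: any element carrying the attribute.
        System.out.println(count("//*[@itemprop='brand']"));                         // 1
        // Wildcard plus local-name(): pins the element name without a namespace.
        System.out.println(count("//*[local-name()='span'][@itemprop='brand']"));    // 1
    }
}
```

If the wildcard form matches and the named form does not, the element name test (namespace or case) is the step to look at; if neither matches, the attribute or the parsed DOM is the more likely culprit.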
From: Stephen P. <st...@lo...> - 2016-03-08 19:19:37
|
AT LONG LAST...

This works:

    public static void main(String[] args) throws Exception{
        try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
            System.setProperty("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

            String url = "http://www.bhphotovideo.com/c/search?Ntt=kas-";
            HtmlPage page = webClient.getPage(url);
            List<DomNode> nodeProduct = (List<DomNode>) page.getByXPath("//*[@data-selenium='itemDetail']");

            if (nodeProduct.size() > 0) {
                for (DomNode e : nodeProduct) {
                    List<DomNode> b = (List<DomNode>) e.getByXPath(".//*[@itemprop='brand']");
                    List<DomNode> n = (List<DomNode>) e.getByXPath(".//*[@itemprop='name']");
                    List<DomNode> p = (List<DomNode>) e.getByXPath(".//*[@data-selenium='price']");
                    System.out.println(b.get(0).getTextContent() + " / " + n.get(0).getTextContent() + " / " + p.get(0).getTextContent().trim());
                }
            }
        }
    }

OUTPUT:

    run:
    Kino Flo / KAS-D2-C Diva-Lite 200 Travel Case / $218.63
    Canon / CN7x17 KAS S Cine-Servo 17-120mm T2.95 (PL Mount) / $29,850.00
    Canon / CN7x17 KAS S Cine-Servo 17-120mm T2.95 (EF Mount) / $29,850.00
    Kino Flo / KAS-CL6 6-Lamp Carry Case / $45.38
    Kino Flo / KAS-D42 Diva-Lite 400 Wheeled Flight Case - for Two each Kino-Flo Diva-Lite 400 Fixtures, Stands, Mounts, Flozier and Lamp Cases / $483.00
    Kino Flo / KAS-CE2 Flight Case (Yellow) / $371.25
    Kino Flo / KAS-CE2-C Clamshell Travel Case (Yellow) / $288.75
    Kino Flo / KAS-D4-C Diva-Lite 400 Travel Case - for Kino Flo Diva-Lite 400 Lighting Kit / $282.00
    Kino Flo / KAS-GAF2 Gaffer Kit Ship Case / $536.25
    Kino Flo / KAS-V31-Y Yoke Shipping Case - for VistaBeam 300 Fluorescent Fixture with Yoke Mount / $649.95
    Kino Flo / KAS-V61-Y Yoke Shipping Case - for VistaBeam 600 Fluorescent Fixture with Yoke Mount / $881.95
    Kino Flo / KAS-24S Telescoping Shipping Case, Small - for up to three Kino-Flo 2.0' Fixtures / $136.13
    Kino Flo / KAS-41 Telescoping Shipping Case - for 1 Kino-Flo 4' Bank System / $212.50
    Kino Flo / KAS-CE2-Y Yoke Ship Case (Black) / $433.13
    Kino Flo / KAS-D22 Diva-Lite 200 Flight Case - for Two each Kino-Flo Diva-Lite 200 Fixtures, Stands, Mounts, Flozier and Lamp Cases / $441.95
    Kino Flo / KAS-B4-C Clamshell Travel Case for One BarFly 400D Kit (Black) / $325.88
    Kino Flo / KAS-B41 BarFly 400 Ship Case / $488.75
    Kino Flo / KAS-D4-CS Diva-Lite 401 Travel Case / $325.88
    Kino Flo / KAS-INT2 Interview Ship Case / $474.38
    Kino Flo / KAS-INT3 Interview Ship Case / $489.88
    Kino Flo / KAS-V31 Center Shipping Case - for VistaBeam 300 Fluorescent Fixture with Center Mount / $629.95
    Kino Flo / KAS-V61 Center Shipping Case - for VistaBeam 600 Fluorescent Fixture with Center Mount / $827.50
    Kino Flo / KAS-V62 Center Shipping Case - for Two VistaBeam 600 Fluorescent Fixtures with Center Mount / $1,133.95
    Kino Flo / KAS-VH2 Vista Single Louver Carry Case / $45.95
    BUILD SUCCESSFUL (total time: 5 seconds)

Ahmed, I question why the implementation of getByXPath is demanding I use the wildcards for the searches instead of being able to specify the elements. But, for now, as long as it works, I'm going to run with it.

Thank you all for the input and suggestions.

~ Steve

-----
Stephen M. Paulsen
Lowing Light & Grip
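The working loop above calls get(0) on each result list, which would throw an IndexOutOfBoundsException for any product that lacks a brand, name, or price. A minimal sketch of a more defensive variant follows, again using the JDK's javax.xml.xpath API so it runs standalone. The helper name textOrDefault, the "(n/a)" fallback, and the second product without a price are hypothetical; only the relative ".//*[@...]" expressions come from the thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SafeExtractDemo {
    // First product mirrors one output row from the thread; the second is a made-up item with no price.
    static final String PAGE = "<html><body>"
        + "<div data-selenium='itemDetail'>"
        + "<span itemprop='brand'> Kino Flo </span>"
        + "<span itemprop='name'> KAS-CL6 6-Lamp Carry Case </span>"
        + "<span data-selenium='price'> $45.38 </span>"
        + "</div>"
        + "<div data-selenium='itemDetail'>"
        + "<span itemprop='brand'> Canon </span>"
        + "<span itemprop='name'> Hypothetical item without a price </span>"
        + "</div>"
        + "</body></html>";

    /** Returns the trimmed text of the first match below context, or fallback when nothing matches. */
    static String textOrDefault(XPath xpath, Node context, String expr, String fallback)
            throws Exception {
        Node hit = (Node) xpath.evaluate(expr, context, XPathConstants.NODE);
        return hit == null ? fallback : hit.getTextContent().trim();
    }

    /** Builds one "brand / name / price" row per itemDetail block. */
    static List<String> summarize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate(
            "//*[@data-selenium='itemDetail']", doc, XPathConstants.NODESET);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            rows.add(textOrDefault(xpath, item, ".//*[@itemprop='brand']", "(n/a)")
                + " / " + textOrDefault(xpath, item, ".//*[@itemprop='name']", "(n/a)")
                + " / " + textOrDefault(xpath, item, ".//*[@data-selenium='price']", "(n/a)"));
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String row : summarize()) {
            System.out.println(row);
        }
    }
}
```

In HtmlUnit itself, an isEmpty() check on each list before get(0) would serve the same purpose as the null check here.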
From: Ahmed A. <asa...@ya...> - 2016-03-09 10:15:59
|
Cool!

>> why the implementation of getByXPath is demanding I use the wildcards for the searches instead of being able to specify the elements.

It doesn't demand it, but it is easier. You can also use "div/span[last()-1]" or "/html/body/span/div/input", etc.

Keep in mind that you always need to know the exact HTML (generated by HtmlUnit or by a real browser, which may differ) to write relevant XPath expressions.

Ahmed
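To make the alternatives mentioned above concrete, here is a small sketch of explicit element names and positional predicates evaluated relative to one heading. It uses the JDK's javax.xml.xpath API on a hypothetical, cut-down copy of the itemHeading markup shown earlier in the thread; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathStepsDemo {
    // Cut-down heading modelled on the thread's output: h3 > a > two spans.
    static final String PAGE = "<html><body>"
        + "<h3 data-selenium='itemHeading'><a href='#'>"
        + "<span itemprop='brand'> Kino Flo </span>"
        + "<span itemprop='name'> KAS-CL6 6-Lamp Carry Case </span>"
        + "</a></h3>"
        + "</body></html>";

    /** Evaluates expr relative to the h3 heading and returns the trimmed text of the first match. */
    static String fromHeading(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node heading = (Node) xpath.evaluate(
            "//h3[@data-selenium='itemHeading']", doc, XPathConstants.NODE);
        return xpath.evaluate(expr, heading).trim();
    }

    public static void main(String[] args) throws Exception {
        // Explicit element name plus attribute, searched below the heading.
        System.out.println(fromHeading(".//span[@itemprop='brand']")); // Kino Flo
        // Child steps with a positional predicate: last span under the link.
        System.out.println(fromHeading("a/span[last()]"));             // KAS-CL6 6-Lamp Carry Case
        // Second-to-last span under the link.
        System.out.println(fromHeading("a/span[last()-1]"));           // Kino Flo
    }
}
```

Positional paths like these are shorter but depend on the exact element order, so they tend to break more easily than attribute-based ones when the page markup changes.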
From: Albu G. <alb...@gm...> - 2016-03-04 19:55:58
|
http://stackoverflow.com/questions/16754752/java-htmlunit-failing-to-load-javascript

On 04/03/2016 19:18, Stephen Paulsen wrote:
> com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
From: Stephen P. <st...@lo...> - 2016-03-04 20:05:22
|
I have set my production code to ignore all the errors. They are irrelevant to my purpose. I included a brief summary of the errors to show that yes, I am running against the live site and it has things in it that HtmlUnit does not like.

However, I am able to get a clean-enough copy of the site into my page object and parse it out for the pieces that I need, up to this point where I am having trouble.

Thank you. Merci!

~ SMP

-----
Stephen M. Paulsen
Lowing Light & Grip
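For readers who want to ignore the warnings in the same way, here is a minimal sketch of silencing a logger namespace with java.util.logging, which is the approach used in the code posted elsewhere in this thread. It assumes the messages reach java.util.logging at all (the timestamped format in the posted output suggests so, but that is an inference). The class and method names are hypothetical. The logger is kept in a static field because the logging framework may discard loggers that nothing references, along with their level.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietLoggingDemo {
    // Strong reference so the configured level is not lost if the logger is garbage-collected.
    private static final Logger HTMLUNIT_LOGGER = Logger.getLogger("com.gargoylesoftware");

    /** Turns off everything logged under com.gargoylesoftware and its child loggers. */
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    /** Reports whether a message at the given level would currently be logged by loggerName. */
    static boolean wouldLog(String loggerName, Level level) {
        return Logger.getLogger(loggerName).isLoggable(level);
    }

    public static void main(String[] args) {
        silence();
        // Child loggers inherit the OFF level from the parent namespace.
        System.out.println(wouldLog(
            "com.gargoylesoftware.htmlunit.DefaultCssErrorHandler", Level.SEVERE)); // false
    }
}
```

Calling silence() before the WebClient is created keeps the start-up messages out of the output as well.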