Need help with nextXPath

Help
Anonymous
2008-03-21
2012-09-04
  • Anonymous

    Anonymous - 2008-03-21

    I can't get the scraper to follow the link to the next page.

    Here is the site I am scraping to test.
    http://www.blackborder.com/cgi-bin/prices/ex_sublist.cgi?sid=DwsHMHwf99&sub_id=184&page=0

    Here is the HTML that contains the href:
    <table border=0 cellpadding=10 cellspacing=0 width="94%">
    <tr><td><table border=0 cellpadding=4 cellspacing=0 width="94%"><tr><td>&nbsp;</td><td align="right">&nbsp;<a href="ex_sublist.cgi?sid=yffMCpZAtn&sub_id=184&page=1">Page 2&gt;&gt;</a></td></tr></table></td></tr>
    </table>

    Here is my nextXPath parameter value: <call-param name="nextXPath">//a[contains(., 'Page')]/@href</call-param>

    I get the data from the first page, but nothing after that.
    Could the unescaped ampersands in the href be causing the trouble?

    Any help would be appreciated!
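To see what the expression should pick out of that snippet, here is a rough Python sketch that mimics //a[contains(., 'Page')]/@href using the standard library's html.parser. It is not Web-Harvest and does not evaluate real XPath; PageLinkCollector is a made-up name for illustration. With this lenient parser, at least, the raw "&" characters come back intact in the href.

```python
from html.parser import HTMLParser
from typing import List, Optional


class PageLinkCollector(HTMLParser):
    """Collect the href of each <a> whose text contains a keyword.

    A rough stand-in for //a[contains(., 'Page')]/@href; it does not
    evaluate real XPath.
    """

    def __init__(self, keyword: str) -> None:
        # convert_charrefs=True (the default) turns "&gt;" into ">" in text.
        super().__init__()
        self.keyword = keyword
        self.hrefs: List[str] = []
        self._href: Optional[str] = None  # href of the <a> currently open
        self._text: List[str] = []        # text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.keyword in "".join(self._text):
                self.hrefs.append(self._href)
            self._href = None


# The inner table from the post, unchanged (raw "&" in the href).
SNIPPET = (
    '<table border=0 cellpadding=4 cellspacing=0 width="94%"><tr>'
    '<td>&nbsp;</td><td align="right">&nbsp;'
    '<a href="ex_sublist.cgi?sid=yffMCpZAtn&sub_id=184&page=1">Page 2&gt;&gt;</a>'
    "</td></tr></table>"
)

collector = PageLinkCollector("Page")
collector.feed(SNIPPET)
collector.close()
print(collector.hrefs)
```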

    • Anonymous

      Anonymous - 2008-03-21

      Found the solution: I was pulling back duplicate hrefs.

      Wrapping the expression in distinct-values() fixed it:

      <call-param name="nextXPath">distinct-values(//a[contains(., 'Page ')]/@href)</call-param>
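distinct-values() collapses identical values into one. Below is a minimal Python sketch of the same idea, assuming (not confirmed in the thread) that the page carries the same "Page 2>>" link twice and that several selected values end up joined with spaces, which would fit the doubled href in the log below. distinct_values is a made-up helper, not a Web-Harvest function, and unlike this sketch the XPath function does not promise any particular output order.

```python
def distinct_values(items):
    """Drop repeats, keeping the first occurrence of each value.

    Similar in spirit to XPath 2.0 distinct-values(); the XPath
    function itself leaves the output order implementation-dependent.
    """
    return list(dict.fromkeys(items))


# Hypothetical input: the same "next page" link appears twice on the page,
# so the un-deduplicated expression selects two identical hrefs.
selected = [
    "ex_sublist.cgi?sid=UlGTdZEwAQ&sub_id=184&page=1",
    "ex_sublist.cgi?sid=UlGTdZEwAQ&sub_id=184&page=1",
]

print(" ".join(selected))                   # two hrefs run together
print(" ".join(distinct_values(selected)))  # one usable href
```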

    • Anonymous

      Anonymous - 2008-03-21

      It looks like the problem is that the href is a relative path. Here is the log from when it downloads the next page. I don't know why the URL is repeated. Does the XQuery expression need to return only one result? If so, how do I return just the first one?

      INFO - HtmlToXmlProcessor starts processing...
      INFO - HttpProcessor starts processing...
      INFO - Downloaded: http://www.blackborder.com/cgi-bin/prices/ex_sublist.cgi?sid=UlGTdZEwAQ&sub_id=184&page=1
      ex_sublist.cgi?sid=UlGTdZEwAQ&sub_id=184&page=1, mime type = text/html, length = 533B.
      INFO - HttpProcessor processor executed in 125ms.
      INFO - HtmlToXmlProcessor processor executed in 125ms.
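On the "only the first one" question: in plain XPath, (//a[contains(., 'Page')]/@href)[1] selects just the first matching href in document order (the parentheses matter; untested against Web-Harvest here). The relative path itself looks harmless, since the log suggests it gets resolved against the current page. The Python sketch below, using the standard library's urljoin and assuming the same two identical hrefs as above, shows both cases: a single href resolves cleanly, while two hrefs joined by a space resolve to a string shaped like the repeated URL in the log.

```python
from urllib.parse import urljoin

# Page that was being scraped (the URL from the first post).
current_page = (
    "http://www.blackborder.com/cgi-bin/prices/ex_sublist.cgi"
    "?sid=DwsHMHwf99&sub_id=184&page=0"
)

# Hypothetical result of the un-deduplicated nextXPath expression.
selected = [
    "ex_sublist.cgi?sid=UlGTdZEwAQ&sub_id=184&page=1",
    "ex_sublist.cgi?sid=UlGTdZEwAQ&sub_id=184&page=1",
]

# Keep only the first match, like (expr)[1] in XPath, then resolve the
# relative href against the page it was found on.
first = selected[0] if selected else None
next_url = urljoin(current_page, first) if first else None
print(next_url)

# Joining both matches with a space before resolving gives a malformed URL:
# the second href ends up inside the query string after a space.
print(urljoin(current_page, " ".join(selected)))
```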

