Problem using download-multipage-list

2006-09-27
2012-09-04
  • Rodrigo Rech

    Rodrigo Rech - 2006-09-27

    Hello, first let me say that I'm really enjoying Web Harvest. Good job!

    But I'm having a problem with the download-multipage-list function (using the google_images.xml example). If I define more than one word as the search variable, such as:
    <var-def name="search">iron maiden</var-def>
    the following URLs are "harvested":
    1) Downloaded: http://images.google.com/images?q=iron maiden&hl=en&btnG=Search+Images&nojs=1
    2) Downloaded: http://images.google.com/images?q=iron+maiden&nojs=1&svnum=10&hl=en&lr=&start=20&sa=N
    3) Downloaded: http://images.google.com/images?q=iron%2Bmaiden&nojs=1&svnum=10&hl=en&lr=&start=40&sa=N
    4) Downloaded: http://images.google.com/
    And then always the same URL as number 4.

    Notice that the space character is first encoded as + and then as %2B. Is this behavior correct? The function only returns the images from URLs 1 and 2.
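
The + to %2B progression in URLs 2 and 3 is what you would see if an already encoded query value were form-encoded a second time. Below is a minimal sketch of that effect in Python, assuming double encoding is the cause; it is not Web Harvest's actual code, and the helper names `encode_query_value` and `reencode_safely` are hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_query_value(value: str) -> str:
    """Form-encode a raw query value (a space becomes '+')."""
    return quote_plus(value)


def reencode_safely(encoded: str) -> str:
    """Decode an already encoded value before encoding it again.

    Hypothetical helper: it assumes the input really is form-encoded.
    """
    return quote_plus(unquote_plus(encoded))


raw = "iron maiden"
once = encode_query_value(raw)     # space is encoded as '+'
twice = encode_query_value(once)   # the '+' itself is encoded as '%2B'
safe = reencode_safely(once)       # decoding first avoids the second layer

print(once)
print(twice)
print(safe)
```

Decoding before re-encoding keeps the value stable across pages, but only if every link taken from the page is known to be encoded already.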

    Thanks for any help!
    Rodrigo

     
    • zoombongo

      zoombongo - 2006-10-12

      I'm having problems with the google_images example. I'm using webharvest0261.jar.

      The config file is not harvesting multiple pages. The first page is harvested fine:

      Downloaded: http://images.google.com/images?q=platon&hl=en&btnG=Search+Images&nojs=1, mime type = text/html, length = 25915B.

      but the next 4 pages are harvested as:

      Downloaded: http://images.google.com/, mime type = text/html, length = 4334B

      Any help would be appreciated.

       
      • Vladimir Nikic

        Vladimir Nikic - 2006-10-12

        I tried it and found that it works in some cases and not in others. The problem is in the new version of the TagSoup library, a dependency I added in version 0.26. I have also found some other bugs related to TagSoup. I'll consider using another library for HTML cleanup.

         
    • zoombongo

      zoombongo - 2006-10-12

      Thank you for your consideration.

       
    • Vladimir Nikic

      Vladimir Nikic - 2006-09-27

      Yes, you are right. There is a bug in the URL encoding. I'll fix it as soon as possible.
      Thanks for your report.

       
    • Vladimir Nikic

      Vladimir Nikic - 2006-09-28

      The bug is now fixed in version 0.26.

       
    • Rodrigo Rech

      Rodrigo Rech - 2006-10-09

      Sorry for the delay, I was on vacation.

      Thanks a lot, now it's working fine :-)

       
