Hi,
How can I scrape a site where the URL does not change when we click the
next-page button?
For example:
http://www.rupapublications.co.in/client/Category/Biography.aspx
Here the URL doesn't change when we move to the next page. What can be done
for this kind of multi-page scraping?
Thanks in advance
Kalyan
Looks like AJAX.
Use Firebug (or a similar tool) to trace the HTTP "conversation" between client
and server so you can latch on to the appropriate URL.
Thank you very much...
I will try to find out how I can use Firebug to find the appropriate URL.
If anybody can give some pointers on how to do it, that would be very useful to me.
Thanks in advance
Kalyan
http://www.evotech.net/blog/2007/06/introduction-to-firebug/#fb_ajax
The XHR tab in Firebug's Net panel shows 0 requests when I navigate to the next page.
Does that mean no AJAX requests are being made?
Use the "All" filter to see all traffic.
there are two options:
Hi Kalyan,
The website that you mentioned is built with ASP.NET. When you click the
"Next" icon, it fires an onclick event that executes code on the server
side, and the modified content is then served from the same URL.
Web Harvest cannot perform these onclick-based events. You might want to
use the HtmlUnit API to perform the onclick-based action and get the HTML
content of the page. Once you have the content of the page, you can use
Web Harvest for parsing.
Hope this makes sense to you.
Thanks
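
For what it's worth, here is a rough sketch of what such a server-side click usually looks like on the wire, in case replaying it directly is an option. It assumes the page follows the common ASP.NET WebForms pattern, where the click submits a POST to the same URL carrying the page's hidden fields (such as __VIEWSTATE) plus __EVENTTARGET and __EVENTARGUMENT. I have not checked that this particular site works that way. The class name PostbackForm, the control name "ctl00$Next" and the sample HTML are made up for illustration; the real values would come from the traffic trace. The code only builds the form body and sends nothing.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Web Harvest or HtmlUnit.
class PostbackForm {
    // Assumption: hidden inputs are rendered with type first, then name,
    // then value. A real page may need a proper HTML parser instead.
    private static final Pattern HIDDEN = Pattern.compile(
        "<input\\s+type=\"hidden\"\\s+name=\"([^\"]+)\"[^>]*?value=\"([^\"]*)\"",
        Pattern.CASE_INSENSITIVE);

    // Collect the name/value pairs of all hidden inputs, in page order.
    static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = HIDDEN.matcher(html);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    // Build the URL-encoded body that a postback click would submit.
    static String postBody(String html, String eventTarget, String eventArgument) {
        Map<String, String> fields = hiddenFields(html);
        fields.put("__EVENTTARGET", eventTarget);
        fields.put("__EVENTARGUMENT", eventArgument);
        return fields.entrySet().stream()
            .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
            .collect(Collectors.joining("&"));
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+
    }

    public static void main(String[] args) {
        // Made-up fragment standing in for the first page's HTML.
        String html =
            "<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\"a+b/c=\" />"
          + "<input type=\"hidden\" name=\"__EVENTVALIDATION\" id=\"__EVENTVALIDATION\" value=\"xyz\" />";
        System.out.println(postBody(html, "ctl00$Next", ""));
    }
}
```

The resulting string would be sent as an application/x-www-form-urlencoded POST to the same .aspx URL, and the response would be the HTML of the next page. The hidden fields have to be read again from each response, since they normally change from page to page.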
Hi,
Could you show us an example of integration between Web Harvest and HtmlUnit?
I have the same issue when requesting detail information for a specific URL. It
seems that those links are created by an onclick event. I'd like to integrate
HtmlUnit into a Web Harvest script and, once HtmlUnit has led me to that detail
information, go back to parsing that page with Web Harvest.
Thanks in advance