I am going to use Webharvest to scrape around 3000 websites, and I must congratulate all the committers for creating such a great library. Could you please help me understand my options in Webharvest for dealing with websites that block crawlers/scrapers? I expect some of these websites will do this, so I want to understand how I would handle it with the Webharvest library.
Any help would be much appreciated.
Best regards,
tarandeep
Hi,
Can you give an example of a website you are talking about? What mechanism exactly do you mean by "blocked crawlers/scrapers"?
Cheers,
MC