If WebHarvest could be set up as a proxy server, it would become possible to analyze all traffic between the browser and the server; this would make it possible to inspect HTTP headers, HTTP requests, etc.
That way, WebHarvest could maintain a detailed log of manually established sessions, so that users could select the parts of the log they want to reuse in their own scrapers. Basically, WebHarvest could monitor a manual browser session and build a list/tree view of the recorded actions, which could then serve as a template for creating a new scraper based on the recording.
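The recording idea above could be sketched roughly like this (a hypothetical Python sketch, not actual WebHarvest code, which is Java): a tiny forward HTTP proxy that appends each observed request's method, URL, and headers to a session log. The names `record_request`, `SESSION_LOG`, and `LoggingProxy` are all illustrative.

```python
import http.server
import urllib.request

def record_request(method, url, headers, log):
    """Append one observed browser request to the session log."""
    log.append({"method": method, "url": url, "headers": dict(headers)})

SESSION_LOG = []  # the manually established session, as a list of requests

class LoggingProxy(http.server.BaseHTTPRequestHandler):
    """Minimal forward proxy: a browser configured to use it sends the
    absolute URL in the request line, so self.path is the full URL."""

    def do_GET(self):
        record_request("GET", self.path, self.headers, SESSION_LOG)
        try:
            # Replay the request upstream and relay the response body back.
            with urllib.request.urlopen(self.path) as upstream:
                body = upstream.read()
                status = upstream.status
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except Exception as exc:
            self.send_error(502, str(exc))

# To run: point the browser's HTTP proxy at 127.0.0.1:8888, then start
#   http.server.HTTPServer(("127.0.0.1", 8888), LoggingProxy).serve_forever()
```

A real implementation would also need to handle POST bodies, HTTPS (CONNECT tunneling), and cookies, since those are exactly the pieces a scraper template would want to replay.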
This would make it possible to create scrapers even for complex websites that are largely JavaScript/DHTML (AJAX) driven, which are otherwise hard to implement harvesters for.
This is supported by twill: http://twill.idyll.org/browsing.html
SOCKS5 anyone?