
#15 active proxy mode for monitoring browser sessions

Backlog
open
nobody
None
5
2012-09-21
2010-11-04
Anonymous
No

If WebHarvest could be set up as a proxy server, it would become possible to analyze all traffic between the browser and the server. This would make it possible to look at HTTP headers, HTTP requests, etc.
That way, WebHarvest could maintain a detailed log of manually established sessions, so that users could select the parts of the log that they want to reuse for their own scrapers. Basically, WebHarvest could then monitor a manual browser session and create a list/tree view of actions, which could then be used as a template for creating a new scraper based on these recordings.
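
To make the idea concrete, here is a minimal sketch of such a recording proxy, written in Python with only the standard library. It is not WebHarvest code: the names `RecordingProxy`, `SESSION_LOG`, and `start_proxy` are made up for illustration, it only handles plain HTTP (no HTTPS/CONNECT), and it lets `urllib` follow redirects on its own, so a real implementation would need more care.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory session log: one dict per proxied request.
SESSION_LOG = []

# Headers that describe a single connection and should not be forwarded.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-connection", "proxy-authenticate",
    "proxy-authorization", "te", "trailer", "transfer-encoding", "upgrade",
}


class RecordingProxy(BaseHTTPRequestHandler):
    """Forwards plain-HTTP requests and records what passed through."""

    def _relay(self):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else None
        req_headers = {k: v for k, v in self.headers.items()
                       if k.lower() not in HOP_BY_HOP}
        # Connect directly to the target, ignoring any system proxy settings.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        try:
            # A browser configured to use a proxy sends the absolute URL
            # in the request line, so self.path is the full target URL.
            request = urllib.request.Request(
                self.path, data=body, headers=req_headers, method=self.command)
            with opener.open(request, timeout=30) as upstream:
                status = upstream.status
                resp_headers = upstream.getheaders()
                payload = upstream.read()
        except urllib.error.HTTPError as err:
            # 4xx/5xx answers are still part of the session; relay them too.
            status = err.code
            resp_headers = list(err.headers.items())
            payload = err.read()
        except (OSError, ValueError):
            self.send_error(502, "Upstream request failed")
            return

        SESSION_LOG.append({
            "method": self.command,
            "url": self.path,
            "request_headers": req_headers,
            "request_body": body,
            "status": status,
            "response_headers": dict(resp_headers),
        })

        self.send_response_only(status)
        for name, value in resp_headers:
            if name.lower() not in HOP_BY_HOP and name.lower() != "content-length":
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = do_PUT = do_DELETE = _relay

    def log_message(self, *args):
        pass  # keep the console quiet; SESSION_LOG is the real record


def start_proxy(host="127.0.0.1", port=0):
    """Start the proxy in a background thread and return the server object."""
    server = ThreadingHTTPServer((host, port), RecordingProxy)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_proxy(port=8080)` and pointing the browser's HTTP proxy setting at `127.0.0.1:8080` would fill `SESSION_LOG` with one entry per request, which is the raw material for the list/tree view described above.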

This would make it possible to create even complex scrapers for websites that are largely JavaScript/DHTML (AJAX) driven, and for which harvesters are otherwise not easy to implement.
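
As a rough illustration of how recorded traffic could help with AJAX-heavy sites, the sketch below picks the background requests out of a recorded session and turns them into a scraper skeleton. Everything here is an assumption made for the example: the log entry layout, the function names `is_ajax` and `to_template`, and the use of the `X-Requested-With` header as a heuristic (many JavaScript libraries set it, but not all requests carry it). The output uses `<config>`, `<http>`, and `<http-param>` elements in the style of a Web-Harvest configuration and would still need to be checked and completed by hand.

```python
from urllib.parse import parse_qsl, urlsplit
from xml.sax.saxutils import escape, quoteattr


def is_ajax(entry):
    """Heuristic: treat requests flagged as XMLHttpRequest as AJAX calls."""
    headers = {k.lower(): v for k, v in entry["request_headers"].items()}
    return headers.get("x-requested-with", "").lower() == "xmlhttprequest"


def to_template(entries):
    """Render recorded requests as a skeleton of <http> elements."""
    lines = ["<config>"]
    for entry in entries:
        parts = urlsplit(entry["url"])
        # Split the query string off so each parameter becomes editable.
        base_url = parts._replace(query="", fragment="").geturl()
        lines.append("  <http url=%s method=%s>"
                     % (quoteattr(base_url), quoteattr(entry["method"].lower())))
        for name, value in parse_qsl(parts.query, keep_blank_values=True):
            lines.append("    <http-param name=%s>%s</http-param>"
                         % (quoteattr(name), escape(value)))
        lines.append("  </http>")
    lines.append("</config>")
    return "\n".join(lines)


# Hypothetical recorded session: one page load and one background request.
recorded = [
    {"method": "GET", "url": "http://example.com/index.html",
     "request_headers": {}},
    {"method": "GET", "url": "http://example.com/api/items?page=2",
     "request_headers": {"X-Requested-With": "XMLHttpRequest"}},
]

template = to_template([e for e in recorded if is_ajax(e)])
print(template)
```

The point of the split between `is_ajax` and `to_template` is that the user, not the tool, decides which recorded requests end up in the scraper; the filter is only a starting suggestion.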

Discussion

  • Nobody/Anonymous

    this is supported by twill: http://twill.idyll.org/browsing.html

     
  • Nobody/Anonymous

    SOCKS5 anyone?

     
  • Robert Bala

    Robert Bala - 2012-09-21
    • milestone: --> Backlog
     
