If WebHarvest could be set up as a proxy server, it would become possible to analyze all traffic between the browser and the server; this would make it possible to inspect HTTP headers, HTTP requests, etc.
That way, WebHarvest could maintain a detailed log of manually established sessions, so that users could select the parts of the log they want to reuse in their own scrapers. Basically, WebHarvest could monitor a manual browser session and build a list/tree view of the recorded actions, which could then serve as a template for creating a new scraper based on the recording.
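The recording idea above could be sketched roughly like this (a hypothetical Python sketch, not actual WebHarvest code, which is Java): a tiny forward HTTP proxy that appends each observed request's method, URL, and headers to a session log. The names `record_request`, `SESSION_LOG`, and `LoggingProxy` are all illustrative.

```python
import http.server
import urllib.request

def record_request(method, url, headers, log):
    """Append one observed browser request to the session log."""
    log.append({"method": method, "url": url, "headers": dict(headers)})

SESSION_LOG = []  # the manually established session, as a list of requests

class LoggingProxy(http.server.BaseHTTPRequestHandler):
    """Minimal forward proxy: a browser configured to use it sends the
    absolute URL in the request line, so self.path is the full URL."""

    def do_GET(self):
        record_request("GET", self.path, self.headers, SESSION_LOG)
        try:
            # Replay the request upstream and relay the response body back.
            with urllib.request.urlopen(self.path) as upstream:
                body = upstream.read()
                status = upstream.status
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except Exception as exc:
            self.send_error(502, str(exc))

# To run: point the browser's HTTP proxy at 127.0.0.1:8888, then start
#   http.server.HTTPServer(("127.0.0.1", 8888), LoggingProxy).serve_forever()
```

A real implementation would also need to handle POST bodies, HTTPS (CONNECT tunneling), and cookies, since those are exactly the pieces a scraper template would want to replay.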
This would make it possible to create scrapers even for complex websites that are largely JavaScript/DHTML (AJAX) driven, which are otherwise hard to implement harvesters for.
This is supported by twill: http://twill.idyll.org/browsing.html
SOCKS5 anyone?