haipproxy
Distributed proxy IP pool for web crawlers using Scrapy and Redis
...It automatically crawls proxy resources from the internet and aggregates them into a centralized pool that distributed spiders and scraping systems can access. It is written in Python, using Scrapy for high-performance crawling and Redis for data storage, communication, and task coordination between components. It includes crawlers that discover proxy servers, validators that test proxy availability and performance, and schedulers that manage crawling and validation tasks. HAipproxy aims to maintain a high-availability, low-latency proxy pool so that scraping frameworks can rotate proxies efficiently and avoid being blocked during large-scale data collection. ...
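The crawl, validate, and serve flow described above can be illustrated with a minimal sketch. This is not HAipproxy's actual code or API: the `ProxyPool` class, its method names, the `fake_check` function, and the proxy addresses are all hypothetical, and a plain dictionary stands in for the Redis store so that the example runs without any external service.

```python
from typing import Callable, Dict, List, Optional, Set


class ProxyPool:
    """Hypothetical in-memory stand-in for a Redis-backed proxy pool."""

    def __init__(self, max_latency: float = 5.0) -> None:
        self.max_latency = max_latency
        self.candidates: Set[str] = set()      # filled by the crawler step
        self.validated: Dict[str, float] = {}  # proxy -> latency in seconds

    def add_candidate(self, proxy: str) -> None:
        # Crawler step: record a newly discovered, not yet tested proxy.
        self.candidates.add(proxy)

    def validate(self, check: Callable[[str], Optional[float]]) -> None:
        # Validator step: `check` returns a latency in seconds,
        # or None when the proxy is unreachable.
        for proxy in sorted(self.candidates):
            latency = check(proxy)
            if latency is None or latency > self.max_latency:
                # Drop dead or slow proxies from the usable pool.
                self.validated.pop(proxy, None)
            else:
                self.validated[proxy] = latency

    def fastest(self, n: int = 1) -> List[str]:
        # Client step: return up to n usable proxies, lowest latency first.
        return sorted(self.validated, key=lambda p: self.validated[p])[:n]


# Made-up measurements used in place of real network checks.
FAKE_LATENCIES: Dict[str, float] = {
    "http://10.0.0.1:3128": 0.8,
    "http://10.0.0.2:8080": 0.3,
    "http://10.0.0.3:8000": 4.5,
}


def fake_check(proxy: str) -> Optional[float]:
    # Proxies missing from the table are treated as unreachable.
    return FAKE_LATENCIES.get(proxy)


if __name__ == "__main__":
    pool = ProxyPool(max_latency=2.0)
    for address in list(FAKE_LATENCIES) + ["http://10.0.0.4:80"]:
        pool.add_candidate(address)
    pool.validate(fake_check)
    print(pool.fastest(2))
```

In a real deployment, the validator would presumably time an actual request through each proxy, and a scheduler would rerun validation periodically so that stale entries leave the pool; the sketch only shows how latency-based filtering and ordering let a client pick proxies to rotate through.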