Beanbun is a web crawling framework written in PHP for building scalable, customizable web scrapers. It aims to stay simple while offering an architecture flexible enough for complex crawling workflows. Crawlers can run in normal mode or in daemon mode, where they keep running as long-lived background processes. Pages are fetched by a downloader built on Guzzle, and Workerman provides multi-process crawling for concurrency. Beanbun also supports distributed crawling and multiple queue backends, including in-memory and Redis-based queues. Callback hooks and step-based crawling stages let developers customize how pages are downloaded, how they are processed, and how new URLs are discovered. Through its plugin architecture and configurable components, users can supply their own queues, crawling strategies, and data processing logic.
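The callback-hook idea described above can be illustrated with a small, framework-neutral sketch. It is written in Python only to keep it self-contained, and it is not Beanbun's API: the `Crawler` class, the hook names `before_download` and `after_download`, and the stand-in downloader are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Crawler:
    # Hypothetical hook names chosen for illustration; not Beanbun's API.
    download: Callable[[str], str]
    before_download: Optional[Callable[[str], None]] = None
    after_download: Optional[Callable[[str, str], None]] = None

    def fetch(self, url: str) -> str:
        """Run one download step, calling the optional hooks around it."""
        if self.before_download:
            self.before_download(url)
        page = self.download(url)
        if self.after_download:
            self.after_download(url, page)
        return page


events = []
crawler = Crawler(
    # Stand-in for a real HTTP client; no network access happens here.
    download=lambda url: f"<html>{url}</html>",
    before_download=lambda url: events.append(("before", url)),
    after_download=lambda url, page: events.append(("after", url, len(page))),
)
page = crawler.fetch("http://example.com/")
```

Because each stage is a replaceable callable, swapping the downloader or adding processing logic does not require changing the crawl loop itself, which is the kind of flexibility the description attributes to hooks.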
Features
- Multi-process web crawling powered by Workerman
- Distributed crawling support for scalable data collection
- Multiple queue backends including memory and Redis
- Configurable URI filtering and crawl rules
- Breadth-first and depth-first crawling strategies
- Extensible plugin system for custom queues and crawlers
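
Two of the features above, URI filtering and the choice between breadth-first and depth-first crawling, come down to how the crawl frontier is managed. The sketch below shows the general technique, assuming a hypothetical in-memory link graph instead of real HTTP requests. It is written in Python for illustration and does not reflect how Beanbun implements these features; `LINKS`, `crawl`, and its parameters are made-up names.

```python
import re
from collections import deque

# Hypothetical in-memory "site": each URL maps to the links found on that page.
LINKS = {
    "http://example.com/": [
        "http://example.com/a",
        "http://example.com/b",
        "http://other.test/x",
    ],
    "http://example.com/a": ["http://example.com/a1"],
    "http://example.com/b": ["http://example.com/b1", "http://example.com/"],
    "http://example.com/a1": [],
    "http://example.com/b1": [],
}


def crawl(seed, url_filter, strategy="breadth", max_pages=100):
    """Return URLs in visit order, following only links that match url_filter."""
    pattern = re.compile(url_filter)
    frontier = deque([seed])
    seen = {seed}  # prevents queueing the same URL twice
    visited = []
    while frontier and len(visited) < max_pages:
        # Breadth-first takes the oldest queued URL (FIFO);
        # depth-first takes the most recently queued one (LIFO).
        url = frontier.popleft() if strategy == "breadth" else frontier.pop()
        visited.append(url)
        for link in LINKS.get(url, []):
            if link not in seen and pattern.match(link):
                seen.add(link)
                frontier.append(link)
    return visited


bfs_order = crawl("http://example.com/", r"http://example\.com/", "breadth")
dfs_order = crawl("http://example.com/", r"http://example\.com/", "depth")
```

In this sketch the only difference between the two strategies is which end of the frontier is consumed. Moving the frontier from process memory to a shared store is what would allow several workers to cooperate on one crawl.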