Roach is a complete web scraping toolkit for PHP. It is a shameless clone of, and heavily inspired by, the popular Scrapy package for Python.

Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler; it also includes an entire pipeline to clean, persist and otherwise process extracted data. It’s your all-in-one resource for web scraping in PHP.

Roach doesn’t depend on a specific framework. Instead, you can use the core package on its own or install one of the framework-specific adapters. Currently, there’s a first-party adapter for using Roach in your Laravel projects, with more coming.

Roach is built from the ground up with extensibility in mind. In fact, most of Roach’s built-in behavior works the exact same way that any custom extension or middleware does.
Features
- Roach is a complete web scraping toolkit for PHP
- Roach is built from the ground up with extensibility in mind
- Roach doesn’t depend on a specific framework
- There’s a first-party adapter for using Roach in your Laravel projects, with more adapters coming
- Includes an entire pipeline to clean, persist and otherwise process extracted data
- It’s your all-in-one resource for web scraping in PHP