Crawly is a high-level application framework for crawling websites and extracting structured data in the Elixir programming language. It provides a complete environment for building web crawlers that systematically visit pages, collect information, and transform it into structured formats for further processing, making it well suited to tasks such as data mining, information processing, and building historical archives of web content.

Crawly follows the Elixir/OTP architecture model, enabling concurrent, fault-tolerant crawling processes that handle many requests efficiently. Developers define specialized components called spiders to control which pages are visited and how information is extracted from them. Crawly also supports extensibility through middlewares, pipelines, and fetchers, which allow customization of request handling, data processing, and crawling behavior.
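As a sketch of what a spider looks like, the module below implements the `Crawly.Spider` behaviour. The site URL, CSS selectors, and item fields are illustrative assumptions, and Floki is assumed as the HTML parser; the overall callback shape (`base_url/0`, `init/0`, `parse_item/1`) follows Crawly's documented spider contract.

```elixir
defmodule BooksSpider do
  use Crawly.Spider

  # Requests outside this domain can be filtered by middleware.
  @impl Crawly.Spider
  def base_url(), do: "https://example.com"

  # Seed the crawl with one or more start URLs.
  @impl Crawly.Spider
  def init() do
    [start_urls: ["https://example.com/catalog"]]
  end

  # Called for every fetched page: extract items and follow-up requests.
  @impl Crawly.Spider
  def parse_item(response) do
    {:ok, document} = Floki.parse_document(response.body)

    # One item per product block; selectors are illustrative.
    items =
      document
      |> Floki.find(".product")
      |> Enum.map(fn product ->
        %{
          title: product |> Floki.find("h3 a") |> Floki.text(),
          price: product |> Floki.find(".price") |> Floki.text()
        }
      end)

    # Follow pagination links as new requests.
    requests =
      document
      |> Floki.find("a.next")
      |> Floki.attribute("href")
      |> Enum.map(&Crawly.Utils.build_absolute_url(&1, response.request_url))
      |> Enum.map(&Crawly.Utils.request_from_url/1)

    %Crawly.ParsedItem{items: items, requests: requests}
  end
end
```

A spider like this is typically started with `Crawly.Engine.start_spider(BooksSpider)`; Crawly then schedules the start URLs, runs `parse_item/1` on each response, and feeds the returned items into the configured pipelines.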
Features
- High-level framework for crawling websites and extracting structured data
- Spider modules for defining crawling logic and parsing page content
- Middleware system for processing requests before they are sent
- Pipelines for processing extracted items after scraping
- Concurrent and fault-tolerant architecture built on Elixir/OTP
- Support for configurable fetchers and extensible crawling workflows
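The middleware and pipeline features above are wired together through application configuration. The fragment below is a sketch of a typical `config/config.exs`; the module names are Crawly's documented built-in middlewares and pipelines, while the concrete option values (user agent string, output folder, item fields) are illustrative.

```elixir
import Config

config :crawly,
  concurrent_requests_per_domain: 8,
  middlewares: [
    # Drop requests whose host does not match the spider's base_url.
    Crawly.Middlewares.DomainFilter,
    # Deduplicate requests so each URL is fetched once.
    Crawly.Middlewares.UniqueRequest,
    # Attach a User-Agent header to outgoing requests.
    {Crawly.Middlewares.UserAgent, user_agents: ["Crawly Bot 1.0"]}
  ],
  pipelines: [
    # Discard items that are missing required fields.
    {Crawly.Pipelines.Validate, fields: [:title, :price]},
    # Discard items already seen, keyed on the :title field.
    {Crawly.Pipelines.DuplicatesFilter, item_id: :title},
    # Serialize each item to JSON and write it out as JSON lines.
    Crawly.Pipelines.JSONEncoder,
    {Crawly.Pipelines.WriteToFile, folder: "/tmp", extension: "jl"}
  ]
```

Middlewares run on each request before it is sent, while pipelines run on each extracted item after scraping; both are applied in the order listed.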