A scalable web crawler framework for Java
Open source enterprise search server for websites, files, and data
Java library for working with real-world HTML
Distributed web crawler admin platform for spider management
ACHE is a web crawler for domain-specific search
Educational collection of Python web scraping examples for many sites
Ever wanted to download only a part of a Git repository?
Open source web crawler for Java
Lightweight Java web crawler framework with jQuery-style extraction
Android app for saving webpages for offline reading
WebCollector is an open source web crawler framework based on Java
Open source Search Engine and Enterprise Search