PHPCrawl is a highly configurable webcrawler/webspider library written in PHP. It supports filters, limiters, cookie handling, robots.txt handling, multiprocessing, and much more.
HarvestMan is a fully functional, multithreaded webcrawler and offline browser. It is highly customizable and supports more than 55 options for controlling and customizing offline browsing. It is written entirely in the Python programming language.
CMS-Bandits is a set of PHP scripts that includes an online HTML editor, calendar, search engine, RSS reader, revision log, personal nickpage, comment system, webcrawler, and more.
Crawler.NET is a component-based distributed framework for web traversal intended for the .NET platform. It consists of loosely coupled units, each realizing a specific web crawler task. The main design goals are efficiency and flexibility.
Mygale is a news-gathering webcrawler written in Python. It searches a number of well-known news sites for Python-related articles. It does not currently support searching for other topics, but this may change in the future.