The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
Features
- deeply and thoroughly harvests website content
- works on any Java platform (Linux recommended)
- stores content to ARC or ISO WARC aggregate/transcript format
- web interface for operator control and monitoring of crawls
License
GNU Library or Lesser General Public License version 2.0 (LGPLv2), Apache License V2.0
User Reviews
- Cool
- Cool.
- Useful project. Thanks
- Great software, thank you.
- The app works well in my PC. Serves its purpose too, so no regrets for me.