Easy Spider is a distributed Perl web crawler project from 2006. It features code for crawling webpages, distributing the work across clients, and generating XML files from the results. The client side can run on any computer (Windows or Linux), and the server stores all data.
Websites that use EasySpider with Perl/PHP backends:
Web crawlers are often among the first projects people build when starting a programming career. It is fun to look back at code written a few years ago and see how much one has improved.
(c) Sebastian Enger 2005-2015
- Client/Server Distributed Crawling
- Perl Programming Language
- Config File Support
- PDF,DOC,XLS,PPT Extraction Support
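The feature list mentions config file support, but the original config format is not shown in this README. As a hypothetical sketch, a distributed crawler client would typically need settings like these in a simple key=value file (all key names here are assumptions, not EasySpider's actual ones):

```
# easyspider.conf -- illustrative sketch, not the original format
server_host   = crawler.example.com   # where the client fetches its tasks
server_port   = 8080
user_agent    = EasySpider/1.0
max_depth     = 3                     # link-follow depth per task
request_delay = 2                     # seconds to wait between fetches
```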
EasySpider is a Perl client/server architecture for crawling the web for interesting webpages. The server can be any box that has internet access and allows Perl programs to run. The client connects to the server, receives its work task, fulfills it, and returns the results to the server as an XML stream. The server can then load that XML file into an Oracle/MySQL/MariaDB etc. database, or the file can be parsed by the sphinxsearch.com full-text indexer to generate searchable content for your webpage. Happy Hacking ;-)
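The client loop described above can be sketched in a few lines of modern Perl. The task endpoint, the XML record shape, and the field names below are assumptions for illustration; the original EasySpider wire protocol is not documented in this README. `HTTP::Tiny` (a core module since Perl 5.14) stands in for whatever HTTP client the 2006 code used:

```perl
#!/usr/bin/perl
# Illustrative sketch of an EasySpider-style client: fetch a task from
# the server, crawl it, and return the result as an XML record.
use strict;
use warnings;
use HTTP::Tiny;   # core since Perl 5.14

# Hypothetical endpoint where the server hands out crawl tasks.
my $TASK_URL = 'http://crawl-server.example.com/task';

# Ask the server for one URL to crawl; returns undef on failure.
sub get_task {
    my $res = HTTP::Tiny->new->get($TASK_URL);
    return $res->{success} ? $res->{content} : undef;
}

# Minimal XML escaping for the result stream.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Wrap one crawled page as an XML record the server could ingest
# into a database or feed to a full-text indexer.
sub to_xml {
    my ($url, $title) = @_;
    return '<page><url>' . xml_escape($url) . '</url>'
         . '<title>' . xml_escape($title) . '</title></page>';
}

# A real client would loop: my $url = get_task(); fetch the page with
# HTTP::Tiny, extract fields, then POST to_xml(...) back to the server.
print to_xml('http://example.com/?a=1&b=2', 'Example & Co'), "\n";
```

The XML-building step is kept separate from the network calls so the record format can be tested locally without a running server.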