ripCurl is a custom PHP class based on libcurl that is designed for retrieving web content. It is well suited to collecting lists of information spread across multiple pages of a site, such as job listings or product catalogs.
list2db reads email digest files generated by the Mailman mailing list software and converts them into SQL for a relational database. The project also includes a PHP frontend that lets users search and browse archived list emails.
Larbin is a Web crawler intended to fetch a large number of Web pages; it should be able to fetch more than 100 million pages on a standard PC, given ample upload/download bandwidth. This set of PHP and Perl scripts, called webtools4larbin, can handle the output of Larbin and p
BlueBox is a PHP/MySQL-powered search engine. It can be installed on any web server without special permissions; only FTP access and database management rights are required. BlueBox remains very fast even with more than 1,000,000 pages scanned.