ahCrawler
A PHP search engine and web analytics tool for your website. License: GNU GPL 3.
ahCrawler is a toolset for implementing your own search on your website and for analyzing your web content. It can be used on shared hosting.
It consists of:
* crawler (spider) and indexer
* search for your website(s)
* search statistics
* website analyzer (HTTP headers, short titles and keywords, link checker, ...)
You need to install it on your own server, so all crawled data stays within your environment.
With an external web spider, you never know when it last updated its copy of your content. With ahCrawler you can trigger a rescan whenever you want, so you always control which data was checked and when.
The spider is a CLI tool and must be set up as a cron job.
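As a minimal sketch, a crontab entry that runs the spider nightly might look like the following. The PHP binary path, the script path `/path/to/ahcrawler/bin/cli.php`, the log file, and the schedule are hypothetical placeholders, not taken from the ahCrawler documentation; substitute the actual CLI script and options of your installation.

```
# Hypothetical example: start the ahCrawler spider every night at 02:30.
# Fields: minute hour day-of-month month day-of-week command
# Paths below are placeholders; replace them with your own installation's values.
30 2 * * * /usr/bin/php /path/to/ahcrawler/bin/cli.php >> /var/log/ahcrawler.log 2>&1
```

Scheduling the rescan at a low-traffic hour keeps the crawl from competing with visitors for server resources, which matters most on shared hosting.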
...