Command-line toolset for extracting text from files
Multi-encoding strings(1) replacement with language identification
File type detector library
An open source search engine with a RESTful API and crawlers
Simple BNF parser that produces XML markup of matches
Trovi is a text search tool for PDF files