RetriBlog is a grey-box framework that provides a large collection of services for developing blog crawlers. It was implemented using both software engineering and artificial intelligence techniques: its component-based architecture is intended to ensure good reusability, and it offers specialized services for preprocessing, content extraction, classification, indexing, and tag recommendation. The framework was developed following an architecture-centered process and implemented in Java according to the COSMOS* component implementation model.
More details about RetriBlog are available on the following page:
https://sourceforge.net/apps/mediawiki/retriblog/index.php?title=Main_Page
Author: Rafael Ferreira Leite de Mello
Email: rflm@cin.ufpe.br and rafaelflmello@gmail.com