This crawler helps index binary documents such as PDF, OpenOffice, and MS Office files. It crawls a local file system (or a mounted drive), indexing new files, updating existing ones, and removing old ones. It can also crawl a remote file system over SSH/FTP, and it provides a REST interface that lets you “upload” your binary documents to Elasticsearch.
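The new/updated/removed classification described above can be illustrated with a small snapshot comparison. This is only a sketch under stated assumptions, not the crawler's actual implementation: the `CrawlDiff` class, the `Action` enum, and the idea of keeping a map from path to last-modified time are all hypothetical names and choices made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: compares two crawl snapshots (path -> last-modified millis)
// and decides what an indexer would need to do for each path.
class CrawlDiff {
    enum Action { INDEX_NEW, UPDATE, REMOVE }

    static Map<String, Action> diff(Map<String, Long> previous, Map<String, Long> current) {
        Map<String, Action> actions = new TreeMap<>();
        for (Map.Entry<String, Long> entry : current.entrySet()) {
            Long before = previous.get(entry.getKey());
            if (before == null) {
                // Path was not seen in the previous run: index it as new.
                actions.put(entry.getKey(), Action.INDEX_NEW);
            } else if (entry.getValue() > before) {
                // Path exists in both runs but was modified since: update it.
                actions.put(entry.getKey(), Action.UPDATE);
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                // Path disappeared from the file system: remove it from the index.
                actions.put(path, Action.REMOVE);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, Long> previous = new HashMap<>();
        previous.put("/docs/a.pdf", 100L);
        previous.put("/docs/b.odt", 100L);
        Map<String, Long> current = new HashMap<>();
        current.put("/docs/a.pdf", 200L);
        current.put("/docs/c.docx", 150L);
        System.out.println(diff(previous, current));
    }
}
```

Unchanged files produce no entry, so a run over a stable directory yields an empty map.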
Open source Extract Transform Load engine written in Java
ETL Framework is a standalone Extract Transform Load engine written in Java. It includes executables for all major platforms and can be easily integrated into other applications.
Key Features:
* embeddable, open source and free
* fast and scalable
* uses target database features to do transformations and loads
* manual and automatic data mapping
* data streaming
* bulk data loads
* data quality features using SQL, JavaScript and regex
* data transformations
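To make the regex-based data quality feature listed above more concrete, here is a minimal sketch of the general idea in plain Java. It does not use or reproduce the ETL Framework's own API; the `RegexQualityCheck` class, the `invalidRows` method, and the simplified `EMAIL` rule are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical data quality check: flags rows whose value does not match a rule.
class RegexQualityCheck {
    // Assumed rule for illustration: a deliberately simple email shape.
    static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // Returns the zero-based indexes of values that are null or fail the rule.
    static List<Integer> invalidRows(List<String> values, Pattern rule) {
        List<Integer> invalid = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            if (value == null || !rule.matcher(value).matches()) {
                invalid.add(i);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("a@b.com", "not-an-email", null, "x@y.org");
        System.out.println(invalidRows(column, EMAIL));
    }
}
```

Returning row indexes rather than a single pass/fail flag lets a caller decide whether to reject the load, skip the rows, or log them for review.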
Requirements
*...