osDQ project dedicated to creating Apache Spark based data pipelines using JSON
This is an offshoot of the open source data quality (osDQ) project https://sourceforge.net/projects/dataquality/
This subproject creates Apache Spark based data pipelines in which JSON metadata (a file) drives data processing, data quality, data preparation, and data modeling features for big data. It uses the Java API of Apache Spark and can also run in local mode.
JSON examples are available at https://github.com/arrahtech/osdq-spark
Java utility that reads the metadata from table(s)
Dbmetadata is a Java utility that reads the metadata from table(s) in a specified database and creates Informatica XML to import into the repository. I created this utility when we were migrating to a new platform and needed a quick way to create flat-file and relational sources and targets that matched the DDL of each table. I also needed to use shortcuts. If you use the import table list, it creates one XML file containing all of the tables and shortcuts (if a shortcut folder is specified)...
KETL(tm) is a production-ready ETL platform. The engine is built on an open, multi-threaded, XML-based architecture. KETL is designed to support the development and deployment of data integration efforts that require ETL and scheduling.