osDQ project dedicated to creating Apache Spark based data pipelines using JSON
This uses the Java API of Apache Spark. It can also run in local mode.
Get a JSON example at https://github.com/arrahtech/osdq-spark
How to run
Unzip the zip file, then run the command for your platform from the unzipped directory.
Windows:
java -cp .\lib\*;osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c .\example\samplerun.json
Mac / UNIX:
java -cp ./lib/*:./osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c ./example/samplerun.json
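The two commands differ only in the classpath separator (`;` on Windows, `:` on Mac/UNIX) and the path separator. As a minimal sketch, a Java launcher can take both from the running JVM through `File.pathSeparator` and `File.separator`. The class name `RunnerCommand` is hypothetical, and the sketch assumes it is run from the unzipped directory; only the jar name, main class and `-c` option come from the commands above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the TransformRunner command for the current OS.
public class RunnerCommand {

    // Assembles "java -cp <classpath> <main class> -c <config>".
    // File.separator is '\' on Windows and '/' elsewhere;
    // File.pathSeparator is ';' on Windows and ':' elsewhere.
    static List<String> buildCommand(String configPath) {
        String cp = "." + File.separator + "lib" + File.separator + "*"
                + File.pathSeparator
                + "." + File.separator + "osdq-spark-0.0.1.jar";
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(cp);
        cmd.add("org.arrah.framework.spark.run.TransformRunner");
        cmd.add("-c");
        cmd.add(configPath);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        String config = "." + File.separator + "example" + File.separator + "samplerun.json";
        List<String> cmd = buildCommand(config);
        // Print the command that would be executed on this platform.
        System.out.println(String.join(" ", cmd));
        // To actually launch it, uncomment the line below. The java launcher
        // itself expands the lib/* classpath wildcard, so no shell is needed.
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

On Mac/UNIX this prints the same command as shown above; on Windows it prints the equivalent `;`-separated form.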
On Windows, you need a Hadoop distribution unzipped on a local drive, with HADOOP_HOME set to that directory. Also copy winutils.exe from here into HADOOP_HOME\bin.
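A quick pre-flight check for this Windows setup can be sketched with `java.nio.file`. The class `HadoopHomeCheck` is hypothetical; it only verifies that HADOOP_HOME is set and that `HADOOP_HOME\bin\winutils.exe` exists, not that the Hadoop version matches the Spark build.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical pre-flight check for running osdq-spark on Windows.
public class HadoopHomeCheck {

    // Returns a description of the problem, or null if the setup looks usable.
    static String check(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return "HADOOP_HOME is not set";
        }
        // winutils.exe is expected under HADOOP_HOME\bin
        Path winutils = Paths.get(hadoopHome, "bin", "winutils.exe");
        if (!Files.isRegularFile(winutils)) {
            return "winutils.exe not found at " + winutils;
        }
        return null;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name").toLowerCase(Locale.ROOT);
        if (!os.startsWith("windows")) {
            System.out.println("Not on Windows: winutils.exe is not needed");
            return;
        }
        String problem = check(System.getenv("HADOOP_HOME"));
        System.out.println(problem == null ? "HADOOP_HOME looks OK" : problem);
    }
}
```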
This is a sister project of osDQ that provides RESTful APIs.
It helps projects that want to embed data quality and data preparation features in their applications or UIs through RESTful calls.
Data Cleansing APIs
Dockerfile:
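# Pull base image
FROM frnde/jetty-9.4.2-jre8-alpine-cet
# Deploy the osDQ web application into Jetty's webapps directory as osdq.war
ADD osdq-v0.0.1.war /var/lib/jetty/webapps/osdq.war
# Port the Jetty server listens on
EXPOSE 8080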
Docker Image
https://hub.docker.com/r/vreddym/osdq-web/tags