Earlier versions of the library had several design problems. The 0.7 releases introduced many changes to make processing more flexible and easier to implement.
See the package \org\jdeva\dataprocessing\datamodel for a new data model that can read CSV and structured content from strings, and org\jdeva\dataprocessing\mapreduce for a new processing model that follows the Unix pipe pattern.
So far, reading from CSV and Avro files and writing to Avro files are supported. Other sources and sinks will be added in the future.
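
To illustrate the Unix pipe pattern mentioned above, here is a minimal sketch in plain Java. It is not the library's API: the names PipeSketch, Pipe, then, splitCsv, filter, map and run are all hypothetical and chosen only for this example, and the CSV splitting is deliberately naive (no quoting, comma separator assumed). The point is only that each stage takes records in and hands records on, so stages can be chained like `a | b | c` in a shell.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names are taken from org.jdeva.dataprocessing.
class PipeSketch {

    /** One processing stage: consumes a list of records and emits a list of records. */
    interface Pipe<I, O> {
        List<O> process(List<I> input);

        /** Chains this stage with the next one, like `a | b` in a shell. */
        default <R> Pipe<I, R> then(Pipe<O, R> next) {
            return input -> next.process(process(input));
        }
    }

    /** Source-like stage: splits each line on commas (assumption: no quoted fields). */
    static Pipe<String, String[]> splitCsv() {
        return lines -> {
            List<String[]> rows = new ArrayList<>();
            for (String line : lines) {
                rows.add(line.split(",", -1));
            }
            return rows;
        };
    }

    /** Stage that keeps only the records matching the predicate. */
    static <T> Pipe<T, T> filter(Predicate<T> keep) {
        return items -> {
            List<T> kept = new ArrayList<>();
            for (T item : items) {
                if (keep.test(item)) {
                    kept.add(item);
                }
            }
            return kept;
        };
    }

    /** Stage that transforms every record with the given function. */
    static <I, O> Pipe<I, O> map(Function<I, O> fn) {
        return items -> {
            List<O> mapped = new ArrayList<>();
            for (I item : items) {
                mapped.add(fn.apply(item));
            }
            return mapped;
        };
    }

    /** Builds a three-stage pipeline and runs it over the given lines. */
    static List<String> run(List<String> lines) {
        Pipe<String, String> pipeline = splitCsv()
                .then(filter((String[] row) -> row.length == 2))      // drop malformed rows
                .then(map((String[] row) -> row[0] + "=" + row[1]));  // format key=value
        return pipeline.process(lines);
    }

    public static void main(String[] args) {
        // prints [a=1, b=2]
        System.out.println(run(Arrays.asList("a,1", "broken", "b,2")));
    }
}
```

In this sketch every stage materializes a full list, which keeps the example short. A real pipeline would more likely pass records one at a time so that large files do not have to fit into memory.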
Open issues
- processing by partitioning
- more types (date, ...)
- config files to construct pipes (some classes for this already exist as test classes)
- more pipes (sorting on disk, merging)