#8 Add "Whole File Input Format"

open
nobody
5
2011-03-17
2011-03-17
Tim Yates
No

An input format that passes each file as a single record, with the filename as the key and the entire file contents as the value, could be very useful for data sets where the file is the natural unit of work, e.g. Project Gutenberg.
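
To make the intended record shape concrete, here is a minimal sketch in plain Python. It only illustrates the (filename, whole file) pairing; it is not an actual input format for the framework, and the function name `whole_file_records` and the directory-walking behaviour are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterator, Tuple


def whole_file_records(root: str) -> Iterator[Tuple[str, str]]:
    """Yield one (key, value) record per file under `root`.

    key   = file path relative to `root`
    value = the entire file contents as text
    (Hypothetical helper; a real input format would hand these
    records to the mapper instead of yielding them locally.)
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Undecodable bytes are replaced rather than raising,
            # since text corpora often have mixed encodings.
            text = path.read_text(encoding="utf-8", errors="replace")
            yield str(path.relative_to(root)), text
```

Note that the values produced this way will almost always contain newlines, which is what leads to the output problem described below.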

A few issues would have to be dealt with:
- Users (or instructors) would have to be able to select the appropriate input format for a dataset or job
- *Outputting* keys/values with newlines is impossible with the current streaming <--> wrapper library protocol. We could work around this by:
  a) Switching to a "binary" protocol
  b) Stripping or escaping newlines in emit functions
  c) Ignoring the problem and telling users not to output newlines
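
As a rough illustration of option (b), the sketch below escapes newlines before a record is written and reverses the escaping on the reading side. It assumes the protocol is line-oriented with a tab between key and value, which this ticket does not actually state; `escape`, `unescape`, and `emit` are hypothetical names, not the wrapper library's real API.

```python
import io
import sys


def escape(field: str) -> str:
    """Make a field safe for a one-record-per-line, tab-separated protocol."""
    # Backslash must be escaped first so the later escapes stay unambiguous.
    return (field.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def unescape(field: str) -> str:
    """Inverse of escape(); unknown escape sequences are left as-is."""
    out = []
    chars = iter(field)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def emit(key: str, value: str, out=sys.stdout) -> None:
    """Write one record as a single line (assumed tab-separated format)."""
    out.write(f"{escape(key)}\t{escape(value)}\n")


# Usage: a multi-line value still occupies exactly one output line.
buf = io.StringIO()
emit("pg100.txt", "line one\nline two", buf)
line = buf.getvalue()
key, value = (unescape(f) for f in line.rstrip("\n").split("\t"))
```

The trade-off compared with option (a) is that every consumer of the output has to know about the escaping, whereas a binary, length-prefixed protocol would carry arbitrary bytes without any escaping step.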