First of all, awesome job.
I would like to know if it is possible to pass parameters to a Hadoop Pipes map program.
Can I pass input to the record reader as well?
Please help.
The example you have shown only works with Pydoop Script. Can you provide a similar example for the regular pipes API? This is the example I mean:
Job Parameters
Suppose you want to select all lines containing a substring to be given at run time. Create a module grep.py:
def mapper(_, text, writer, conf):  # notice the fourth 'conf' argument
    if text.find(conf['grep-expression']) >= 0:
        writer.emit("", text)
Job parameters, like in Hadoop pipes, are passed via the -D option:
pydoop script --num-reducers 0 -t '' -D grep-expression=my_substring \
    grep.py hdfs_input hdfs_output
Hi Mayank,
When using the regular pipes API, you can ask the TaskContext for the configuration object (via .getJobConf()). The map function receives the TaskContext as an argument.
There's an example here: http://pydoop.sourceforge.net/docs/tutorial/mapred_api.html#record-readers-and-writers
Here you can find the API of the JobConf object: http://pydoop.sourceforge.net/docs/api_docs/mr_api.html#pydoop.pipes.JobConf
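For instance, a minimal map-only version of the grep job written against the pipes API might look roughly like this (an untested sketch; the GrepMapper and IdentityReducer class names are just illustrative):

from pydoop.pipes import Mapper, Reducer, Factory, runTask

class GrepMapper(Mapper):
    def __init__(self, context):
        super(GrepMapper, self).__init__(context)
        jc = context.getJobConf()              # configuration object from the TaskContext
        self.expr = jc.get("grep-expression")  # same -D parameter as in the script example

    def map(self, context):
        text = context.getInputValue()
        if text.find(self.expr) >= 0:
            context.emit("", text)

class IdentityReducer(Reducer):
    # never runs when the job uses zero reducers; included only because
    # Factory expects a reducer class
    def reduce(self, context):
        while context.nextValue():
            context.emit(context.getInputKey(), context.getInputValue())

if __name__ == "__main__":
    runTask(Factory(GrepMapper, IdentityReducer))

The grep-expression parameter is still passed with -D when the job is submitted.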
Luca