I was trying to run a Pydoop MR job. My mapper uses some functions from a script I wrote, but the job failed with the following error (it fails to import the functions from my script).
Any idea how to resolve this issue?
My code is http://paste.ubuntu.com/5609915/
Where did you install the "an" module? Is it available on all cluster nodes? Is it on sys.path and r-x accessible by the user that runs MapReduce programs (note that, in recent Hadoop versions, that's the "mapred" user)?
To troubleshoot such problems, you can add the following lines to your program, before you import anything else:
import sys, os

# Show which user the task runs as and what it can import.
print("user: %r" % os.getenv("USER"))
print("sys.path: %r" % sys.path)
and then check the "stdout logs" section of a task of your choice on the web UI.
Thanks for the reply. The 'an' module is a script in the directory from which I run the Pydoop script. I will try your suggestion and report back.
In that case the solution is simple: when you launch the application with hadoop pipes, add -files an.py to your command line. This is a shortcut provided by Hadoop tools to make it easier to use the distributed cache.
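A minimal sketch of what that command line might look like; the script, input, and output names are placeholders, and the -D properties are the usual ones for a Python pipes job:

```shell
# -files ships an.py via the distributed cache so every task can import it.
hadoop pipes \
  -D hadoop.pipes.java.recordreader=true \
  -D hadoop.pipes.java.recordwriter=true \
  -files an.py \
  -program myscript.py \
  -input input_dir \
  -output output_dir
```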
If you're interested in more advanced features, such as the automatic distribution of whole Python packages (even Pydoop itself!) to the cluster nodes, we have a guide for that in the Pydoop docs:
I solved the issue. Steps I followed:
1) Included the supporting script's code directly in my MR script.
2) Reading the sklearn joblib model from HDFS was the second issue, so I pickled the model, used hdfs.read to read the pickled bytes, and un-pickled them.
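The pickle round-trip in step 2 can be sketched as follows. This is a self-contained sketch: a plain dict stands in for the trained sklearn model, and an in-memory buffer stands in for the HDFS file (in the real job the bytes would come from an HDFS read, e.g. via pydoop.hdfs):

```python
import io
import pickle

# Stand-in for a trained sklearn model; any picklable object works the same way.
model = {"coef": [0.5, -1.2], "intercept": 0.1}

# Serialize the model to bytes, as done once before submitting the job.
pickled_bytes = pickle.dumps(model)

# In the MR script these bytes would be read from HDFS; here a BytesIO
# buffer plays the role of the remote file.
buf = io.BytesIO(pickled_bytes)
restored = pickle.loads(buf.read())

assert restored == model
```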
Thanks for the support.