Importing functions from user scripts to pydoop script

Help
Jaganadh G
2013-03-13
2013-04-09
  • Jaganadh G
    2013-03-13

    Hi,
    I was trying to run a Pydoop MR job. My mapper uses some functions from a script that I wrote, but the job failed with the following error (it fails to import the functions from my script):
    http://paste.ubuntu.com/5609903/

    Any idea how to resolve this issue?

    My code is at http://paste.ubuntu.com/5609915/
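
    Roughly, the layout is a helper module sitting next to the MR script, with the Pydoop script mapper importing from it. A minimal sketch of that kind of setup (made-up helper function and file names; not the actual pasted code):

    # an.py -- helper module living next to the MR script
    def tokenize(text):
        return text.split()

    # my_mr_script.py -- Pydoop script job importing from the helper
    from an import tokenize

    def mapper(key, value, writer):
        # value is one input line; emit (word, "1") for each token
        for word in tokenize(value):
            writer.emit(word, "1")

    def reducer(word, icounts, writer):
        writer.emit(word, sum(map(int, icounts)))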

    Any help would be appreciated.

    Best regards

    Jaganadh G

     
  • Simone Leo
    2013-03-14

    Hello,

    Where did you install the "an" module? Is it available on all cluster nodes? Is it on sys.path, and is it readable (r-x) by the user that runs MapReduce tasks (note that, in recent Hadoop versions, that's the "mapred" user)?

    To troubleshoot such problems, you can add the following lines to your program, before you import anything else:

    import sys, os
    print "user: %r" % os.getenv("USER")
    print "sys.path: %r" % sys.path
    

    and then check the "stdout logs" section of a task of your choice on the web UI.

    Simone

     
    Last edit: Simone Leo 2013-03-14
  • Jaganadh G
    2013-03-15

    @Simone :
    Thanks for the reply. The 'an' module is a script in the directory from which I am running the pydoop script. I will try the solution you suggested and report back.

    Best regards

    Jaganadh G

     
  • Simone Leo
    2013-03-15

    In that case the solution is simple: when you launch the application with hadoop pipes, add -files an.py to your command line. This is a shortcut provided by Hadoop tools to make it easier to use the distributed cache.
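
    For example, a pipes launch with the helper shipped through the distributed cache could look roughly like this (paths and option values are placeholders, not an exact command):

    hadoop pipes -files an.py \
        -D hadoop.pipes.java.recordreader=true \
        -D hadoop.pipes.java.recordwriter=true \
        -program /hdfs/path/to/launcher \
        -input input_dir -output output_dir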

    If you're interested in more advanced features, such as the automatic distribution of whole Python packages (even Pydoop itself!) to the cluster nodes, we have a guide for that in the Pydoop docs:

    http://pydoop.sourceforge.net/docs/self_contained.html

    Simone

     
    Last edit: Simone Leo 2013-03-15
  • Jaganadh G
    2013-04-09

    @Simone :
    I solved the issue. The steps I followed:
    1) Included the supporting script's code directly in my MR script.
    2) Reading the sklearn joblib model from HDFS was a second issue, so I pickled the model, used hdfs.read to read the pickled model, and un-pickled it (see the sketch below).
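
    A minimal sketch of step 2, using pydoop.hdfs with a placeholder model path:

    import pickle
    import pydoop.hdfs as hdfs

    MODEL_PATH = "/user/jaganadh/model.pkl"  # placeholder HDFS path

    def load_model():
        # read the pickled model from HDFS and un-pickle it
        f = hdfs.open(MODEL_PATH)  # default mode is "r"
        try:
            return pickle.loads(f.read())
        finally:
            f.close()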

    Thanks for the support.

    Best regards
    Jaggu