I've been trying to get pydoop 0.6.4 to work with CDH4. After fixing setup.py, I was able to get it to compile and install cleanly; however, I am unable to get HDFS to connect correctly. This is on CentOS 6.3.
-bash-4.1$ python
Python 2.7.3 (default, Sep 20 2012, 22:44:26)
on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pydoop.hdfs as pyhdfs
>>> fs = pyhdfs.fs.hdfs()
12/09/25 11:35:30 ERROR security.UserGroupInformation: PriviledgedActionException as:prdps (auth:SIMPLE) cause:java.io.IOException: No FileSystem for scheme: hdfs
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2138)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2145)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2184)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2166)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:302)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:148)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:146)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:146)
Call to org.apache.hadoop.fs.Filesystem::get(URI, Configuration) failed!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/py273/lib/python2.7/site-packages/pydoop/hdfs/fs.py", line 119, in __init__
h, p, u, fs = _get_connection_info(host, port, user)
File "/usr/local/py273/lib/python2.7/site-packages/pydoop/hdfs/fs.py", line 59, in _get_connection_info
fs = hdfs_ext.hdfs_fs(host, port, user)
IOError: Cannot connect to default
>>>
I get the same error when I put in a hostname/port (a rough sketch of that attempt is below). However, the command-line hadoop commands work fine…
-bash-4.1$ hadoop fs -ls hdfs://myhost/
Found 2 items
drwxr-xr-x - hdfs supergroup 0 2012-09-21 11:30 hdfs://myhost/system
drwxrwxrwt - hdfs supergroup 0 2012-06-07 17:56 hdfs://myhost/tmp
-bash-4.1$
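For reference, this is roughly what the explicit hostname/port attempt looked like; "myhost" and 8020 are just placeholders for my NameNode (8020 being the usual CDH4 default), not anything pydoop requires:
>>> import pydoop.hdfs as pyhdfs
>>> fs = pyhdfs.fs.hdfs("myhost", 8020)  # fails the same way: "No FileSystem for scheme: hdfs"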
Ideas?
I've figured this out. Turns out when pydoop was setting up its CLASSPATH correctly, it was not including hadoop-hdfs-2.0.0-cdh4.0.1.jar, which isn't in /usr/lib/hadoop. So the fix was to symlink /usr/lib/hadoop/client/hadoop-hdfs-2.0.0-cdh4.0.1.jar into /usr/lib/hadoop.
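In case it helps anyone else, the workaround was essentially a one-line symlink (the jar version will match whichever CDH4 release is installed, and you'll likely need root to write into /usr/lib/hadoop):
-bash-4.1$ sudo ln -s /usr/lib/hadoop/client/hadoop-hdfs-2.0.0-cdh4.0.1.jar /usr/lib/hadoop/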
Err, I should read what I type before hitting send. pydoop was *not* setting up its CLASSPATH correctly… :)
Thanks liam821. Patches are always welcome :-) In the meantime, we opened a bug report for this issue: https://sourceforge.net/tracker/?func=detail&aid=3571867&group_id=272620&atid=1158938