From: Mehmet B. <meh...@oi...> - 2013-02-26 00:03:55
Hello (Sriram?),

I am a scientific computing consultant at Georgia Tech, currently exploring several options for making Hadoop available to our users. I really like the simplicity and flexibility of myHadoop, but I could not get it to work properly, and I would greatly appreciate your help.

As a proof of concept, I am using directories on shared network volumes (NFS-mounted locations), since we have some diskless nodes in the cluster. I am setting:

    export MY_HADOOP_HOME="$HOME/Programs/myHadoop-0.2a"
    export HADOOP_HOME="$HOME/Programs/hadoop-0.20.2"
    export HADOOP_DATA_DIR="$HOME/hadoop-data-$HOSTNAME"
    export HADOOP_LOG_DIR="$HOME/hadoop-log"

Given that all of these directories are visible from all compute nodes, will this create a conflict? The data directory was in fact creating a conflict, so I needed to tag it with the hostname, but what about the others?

I launch the job from the PBS script with:

    $MY_HADOOP_HOME/bin/pbs-configure.sh -n 4 -c $HADOOP_CONF_DIR -p -d $HOME/HDFS

and I get these errors while connecting to the namenode:

    13/02/25 16:33:16 INFO ipc.Client: Retrying connect to server: iw-h34-1.pace.gatech.edu/172.26.74.75:9000. Already tried 9 time(s).
    Bad connection to FS. command aborted.

I can provide the logs and the PBS script, but I first wanted to ask whether there is an obvious mistake I am making here, such as using a network-attached directory where the directory needs to be local to the datanode.

Thanks a lot in advance.

-Mehmet
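In case it helps to show what I would try next: below is a rough sketch of how I could point HADOOP_DATA_DIR at node-local scratch space from the PBS script instead of NFS, if that turns out to be the requirement. The /tmp/$USER path is only my assumption for a node-local location (it is not taken from the myHadoop documentation), and the launch line is the same one I already use above.

    # Sketch only: put the HDFS data directory on node-local scratch,
    # keeping logs and the persistent HDFS image on NFS.
    # /tmp/$USER is an assumed local path, not from the myHadoop docs.
    export MY_HADOOP_HOME="$HOME/Programs/myHadoop-0.2a"
    export HADOOP_HOME="$HOME/Programs/hadoop-0.20.2"
    export HADOOP_DATA_DIR="/tmp/$USER/hadoop-data"      # node-local; the same path resolves to different local disks on each node
    export HADOOP_LOG_DIR="$HOME/hadoop-log-$HOSTNAME"   # per-node log directory on NFS
    mkdir -p "$HADOOP_DATA_DIR" "$HADOOP_LOG_DIR"

    # Same launch line as above; -p/-d keep the persistent HDFS copy on NFS
    $MY_HADOOP_HOME/bin/pbs-configure.sh -n 4 -c $HADOOP_CONF_DIR -p -d $HOME/HDFS

Would something along these lines be the intended setup, or does myHadoop also support keeping the data directory on shared storage?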