From: Sriram K. <sr...@sd...> - 2011-03-25 00:55:52
Souris,

Please use this mailing list for questions of this nature. Responses are inline below.

> i have downloaded myHadoop to use it in a grid cluster using
> torque...
>
> we have a node where we log in and submit jobs with qsub
> (submit node --> from where i will run pbs-example.sh)...
> qsub then allocates other nodes from the cluster (the worker
> nodes..the initial submit node not included)...
> i have the following questions:
> 1) i must copy the hadoop-0.20.2 on all worker nodes right?

Yes, Hadoop does need to be available on all the compute nodes. The simplest way is to make it available from a shared directory on something like NFS.

> 2) myHadoop-0.2a must be on the submit node only or on the
> submit node and on all of the worker nodes??

myHadoop itself doesn't have to be on all nodes. However, the HADOOP_CONF_DIR must be visible on all nodes.

> 3) how do i start a job?? i run pbs-example.sh or qsub
> pbs-example.sh from the submit node? if i run qsub
> pbs-example.sh it will run start-all.sh script for hadoop on all
> nodes which is not correct.. it must be done only in the
> master node (or at least i think so)... also the master node
> will not be on the submit node where i run
> pbs-example.sh.... maybe i do something really silly...

You should run "qsub pbs-example.sh". The script ensures that start-all.sh is run only on the master node.

> here is the output i get (not all cause i kill the job with
> ctrl-c):
>
> out:
>
> Resources : cput=196:00:00
> neednodes=wn001.grid.tuc.gr+wn002.grid.tuc.gr+wn003.grid.tuc.gr
> +wn004.grid.tuc.gr
> nodes=4:ppn=1 walltime=218:00:00
> Walltime : 218:00:00
> Node_list :
> wn001.grid.tuc.gr+wn002.grid.tuc.gr+wn003.grid.tuc.gr
> +wn004.grid.tuc.gr,nodes=4:ppn=1,walltime=218:00:00
>
>
> Start all Hadoop daemons
> starting namenode, logging to
> /tmp/hadoop-test-dir/log-dir/hadoop-asouris-namenode-
> wn001.grid.tuc.gr.out
> wn002.grid.tuc.gr: Permission denied, please try again.
> wn002.grid.tuc.gr: Permission denied, please try again.
> wn002.grid.tuc.gr: Permission denied
> (publickey,gssapi-with-mic,password).
> wn004.grid.tuc.gr: Permission denied, please try again.
> wn004.grid.tuc.gr: Permission denied, please try again.
> wn004.grid.tuc.gr: Permission denied
> (publickey,gssapi-with-mic,password).
> wn003.grid.tuc.gr: Permission denied, please try again.
> wn003.grid.tuc.gr: Permission denied, please try again.
> wn003.grid.tuc.gr: Permission denied
> (publickey,gssapi-with-mic,password).
> wn001.grid.tuc.gr: Permission denied, please try again.
> wn001.grid.tuc.gr: Permission denied, please try again.
> wn001.grid.tuc.gr: Permission denied
> (publickey,gssapi-with-mic,password).
> wn001.grid.tuc.gr: Permission denied, please try again.
> wn001.grid.tuc.gr: Permission denied, please try again.
> wn001.grid.tuc.gr: Permission denied
> (publickey,gssapi-with-mic,password).
> starting jobtracker, logging to
> /tmp/hadoop-test-dir/log-dir/hadoop-asouris-jobtracker-
> wn001.grid.tuc.gr.out

Are you sure that you can ssh to each of the individual nodes without a password? The Hadoop daemons are spawned over SSH, so you must be able to log in to every node, including the master node itself, without being prompted for a password.

Thanks,
Sriram
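
One way to check the password-less SSH requirement before Hadoop is started is sketched below. This is only an illustration: check_ssh_nodes is a hypothetical helper name and is not part of myHadoop. It assumes OpenSSH (for the BatchMode and ConnectTimeout options) and, when run inside a Torque job, the PBS_NODEFILE variable that lists the allocated nodes.

```shell
#!/bin/bash
# Sketch: verify password-less SSH to every node before starting Hadoop.
# check_ssh_nodes is a hypothetical helper, not something myHadoop provides.

# Usage: check_ssh_nodes NODEFILE
# Returns 0 only if every unique host in NODEFILE accepts a
# non-interactive login; prints one OK/FAIL line per host.
check_ssh_nodes() {
    local nodefile="$1" host failed=0
    for host in $(sort -u "$nodefile"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null 2>/dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Inside a Torque job, PBS_NODEFILE points at the list of allocated nodes.
if [ -n "${PBS_NODEFILE:-}" ]; then
    check_ssh_nodes "$PBS_NODEFILE" \
        || echo "Fix SSH keys before running start-all.sh" >&2
fi
```

If any host reports FAIL, the "Permission denied" messages above are the expected result when start-all.sh runs; setting up key-based login for that host (for example with ssh-keygen and an authorized_keys entry) should be done first.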