From: Glenn K. L. <gle...@gm...> - 2015-01-21 02:44:35
Ruhua,

It looks like you commented out the line in your job script that calls
$HADOOP_HOME/bin/start-all.sh, so it isn't clear to me where you are actually
starting the job tracker in this script. The log file you sent does say the
job tracker is starting, though. Are you sure the log corresponds to the
script you sent?

If I had to guess, I'd say you started the persistent-mode HDFS in one job
script and are trying to use that same HDFS in a second job script. However,
you would still need to run start-all.sh at the beginning of the second
script, since the stop-all.sh at the end of your first script shuts
everything (including the job tracker) down.

Glenn

On Tuesday, January 20, 2015, Ruhua Jiang <ruh...@gm...> wrote:
> Hello
>
> I am trying to run Hadoop (1.2.1) on top of an HPC infrastructure using
> myHadoop (0.30). The HPC cluster uses SLURM.
> First I tried the word-count example in non-persistent mode (the script
> was based on Dr. Lockwood's example code, with some modifications of
> mine), and the result looked good.
>
> However, when I try to run persistent mode, there are some problems. We
> are using GPFS. Here is the script:
>
> #!/bin/bash
> ################################################################################
> # slurm.sbatch - A sample submit script for SLURM that illustrates how to
> #   spin up a Hadoop cluster for a map/reduce task using myHadoop
> #
> # Glenn K. Lockwood, San Diego Supercomputer Center            February 2014
> ################################################################################
> #SBATCH -p Westmere
> #SBATCH -n 4
> #SBATCH --ntasks-per-node=1
> #SBATCH -t 1:00:00
>
> ### If these aren't already in your environment (e.g., .bashrc), you must
> ### define them.  We assume hadoop and myHadoop were installed in
> ### $HOME/hadoop-stack
> export HADOOP_HOME=$HOME/hadoop-stack/hadoop-1.2.1
> export PATH=$HADOOP_HOME/bin:$HOME/hadoop-stack/myhadoop-0.30/bin:$PATH
> export JAVA_HOME=/usr
>
> export HADOOP_CONF_DIR=$HOME/hadoop/conf/hadoop-conf.$SLURM_JOBID
> export MH_SCRATCH_DIR=/tmp/$USER/$SLURM_JOBID
> export MH_PERSIST_DIR=$HOME/hadoop/hdfs
> myhadoop-configure.sh -s $MH_SCRATCH_DIR -p $MH_PERSIST_DIR
>
> if [ ! -f ./pg2701.txt ]; then
>     echo "*** Retrieving some sample input data"
>     wget 'http://www.gutenberg.org/cache/epub/2701/pg2701.txt'
> fi
>
> ##$HADOOP_HOME/bin/start-all.sh
> $HADOOP_HOME/bin/hadoop namenode
> $HADOOP_HOME/bin/hadoop datanode
> $HADOOP_HOME/bin/hadoop dfs -mkdir data
> $HADOOP_HOME/bin/hadoop dfs -put ./pg2701.txt data/
> $HADOOP_HOME/bin/hadoop dfs -ls data
> $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output
> $HADOOP_HOME/bin/hadoop dfs -ls wordcount-output
> $HADOOP_HOME/bin/hadoop dfs -get wordcount-output ./
>
> $HADOOP_HOME/bin/stop-all.sh
>
> myhadoop-cleanup.sh
>
> Here is the log:
> ===
> myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1
> myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4128
> myHadoop: Using JAVA_HOME=/usr
> myHadoop: Generating Hadoop configuration in directory in
> /home/hpc-ruhua/hadoop/conf/hadoop-conf.4128...
> myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS
> state...
> myHadoop: Designating cn53 as master node (namenode, secondary namenode,
> and jobtracker)
> myHadoop: The following nodes will be slaves (datanode, tasktracer):
>   cn53
>   cn54
>   cn55
>   cn56
> Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4128/hdfs_data on cn53
> Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4128/hdfs_data on cn54
> Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4128/hdfs_data on cn55
> Warning: Permanently added 'cn55,192.168.100.55' (RSA) to the list of known hosts.
> Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4128/hdfs_data on cn56
> Warning: Permanently added 'cn56,192.168.100.56' (RSA) to the list of known hosts.
> starting namenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-namenode-cn53.out
> cn53: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn53.out
> cn54: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn54.out
> cn55: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn55.out
> cn56: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn56.out
> cn53: starting secondarynamenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-secondarynamenode-cn53.out
> starting jobtracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-jobtracker-cn53.out
> cn53: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn53.out
> cn56: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn56.out
> cn55: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn55.out
> cn54: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn54.out
> mkdir: cannot create directory data: File exists
> put: Target data/pg2701.txt already exists
> Found 1 items
> -rw-r--r--   3 hpc-ruhua supergroup   0 2015-01-07 00:09 /user/hpc-ruhua/data/pg2701.txt
> 15/01/14 12:21:08 ERROR security.UserGroupInformation: PriviledgedActionException as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING
>   at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199)
>   at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
>
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING
>   at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199)
>   at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
>
>   at org.apache.hadoop.ipc.Client.call(Client.java:1113)
>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>   at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
>   at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
>   at org.apache.hadoop.examples.WordCount.main(WordCount.java:82)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> ls: Cannot access wordcount-output: No such file or directory.
> get: null
> stopping jobtracker
> cn54: stopping tasktracker
> cn55: stopping tasktracker
> cn53: stopping tasktracker
> cn56: stopping tasktracker
> stopping namenode
> cn53: no datanode to stop
> cn54: no datanode to stop
> cn56: no datanode to stop
> cn55: no datanode to stop
> ===
> The key error is "ERROR security.UserGroupInformation:
> PriviledgedActionException as:hpc-ruhua
> cause:org.apache.hadoop.ipc.RemoteException:"; it says the JobTracker is
> not yet running.
>
> Any idea about that? Thanks
>
> Best,
> Ruhua Jiang
> Graduate Student at University of Connecticut
> HORNET Cluster Technical Support
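
A minimal sketch of what the second (persistent-mode) job script could look
like, following Glenn's suggestion above: re-run myhadoop-configure.sh
against the same MH_PERSIST_DIR, then call start-all.sh before submitting.
The JobTracker readiness loop and the wordcount-output-2 path are
illustrative assumptions, not taken from the original scripts.

    #!/bin/bash
    #SBATCH -p Westmere
    #SBATCH -n 4
    #SBATCH --ntasks-per-node=1
    #SBATCH -t 1:00:00

    export HADOOP_HOME=$HOME/hadoop-stack/hadoop-1.2.1
    export PATH=$HADOOP_HOME/bin:$HOME/hadoop-stack/myhadoop-0.30/bin:$PATH
    export JAVA_HOME=/usr
    export HADOOP_CONF_DIR=$HOME/hadoop/conf/hadoop-conf.$SLURM_JOBID
    export MH_SCRATCH_DIR=/tmp/$USER/$SLURM_JOBID
    export MH_PERSIST_DIR=$HOME/hadoop/hdfs   # same persist dir as the first job

    # Regenerate the per-job config against the persisted HDFS state.
    myhadoop-configure.sh -s $MH_SCRATCH_DIR -p $MH_PERSIST_DIR

    # Restart all daemons: the first job's stop-all.sh shut down the
    # namenode, datanodes, jobtracker, and tasktrackers.
    $HADOOP_HOME/bin/start-all.sh

    # Illustrative wait loop (an assumption, not from the thread): poll until
    # the JobTracker answers, instead of submitting while it still reports
    # "JobTracker is not yet RUNNING".
    for i in $(seq 1 30); do
        $HADOOP_HOME/bin/hadoop job -list >/dev/null 2>&1 && break
        sleep 5
    done

    # The data/ directory already exists in the persisted HDFS, so reuse it
    # and write to a fresh output path.
    $HADOOP_HOME/bin/hadoop dfs -ls data
    $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output-2
    $HADOOP_HOME/bin/hadoop dfs -get wordcount-output-2 ./

    $HADOOP_HOME/bin/stop-all.sh
    myhadoop-cleanup.sh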