From: Ruhua J. <ruh...@gm...> - 2015-01-20 17:25:29
Sorry, that log corresponds to $HADOOP_HOME/bin/start-all.sh. I tried to run the following commands instead of $HADOOP_HOME/bin/start-all.sh (a rough sketch of an alternative launch sequence is appended after the quoted message below):

$HADOOP_HOME/bin/hadoop namenode
$HADOOP_HOME/bin/hadoop datanode

Here is the log:

myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1
myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4178
myHadoop: Using JAVA_HOME=/usr
myHadoop: Generating Hadoop configuration in directory in /home/hpc-ruhua/hadoop/conf/hadoop-conf.4178...
myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS state...
myHadoop: Designating cn53 as master node (namenode, secondary namenode, and jobtracker)
myHadoop: The following nodes will be slaves (datanode, tasktracer):
cn53
cn54
cn55
cn56
Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4178/hdfs_data on cn53
Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4178/hdfs_data on cn54
Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4178/hdfs_data on cn55
Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4178/hdfs_data on cn56
15/01/16 15:35:14 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = cn53/192.168.100.53
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_71
************************************************************/
15/01/16 15:35:14 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
15/01/16 15:35:14 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
15/01/16 15:35:14 INFO impl.MetricsSystemImpl: NameNode metrics system started
15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source NameNode registered.
15/01/16 15:35:14 INFO util.GSet: Computing capacity for map BlocksMap
15/01/16 15:35:14 INFO util.GSet: VM type = 64-bit
15/01/16 15:35:14 INFO util.GSet: 2.0% max memory = 932184064
15/01/16 15:35:14 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/01/16 15:35:14 INFO util.GSet: recommended=2097152, actual=2097152
15/01/16 15:35:15 INFO namenode.FSNamesystem: fsOwner=hpc-ruhua
15/01/16 15:35:15 INFO namenode.FSNamesystem: supergroup=supergroup
15/01/16 15:35:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/01/16 15:35:15 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
15/01/16 15:35:15 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
15/01/16 15:35:15 INFO namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
15/01/16 15:35:15 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
15/01/16 15:35:15 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/01/16 15:35:15 INFO common.Storage: Start loading image file /tmp/hpc-ruhua/4178/namenode_data/current/fsimage
15/01/16 15:35:15 INFO common.Storage: Number of files = 28
15/01/16 15:35:15 INFO common.Storage: Number of files under construction = 1
15/01/16 15:35:15 INFO common.Storage: Image file /tmp/hpc-ruhua/4178/namenode_data/current/fsimage of size 2996 bytes loaded in 0 seconds.
15/01/16 15:35:15 INFO namenode.FSEditLog: Start loading edits file /tmp/hpc-ruhua/4178/namenode_data/current/edits
15/01/16 15:35:15 INFO namenode.FSEditLog: Invalid opcode, reached end of edit log Number of transactions found: 32. Bytes read: 2579
15/01/16 15:35:15 INFO namenode.FSEditLog: Start checking end of edit log (/tmp/hpc-ruhua/4178/namenode_data/current/edits) ...
15/01/16 15:35:15 INFO namenode.FSEditLog: Checked the bytes after the end of edit log (/tmp/hpc-ruhua/4178/namenode_data/current/edits):
15/01/16 15:35:15 INFO namenode.FSEditLog: Padding position = 2579 (-1 means padding not found)
15/01/16 15:35:15 INFO namenode.FSEditLog: Edit log length = 1048580
15/01/16 15:35:15 INFO namenode.FSEditLog: Read length = 2579
15/01/16 15:35:15 INFO namenode.FSEditLog: Corruption length = 0
15/01/16 15:35:15 INFO namenode.FSEditLog: Toleration length = 0 (= dfs.namenode.edits.toleration.length)
15/01/16 15:35:15 INFO namenode.FSEditLog: Summary: |---------- Read=2579 ----------|-- Corrupt=0 --|-- Pad=1046001 --|
15/01/16 15:35:15 INFO namenode.FSEditLog: Edits file /tmp/hpc-ruhua/4178/namenode_data/current/edits of size 1048580 edits # 32 loaded in 0 seconds.
15/01/16 15:35:15 INFO common.Storage: Image file /tmp/hpc-ruhua/4178/namenode_data/current/fsimage of size 3745 bytes saved in 0 seconds.
15/01/16 15:35:15 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hpc-ruhua/4178/namenode_data/current/edits
15/01/16 15:35:15 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hpc-ruhua/4178/namenode_data/current/edits
15/01/16 15:35:16 INFO namenode.NameCache: initialized with 0 entries 0 lookups
15/01/16 15:35:16 INFO namenode.FSNamesystem: Finished loading FSImage in 1162 msecs
15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.safemode.threshold.pct = 0.9990000128746033
15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.safemode.extension = 30000
15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of blocks excluded by safe block count: 0 total blocks: 0 and thus the safe blocks: 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: Total number of blocks = 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of invalid blocks = 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of under-replicated blocks = 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of over-replicated blocks = 0
15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Safe mode termination scan for invalid, over- and under-replicated blocks completed in 7 msec
15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Leaving safe mode after 1 secs
15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
15/01/16 15:35:16 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
15/01/16 15:35:16 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
15/01/16 15:35:16 INFO namenode.FSNamesystem: ReplicateQueue QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec
15/01/16 15:35:16 INFO namenode.FSNamesystem: ReplicateQueue QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec processing time, 0 msec clock time, 1 cycles
15/01/16 15:35:16 INFO namenode.FSNamesystem: InvalidateQueue QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec
15/01/16 15:35:16 INFO namenode.FSNamesystem: InvalidateQueue QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec processing time, 0 msec clock time, 1 cycles
15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source FSNamesystemMetrics registered.
15/01/16 15:35:16 INFO ipc.Server: Starting SocketReader
15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort54310 registered.
15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort54310 registered.
15/01/16 15:35:16 INFO namenode.NameNode: Namenode up at: cn53/192.168.100.53:54310
15/01/16 15:35:16 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
15/01/16 15:35:16 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
15/01/16 15:35:16 INFO http.HttpServer: dfs.webhdfs.enabled = false
15/01/16 15:35:16 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
Opening the listener on 50070
15/01/16 15:35:16 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
15/01/16 15:35:16 INFO http.HttpServer: Jetty bound to port 50070
15/01/16 15:35:16 INFO mortbay.log: jetty-6.1.26
15/01/16 15:35:16 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50070
15/01/16 15:35:16 INFO namenode.NameNode: Web-server up at: 0.0.0.0:50070
15/01/16 15:35:16 INFO ipc.Server: IPC Server Responder: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server listener on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 0 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 1 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 2 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 3 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 4 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 5 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 6 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 8 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 7 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 9 on 54310: starting

Best,
Ruhua

On Jan 20, 2015, at 12:14 PM, Ruhua Jiang <ruh...@gm...> wrote:

> Hello,
>
> I am trying to run Hadoop (1.2.1) on top of an HPC infrastructure using myHadoop (0.30). The HPC cluster uses SLURM.
> First I tried the word-count example in non-persist mode, using a script I adapted from Dr. Lockwood's example code, and the result looked good.
>
> However, when I try to run it in persist mode, there are some problems. We are using GPFS. Here is the script:
>
> #!/bin/bash
> ################################################################################
> # slurm.sbatch - A sample submit script for SLURM that illustrates how to
> # spin up a Hadoop cluster for a map/reduce task using myHadoop
> #
> # Glenn K. Lockwood, San Diego Supercomputer Center February 2014
> ################################################################################
> #SBATCH -p Westmere
> #SBATCH -n 4
> #SBATCH --ntasks-per-node=1
> #SBATCH -t 1:00:00
>
> ### If these aren't already in your environment (e.g., .bashrc), you must define
> ### them. We assume hadoop and myHadoop were installed in $HOME/hadoop-stack
> export HADOOP_HOME=$HOME/hadoop-stack/hadoop-1.2.1
> export PATH=$HADOOP_HOME/bin:$HOME/hadoop-stack/myhadoop-0.30/bin:$PATH:$PATH
> export JAVA_HOME=/usr
>
> export HADOOP_CONF_DIR=$HOME/hadoop/conf/hadoop-conf.$SLURM_JOBID
> export MH_SCRATCH_DIR=/tmp/$USER/$SLURM_JOBID
> export MH_PERSIST_DIR=$HOME/hadoop/hdfs
> myhadoop-configure.sh -s $MH_SCRATCH_DIR -p $MH_PERSIST_DIR
>
> if [ ! -f ./pg2701.txt ]; then
>     echo "*** Retrieving some sample input data"
>     wget 'http://www.gutenberg.org/cache/epub/2701/pg2701.txt'
> fi
>
> ##$HADOOP_HOME/bin/start-all.sh
> $HADOOP_HOME/bin/hadoop namenode
> $HADOOP_HOME/bin/hadoop datanode
> $HADOOP_HOME/bin/hadoop dfs -mkdir data
> $HADOOP_HOME/bin/hadoop dfs -put ./pg2701.txt data/
> $HADOOP_HOME/bin/hadoop dfs -ls data
> $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output
> $HADOOP_HOME/bin/hadoop dfs -ls wordcount-output
> $HADOOP_HOME/bin/hadoop dfs -get wordcount-output ./
>
> $HADOOP_HOME/bin/stop-all.sh
>
> myhadoop-cleanup.sh
>
> Here is the log:
> ===
> myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1
> myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4128
> myHadoop: Using JAVA_HOME=/usr
> myHadoop: Generating Hadoop configuration in directory in /home/hpc-ruhua/hadoop/conf/hadoop-conf.4128...
> myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS state...
> myHadoop: Designating cn53 as master node (namenode, secondary namenode, and jobtracker)
> myHadoop: The following nodes will be slaves (datanode, tasktracer):
> cn53
> cn54
> cn55
> cn56
> Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4128/hdfs_data on cn53
> Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4128/hdfs_data on cn54
> Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4128/hdfs_data on cn55
> Warning: Permanently added 'cn55,192.168.100.55' (RSA) to the list of known hosts.
> Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4128/hdfs_data on cn56
> Warning: Permanently added 'cn56,192.168.100.56' (RSA) to the list of known hosts.
> starting namenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-namenode-cn53.out
> cn53: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn53.out
> cn54: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn54.out
> cn55: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn55.out
> cn56: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn56.out
> cn53: starting secondarynamenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-secondarynamenode-cn53.out
> starting jobtracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-jobtracker-cn53.out
> cn53: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn53.out
> cn56: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn56.out
> cn55: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn55.out
> cn54: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn54.out
> mkdir: cannot create directory data: File exists
> put: Target data/pg2701.txt already exists
> Found 1 items
> -rw-r--r-- 3 hpc-ruhua supergroup 0 2015-01-07 00:09 /user/hpc-ruhua/data/pg2701.txt
> 15/01/14 12:21:08 ERROR security.UserGroupInformation: PriviledgedActionException as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING
> at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199)
> at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
>
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING
> at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199)
> at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1113)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
> at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
> at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
> at org.apache.hadoop.examples.WordCount.main(WordCount.java:82)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> ls: Cannot access wordcount-output: No such file or directory.
> get: null
> stopping jobtracker
> cn54: stopping tasktracker
> cn55: stopping tasktracker
> cn53: stopping tasktracker
> cn56: stopping tasktracker
> stopping namenode
> cn53: no datanode to stop
> cn54: no datanode to stop
> cn56: no datanode to stop
> cn55: no datanode to stop
> ===
> The error is “ERROR security.UserGroupInformation: PriviledgedActionException as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException:”; it says the JobTracker is not yet running.
>
> Any idea about that? Thanks.
>
> Best,
> Ruhua Jiang
> Graduate Student at the University of Connecticut
> HORNET Cluster Technical Support
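
---

As mentioned at the top, here is a rough sketch of an alternative launch sequence. One thing that may matter: "hadoop namenode" and "hadoop datanode" run the daemons in the foreground, so the namenode log above comes from a process that never returns control to the batch script, and the datanode/dfs/wordcount lines after it would not run until it exits. The sketch below is untested here; it only uses the helper scripts shipped with Hadoop 1.2.1 and the variables already exported in the script above, and the ordering is an assumption, not the myHadoop-recommended procedure.

# Sketch: start the NameNode on this host and DataNodes on the hosts in
# the slaves file, in the background, instead of blocking on "hadoop namenode".
$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
$HADOOP_HOME/bin/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode

# Block until the NameNode has left safe mode, then check that the
# DataNodes have actually registered before putting any files into HDFS.
$HADOOP_HOME/bin/hadoop dfsadmin -safemode wait
$HADOOP_HOME/bin/hadoop dfsadmin -report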
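On the "JobTracker is not yet RUNNING" error from the persist-mode run: in Hadoop 1.x the JobTracker does not report RUNNING until it has finished initializing (which includes setting up its system directory in HDFS), so submitting the wordcount job immediately after start-all.sh can hit JobTrackerNotYetInitializedException. A sketch of one way to wait it out before submitting; the 5-second poll interval is arbitrary, and it assumes "hadoop job -list" exits non-zero while the JobTracker is still unreachable:

# Poll until the JobTracker answers the job-client RPC, then submit the job.
until $HADOOP_HOME/bin/hadoop job -list >/dev/null 2>&1; do
    sleep 5
done
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output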