From: Glenn K. L. <gle...@gm...> - 2015-01-21 02:44:35
Ruhua,

It looks like you commented out the line in your job script that calls $HADOOP_HOME/bin/start-all.sh, so it isn't clear to me where you are actually starting the job tracker in this script. The log file you sent does say the job tracker is starting, though. Are you sure the log corresponds to the script you sent?

If I had to guess, I'd say you started the persistent-mode HDFS in one job script and are trying to use that same HDFS in a second job script. However, you would still need to run start-all.sh at the beginning of the second script, since the stop-all.sh at the end of your first script shuts everything (including the job tracker) down.

Glenn

On Tuesday, January 20, 2015, Ruhua Jiang <ruh...@gm...> wrote:
> Hello
>
> I am trying to run Hadoop (1.2.1) on top of a HPC infrastructure using
> myHadoop(0.30). The HPC uses SLURM.
> [rest of quoted message clipped; the full text, including the persistent-mode
> script, the myHadoop startup log, and the "JobTracker is not yet RUNNING"
> stack trace, appears in Ruhua's post of 2015-01-20 17:14:56 below]
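For reference, a minimal sketch of the pattern Glenn describes: a follow-up job script that reuses the persisted HDFS state. The paths, environment variables, and the myhadoop-configure.sh invocation are taken from Ruhua's script; everything else (including the second output directory name) is an untested assumption about how such a second job might look, not an official myHadoop example.

#!/bin/bash
#SBATCH -p Westmere
#SBATCH -n 4
#SBATCH --ntasks-per-node=1
#SBATCH -t 1:00:00

# Same environment as the first job script (assumed unchanged).
export HADOOP_HOME=$HOME/hadoop-stack/hadoop-1.2.1
export PATH=$HADOOP_HOME/bin:$HOME/hadoop-stack/myhadoop-0.30/bin:$PATH
export JAVA_HOME=/usr
export HADOOP_CONF_DIR=$HOME/hadoop/conf/hadoop-conf.$SLURM_JOBID
export MH_SCRATCH_DIR=/tmp/$USER/$SLURM_JOBID
export MH_PERSIST_DIR=$HOME/hadoop/hdfs

# Regenerate the per-job configuration; -p points at the same persist
# directory, so the HDFS state written by the earlier job is relinked
# into this job's node-local scratch space.
myhadoop-configure.sh -s $MH_SCRATCH_DIR -p $MH_PERSIST_DIR

# Glenn's point: the daemons must be started again in every job script,
# because the stop-all.sh at the end of the previous job shut them down.
$HADOOP_HOME/bin/start-all.sh

# Work against the data already in the persisted HDFS.
$HADOOP_HOME/bin/hadoop dfs -ls data
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output2

$HADOOP_HOME/bin/stop-all.sh
myhadoop-cleanup.sh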
From: Ruhua J. <ruh...@gm...> - 2015-01-20 17:25:29
Sorry, the log corresponds to $HADOOP_HOME/bin/start-all.sh I tried to run following command instead of $HADOOP_HOME/bin/start-all.sh $HADOOP_HOME/bin/hadoop name node $HADOOP_HOME/bin/hadoop datanode Here is the log: myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1 myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4178 myHadoop: Using JAVA_HOME=/usr myHadoop: Generating Hadoop configuration in directory in /home/hpc-ruhua/hadoop/conf/hadoop-conf.4178... myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS state... myHadoop: Designating cn53 as master node (namenode, secondary namenode, and jobtracker) myHadoop: The following nodes will be slaves (datanode, tasktracer): cn53 cn54 cn55 cn56 Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4178/hdfs_data on cn53 Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4178/hdfs_data on cn54 Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4178/hdfs_data on cn55 Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4178/hdfs_data on cn56 15/01/16 15:35:14 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = cn53/192.168.100.53 STARTUP_MSG: args = [] STARTUP_MSG: version = 1.2.1 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013 STARTUP_MSG: java = 1.7.0_71 ************************************************************/ 15/01/16 15:35:14 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 15/01/16 15:35:14 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 15/01/16 15:35:14 INFO impl.MetricsSystemImpl: NameNode metrics system started 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source ugi registered. 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source jvm registered. 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source NameNode registered. 
15/01/16 15:35:14 INFO util.GSet: Computing capacity for map BlocksMap 15/01/16 15:35:14 INFO util.GSet: VM type = 64-bit 15/01/16 15:35:14 INFO util.GSet: 2.0% max memory = 932184064 15/01/16 15:35:14 INFO util.GSet: capacity = 2^21 = 2097152 entries 15/01/16 15:35:14 INFO util.GSet: recommended=2097152, actual=2097152 15/01/16 15:35:15 INFO namenode.FSNamesystem: fsOwner=hpc-ruhua 15/01/16 15:35:15 INFO namenode.FSNamesystem: supergroup=supergroup 15/01/16 15:35:15 INFO namenode.FSNamesystem: isPermissionEnabled=true 15/01/16 15:35:15 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100 15/01/16 15:35:15 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 15/01/16 15:35:15 INFO namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean 15/01/16 15:35:15 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0 15/01/16 15:35:15 INFO namenode.NameNode: Caching file names occuring more than 10 times 15/01/16 15:35:15 INFO common.Storage: Start loading image file /tmp/hpc-ruhua/4178/namenode_data/current/fsimage 15/01/16 15:35:15 INFO common.Storage: Number of files = 28 15/01/16 15:35:15 INFO common.Storage: Number of files under construction = 1 15/01/16 15:35:15 INFO common.Storage: Image file /tmp/hpc-ruhua/4178/namenode_data/current/fsimage of size 2996 bytes loaded in 0 seconds. 15/01/16 15:35:15 INFO namenode.FSEditLog: Start loading edits file /tmp/hpc-ruhua/4178/namenode_data/current/edits 15/01/16 15:35:15 INFO namenode.FSEditLog: Invalid opcode, reached end of edit log Number of transactions found: 32. Bytes read: 2579 15/01/16 15:35:15 INFO namenode.FSEditLog: Start checking end of edit log (/tmp/hpc-ruhua/4178/namenode_data/current/edits) ... 15/01/16 15:35:15 INFO namenode.FSEditLog: Checked the bytes after the end of edit log (/tmp/hpc-ruhua/4178/namenode_data/current/edits): 15/01/16 15:35:15 INFO namenode.FSEditLog: Padding position = 2579 (-1 means padding not found) 15/01/16 15:35:15 INFO namenode.FSEditLog: Edit log length = 1048580 15/01/16 15:35:15 INFO namenode.FSEditLog: Read length = 2579 15/01/16 15:35:15 INFO namenode.FSEditLog: Corruption length = 0 15/01/16 15:35:15 INFO namenode.FSEditLog: Toleration length = 0 (= dfs.namenode.edits.toleration.length) 15/01/16 15:35:15 INFO namenode.FSEditLog: Summary: |---------- Read=2579 ----------|-- Corrupt=0 --|-- Pad=1046001 --| 15/01/16 15:35:15 INFO namenode.FSEditLog: Edits file /tmp/hpc-ruhua/4178/namenode_data/current/edits of size 1048580 edits # 32 loaded in 0 seconds. 15/01/16 15:35:15 INFO common.Storage: Image file /tmp/hpc-ruhua/4178/namenode_data/current/fsimage of size 3745 bytes saved in 0 seconds. 
15/01/16 15:35:15 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hpc-ruhua/4178/namenode_data/current/edits 15/01/16 15:35:15 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hpc-ruhua/4178/namenode_data/current/edits 15/01/16 15:35:16 INFO namenode.NameCache: initialized with 0 entries 0 lookups 15/01/16 15:35:16 INFO namenode.FSNamesystem: Finished loading FSImage in 1162 msecs 15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.safemode.threshold.pct = 0.9990000128746033 15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0 15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.safemode.extension = 30000 15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of blocks excluded by safe block count: 0 total blocks: 0 and thus the safe blocks: 0 15/01/16 15:35:16 INFO namenode.FSNamesystem: Total number of blocks = 0 15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of invalid blocks = 0 15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of under-replicated blocks = 0 15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of over-replicated blocks = 0 15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Safe mode termination scan for invalid, over- and under-replicated blocks completed in 7 msec 15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Leaving safe mode after 1 secs 15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes 15/01/16 15:35:16 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks 15/01/16 15:35:16 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list 15/01/16 15:35:16 INFO namenode.FSNamesystem: ReplicateQueue QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec 15/01/16 15:35:16 INFO namenode.FSNamesystem: ReplicateQueue QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec processing time, 0 msec clock time, 1 cycles 15/01/16 15:35:16 INFO namenode.FSNamesystem: InvalidateQueue QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec 15/01/16 15:35:16 INFO namenode.FSNamesystem: InvalidateQueue QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec processing time, 0 msec clock time, 1 cycles 15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source FSNamesystemMetrics registered. 15/01/16 15:35:16 INFO ipc.Server: Starting SocketReader 15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort54310 registered. 15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort54310 registered. 15/01/16 15:35:16 INFO namenode.NameNode: Namenode up at: cn53/192.168.100.53:54310 15/01/16 15:35:16 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 15/01/16 15:35:16 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) 15/01/16 15:35:16 INFO http.HttpServer: dfs.webhdfs.enabled = false 15/01/16 15:35:16 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. 
Opening the listener on 50070 15/01/16 15:35:16 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070 15/01/16 15:35:16 INFO http.HttpServer: Jetty bound to port 50070 15/01/16 15:35:16 INFO mortbay.log: jetty-6.1.26 15/01/16 15:35:16 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50070 15/01/16 15:35:16 INFO namenode.NameNode: Web-server up at: 0.0.0.0:50070 15/01/16 15:35:16 INFO ipc.Server: IPC Server Responder: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server listener on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 0 on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 1 on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 2 on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 3 on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 4 on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 5 on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 6 on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 8 on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 7 on 54310: starting 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 9 on 54310: starting Best. Ruhua On Jan 20, 2015, at 12:14 PM, Ruhua Jiang <ruh...@gm...> wrote: > Hello > > I am trying to run Hadoop (1.2.1) on top of a HPC infrastructure using myHadoop(0.30). The HPC uses SLURM. > First I tried the word counting example using non-persist mode, here is the script, I did some modification based on example code of Dr. Lockwood. The result seems good. > > However, when I try to run the persist mode, there are some problems. We are using GPFS. Here is the script: > #!/bin/bash > ################################################################################ > # slurm.sbatch - A sample submit script for SLURM that illustrates how to > # spin up a Hadoop cluster for a map/reduce task using myHadoop > # > # Glenn K. Lockwood, San Diego Supercomputer Center February 2014 > ################################################################################ > #SBATCH -p Westmere > #SBATCH -n 4 > #SBATCH --ntasks-per-node=1 > #SBATCH -t 1:00:00 > > ### If these aren't already in your environment (e.g., .bashrc), you must define > ### them. We assume hadoop and myHadoop were installed in $HOME/hadoop-stack > export HADOOP_HOME=$HOME/hadoop-stack/hadoop-1.2.1 > export PATH=$HADOOP_HOME/bin:$HOME/hadoop-stack/myhadoop-0.30/bin:$PATH:$PATH > export JAVA_HOME=/usr > > export HADOOP_CONF_DIR=$HOME/hadoop/conf/hadoop-conf.$SLURM_JOBID > export MH_SCRATCH_DIR=/tmp/$USER/$SLURM_JOBID > export MH_PERSIST_DIR=$HOME/hadoop/hdfs > myhadoop-configure.sh -s $MH_SCRATCH_DIR -p $MH_PERSIST_DIR > > if [ ! 
-f ./pg2701.txt ]; then > echo "*** Retrieving some sample input data" > wget 'http://www.gutenberg.org/cache/epub/2701/pg2701.txt' > fi > > ##$HADOOP_HOME/bin/start-all.sh > $HADOOP_HOME/bin/hadoop namenode > $HADOOP_HOME/bin/hadoop datanode > $HADOOP_HOME/bin/hadoop dfs -mkdir data > $HADOOP_HOME/bin/hadoop dfs -put ./pg2701.txt data/ > $HADOOP_HOME/bin/hadoop dfs -ls data > $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output > $HADOOP_HOME/bin/hadoop dfs -ls wordcount-output > $HADOOP_HOME/bin/hadoop dfs -get wordcount-output ./ > > $HADOOP_HOME/bin/stop-all.sh > > myhadoop-cleanup.sh > > > Here is the log: > === > myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1 > myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4128 > myHadoop: Using JAVA_HOME=/usr > myHadoop: Generating Hadoop configuration in directory in /home/hpc-ruhua/hadoop/conf/hadoop-conf.4128... > myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS state... > myHadoop: Designating cn53 as master node (namenode, secondary namenode, and jobtracker) > myHadoop: The following nodes will be slaves (datanode, tasktracer): > cn53 > cn54 > cn55 > cn56 > Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4128/hdfs_data on cn53 > Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4128/hdfs_data on cn54 > Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4128/hdfs_data on cn55 > Warning: Permanently added 'cn55,192.168.100.55' (RSA) to the list of known hosts. > Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4128/hdfs_data on cn56 > Warning: Permanently added 'cn56,192.168.100.56' (RSA) to the list of known hosts. > starting namenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-namenode-cn53.out > cn53: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn53.out > cn54: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn54.out > cn55: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn55.out > cn56: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn56.out > cn53: starting secondarynamenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-secondarynamenode-cn53.out > starting jobtracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-jobtracker-cn53.out > cn53: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn53.out > cn56: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn56.out > cn55: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn55.out > cn54: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn54.out > mkdir: cannot create directory data: File exists > put: Target data/pg2701.txt already exists > Found 1 items > -rw-r--r-- 3 hpc-ruhua supergroup 0 2015-01-07 00:09 /user/hpc-ruhua/data/pg2701.txt > 15/01/14 12:21:08 ERROR security.UserGroupInformation: PriviledgedActionException as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING > at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199) > at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426) > > org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING > at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199) > at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426) > > at org.apache.hadoop.ipc.Client.call(Client.java:1113) > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) > at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) > at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) > at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source) > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944) > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) > at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:550) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580) > at org.apache.hadoop.examples.WordCount.main(WordCount.java:82) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) > at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:160) > ls: Cannot access wordcount-output: No such file or directory. > get: null > stopping jobtracker > cn54: stopping tasktracker > cn55: stopping tasktracker > cn53: stopping tasktracker > cn56: stopping tasktracker > stopping namenode > cn53: no datanode to stop > cn54: no datanode to stop > cn56: no datanode to stop > cn55: no datanode to stop > === > The error is “ERROR security.UserGroupInformation: PriviledgedActionException as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException:” it says JobTracker is not yet running. > > Any idea about that? Thanks > > Best, > Ruhua Jiang > Graduate Student at University of Connecticut > HORNET Cluster Technical Support |
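One detail worth flagging in the script variant above: "$HADOOP_HOME/bin/hadoop namenode" and "$HADOOP_HOME/bin/hadoop datanode" run those daemons in the foreground on the local node only, so a batch script that calls them this way never reaches the later lines (and never starts the jobtracker). If the goal is to start daemons individually instead of via start-all.sh, Hadoop 1.x ships wrapper scripts that put them in the background; the following is only a sketch, assuming the same HADOOP_CONF_DIR as in Ruhua's script.

# Start each daemon in the background via the Hadoop 1.x wrapper scripts
# instead of running "hadoop namenode" / "hadoop datanode" in the foreground.
$HADOOP_HOME/bin/hadoop-daemon.sh  --config $HADOOP_CONF_DIR start namenode     # master node only
$HADOOP_HOME/bin/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode     # every host in the slaves file
$HADOOP_HOME/bin/hadoop-daemon.sh  --config $HADOOP_CONF_DIR start jobtracker   # master node only
$HADOOP_HOME/bin/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker  # every host in the slaves file

# start-all.sh does essentially the same thing (plus the secondary namenode)
# across the whole node list, which is why the example scripts simply call it.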
From: Ruhua J. <ruh...@gm...> - 2015-01-20 17:14:56
Hello I am trying to run Hadoop (1.2.1) on top of a HPC infrastructure using myHadoop(0.30). The HPC uses SLURM. First I tried the word counting example using non-persist mode, here is the script, I did some modification based on example code of Dr. Lockwood. The result seems good. However, when I try to run the persist mode, there are some problems. We are using GPFS. Here is the script: #!/bin/bash ################################################################################ # slurm.sbatch - A sample submit script for SLURM that illustrates how to # spin up a Hadoop cluster for a map/reduce task using myHadoop # # Glenn K. Lockwood, San Diego Supercomputer Center February 2014 ################################################################################ #SBATCH -p Westmere #SBATCH -n 4 #SBATCH --ntasks-per-node=1 #SBATCH -t 1:00:00 ### If these aren't already in your environment (e.g., .bashrc), you must define ### them. We assume hadoop and myHadoop were installed in $HOME/hadoop-stack export HADOOP_HOME=$HOME/hadoop-stack/hadoop-1.2.1 export PATH=$HADOOP_HOME/bin:$HOME/hadoop-stack/myhadoop-0.30/bin:$PATH:$PATH export JAVA_HOME=/usr export HADOOP_CONF_DIR=$HOME/hadoop/conf/hadoop-conf.$SLURM_JOBID export MH_SCRATCH_DIR=/tmp/$USER/$SLURM_JOBID export MH_PERSIST_DIR=$HOME/hadoop/hdfs myhadoop-configure.sh -s $MH_SCRATCH_DIR -p $MH_PERSIST_DIR if [ ! -f ./pg2701.txt ]; then echo "*** Retrieving some sample input data" wget 'http://www.gutenberg.org/cache/epub/2701/pg2701.txt' fi ##$HADOOP_HOME/bin/start-all.sh $HADOOP_HOME/bin/hadoop namenode $HADOOP_HOME/bin/hadoop datanode $HADOOP_HOME/bin/hadoop dfs -mkdir data $HADOOP_HOME/bin/hadoop dfs -put ./pg2701.txt data/ $HADOOP_HOME/bin/hadoop dfs -ls data $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output $HADOOP_HOME/bin/hadoop dfs -ls wordcount-output $HADOOP_HOME/bin/hadoop dfs -get wordcount-output ./ $HADOOP_HOME/bin/stop-all.sh myhadoop-cleanup.sh Here is the log: === myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1 myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4128 myHadoop: Using JAVA_HOME=/usr myHadoop: Generating Hadoop configuration in directory in /home/hpc-ruhua/hadoop/conf/hadoop-conf.4128... myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS state... myHadoop: Designating cn53 as master node (namenode, secondary namenode, and jobtracker) myHadoop: The following nodes will be slaves (datanode, tasktracer): cn53 cn54 cn55 cn56 Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4128/hdfs_data on cn53 Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4128/hdfs_data on cn54 Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4128/hdfs_data on cn55 Warning: Permanently added 'cn55,192.168.100.55' (RSA) to the list of known hosts. Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4128/hdfs_data on cn56 Warning: Permanently added 'cn56,192.168.100.56' (RSA) to the list of known hosts. 
starting namenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-namenode-cn53.out cn53: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn53.out cn54: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn54.out cn55: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn55.out cn56: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn56.out cn53: starting secondarynamenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-secondarynamenode-cn53.out starting jobtracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-jobtracker-cn53.out cn53: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn53.out cn56: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn56.out cn55: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn55.out cn54: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn54.out mkdir: cannot create directory data: File exists put: Target data/pg2701.txt already exists Found 1 items -rw-r--r-- 3 hpc-ruhua supergroup 0 2015-01-07 00:09 /user/hpc-ruhua/data/pg2701.txt 15/01/14 12:21:08 ERROR security.UserGroupInformation: PriviledgedActionException as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199) at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426) org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199) at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426) at org.apache.hadoop.ipc.Client.call(Client.java:1113) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) at 
org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936) at org.apache.hadoop.mapreduce.Job.submit(Job.java:550) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580) at org.apache.hadoop.examples.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) ls: Cannot access wordcount-output: No such file or directory. get: null stopping jobtracker cn54: stopping tasktracker cn55: stopping tasktracker cn53: stopping tasktracker cn56: stopping tasktracker stopping namenode cn53: no datanode to stop cn54: no datanode to stop cn56: no datanode to stop cn55: no datanode to stop === The error is “ERROR security.UserGroupInformation: PriviledgedActionException as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException:” it says JobTracker is not yet running. Any idea about that? Thanks Best, Ruhua Jiang Graduate Student at University of Connecticut HORNET Cluster Technical Support |
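Separately from the missing start-all.sh question, the "JobTracker is not yet RUNNING" error can also show up as a simple race: the job is submitted immediately after the daemons are launched, before the JobTracker finishes initializing. A common guard is to wait for HDFS to leave safe mode and pause briefly before the first submission; the 30-second sleep below is an arbitrary assumption, not a myHadoop recommendation.

$HADOOP_HOME/bin/start-all.sh

# Block until HDFS has left safe mode, then give the JobTracker a moment
# to finish its own initialization before the first job submission.
$HADOOP_HOME/bin/hadoop dfsadmin -safemode wait
sleep 30

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output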
From: Ruhua J. <ruh...@gm...> - 2015-01-17 17:27:06
Hello I am trying to run Hadoop (1.2.1) on top of a HPC infrastructure using myHadoop(0.30). The HPC uses SLURM. First I tried the word counting example using non-persist mode, here is the script, I did some modification based on example code of Dr. Lockwood. The result seems good. #!/bin/bash ################################################################################ # slurm.sbatch - A sample submit script for SLURM that illustrates how to # spin up a Hadoop cluster for a map/reduce task using myHadoop # # Glenn K. Lockwood, San Diego Supercomputer Center February 2014 ################################################################################ #SBATCH -p Westmere #SBATCH -n 4 #SBATCH --ntasks-per-node=1 #SBATCH -t 1:00:00 ### If these aren't already in your environment (e.g., .bashrc), you must define ### them. We assume hadoop and myHadoop were installed in $HOME/hadoop-stack export HADOOP_HOME=$HOME/hadoop-stack/hadoop-1.2.1 export PATH=$HADOOP_HOME/bin:$HOME/hadoop-stack/myhadoop-0.30/bin:$PATH:$PATH export JAVA_HOME=/usr export HADOOP_CONF_DIR=$PWD/hadoop-conf.$SLURM_JOBID myhadoop-configure.sh -s /tmp/$USER/$SLURM_JOBID if [ ! -f ./pg2701.txt ]; then echo "*** Retrieving some sample input data" wget 'http://www.gutenberg.org/cache/epub/2701/pg2701.txt' fi $HADOOP_HOME/bin/start-all.sh $HADOOP_HOME/bin/hadoop dfs -mkdir data $HADOOP_HOME/bin/hadoop dfs -put ./pg2701.txt data/ $HADOOP_HOME/bin/hadoop dfs -ls data $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output $HADOOP_HOME/bin/hadoop dfs -ls wordcount-output $HADOOP_HOME/bin/hadoop dfs -get wordcount-output ./ $HADOOP_HOME/bin/stop-all.sh myhadoop-cleanup.sh However, when I try to run the persist mode, there are some problems. We are using GPFS. Here is the script: #!/bin/bash ################################################################################ # slurm.sbatch - A sample submit script for SLURM that illustrates how to # spin up a Hadoop cluster for a map/reduce task using myHadoop # # Glenn K. Lockwood, San Diego Supercomputer Center February 2014 ################################################################################ #SBATCH -p Westmere #SBATCH -n 4 #SBATCH --ntasks-per-node=1 #SBATCH -t 1:00:00 ### If these aren't already in your environment (e.g., .bashrc), you must define ### them. We assume hadoop and myHadoop were installed in $HOME/hadoop-stack export HADOOP_HOME=$HOME/hadoop-stack/hadoop-1.2.1 export PATH=$HADOOP_HOME/bin:$HOME/hadoop-stack/myhadoop-0.30/bin:$PATH:$PATH export JAVA_HOME=/usr export HADOOP_CONF_DIR=$HOME/hadoop/conf/hadoop-conf.$SLURM_JOBID export MH_SCRATCH_DIR=/tmp/$USER/$SLURM_JOBID export MH_PERSIST_DIR=$HOME/hadoop/hdfs myhadoop-configure.sh -s $MH_SCRATCH_DIR -p $MH_PERSIST_DIR if [ ! 
-f ./pg2701.txt ]; then echo "*** Retrieving some sample input data" wget 'http://www.gutenberg.org/cache/epub/2701/pg2701.txt' fi ##$HADOOP_HOME/bin/start-all.sh $HADOOP_HOME/bin/hadoop namenode $HADOOP_HOME/bin/hadoop datanode $HADOOP_HOME/bin/hadoop dfs -mkdir data $HADOOP_HOME/bin/hadoop dfs -put ./pg2701.txt data/ $HADOOP_HOME/bin/hadoop dfs -ls data $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount data wordcount-output $HADOOP_HOME/bin/hadoop dfs -ls wordcount-output $HADOOP_HOME/bin/hadoop dfs -get wordcount-output ./ $HADOOP_HOME/bin/stop-all.sh myhadoop-cleanup.sh Here is the log: === myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1 myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4128 myHadoop: Using JAVA_HOME=/usr myHadoop: Generating Hadoop configuration in directory in /home/hpc-ruhua/hadoop/conf/hadoop-conf.4128... myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS state... myHadoop: Designating cn53 as master node (namenode, secondary namenode, and jobtracker) myHadoop: The following nodes will be slaves (datanode, tasktracer): cn53 cn54 cn55 cn56 Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4128/hdfs_data on cn53 Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4128/hdfs_data on cn54 Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4128/hdfs_data on cn55 Warning: Permanently added 'cn55,192.168.100.55' (RSA) to the list of known hosts. Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4128/hdfs_data on cn56 Warning: Permanently added 'cn56,192.168.100.56' (RSA) to the list of known hosts. starting namenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-namenode-cn53.out cn53: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn53.out cn54: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn54.out cn55: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn55.out cn56: starting datanode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn56.out cn53: starting secondarynamenode, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-secondarynamenode-cn53.out starting jobtracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-jobtracker-cn53.out cn53: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn53.out cn56: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn56.out cn55: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn55.out cn54: starting tasktracker, logging to /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn54.out mkdir: cannot create directory data: File exists put: Target data/pg2701.txt already exists Found 1 items -rw-r--r-- 3 hpc-ruhua supergroup 0 2015-01-07 00:09 /user/hpc-ruhua/data/pg2701.txt 15/01/14 12:21:08 ERROR security.UserGroupInformation: PriviledgedActionException as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199) at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426) org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is not yet RUNNING at org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199) at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426) at org.apache.hadoop.ipc.Client.call(Client.java:1113) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936) at org.apache.hadoop.mapreduce.Job.submit(Job.java:550) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580) at org.apache.hadoop.examples.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) ls: Cannot access wordcount-output: No such file or directory. get: null stopping jobtracker cn54: stopping tasktracker cn55: stopping tasktracker cn53: stopping tasktracker cn56: stopping tasktracker stopping namenode cn53: no datanode to stop cn54: no datanode to stop cn56: no datanode to stop cn55: no datanode to stop === The error is “ERROR security.UserGroupInformation: PriviledgedActionException as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException: “ I tried to split the start phase to “ $HADOOP_HOME/bin/hadoop namenode $HADOOP_HOME/bin/hadoop datanode “ Below is the log: myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1 myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4178 myHadoop: Using JAVA_HOME=/usr myHadoop: Generating Hadoop configuration in directory in /home/hpc-ruhua/hadoop/conf/hadoop-conf.4178... myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS state... myHadoop: Designating cn53 as master node (namenode, secondary namenode, and jobtracker) myHadoop: The following nodes will be slaves (datanode, tasktracer): cn53 cn54 cn55 cn56 Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4178/hdfs_data on cn53 Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4178/hdfs_data on cn54 Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4178/hdfs_data on cn55 Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4178/hdfs_data on cn56 15/01/16 15:35:14 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = cn53/192.168.100.53 STARTUP_MSG: args = [] STARTUP_MSG: version = 1.2.1 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013 STARTUP_MSG: java = 1.7.0_71 ************************************************************/ 15/01/16 15:35:14 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 15/01/16 15:35:14 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 15/01/16 15:35:14 INFO impl.MetricsSystemImpl: NameNode metrics system started 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source ugi registered. 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source jvm registered. 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source NameNode registered. 
15/01/16 15:35:14 INFO util.GSet: Computing capacity for map BlocksMap
15/01/16 15:35:14 INFO util.GSet: VM type = 64-bit
15/01/16 15:35:14 INFO util.GSet: 2.0% max memory = 932184064
15/01/16 15:35:14 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/01/16 15:35:14 INFO util.GSet: recommended=2097152, actual=2097152
15/01/16 15:35:15 INFO namenode.FSNamesystem: fsOwner=hpc-ruhua
15/01/16 15:35:15 INFO namenode.FSNamesystem: supergroup=supergroup
15/01/16 15:35:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/01/16 15:35:15 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
15/01/16 15:35:15 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
15/01/16 15:35:15 INFO namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
15/01/16 15:35:15 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
15/01/16 15:35:15 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/01/16 15:35:15 INFO common.Storage: Start loading image file /tmp/hpc-ruhua/4178/namenode_data/current/fsimage
15/01/16 15:35:15 INFO common.Storage: Number of files = 28
15/01/16 15:35:15 INFO common.Storage: Number of files under construction = 1
15/01/16 15:35:15 INFO common.Storage: Image file /tmp/hpc-ruhua/4178/namenode_data/current/fsimage of size 2996 bytes loaded in 0 seconds.
15/01/16 15:35:15 INFO namenode.FSEditLog: Start loading edits file /tmp/hpc-ruhua/4178/namenode_data/current/edits
15/01/16 15:35:15 INFO namenode.FSEditLog: Invalid opcode, reached end of edit log Number of transactions found: 32. Bytes read: 2579
15/01/16 15:35:15 INFO namenode.FSEditLog: Start checking end of edit log (/tmp/hpc-ruhua/4178/namenode_data/current/edits) ...
15/01/16 15:35:15 INFO namenode.FSEditLog: Checked the bytes after the end of edit log (/tmp/hpc-ruhua/4178/namenode_data/current/edits):
15/01/16 15:35:15 INFO namenode.FSEditLog: Padding position = 2579 (-1 means padding not found)
15/01/16 15:35:15 INFO namenode.FSEditLog: Edit log length = 1048580
15/01/16 15:35:15 INFO namenode.FSEditLog: Read length = 2579
15/01/16 15:35:15 INFO namenode.FSEditLog: Corruption length = 0
15/01/16 15:35:15 INFO namenode.FSEditLog: Toleration length = 0 (= dfs.namenode.edits.toleration.length)
15/01/16 15:35:15 INFO namenode.FSEditLog: Summary: |---------- Read=2579 ----------|-- Corrupt=0 --|-- Pad=1046001 --|
15/01/16 15:35:15 INFO namenode.FSEditLog: Edits file /tmp/hpc-ruhua/4178/namenode_data/current/edits of size 1048580 edits # 32 loaded in 0 seconds.
15/01/16 15:35:15 INFO common.Storage: Image file /tmp/hpc-ruhua/4178/namenode_data/current/fsimage of size 3745 bytes saved in 0 seconds.
15/01/16 15:35:15 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hpc-ruhua/4178/namenode_data/current/edits
15/01/16 15:35:15 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hpc-ruhua/4178/namenode_data/current/edits
15/01/16 15:35:16 INFO namenode.NameCache: initialized with 0 entries 0 lookups
15/01/16 15:35:16 INFO namenode.FSNamesystem: Finished loading FSImage in 1162 msecs
15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.safemode.threshold.pct = 0.9990000128746033
15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.safemode.extension = 30000
15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of blocks excluded by safe block count: 0 total blocks: 0 and thus the safe blocks: 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: Total number of blocks = 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of invalid blocks = 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of under-replicated blocks = 0
15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of over-replicated blocks = 0
15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Safe mode termination scan for invalid, over- and under-replicated blocks completed in 7 msec
15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Leaving safe mode after 1 secs
15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
15/01/16 15:35:16 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
15/01/16 15:35:16 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
15/01/16 15:35:16 INFO namenode.FSNamesystem: ReplicateQueue QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec
15/01/16 15:35:16 INFO namenode.FSNamesystem: ReplicateQueue QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec processing time, 0 msec clock time, 1 cycles
15/01/16 15:35:16 INFO namenode.FSNamesystem: InvalidateQueue QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec
15/01/16 15:35:16 INFO namenode.FSNamesystem: InvalidateQueue QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec processing time, 0 msec clock time, 1 cycles
15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source FSNamesystemMetrics registered.
15/01/16 15:35:16 INFO ipc.Server: Starting SocketReader
15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort54310 registered.
15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort54310 registered.
15/01/16 15:35:16 INFO namenode.NameNode: Namenode up at: cn53/192.168.100.53:54310
15/01/16 15:35:16 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
15/01/16 15:35:16 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
15/01/16 15:35:16 INFO http.HttpServer: dfs.webhdfs.enabled = false
15/01/16 15:35:16 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
15/01/16 15:35:16 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
15/01/16 15:35:16 INFO http.HttpServer: Jetty bound to port 50070
15/01/16 15:35:16 INFO mortbay.log: jetty-6.1.26
15/01/16 15:35:16 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50070
15/01/16 15:35:16 INFO namenode.NameNode: Web-server up at: 0.0.0.0:50070
15/01/16 15:35:16 INFO ipc.Server: IPC Server Responder: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server listener on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 0 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 1 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 2 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 3 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 4 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 5 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 6 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 8 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 7 on 54310: starting
15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 9 on 54310: starting
==

Any idea about that? Thanks

Best,
Ruhua Jiang
Graduate Student at University of Connecticut
HORNET Cluster Technical Support |
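The log ends with the NameNode fully up and serving in the foreground (IPC handlers listening on 54310, web UI on 50070). A NameNode launched this way from a batch script blocks the script at that point, so any HDFS commands that follow never run. For comparison only, here is a minimal sketch of starting the daemons in the background with the stock hadoop-daemon.sh helper that ships with Hadoop 1.x; start-all.sh wraps the same mechanism and is usually the simpler choice:

# Sketch only: start each daemon in the background rather than running
# "hadoop namenode" in the foreground, which blocks the job script.
# Assumes HADOOP_HOME and HADOOP_CONF_DIR are already exported.
$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode
$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker

On a multi-node allocation the worker daemons are normally reached through start-all.sh or hadoop-daemons.sh rather than started one by one on the master.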
From: Lockwood, G. <gl...@sd...> - 2013-12-02 19:13:28
|
Hi Ramkumar,

myHadoop 0.2a should work for versions of Hadoop as high as 1.1.1 without any modification (we are doing this in production at SDSC). If you run into any trouble with this, please let us know and we can release an update with the necessary patches and documentation changes pretty quickly. We have a pile of new features that will be released in the next version of myHadoop (support for InfiniBand, SLURM, etc.), but I'd be happy to give you any prerelease patches you need to get up and running.

Unfortunately, Hadoop 2.0 (and higher) represents a major change in the underlying Hadoop framework, and we are still working on updating myHadoop to be able to configure YARN. This will be done within the next year, but we don't have much more of a specific timeline right now.

Glenn

--
Glenn K. Lockwood, Ph.D.
User Services Group
San Diego Supercomputer Center |
From: Ramkumar C. <ram...@gm...> - 2013-11-18 23:12:54
|
Hello myHadoop Users,

This is Ramkumar from the University of Washington. At UW we have a large HPC cluster system called HYAK where we submit our jobs. We are currently exploring all resources available for highly scalable data analytics jobs (mostly analytical and a few I/O-bound operations). Even with the HPC resources we are still bottlenecked by I/O speed, so we are thinking of building a cluster out of HPC resources (a cluster with a few HPC machines, on the order of tens). It was during this process that we found your presentation <http://www.sdsc.edu/us/consulting/myHadoop-SDSC.pdf> (PDF) and your paper on the same. We are excited about exploring this, as it sounds like a very nice solution to the problem we have in place.

The latest version of myHadoop from SourceForge (myHadoop 0.2a), which we are using, runs Hadoop 0.20 in the backend (which we know is quite an old version of Hadoop). We are interested in knowing whether you have made this compatible with the latest versions of Hadoop. Also, if you have suggestions or alternatives for running Hadoop in HPC environments, I would really appreciate the help. Thanks in advance!

Regards,
Ramkumar Chokkalingam, University of Washington.
LinkedIn <http://www.linkedin.com/in/mynameisram> |
From: Smallen, S. <ssm...@sd...> - 2013-02-26 04:39:10
|
Hi Mehmet,

I think you need to do the same thing for HADOOP_LOG_DIR as you did with HADOOP_DATA_DIR. E.g.,

export HADOOP_LOG_DIR="$HOME/hadoop-log-$HOSTNAME"

If that doesn't work, you can forward me the logs and I can see if there's anything I can find.

Cheers,
Shava

On 2/25/13 3:39 PM, "Mehmet Belgin" <meh...@oi...> wrote:

>Hello (Sriram?),
>
>I am a scientific computing consultant for Georgia Tech and currently
>exploring several options to make hadoop available to our users. I really
>like the simplicity and flexibility of myHadoop, but could not make it
>work properly. I will appreciate your help a lot.
>
>Just as a proof of concept, I am using directories on shared network
>volumes (NFS mounted locations), since we have some diskless nodes in the
>cluster. I am picking:
>
>export MY_HADOOP_HOME="$HOME/Programs/myHadoop-0.2a"
>export HADOOP_HOME="$HOME/Programs/hadoop-0.20.2"
>export HADOOP_DATA_DIR="$HOME/hadoop-data-$HOSTNAME"
>export HADOOP_LOG_DIR="$HOMEhadoop-log"
>
>Now, given that all of these dirs are visible from all compute nodes,
>will this create a conflict? In fact the data dir was creating a
>conflict, so I needed to tag it using the hostname, but how about the others?
>
>I launch the job using:
>
>$MY_HADOOP_HOME/bin/pbs-configure.sh -n 4 -c $HADOOP_CONF_DIR -p -d $HOME/HDFS
>
>in the PBS script and get these errors while connecting to the namenode:
>
>13/02/25 16:33:16 INFO ipc.Client: Retrying connect to server:
>iw-h34-1.pace.gatech.edu/172.26.74.75:9000. Already tried 9 time(s).
>Bad connection to FS. command aborted.
>
>I can provide the logs/PBS script, but just wanted to ask if there are
>any obvious mistakes I am making here, such as using a network-attached
>directory while the directory needs to be local to the datanode.
>
>Thanks a lot in advance.
>-Mehmet |
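The same hostname-tagging idea applies to every directory that Hadoop writes to when the underlying storage is shared. A rough sketch of a PBS script preamble along those lines (paths and variable names are illustrative, not taken from the myHadoop distribution):

export MY_HADOOP_HOME="$HOME/Programs/myHadoop-0.2a"
export HADOOP_HOME="$HOME/Programs/hadoop-0.20.2"
# Tag node-specific directories with the hostname so daemons on different
# nodes never write to the same shared path.
export HADOOP_DATA_DIR="$HOME/hadoop-data-$HOSTNAME"
export HADOOP_LOG_DIR="$HOME/hadoop-log-$HOSTNAME"
# The configuration directory, by contrast, should be a single location
# visible to all nodes; tagging it per job keeps separate runs from colliding.
export HADOOP_CONF_DIR="$HOME/hadoop-conf-$PBS_JOBID"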
From: Mehmet B. <meh...@oi...> - 2013-02-26 00:03:55
|
Hello (Sriram?),

I am a scientific computing consultant for Georgia Tech and currently exploring several options to make hadoop available to our users. I really like the simplicity and flexibility of myHadoop, but could not make it work properly. I will appreciate your help a lot.

Just as a proof of concept, I am using directories on shared network volumes (NFS mounted locations), since we have some diskless nodes in the cluster. I am picking:

export MY_HADOOP_HOME="$HOME/Programs/myHadoop-0.2a"
export HADOOP_HOME="$HOME/Programs/hadoop-0.20.2"
export HADOOP_DATA_DIR="$HOME/hadoop-data-$HOSTNAME"
export HADOOP_LOG_DIR="$HOMEhadoop-log"

Now, given that all of these dirs are visible from all compute nodes, will this create a conflict? In fact the data dir was creating a conflict, so I needed to tag it using the hostname, but how about the others?

I launch the job using:

$MY_HADOOP_HOME/bin/pbs-configure.sh -n 4 -c $HADOOP_CONF_DIR -p -d $HOME/HDFS

in the PBS script and get these errors while connecting to the namenode:

13/02/25 16:33:16 INFO ipc.Client: Retrying connect to server: iw-h34-1.pace.gatech.edu/172.26.74.75:9000. Already tried 9 time(s).
Bad connection to FS. command aborted.

I can provide the logs/PBS script, but just wanted to ask if there are any obvious mistakes I am making here, such as using a network-attached directory while the directory needs to be local to the datanode.

Thanks a lot in advance.
-Mehmet |
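A "Retrying connect to server ... Bad connection to FS" loop like the one above only says that no NameNode is answering on port 9000; it does not say why. Before digging into the directory layout, it is usually worth confirming on the master node whether the daemons came up at all. A hedged sketch of that check (the log file name pattern is the stock Hadoop convention):

# On the node that pbs-configure.sh designated as the master:
jps                                                   # should list NameNode and JobTracker
$HADOOP_HOME/bin/hadoop dfsadmin -report              # lists the datanodes HDFS can see
tail -n 50 $HADOOP_LOG_DIR/hadoop-*-namenode-*.log    # shows why the namenode exited, if it did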
From: <ven...@ya...> - 2012-07-20 09:57:25
|
Hi, I tried to use myHadoop in an HPC environment and it works fine. I'm glad about your great work, and thanks for open-sourcing and sharing this. When I try to set up an N-node cluster, I face intermittent problems when the Hadoop jobs are executed. During batch job submission I request 2 nodes, which are allocated by PBS for the job, so the job executes on a 2-node cluster. Below are the details of the setup and the error.

My HDFS BASE_DIR is /apps/hadoop/hadoop-0.20.2/HDFS and it looks like:

#:/apps/hadoop/hadoop-0.20.2> ls -ltr HDFS/
drwx------ 2 xx itr 4096 2012-07-09 07:26 3
drwx------ 4 xx itr 4096 2012-07-10 03:03 2
drwx------ 4 xx itr 4096 2012-07-20 05:14 1

Note: the folder /apps/hadoop/hadoop-0.20.2 is on the NAS and visible to all compute nodes.

The HADOOP_DATA_DIR in setup.env is /apps/hadoop/hadoop-0.20.2/hdump; this directory does not physically exist and is created only during the job. In my pbs-configure script the symbolic links are created as:

ln -s /apps/hadoop/hadoop-0.20.2/HDFS/1 /apps/hadoop/hadoop-0.20.2/hdump
ln -s /apps/hadoop/hadoop-0.20.2/HDFS/2 /apps/hadoop/hadoop-0.20.2/hdump

Note: the folder /apps/hadoop/hadoop-0.20.2/hdump is formatted during job execution and will be deleted after the job execution.

Now, after the step for the symbolic link creation is executed, the BASE_DIR looks like:

#:/apps/hadoop/hadoop-0.20.2> ls -ltr HDFS/
drwx------ 2 xx itr 4096 2012-07-09 07:26 3
drwx------ 4 xx itr 4096 2012-07-10 03:03 2
drwx------ 4 xx itr 4096 2012-07-20 05:14 1

And inside the 1 folder:

#:/apps/hadoop/hadoop-0.20.2> ls -ltr HDFS/1
drwx------ 5 xx itr 4096 2012-07-10 03:29 dfs
lrwxrwxrwx 1 xx itr 33 2012-07-20 05:06 2 -> /apps/hadoop/hadoop-0.20.2/HDFS/2
drwx------ 3 xx itr 4096 2012-07-20 05:06 mapred

and inside the 2 folder:

#:/apps/hadoop/hadoop-0.20.2> ls -ltr HDFS/1/2
lrwxrwxrwx 1 xx itr 33 2012-07-20 05:06 HDFS/1/2 -> /apps/hadoop/hadoop-0.20.2/HDFS/2

The job completes fine, but it throws many intermittent errors like:

.....INFO mapred.JobClient: Task Id : attempt_201207200444_0005_m_000001_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:.....
java.lang.RuntimeException: java.io.FileNotFoundException: /apps/hadoop/hadoop-0.20.2/hdump/mapred/local/taskTracker/jobcache/job_201207200444_0006/attempt_201207200444_0006_m_000000_0/job.xml (No such file or directory)
java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit incompatible with org.apache.hadoop.mapred.InputSplit

From the errors, I guess the cause may be that the HDFS folders for the 2 nodes are overwritten by one another; in other words, the folders HDFS/1 and HDFS/2 are not both utilised and only the HDFS/1 folder is used by the job.

Let me know if I missed something here; I'd appreciate your help in resolving the above issue. Thanks.

Regards,
R. Venkatesh |
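One thing worth making explicit in the setup described above is that both ln -s commands point the same shared path (/apps/hadoop/hadoop-0.20.2/hdump) at two different targets; because that path lives on the NAS and is identical on every node, whichever link is created last wins for both nodes, which is consistent with only one of HDFS/1 and HDFS/2 ever being used. The per-node link pattern only isolates nodes when each link is created at a node-local path, one node at a time. A hedged sketch of that pattern, loosely modeled on the pbs-configure.sh flow and using illustrative paths:

# Sketch only: give each node its own HADOOP_DATA_DIR on local disk, each
# pointing at that node's persistent directory under the shared HDFS base.
# The /scratch/$USER location is an illustrative assumption.
BASE_DIR=/apps/hadoop/hadoop-0.20.2/HDFS
DATA_DIR=/scratch/$USER/hadoop_data        # node-local, NOT on the NAS

i=1
for node in $(sort -u "$PBS_NODEFILE"); do
    # Each node gets a link on its own local disk to its own HDFS subdirectory.
    ssh "$node" "rm -rf $DATA_DIR && mkdir -p $(dirname $DATA_DIR) && ln -s $BASE_DIR/$i $DATA_DIR"
    i=$((i+1))
done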
From: Sriram K. <sr...@sd...> - 2011-04-03 21:04:09
|
Hi Tassos,

I have not gotten a chance to test with version 0.21. I notice that some of the commands to start Hadoop have been deprecated in version 0.21 (like start-all.sh, stop-all.sh, etc.). However, if you would like to use 0.21 and report back about success/failure, that would be most useful.

Thanks,
Sriram

ps: I replied back to your PM, but it bounced.

On Apr 3, 2011, at 1:56 PM, Tassos Souris wrote:

> thanks for the reply to the pm!!! (i just found out the mailing list :))
>
> i have one more question... in the documentation you write that myHadoop is tested only with the 0.20 version... does this mean that it does not work at all with 0.21, or just that it has not been tested with that version??
>
> thanks |
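For anyone testing against the 0.21 line: the deprecation mentioned above splits the combined wrappers into per-subsystem scripts, so a job script would take roughly this shape (a sketch only; myHadoop itself had only been tested against 0.20 at this point):

# Hadoop 0.21-style startup and shutdown: start-all.sh/stop-all.sh are
# deprecated in favor of the separate HDFS and MapReduce scripts.
$HADOOP_HOME/bin/start-dfs.sh
$HADOOP_HOME/bin/start-mapred.sh

# ... run MapReduce jobs here ...

$HADOOP_HOME/bin/stop-mapred.sh
$HADOOP_HOME/bin/stop-dfs.sh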
From: Tassos S. <tas...@ya...> - 2011-04-03 20:56:29
|
thanks for the reply to the pm!!! (i just found out the mailing list :))

i have one more question... in the documentation you write that myHadoop is tested only with the 0.20 version... does this mean that it does not work at all with 0.21, or just that it has not been tested with that version??

thanks |
From: Sriram K. <sr...@sd...> - 2011-03-25 00:55:52
|
Souris,

Please use this mailing list for questions of this nature. Responses are inline below -

> i have downloaded myHadoop to use it in a grid cluster using torque...
>
> we have a node where we log in and submit jobs with qsub (submit node --> from where i will run pbs-example.sh)...
> qsub then allocates other nodes from the cluster (the worker nodes.. the initial submit node not included)...
> i have the following questions:
> 1) i must copy hadoop-0.20.2 to all worker nodes, right?

Yes, Hadoop does need to be available on all the compute nodes. You might just want to make sure that it is available from a shared directory on something like NFS.

> 2) myHadoop-0.2a must be on the submit node only, or on the submit node and on all of the worker nodes??

myHadoop itself doesn't have to be on all nodes. However, the HADOOP_CONF_DIR must be visible on all nodes.

> 3) how do i start a job?? do i run pbs-example.sh or qsub pbs-example.sh from the submit node? if i run qsub pbs-example it will run the start-all.sh script for hadoop on all nodes, which is not correct.. it must be done only on the master node (or at least i think so)... also the master node will not be the submit node where i run pbs-example.sh.... maybe i do something really silly...

You should run "qsub pbs-example.sh". The script ensures that the start-all.sh is only done on the master node.

> here is the output i get (not all cause i kill the job with ctrl-c):
>
> out:
>
> Resources : cput=196:00:00
> neednodes=wn001.grid.tuc.gr+wn002.grid.tuc.gr+wn003.grid.tuc.gr+wn004.grid.tuc.gr
> nodes=4:ppn=1 walltime=218:00:00
> Walltime : 218:00:00
> Node_list : wn001.grid.tuc.gr+wn002.grid.tuc.gr+wn003.grid.tuc.gr+wn004.grid.tuc.gr,nodes=4:ppn=1,walltime=218:00:00
>
> Start all Hadoop daemons
> starting namenode, logging to /tmp/hadoop-test-dir/log-dir/hadoop-asouris-namenode-wn001.grid.tuc.gr.out
> wn002.grid.tuc.gr: Permission denied, please try again.
> wn002.grid.tuc.gr: Permission denied, please try again.
> wn002.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password).
> wn004.grid.tuc.gr: Permission denied, please try again.
> wn004.grid.tuc.gr: Permission denied, please try again.
> wn004.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password).
> wn003.grid.tuc.gr: Permission denied, please try again.
> wn003.grid.tuc.gr: Permission denied, please try again.
> wn003.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password).
> wn001.grid.tuc.gr: Permission denied, please try again.
> wn001.grid.tuc.gr: Permission denied, please try again.
> wn001.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password).
> wn001.grid.tuc.gr: Permission denied, please try again.
> wn001.grid.tuc.gr: Permission denied, please try again.
> wn001.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password).
> starting jobtracker, logging to /tmp/hadoop-test-dir/log-dir/hadoop-asouris-jobtracker-wn001.grid.tuc.gr.out

Are you sure that you can ssh to each of the individual nodes without a password? Hadoop daemons are spawned using SSH, and it does require that you be able to do this in a password-less fashion.

Thanks,
Sriram |
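The password-less SSH requirement is usually the sticking point on shared clusters. When $HOME is mounted on all compute nodes, a single key set up for the same account covers every node; a minimal sketch, assuming a shared home directory and a key without a passphrase:

# Generate a key pair with no passphrase (skip if ~/.ssh/id_rsa already exists).
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Authorize the key for the same account; with a shared $HOME this one step
# applies to every compute node.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# Quick test: this should not prompt for a password.
ssh wn002.grid.tuc.gr hostname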
From: Sriram K. <sri...@gm...> - 2011-03-25 00:48:58
|
---------- Forwarded message ---------- From: <so...@us...> Date: Thu, Mar 24, 2011 at 2:33 PM Subject: myHadoop how to start pbs-example.sh To: sri...@us... hi! i have downloaded myHadoop to use it in a grid cluster using torque... we have a node where we log in and submit jobs with qsub (submit node --> from where i will run pbs-example.sh)... qsub then allocates other nodes from the cluster (the worker nodes..the initial submit node not included)... i have the following questions: 1) i must copy the hadoop-0.20.2 on all worker nodes right? 2) mHadoop-0.2a must be on the submit node only or on the sudmit node and on all of the worker nodes?? 3) how to i start a job?? i run pbs-example.sh or qsub pbs-example.sh from the submit node? if i run qsub pbs-example it will run stat-all.sh script for hadoop on all nodes which is not correct.. it must be done only in the master node (or at least i think so)... also the master node will not be on the submit node where i run pbs-example.sh.... maybe i do something really silly... here is the output i get (not all cause i kill the job with ctrl-c): out: Resources : cput=196:00:00 neednodes=wn001.grid.tuc.gr+wn002.grid.tuc.gr+wn003.grid.tuc.gr+ wn004.grid.tuc.gr nodes=4:ppn=1 walltime=218:00:00 Walltime : 218:00:00 Node_list : wn001.grid.tuc.gr+wn002.grid.tuc.gr+wn003.grid.tuc.gr+wn004.grid.tuc.gr ,nodes=4:ppn=1,walltime=218:00:00 Select Result: DBI::st=HASH(0x1865bbe0) Set up the configurations for myHadoop Number of Hadoop nodes requested: 4 Generation Hadoop configuration in directory: /storage/tuclocal/asouris/configuration_directory Not persisting HDFS state Received 4 nodes from PBS Master is: wn001.grid.tuc.gr Configuring node: wn001.grid.tuc.gr rm -rf /tmp/hadoop-test-dir/log-dir; mkdir -p /tmp/hadoop-test-dir/log-dir rm -rf /tmp/hadoop-test-dir/data-dir; mkdir -p /tmp/hadoop-test-dir/data-dir Configuring node: wn002.grid.tuc.gr rm -rf /tmp/hadoop-test-dir/log-dir; mkdir -p /tmp/hadoop-test-dir/log-dir rm -rf /tmp/hadoop-test-dir/data-dir; mkdir -p /tmp/hadoop-test-dir/data-dir Configuring node: wn003.grid.tuc.gr rm -rf /tmp/hadoop-test-dir/log-dir; mkdir -p /tmp/hadoop-test-dir/log-dir rm -rf /tmp/hadoop-test-dir/data-dir; mkdir -p /tmp/hadoop-test-dir/data-dir Configuring node: wn004.grid.tuc.gr rm -rf /tmp/hadoop-test-dir/log-dir; mkdir -p /tmp/hadoop-test-dir/log-dir rm -rf /tmp/hadoop-test-dir/data-dir; mkdir -p /tmp/hadoop-test-dir/data-dir Format HDFS Start all Hadoop daemons starting namenode, logging to /tmp/hadoop-test-dir/log-dir/hadoop-asouris-namenode-wn001.grid.tuc.gr.out wn002.grid.tuc.gr: Permission denied, please try again. wn002.grid.tuc.gr: Permission denied, please try again. wn002.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password). wn004.grid.tuc.gr: Permission denied, please try again. wn004.grid.tuc.gr: Permission denied, please try again. wn004.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password). wn003.grid.tuc.gr: Permission denied, please try again. wn003.grid.tuc.gr: Permission denied, please try again. wn003.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password). wn001.grid.tuc.gr: Permission denied, please try again. wn001.grid.tuc.gr: Permission denied, please try again. wn001.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password). wn001.grid.tuc.gr: Permission denied, please try again. wn001.grid.tuc.gr: Permission denied, please try again. wn001.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password). 
starting jobtracker, logging to /tmp/hadoop-test-dir/log-dir/hadoop-asouris-jobtracker-wn001.grid.tuc.gr.out wn001.grid.tuc.gr: Permission denied, please try again. wn001.grid.tuc.gr: Permission denied, please try again. wn001.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password). wn003.grid.tuc.gr: Permission denied, please try again. wn003.grid.tuc.gr: Permission denied, please try again. wn003.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password). wn002.grid.tuc.gr: Permission denied, please try again. wn002.grid.tuc.gr: Permission denied, please try again. wn002.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password). wn004.grid.tuc.gr: Permission denied, please try again. wn004.grid.tuc.gr: Permission denied, please try again. wn004.grid.tuc.gr: Permission denied (publickey,gssapi-with-mic,password). Run some test Hadoop jobs Job ID: 61257.se01.grid.tuc.gr User ID: asouris Group ID: vsam Job Name: myHadoop Session ID: 13994 Resource List: cput=196:00:00,neednodes=4:ppn=1,nodes=4:ppn=1,walltime=218:00:00 Resources Used: cput=00:00:08,mem=154032kb,vmem=4341576kb,walltime=00:13:23 Queue Name: tuc Account String: (END) err: DBD::mysqlPP::st execute failed: #08S01Bad handshake at /storage/exp_soft/tuc/mypbs/sbin/mypbs_pr line 777. Can't call method "each" on an undefined value at /usr/lib/perl5/site_perl/5.8.8/DBD/mysqlPP.pm line 392. Permission denied, please try again. Permission denied, please try again. Permission denied (publickey,gssapi-with-mic,password). Permission denied, please try again. Permission denied, please try again. Permission denied (publickey,gssapi-with-mic,password). Permission denied, please try again. Permission denied, please try again. Permission denied (publickey,gssapi-with-mic,password). Permission denied, please try again. Permission denied, please try again. Permission denied (publickey,gssapi-with-mic,password). Permission denied, please try again. Permission denied, please try again. Permission denied (publickey,gssapi-with-mic,password). Permission denied, please try again. Permission denied, please try again. Permission denied (publickey,gssapi-with-mic,password). Permission denied, please try again. Permission denied, please try again. Permission denied (publickey,gssapi-with-mic,password). Permission denied, please try again. Permission denied, please try again. Permission denied (publickey,gssapi-with-mic,password). 11/03/24 23:15:08 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = wn001.grid.tuc.gr/147.27.48.101 STARTUP_MSG: args = [-format] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 ************************************************************/ Re-format filesystem in /tmp/hadoop-test-dir/data-dir/dfs/name ? (Y or N) Format aborted in /tmp/hadoop-test-dir/data-dir/dfs/name 11/03/24 23:15:09 INFO namenode.NameNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at wn001.grid.tuc.gr/147.27.48.101 ************************************************************/ mkdir: cannot create directory Data: File exists copyFromLocal: /tmp/.java_pid20214 (Permission denied) -- This message has been sent to you, a registered SourceForge.net user, by another site user, through the SourceForge.net site. 
|
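A separate detail in the forwarded output is the "Re-format filesystem in /tmp/hadoop-test-dir/data-dir/dfs/name ? (Y or N) Format aborted" exchange: the format step found a name directory left over from a previous run and waited for interactive confirmation, which a batch job cannot supply, so the format was aborted. A hedged sketch of two common workarounds (paths as in the output above):

# Either clear the stale name directory from the previous run before formatting...
rm -rf /tmp/hadoop-test-dir/data-dir/dfs/name

# ...or feed the confirmation into the prompt non-interactively
# (the prompt in these Hadoop versions expects a capital Y).
echo Y | $HADOOP_HOME/bin/hadoop namenode -format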
From: Sriram K. <sr...@sd...> - 2011-02-11 18:25:31
|
We are pleased to announce the release of version 0.2a of myHadoop, which enables the use of Apache Hadoop in a non-dedicated cluster environment administered by a typical batch scheduler. This release adds the following to the 0.1a release:

* Support for the Sun Grid Engine (SGE)
* Updated documentation for SGE and PBS

Please feel free to send questions/comments/concerns to the myHadoop mailing list at myh...@li.... You may also send individual private comments to Sriram Krishnan at sr...@sd....

Sincerely,
Sriram |
From: Sriram K. <sr...@sd...> - 2011-01-15 00:21:50
|
Hi all, I just checked in some scripts for SGE support. Please check it out from SVN and let me know if there are any comments. Cheers, Sriram |
From: Sriram K. <sr...@sd...> - 2010-12-15 22:58:30
|
Hi all,

We are pleased to announce the release of version 0.1a of myHadoop, which enables the use of Apache Hadoop in a non-dedicated cluster environment administered by a typical batch scheduler. This release supports the following features:

* Provisioning of on-demand Hadoop clusters using PBS (Moab)
* Ability to configure Hadoop in "persistent" and "non-persistent" modes
* Ability to run Hadoop in regular user mode, without needing any root privileges
* Ability to tune the various configuration parameters for Hadoop

Please feel free to send questions/comments/concerns to the myHadoop mailing list at myh...@li.... You may also send individual private comments to Sriram Krishnan at sr...@sd....

Sincerely,
Sriram |
From: Sriram K. <sr...@sd...> - 2010-12-15 21:08:20
|
Hello all, Welcome to the mailing list for myHadoop. Please use this list to discuss new features, general comments, etc. Best, Sriram |