The Hadoop server is built on LXCF containers.
A single server can hardly scale out, but it is good enough for practicing and debugging the construction procedure. Moreover, an external server can be added as a DataNode simply by registering it in the configuration file of the system built here, and the expanded Hadoop system can then be used in production.
CPU: Core i7 (4 cores, 8 threads)
Memory: 4 GB
OS: Fedora 20
Hadoop version: 2.2.0 (the version packaged in Fedora 20)
I had used Hadoop 1.x before, so I was surprised at how greatly the configuration and usage changed in Hadoop 2.x. Moreover, information on configuring Hadoop 2.x is hard to find even by searching the Web.
The method explained here was worked out by repeated trial and error under those conditions. I would be glad if it serves as a reference for building a multi-server Hadoop system on Fedora 20, even without LXCF.
The LXCF package is assumed to have been installed beforehand.
Oracle's Java is not required. If Java is not yet installed, install Fedora's OpenJDK packages.
$ su -
# yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel java-1.7.0-openjdk-headless
Install the Hadoop-related packages.
# yum install 'hadoop*'
Passwordless ssh login must be possible from HOST to HOST. This is because, when Hadoop is started, its components log in via ssh to HOST and the servers that compose the cluster and execute programs there. Without LXCF, you would need to distribute HOST's public key to every server in the cluster. With LXCF, however, this work is done automatically when a container is generated, so only HOST needs to be set up.
# cd ~/.ssh
# cat lxcf_rsa.pub >> authorized_keys
# chmod 400 authorized_keys
Register the IP address of HOST in /etc/hosts.
# vi /etc/hosts
Add a line of the form "IP-address HOST-name" at the end of the /etc/hosts file. For HOST-name, register the name displayed by the hostname command.
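For example, if the hostname command prints host1 and HOST's address is 192.168.7.1 (both values are hypothetical), the added line would look like this:

```
192.168.7.1    host1
```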
Rewrite some of the configuration files that exist under /etc/hadoop.
Rewrite the localhost in fs.default.name to the HOST-name. Leave the port number as it is.
...
<property>
<name>fs.default.name</name>
<value>hdfs://HOST-name:8020</value>
</property>
...
Rewrite the localhost in mapred.job.tracker to the HOST-name.
...
<property>
<name>mapred.job.tracker</name>
<value>HOST-name:8021</value>
</property>
...
Add a definition of dfs.secondary.http.address with the value namenode2nd. This definition does not exist by default; add it alongside the existing <property> definitions.
namenode2nd is the name of a container that does not exist yet; refer to "2. The composition of Hadoop". If you use an external server rather than LXCF, register it in /etc/hosts yourself. Here, when namenode2nd is generated with LXCF, it is registered in /etc/hosts automatically.
...
<property>
<name>dfs.secondary.http.address</name>
<value>namenode2nd:50490</value>
</property>
...
localhost is already registered in this file; erase it. Then add the datanodes analysis0001 through analysis0008, one per line.
analysis0001
analysis0002
analysis0003
analysis0004
analysis0005
analysis0006
analysis0007
analysis0008
For the datanode composition, refer to "2. Composition of Hadoop".
Moreover, datanodes can be added quickly for scale-out by registering external servers in this file.
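For instance, if a hypothetical external server named extnode01 were registered in /etc/hosts, appending it to this file would add it as a ninth datanode:

```
analysis0001
analysis0002
analysis0003
analysis0004
analysis0005
analysis0006
analysis0007
analysis0008
extnode01
```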
Generate the containers registered in the configuration files. First, generate namenode2nd.
# lxcf sysgen namenode2nd
Next, generate the eight analysis containers.
# lxcf sysgen -n analysis 8
Set LXCF autostart so that the container group restarts automatically even when HOST is rebooted.
# lxcf autostart namenode2nd analysis0001 analysis0002 analysis0003 analysis0004 analysis0005 analysis0006 analysis0007 analysis0008
The LXCF containers all inherit the Java and Hadoop software installed on HOST.
Format the namenode.
# hdfs namenode -format
The work up to here only needs to be done once, the first time.
Did everything go smoothly so far? Next, start the Hadoop components.
Since this is the first time, execute them one by one in order. Because there are many commands, it is convenient to gather them into a shell script later.
# start-dfs.sh
# start-yarn.sh
# hadoop-daemon.sh start datanode
# yarn-daemon.sh start nodemanager
# mr-jobhistory-daemon.sh start historyserver
If it goes well, the jps command displays the names of the components that started.
# jps
20349 ResourceManager
577 JobHistoryServer
55326 DataNode
56405 NodeManager
1207 Jps
19735 NameNode
Congratulations!
If you made it this far, Hadoop started safely.
Next, let's actually run a Hadoop program as an experiment.
As an example, let's run the exercise that counts words, using the Hadoop system started in the previous chapter.
When Hadoop is installed, this exercise program is placed at the following path as a Java jar:
/usr/share/java/hadoop/hadoop-mapreduce-examples.jar
Make a directory named "in" and put the input data file in it.
# mkdir in
# cat > in/file
This is one line
This is another one
Copy the input file onto HDFS so that every node can see it.
# hdfs dfs -copyFromLocal in in
Then run the wordcount example.
# hadoop jar /usr/share/java/hadoop/hadoop-mapreduce-examples.jar wordcount in out
Wait until the execution ends.
The result is in the "out" directory on HDFS.
# hdfs dfs -cat out/*
This 2
another 1
is 2
line 1
one 2
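For comparison, the same counts can be reproduced locally with standard shell tools. This is only a sketch of what the wordcount example computes (split on whitespace, count exact matches), not part of Hadoop:

```shell
# Split the input into one word per line, then count the duplicates.
printf 'This is one line\nThis is another one\n' \
    | tr -s ' ' '\n' | sort | uniq -c
```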
Stop the Hadoop components that were started.
# hadoop-daemon.sh stop datanode
# yarn-daemon.sh stop nodemanager
# mr-jobhistory-daemon.sh stop historyserver
# stop-yarn.sh
# stop-dfs.sh