
Hadoop server

Hideyuki Niwa

Hadoop server for Big Data



The Hadoop server is constructed on LXCF containers.
A single server can hardly scale out, but the setup is useful for practicing and debugging the construction procedure. Moreover, an external server can be added as a DataNode simply by registering it in the configuration file of the system constructed here, and the Hadoop system expanded in this way can be used in real operation.

1. Composition of the constructed server

CPU: Core i7, 4 cores / 8 threads
Memory: 4GB
OS: Fedora 20
Hadoop version: 2.2.0 (the version in the Fedora 20 package)

2. Composition of Hadoop

  • The system is composed of HOST, a Secondary NameNode, and eight DataNodes.
  • The Secondary NameNode and the DataNodes are generated as LXCF containers.
  • The following Hadoop components run on HOST:
    NameNode, ResourceManager, NodeManager, JobHistoryServer
  • The Secondary NameNode serves as a backup of the NameNode.
  • Each of the eight DataNodes is allocated one CPU thread.

I had used Hadoop 1.x before, so I was surprised at how greatly the configuration and usage changed in Hadoop 2.x. Moreover, information on configuring Hadoop 2.x is hard to find even when searching the Web.
The method explained here was worked out by repeated trial and error under those conditions. I would be glad if it also serves as a reference for constructing a Hadoop system of two or more servers on Fedora 20, even without LXCF.

3. Installation and construction

The LXCF package is assumed to be installed beforehand.

1) Installation of Java

Oracle Java is not required. Install Fedora's OpenJDK packages if they are not already installed.

$ su -
# yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel java-1.7.0-openjdk-headless
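
If you want to confirm the installed version, java -version can be used:

# java -version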

2) Installation of Hadoop package

The packages related to Hadoop are installed.

# yum install 'hadoop*'
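
To see which Hadoop packages were actually installed, you can query rpm:

# rpm -qa 'hadoop*'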

3) Passwordless login setting with ssh

Set up ssh so that you can log in from HOST to HOST itself without a password. The purpose is that, when Hadoop is started, its components log in via ssh to HOST and to the servers composing the cluster in order to launch their programs. Without LXCF, you would have to distribute HOST's public key to every server in the cluster; with LXCF, that work is done automatically when a container is generated, so it is enough to set up HOST only.

# cd ~/.ssh
# cat lxcf_rsa.pub >> authorized_keys
# chmod 400 authorized_keys
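
To confirm that passwordless login works, try logging in to HOST itself (replace HOST-name with the name printed by the hostname command):

# ssh HOST-name hostname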

4) The IP address of HOST is registered in /etc/hosts.

Edit /etc/hosts:

# vi /etc/hosts

Please add a line "IP-address HOST-name" at the end of the /etc/hosts file. For HOST-name, use the name displayed by the hostname command.
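
For example, if the IP address of HOST were 192.168.1.10 and the hostname command printed host1 (both hypothetical values), the added line would be:

192.168.1.10 host1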

5) Rewriting of the Hadoop configuration files

Several configuration files under /etc/hadoop need to be rewritten.

Rewriting of core-site.xml

Please rewrite the localhost part of fs.default.name to the HOST-name. Leave the port number as it is.

...
<property>
  <name>fs.default.name</name>
  <value>hdfs://HOST-name:8020</value>
</property>
...

Rewriting of mapred-site.xml

Please rewrite the localhost part of mapred.job.tracker to the HOST-name.

...
<property>
  <name>mapred.job.tracker</name>
  <value>HOST-name:8021</value>
</property>
...

The Secondary NameNode definition is added to hdfs-site.xml.

Please add a dfs.secondary.http.address definition that points to namenode2nd. This definition does not exist in the file originally; add it alongside the existing <property> definitions.
namenode2nd is the name of a container that does not exist yet at this point. Please refer to "2. Composition of Hadoop". If you use an external server instead of LXCF, register its name in /etc/hosts yourself. Here, when namenode2nd is generated with LXCF, it is registered in /etc/hosts automatically.

...
<property>
  <name>dfs.secondary.http.address</name>
  <value>namenode2nd:50490</value>
</property>
...

The DataNodes are registered in the slaves file.

localhost is already registered in this file; erase it. Then add the DataNode names analysis0001 through analysis0008, one per line.

analysis0001
analysis0002
analysis0003
analysis0004
analysis0005
analysis0006
analysis0007
analysis0008

For the composition of the DataNodes, please refer to "2. Composition of Hadoop".

Moreover, DataNodes can be added quickly by registering external servers in this file, which lets the system scale out.

6) Container generation of LXCF

The containers registered in the configuration files are now generated.
First, generate namenode2nd.

# lxcf sysgen namenode2nd

Next, generate the eight DataNode containers (analysis0001 to analysis0008).

# lxcf sysgen -n analysis 8

Set LXCF autostart so that the container group starts again automatically even if HOST is rebooted.

# lxcf autostart namenode2nd analysis0001 analysis0002 analysis0003 analysis0004 analysis0005 analysis0006 analysis0007 analysis0008

The LXCF containers all take over the Java and Hadoop software installed on HOST, so no separate installation is needed.

7) Format of the NameNode

Format the NameNode.

# hdfs namenode -format

The work up to this point needs to be done only once, at the beginning.

8) Startup of the Hadoop components

Have you come this far without trouble? Next, start the Hadoop components.
Start them one by one the first time. Since there are quite a few commands, it is convenient to bundle them into a shell script later, as in the sketch after the command list.

# start-dfs.sh
# start-yarn.sh
# hadoop-daemon.sh start datanode
# yarn-daemon.sh start nodemanager
# mr-jobhistory-daemon.sh start historyserver
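
As a sketch of such a script (the file name start-hadoop.sh is only an example), the five commands above can be bundled like this:

#!/bin/sh
# start-hadoop.sh: start the Hadoop components in order
start-dfs.sh
start-yarn.sh
hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager
mr-jobhistory-daemon.sh start historyserver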

If it goes well, the jps command displays the names of the components that started.

# jps
20349 ResourceManager
577 JobHistoryServer
55326 DataNode
56405 NodeManager
1207 Jps
19735 NameNode

Congratulations!
If you have made it this far, Hadoop has started safely.
Next, let's actually run a Hadoop program as a test.

4. Execution confirmation

Let's run the exercise that counts the number of words, using the Hadoop system started in the previous chapter.
When Hadoop is installed, this exercise program is placed at the following path as a Java jar file.

/usr/share/java/hadoop/hadoop-mapreduce-examples.jar
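
Running the jar without arguments should print the list of available example programs, so you can check that wordcount is among them:

# hadoop jar /usr/share/java/hadoop/hadoop-mapreduce-examples.jar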

1) First of all, the input data file is prepared.

Make a directory named "in" and create the input data file in it (end the cat input with Ctrl-D).

# mkdir in
# cat > in/file
This is one line
This is another one

2) The directory containing the input data is copied onto HDFS.

Copy it onto HDFS so that Hadoop can see the input file.

# hdfs dfs -copyFromLocal in in
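
To verify that the copy succeeded, list the directory on HDFS:

# hdfs dfs -ls in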

3) Wordcount is executed with Hadoop.

# hadoop jar /usr/share/java/hadoop/hadoop-mapreduce-examples.jar wordcount in out

Let's wait until execution ends.

4) Execution result confirmation

The result is in the "out" directory on HDFS.

# hdfs dfs -cat out/*
This 2
another 1
is 2
line 1
one 2
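
If you want the result on the local file system as well, it can be copied back (the local directory name out is only an example):

# hdfs dfs -copyToLocal out out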

5. Stopping Hadoop

Stop the running Hadoop components. These commands too can be bundled into a script, as in the sketch after the list.

# hadoop-daemon.sh stop datanode
# yarn-daemon.sh stop nodemanager
# mr-jobhistory-daemon.sh stop historyserver
# stop-yarn.sh
# stop-dfs.sh
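
As with startup, here is a sketch of a stop script (the file name stop-hadoop.sh is only an example):

#!/bin/sh
# stop-hadoop.sh: stop the Hadoop components
hadoop-daemon.sh stop datanode
yarn-daemon.sh stop nodemanager
mr-jobhistory-daemon.sh stop historyserver
stop-yarn.sh
stop-dfs.sh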

