How to Set Up Hadoop on Ubuntu 20.04

Hadoop is a free, open-source software framework written in Java. It is used for the storage and processing of large data sets on clusters of machines. Using Hadoop, we can manage multiple dedicated servers as a single cluster.

Install and Configure Hadoop on Ubuntu

Update the package index.

apt-get update

Install Java.

apt-get install openjdk-11-jdk

Check Java Version.

java -version

Here is the command output.

openjdk version "11.0.11" 
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)
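If several JDKs are installed, it can help to confirm that the default java is the version 11 package installed above. The following is a minimal sketch, not part of the original steps; the function name java_major_version is made up for illustration.

```shell
# Extract the major version from the first line of `java -version` output.
# Handles both the current scheme ("11.0.11" -> 11) and the legacy
# scheme ("1.8.0_292" -> 8).
java_major_version() {
    # Pull the quoted version string out of the line.
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        1.*) ver=${ver#1.} ;;   # legacy scheme: drop the leading "1."
    esac
    # Keep only the leading digits.
    printf '%s\n' "${ver%%[!0-9]*}"
}

# Usage (java -version writes to stderr, hence 2>&1):
#   java_major_version "$(java -version 2>&1 | head -n 1)"
```

For the output shown above, the function prints 11.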

Create a User.

adduser hadoop

Provide a password for the new user when prompted. Here is the command output.

Adding user `hadoop' ...
Adding new group `hadoop' (1002) ...
Adding new user `hadoop' (1002) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
        Full Name []:
        Room Number []:
        Work Phone []:
        Home Phone []:
        Other []:
Is the information correct? [Y/n]

Type Y to confirm.

Log in as the hadoop user.

su - hadoop

Provide the hadoop user's password if prompted.

Generate an SSH key pair.

ssh-keygen -t rsa

Here is the command output.

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:QSa2syeISwP0hD+UXxxi0j9MSOrjKDGIbkfbM3ejyIk hadoop@ubuntu20
The key's randomart image is:
+---[RSA 3072]----+
| ..o++=.+        |
|..oo++.O         |
|. oo. B .        |
|o..+ o * .       |
|= ++o o S        |
|.++o+  o         |
|.+.+ + . o       |
|o . o * o .      |
|   E + .         |
+----[SHA256]-----+

Append the public key from id_rsa.pub to authorized_keys, then set the following permissions.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys
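Because `cat ... >>` appends, running this step twice adds the same key twice. A minimal sketch of a repeatable version follows; the function name authorize_key is hypothetical, and the 640 mode simply mirrors the command above.

```shell
# Append a public key to an authorized_keys file only if it is not
# already present, then apply the same permissions as the step above.
authorize_key() {
    pub_file=$1
    auth_file=$2
    key=$(cat "$pub_file") || return 1
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x matches whole lines, -F treats the key as a fixed string.
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    chmod 640 "$auth_file"
}

# Usage:
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```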

Verify the SSH authentication.

ssh localhost

Here is the command output.

The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:JFqDVbM3zTPhUPgD5oMJ4ClviH6tzIRZ2GD3BdNqGMQ.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

Install Hadoop

Log in as the hadoop user.

su - hadoop

Download Hadoop.

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
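The steps here do not verify the download. Apache publishes a .sha512 file next to each release tarball, so one option is to compare digests before extracting. This is a sketch under the assumption that you have copied the hex digest out of that file yourself (its layout varies between releases); verify_sha512 is a made-up helper name.

```shell
# Succeed only when the SHA-512 digest of a file matches the expected
# hex string (compared case-insensitively).
verify_sha512() {
    file=$1
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha512sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage (EXPECTED_DIGEST is a placeholder for the published value):
#   verify_sha512 hadoop-3.3.0.tar.gz "$EXPECTED_DIGEST" && echo "checksum OK"
```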

Extract the downloaded file.

tar -xvzf hadoop-3.3.0.tar.gz

Rename the extracted directory to hadoop.

mv hadoop-3.3.0 hadoop

Open the ~/.bashrc file.

vim ~/.bashrc

Add the following lines.

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
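The JAVA_HOME path above is the one Ubuntu uses on amd64 machines; on other architectures the directory name differs. As a sketch, the path can be derived from the java binary itself by following its symlinks and stripping the trailing bin/java. The function name java_home_from_binary is an assumption for illustration.

```shell
# Resolve the symlink chain of a java binary and print the JDK root,
# i.e. the directory two levels above the real bin/java file.
java_home_from_binary() {
    real=$(readlink -f "$1") || return 1
    dirname "$(dirname "$real")"
}

# Usage:
#   java_home_from_binary "$(command -v java)"
```

If the printed path differs from the value in the export line, use the printed path.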

Activate the environment.

source ~/.bashrc

Open Hadoop's environment file, hadoop-env.sh.

vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Add the following line.

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

Create the NameNode and DataNode data directories.

mkdir -p ~/hadoopdata/hdfs/namenode
mkdir -p ~/hadoopdata/hdfs/datanode

Open the core-site.xml file.

vim $HADOOP_HOME/etc/hadoop/core-site.xml

Add the following lines. Use either hdfs://127.0.0.1:9000 or hdfs://0.0.0.0:9000 as the value, not both; a property takes a single <value> element.

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://127.0.0.1:9000</value>
        </property>
</configuration>
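To check that Hadoop actually reads the value just configured, the stock `hdfs getconf -confKey` command prints the effective setting for a key. The wrapper below is only a sketch; show_conf_key is a made-up name, and the hint about ~/.bashrc assumes the PATH setup from the earlier step.

```shell
# Print the effective value of a Hadoop configuration key, or a hint
# when the hdfs command is not on PATH yet.
show_conf_key() {
    if command -v hdfs >/dev/null 2>&1; then
        hdfs getconf -confKey "$1"
    else
        echo "hdfs not found on PATH; run 'source ~/.bashrc' first" >&2
        return 1
    fi
}

# Usage:
#   show_conf_key fs.defaultFS
```

With the configuration above in place, this should echo back the hdfs:// URI; a parse error instead usually points at malformed XML in core-site.xml.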

Open the hdfs-site.xml file.

vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the following lines.

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>

        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
        </property>

        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
        </property>
</configuration>

Open the mapred-site.xml file.

vim $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following lines.

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Open the yarn-site.xml file.

vim $HADOOP_HOME/etc/hadoop/yarn-site.xml

Add the following lines.

<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

Format the NameNode as the hadoop user.

hdfs namenode -format

Here is the command output.

INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-address
************************************************************/
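Formatting wipes any existing HDFS metadata, so it should run only once on a fresh install. A formatted NameNode directory contains a current/VERSION file, which gives a simple guard against formatting twice. This is a heuristic sketch rather than an official check, and namenode_is_formatted is a hypothetical helper name.

```shell
# Succeed when the given NameNode directory already holds formatted
# metadata (detected by the presence of current/VERSION).
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Usage: format only when no metadata exists yet.
#   namenode_is_formatted ~/hadoopdata/hdfs/namenode || hdfs namenode -format
```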

Start the Hadoop cluster, beginning with the HDFS daemons:

start-dfs.sh

Here is the command output.

Starting namenodes on [15.228.82.126]
15.228.82.126: Warning: Permanently added '15.228.82.126' (ECDSA) to the list of known hosts.
Starting datanodes
Starting secondary namenodes ip-address
ip-address: Warning: Permanently added 'ip-address' (ECDSA) to the list of known hosts.

Start the YARN service.

start-yarn.sh

Here is the command output.

Starting resourcemanager
Starting nodemanagers

Check the status of all Hadoop services.

jps

Here is the command output.

6032 ResourceManager
5625 DataNode
6523 Jps
5836 SecondaryNameNode
6206 NodeManager
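On a single-node setup like this one, five daemons are expected: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below reads jps output and prints whichever of those are absent; the function name missing_daemons is made up for illustration.

```shell
# Read `jps` output on stdin and print each expected daemon that does
# not appear. Names are compared exactly, so SecondaryNameNode does
# not count as NameNode.
missing_daemons() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode SecondaryNameNode ResourceManager NodeManager", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }'
}

# Usage:
#   jps | missing_daemons
```

Fed the sample listing above, it prints NameNode, since that listing has no NameNode line. If the same happens on your machine, check the NameNode log under $HADOOP_HOME/logs before continuing.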

As root (or with sudo), open ports 9870 and 8088 in the ufw firewall.

ufw allow 9870/tcp
ufw allow 8088/tcp

Access the Hadoop web interface

http://server-ip:9870

Here is the output.

Fig 1

Access the ResourceManager web interface

http://server-ip:8088

Here is the output.

Fig 2

Test the Hadoop Cluster.

Create directories in the HDFS filesystem.

hdfs dfs -mkdir /logs
hdfs dfs -mkdir /example

List the root directory:

hdfs dfs -ls /

Here is the command output.

Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2021-07-19 15:27 /logs
drwxr-xr-x   - hadoop supergroup          0 2021-07-19 15:26 /example

Copy log files from the local machine into HDFS.

hdfs dfs -put /var/log/* /logs/
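On Ubuntu, some files under /var/log are typically readable only by root or the adm group, so the put above may report permission errors for them when run as the hadoop user. One way around that, sketched here, is to upload only the regular files the current user can read. Unlike the glob above, this skips subdirectories; readable_files is a hypothetical helper name.

```shell
# Print the regular files directly under a directory that the current
# user is able to read, one per line.
readable_files() {
    for f in "$1"/*; do
        [ -f "$f" ] && [ -r "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Usage (file names containing whitespace would need extra care):
#   hdfs dfs -put $(readable_files /var/log) /logs/
```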

Open the Hadoop Namenode web interface.

http://server-ip:9870/explorer.html

Here is the output.

Fig 3
