Ubuntu 20.04 Hadoop - LinuxConfig.org

Korbin · April 18, 2020, 5:50am

Apache Hadoop comprises multiple open source software packages that work together to provide distributed storage and distributed processing of big data. There are four main components to Hadoop:

Hadoop Common - the various software libraries that Hadoop depends on to run
Hadoop Distributed File System (HDFS) - a file system that allows for efficient distribution and storage of big data across a cluster of computers
Hadoop MapReduce - a programming model and engine used for processing the data in parallel across the cluster
Hadoop YARN - a resource management framework that allocates computing resources and schedules jobs for the entire cluster

In this tutorial, we will go over the steps to install Hadoop version 3 on Ubuntu 20.04. This will involve installing HDFS (NameNode and DataNode), YARN, and MapReduce on a single-node cluster configured in pseudo-distributed mode, which simulates a distributed cluster on a single machine. Each component of Hadoop (HDFS, YARN, MapReduce) will run on our node as a separate Java process.
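Because each component runs as its own Java process, one quick sanity check after starting the cluster is to look at the output of `jps` (shipped with the JDK). The following is only a sketch: the function name `missing_daemons` is made up for illustration, and the list of expected process names is an assumption based on the usual daemon class names, so adjust it to your setup.

```shell
#!/usr/bin/env bash
# Daemons we assume should be running in a pseudo-distributed setup.
# (Assumed list; a typical install also starts SecondaryNameNode.)
EXPECTED_DAEMONS="NameNode DataNode ResourceManager NodeManager"

# Reads jps-style output ("<pid> <ClassName>" per line) on stdin and
# prints each expected daemon that is absent, one per line.
missing_daemons() {
    local running daemon
    running="$(awk '{print $2}')"
    for daemon in $EXPECTED_DAEMONS; do
        # -x matches whole lines, so SecondaryNameNode does not count as NameNode
        if ! printf '%s\n' "$running" | grep -qxF "$daemon"; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage on the Hadoop host (prints nothing when all daemons are up):
#   jps | missing_daemons
```

If the function prints a name, that daemon did not start, and its log file is usually the next place to look.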

In this tutorial you will learn:

How to add users for the Hadoop environment
How to install the Java prerequisite
How to configure passwordless SSH
How to install Hadoop and configure the necessary XML files
How to start the Hadoop cluster
How to access the NameNode and ResourceManager web UIs

This is a companion discussion topic for the original entry at https://linuxconfig.org/ubuntu-20-04-hadoop

andrestorchi · June 18, 2020, 6:50pm

Just one note about the URL of the NameNode Web UI.
In Hadoop version 3.1.3 the port is 9870 (Hadoop 3.x moved the default NameNode web UI port from 50070 to 9870).
This was the only Hadoop tutorial that helped me get a clean install.
Thanks a lot.
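To make the port change concrete, here is a small sketch that picks the default NameNode web UI port from a Hadoop version string. The function name `namenode_ui_port` is hypothetical, and it only covers the stock defaults (50070 for 2.x, 9870 for 3.x); a port overridden in hdfs-site.xml will not be detected.

```shell
#!/usr/bin/env bash
# Prints the default NameNode web UI port for a given Hadoop version string.
# Assumes stock defaults: 50070 for Hadoop 2.x, 9870 for Hadoop 3.x.
namenode_ui_port() {
    case "$1" in
        3.*) echo 9870 ;;
        2.*) echo 50070 ;;
        *)   echo "unknown Hadoop version: $1" >&2; return 1 ;;
    esac
}

# Usage: build the URL to open in a browser on the Hadoop host.
#   echo "http://localhost:$(namenode_ui_port 3.1.3)"
```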

sourabhbagrecha · December 1, 2020, 11:34am

Also, one thing to note here: the command to download Hadoop might not work (at least for me it didn't), resulting in a 404 error.
A quick fix is to change the version in the download command from v3.1.3 to v3.1.4, or you can directly use the command below:

wget https://downloads.apache.org/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz

and it will work as expected. Have a great day!
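Since the 404 seems to come from older releases being dropped from the main download site, a fallback to the Apache archive may save a manual edit next time. This is only a sketch: the function names are made up, and it assumes the archive uses the same `hadoop/common/hadoop-<version>/` layout under `https://archive.apache.org/dist/`.

```shell
#!/usr/bin/env bash
# Prints candidate download URLs for a Hadoop release, main site first.
# The archive URL layout is an assumption; verify it for your version.
hadoop_tarball_urls() {
    local version="$1"
    local path="hadoop/common/hadoop-${version}/hadoop-${version}.tar.gz"
    echo "https://downloads.apache.org/${path}"
    echo "https://archive.apache.org/dist/${path}"
}

# Tries each candidate URL in turn until one downloads successfully.
download_hadoop() {
    local url
    for url in $(hadoop_tarball_urls "$1"); do
        wget "$url" && return 0
    done
    echo "no download location worked for Hadoop $1" >&2
    return 1
}

# Usage:
#   download_hadoop 3.1.4
```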