Apache Hadoop consists of multiple open source software packages that work together to provide distributed storage and distributed processing of big data. There are four main components to Hadoop:

  • Hadoop Common - the common utilities and libraries that the other Hadoop modules depend on
  • Hadoop Distributed File System (HDFS) - a distributed file system that stores big data across a cluster of computers and provides high-throughput access to it
  • Hadoop MapReduce - a framework for processing large datasets in parallel across the cluster
  • Hadoop YARN - a framework for job scheduling and resource management across the entire cluster

In this tutorial, we will go over the steps to install Hadoop version 3 on Ubuntu 20.04. This involves installing HDFS (NameNode and DataNode), YARN, and MapReduce on a single-node cluster configured in pseudo-distributed mode, which simulates a distributed cluster on a single machine. Each Hadoop daemon (such as the NameNode, DataNode, ResourceManager, and NodeManager) will run on our node as a separate Java process.
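Because each daemon runs in its own JVM, the JDK's `jps` tool (which lists running Java processes by main class name) is a quick way to confirm that everything started. The sketch below is a minimal Python helper that parses `jps` output, assuming the cluster was started with `start-dfs.sh` and `start-yarn.sh`. The expected daemon list, including `SecondaryNameNode`, is an assumption based on a default single-node setup rather than something this tutorial prescribes.

```python
"""Report which expected Hadoop daemons are missing from `jps` output.

Assumption: a pseudo-distributed cluster where every daemon is its own
JVM, so each one shows up as a separate line in `jps`.
"""
import shutil
import subprocess

# Assumed daemon main-class names for a default single-node cluster.
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def running_daemons(jps_output: str) -> set:
    """Return the main-class names from `jps` output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str) -> list:
    """Return the expected daemons that are not running, sorted by name."""
    return sorted(EXPECTED_DAEMONS - running_daemons(jps_output))


if __name__ == "__main__":
    # `jps` ships with the JDK; skip the live check if it is not on PATH.
    if shutil.which("jps") is None:
        print("jps not found; install a JDK to use this check")
    else:
        out = subprocess.run(["jps"], capture_output=True, text=True).stdout
        missing = missing_daemons(out)
        print("all expected daemons running" if not missing else f"missing: {missing}")
```

Parsing the text rather than matching process IDs keeps the check independent of how the daemons were launched.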

In this tutorial you will learn:
  • How to add a user for the Hadoop environment
  • How to install the Java prerequisite
  • How to configure passwordless SSH
  • How to install Hadoop and configure the necessary XML files
  • How to start the Hadoop cluster
  • How to access the NameNode and ResourceManager web UIs
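To illustrate what "configuring the necessary XML files" involves, here is a minimal Python sketch that writes Hadoop-style `core-site.xml` and `hdfs-site.xml` files. The property values (`fs.defaultFS` set to `hdfs://localhost:9000` and `dfs.replication` set to `1`) are assumptions taken from the usual Apache single-node setup, not necessarily the exact values used later in this tutorial. The output directory is a temporary one, so nothing under `$HADOOP_HOME` is touched.

```python
"""Write minimal pseudo-distributed Hadoop configuration files.

Assumed values: fs.defaultFS=hdfs://localhost:9000 and dfs.replication=1.
Review the output before copying it into $HADOOP_HOME/etc/hadoop.
"""
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumed minimal settings for a one-node, pseudo-distributed cluster.
PSEUDO_DISTRIBUTED_CONFIG = {
    # Default file system URI that clients and daemons connect to.
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    # Only one DataNode exists, so each block can be stored only once.
    "hdfs-site.xml": {"dfs.replication": "1"},
}


def write_hadoop_config(path: str, properties: dict) -> None:
    """Write a <configuration> file containing one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def write_all(conf_dir: str) -> None:
    """Write every file in PSEUDO_DISTRIBUTED_CONFIG into conf_dir."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, props in PSEUDO_DISTRIBUTED_CONFIG.items():
        write_hadoop_config(os.path.join(conf_dir, filename), props)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp(prefix="hadoop-conf-")
    write_all(out_dir)
    print(f"wrote sample configuration to {out_dir}")
```

Generating the files from a dictionary keeps every property in one place, which makes it easy to compare against the XML you edit by hand.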
Note: in Hadoop 3 (including version 3.1.3), the NameNode web UI listens on port 9870 by default; Hadoop 2 used port 50070.
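Once the cluster is running, a small Python sketch can check whether both web UIs respond. It assumes the default Hadoop 3 ports on `localhost`: 9870 for the NameNode and 8088 for the ResourceManager. If your configuration changes these, edit the URL map accordingly.

```python
"""Check whether the default Hadoop 3 web UIs respond on localhost.

Assumed defaults: NameNode on port 9870, ResourceManager on port 8088.
"""
import urllib.request

DEFAULT_WEB_UIS = {
    "NameNode": "http://localhost:9870/",
    "ResourceManager": "http://localhost:8088/",
}


def check_web_uis(urls: dict, timeout: float = 3.0) -> dict:
    """Return {name: bool}, True when the URL answers with a 2xx/3xx status."""
    status = {}
    for name, url in urls.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                status[name] = 200 <= resp.getcode() < 400
        except OSError:
            # URLError, HTTPError, refused connections and timeouts
            # are all subclasses of OSError.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_web_uis(DEFAULT_WEB_UIS).items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```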