Configuring a Hadoop Cluster Using an Ansible Playbook (11.1)

Lakshmi Priya Patro
4 min read · Jan 17, 2022

What is Hadoop?

Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers.

Hadoop systems can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing, analyzing, and managing data than relational databases and data warehouses provide.

What is a Hadoop Storage Cluster?

A Hadoop cluster is a collection of computers, known as nodes, that are networked together to perform parallel computations on big data sets.

Unlike other computer clusters, Hadoop clusters are designed specifically to store and analyze massive amounts of structured and unstructured data in a distributed computing environment.

Hadoop clusters consist of a network of connected master and slave nodes that run on highly available, low-cost commodity hardware. The ability to scale linearly and to quickly add or remove nodes as volume demands makes them well suited to big data analytics jobs whose data sets vary widely in size.

Now that we know what Hadoop and a Hadoop cluster are, we can start the hands-on part.

The image below shows the Hadoop datanode, which is one of our target nodes

The image below shows the Hadoop namenode, which is the other target node

Step-1: First of all, check whether a stable version of Ansible is installed
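A minimal sketch of this check, assuming Ansible is expected on the PATH of the controller node. The variable name ansible_status and the pip3 hint are only illustrative and are not taken from the original setup.

```shell
# Report the installed Ansible version, or say so if Ansible is missing.
if command -v ansible >/dev/null 2>&1; then
    ansible_status="installed"
    ansible --version
else
    ansible_status="missing"
    echo "ansible not found; install it first (for example: pip3 install ansible)"
fi
echo "ansible status: ${ansible_status}"
```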

Step-2: Create a file named ip.txt to store the IP addresses of both the Hadoop namenode and datanode
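As a rough illustration, ip.txt can be written as an Ansible inventory with one group per node. Everything inside the file here is an assumption: the group names, the placeholder addresses from the 192.0.2.0/24 documentation range, and the login variables all need to be replaced with real values. Password-based SSH logins from Ansible also typically need sshpass installed on the controller.

```shell
# Write a hypothetical inventory; replace the placeholder IPs and credentials.
cat > ip.txt <<'EOF'
[namenode]
192.0.2.10 ansible_user=root ansible_password=CHANGE_ME ansible_connection=ssh

[datanode]
192.0.2.11 ansible_user=root ansible_password=CHANGE_ME ansible_connection=ssh
EOF

# Show the host lines so the file can be reviewed.
grep -v '^\[' ip.txt
```

Keeping the two nodes in separate groups lets each playbook target only its own node.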

Step-3: Create an Ansible configuration file (ansible.cfg) under the Ansible directory to store the required settings, such as the path to the inventory file
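A small sketch of what such a configuration file might look like. To stay runnable without root access it writes ansible.cfg into the current directory, which Ansible also reads; the Ansible directory itself is commonly /etc/ansible. The inventory path and the host_key_checking choice are assumptions for a lab setup, not necessarily what the repository's ansible.cfg contains.

```shell
# Write a project-local ansible.cfg (copy it under the Ansible directory if preferred).
cat > ansible.cfg <<EOF
[defaults]
# Hypothetical inventory location: adjust to wherever ip.txt was saved.
inventory = ${PWD}/ip.txt
# Skip the interactive host-key prompt on first connection (lab setups only).
host_key_checking = False
EOF

cat ansible.cfg
```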

Step-4: To check the connectivity between the Hadoop namenode and datanode use the following command
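One common way to do this is Ansible's ping module, which confirms that the controller can log in to each node and run Python there. The sketch below assumes the inventory is the ip.txt file from Step-2 in the current directory; the variable names are illustrative.

```shell
# Ping every host in the inventory with Ansible's ping module.
inventory="ip.txt"   # assumed inventory file name from Step-2
if command -v ansible >/dev/null 2>&1 && [ -f "$inventory" ]; then
    if ansible all -i "$inventory" -m ping; then
        ping_result="ok"
    else
        ping_result="unreachable"
    fi
else
    ping_result="skipped"
    echo "ansible or $inventory is missing; nothing was pinged"
fi
echo "ping result: $ping_result"
```

A healthy node answers with "pong" in the module output.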

Step-5: Check whether the required Java and Hadoop packages are available on both the namenode and the datanode
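A loose sketch of such a check, assuming "available" means the installer files have already been copied to a known directory on each node. The directory /root and the file-name patterns are hypothetical, so adjust both to the real setup.

```shell
inventory="ip.txt"   # assumed inventory file name from Step-2
pkg_dir="/root"      # hypothetical directory holding the installers on each node
if command -v ansible >/dev/null 2>&1 && [ -f "$inventory" ]; then
    # grep exits non-zero when nothing matches, so Ansible marks that host as failed.
    # A match on either name passes, so read the listing to confirm both are present.
    if ansible all -i "$inventory" -m shell -a "ls $pkg_dir | grep -iE 'jdk|java|hadoop'"; then
        package_check="found"
    else
        package_check="missing or unreachable"
    fi
else
    package_check="skipped"
    echo "ansible or $inventory is missing; nothing was checked"
fi
echo "package check: $package_check"
```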

Step-6: Run the Ansible playbook created for the namenode, i.e. namenode.yml

Step-7: Run the Ansible playbook created for the datanode, i.e. datanode.yml
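Steps 6 and 7 can be sketched as a short loop that runs the namenode playbook first and the datanode playbook second. It assumes both playbooks and ip.txt sit in the current directory; the syntax check before each run is an optional extra and not part of the original steps.

```shell
inventory="ip.txt"   # assumed inventory file name from Step-2
ran=""
for playbook in namenode.yml datanode.yml; do
    if command -v ansible-playbook >/dev/null 2>&1 && [ -f "$playbook" ]; then
        # Validate the playbook first, then apply it; record it only if both succeed.
        ansible-playbook -i "$inventory" --syntax-check "$playbook" &&
            ansible-playbook -i "$inventory" "$playbook" &&
            ran="$ran $playbook"
    else
        echo "skipping $playbook (ansible-playbook or the file is missing)"
    fi
done
echo "playbooks applied:${ran:- none}"
```

The namenode goes first so that the datanode has a running namenode to register with.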

Step-9: Now check the status of the Hadoop datanode
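A hedged sketch of how the status can be checked from a shell on the datanode. jps ships with the JDK and lists running Java processes, and the dfsadmin report shows which datanodes have registered with the namenode. Which of the two report commands applies depends on the Hadoop version, which the sketch does not assume.

```shell
# jps lists running JVM processes; a running datanode daemon shows up as "DataNode".
if command -v jps >/dev/null 2>&1; then
    if jps | grep -q 'DataNode'; then
        datanode_state="running"
    else
        datanode_state="not running"
    fi
else
    datanode_state="unknown"
    echo "jps not found; is the JDK on the PATH?"
fi
echo "DataNode daemon: $datanode_state"

# The dfsadmin report lists the datanodes registered with the namenode.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfsadmin -report
elif command -v hadoop >/dev/null 2>&1; then
    hadoop dfsadmin -report   # older Hadoop 1.x form of the same report
fi
```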

Step-10: After running the Ansible playbook for the datanode, check the configuration files, i.e. core-site.xml and hdfs-site.xml
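To give an idea of what to look for, the sketch below writes example versions of the two files into a scratch directory and then greps for the key properties. All values are assumptions: the namenode address, port 9001, and the /dn storage directory are placeholders, and the real files live in the Hadoop configuration directory on the node. The property names fs.default.name and dfs.data.dir are the older forms; newer Hadoop releases call them fs.defaultFS and dfs.datanode.data.dir.

```shell
conf_dir="example-conf"      # scratch directory for this sketch, not the real Hadoop config path
namenode_ip="192.0.2.10"     # placeholder namenode address
mkdir -p "$conf_dir"

# core-site.xml tells the datanode where the namenode is listening.
cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${namenode_ip}:9001</value>
  </property>
</configuration>
EOF

# hdfs-site.xml tells the datanode which local directory stores the blocks.
cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
EOF

# The check itself: print each property name together with the value on the next line.
grep -A1 '<name>fs.default.name</name>' "$conf_dir/core-site.xml"
grep -A1 '<name>dfs.data.dir</name>' "$conf_dir/hdfs-site.xml"
```

On a real node, the same two grep commands can be pointed at the actual configuration files.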

Finally, the cluster is up and running and there is one datanode available as we wanted.

Below is the GitHub link where I have added the necessary files, i.e. namenode.yml, datanode.yml, and ansible.cfg.

https://github.com/priya231299/AnsibleHadoopCluster

Thanks for Reading!!
