Configuring a Hadoop Cluster Using an Ansible Playbook (11.1)
What is Hadoop?
Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers.
Hadoop systems can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing, analyzing, and managing data than relational databases and data warehouses provide.
What is a Hadoop Storage Cluster?
A Hadoop cluster is a collection of computers, known as nodes, that are networked together to perform parallel computations on big data sets.
Unlike other computer clusters, Hadoop clusters are designed specifically to store and analyze mass amounts of structured and unstructured data in a distributed computing environment.
Hadoop clusters consist of a network of connected master and slave nodes built from high-availability, low-cost commodity hardware. The ability to scale linearly and quickly add or remove nodes as volume demands makes them well-suited to big data analytics jobs with data sets that vary widely in size.
Now that we know what Hadoop and a Hadoop cluster are, we can start with the hands-on part.
The image below shows the Hadoop datanode, which is one of our target nodes.
The image below shows the Hadoop namenode, which is our other target node.
Step-1: First of all, check whether Ansible is installed and is a stable version
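A quick way to verify this (assuming Ansible was installed via pip or the distribution's package manager) is:

```shell
# Prints the installed Ansible version along with the config file path in use
ansible --version
```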
Step-2: Create a file named ip.txt to store the IP addresses of both the Hadoop namenode and the datanode
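The inventory is a plain text file in Ansible's INI format. The group names and IP addresses below are placeholders for illustration; use the actual addresses of your two target nodes:

```ini
[namenode]
192.168.43.101

[datanode]
192.168.43.102
```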
Step-3: Create an Ansible configuration file under the ansible directory to set and store the necessary file paths
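A minimal ansible.cfg might look like the following; the inventory path here is an assumption and should point at wherever ip.txt was actually created:

```ini
[defaults]
# Path to the inventory file created in Step-2 (adjust to your location)
inventory = /root/ansible/ip.txt
# Skip SSH host-key prompts on first connection
host_key_checking = false
```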
Step-4: To check the connectivity between the Hadoop namenode and datanode, use the following command
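Connectivity to every host in the inventory can be checked with Ansible's ping module:

```shell
# An Ansible "ping" is an SSH connection plus a Python check, not ICMP;
# each reachable host answers with "pong"
ansible all -m ping
```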
Step-5: Check whether the required Java and Hadoop packages are available on both the namenode and the datanode
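One way to check is to run the version commands on every node through Ansible's shell module (the exact output varies by installation):

```shell
# Fails on any node where the JDK or Hadoop is missing from the PATH
ansible all -m shell -a "java -version; hadoop version"
```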
Step-6: Run the Ansible playbook created for the namenode, i.e. namenode.yml
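The playbook is run with `ansible-playbook namenode.yml`. As a rough sketch of what such a playbook does (the exact tasks are in the GitHub repo; the file paths and the format step below are assumptions typical of a Hadoop 1.x setup):

```yaml
# namenode.yml -- illustrative sketch, not the exact file from the repo
- hosts: namenode
  tasks:
    - name: Copy core-site.xml with the namenode address
      template:
        src: core-site.xml
        dest: /etc/hadoop/core-site.xml
    - name: Copy hdfs-site.xml with the name-directory path
      template:
        src: hdfs-site.xml
        dest: /etc/hadoop/hdfs-site.xml
    - name: Format the name directory (first run only)
      shell: echo Y | hadoop namenode -format
    - name: Start the namenode daemon
      shell: hadoop-daemon.sh start namenode
```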
Step-7: Run the Ansible playbook created for the datanode, i.e. datanode.yml
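The datanode playbook is run the same way, with `ansible-playbook datanode.yml`. A sketch under the same assumptions (paths are placeholders; the real tasks are in the repo):

```yaml
# datanode.yml -- illustrative sketch, not the exact file from the repo
- hosts: datanode
  tasks:
    - name: Copy core-site.xml pointing at the namenode
      template:
        src: core-site.xml
        dest: /etc/hadoop/core-site.xml
    - name: Copy hdfs-site.xml with the data-directory path
      template:
        src: hdfs-site.xml
        dest: /etc/hadoop/hdfs-site.xml
    - name: Start the datanode daemon
      shell: hadoop-daemon.sh start datanode
```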
Step-8: Now check out the status of the Hadoop namenode
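The running Hadoop daemons on either node can be listed with jps, which shows the Java processes on the host (look for a NameNode or DataNode entry):

```shell
# jps lists running JVM processes on every inventory host
ansible all -m shell -a "jps"
```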
Step-9: After running the Ansible playbook for the namenode, check out its configuration files, i.e. core-site.xml and hdfs-site.xml
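Both files are small XML fragments. As an example, on the namenode side they typically look like this (the port number and directory are placeholders; fs.default.name and dfs.name.dir are the Hadoop 1.x property names):

```xml
<!-- core-site.xml: the address clients use to reach the namenode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>

<!-- hdfs-site.xml: local directory where the namenode keeps its metadata -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
```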
Step-10: Now check out the status of the Hadoop datanode
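Whether the datanode has successfully registered with the namenode can be confirmed with the dfsadmin report, run on the namenode:

```shell
# Shows the cluster's configured capacity and the list of live datanodes
hadoop dfsadmin -report
```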
Step-11: After running the Ansible playbook for the datanode, check out its configuration files, i.e. core-site.xml and hdfs-site.xml
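On the datanode the same two files differ slightly: core-site.xml points at the namenode's actual IP address rather than 0.0.0.0, and hdfs-site.xml names the local directory where blocks are stored (the values below are placeholders):

```xml
<!-- core-site.xml on the datanode: where to find the namenode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.43.101:9001</value>
  </property>
</configuration>

<!-- hdfs-site.xml on the datanode: local directory for stored blocks -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
```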
Finally, the cluster is up and running, and there is one datanode available, as we wanted.
Below is the GitHub link where I have added the necessary files, i.e. the namenode.yml, datanode.yml, and ansible.cfg files.
Thanks for Reading!!