How to install Apache Hadoop 3 on Ubuntu 18.04.5 | Step By Step | Part 3

Install Apache Hadoop:

Step 1: Install Oracle Java 8

https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

File Name: jdk-8u271-linux-x64.tar.gz

Google Drive URL: https://drive.google.com/drive/folders/1nFHJHDwio5_rhGc65_HcogbwSKldJ_jg?usp=sharing

cd /home/datamaking/softwares/

tar -xvzf jdk-8u271-linux-x64.tar.gz

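The archive extracts to the following directory:

/home/datamaking/softwares/jdk1.8.0_271

Open ~/.bashrc and add the two export lines shown after this command at the end of the file:

nano ~/.bashrc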

export JAVA_HOME=/home/datamaking/softwares/jdk1.8.0_271
export PATH=$PATH:$JAVA_HOME/bin

source ~/.bashrc

To verify the Java version, run the following command:

java -version
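If java -version reports "command not found", the usual cause is a JAVA_HOME that does not point at the extracted JDK. The following is a minimal sketch of a sanity check; check_java_home is a hypothetical helper name, not part of the JDK or Hadoop.

```shell
# check_java_home [DIR]
# Hypothetical helper: succeeds only if DIR (default: $JAVA_HOME)
# contains an executable bin/java.
check_java_home() {
    jdk_dir="${1:-${JAVA_HOME:-}}"
    if [ -z "$jdk_dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$jdk_dir/bin/java" ]; then
        echo "no executable found at $jdk_dir/bin/java" >&2
        return 1
    fi
    echo "JAVA_HOME looks valid: $jdk_dir"
}

# Example call (path taken from this guide):
# check_java_home /home/datamaking/softwares/jdk1.8.0_271
```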

Step 2: Create an SSH key for password-less login (press Enter when asked for a file name to save the key)

sudo apt-get install openssh-server openssh-client

ssh-keygen -t rsa -P ""

Append the generated SSH public key to the authorized_keys file:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Check that the login to localhost no longer asks for a password:

ssh localhost
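If ssh localhost still asks for a password, two common causes are loose permissions on the key files and a key that was appended more than once or not at all. The sketch below is one way to make the append step repeatable; add_authorized_key is a hypothetical helper name, and the mode values (700 for the directory, 600 for the file) are the conventional ones rather than something this guide prescribes.

```shell
# add_authorized_key PUBKEY_FILE AUTH_FILE
# Hypothetical helper: appends the public key only if it is not already
# listed, then tightens permissions on the directory and the file.
add_authorized_key() {
    pubkey_file="$1"
    auth_file="$2"
    auth_dir=$(dirname "$auth_file")
    mkdir -p "$auth_dir"
    chmod 700 "$auth_dir"
    touch "$auth_file"
    # -x: match whole lines, -F: fixed string, -q: no output
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}

# Example call:
# add_authorized_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```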

Step 3: Download the Hadoop 3.2.1 Package/Binary file

https://hadoop.apache.org/releases.html

wget https://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
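Since the archive comes from a mirror, it is worth comparing its SHA-512 digest with the one published on the official Apache Hadoop download page before extracting it. The following is a minimal sketch; verify_sha512 is a hypothetical helper name, and the expected digest has to be copied from the Apache site (none is given here).

```shell
# verify_sha512 FILE EXPECTED_HEX
# Hypothetical helper: compares the SHA-512 digest of FILE with the
# expected hex string, ignoring upper/lower case.
verify_sha512() {
    archive="$1"
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha512sum "$archive" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "checksum OK: $archive"
    else
        echo "checksum MISMATCH: $archive" >&2
        return 1
    fi
}

# Example call (replace the placeholder with the published digest):
# verify_sha512 hadoop-3.2.1.tar.gz "<digest from the Apache download page>"
```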

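Move hadoop-3.2.1.tar.gz into the /home/datamaking/softwares directory and extract it there. sudo is not needed, because the directory is inside your own home directory:

mv hadoop-3.2.1.tar.gz /home/datamaking/softwares

cd /home/datamaking/softwares

tar -xzvf hadoop-3.2.1.tar.gz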

Step 4: Add HADOOP_HOME and the related variables to the bash file (.bashrc)

Open ~/.bashrc and add the following block at the end of the file (JAVA_HOME was already added in Step 1):

nano ~/.bashrc

# HADOOP VARIABLES SETTINGS START HERE
export HADOOP_HOME=/home/datamaking/softwares/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR"
# HADOOP VARIABLES SETTINGS END HERE

Reload the file so that the variables take effect in the current shell:

source ~/.bashrc

Verify that the hadoop command is now on the PATH:

hadoop version
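When this step is repeated, it is easy to end up with the same export lines in ~/.bashrc several times. The sketch below shows one way to append the block only once by looking for the START marker first. append_hadoop_env is a hypothetical helper name, and for brevity it writes only HADOOP_HOME and PATH, not the full list of variables from this step.

```shell
# append_hadoop_env RC_FILE HADOOP_DIR
# Hypothetical helper: appends a marked settings block to RC_FILE unless
# the START marker is already present.
append_hadoop_env() {
    rc_file="$1"
    hadoop_dir="$2"
    marker="# HADOOP VARIABLES SETTINGS START HERE"
    touch "$rc_file"
    if grep -qF -e "$marker" "$rc_file"; then
        echo "Hadoop settings already present in $rc_file"
        return 0
    fi
    # \$ keeps the variable references literal in the written file.
    cat >> "$rc_file" <<EOF
$marker
export HADOOP_HOME=$hadoop_dir
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
# HADOOP VARIABLES SETTINGS END HERE
EOF
}

# Example call:
# append_hadoop_env "$HOME/.bashrc" /home/datamaking/softwares/hadoop-3.2.1
```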

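Step 5: Create or modify the Hadoop configuration files

Create or edit the configuration files in the /home/datamaking/softwares/hadoop-3.2.1/etc/hadoop directory. Several of the nano commands in this step use relative file names, so change into that directory first:

cd /home/datamaking/softwares/hadoop-3.2.1/etc/hadoop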

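Edit hadoop-env.sh and set JAVA_HOME to the JDK directory from Step 1:

sudo nano /home/datamaking/softwares/hadoop-3.2.1/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/home/datamaking/softwares/jdk1.8.0_271

Create the directory that hadoop.tmp.dir will point to. Do not use sudo here; a root-owned hadoop_data directory would make the later mkdir commands fail:

mkdir -p /home/datamaking/softwares/hadoop_data/tmp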

Edit core-site.xml and add the following properties inside the <configuration> element:

sudo nano core-site.xml

	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/datamaking/softwares/hadoop_data/tmp</value>
		<description>Parent directory for other temporary directories.</description>
	</property>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://localhost:9000</value>
		<description>The name of the default file system.</description>
	</property>
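The properties only take effect when they sit inside the <configuration> root element of the file. As an illustration, the sketch below writes a complete core-site.xml with a here-document. write_core_site is a hypothetical helper name, and the function overwrites any existing file, so treat it as a sketch rather than a replacement for editing by hand.

```shell
# write_core_site CONF_DIR TMP_DIR
# Hypothetical helper: writes a complete core-site.xml into CONF_DIR,
# overwriting an existing file of that name.
write_core_site() {
    conf_dir="$1"
    tmp_dir="$2"
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>$tmp_dir</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
}

# Example call (paths taken from this guide):
# write_core_site /home/datamaking/softwares/hadoop-3.2.1/etc/hadoop \
#     /home/datamaking/softwares/hadoop_data/tmp
```

Once Hadoop is on the PATH, running hdfs getconf -confKey fs.defaultFS should print the value that Hadoop actually reads from the file.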


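Create the NameNode and DataNode storage directories:

mkdir -p /home/datamaking/softwares/hadoop_data/namenode

mkdir -p /home/datamaking/softwares/hadoop_data/datanode

Make the datamaking user the owner of everything under /home/datamaking/softwares:

sudo chown -R datamaking:datamaking /home/datamaking/softwares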

Edit hdfs-site.xml as follows,

sudo nano hdfs-site.xml

	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/home/datamaking/softwares/hadoop_data/namenode</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/home/datamaking/softwares/hadoop_data/datanode</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
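The NameNode and DataNode can only use these directories if the user who starts Hadoop can write to them. A minimal sketch of a pre-flight check follows; check_data_dirs is a hypothetical helper name.

```shell
# check_data_dirs DIR...
# Hypothetical helper: reports each argument that is not an existing,
# writable directory and returns non-zero if any check fails.
check_data_dirs() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "not a writable directory: $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call (paths taken from this guide):
# check_data_dirs /home/datamaking/softwares/hadoop_data/namenode \
#     /home/datamaking/softwares/hadoop_data/datanode
```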


Edit mapred-site.xml as follows,

sudo nano mapred-site.xml

	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>


Edit yarn-site.xml as follows,

sudo nano yarn-site.xml

	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>localhost:8088</value>
	</property>
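When properties are pasted into these XML files by hand, the same property name can end up in a file more than once, which makes it unclear which value is meant. The sketch below lists repeated names in one file. find_duplicate_properties is a hypothetical helper name, and it assumes each <name> element is written on a single line, as in the fragments of this guide.

```shell
# find_duplicate_properties FILE
# Hypothetical helper: prints every property name that occurs more than
# once in FILE. Assumes one <name>...</name> element per line.
find_duplicate_properties() {
    # Keep only the text between <name> and </name>, then let
    # uniq -d print the names that appear on adjacent sorted lines.
    sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$1" | sort | uniq -d
}

# Example call:
# find_duplicate_properties yarn-site.xml
```

No output means every property name in the file is unique.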


Set the ownership of /home/datamaking/softwares again, in case any file was created as root while editing with sudo:

sudo chown -R datamaking:datamaking /home/datamaking/softwares

Step 6: Format the namenode

hdfs namenode -format
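Formatting is meant to be done once. Running it again on a NameNode directory that is already in use discards the existing HDFS metadata. A formatted name directory contains a current/VERSION file, so a small guard can be sketched as follows; namenode_is_formatted and format_if_needed are hypothetical helper names.

```shell
# namenode_is_formatted NAME_DIR
# Hypothetical helper: succeeds if NAME_DIR already holds a formatted
# NameNode (detected by the presence of current/VERSION).
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# format_if_needed NAME_DIR
# Hypothetical helper: runs the format command only for an unformatted
# directory.
format_if_needed() {
    if namenode_is_formatted "$1"; then
        echo "NameNode directory $1 is already formatted; skipping"
    else
        hdfs namenode -format
    fi
}

# Example call (path taken from hdfs-site.xml in this guide):
# format_if_needed /home/datamaking/softwares/hadoop_data/namenode
```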

Start the NameNode and DataNode daemons using the scripts that Hadoop provides in the $HADOOP_HOME/sbin directory (already on the PATH from Step 4):

start-dfs.sh

Start the ResourceManager and NodeManager daemons:

start-yarn.sh

Run the jps command to check the running Hadoop JVM processes:

jps

Example output:

datamaking@datamakingvm:~$ jps
6577 NameNode
6744 DataNode
7228 ResourceManager
7389 NodeManager
7725 Jps
6975 SecondaryNameNode
datamaking@datamakingvm:~$
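All five daemons from the example output should be listed. The following sketch reads jps output on standard input and prints the names of any daemons that are missing; missing_daemons is a hypothetical helper name.

```shell
# missing_daemons
# Hypothetical helper: reads jps output on stdin and prints each
# expected Hadoop daemon that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the second column exactly, so that NameNode is not
        # matched by the SecondaryNameNode line.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Example call:
# jps | missing_daemons
```

An empty result means all five daemons are up. If a name is printed, the log files under $HADOOP_HOME/logs are the first place to look.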

Open your web browser and go to the following URL to browse the NameNode web UI:

http://localhost:9870

Open your web browser and go to the following URL to access the ResourceManager web UI:

http://localhost:8088

Happy Learning !!!
