Install Apache Hadoop:
Step 1: Install Oracle Java 8
Download page: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
File Name: jdk-8u271-linux-x64.tar.gz
Google Drive URL: https://drive.google.com/drive/folders/1nFHJHDwio5_rhGc65_HcogbwSKldJ_jg?usp=sharing
cd /home/datamaking/softwares/
tar -xvzf jdk-8u271-linux-x64.tar.gz
The archive extracts to /home/datamaking/softwares/jdk1.8.0_271.
Open ~/.bashrc and add the following lines at the end:
nano ~/.bashrc
export JAVA_HOME=/home/datamaking/softwares/jdk1.8.0_271
export PATH=$PATH:$JAVA_HOME/bin
source ~/.bashrc
To verify the Java version, run the following command:
java -version
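If java -version reports "command not found", the exports may not have taken effect in the current shell. A minimal sketch of the check, re-creating the two exports above (the path assumes the install location used in this guide):

```shell
# Re-create the exports from ~/.bashrc and confirm the JDK bin directory
# landed on PATH (path assumes this guide's install location)
export JAVA_HOME=/home/datamaking/softwares/jdk1.8.0_271
export PATH="$PATH:$JAVA_HOME/bin"
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JDK bin directory is on PATH" ;;
  *)                    echo "JDK bin directory missing from PATH" ;;
esac
```

If the check fails after editing ~/.bashrc, make sure you ran `source ~/.bashrc` in the same terminal.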
Step 2: Create the SSH key for password-less login (press Enter when asked for a filename to save the key)
sudo apt-get install openssh-server openssh-client
ssh-keygen -t rsa -P ""
Append the generated public key to the authorized keys file:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Verify password-less login:
ssh localhost
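If ssh localhost still prompts for a password, the usual cause is loose permissions on the key files: sshd refuses keys it considers world-readable. A quick fix (the mkdir/touch lines only make this sketch safe to run even before the key has been generated):

```shell
# sshd requires strict permissions on ~/.ssh and authorized_keys;
# loose permissions are a common reason key-based login silently fails
mkdir -p "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
# print the resulting modes to confirm (700 and 600)
stat -c '%a %n' "$HOME/.ssh" "$HOME/.ssh/authorized_keys"
```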
Step 3: Download the Hadoop 3.2.1 Package/Binary file
Releases page: https://hadoop.apache.org/releases.html
wget https://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
Move the file hadoop-3.2.1.tar.gz into the /home/datamaking/softwares directory and extract it there:
mv hadoop-3.2.1.tar.gz /home/datamaking/softwares
cd /home/datamaking/softwares
sudo tar -xzvf hadoop-3.2.1.tar.gz
Step 4: Add the HADOOP_HOME and JAVA_HOME paths in the bash file (.bashrc)
# HADOOP VARIABLES SETTINGS START HERE
export HADOOP_HOME=/home/datamaking/softwares/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR"
# HADOOP VARIABLES SETTINGS END HERE
nano ~/.bashrc
source ~/.bashrc
hadoop version
Step 5: Create or Modify Hadoop configuration files
Now create/edit the configuration files in the /home/datamaking/softwares/hadoop-3.2.1/etc/hadoop directory.
Edit hadoop-env.sh as follows,
sudo nano /home/datamaking/softwares/hadoop-3.2.1/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/datamaking/softwares/jdk1.8.0_271
Create a directory for Hadoop's temporary data:
sudo mkdir -p /home/datamaking/softwares/hadoop_data/tmp
Edit core-site.xml as follows,
sudo nano core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/datamaking/softwares/hadoop_data/tmp</value>
  <description>Parent directory for other temporary directories.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
  <description>The name of the default file system.</description>
</property>
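Note that the property blocks in core-site.xml (and in the other *-site.xml files below) must sit inside the file's <configuration> root element. A complete core-site.xml with the two properties above would look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/datamaking/softwares/hadoop_data/tmp</value>
    <description>Parent directory for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system.</description>
  </property>
</configuration>
```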
Create the NameNode and DataNode storage directories:
mkdir -p /home/datamaking/softwares/hadoop_data/namenode
mkdir -p /home/datamaking/softwares/hadoop_data/datanode
sudo chown -R datamaking:datamaking /home/datamaking/softwares
Edit hdfs-site.xml as follows,
sudo nano hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/datamaking/softwares/hadoop_data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/datamaking/softwares/hadoop_data/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Edit mapred-site.xml as follows,
sudo nano mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Edit yarn-site.xml as follows,
sudo nano yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>localhost:8088</value>
</property>
Set ownership for /home/datamaking/softwares:
sudo chown -R datamaking:datamaking /home/datamaking/softwares
Step 6: Format the NameNode
hdfs namenode -format
Note: format the NameNode only once, during initial setup; re-formatting erases all HDFS metadata.
Start the NameNode daemon and DataNode daemon using the scripts Hadoop provides in its sbin directory.
start-dfs.sh
Start ResourceManager daemon and NodeManager daemon.
start-yarn.sh
Run the jps command to check the running Hadoop JVM processes:
jps
Sample output:
datamaking@datamakingvm:~$ jps
6577 NameNode
6744 DataNode
7228 ResourceManager
7389 NodeManager
7725 Jps
6975 SecondaryNameNode
datamaking@datamakingvm:~$
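As a sketch, this check can be scripted: the loop below reports whether each daemon of a single-node setup shows up in the jps output (it assumes the JDK's jps tool is on PATH, and prints a warning otherwise):

```shell
# Report whether each expected single-node daemon appears in jps output.
# Assumes the JDK's jps tool is on PATH; prints a warning otherwise.
report=""
if command -v jps >/dev/null 2>&1; then
  procs=$(jps)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$procs" | grep -qw "$d"; then
      report="$report$d: running\n"
    else
      report="$report$d: NOT running\n"
    fi
  done
else
  report="jps not found on PATH"
fi
printf '%b\n' "$report"
```

The grep -w word match keeps NameNode from falsely matching the SecondaryNameNode line.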
Open your web browser and go to the following URL to browse the NameNode web UI.
http://localhost:9870
Open your web browser and go to the following URL to access the ResourceManager web UI.
http://localhost:8088
Happy Learning !!!