Install Apache Hadoop on Ubuntu on Single Cloud Server Instance

By Abhishek Ghosh, January 21, 2017

Previously, we talked about the Apache Hadoop framework. Here is how to install Apache Hadoop on Ubuntu on a single cloud server instance in stand-alone mode, with minimum system requirements and commands. Apache Hadoop is designed to run on standard dedicated hardware that provides the best balance of performance and economy for a given workload.

 

Where Will I Install Apache Hadoop?

 

For a cluster, two quad-core or hexa-core (or higher) CPUs running at least 2 GHz with 64 GB of RAM are expected. We are installing a single-node cluster, for which a virtual instance with a minimum of 6-8 GB of RAM is practical. You can try a VPSDime 6 GB OpenVZ instance at $7/month. However, Hadoop is written in Java, and OpenVZ is not exactly great for running Java applications; the host can kick you out if you drive their machine to a high load average. If you want VMware, Aruba Cloud is cost effective and great. You can do testing and learning work on OpenVZ, but it is not practical to run high-load work on it.

 

Steps To Install Apache Hadoop on Ubuntu on Single Cloud Server Instance

 

We will install a single-node Hadoop cluster on Ubuntu 16.04 LTS. First, prepare the system :

cd ~
apt update
apt upgrade
apt install default-jdk

OpenJDK is the default Java Development Kit on Ubuntu 16.04. Now check the Java version :

java -version

Sample output :

openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

We will create a group named hadoop and add a user named hduser :

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
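
As a quick optional check (not in the original steps) that the user and group were created :

id hduser   # the groups list should include hadoop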

Next, we will install extra software, switch to hduser, generate a key, and set up passwordless SSH for hduser on localhost :

apt install ssh rsync
su hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
exit
exit  # leave the test SSH session, then the hduser shell, returning to the sudo-capable user
sudo adduser hduser sudo
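
To confirm that passwordless SSH works for hduser, an optional check; BatchMode makes ssh fail instead of prompting for a password :

su hduser -c 'ssh -o BatchMode=yes localhost true' && echo 'passwordless SSH for hduser works'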

Here are the releases of Apache Hadoop :

http://hadoop.apache.org/releases.html
https://dist.apache.org/repos/dist/release/hadoop/common/

Apache Hadoop 2.7.3 is the latest stable release at the time of publishing this guide. We will do these steps :

wget https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar xvzf hadoop*
rm hadoop-2.7.3.tar.gz
cd hadoop-2.7.3
sudo mkdir -p /usr/local/hadoop
sudo mv * /usr/local/hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop

/usr/bin/java is a symlink to /etc/alternatives/java, which is a symlink to the default Java binary. We need the correct value for JAVA_HOME :

readlink -f /usr/bin/java | sed "s:bin/java::"

If the output is :

/usr/lib/jvm/java-8-openjdk-amd64/jre/

then we should open :

nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

and adjust :

/usr/local/hadoop/etc/hadoop/hadoop-env.sh
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
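
If you prefer to set this value non-interactively, a minimal sketch using GNU sed (it assumes the readlink output shown above; verify the path on your own system first) :

JAVA_PATH=$(readlink -f /usr/bin/java | sed "s:bin/java::")
# rewrite the existing (possibly commented-out) JAVA_HOME line in hadoop-env.sh
sed -i "s|^#\?export JAVA_HOME=.*|export JAVA_HOME=${JAVA_PATH}|" /usr/local/hadoop/etc/hadoop/hadoop-env.sh
grep "^export JAVA_HOME" /usr/local/hadoop/etc/hadoop/hadoop-env.sh   # confirm the change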

Now if we run :

/usr/local/hadoop/bin/hadoop

We will get output like :

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME

Up to this step is the minimum, basic setup of Apache Hadoop on Ubuntu on a single cloud server instance. Hadoop is now ready to be configured.

 

Configuring Apache Hadoop

 

We need to modify the following files to get a complete Apache Hadoop setup:

~/.bashrc
/usr/local/hadoop/etc/hadoop/hadoop-env.sh
/usr/local/hadoop/etc/hadoop/core-site.xml
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template
/usr/local/hadoop/etc/hadoop/hdfs-site.xml

Run :

update-alternatives --config java
nano ~/.bashrc

Add these :

~/.bashrc
#HADOOP START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre   # from the readlink output above
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP END

Save the file.
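These exports only take effect in a new shell; to apply them to the current session (a small addition to the original steps), reload the file :

source ~/.bashrc
echo $HADOOP_INSTALL   # should print /usr/local/hadoop

Then run :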

javac -version
which javac
readlink -f /usr/bin/javac

Note the values; /usr/bin/javac is the output of the which javac command. Run :

nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Modify :

/usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre

The value above comes from the previous outputs; do not blindly copy-paste. Save the file. Now do these :

sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
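
An optional ownership check; a wrongly owned temporary directory is a common cause of later failures :

ls -ld /app/hadoop/tmp   # should show hduser and hadoop as owner and group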

Open :

nano /usr/local/hadoop/etc/hadoop/core-site.xml

Modify :

/usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
 
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
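
Optionally, confirm that Hadoop picks up the new value (this assumes the PATH exports from ~/.bashrc above are active in your shell) :

hdfs getconf -confKey fs.default.name   # expected: hdfs://localhost:54310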

Run :

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

Modify :

/usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
</configuration>

Run :

mkdir -p /usr/local/hadoop_store/hdfs/namenode
mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_store

Open :

nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Modify :

/usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
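
A similar optional check for the HDFS settings (same assumptions as before) :

hdfs getconf -confKey dfs.replication            # expected: 1
hdfs getconf -confKey dfs.namenode.name.dir      # expected: file:/usr/local/hadoop_store/hdfs/namenode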

Try to run :

cd ~
hadoop namenode -format

The above command must be executed before we start using Hadoop. These commands are really intended for a physical server. You can read this guide :

https://wiki.apache.org/hadoop/Virtual%20Hadoop

The last command can fail under a given host-virtualisation technology. For that reason, in the last step we will show how to use the bundled MapReduce example program; if the format command fails, you can still use Hadoop in that way. Since new users often have a limited budget, we tried to emulate a physical server for learning while also offering a universally working example.


Now, from a fresh SSH session, we can start Hadoop :

sudo su hduser
cd /usr/local/hadoop/sbin && ls
start-all.sh
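
To verify that the daemons actually started, jps from the JDK lists the running Java processes (the PIDs will differ; this assumes start-all.sh above succeeded) :

jps
# typical entries include NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager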

Once the daemons are running, on localhost you can browse to :

http://localhost:50070/
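
On a headless cloud server without a browser, a quick check from the shell (assuming the daemons are running) :

curl -s http://localhost:50070 | head -n 5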

Replace localhost with the server's fully qualified domain name to view the page remotely. We have successfully configured Hadoop to run in stand-alone mode. Next, we will run the example MapReduce program. Run :

mkdir ~/input
cp /usr/local/hadoop/etc/hadoop/*.xml ~/input
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep ~/input ~/grep_example 'principal[.]*'
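
Once the job finishes, the results are written to the output directory given above. To view them (assuming the job completed without errors) :

cat ~/grep_example/*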

It is not possible to cover more in this guide; you may read further here :

https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
