Integration of Apache Hadoop With OpenStack Swift

Abhishek Ghosh

By Abhishek Ghosh May 14, 2017 1:27 am Updated on May 14, 2017


Most people are well versed in either Apache Hadoop or OpenStack Swift; good working experience with both together is rarer. Integrating Apache Hadoop with OpenStack Swift is not exactly a new topic, but this guide collects the steps in one place, especially for the OpenStack part, so that you do not have to search here and there. You can also use this website’s search function to find our older guides in addition to the linked articles.

Integration of Apache Hadoop With OpenStack Swift : Foreword

You must know what you are doing. Hadoop is most widely used with HDFS, and most Hadoop tools are not built to work out of the box with object storage, so odd behaviour in response is not unusual. OpenStack installations also differ from vendor to vendor; we have a good number of separate guides on Rackspace and HP Cloud. Access can be API key based or username and password based.

Integration of Apache Hadoop With OpenStack Swift

We assume that you already have Apache Hadoop up and running. If not, please follow our guide for installation and setup of Apache Hadoop on a single server instance.

Coming to OpenStack Swift, it has a client, which we described in an earlier series of guides such as Installation and Setup of OpenStack Python Packages, Uploading to a Swift Container (HP Cloud), Emptying a Swift Container (HP Cloud), mount OpenStack Swift on Ubuntu server (Rackspace) and so on.

Once you are used to handling ordinary files with OpenStack Swift, you can follow Apache Hadoop’s official guide :

https://hadoop.apache.org/docs/current2/hadoop-openstack/index.html


In the same way, OpenStack has an official guide, which is part of the Sahara documentation :

https://docs.openstack.org/developer/sahara/userdoc/hadoop-swift.html


Regardless of the cloud vendor, you’ll need the following to configure Apache products such as Hadoop and Spark for access :

username
region for your container
authorization URL
API key OR password

Please remember that we are not talking about the tenant name. The guides linked above show easy ways to set up these basic values with bash or ZSH. To configure Hadoop for Swift, open hadoop-env.sh, which you’ll find at /usr/share/hadoop/etc/hadoop, and add a line in this format, replacing VERSION and x.y.z with the versions of the jar files present on your system :

export HADOOP_CLASSPATH=/usr/share/hadoop/share/hadoop/tools/lib/hadoop-openstack-VERSION.jar:/usr/share/hadoop/share/hadoop/tools/lib/httpclient-x.y.z.jar:/usr/share/hadoop/share/hadoop/tools/lib/httpcore-x.y.z.jar:$HADOOP_CLASSPATH
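
If you would rather not hard-code the jar versions, the same line can be assembled from whatever jars are actually present. The following is only a minimal sketch under our own assumptions: the helper name swift_classpath and the HADOOP_LIB_DIR variable are illustrative names of ours, not part of Hadoop, and the default directory is simply the one used throughout this article.

```shell
# Hypothetical helper (not part of Hadoop): print the Swift-related jars found
# in a tools/lib directory as a colon-separated classpath fragment.
# The body runs in a subshell, so its variables do not leak into the caller.
swift_classpath() (
  lib_dir="$1"
  result=""
  for pattern in 'hadoop-openstack-*.jar' 'httpclient-*.jar' 'httpcore-*.jar'; do
    for jar in "$lib_dir"/$pattern; do
      # A glob that matches nothing stays literal, so skip non-existent names.
      [ -e "$jar" ] || continue
      result="${result:+$result:}$jar"
    done
  done
  printf '%s\n' "$result"
)

# Directory assumed from this article; adjust it to your installation.
HADOOP_LIB_DIR="${HADOOP_LIB_DIR:-/usr/share/hadoop/share/hadoop/tools/lib}"

swift_jars="$(swift_classpath "$HADOOP_LIB_DIR")"
if [ -n "$swift_jars" ]; then
  # Prepend the Swift jars and keep any classpath that was already set.
  export HADOOP_CLASSPATH="$swift_jars${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
fi
```

The patterns are deliberately loose, so they will also pick up related jars whose names start with the same prefix; if that matters on your system, tighten the patterns or keep the explicit line above.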


The other file is /usr/share/hadoop/etc/hadoop/core-site.xml. Add the following properties inside its <configuration> element; swift_test is our example service name :

<property>
 <name>fs.swift.service.swift_test.auth.url</name>
 <value>https://identity.vendor.openstack.replace.this.url/v2.0/tokens</value>
 <description>VendorName US (multiregion)</description>
</property>
 
<property>
 <name>fs.swift.service.swift_test.username</name>
 <value>OS_USER</value>
</property>
 
<property>
 <name>fs.swift.service.swift_test.region</name>
 <value>OS_REGION</value>
</property>
 
<property>
 <name>fs.swift.service.swift_test.apikey</name>
 <value>OS_APIKEY</value>
</property>
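
With these properties in place, a quick listing is a reasonable first check. Hadoop addresses Swift with URLs of the form swift://CONTAINER.SERVICE/PATH, where SERVICE is the name used in the property keys (swift_test here). The sketch below is hedged accordingly: swift_url is a hypothetical helper of ours and mycontainer is a placeholder container name, so replace both as needed.

```shell
# Hypothetical helper: build a swift:// URL from a container name, the service
# name used in core-site.xml, and an optional path inside the container.
swift_url() (
  container="$1"
  service="$2"
  path="${3:-}"
  # Strip one leading slash from the path so the URL has a single separator.
  printf 'swift://%s.%s/%s\n' "$container" "$service" "${path#/}"
)

# Only attempt the listing when the hadoop command is actually available.
# "mycontainer" is a placeholder; "swift_test" is the example service name.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$(swift_url mycontainer swift_test)"
fi
```

If the listing fails, check the authorization URL, region and credentials first, since those differ most between vendors.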


In those official guides you’ll find another property for the case where you are running Hadoop outside of the OpenStack Swift provider’s datacenter. We have already talked about installation of Apache Spark with Hadoop. For Spark, you need to add these lines to /usr/share/spark/conf/spark-env.sh :

export SPARK_DIST_CLASSPATH=$(/usr/share/hadoop/bin/hadoop classpath)
export HADOOP_CONF_DIR=/usr/share/hadoop/etc/hadoop
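
Because Spark takes its classpath from the output of hadoop classpath here, it can be worth confirming that the hadoop-openstack jar really shows up in that output. This is a rough sketch only: classpath_has_jar is a hypothetical helper of ours, and it checks file names that are listed explicitly, so a jar that is covered only by a wildcard entry will not be detected.

```shell
# Hypothetical helper: succeed if any entry of a colon-separated classpath has
# a file name that starts with the given prefix.
classpath_has_jar() (
  classpath="$1"
  prefix="$2"
  set -f      # keep wildcard entries such as /some/lib/* literal
  IFS=':'     # split the classpath on colons
  for entry in $classpath; do
    case "${entry##*/}" in
      "$prefix"*) exit 0 ;;
    esac
  done
  exit 1
)

# Only run the check when the hadoop command is actually available.
if command -v hadoop >/dev/null 2>&1; then
  if classpath_has_jar "$(hadoop classpath)" hadoop-openstack; then
    echo "hadoop-openstack jar is listed on the Hadoop classpath"
  else
    echo "hadoop-openstack jar is not listed explicitly; check HADOOP_CLASSPATH in hadoop-env.sh" >&2
  fi
fi
```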

