Apache Mahout is a simple programming environment and a framework for building algorithms on Scala, Apache Spark, H2O, Apache Flink and so on. Samsara, an experimentation environment with R-like syntax, is part of Mahout. Here is how to install Apache Mahout on Ubuntu 16.04 for machine learning development. This guide shows commands to give the correct idea, not exact commands to copy and paste into a terminal, because this is not like installing LAMP and WordPress, where users need version-specific commands. There are a lot of resources around Mahout, yet practically no usable guide for a newcomer to get started. Filling that gap is the purpose of this guide.
Install Apache Mahout : Literal Meaning Is Not Bright
Mahout and Samsara are words derived from Sanskrit. Mahout comes from the Sanskrit word mahamatra. In Bengali, Hindi and related languages, a mahout is a poor person who keeps an elephant much as a chauffeur keeps a car: trainer, keeper, cleaner, feeder. It is a family profession in the Indian subcontinent. By the standards of European culture, this is like naming a piece of software “chauffeur”, whereas software is usually named after fruits, flowers and so on. If you search for India Mahout or India Mahoot, you will see real human mahouts at work in 2017. People doing manual work are taken for fools and given no credit for intelligence. We should respect everyone, but politeness is often confused with weakness.
Samsara means family. For example, my parents, my pets and I are our “samsara”. If someone asks me “when will you do samsara”, the person is asking, in gentleman’s language, when I will get married. After my marriage, my parents, my wife, my pets and I will be our “samsara”. Then my wife will shout: “Look after our samsara, writing blogs for people like a king”. A pet elephant, horse, parakeet, cat or dog behaves like the humans closest to it, its owners. We do not “tame” them; they are domesticated. Apache ML would work fine as a name. A rose by any other name would smell as sweet.
It is disagreeable, disrespectful nomenclature. Whoever chose the names needs to be put in front of angry pet elephants to understand the basics.
Install Apache Mahout : Steps
We need Java, Maven, Subversion and Git at minimum to build or install Apache Mahout. Install Subversion and Git yourself from the Ubuntu repository.
Hadoop and Spark are not basic requirements for running Apache Mahout; some algorithms may run on a single server. But the algorithms are related to Hadoop and/or Spark, and testing the Spark-based algorithms is encouraged. So it is practical to build or install Mahout on an existing Hadoop or Spark based Big Data platform, like the one from our guide to install Apache Spark.
Java is needed to run Hadoop, Hadoop is used by Mahout, and Maven (mvn) is the common build tool.
To install Oracle Java, check the official web page for the latest version. Prototype commands are shown:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
## sudo apt-get install oracle-java9-installer
sudo update-alternatives --config java
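# copy the path reported by update-alternatives
# add it at the end of this file
## example for JDK 1.5.0_07 for all users (file names and paths are placeholders; match them to your JDK version)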
# sudo cp jdk-7u45-linux-x64.gz /usr/local/lib/
# sudo tar -xzvf jdk-7u45-linux-x64.gz
# export PATH=$PATH:/usr/java/jdk1.5.0_07/bin
You can also add the path to ~/.bashrc, as we did with Java in our Big Data tutorials.
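As a minimal sketch of that step, the functions below derive a JAVA_HOME value from the resolved path of the java binary and append the exports to a startup file. The function names are made up for illustration, and the path you get depends on your own system, so treat this as an idea rather than commands to paste.

```shell
# Sketch only: function names are illustrative assumptions.

# Turn a resolved java binary path into a JAVA_HOME value
# by stripping the trailing /bin/java and, if present, /jre.
java_home_from_binary() {
  jh_path="${1%/bin/java}"
  jh_path="${jh_path%/jre}"
  printf '%s\n' "$jh_path"
}

# Append JAVA_HOME and PATH exports to the given startup file.
append_java_env() {
  aje_file="$1"
  aje_home="$2"
  {
    printf 'export JAVA_HOME=%s\n' "$aje_home"
    printf 'export PATH=$PATH:$JAVA_HOME/bin\n'
  } >> "$aje_file"
}

# Usage (uncomment to apply to your own profile):
# append_java_env ~/.bashrc "$(java_home_from_binary "$(readlink -f "$(command -v java)")")"
# source ~/.bashrc
```

Keeping the path logic in a function makes it easy to try on a sample path before touching ~/.bashrc.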
We need output like this from java -version:
java version "ABCD"
Java(TM) SE Runtime Environment (build ABCD)
Java HotSpot(TM) 64-Bit Server VM (build XYZ, mixed mode)
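To check which version is actually active, one option is to pull the quoted version string out of that output. This is a small sketch; the function name is an assumption, and it only relies on the first line having the form shown above.

```shell
# Sketch only: the function name is an illustrative assumption.

# Read `java -version` output on stdin and print the quoted version string
# from the first line that contains one.
java_version_from_output() {
  sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage: java -version prints to stderr, so redirect it into the pipe.
# java -version 2>&1 | java_version_from_output
```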
Next, we need to install the current version of Maven: wget to download it and tar -xzvf to unpack it. You can read our other guides from the links above. Ultimately you will add the path to ~/.bashrc like this:
## old versions
# export M2_HOME=/usr/local/apache-maven-3.0.4
# export M2=$M2_HOME/bin
# export PATH=$M2:$PATH
# export JAVA_HOME=$HOME/programs/jdk
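For a newer Maven, the same idea might look like the sketch below. The function names, the version number 3.5.2 and the /usr/local prefix are assumptions for illustration; the URL pattern is the one used by the Apache archive for Maven 3.x binaries, but check the Maven download page for the current release.

```shell
# Sketch only: function names, version number and install prefix are assumptions.

# Build the Apache archive URL for a Maven 3.x binary tarball.
maven_tarball_url() {
  printf 'https://archive.apache.org/dist/maven/maven-3/%s/binaries/apache-maven-%s-bin.tar.gz\n' "$1" "$1"
}

# Print the export lines for ~/.bashrc, given the directory Maven was unpacked into.
maven_env_lines() {
  printf 'export M2_HOME=%s\n' "$1"
  printf 'export M2=$M2_HOME/bin\n'
  printf 'export PATH=$M2:$PATH\n'
}

# Usage (uncomment and adjust the version to the current release):
# wget "$(maven_tarball_url 3.5.2)"
# sudo tar -xzvf apache-maven-3.5.2-bin.tar.gz -C /usr/local
# maven_env_lines /usr/local/apache-maven-3.5.2 >> ~/.bashrc
```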
Now comes the step of installing Apache Mahout itself. The readme in the repo cloned below takes it for granted that everyone knows what they are working with; it starts after what we have written above. I was misguided by it and installed Hadoop and Spark for Apache Mahout on a colocation server.
git clone https://github.com/apache/mahout.git mahout
# edit your environment in ~/.bash_profile or ~/.bashrc
# for running on standalone server
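A sketch of what typically follows the clone is given here. It assumes the MAHOUT_HOME and MAHOUT_LOCAL variables and the Maven build command as I remember them from the project readme of that period; the function name and the $HOME/mahout location are made up for illustration. Check the readme of the version you cloned, since these may have changed.

```shell
# Sketch only: the function name and the checkout path are assumptions.

# Print the environment lines for a Mahout checkout at the given path.
# MAHOUT_LOCAL is set for a standalone machine; leave it unset on a cluster.
mahout_env_lines() {
  printf 'export MAHOUT_HOME=%s\n' "$1"
  printf 'export MAHOUT_LOCAL=true\n'
  printf 'export PATH=$PATH:$MAHOUT_HOME/bin\n'
}

# Usage (uncomment to apply):
# mahout_env_lines "$HOME/mahout" >> ~/.bashrc
# cd "$HOME/mahout" && mvn -DskipTests clean install
```

Skipping the tests keeps the first build short; run them later once the environment is known to work.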
You can also install this way once Hadoop and Maven are installed:
unzip -a mahout-distribution-x.x-src.zip
mv mahout /usr/local
# the unpacked directory contains:
bin core examples LICENSE.txt math-scala pom.xml src buildtools distribution integration math NOTICE.txt README.txt target
There is also a Cloudera package, which is installed from Cloudera's repository:
sudo apt-get install mahout
I hope you got some idea of how to install Apache Mahout. I myself forgot the exact steps I took among so many commands. If you are having problems, check the logs in the logs directory to see whether there are any Hadoop errors or Java exceptions. Errors at the beginning are not uncommon.
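One quick way to scan such a logs directory is sketched below. The function name and the two search words are assumptions, and the location of the logs directory depends on how Hadoop was installed on your machine.

```shell
# Sketch only: the function name and the search patterns are assumptions.

# List the files under a log directory that mention an error or a Java exception.
find_error_logs() {
  grep -rlE 'ERROR|Exception' "$1"
}

# Usage (the path is an example; point it at your own logs directory):
# find_error_logs "$HADOOP_HOME/logs"
```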