This guide covers TinkerPop3. Apache TinkerPop is a graph computing framework distributed as three packaged downloads: Gremlin Console, Gremlin Server and the source distribution. TinkerPop can be slightly confusing when getting started. Here is how to install Apache TinkerPop (Gremlin Server) with a PHP client on a server. Implementations of TinkerPop include Sqlg, Blazegraph, TinkerGraph, Elastic … [Read more...]
How To Install Apache Ambari on Ubuntu 16.04 to Manage Hadoop Cluster
Apache Ambari helps with provisioning, managing and monitoring a Hadoop cluster. Here is how to install Apache Ambari on Ubuntu 16.04 to manage a Hadoop cluster. Ambari Server uses an embedded PostgreSQL database by default. When we install the Ambari Server, the PostgreSQL packages and dependencies must be available for installation. We show an example with the repo from Hortonworks for many reasons, one of … [Read more...]
How To Install Apache Pig On Ubuntu 16.04
Apache Pig is intended for analyzing large data sets. Usually we combine Pig with Hadoop. The language of Pig is Pig Latin. Apache Pig can execute its Hadoop jobs in MapReduce, Apache Tez or Apache Spark. Pig Latin has similarities with the SQL used in relational database management systems. Pig Latin can be extended with user-defined functions written in Java, Python, JavaScript, Ruby or Groovy. Here is How To Install Apache Pig On … [Read more...]
What is Data Refining in Big Data?
New developers, particularly those interested in data analysis, often face terminology that has more to do with the theoretical and practical parts of engineering and the analytical sciences. These developers can come from a variety of domains, and the phrases often confuse them. What is data refining in big data is such an obvious question, and the answer is commonly written for those … [Read more...]
IBM Analytics Demo Cloud : Free Hadoop, Ambari With SSH
Normally we install Apache Hadoop and other big data tools on our own servers. IBM Analytics Demo Cloud is intended for learning Hadoop, Ambari and BigSQL free of cost, with SSH access and a web console. With various cloud offerings, many things these days have a free usage tier, as we have shown with unrelated services such as Heroku and OpenShift PaaS. Here is how to get started with non-root access to this system. In … [Read more...]
Install Apache Mahout : Ubuntu 16.04 For Machine Learning Dev
Apache Mahout is a simple programming environment and also a framework for building algorithms for Scala, Apache Spark, H2O, Apache Flink and so on. Samsara is the part of Mahout that provides an experimentation environment with R-like syntax. Here is how to install Apache Mahout on Ubuntu 16.04 for machine learning development. This guide shows commands to give the correct idea, not exact commands to copy … [Read more...]
How To Install Apache NiFi On Ubuntu 16.04 LTS
The original project behind Apache NiFi was created by the NSA! Of course it will be efficient at tracking, and it can have vulnerabilities; at least the versions older than 1.3 had some. Version 1.3.0 is the latest at the time of publication of this guide. At present the NSA has nothing to do with Apache NiFi, and it has become another Big Data tool in the Apache Software Foundation's collection. … [Read more...]
List Of Open Source Big Data Visualization Tools
We have noticed a growing number of websites that write about Big Data and cloud computing and spread wrong information in order to sell other parties' paid products. They cannot write guides such as how to install Spark, Elasticsearch or Kibana. Here is a list of open source big data visualization tools which convert data into graphs and charts and are usable with tools like Spark and Hadoop. Do not use … [Read more...]
How To Install Apache Mesos With Marathon On Ubuntu 16.04 LTS
In a previously published article we introduced multi-cloud. In that article we gave an example of managing SendGrid from the WordPress dashboard using the SendGrid WordPress plugin. Mesosphere is software that expands the cluster management capabilities of Apache Mesos with additional components for managing server infrastructures. By combining several components with Mesos such as Marathon and Chronos, … [Read more...]
Install Apache Zeppelin On Ubuntu 16.04
Apache Zeppelin is a web-based notebook for interactive data analytics. We have already described how to install Apache Spark. If you have followed any of our guides on installing big data tools, it is a few minutes' work to install and use Zeppelin. Zeppelin can be installed from a pre-built package or built from source. Here is how to install Apache Zeppelin on Ubuntu 16.04 by building from source. What … [Read more...]
Join/Merge Multiple Log Files For Big Data Analysis
Left alone, a log file grows as big as it can. We use logrotate to do the opposite, because running cat and other tools over SSH on big files consumes memory and time and often makes it difficult to do what we want. Also, old files are compressed. Here are the ways to join/merge multiple log files for big data analysis, store them in OpenStack-based cloud storage and delete old … [Read more...]
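As a rough illustration of the join/merge idea, here is a minimal Python sketch. It is not the method from the linked guide. It assumes logrotate's default numeric naming (for example `access.log`, `access.log.1`, `access.log.2.gz`, where a higher number means an older file); the function names `rotation_index` and `merge_logs` are hypothetical and exist only for this example.

```python
import gzip
import shutil
from pathlib import Path


def rotation_index(path: Path) -> int:
    """Return the numeric rotation suffix (access.log -> 0, access.log.2.gz -> 2)."""
    for part in reversed(path.name.split(".")):
        if part.isdigit():
            return int(part)
    return 0


def merge_logs(log_dir: str, base_name: str, out_path: str) -> int:
    """Concatenate base_name and its rotated copies into out_path, oldest first.

    Assumes a higher rotation number means an older file.
    Returns the number of files merged.
    """
    out = Path(out_path).resolve()
    # Collect the file list before the output is opened, and skip the output
    # itself in case it lives in the same directory.
    files = [p for p in Path(log_dir).glob(base_name + "*") if p.resolve() != out]
    files.sort(key=rotation_index, reverse=True)
    with open(out, "wb") as dst:
        for f in files:
            # Compressed rotations are decompressed on the fly.
            opener = gzip.open if f.suffix == ".gz" else open
            with opener(f, "rb") as src:
                shutil.copyfileobj(src, dst)
    return len(files)
```

Calling `merge_logs("/var/log/nginx", "access.log", "/tmp/access-merged.log")` would write one uncompressed file in chronological order; the paths here are placeholders. Streaming with `shutil.copyfileobj` keeps memory use flat even when the individual files are large.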