Normally we write guides on how to install Hadoop, Spark and so on from the official Apache repository (or its GitHub repo). If you are new to data science or big data, it is natural to ask why Hadoop distributions such as those from Cloudera and Hortonworks exist. Well, this matter is not unique to Big Data tools. The same vendor question already exists for MySQL: the original MySQL, MariaDB, and Percona Server for MySQL.
Why Do Hadoop Distributions Like Those From Cloudera and Hortonworks Exist?
Vendor-specific distributions usually harden, test, and certify the software, fix bugs, and bundle tested combinations of ecosystem components. How much of a benefit that is depends on your familiarity with Apache Hadoop and its release process. There are differences between the vendors. You need to look at each vendor and consider whether its differentiators are useful for the purpose for which you are going to use Hadoop-like Big Data software.
The original Hadoop was designed as a simple infrastructure for storing and indexing data. We really do not use that original Hadoop, even when we download it from Apache’s mirrors. Hadoop has evolved over many years to expand beyond mere indexing. The MapReduce model described by Google shaped Hadoop into a platform that stores and processes large volumes and varieties of data. The basic services provided by the major vendors such as Hortonworks and Cloudera are the same. Both offer enterprise-ready Hadoop with an emphasis on stability and security. Cloudera and Hortonworks do, however, look at data warehouses in different ways.
Hadoop was by no means intended to be used as an out-of-the-box solution. Possibly no server software can ever be used out of the box. An enterprise that bases its decisions on data needs software that has been optimised for that purpose. Cloudera and Hortonworks share several similarities:
- Both are enterprise-grade Hadoop distributions that focus on security and stability.
- Both have a sizeable user base and community.
- Distributions from both use a master-slave architecture.
- Both of them support MapReduce and YARN.
Generally, using the official vanilla version of Hadoop is like using a default Linux distribution. Obviously, there are differences between the vendors, and a particular need can be a reason to prefer one over the other. For example, our previous guide on how to install Hue on Ubuntu 16.04 actually works better with Cloudera’s Hadoop.