Hadoop is the pioneering platform that started the Big Data revolution, and it is still commonly used as an example of how big data workloads are handled. Many companies still use Hadoop to store their data. However, the field of big data is progressing rapidly and giving birth to cheaper alternatives, so our guides, such as how to install Hadoop, may no longer be useful in professional life. At present, Hadoop has been sidelined. In the early days, Hadoop was synonymous with big data: if someone had a big data problem, the answer was Hadoop. Today, custom solutions built with Spark, Flink and Kafka are used instead. Slowly, it was realized that Hadoop is not a stable ecosystem and that the skills needed to manage Hadoop are not easy to find.
The Hadoop Complexities
Hadoop was an innovation. HDFS and MapReduce together allowed fault-tolerant storage and the running of queries on big data. Hadoop was pushed as the replacement for traditional RDBMS tools. Unfortunately, big data soon became a “common matter” across many use cases, due to the increase in smartphone users and higher internet penetration. Administrators skilled in Java-based data systems are still in shorter supply than needed. Various kinds of databases started to evolve, giving birth to NewSQL. In-memory storage for high-performance analytics and cloud-based SQL databases also started to provide fault-tolerant distributed computation at a better price and without the hassle of administration. Hadoop naturally reflected the Yahoo-specific way of software development, and it is questionable whether the Hadoop way is DevOps friendly. There were once reasons to separate the IT infrastructure into operational (OLTP) and analytical (OLAP) components, but today that separation no longer works well.
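As a rough illustration of the map-reduce model mentioned above, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example. A real cluster would spread the map and reduce steps across many nodes and re-run failed tasks, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all values by key, which a framework
    # would do between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce step: collapse the list of counts for one word into a total.
    return key, sum(values)


def word_count(lines):
    # Run the three stages in order on an in-memory list of lines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data pipeline"]
    print(word_count(sample))
```

The point of the split is that each map call and each reduce call is independent of the others, which is what lets a framework distribute them and retry individual pieces on failure.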
Hadoop was not directly compatible with most other software of its time, nor is it compatible with newer software. There is no easy way to “port” a server log from a live website and run an analysis on it. Hadoop clusters became the gateways of enterprise data pipelines, processing and transforming data for other databases and data marts. This segment failed in real business applications.
Future of Hadoop
Hadoop, or rather the Hadoop ecosystem, will not die. It is moving towards becoming a legacy system.
Hadoop is becoming modularized, and some of its components have a future. Hadoop is undergoing the same adaptation we perform with any traditional stack. The growth of object storage and the technologies around it has made analytic tools more usable. It is now easy to run Spark in place of MapReduce, and many people are running Spark without YARN.
HDFS is not expected to have a future. The Hadoop way is more expensive than object storage, with a total cost three times higher, yet it is not fast enough for real-time applications. MapReduce, YARN and HDFS do not appear to have a future.