In a Different Way – Why Do People Use Elasticsearch When Hadoop and Spark Exist? It is not at all foolish to ask for a comparison of Apache Hadoop and Spark versus Elasticsearch/the ELK Stack. The Apache Lucene project develops open-source search software, including Lucene Core, Solr and PyLucene. Elasticsearch is based on Apache Lucene. Apache HBase, by contrast, is based on Apache Hadoop and on the concepts of Google's BigTable. By database model, one is a search engine and the other is a wide column store. Once this distinction is understood, the remaining resemblance actually helps in choosing the right software.
Apache Hadoop, Spark Vs. Elasticsearch/ELK Stack
Apache Hadoop, Spark and Elasticsearch do have some overlap in certain use cases. That overlap is essentially the result of every framework wanting to provide a view into Big Data, and as a consequence these technologies are blurring together and becoming confusing. Hadoop/Spark can store JSON files in HDFS for analysis and processing, and Elasticsearch can also store JSON documents for search and faceted search. Still, each tool has a niche where it is best suited.
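To make the overlap concrete, here is a minimal Python sketch showing the same JSON record prepared two ways: as a newline-delimited JSON line, the common layout for files landed in HDFS and processed by MapReduce or Spark jobs, and as an Elasticsearch bulk-API payload. The record's fields and the index name are illustrative assumptions, not from any particular system.

```python
import json

# A hypothetical log record; field names are purely illustrative.
record = {"user": "alice", "action": "login", "status": 200}

# 1) Newline-delimited JSON (NDJSON): one JSON object per line,
#    as typically appended to files destined for HDFS.
ndjson_line = json.dumps(record) + "\n"

# 2) Elasticsearch bulk-API payload: an action/metadata line followed
#    by the document source, also newline-delimited.
bulk_payload = (
    json.dumps({"index": {"_index": "logs"}}) + "\n"
    + json.dumps(record) + "\n"
)

print(ndjson_line, end="")
print(bulk_payload, end="")
```

The point is that the bytes stored are nearly identical; what differs is what each system then does with them, i.e. batch processing over files versus inverted-index search.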
Elasticsearch has begun to expand beyond just search and has added some features for analytics and visualization, but at its core it remains primarily a full-text search engine, with comparatively limited support for complex calculation and aggregation as part of a query. The statistical facet does give some ability to retrieve calculated statistical information, but only scoped to the given query. If we want to search a set of documents and apply some statistics to them using facets, then Elasticsearch is the better approach. Elasticsearch has become increasingly popular in the web analytics space, alongside the open-source Logstash for server-side log tailing and the open-source visualization tool Kibana. Apache Hadoop is a flexible and powerful environment, and Spark is also derived from the Hadoop ecosystem. For instance, with Hadoop's storage abstraction via HDFS, any arbitrary job can run against the data using the MapReduce API, Hive, HBase, Pig, Sizzle, Mahout, RHadoop and so on.
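The query-scoped statistics described above can be sketched as follows. In current Elasticsearch versions the old statistical facet has been superseded by the `stats` aggregation; this Python snippet only constructs the search request body, it does not contact a cluster, and the field names (`message`, `response_time_ms`) are illustrative assumptions.

```python
import json

# Build a search body that filters documents by a full-text match and
# computes min/max/avg/sum/count over a numeric field, scoped to the
# matching documents only.
query_body = {
    "query": {"match": {"message": "error"}},   # full-text filter
    "aggs": {
        "response_time_stats": {                 # hypothetical agg name
            "stats": {"field": "response_time_ms"}
        }
    },
    "size": 0,  # return only the aggregation, not the matching hits
}

print(json.dumps(query_body, indent=2))
```

A body like this would be POSTed to an index's `_search` endpoint; the statistics returned apply only to documents matching the `query` clause, which is exactly the query-scoped behavior the old facet provided.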
Elasticsearch and Apache Hadoop/Spark may overlap in some very useful functionality, yet each tool serves a specific purpose and we need to choose what best suits the given requirement. If we simply want to locate documents by keyword and perform simple analytics, then Elasticsearch may fit the job. If we have a huge quantity of data that needs a wide variety of complex processing and analysis, then Hadoop provides the broadest range of tools and the most flexibility. The good thing is that we are not limited to using only one tool or technology at a time; we can always combine them based on what we need the outcome to be. Hadoop and Elasticsearch, for example, are known to work well together. In the future, these boundaries will only blur further, given the speed at which these technologies are expanding.