Nowadays it is common to come across phrases like Data Warehouse and Data Lake. The data warehouse is a four-decade-old, established concept; the data lake is a newer idea. What is the difference between a Data Warehouse and a Data Lake? We discussed the concept of the Data Lake in a previous article, so an explanation of the Data Warehouse is needed here. A database is the most common way of storing digital data. …
Install Apache Kafka on Ubuntu 16.04: Single Cloud Server
Apache Kafka is a stream-processing platform that aims to provide low-latency handling of real-time data feeds. Its storage layer is a massively scalable pub/sub message queue architected as a distributed transaction log, which makes it valuable for processing streaming data. Kafka can connect to external systems for data import and export. Apache Kafka is also used in Big Data analysis. Here are the …
Apache Spark Alternatives To Overcome Integrity Issues
Previously we showed the steps for installing Apache Spark, so why does this article suddenly sound anti-Spark? Apache Spark has problems, including its need for dependencies and integrity issues. That is why here is a list of Apache Spark alternatives to overcome integrity issues. At this moment Apache Spark is one step ahead of its competitors, due to some characteristics like implementation and …
Integration of Apache Hadoop With OpenStack Swift
Most people are versed in either Apache Hadoop or OpenStack Swift, but rarely both. Integrating Apache Hadoop with OpenStack Swift is not exactly a new topic, yet good experience with the two together may be rare. You can follow this guide to handle the OpenStack part without searching here and there. You can also use this website's search function to find our older guides apart from the linked …
Integration of Big Data Tools With WordPress
Tools like Apache Solr have WordPress plugins to integrate with WordPress. Apache Solr is an open source enterprise search platform built on Apache Lucene, but it is not exactly a direct building block of a Big Data platform. Here is an article on the integration of Big Data tools with WordPress. WordPress can certainly handle core Big Data work, but there are limitations, as WordPress is essentially a CMS …
Automated Deployment of Apache Hadoop & Big Data Software
Previously we published guides on how to install Apache Hadoop, how to install Apache Spark, and how to install Apache Hadoop and use it with Elasticsearch. We have also talked about Cloud Orchestration Tools, a getting-started guide to Ansible, and Ansible Playbooks. If the philosophies of these two sets of tutorials are combined, installation of Hadoop and other complicated, time-consuming software can …
What is Data Lake in Big Data?
In a previous guide, we showed a step-by-step tutorial on how to create a Data Lake on a server and discussed the basics of data lakes. A data lake comprises multiple repositories that provide data to an organisation for analytical processing, including analytics and reporting. In another guide, we discussed medical prediction using the data lake. It is James Dixon who coined the …
Big Data as a Service (BDaaS) Basics
Although Software as a Service has been highly usable, apart from a few use cases SaaS has remained restricted, and corporates favour on-premise deployments. Big Data as a Service, or BDaaS, is effectively a combination of SaaS, PaaS and DaaS. Self-hosting a Big Data platform is time-consuming and costly. Cloud-based services now account for about 15% of businesses' IT spending. The forecasted value of the BDaaS market is …
List of Apache Projects For Big Data
It may be confusing to many new users when we talk about combining various Big Data related software. Here is a list of Apache projects for Big Data, with basic practical details, that should help developers who are new to the Big Data field. Apache Hadoop and Apache Spark are probably the best known. At present there are a total of 37 Apache projects that are directly related to Big …
Installing Local Data Lake on Ubuntu Server: Part 1
In previous guides, we covered some important basic installation and setup steps for the best-known Big Data software. Here is Part 1 of installing a local Data Lake on Ubuntu Server with Hadoop, Spark, Thriftserver, Jupyter etc. to build a prediction system. We suggest using servers from VPSDime, as they cost very little: $7 per month for 6 GB RAM. We talked about some limitations of OpenVZ …
How to Install and Configure Elasticsearch with Apache Hadoop
There is a reason why we compared Elasticsearch with Apache Hadoop. Here is how to install and configure Elasticsearch with Apache Hadoop, Flume and Kibana. We have also provided links to the official configuration documentation. Before running the commands, we suggest reading the text under the next sub-header. README to Install and Configure Elasticsearch with Apache Hadoop. Previously we have …