The classic factors of production of a company rest on three pillars: labor, land and capital. Knowledge is regarded as the fourth factor of production in our society and, when used correctly, is the decisive competitive factor. Information is an integral part of knowledge. A company obtains its information from internal or external sources. The processing of this information is … [Read more...]
Process Mining and BI in Big Data Technology
There are many ways to collect data. Before selecting the technology for collecting data, we should ask ourselves a basic question: what do we want to achieve with data analysis? In this article, Process Mining and BI in Big Data Technology, we discuss the basics of Process Mining and BI and how they relate to each other within the vast field of Big Data. If we want to evaluate our data today, we … [Read more...]
How to Install Miniconda on Ubuntu/CentOS
Regular readers of this website may recall that we talked about how to install Anaconda. Anaconda is around 413 MB in size and needs 3 GB of available disk space for its collection of 720+ open source packages. Miniconda is only 45.2 MB because it ships without the collection of 720+ packages and without Anaconda Navigator (the desktop graphical user interface included in Anaconda, which allows you to launch … [Read more...]
Theoretical Foundations of Big Data : Part 3
Everything that has a beginning has an end. This article, Theoretical Foundations of Big Data Part 3, is the third and final part of our series. If you are curious about the previous two articles, you can read Part 1 and Part 2 before reading this one. We resume the topic from analytical methods. The sub-headers do not claim to be complete in the thematic … [Read more...]
How To Install Hue on Ubuntu 16.04
Hue is a query tool with a GUI for browsing, querying and visualizing data and for developing apps for Hadoop. Here is how to install Hue on Ubuntu 16.04 running Hadoop. Hue consists of a web service which runs on a node in the cluster. Hue has editors for Hive, Impala, Pig, MapReduce, Spark and any SQL such as MySQL, Oracle, SparkSQL, Solr SQL, Phoenix etc. Dashboards to dynamically interact … [Read more...]
Theoretical Foundations of Big Data : Part 2
This article is part of a series on the topic which can be guessed from the title. It is the second part of our series on Theoretical Foundations of Big Data. Readers who have not already done so can read the first article of this series. When dealing with large amounts of data, there are certain rules to which the providers and users of Big Data have to … [Read more...]
Theoretical Foundations of Big Data : Part 1
All of us are well aware that the increasing digitization of content, the increasing use of intelligent systems and their networking in more and more everyday objects are continually generating, capturing and transmitting data to manufacturers or other service providers. Due to the sheer mass of the resulting data, the administration and processing of this data reaches its limits. The intelligent and … [Read more...]
List of Python Libraries For Data Science & Machine Learning
Knowing the basics of Python is necessary for development in Data Science. In our previous guides we talked about installing Jupyter and working with Jupyter. This article is not exactly related to Jupyter Notebook but is very important for developers working with big data and data science. Here is a List of Python Libraries For Data Science & Machine Learning. List of Python Libraries … [Read more...]
How to Install Apache BigTop on Ubuntu 16.04
Apache BigTop is possibly confusing to many new users for a simple reason: people know Hadoop without many of its parts and tools. Apache Ambari is another tool which confuses new users. Apache BigTop is a Big Data management distribution. The official website has good information to clarify any doubt. Here are the SSH commands showing how to install Apache BigTop on Ubuntu … [Read more...]
Jupyter Notebook Tutorial : Part 2
In Part 2 of our Jupyter Notebook Tutorial series, we will learn how installing packages in Jupyter, importing data, plotting inline etc. work. Part 1 of our Jupyter Notebook Tutorial was on how to install it on localhost or a server, on Mac OS X or GNU/Linux. As a quick recap for those not used to Python: normally, in a terminal, if you run the python command, it will start Python … [Read more...]
Jupyter Notebook Tutorial : Part 1
Jupyter is a kind of acronym to mean Julia, Python, and R. Jupyter runs code in many programming languages, and Python 3.3 or greater, or Python 2.7, is needed for installing the Jupyter Notebook. We can install Jupyter with pip, but using the Anaconda distribution to install Python and Jupyter is what is officially recommended. This article, on how to install Jupyter Notebook, is part 1 of the … [Read more...]