Increasing digitization of content, the growing use of intelligent systems and their networking in more and more everyday objects mean that data are continually generated, captured and transmitted to manufacturers or other service providers. The sheer mass of the resulting data pushes conventional administration and processing to their limits. The intelligent and valuable handling of these data sets is summarized under the term Big Data. This article, Theoretical Foundations of Big Data, aims to provide an overview of theoretical models of data analysis and administration and is intended for students who are completely new to the subject. We first try to delimit the term Big Data and look at the opportunities and challenges that Big Data brings, as well as the aspects of data protection to be considered. Next, an overview of the theoretical basis of Big Data is given and explained using different data sources and model considerations. Subsequently, various possibilities for the technical implementation of these theoretical models are discussed and some application examples are presented. Finally, a brief conclusion is drawn, together with an outlook on the further development of Big Data.
So far, most of our articles have been guides to configuring Big Data tools, mainly from Apache. For practical reasons, we have divided Theoretical Foundations of Big Data into a few parts. This is part one of the series.
Introduction to Theoretical Foundations of Big Data
This part presents some of the opportunities offered by Big Data as well as the challenges associated with the use of Big Data methods. Finally, we present some of the trends in the use of Big Data, which show how the topic will continue to develop in the near future. Because of the wide spectrum of topics associated with it, there is currently no uniform or exact definition of the term Big Data. It generally refers to large amounts of data, including data from the Internet, mobile communications, the financial, energy, healthcare and transport sectors, smart metering systems, assistive devices, surveillance cameras, aircraft and vehicles, which are stored, processed and evaluated with special solutions.
In addition, according to the Big Data work group at BITKOM, Big Data refers to the economically meaningful acquisition and use of decision-relevant findings from qualitatively diverse and differently structured information, which is subject to rapid change and arises to an unprecedented extent. Big Data provides concepts, methods, technologies, IT architectures and tools to transform the exponentially increasing volumes of diverse information into better-informed and timely management decisions, thereby improving the innovation and competitiveness of enterprises.
Most definitions of Big Data have in common that the term is characterized by the three dimensions Volume, Velocity and Variety. Volume describes the challenge of managing and processing huge amounts of data. According to the statistics cited here, there are now 4.4 zettabytes of data available, and the data volume doubles every two years. According to the same statistics, 40 zettabytes of data will be generated in 2020. This would multiply the data generated since the year 2005 by a factor of 300. In view of these vast data sets, traditional tools such as relational database systems reach their limits. For this reason, various alternative systems are used, such as the NoSQL databases MongoDB and Apache Cassandra or the distributed processing framework Apache Hadoop.
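The growth figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming strictly exponential growth with a fixed doubling period of two years. The function names are hypothetical, and the only inputs are the numbers quoted in the text (4.4 zettabytes, 40 zettabytes, factor 300).

```python
import math


def years_to_reach(start_zb: float, target_zb: float, doubling_years: float = 2.0) -> float:
    """Years needed to grow from start_zb to target_zb at a fixed doubling period."""
    return doubling_years * math.log2(target_zb / start_zb)


def projected_volume(start_zb: float, years: float, doubling_years: float = 2.0) -> float:
    """Volume after `years` of growth, doubling every `doubling_years`."""
    return start_zb * 2 ** (years / doubling_years)


# How long does it take to get from 4.4 ZB to 40 ZB when doubling every two years?
years = years_to_reach(4.4, 40.0)

# A factor of 300 up to 40 ZB implies this starting volume for 2005.
volume_2005_zb = 40.0 / 300

print(f"4.4 ZB -> 40 ZB takes about {years:.1f} years")
print(f"Implied 2005 volume: about {volume_2005_zb:.2f} ZB")
```

Under this assumption, roughly six and a half years separate the two quoted volumes, which shows that the figures are at least mutually plausible.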
The information of interest to Big Data can come in various forms and formats; this is the Variety dimension. It does not matter whether it is log data, images or (free) text. The goal is to prepare these unstructured and unsystematic data sets in such a way that they can be processed uniformly, so that new value can subsequently be generated from the information gained.
The various social networks like Facebook, Twitter or YouTube, which generate the most diverse types of user-generated data, play a special role in this process.
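To illustrate what uniform processing of differently structured inputs can mean, here is a minimal sketch that maps a log line, a free text and image metadata onto one common record type. The `Record` class, the log format and the metadata keys are all assumptions made for this illustration, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Record:
    source: str                    # where the item came from
    timestamp: Optional[datetime]  # None when the input carries no time
    text: str                      # searchable content


def from_log_line(line: str) -> Record:
    # Assumed log format: "<ISO timestamp> <LEVEL> <message>"
    stamp, level, message = line.split(" ", 2)
    return Record("log", datetime.fromisoformat(stamp), f"{level}: {message}")


def from_free_text(text: str) -> Record:
    # Collapse line breaks and repeated spaces into single spaces
    return Record("text", None, " ".join(text.split()))


def from_image_metadata(meta: dict) -> Record:
    # Assumed keys: "taken_at" (ISO string, optional) and "caption"
    taken = meta.get("taken_at")
    stamp = datetime.fromisoformat(taken) if taken else None
    return Record("image", stamp, meta.get("caption", ""))


records = [
    from_log_line("2015-03-01T12:00:00 ERROR disk full"),
    from_free_text("  Great   product,\nwould buy again "),
    from_image_metadata({"taken_at": "2015-03-02T08:30:00", "caption": "warehouse shelf"}),
]

# Once normalized, all sources can be handled by the same code,
# for example sorted by time where a timestamp exists.
dated = sorted((r for r in records if r.timestamp), key=lambda r: r.timestamp)
for record in dated:
    print(record.source, record.timestamp.isoformat(), record.text)
```

The design choice is to push all format-specific parsing into small adapter functions, so that everything downstream only needs to know the common record type.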
Due to the increasing use of smart devices, the increasing networking of devices with each other and the use of more and more sensors, data are generated continuously. As a result, real-time data processing (the Velocity dimension) is becoming ever more important, because the entire mass of data cannot be buffered.
In addition to the three dimensions mentioned above, Big Data has another aspect that is particularly important for economic decisions: the credibility of the available data and its analysis results, often referred to as Veracity. According to IBM, one-third of all leading IT decision-makers do not trust the analysis results of Big Data. One reason for this is that it is becoming increasingly difficult to filter the really relevant information out of the growing mass of data. The remaining data are non-targeted “data waste”, which human employees can handle better than specialized analysis software. For the data under analysis to be usable, they must be clearly identifiable, complete, comprehensive and trustworthy. If an unsuitable data basis or an inappropriate analysis model is selected for a given question, this can decisively affect the meaningfulness of the results.
Another term that is increasingly associated with Big Data is Value. Value describes the added value generated by analyzing all the data that is produced. Through targeted evaluations, new information can be obtained from existing data, which can form the basis for further business decisions. Information such as exact sales figures becomes more transparent, is available more quickly, and can be used to make predictions about the future, to make targeted business decisions, to improve business processes and thus to achieve higher monetization.
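Processing data as it arrives, instead of buffering it, can be sketched with an incremental statistic. The example below uses Welford's online algorithm to keep a running mean and variance in constant memory. The `RunningStats` class and the `sensor_stream` generator with its readings are hypothetical and stand in for a real sensor feed.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        # Each value is folded into the statistics and can then be discarded.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def sensor_stream():
    # Hypothetical temperature readings standing in for a live sensor feed
    yield from (21.0, 21.5, 22.0, 21.5)


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)

print(f"n={stats.count} mean={stats.mean:.2f} variance={stats.variance:.3f}")
```

Only three numbers are stored regardless of how many readings pass through, which is the property that matters when the full data stream cannot be kept.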
Opportunities & Challenges
Big Data is an unstoppable trend that no one can escape. Companies and consumers alike are affected by the technical developments. The collection, processing and evaluation of all possible data can unlock tremendous economic potential for optimization, but at the same time it can severely limit the privacy of every individual, up to the point where the vision of the fully transparent person becomes a reality.
The opportunities and possibilities resulting from the combination of Big Data and the increasing networking of all areas of life are enormous. This is borne out by various studies, among them those of the McKinsey Global Institute.
Use of Big Data in Business
BITKOM summarizes the possible opportunities in five main points:
- Creating transparency about a company's own business processes. On this basis, better business decisions can be made, whereby greater added value can be achieved.
- Thanks to the broad data basis, advanced simulations and experiments can be carried out to increase the company’s performance.
- Improved customer access through easier customer segmentation and demand-driven goods and services. Targeted customer engagement can, among other things, reduce expenses for marketing campaigns.
- Support for decision-making processes for management through embedded analytics and fully automated decision-making processes. The analysis of large amounts of data can help to minimize the risks associated with important business decisions.
- Opportunities for new business models, products and services can arise, such as fully customized product offerings that are perfectly tailored to each individual customer.
In addition, BITKOM provides examples that illustrate the economic benefits of Big Data. Among other things, Big Data can help manufacturers to optimize their manufacturing processes and reduce costs through machine-to-machine communication. Social media analyses can also help R&D departments steer the development of a new product generation in the right direction. In addition to economic aspects, the use of Big Data also holds many opportunities for other areas.
Big Data in Science
The sheer mass of different data and the ever-increasing performance of computer systems make it possible to carry out more and more comprehensive simulations and analyses, for example in climate research in order to produce more accurate weather forecasts. Due to its properties, Big Data can also be used to gain new insights in behavioral or epidemic research. In behavioral research, for example, social networks, in particular Facebook with its almost 1.4 billion active users, are used to evaluate the behavioral patterns of individuals and groups. The project “Google Flu Trends” can be cited as an example of epidemic research in connection with Big Data. Here, Google tries to estimate the number of flu cases in a specific region based on the frequency of specific search terms.
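The basic idea of estimating case numbers from search frequency can be sketched as a simple regression. The following example is not Google's actual model. It fits an ordinary least-squares line, and all numbers in it are made up purely for illustration: `search_share` is a hypothetical percentage of flu-related queries per week and `reported_cases` the hypothetical case count for the same week.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic illustration data, not real measurements
search_share = [1.0, 2.0, 3.0, 4.0]    # share of flu-related queries per week (%)
reported_cases = [120, 210, 290, 380]  # cases reported for the same weeks

slope, intercept = fit_line(search_share, reported_cases)

# Estimate cases for a new week from its search share alone
estimate = slope * 5.0 + intercept
print(f"cases ~ {slope:.1f} * share + {intercept:.1f}; estimate at 5.0% = {estimate:.0f}")
```

A fit like this only captures correlation in past data, so its estimates are only as reliable as the assumption that search behavior keeps tracking actual illness.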
Big Data in Medicine
The increasing digitalization of all areas also has an impact on health care. In medical practices and hospitals, the use of electronically supported examination procedures, such as ultrasound imaging or computed tomography, keeps growing. However, this information is available to other physicians or clinics only to a limited extent, as there are currently no cross-institutional standards that allow such an exchange of information. An overarching data management of patient information, maintained in an electronic health record (EGA) by doctors and clinics as well as by the patients themselves, could significantly reduce the bureaucratic administration costs of individual patient records and of the exchange between individual clinics. In addition, the increasing use of fitness trackers (wearable devices) and of smartphone apps such as Runtastic, with more than 70 million downloads worldwide, opens up new ways to collect and evaluate health data across the population. Based on these additional data, new medical knowledge can be gained that will help provide patients with better, more personalized medical care and counseling.
Other applications of Big Data
In addition to the economic, scientific or medical potential offered by Big Data, there are many other sectors that can benefit from its use. These include, above all, the financial sector, trade in general, marketing, tourism, logistics and the automotive sector.
The possibilities of using Big Data and Big Data technologies are manifold. In essence, the main advantages of using Big Data can be summarized in three main points, common to all areas:
- Creating new insights from existing data
- Increasing productivity and innovation
- Reducing costs for information processing and information exchange
Alongside the opportunities that Big Data offers them, companies, institutions and society must also be able to deal with the wide range of information in order to benefit from the available data. When looking at and analyzing the existing masses of data, there are a number of obstacles to overcome and questions to be clarified so that no negative effects of Big Data need to be feared for legal, private or ethical reasons.
Challenges for the economy and institutions
Because of the characteristics of Big Data and the speed at which the data become outdated, enterprises and institutions need to design and pursue a suitable Big Data strategy. In this way, companies lay the foundation for recognizing imminent changes at an early stage and for responding to them in the best possible way.
This also means that suitable hardware and software providers have to be selected from a large field in order to create the necessary technical prerequisites. In addition, specialists are needed who are able to deal with the mass of different data and to formulate the appropriate questions for the business, so that the greatest possible benefit can be derived from the existing data and the company receives optimal advice and support. IBM refers to these specialists as Data Scientists. One of the biggest challenges for the use of Big Data solutions is currently the lack of know-how in this area. In addition, the introduction of Big Data solutions can lead to high initial consulting and system integration costs, which can deter many small and medium-sized enterprises (SMEs) from introducing them. Depending on the method chosen and the size of the installation, costs can range from $100 to $180,000. The high-priced solutions are aimed primarily at in-memory processing with comprehensive data processing facilities, while the less expensive solutions focus on data storage.