In the first part of Social Impact of Big Data, we discussed the sources from which data are collected. In this second part, we turn to the commercial uses and applications of big data and big data analysis. From the previous article it is clear that caution must be exercised when processing, storing, or otherwise using data that touches on privacy, security, or intellectual property. Personal data is information that can be used to identify an individual. This includes, for example, working-time and project-time data, as well as the gender and age of the person, all of which are subject to data protection regulations. Personal data must be stored on specially protected servers, meaning that access is possible only for a few specifically authorized persons involved in the technical or commercial management of those servers.
In addition to protecting privacy, data security also serves to protect against unauthorized access by third parties; data and information are, in principle, assets worth protecting. Consequently, access to them should be limited and controlled, and the following objectives should be pursued in the handling of data:
- Data may only be read or modified by authorized users, both when accessing stored data and during data transmission.
- Data may not be changed unnoticed. All changes must be traceable.
- Prevention of system failures: access to data must be guaranteed within an agreed timeframe.
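The second objective, that changes must not go unnoticed, is commonly enforced with message authentication codes. A minimal sketch using Python's standard library (the key and the record format are purely illustrative; in practice the key would live in a secrets manager):

```python
import hmac
import hashlib

SECRET_KEY = b"example-key"  # illustrative only; never hard-code real keys

def sign_record(record: str) -> str:
    """Compute a tamper-evidence tag for a stored record."""
    return hmac.new(SECRET_KEY, record.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_record(record: str, tag: str) -> bool:
    """Return True only if the record has not been modified since signing."""
    return hmac.compare_digest(sign_record(record), tag)

tag = sign_record("project_hours=37.5")
assert verify_record("project_hours=37.5", tag)      # unchanged: accepted
assert not verify_record("project_hours=40.0", tag)  # silent change: detected
```

Any unnoticed modification of the record invalidates the tag, which is exactly the traceability property the objective demands.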
Before personal data can be used for planning or for statistical evaluations, it must first be anonymized.
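A common first step in that direction is pseudonymization: replacing direct identifiers with salted hashes, so evaluations can still be grouped per person without revealing who that person is. A minimal sketch (field names and salt are illustrative; real anonymization requires more than hashing, e.g. handling quasi-identifiers such as age and gender):

```python
import hashlib

SALT = b"rotate-me-regularly"  # illustrative; must be kept secret

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a salted hash."""
    return hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()[:16]

records = [
    {"name": "Alice Example", "hours": 37.5},
    {"name": "Alice Example", "hours": 40.0},
]
anonymized = [{"person": pseudonymize(r["name"]), "hours": r["hours"]} for r in records]

# The same person still maps to the same pseudonym, so grouping works...
assert anonymized[0]["person"] == anonymized[1]["person"]
# ...but the original name no longer appears in the evaluation data.
assert all("name" not in r for r in anonymized)
```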
Legislators have also recognized the sensitivity of this issue. Persons who handle such data, insofar as they are employed by non-public bodies, must be bound to data secrecy when they take up their duties, and that obligation continues even after their work ends. Data security also plays a key role for intellectual property, because any product whose form, content, and functionality are not sufficiently protected can be copied, sold, and used by others.
Social Impact of Big Data: Analysis of the Data
Big Data is not solely about collecting data; its primary goal is to analyze that data. Only through analysis do correlations emerge from which insights can be gained or conclusions drawn.
Ready-to-use software analysis tools exist that can do the job, but without background knowledge of the analysis processes and methods it is not advisable to simply apply them and trust the results.
The analysis process requires interdisciplinary skills and knowledge from the various specialist areas affected in the company. Whoever carries out the analysis should understand data analysis itself, be well versed in information technology, and have programming, mathematical, and statistical skills. Beyond that, they should have business sense and communicate effectively with decision makers. Exactly this combination of abilities makes a data scientist, who can extract the crucial insights for the enterprise from the data.
In one study, the following four core requirements for industrial big data management were derived:
- Ability to handle large, heterogeneous amounts of data
- Complex data analysis algorithms
- Interactive, often visually supported data analysis
- Traceable data analysis
These four core requirements show that the analysis and the underlying algorithms are crucial for a big data system.
Analysis means a systematic examination of the data in which a subject is broken down into its components. Structures, anomalies, regularities, and relationships should be filtered out of the data; analysis is thus a process through which insights can be gained from data.
Analytics, on the other hand, denotes the discipline, the art of performing data analysis. The term is used as a collective label for the set of analysis methods, including methods from statistics and data mining, and the supporting technologies and tools used in the analysis process are often associated with it as well.
Business Intelligence (BI) has the overarching goal of supporting sound and timely decision making at the strategic, tactical, and operational levels. Internal as well as external data are used to optimize business processes and planning. Although the direction and purpose are clear, there is no single definition of BI. Comparing different definitions, BI is typically regarded as a conceptual umbrella for concepts, processes, and technologies for the systematic collection, unification, storage, evaluation, and presentation of data.
Business Intelligence can thus be described as an integrated, enterprise-specific, IT-based overall approach to supporting business decisions that covers controlling, planning, and monitoring, and the internal data resulting from value-added processes.
Just because big data seems topical does not mean it is really new. Many of the underlying theories, analysis methods, and fundamentals date back to the last century and are related to, or originate from, business intelligence.
On the information technology side, Business Intelligence goes back to the 1960s, when "Management Information Systems" (MIS), conceived as a fully integrated overall approach to management support, marked the first attempts to address the topic. In the Americas, MIS became a collective term covering all IT systems that support management.
In summary, business intelligence and big data pursue similar goals and both serve to provide insights that yield economic benefits. BI solutions are designed primarily for internal, structured data, whereas big data covers all kinds and types of data that might be of interest.
Data mining is closely related to "Knowledge Discovery in Databases" (KDD), in which knowledge is extracted from databases by (semi-)automatic processes. In data mining, the data is stored electronically and searched automatically by computers. Economists, statisticians, meteorologists, and communications engineers have long worked on patterns that can be automatically searched for, identified, validated, and used for prediction. Such forecasts can be an advantage, usually an economic one. There is also a widely used process model for this, the "Cross Industry Standard Process for Data Mining" (CRISP-DM).
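The automatic pattern search described above can be illustrated with its simplest case: counting which items co-occur in transactions, one core step of frequent-itemset mining. A minimal sketch with invented shopping-basket data:

```python
from itertools import combinations
from collections import Counter

# Invented transaction data: each set is one shopping basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "butter", "coffee"},
]

# Count every pair of items bought together.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is a candidate pattern for prediction
# ("customers who buy bread tend to buy butter").
assert pair_counts[("bread", "butter")] == 3
```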
The CRISP-DM analysis process starts with business understanding, but overall it does not follow a strictly linear path: new findings may require revisiting phases multiple times, or even jumping to any phase if the content demands it. The arrows in the process diagram mark only the most common jumps observed during an analysis project, and the outer cycle indicates that the entire analysis process can be iterated multiple times. The phases discussed here are Business Understanding, Data Understanding, Data Preparation, and Modeling, where the Modeling phase is also referred to as data mining.
The CRISP-DM process model can be applied to big data problems because it is designed for data analysis, which is essentially what big data is about. Data mining and KDD provide the basics for digging through data and seeking or discovering useful correlations that can ultimately lead to insights.
Variants of Analytics
The term analytics rarely appears without a qualifier indicating a particular subarea or focus. For relevant entrepreneurial questions it is therefore called business analytics, because predominantly in-house, structured data are used and the results are meant to support decision making; in that sense, analytics can be classified under business intelligence. The right analytics variant depends on the question at hand. The following describes the typical variants of business analytics, each of which has its own characteristics and which can be combined with each other where necessary.
Descriptive Analytics deals with the question "What has happened?" and tries to answer it from past data. Classically this is answered with reports or queries; reporting and search tools are used to extract the information from the data.
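A descriptive "What has happened?" query is in essence an aggregation over historical records. A minimal sketch in plain Python (the sales figures are invented):

```python
from collections import defaultdict

# Invented historical records, as they might come from an operational system.
sales = [
    {"month": "2023-01", "revenue": 1200.0},
    {"month": "2023-01", "revenue": 800.0},
    {"month": "2023-02", "revenue": 1500.0},
]

# Classic reporting query: total revenue per month.
revenue_by_month = defaultdict(float)
for row in sales:
    revenue_by_month[row["month"]] += row["revenue"]

assert revenue_by_month["2023-01"] == 2000.0
```

In practice this is exactly what a reporting tool generates as SQL (`GROUP BY month`) behind the scenes.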
Diagnostic Analytics asks "Why did it happen?" and aims at root cause analysis. It attempts to filter correlations out of the available data, although causal relationships can hardly be deduced this way. Such correlations can nevertheless help the respective experts search for causes and draw important conclusions from the data. The term diagnostic analytics is less common, so this question is often subsumed under descriptive analytics.
Real-time Analytics deals with the question "What is happening?", i.e. with data from the present, which is usually displayed via dashboards or report sheets. Examples are personalized advertising on websites, personal recommendations in online commerce, and the detection of fraudulent transactions. From an information technology point of view this variant is a challenge: only when the total latency, from the triggering event to the result, approaches zero can one speak of a real-time application.
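The fraud-detection example can be sketched as a per-event check on a stream, where the decision must fall while the transaction is still in flight (the window size, threshold factor, and amounts are invented):

```python
from collections import deque

class FraudMonitor:
    """Flag a transaction if it far exceeds the recent average for the account."""

    def __init__(self, window: int = 5, factor: float = 3.0):
        self.recent = deque(maxlen=window)  # sliding window of past amounts
        self.factor = factor

    def check(self, amount: float) -> bool:
        """Decide on the incoming event immediately, then update the window."""
        suspicious = bool(self.recent) and amount > self.factor * (
            sum(self.recent) / len(self.recent)
        )
        self.recent.append(amount)
        return suspicious

monitor = FraudMonitor()
for amount in [20.0, 25.0, 22.0]:
    assert not monitor.check(amount)  # ordinary activity passes
assert monitor.check(500.0)           # far above the recent average: flagged
```

Real systems replace the simple average with richer models, but the shape, decide per event with near-zero latency, is the same.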
Predictive Analytics focuses on the question "What could happen?" and looks to the future so that predictions can be made. The prediction need not be limited to the future: just as future outcomes are unknown from the present view, a past or present value of a target variable may be equally unknown and worth predicting.
Prescriptive Analytics answers the question "What has to happen?" and should at the same time indicate with which actions the company can achieve a business goal. It is the highest and most technically complex form of decision support in a company. The goal here is to derive from the data, even in new and unexpected situations, recommendations for action that support decision making, bearing in mind that there will always be events that cannot be predicted. The building blocks of prescriptive analytics are usually descriptive and predictive analytics, combined with optimization methods or model-based simulation techniques to determine the most accurate recommendation. The recommendations, and the actions actually taken, can in turn flow back into the system for iterative improvement and thus contribute to better predictions.
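The combination of a predictive model with an optimization step can be shown in miniature: brute-force a set of candidate actions against an assumed demand model and recommend the best one (the demand model and all numbers are invented):

```python
def predicted_demand(price: float) -> float:
    """Assumed predictive model: demand falls linearly with price."""
    return max(0.0, 100.0 - 2.0 * price)

def recommend_price(candidates: list[float]) -> float:
    """Prescriptive step: choose the action with the best predicted revenue."""
    return max(candidates, key=lambda p: p * predicted_demand(p))

best = recommend_price([10.0, 20.0, 25.0, 30.0, 40.0])
assert best == 25.0  # revenue p * (100 - 2p) peaks at p = 25
```

Real prescriptive systems swap the toy demand function for a trained predictive model and the brute-force loop for proper optimization or simulation, but the division of labor is the same.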
The biggest challenge is text mining, which tries to recognize in texts content that could carry economic potential. The simplest example is social media data, where text mining can reveal whether a new product is being accepted by customers. However, because text data is unstructured from a computer's perspective and consists essentially of natural language and context, traditional methods such as knowledge discovery in databases, data mining, or business intelligence cannot be applied directly. Linguistic and semantic methods are used to achieve the goal of "BI on text": with their help, relevant information is extracted from the unstructured texts, structures are recognized, and the results are connected to other data and data sources. An additional factor is that social media texts are mostly written in jargon or slang, which must be interpreted correctly, ideally also deriving the author's mood.
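The social media example can be sketched as the most basic text-mining step: tokenize posts and score them against a small sentiment lexicon. The lexicon and posts are invented, and real systems need linguistic preprocessing, negation handling, and slang dictionaries well beyond this:

```python
import re

# Invented toy lexicon; real systems use large, curated sentiment lexicons.
POSITIVE = {"great", "love", "awesome"}
NEGATIVE = {"broken", "hate", "awful"}

def sentiment(post: str) -> int:
    """Positive minus negative word count; > 0 leans positive, < 0 negative."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

assert sentiment("Love the new product, it is great!") > 0
assert sentiment("Awful update, search is broken.") < 0
```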
Text mining can nevertheless be operated with a big data system, but semantic and language technologies must be introduced as a basis so that the desired recognition quality can be achieved at all. Correlations can also arise that make sense to a computer, based on its algorithm, but are largely irrelevant from a human point of view.
In summary, collecting data without analyzing it provides no economic value for the business itself; if expenditures for the entire IT system exceeded the revenues, i.e. the value of the data insights, it would not be profitable in the longer run. Only with suitable big data analytics methods can an economic benefit be created. It is also crucial that the resulting insights are followed by appropriate measures so that the company makes progress in the respective application. Insight emerges when internal data from traditional operational and analytical systems is linked to relevant questions and enriched with big data sources.
Despite this, it should be noted that there is currently no solid evidence from which it can be clearly deduced that big data has demonstrably improved the earnings or competitive position of a given company.