Big data is frequently discussed these days, even among laypeople. While many people do not yet know exactly what big data means, those who do generally associate it with data analytics on a large scale. Data protection often drops out of sight in these discussions. A key aim of this work is to make the reader aware of how big data affects privacy and privacy laws, and to answer the key question: “Can we use IBM’s solutions to develop applications that comply with various privacy laws?”
New technologies enable companies and public authorities to access and evaluate personal data in ever greater quantities. The amount of data we generate is increasing so rapidly worldwide that it exceeds not only the processing capacity of computers but also our imagination. Back in 2013, the total amount of digital data was estimated at 1,200 exabytes. From the huge flood of data that we generate every second of our lives, a fairly accurate picture can be drawn of what we have just done or are about to do.
Companies can use this data to learn about consumers’ behavior or to gain a strategic competitive advantage. All this creates uncertainty and fear of total surveillance and data misuse. That’s why privacy is more important than ever. With the ever-increasing possibilities of electronic data processing and the increasingly frequent use of personal data on the Internet, the risk of abusive recording, processing, and transfer of data increases. Data espionage and illegal address trading on the Internet are becoming more common, which is why lawmakers have tightened the Data Protection Acts. These facts show how closely big data and privacy are linked, and how important it is to scrutinize privacy in the context of big data.
Laws to Know for Developing Privacy-Compliant Big Data Apps
The EU Commission wants to concentrate more on the topic of data protection due to the still growing normative force of technical standards in automated data processing. The General Data Protection Regulation proposed by the European Commission would largely harmonize data protection in Europe for the near and medium-term future.
The US Constitution does not contain an express right to privacy, so it is little wonder that it does not contain a right to the protection of personal data either.
Unlike in the EU Member States, consumer protection in the US has so far been determined by various sector-specific (consumer protection) regulations at the federal level, as well as by state data protection laws. The most important consumer protection and data protection laws cover the processing of medical data, the processing of bank data, the accuracy of credit reports and protection against identity theft, and the dissemination of telecommunications data.
The third-country provision gives the EU directive an effect on countries outside the Union. This provision was highly likely to provoke legal conflicts between the EU and third countries. The first conflicts emerged shortly after its adoption, especially with the US. The difference in approach between Europeans and Americans on the issue of data protection could have led to significant disruptions to transatlantic data traffic as a result of the third-country provision. In addition to the political consequences of a Europe-wide blockade on the transfer of personal data to the Union’s largest trading partner, the economic consequences of such a trade barrier would have been considerable. To prevent or resolve these conflicts at an early stage, the US and the EU Commission developed the “Safe Harbor” solution for the protection of personal data in July 2000, a solution that permits transatlantic data flows despite the different approaches to data protection.
Big Data in the Public
Public opinion about big data is divided. In the US, people focus on the opportunities offered by big data. The chance to gain new social, economic, and scientific insights through big data is seen as a way to improve living conditions in our complex world. What is described as the big risk of big data, however, is above all the misuse of data and the massive violation of fundamental rights regarding personal information.
There are numerous public disputes regarding interference with the fundamental right to privacy. They show that people react with some resistance when they feel that individual or group rights have been violated. Such violations may have been committed for marketing purposes.
With privacy-preserving data mining, i.e., the processing of only anonymized data, consumers say they are more willing to participate in household panels and the like. Anonymization thus increases the willingness of consumers to disclose their data.
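The idea of processing only anonymized data can be sketched as follows. This is a minimal illustration, not a complete anonymization method: the `PanelRecord` type, its fields, and the `anonymize` function are hypothetical names chosen for this example. Direct identifiers (name, email) are dropped, and quasi-identifiers (age, postal code) are coarsened before the record reaches any analysis step. Generalization of this kind does not by itself guarantee that a person cannot be re-identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelRecord:
    # Hypothetical raw record from a household panel.
    name: str
    email: str
    age: int
    postal_code: str
    weekly_spend: float


def anonymize(record: PanelRecord) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    decade = (record.age // 10) * 10
    # Keep only the first two characters of the postal code.
    region = record.postal_code[:2] + "x" * max(len(record.postal_code) - 2, 0)
    return {
        "age_band": f"{decade}-{decade + 9}",
        "region": region,
        "weekly_spend": record.weekly_spend,
    }


raw = PanelRecord("Ann Example", "ann@example.com", 34, "10115", 82.5)
safe = anonymize(raw)  # contains no name or email
```

Only the `safe` dictionary would be passed on to the data mining step; the raw record stays with the party that collected it.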
The use of big data raises many questions, especially in relation to data misuse and data protection. First, there is the question of who actually owns the data that people and their devices generate. Then there is the question of who has the right to collect, bundle, and evaluate this data. By evaluating and subsequently using personal data, companies have the opportunity to tailor products and services to customers’ individual needs. This could lead to more comfort, safety, and better service.
However, the use of personal data carries the risk of misuse. A simple and well-known example of data misuse is the sale of personal information to third parties. If the company is not authorized to do so, the sale is illegal because it is not in the customer’s interest.
Data processing by companies or authorities, including data storage and data transfer, is regulated at the federal and state levels. Nevertheless, data trading can take place in a gray area: legally collected data can be traded using illegal methods. This can lead, for example, to inadmissible advertising calls. There is no comprehensive protection against illegal practices in the handling of personal data.
In general, any use of a person’s data that happens without his or her voluntary consent can be considered abusive. Evaluations of big data are not abusive, however, if the data are strongly anonymized.
The EU General Data Protection Regulation should be advanced rapidly, and data protection in Europe should be set at a high level. European data protection law should strike a balance between innovation and privacy. The law should provide an incentive to anonymize or encrypt data as much as possible.
The Internet Never Forgets
The proliferation of personal data has continued to increase thanks to social networks such as Facebook, Twitter, or Instagram. Young people in particular use such platforms to make private data and images accessible to their circle of acquaintances and, in part, to the public.
Critics warn against this reckless approach to personal data. Once information has entered the net, it cannot be removed, or can be removed only with difficulty.
An essential technical feature of data protection is the possibility to permanently delete personal data. The obligation to delete data results from the data protection principles of data avoidance and data economy, as well as purpose limitation and necessity. This means that data must be deleted if they were collected without authorization, or as soon as they are no longer needed for the purpose for which they were collected. These principles are facing a new challenge on the Internet.
With the proliferation of digital media and their penetration into almost all areas of our lives, data is collected and stored everywhere. Whether a record is stored no longer depends on whether it is important enough, because any information provided is simply stored.
Users should be able to give their images or data a sort of expiration date as a form of self-protection. In other words, the Internet should learn to forget. Short of abandoning Internet use altogether, two measures are proposed for better protection against the misuse and misinterpretation of widely disseminated personal information: avoiding information fragments, and introducing a privacy DRM (in which ownership rights are obtained from the provider).
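The expiration-date idea can be sketched in a few lines. This is only an illustration under simple assumptions: `SharedItem` and `purge_expired` are hypothetical names, and a real platform would also have to deal with copies, caches, and backups, which this sketch ignores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SharedItem:
    # Hypothetical item a user has published, with a user-chosen expiry.
    owner: str
    content: str
    expires_at: datetime


def purge_expired(items, now):
    """Return only the items whose expiration date has not yet passed."""
    return [item for item in items if item.expires_at > now]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
items = [
    SharedItem("alice", "holiday photo", now + timedelta(days=30)),
    SharedItem("bob", "old status update", now - timedelta(days=1)),
]
visible = purge_expired(items, now)  # only alice's item remains
```

A provider could run such a purge periodically so that expired items are no longer served.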
Critics see the expiration date as problematic. They believe that limiting the validity of a file to a certain period could lower the already low inhibition threshold for sharing even further.
Case Study: Big Data Privacy Solutions Provided by IBM
With big data rapidly gaining a strong position in the economy, the potential of big data solutions will far outstrip today’s capabilities. Various responsible companies acknowledge the privacy issues and offer research-based solutions to minimize the risks for a company.
IBM InfoSphere Optim Data Privacy provides capabilities to mask sensitive data across development, testing, and QA environments. The Optim data privacy solution has several components, which a company or its developers can use to mask sensitive data, such as national IDs, credit card numbers, and email addresses, before it is propagated outside the production environment. The components apply a variety of transformation techniques that substitute masked data for sensitive information, including substrings, arithmetic expressions, random or sequential number generation, date aging, and concatenation. Accurate masking lets the masked data retain a format similar to the original. This makes the data suitable for testing and other legitimate business uses, but useless to identity thieves and hackers.
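To make the masking techniques named above more concrete, here is a minimal sketch in plain Python. It does not use or reproduce the Optim API; the function names `mask_card_number`, `mask_email`, and `age_date` are hypothetical and only illustrate the general techniques (substring retention, random and sequential number generation, concatenation, and date aging) while keeping the original format.

```python
import random
from datetime import date, timedelta


def mask_card_number(card: str, rng: random.Random) -> str:
    """Replace all but the last four digits with random digits, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card)
    seen = 0
    out = []
    for ch in card:
        if ch.isdigit():
            seen += 1
            # Substring technique: the last four digits are kept unchanged.
            keep = seen > total_digits - 4
            # Random number generation for all other digits.
            out.append(ch if keep else str(rng.randint(0, 9)))
        else:
            out.append(ch)  # separators such as "-" keep the format intact
    return "".join(out)


def mask_email(sequence_number: int) -> str:
    """Sequential number generation plus concatenation into a valid-looking address."""
    return "user" + f"{sequence_number:05d}" + "@example.com"


def age_date(original: date, days: int) -> date:
    """Date aging: shift a date by a fixed number of days."""
    return original + timedelta(days=days)


rng = random.Random(0)  # seeded only to make the sketch repeatable
masked_card = mask_card_number("4111-1111-1111-1234", rng)
masked_email = mask_email(7)
aged_birthday = age_date(date(1990, 5, 17), 30)
```

The masked values have the same length and layout as the originals, so test code that validates formats keeps working, while the real values never leave production.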
The Optim data privacy application is a component which is used to mask data in CSV and XML files, including data in a Hadoop Distributed File System (HDFS) via a graphical user interface (GUI) or command-line interface (CLI) tool.
The data privacy library provides a stand-alone API that is flexible and extensible and can be used by applications written in various languages. In addition, Optim user-defined functions can mask sensitive data in various database management systems (DBMSs) such as DB2, Netezza, Oracle, and Teradata. Developers can include the user-defined functions in SQL scripts or statements to mask data dynamically within the framework of a DBMS server.
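The general pattern of calling a masking user-defined function from SQL can be illustrated with Python’s built-in `sqlite3` module. This is not the Optim UDF interface: SQLite stands in for the DBMS, and the function `mask_national_id`, the `customers` table, and the masking rule are assumptions made up for this sketch.

```python
import sqlite3


def mask_national_id(value):
    """Hypothetical masking rule: keep the last two characters, star the rest."""
    if value is None:
        return None
    return "*" * max(len(value) - 2, 0) + value[-2:]


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call it by name.
conn.create_function("mask_national_id", 1, mask_national_id)

conn.execute("CREATE TABLE customers (name TEXT, national_id TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ann', '123-45-6789')")

# The masking happens inside the query, so callers never see the raw value.
rows = conn.execute(
    "SELECT name, mask_national_id(national_id) FROM customers"
).fetchall()
conn.close()
```

Because the function is applied in the `SELECT` statement, the stored data stays unchanged and only the query result is masked.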
The masking functions are also available as cloud services in IBM’s Bluemix, which simplifies deployment for cloud databases as part of the data services Bluemix offers. Optim connects to Hadoop via the Apache Hive query language. On the system where Hadoop is running, the developer needs to install and configure Apache Hive and start HiveServer2.
Because clients are responsible for ensuring compliance with various laws and regulations, IBM Watson provides a simple way to remove a customer’s message data from a Watson Assistant instance. IBM provides an official example of labeling and deleting data in Watson Assistant.
- IBM Security Guardium is a family of data security and protection products that help secure sensitive data, from databases to big data, cloud, and file systems. An IBM PDF describes how to set up a Hadoop environment with IBM Security Guardium.
- IBM Data Risk Manager uncovers, analyzes, and visualizes data-related risks to the business.
IBM also publishes general legal documentation on data privacy and security.
With the increasing computing power of modern IT systems, the speed at which big data solutions can process data also increases. In the future, big data will increasingly find its way into smaller companies and business sectors, as the cost of acquiring more efficient hardware is steadily declining. The pressure to act is growing for companies that want to gain a stronger economic position, and strategies for big data use have to be adjusted. Data protection will continue to play an important role in the future.
Big data plays an important role in our society through the proliferation of information systems. For example, companies use information derived from big data to tailor products or advertisements. Healthcare organizations evaluate search queries to predict future flu epidemics, and structured or unstructured mass data is used to make predictions about climate change, economic trends, or price developments. This is just a small selection of what can be done with our personal data.
To analyze and master the huge amount of data that we generate every day, sophisticated analysis methods such as data mining or web analytics are needed. These methods are used to find connections within the huge amounts of data or to improve one’s own website.
The data used for these analysis methods are those that our society consciously or unconsciously generates every day. Most of these are personal data, and for this reason it is important that they are protected. Individuals can do something themselves to protect their data by encrypting it, anonymizing it, or not disclosing it in the first place.
Since it is not always easy for a private person to protect his or her data, there are Data Protection Acts that regulate the handling of personal data. To comply with these rules, a so-called data protection officer is required, who supervises and advises the organization on data protection.
As the IBM case study shows, there are also solutions that can ensure higher compliance, even in cloud-based applications.