Big Data and Privacy : Data Leakage


By Abhishek Ghosh November 10, 2018 9:11 am Updated on November 10, 2018


App developers increasingly use SaaS-type platforms for application delivery. In our earlier article on Big Data privacy and application development, we mainly discussed maintaining compliance as an app developer; it had a vendor-specific part and resources. Here is another practical discussion around Big Data and privacy. Developers themselves are vulnerable to data theft, and data theft breeds suspicion within a business. Why we have not talked about Microsoft or Google in the context of cloud-based tools should be obvious. Big Data can be used in close to all areas of everyday life. It is probably not safe to use platforms from companies that are already known to have ties to governmental agencies.

Big Data and Privacy : The Phenomenon of Data Leakage

In manufacturing companies, it is now common to build sensors into products. Telemetry thus flows constantly between the product and the manufacturer. Companies can use these up-to-date metrics to record, for example, when a device fails or transmits incorrect values. With this knowledge, they can improve the product, keep monitoring it, and let the findings shape later development. Once the new information has been evaluated correctly, production processes can also be adapted by merging some steps with others or omitting them altogether, which opens up the option of reducing costs.

Another use is apps that determine the location of users (potential customers, in this case) via GPS and then inform them about promotions and discounts in a shop in their vicinity. New customers who would not have found the shop by themselves are acquired, and existing customers are reminded of something they may have forgotten. This concept is embodied, for example, by the app “Foursquare”. Its developers have even added a feature that learns what the user likes in order to inform him about future promotions in that category. A technology-savvy person, for example, would be notified when near an electrical goods store.

Most retailers know their customers and know what they are buying or not buying. In combination with social media such as Twitter, Facebook, Instagram or their own e-commerce sites, they can get to know their customers better and ask them questions publicly in order to improve certain things. Such actions can not only deliver new insights but also provide greater transparency to the public and feed directly into supply chain management. Social media like Facebook, LinkedIn or Twitter can only exist thanks to Big Data. Their business model builds on the mass of information each individual user generates. This information is then processed to provide the user with specifically filtered content.

These were just a few of the possible uses of Big Data, and it becomes apparent that there are many more such areas, some of which are only possible through Big Data. The fundamental task of data protection is to protect the rights of individuals. Data protection is the protection of data from loss, misuse or alteration. With ever-increasing technological progress and the associated networking, we generate more and more personal data with our smartphones, laptops and watches. These data are stored and can be used, for example, for corporate advertising, for fighting crime or for profile analysis. So that this does not happen to the detriment of the data source, that is the individual person, the state has issued data protection law that regulates the handling of personal data.

Legitimate Analysis Leads to Data Leakage

The concept of digital data analysis dates back to the 1970s. The aim of digital data analysis is to identify and resolve problems before they can arise or spread. It is already used in most areas of life. In health care, for example, attempts have been made to predict flu outbreaks by evaluating Google searches. Movement data is used to forecast traffic congestion and identify alternative routes. Structured or unstructured mass data can be used to produce forecasts of climate change, economic trends, demographic change or price developments. Big Data-based analysis of digital data leads to increased efficiency in the areas where it is used. By predicting customer behavior, movement patterns and societal change, companies can act proactively and respond to circumstances before they arrive.

Data mining is a generic term for a large number of methods, procedures and techniques. Through data mining, economically usable patterns and relationships can be extracted from large amounts of data in a largely automated way. The methods used come from the classical fields of mathematics, statistics, biology, physics and computer science. Since data mining has a wide field of possible applications, no single solution covers the requirements of every area. The choice of method therefore depends on the problem and on the data mining expert. One example of a software solution for data mining is the free programming language “R”, which is widely used for statistical computing and data mining.

Predictive analytics is part of data mining. In this process, the relationships and patterns found through data mining are analyzed in order to derive and predict patterns of behavior and trends. During the descriptive phase, relevant data are collected; in the predictive phase, these data are used to develop a statistical model that allows prediction. In the prescriptive phase, recommendations are drafted that can influence a trend.
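
The three phases can be illustrated with a very small sketch. It assumes a hypothetical list of weekly sales figures as the collected data, uses an ordinary least-squares trend line as the statistical model, and a toy reorder rule as the recommendation. The function names and the data are illustrative assumptions, not part of any particular product.

```python
# Descriptive phase: hypothetical collected data (weekly unit sales of one product).
weekly_sales = [120, 132, 141, 155, 163, 178]


def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict(ys, steps_ahead=1):
    """Predictive phase: extrapolate the fitted trend beyond the last observation."""
    slope, intercept = fit_line(ys)
    return slope * (len(ys) - 1 + steps_ahead) + intercept


def recommend(ys, stock):
    """Prescriptive phase: a toy rule that turns the forecast into an action."""
    return "reorder" if predict(ys) > stock else "hold"


if __name__ == "__main__":
    print(predict(weekly_sales), recommend(weekly_sales, stock=150))
```

A real model would be chosen and validated by a data mining expert; a straight trend line is only the simplest possible stand-in.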

In addition to personal data, there are also location-related data. In contrast to personal data, these data carry a location in the form of coordinates. The increased use of networked wearables and mobile devices often allows data collected for other purposes to be georeferenced, which increases their usefulness. In this way, retail data, for example, can be given a GPS tag and used for the evaluation of locations. With the added dimension of “place”, data can be harnessed in new ways, using Big Data solutions to analyze sales forecasts or damage risks.

Web analytics is the measurement and analysis of data for the purpose of optimizing an online presence. By changing their websites and then analyzing users, website owners want to find out whether their changes are received positively. In web analytics, entire paths of users across many pages are often recorded and analyzed in order to adjust the site to the user. This is done via so-called cookies. With a Big Data solution, the cookies can be analyzed to produce a list of the users' more or less apparent wishes.
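
How little is needed to turn cookie data into user paths can be shown with a short sketch. It assumes a hypothetical page-view log in which every entry carries a cookie ID, a timestamp and a page; the log format and function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical page-view log: (cookie_id, timestamp, page), in arbitrary order.
page_views = [
    ("c1", 3, "/checkout"),
    ("c1", 1, "/home"),
    ("c2", 1, "/home"),
    ("c1", 2, "/product"),
    ("c2", 2, "/product"),
]


def paths_by_cookie(views):
    """Group page views by cookie ID and order each group by timestamp."""
    grouped = defaultdict(list)
    for cookie_id, timestamp, page in views:
        grouped[cookie_id].append((timestamp, page))
    return {cid: [page for _, page in sorted(hits)] for cid, hits in grouped.items()}


def most_common_transitions(paths, top=3):
    """Count page-to-page transitions across all reconstructed visitor paths."""
    counts = Counter()
    for path in paths.values():
        counts.update(zip(path, path[1:]))
    return counts.most_common(top)


if __name__ == "__main__":
    paths = paths_by_cookie(page_views)
    print(paths)
    print(most_common_transitions(paths))
```

The same grouping step is what makes a cookie privacy-relevant: once the views are tied to one ID, the full path of a single visitor is visible.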

With machine learning, computer programs gain, through a variety of applications and methods, the ability to independently acquire and apply new knowledge. This is done by applying complex logical analyses and statistical evaluations with the aim of finding an optimal solution to a given task. In machine learning, it is usually the how rather than the what or why that is the central issue in solving problems. Through machine learning, data can be captured and processed in a very specific way, because a trained model retains what it has learned and applies it to future data. In this way, certain trends and temporal characteristics can be linked to what has already been learned in order to make predictions.

While using the Internet, each user leaves behind some information that is deliberately and voluntarily made available online. A distinction must be made as to who has access to these data trails and what knowledge can be obtained from them. The information is not always authentic, because data trails can sometimes be blurred or faked.

A system may collect data of different types, namely user information resulting from self-presentation on the network, such as Facebook data, geographic location data, client communication data and so on. All this data moves across the net, and this movement shows who is talking to whom, about what and for how long. From this, preferences, thoughts and buying interests are derived, and the movement of the users indicates who is currently where while communicating. In addition, there are passive data trails that are created without any action by the affected users. These include the residential environment as seen through Street View, and the homepages of companies, schools and sports clubs that list employees, members, students and graduates. Statements that friends and acquaintances post about someone else on Facebook and Twitter are included as well.

But the belief that these data represent an accurate picture of reality is a fallacy we should not fall for. No data are objective, because even the method of data collection influences the record. On Facebook, for example, users come from very specific population groups and are restricted in their behavior: certain interactions are very simple, while others are impossible. The data generated via Facebook are therefore not independent of the platform's algorithms. The same goes for any other website. Those who pay attention only to the data make a mistake. It is like looking only at the shadow a person casts rather than at the person.

However, Big Data is more and more often the only source we consult. Of course, these data have immense added value. Researchers have begun to mine drug-related Google searches to find out which medications are taken together and which side effects are to be expected. In addition, quite accurate conclusions about the origin, religious affiliation, political attitude, personality, intelligence, well-being, age, gender, relationship status and drug abuse of a user can be drawn from a Facebook profile. As a result, statistical information can be collected without any action by the user. This asymmetry makes Big Data a coveted commodity.

The Inherited Security Risk of Cloud and Exploits of IoT Devices

Many of us today own more and more electronic devices, such as smartphones, laptops, smart watches or tablet PCs. On all these devices we generate personal data, sometimes very sensitive data such as bank or medical records. This body of data is growing continuously. Much of it sits on hard drives and can easily be tapped and even duplicated by third parties. The problem is that many systems do not properly support the management of data.

The cloud concept is currently on everyone's lips, and more and more companies are offering their customers cloud data storage. The data is then stored centrally and is accessible from everywhere.

However, this added value comes at a high risk, because to make the data accessible from anywhere, they must first be uploaded to a central location. The key question is then whether the personal data is still safe. With all of these offers, customers have to trust companies like Dropbox or Apple to store their data securely.

Since the beginning of the discussion about cloud computing, data protection and data security have been named as essential factors for its acceptance. To what extent are individuals or companies willing to entrust their data to these providers? It is often argued that cloud computing offers a higher level of data protection than processing personal data on premises. In fact, many individuals and companies lack the professionalism needed to protect their data properly and effectively. This should not happen with cloud offerings, because professional data processing is essential to the providers' business.

With ever faster technical development, the scalability of existing systems must be examined. In the terabyte to petabyte range, this huge amount of data pushes standard storage and server systems to their limits. Often, new systems or even new data centers must be planned, installed and integrated into the processes in order to process the data effectively. For this, solutions are often used that were previously known only from supercomputers.

Disaster recovery for Big Data cannot currently be operated at reasonable cost. The focus is therefore on the high availability of the system: availability is kept as high as possible in order to prevent total failure. The hardware side relies not only on RAID and snapshot technologies but also on additional procedures in which the data is mirrored. A further measure used here is data deduplication, which means that redundant data is detected and eliminated on the systems before it is saved.
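
As a minimal sketch of the deduplication idea, the following toy store splits data into fixed-size chunks and keeps each distinct chunk only once, identified by its SHA-256 digest. The class name, the tiny chunk size and the in-memory dictionaries are assumptions made for illustration; production systems typically use much larger or variable-size chunks and persistent storage.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 digest -> chunk bytes
        self.files = {}   # file name -> ordered list of digests

    def put(self, name, data, chunk_size=4):
        """Split data into chunks and store only chunks not seen before."""
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    store.put("a", b"AAAABBBBAAAA")
    store.put("b", b"BBBBCCCC")
    print(store.stored_bytes(), "bytes stored for 20 bytes written")
```

Note that deduplication mainly saves storage and bandwidth, which in turn makes mirroring the remaining data cheaper.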

Experience with supercomputers has made it clear that attention must also be paid to the wide area networks (WANs) and storage area networks (SANs), since the performance of these devices must likewise be designed for data in the petabyte range. The local area networks (LANs) of companies also have to meet higher performance requirements, because when Big Data is processed, the calculation results are distributed directly to the respective departments.

Person Profiles

When users use networked goods, devices or systems, their personal data is recorded and condensed into so-called person profiles. A person profile essentially comprises the properties of a user that matter most to the creator of the profile. Which features are important varies with the industry. Person profiles are generated from many different sources. One of these is the use of Internet services, where the IP address, browser, operating system and country of origin of the user can easily be retrieved. If the user is logged in to a service, in most cases his name, address and other sensitive data are stored as well. Payment data from credit cards or online payment services such as PayPal can be added to this.

Person profiles are primarily used for market research and for getting to know user behavior. The goal is to create a so-called Transparent Man. The Transparent Man is a metaphor that describes the full transparency of a person and easy access to all his personal data in order to use it economically. However, person profiles can also entail significant risks, such as complete loss of privacy through simple monitoring and misuse of personal data. Alongside the benefits for market research, there is a risk that unauthorized persons create profiles in order to monitor individuals.

Data and information, especially personal data, are now considered an economic good. Personal data is very much in demand among IT companies because of its high added value and its versatility. An extensive set of personal data allows extensive Big Data analysis of customer and consumer behavior. That such data currently have a very high value is reflected in the valuation of companies that hold personal data, such as Facebook or Google. The more detailed the data about a person becomes, the greater its added value, as it can be used for more and more purposes. Unlike physical assets, data can be reused indefinitely without losing value, as the cost of duplication is low compared to the cost of obtaining the data. In order to duplicate data or pass it on to other companies and interested parties, the explicit consent of the data owner is required. Personal data may not, for example, be sold to or shared with other companies without consent. However, the transfer of data is a sensitive issue, as even during a legal transfer between two companies the law may be violated or data may be stolen.

The possibilities of web analysis have long been used in business. Various studies make it clear that user analysis is not only of great importance for developing a product further, but also that business can no longer be imagined without it.

Likewise, governments have recognized the potential of statistics. In 2006, data retention was used in France, and at the latest since Snowden's revelations it should be clear to everyone that states, too, are major players in Big Data. Many are worried that secret services are listening to their phone calls or reading their e-mails.

This topic did not go unnoticed by the legislature. It passed a provision under which, starting from 3 May 2006, data generated in public communications had to be retained for at least 6 months and at most 2 years. However, this law was invalidated by the European Court of Justice on 8 April 2014 as incompatible with the Charter of Fundamental Rights of the European Union. The ruling was essentially about the fundamental right to privacy and the fundamental right to the protection of personal data.

Security, once understood mainly as social security, is now seen much more as individual security: security from criminal threats or terrorism. However, in view of the activities of the Ministry of State Security, detailed monitoring of the population cannot be considered lightly. Private companies, like government organizations, are criticized for their use of personal data. Customers do not want private companies to use their data.

Conclusion

Although the list of sources of data leakage is never-ending, developers must apply suitable protective methods at each layer so that stolen data is useless for identifying the attributes of a person. It is quite important to avoid APIs, software and services from companies reported to have been involved in PRISM, for example Google and Microsoft.
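
One such method at the data layer is pseudonymization. The sketch below is a minimal illustration, assuming records are plain dictionaries and that a secret key is kept separately from the stored data; the function name and the field names are hypothetical. It replaces direct identifiers with keyed HMAC-SHA256 digests so that a stolen table no longer shows names or e-mail addresses in the clear.

```python
import hashlib
import hmac


def pseudonymize(record, secret_key, id_fields=("name", "email")):
    """Replace direct identifiers with keyed hashes; other fields stay untouched."""
    out = dict(record)  # work on a copy so the original record is not modified
    for field in id_fields:
        if field in out:
            value = str(out[field]).encode("utf-8")
            out[field] = hmac.new(secret_key, value, hashlib.sha256).hexdigest()
    return out


if __name__ == "__main__":
    key = b"example-key-kept-outside-the-database"  # assumption: stored separately
    print(pseudonymize({"name": "Alice", "email": "alice@example.org", "city": "Pune"}, key))
```

The same input and key always yield the same pseudonym, so records can still be joined for analysis. This is not full anonymization: remaining fields such as location or age can still allow re-identification when combined, so it should be only one of several layers.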

Despite all the criticism of the analysis of personal data, there are arguments for it, such as the effective fight against terrorism or the early detection of crimes. But it must not be forgotten that all these data can also be misused, for example for identity theft. The question therefore is: is monitoring people legitimate if it yields added value? If the answer is yes, what must this added value look like? Is a small added value sufficient, such as a product improvement achieved through behavioral analysis? Or does it have to be an added value that protects us from physical and mental suffering?
