Data mining is the application of statistical techniques to large data sets. Because of their size, these data sets are processed using computer-supported methods. The term is somewhat misleading, because it refers to extracting knowledge from already existing data, not to generating the data itself. Previously, we have pointed out the Different Tasks of Data Mining and Data Mining Issues. This article focuses on the legal and moral aspects of data mining.
Data mining as a scientific discipline is initially value-neutral. The methods allow the analysis of data from almost any source, such as measured values of components or historical bone finds. However, if the data analysed relates to individuals, important legal and moral problems arise. Typically, these problems arise as soon as the data is collected and stored, not only during the analysis, and regardless of the specific analysis method used (statistics, database queries, data mining, etc.).
Data that has been insufficiently anonymized may be reassigned to specific persons through data analysis. Typically, however, deanonymization does not rely on data mining but on simpler, specialized analysis methods. Such an application, and especially the inadequate anonymization that precedes it, may then be illegal under data protection law. For example, researchers were able to identify user profiles in a social network based on just a few questions. If movement data is only pseudonymised, a simple database query (technically not data mining) is often enough to identify a user as soon as their place of residence and workplace are known: most people can be identified by the two or three places where they spend the most time.
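The movement-data case can be illustrated with a minimal sketch. It assumes a hypothetical log of `(pseudonym, location cell)` pairs; the record values and the function names `top_places` and `candidates` are made up for illustration and do not come from any real system. It shows that a plain frequency count and lookup, with no data mining involved, can narrow pseudonymised records down to one person once their home and workplace are known.

```python
from collections import Counter, defaultdict

# Hypothetical pseudonymised movement log: (pseudonym, location cell).
# All values are invented for illustration.
records = [
    ("u1", "cell_A"), ("u1", "cell_A"), ("u1", "cell_A"),
    ("u1", "cell_B"), ("u1", "cell_B"), ("u1", "cell_C"),
    ("u2", "cell_D"), ("u2", "cell_D"), ("u2", "cell_D"),
    ("u2", "cell_B"), ("u2", "cell_B"), ("u2", "cell_E"),
    ("u3", "cell_A"), ("u3", "cell_A"), ("u3", "cell_A"),
    ("u3", "cell_F"), ("u3", "cell_F"),
]


def top_places(records, k=2):
    """Return, for each pseudonym, the set of its k most frequent places."""
    counts = defaultdict(Counter)
    for pseudonym, place in records:
        counts[pseudonym][place] += 1
    return {
        pseudonym: frozenset(place for place, _ in counter.most_common(k))
        for pseudonym, counter in counts.items()
    }


def candidates(records, known_places):
    """List pseudonyms whose most frequent places equal the known ones."""
    known = frozenset(known_places)
    tops = top_places(records, k=len(known))
    return sorted(p for p, places in tops.items() if places == known)


# Someone who knows a person's home (cell_A) and workplace (cell_B):
print(candidates(records, ["cell_A", "cell_B"]))
```

In this toy data only one pseudonym matches the known pair of places, which is why pseudonymisation alone offers little protection for location traces.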
Data protection law generally speaks of the “collection, processing or use” of personal data, because this problem arises not only with data mining but also with other analysis methods (e.g. statistics). Reliable protection against abusive analysis is only possible if the corresponding data is not collected and stored in the first place.
The application of data mining techniques to personally identifiable data also raises moral questions, for example whether a computer program should divide people into “classes”. In addition, many of the methods are suitable for surveillance and advanced dragnet searches. Any credit score, for instance, divides people into the classes “creditworthy” and “not creditworthy” on the basis of statistics, and perhaps also data mining, and is criticized accordingly.
Data mining methods themselves work in a value-neutral manner and only calculate probabilities without knowing the meaning of these probabilities. However, when people are confronted with the results of these calculations, they may react with surprise, offence or alienation. Therefore, it is important to consider whether and how to confront someone with such results.
Google gives its users insight into the target groups it has identified for them, provided they have not opted out, and these assignments are often wrong. An American department store chain, by contrast, can tell from shopping behaviour whether a customer is pregnant. With this information, shopping vouchers can be sent in a targeted manner. It is even possible to predict the expected date of birth.