What is Data Fusion?

Abhishek Ghosh

By Abhishek Ghosh February 5, 2021 6:22 pm Updated on February 5, 2021

Data fusion is the process of merging incomplete data sets and filling in their gaps. It is an important part of information integration. Data in a recipient record is supplemented with the help of a donor record. In the usual notation of statistical matching, the donor data set consists of the variables (X, Y) and the recipient data set of the variables (X, Z), so the variables X are present in both data sets. Based on the donor data set, a model for calculating the values of Y from the variables X is created. This model is applied to the recipient data set to create a new, merged data set (X, Y, Z). The statistical methods used are summarized under the term statistical matching and are partly related to the methods for imputing missing values.
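The donor and recipient idea can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: it assumes a single shared numeric variable (here called x), a donor-only variable (y) and a recipient-only variable (z), and it uses a simple least-squares line as the model. The field names, the sample values and the helper names fit_line and fuse are all made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fuse(donor, recipient):
    """Fit the model on the donor data, then add a predicted 'y' to each recipient record."""
    intercept, slope = fit_line([d["x"] for d in donor], [d["y"] for d in donor])
    return [{**r, "y": intercept + slope * r["x"]} for r in recipient]


# Hypothetical data: x is shared, y exists only in the donor, z only in the recipient.
donor = [{"x": 20, "y": 30.0}, {"x": 30, "y": 40.0}, {"x": 40, "y": 50.0}]
recipient = [{"x": 25, "z": "north"}, {"x": 35, "z": "south"}]

fused = fuse(donor, recipient)
for record in fused:
    print(record)  # each record now carries x, z and a modelled y
```

The fused records contain all three variables, but the y values are estimates from the model, not observations, which is why the choice of model matters in practice.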

While duplicate detection deals with largely complete records that show only small discrepancies, data fusion has to combine several partially incomplete data sets. Before data from two sources can be fused, the sources may need to be brought to a common schema (schema integration). Attributes that do not exist in a source are populated with NULL (for “no value”). As a rule, a common identifying attribute is also necessary; it may have been determined previously, for example by duplicate detection. A simple method of data fusion is to merge a record into another one if it has more missing attributes and matches the other record in all attributes it does contain (MINIMUM UNION). The record with more missing attributes is subsumed by the more complete record.
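The schema integration and subsumption steps described above might look roughly like this in Python. It is a sketch under stated assumptions: records are plain dictionaries, None stands in for NULL, and the attribute names, sample records and helper names (to_common_schema, subsumes, minimum_union) are invented for illustration.

```python
def to_common_schema(record, schema):
    """Project a record onto the common schema; absent attributes become None (NULL)."""
    return {attr: record.get(attr) for attr in schema}


def subsumes(a, b):
    """True if a agrees with b on every value b has and a is strictly more complete."""
    agrees = all(a[k] == v for k, v in b.items() if v is not None)
    filled_a = sum(v is not None for v in a.values())
    filled_b = sum(v is not None for v in b.values())
    return agrees and filled_a > filled_b


def minimum_union(records):
    """Keep only the records that are not subsumed by another record."""
    return [r for r in records
            if not any(subsumes(other, r) for other in records if other is not r)]


# Hypothetical sources with different attributes.
source_a = [{"id": 1, "name": "Ada", "city": "London"}]
source_b = [{"id": 1, "name": "Ada"}, {"id": 2, "email": "bob@example.com"}]
schema = ["id", "name", "city", "email"]

combined = [to_common_schema(r, schema) for r in source_a + source_b]
fused = minimum_union(combined)
for record in fused:
    print(record)
```

Here the sparse record for id 1 from the second source disappears because the first source already contains everything it says. Exact duplicates are not removed by this check, since neither is strictly more complete than the other.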

If related records not only lack individual attribute values but also contain values that differ from each other, this is referred to as a data conflict. Data conflicts can be caused, for example, by typos, different spellings and encodings, errors in calculations or automatic text recognition, and outdated data. To resolve data conflicts by aggregation, preferences or other conflict resolution functions (for example, the average of differing numbers) must be specified. The records are first grouped into sets of duplicates (see duplicate detection) and then aggregated within each group.
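Grouping followed by aggregation can be sketched as follows. This is an illustrative example only: it assumes duplicates have already been given the same identifier, uses the average for a numeric attribute as in the example above, and picks the longest string for a text attribute purely as a stand-in preference rule. All names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def resolve(values, strategy):
    """Apply a conflict resolution function to the non-NULL values of one attribute."""
    present = [v for v in values if v is not None]
    return strategy(present) if present else None


def fuse_groups(records, key, strategies):
    """Group records by the identifying attribute, then aggregate each attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    fused = []
    for ident, group in groups.items():
        merged = {key: ident}
        for attr, strategy in strategies.items():
            merged[attr] = resolve([r.get(attr) for r in group], strategy)
        fused.append(merged)
    return fused


# Hypothetical records; the two rows with id 1 conflict in price and name.
records = [
    {"id": 1, "price": 10.0, "name": "Widget"},
    {"id": 1, "price": 12.0, "name": "Widget Pro"},
    {"id": 2, "price": 5.0, "name": None},
]
strategies = {
    "price": mean,                          # average of differing numbers
    "name": lambda vs: max(vs, key=len),    # assumed preference: longest spelling
}

fused = fuse_groups(records, "id", strategies)
for record in fused:
    print(record)
```

Each attribute gets its own resolution function, so a different rule (most recent value, most frequent value, a trusted source) can be swapped in without touching the grouping logic.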
