The Customize Windows

Technology Journal

By Abhishek Ghosh September 23, 2021 8:40 am Updated on September 23, 2021

What is Data Lineage?

Data lineage, or data origin, refers to the problem in a data warehouse system of determining, for given aggregated data records, the original data records from which they were created. Data lineage comprises methods and tools that make the life cycle of data traceable and answer the questions of who, when, where, why and how. It is a discipline within metadata management and is often also a function of data catalogs. Data lineage capabilities allow users to understand the context of the data they use for decision-making and other business purposes.

Typically, in a data warehouse system, data is extracted from various sources, transformed according to specific rules, and made available for analysis (see ETL process). Data lineage has to describe the reverse path, leading from the analysis results back to the sources. For this purpose, the transformations are modeled mathematically so that, for given output values of a transformation, the associated input values can be determined (following the IPO principle: input, processing, output). Databases are the first choice when it comes to retaining, updating, querying, deleting and presenting data. Developers depend on data consistency so that APIs can perform the right transactions and applications can access the right data. Data scientists who develop machine learning models or create data visualizations also rely on this data.

All processing steps are modeled as transformations $T$ that produce an output $A$ from an input $E$: $T(E) = A$. The lineage $T'$ of a data record $a \in A$ of the output is defined as the subset $E' \subseteq E$ of the input that was involved in producing $a$: $E' = T'(a, E)$. The lineage of a set of output records is the union of the lineages of its elements: $T'(A', E) = \bigcup_{a \in A'} T'(a, E)$ for $A' \subseteq A$.
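The definitions above can be sketched in Python. This is a minimal illustration, not a standard API: records are assumed to be distinct hashable values, and the transformation `cities` (distinct cities in hypothetical `(customer, city)` records) and its lineage function `cities_lineage` are made up for the example. `lineage_of_set` implements the union rule for a set of output records.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

Record = Hashable
# A lineage function T'(a, E) returns the subset of E involved in producing output a.
LineageFn = Callable[[Record, FrozenSet[Record]], FrozenSet[Record]]


def lineage_of_set(trace: LineageFn, outputs: Iterable[Record],
                   E: FrozenSet[Record]) -> FrozenSet[Record]:
    """Lineage of a set of output records: the union of the lineages of its elements."""
    result: set = set()
    for a in outputs:
        result |= trace(a, E)
    return frozenset(result)


# Hypothetical transformation T: the distinct cities in (customer, city) records.
def cities(E: FrozenSet[Record]) -> FrozenSet[Record]:
    return frozenset(city for _customer, city in E)


# Its lineage function T'(a, E): every input record that mentions city a.
def cities_lineage(a: Record, E: FrozenSet[Record]) -> FrozenSet[Record]:
    return frozenset(e for e in E if e[1] == a)


E = frozenset({("ann", "Oslo"), ("bob", "Rome"), ("cid", "Oslo")})
A = cities(E)                                      # T(E) = A
print(sorted(A))                                   # ['Oslo', 'Rome']
print(sorted(cities_lineage("Oslo", E)))           # [('ann', 'Oslo'), ('cid', 'Oslo')]
print(lineage_of_set(cities_lineage, A, E) == E)   # True
```

Because every output record here has a non-empty lineage, the lineage of the whole output is the whole input.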


All transformations can be divided into three classes. The transformations are assumed to be stable and deterministic, that is, they do not invent new output objects, and the same input always yields the same output.

Black box
A black box is a transformation for which no special properties can be specified. Each element of the output can depend on any element of the input. An example of a black box is a function that computes, for each number in a set, its deviation from the mean of the set.
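A short sketch of the deviation-from-mean example, assuming the input is a plain list of floats (the function name `deviation_from_mean` is made up). Changing a single input value shifts the mean and therefore every output value, which is why, without further knowledge, any input element may belong to the lineage of any output element.

```python
from statistics import mean


def deviation_from_mean(values: list[float]) -> list[float]:
    """Black-box example: each output depends on all inputs through the mean."""
    m = mean(values)
    return [v - m for v in values]


print(deviation_from_mean([1.0, 2.0, 6.0]))  # [-2.0, -1.0, 3.0]  (mean 3.0)
print(deviation_from_mean([1.0, 2.0, 9.0]))  # [-3.0, -2.0, 5.0]  (only the last input changed)
```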

Dispatcher
A dispatcher is a transformation that processes the elements of the input independently of one another. Each input element can generate any number of output elements (even zero). The lineage of an output element $a$ of a dispatcher consists of all input elements $e$ that were involved in producing $a$, i.e. all $e \in E$ with $a \in T(\{e\})$.
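A minimal dispatcher sketch, assuming records are comma-separated tag strings (the transformation `tags` and the helper `dispatcher_lineage` are hypothetical names). Each record is processed on its own, so the output of the whole input equals the union of the outputs of single records, and the lineage of an output follows directly from the definition: all $e$ with $a \in T(\{e\})$.

```python
from typing import Callable, Set


def tags(E: Set[str]) -> Set[str]:
    """Hypothetical dispatcher: each record yields its non-empty tags (possibly none)."""
    out: Set[str] = set()
    for record in E:  # every record is handled independently
        out |= {t for t in record.split(",") if t}
    return out


def dispatcher_lineage(T: Callable[[Set[str]], Set[str]], a: str, E: Set[str]) -> Set[str]:
    """Lineage of output a: all inputs e for which a is in T({e})."""
    return {e for e in E if a in T({e})}


E = {"red,blue", "blue", ""}  # the empty record produces zero outputs
print(sorted(tags(E)))                                # ['blue', 'red']
print(tags(E) == set().union(*(tags({e}) for e in E)))  # True
print(sorted(dispatcher_lineage(tags, "blue", E)))    # ['blue', 'red,blue']
```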

Aggregator
An aggregator is a transformation in which each input element contributes to exactly one output element: the input can be divided into disjoint partitions such that each partition produces exactly one output element. Each element of the output can thus be unambiguously assigned to a group of input elements. A special case is the key-preserving aggregator, in which input elements produce the same output element only if they share the same key attribute, and that key also appears in the output element.
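As an illustration, the sketch below treats a hypothetical "total sales per region" computation as a key-preserving aggregator. It assumes records are distinct `(region, amount)` tuples (identical duplicates would collapse in a Python set). The region acts as the key: it determines the disjoint partitions, and it reappears in each `(region, total)` output, so the lineage of an output is simply the set of input records with the same key.

```python
from collections import defaultdict

Sale = tuple[str, int]  # hypothetical (region, amount) record


def partition_by_region(E: set[Sale]) -> dict[str, set[Sale]]:
    """Split the input into disjoint partitions, one per region (the key)."""
    parts: dict[str, set[Sale]] = defaultdict(set)
    for sale in E:
        parts[sale[0]].add(sale)
    return dict(parts)


def total_per_region(E: set[Sale]) -> set[Sale]:
    """Key-preserving aggregator: each partition yields exactly one (region, total) output."""
    return {(region, sum(amount for _, amount in part))
            for region, part in partition_by_region(E).items()}


def key_lineage(a: Sale, E: set[Sale]) -> set[Sale]:
    """Lineage of an output record: the input records carrying the same key."""
    return {e for e in E if e[0] == a[0]}


E = {("north", 10), ("north", 5), ("south", 7)}
print(sorted(total_per_region(E)))            # [('north', 15), ('south', 7)]
print(sorted(key_lineage(("north", 15), E)))  # [('north', 5), ('north', 10)]
```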

Another class of aggregators are context-free aggregators, in which the assignment of an input element to a particular partition does not depend on the values of the other input elements.
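The difference can be seen with two hypothetical ways of assigning numbers to partitions. Grouping by parity is context-free, because the group of an element depends only on the element itself. Grouping into "at or above the mean" versus "below the mean" is not, because adding another value to the input can move an element to a different group. Both function names are made up for this sketch.

```python
from statistics import mean


def parity_group(e: int, E: list[int]) -> int:
    """Context-free: the partition of e depends only on e itself."""
    return e % 2


def above_mean_group(e: int, E: list[int]) -> bool:
    """Not context-free: the partition of e depends on the other values via the mean."""
    return e >= mean(E)


small, large = [1, 2, 3], [1, 2, 3, 10]
print(parity_group(2, small), parity_group(2, large))          # 0 0
print(above_mean_group(2, small), above_mean_group(2, large))  # True False
```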

A transformation that maps each input element to itself (identity) or subjects each input element to a simple calculation (e.g. a format conversion) is both a dispatcher and an aggregator; such a transformation is also called a filter.
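A format conversion makes a simple filter example. The sketch assumes dates arrive as ISO strings (`YYYY-MM-DD`) and are rewritten as `DD.MM.YYYY`; the names `to_dmy` and `filter_lineage` are hypothetical. Every input element yields exactly one output element, so the transformation behaves like a dispatcher, and each output belongs to a one-element partition, so it also behaves like an aggregator.

```python
def to_dmy(E: set[str]) -> set[str]:
    """Hypothetical filter: reformat ISO dates as DD.MM.YYYY, one output per input."""
    return {f"{d[8:10]}.{d[5:7]}.{d[0:4]}" for d in E}


def filter_lineage(T, a: str, E: set[str]) -> set[str]:
    """Lineage of output a: the input element(s) whose own conversion is exactly a."""
    return {e for e in E if T({e}) == {a}}


E = {"2021-09-23", "2022-07-01"}
print(sorted(to_dmy(E)))                         # ['01.07.2022', '23.09.2021']
print(all(len(to_dmy({e})) == 1 for e in E))     # True: one output per input
print(filter_lineage(to_dmy, "23.09.2021", E))   # {'2021-09-23'}
```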

Data Lineage Calculation

The data lineage of a given output can be determined with a tracing procedure if the class of the transformation is known.

  • For dispatchers, each input element is checked individually to see whether it generates the output; if it does, it is added to the data lineage.
  • For context-free aggregators, the partitions are formed first, and then the one that produces the output is selected. The partitions are built by adding the input elements one at a time to an existing partition, provided that applying the transformation to the enlarged partition still yields exactly one output element; otherwise the element starts a new partition.
  • For key-preserving aggregators, the lineage consists of the input elements whose key matches the key of the output element.
  • For filters, the data lineage corresponds to the output element itself.
  • For general aggregators or black boxes, tracing is too expensive, since the power set of the input elements would have to be formed. To determine the data lineage of such a transformation efficiently, either an explicit tracing procedure must be known or an inverse function must be used. An inverse function can serve as a tracing procedure only for aggregators, since the inverse of a transformation is not necessarily unique.
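The partition-building procedure for context-free aggregators can be sketched as follows. The aggregator `count_by_initial` (number of words per initial letter) is a hypothetical example chosen because grouping by first letter is context-free. An element joins the first existing partition for which the transformation still produces a single output; otherwise it opens a new partition. The partition whose output is exactly the traced record is its lineage.

```python
from collections import Counter
from typing import Callable, List, Set

Transformation = Callable[[Set[str]], Set[tuple]]


def count_by_initial(E: Set[str]) -> Set[tuple]:
    """Hypothetical context-free aggregator: number of words per initial letter."""
    return set(Counter(w[0] for w in E).items())


def context_free_partitions(T: Transformation, E: Set[str]) -> List[Set[str]]:
    """Add input elements one by one; an element joins an existing partition
    if T applied to the enlarged partition still yields exactly one output."""
    partitions: List[Set[str]] = []
    for e in E:
        for part in partitions:
            if len(T(part | {e})) == 1:
                part.add(e)
                break
        else:  # no existing partition fits: start a new one
            partitions.append({e})
    return partitions


def trace_context_free(T: Transformation, a: tuple, E: Set[str]) -> Set[str]:
    """Select the partition whose output is exactly {a}; empty set if none matches."""
    for part in context_free_partitions(T, E):
        if T(part) == {a}:
            return part
    return set()


E = {"apple", "avocado", "banana", "blueberry", "cherry"}
print(sorted(trace_context_free(count_by_initial, ("b", 2), E)))  # ['banana', 'blueberry']
```

The pairwise test `len(T(part | {e})) == 1` is only valid because the aggregator is context-free; for a context-dependent aggregator, the partition of an element could change as more elements are added.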

In order to determine the data lineage for an entire chain of transformations without having to store all intermediate results, the transformations are normalized: some of them are combined without losing their special properties (aggregator, dispatcher, filter…), so that efficient tracing remains possible. The optimal order for tracing a series of consecutive transformations also depends on the cost model.
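The sketch below shows only the basic idea of tracing backwards through a two-step chain; it does not implement the normalization step described above. It assumes a hypothetical pipeline in which a filter `parse` turns `"region:amount"` strings into tuples and a key-preserving aggregator `total` sums the amounts per region. The last transformation is traced first, and the intermediate result is recomputed on demand rather than stored.

```python
def parse(E: set[str]) -> set[tuple[str, int]]:
    """Stage 1 (filter): 'region:amount' strings -> (region, amount) tuples."""
    out: set[tuple[str, int]] = set()
    for record in E:
        region, amount = record.split(":")
        out.add((region, int(amount)))
    return out


def total(E: set[tuple[str, int]]) -> set[tuple[str, int]]:
    """Stage 2 (key-preserving aggregator): sum of amounts per region."""
    totals: dict[str, int] = {}
    for region, amount in E:
        totals[region] = totals.get(region, 0) + amount
    return set(totals.items())


def trace_parse(b: tuple[str, int], E: set[str]) -> set[str]:
    return {e for e in E if parse({e}) == {b}}


def trace_total(a: tuple[str, int], intermediate: set[tuple[str, int]]) -> set[tuple[str, int]]:
    return {b for b in intermediate if b[0] == a[0]}


def trace_chain(a: tuple[str, int], E: set[str]) -> set[str]:
    intermediate = parse(E)              # recomputed on demand instead of being stored
    step = trace_total(a, intermediate)  # trace the last transformation first
    return set().union(*(trace_parse(b, E) for b in step))


E = {"north:10", "north:5", "south:7"}
print(sorted(total(parse(E))))                 # [('north', 15), ('south', 7)]
print(sorted(trace_chain(("north", 15), E)))   # ['north:10', 'north:5']
```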


About Abhishek Ghosh

Abhishek Ghosh is a Businessman, Surgeon, Author and Blogger. You can keep touch with him on Twitter - @AbhishekCTRL.

About This Article

Cite this article as: Abhishek Ghosh, "What is Data Lineage?," in The Customize Windows, September 23, 2021, July 1, 2022, https://thecustomizewindows.com/2021/09/what-is-data-lineage/.

Source: The Customize Windows, JiMA.in
