
The Customize Windows

Technology Journal


By Abhishek Ghosh September 23, 2021 8:40 am Updated on September 23, 2021

What is Data Lineage?


Data lineage, or data origin, refers to the problem in a data warehouse system of determining, for given aggregated data records, the original data records from which they were created. Data lineage includes methods and tools that make the life cycle of data traceable and answer the questions of who, when, where, why and how. It is a discipline within metadata management and is often also a function of data catalogs. Data lineage capabilities allow users to understand the context of the data they use for decision-making and other business purposes.

Typically, in a data warehouse system, data is extracted from various sources, transformed according to specific rules, and made available for analysis (see ETL process). Data lineage describes the reverse path, leading from the analysis results back to the sources. For this purpose, the transformations are modeled mathematically so that the associated input values can be determined for given output values of a transformation (following the input-process-output model, known in German as the EVA principle, from Eingabe, Verarbeitung, Ausgabe). Databases are the first choice for retaining, updating, querying, deleting and presenting data. Developers depend on data consistency so that APIs perform the right transactions and applications access the right data. Data scientists who develop machine learning models or create data visualizations also rely on that data.

Each processing step is modeled as a transformation $T$ that produces an output $A$ from an input $E$: $T(E) = A$. The lineage $T'$ of a data record $a \in A$ of the output is defined as the subset $E' \subseteq E$ of the input that was involved in constructing $a$: $E' = T'(a, E)$. The lineage of a set of output records $A^* \subseteq A$ is composed of the lineages of its elements: $T'(A^*, E) = \bigcup_{a \in A^*} T'(a, E)$.
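
As a minimal sketch of these definitions, assuming records are hashable Python values and that a tracing procedure for single records is already available, the lineage of a set of output records is just the union of the per-record lineages. The even-number transformation and the names `select_even`, `trace_selection` and `lineage_of_set` are hypothetical illustrations, not taken from the article.

```python
from typing import Callable, Hashable, Iterable, Set

# A tracing procedure T'(a, E): given an output record a and the input E,
# return the subset E' of E that was involved in producing a.
Trace = Callable[[Hashable, Set[Hashable]], Set[Hashable]]


def lineage_of_set(trace: Trace, outputs: Iterable[Hashable], inputs: Set[Hashable]) -> Set[Hashable]:
    """Lineage of a set of output records: the union of the lineages of its elements."""
    result: Set[Hashable] = set()
    for a in outputs:
        result |= trace(a, inputs)
    return result


# Hypothetical transformation T: keep the even numbers of the input.
def select_even(inputs: Set[int]) -> Set[int]:
    return {e for e in inputs if e % 2 == 0}


# For this T, each output record a was produced from the input record equal to a.
def trace_selection(a: int, inputs: Set[int]) -> Set[int]:
    return {e for e in inputs if e == a}


E = {1, 2, 3, 4, 5, 6}
A = select_even(E)  # T(E) = A
print(lineage_of_set(trace_selection, {2, 6}, E))
```

An empty set of output records correctly gets an empty lineage, because the loop never runs.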


All transformations can be divided into three classes. It is assumed that the transformations are stable and deterministic: they do not invent new output elements, and the same input always produces the same output.

Black Box
A black box is a transformation for which no special properties can be specified. Each element of the output may depend on any element of the input. An example of a black box is a function that returns, for each number in a set, its deviation from the mean of the set.
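
The article's black-box example can be sketched in a few lines to show why every output depends on every input: changing only the last input changes the first output, because the mean changes. Since nothing more is known about such a transformation, a conservative lineage for any output is the entire input. Inputs are identified by position here because a list of numbers may contain duplicates; `deviation_from_mean` and `trace_black_box` are hypothetical names.

```python
from statistics import mean
from typing import List, Set


def deviation_from_mean(values: List[float]) -> List[float]:
    """Black-box example from the text: output i is values[i] - mean(values)."""
    m = mean(values)
    return [x - m for x in values]


def trace_black_box(output_index: int, values: List[float]) -> Set[int]:
    """With no special properties to exploit, any input may have influenced
    any output, so the conservative lineage is the whole input (as positions)."""
    return set(range(len(values)))


E = [2.0, 4.0, 9.0]
A = deviation_from_mean(E)                         # mean is 5.0
A_changed = deviation_from_mean([2.0, 4.0, 12.0])  # mean is 6.0
# Changing only the last input also changes the first output.
print(A[0], A_changed[0])
print(trace_black_box(0, E))
```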

Dispatcher
A dispatcher is a transformation that processes the elements of the input independently. Each input element can generate any number of output elements (even zero). The lineage of an output element $a$ of a dispatcher consists of all input elements $e$ that were involved in producing $a$, that is, all $e$ with $a \in T(\{e\})$.
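
Because a dispatcher handles each element on its own, its lineage rule can be checked directly: apply the transformation to each input element separately and keep the elements whose output contains $a$. The sketch below assumes a set-valued transformation; the word-splitting dispatcher and the names `split_words` and `trace_dispatcher` are hypothetical.

```python
from typing import Callable, Set


def split_words(inputs: Set[str]) -> Set[str]:
    """Hypothetical dispatcher: each sentence is split into words independently.
    An empty sentence produces zero output elements."""
    return {word for sentence in inputs for word in sentence.split()}


def trace_dispatcher(T: Callable[[Set[str]], Set[str]], a: str, inputs: Set[str]) -> Set[str]:
    """Lineage of output a under a dispatcher T: every input e with a in T({e})."""
    return {e for e in inputs if a in T({e})}


E = {"red apple", "green apple", "red car", ""}
print(sorted(split_words(E)))                             # ['apple', 'car', 'green', 'red']
print(sorted(trace_dispatcher(split_words, "apple", E)))  # ['green apple', 'red apple']
```

The tracing cost is one call of $T$ per input element, which stays cheap as long as $T$ on a single element is cheap.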

Aggregator
An aggregator is a transformation in which each input element contributes to at least one output element and the input can be divided into disjoint partitions such that each partition is responsible for exactly one output element. Each output element can thus be clearly assigned to a group of input elements. A special case is the key-preserving aggregator, in which input elements produce the same output element only if they share the same value of a key attribute, and that key value also appears in the output element.
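
A group-by sum is a typical key-preserving aggregator. In the sketch below, assuming a hypothetical sales schema `(sale_id, region, amount)` with `region` as the key attribute, each region's sales form one partition and produce one output element that carries the region; tracing an output then only requires comparing keys.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical schema: (sale_id, region, amount); the key attribute is region.
Sale = Tuple[str, str, int]
Total = Tuple[str, int]  # (region, total amount)


def total_by_region(sales: Set[Sale]) -> Set[Total]:
    """Key-preserving aggregator: one output per region, and the region key
    of each output is the key shared by all of its input elements."""
    totals: Dict[str, int] = defaultdict(int)
    for _sale_id, region, amount in sales:
        totals[region] += amount
    return set(totals.items())


def trace_by_key(a: Total, sales: Set[Sale]) -> Set[Sale]:
    """Lineage of an output: the input elements whose key matches the output's key."""
    region, _total = a
    return {s for s in sales if s[1] == region}


E = {("s1", "north", 10), ("s2", "south", 5), ("s3", "north", 7)}
A = total_by_region(E)
print(sorted(trace_by_key(("north", 17), E)))
```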

Another class of aggregators is the context-free aggregator, in which the assignment of an input element to a particular partition is independent of the values of the other input elements.

A transformation that maps each input element to itself (identity) or subjects each input element to a simple calculation (e.g. format conversion) is both a dispatcher and an aggregator and is also referred to as a filter.
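
The dual nature of a filter can be illustrated with a format conversion. The sketch below, assuming a hypothetical `(sensor_id, temperature)` schema, checks on one sample input that processing elements one at a time gives the same result as processing them together (dispatcher behavior) and that each element maps to its own single output (aggregator behavior with one-element partitions). These checks only test a given input; they do not prove the property in general. All function names are hypothetical.

```python
from typing import Callable, Set, Tuple

# Hypothetical schema: (sensor_id, temperature).
Reading = Tuple[str, float]
Transform = Callable[[Set[Reading]], Set[Reading]]


def to_fahrenheit(readings: Set[Reading]) -> Set[Reading]:
    """Filter-style format conversion: each reading is converted on its own, one to one."""
    return {(sid, c * 9 / 5 + 32) for sid, c in readings}


def acts_as_dispatcher(T: Transform, inputs: Set[Reading]) -> bool:
    """On this input, processing elements separately equals processing them together."""
    pieces: Set[Reading] = set()
    for e in inputs:
        pieces |= T({e})
    return pieces == T(inputs)


def acts_as_singleton_aggregator(T: Transform, inputs: Set[Reading]) -> bool:
    """On this input, every element yields exactly one output, distinct elements
    yield distinct outputs, and together they make up the full output."""
    images = [T({e}) for e in inputs]
    outputs = set().union(*images)
    return (all(len(img) == 1 for img in images)
            and len(outputs) == len(inputs)
            and outputs == T(inputs))


def trace_filter(T: Transform, a: Reading, inputs: Set[Reading]) -> Set[Reading]:
    """Lineage of output a under a filter: the single input element that maps to a."""
    return {e for e in inputs if T({e}) == {a}}


E = {("s1", 20.0), ("s2", 0.0)}
print(sorted(to_fahrenheit(E)))
print(trace_filter(to_fahrenheit, ("s1", 68.0), E))
```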

Data Lineage Calculation

 

If the properties of a transformation are known, the data lineage of a given output can be determined with a tracing procedure:

  • For dispatchers, each input element is checked individually to see whether it generates the output element; if it does, it is added to the data lineage.
  • For context-free aggregators, the partitions are formed first, and then the partition that leads to the output element is selected. The partitions are built by adding the input elements one at a time: an element joins an existing partition if applying the transformation to the enlarged partition still yields exactly one output element; otherwise it starts a new partition.
  • For key-preserving aggregators, the keys of the input elements are compared with the key of the output element.
  • For filters, the data lineage of an output element is the single input element it was produced from.
  • For general aggregators and black boxes, tracing is too expensive, since it would require forming power sets of the input elements. To determine the data lineage of such a transformation efficiently, either an explicit tracing procedure must be known or an inverse function must be used. An inverse function can serve as a tracing procedure only for aggregators, because for other transformations it is not necessarily unique.
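
The partition-building procedure for context-free aggregators can be sketched as follows. The example transformation `bucket_sums` (summing numbers per bucket of ten) and the function names are hypothetical; the bucket of a number depends only on its own value, which is what makes the aggregator context-free. The sketch calls the transformation once per pair of element and existing partition, so it is only practical for small inputs or cheap transformations.

```python
from typing import Callable, Dict, List, Set, Tuple

Output = Tuple[int, int]  # (bucket, sum of the bucket's numbers)


def bucket_sums(inputs: Set[int]) -> Set[Output]:
    """Hypothetical context-free aggregator: sum numbers per bucket of ten
    (0-9, 10-19, ...). An element's bucket depends only on its own value."""
    sums: Dict[int, int] = {}
    for x in inputs:
        sums[x // 10] = sums.get(x // 10, 0) + x
    return set(sums.items())


def trace_context_free(T: Callable[[Set[int]], Set[Output]], a: Output, inputs: Set[int]) -> Set[int]:
    """Build the partitions incrementally, then return the one that produces a."""
    partitions: List[Set[int]] = []
    for e in inputs:
        for p in partitions:
            # e joins p if T applied to the enlarged partition still yields one output.
            if len(T(p | {e})) == 1:
                p.add(e)
                break
        else:
            # No existing partition accepts e, so it starts a new one.
            partitions.append({e})
    for p in partitions:
        if T(p) == {a}:
            return p
    return set()  # a is not an output of T on this input


E = {3, 7, 12, 15, 31}
print(sorted(bucket_sums(E)))
print(trace_context_free(bucket_sums, (1, 27), E))
```

For a context-free aggregator the order in which elements are visited does not change the resulting partitions, which is why the sketch can iterate over an unordered set.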

To determine the data lineage for an entire chain of transformations without storing all intermediate results, the transformations are normalized: some of them are combined without losing their special properties (aggregator, dispatcher, filter, and so on), so that efficient tracing remains possible. The optimal order for tracing a series of chained transformations also depends on the cost model.
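
For contrast, here is the straightforward approach that normalization tries to improve on: materialize every intermediate result, then apply each step's tracing procedure backward, from the last transformation to the first. This is a sketch under stated assumptions: the two transformations (a parsing dispatcher and a key-preserving aggregator), the `id,region,amount` line format and all function names are hypothetical, and no normalization or cost-based ordering is attempted.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Transform = Callable[[set], set]
Trace = Callable[[object, set], set]


def parse_lines(lines: Set[str]) -> Set[Tuple[str, str, int]]:
    """Hypothetical dispatcher: parse 'id,region,amount' lines; malformed lines yield nothing."""
    records = set()
    for line in lines:
        parts = line.split(",")
        if len(parts) == 3 and parts[2].isdigit():
            records.add((parts[0], parts[1], int(parts[2])))
    return records


def total_by_region(records: Set[Tuple[str, str, int]]) -> Set[Tuple[str, int]]:
    """Hypothetical key-preserving aggregator: sum amounts per region."""
    totals: Dict[str, int] = defaultdict(int)
    for _rid, region, amount in records:
        totals[region] += amount
    return set(totals.items())


def trace_parse(b, lines):
    """Dispatcher tracing: the lines whose parse produces record b."""
    return {line for line in lines if b in parse_lines({line})}


def trace_region(a, records):
    """Key-preserving aggregator tracing: the records with the output's region."""
    return {r for r in records if r[1] == a[0]}


def trace_chain(steps: List[Tuple[Transform, Trace]], a, source: set) -> set:
    """Trace output a of a chain of steps back to the source data.
    Naive approach: keep every intermediate result, then trace backward step by step."""
    step_inputs = [source]
    for T, _trace in steps[:-1]:
        step_inputs.append(T(step_inputs[-1]))
    current = {a}
    for (_T, trace), step_input in zip(reversed(steps), reversed(step_inputs)):
        current = set().union(*(trace(x, step_input) for x in current))
    return current


lines = {"s1,north,10", "s2,south,5", "s3,north,7", "oops"}
steps = [(parse_lines, trace_parse), (total_by_region, trace_region)]
print(sorted(trace_chain(steps, ("north", 17), lines)))
```

The list `step_inputs` is exactly the storage of intermediate results that normalizing the chain is meant to avoid.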

About This Article

Cite this article as: Abhishek Ghosh, "What is Data Lineage?," in The Customize Windows, September 23, 2021, March 25, 2023, https://thecustomizewindows.com/2021/09/what-is-data-lineage/.

Copyright © 2023 - The Customize Windows | dESIGNed by The Customize Windows
