Extract, Transform and Load (ETL) is the process that allows organizations to move data from multiple sources, reformat and clean it, and load it into another database, data mart, or data warehouse for analysis, or into another operational system to support a business process. ETL processes can also be used to integrate with legacy systems. ETL became a popular concept in the 1970s.
How Does the ETL Process Work?
The first part of the ETL process is extracting the data from the different source systems. Most data warehousing projects combine data from several source systems, and each system may organize its data differently or use a different format. The extraction process converts the data into a format ready for the transformation phase. An intrinsic part of extraction is validating the extracted data: a check verifies whether the data matches the expected pattern or structure, and data that does not is rejected. An important requirement of the extraction task is that it have minimal impact on the source system. If a large volume of data is extracted at once, the source system could slow down or even crash, disrupting its normal everyday use. For this reason, in large systems extraction operations are usually scheduled at times or on days when the impact is zero or minimal.
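A minimal sketch of the extraction step with a structural check, assuming the source system exports CSV text with a header row. The column names `customer_id`, `name` and `amount`, and the functions `matches_structure` and `extract`, are hypothetical; a real pipeline would read from the source system's own interface, ideally during a low-load window.

```python
import csv
import io

# Hypothetical structure that one source system's export is expected to follow.
EXPECTED_COLUMNS = ["customer_id", "name", "amount"]


def matches_structure(row):
    """Return True if a parsed CSV row fits the expected pattern."""
    # csv.DictReader puts surplus fields under the key None and fills
    # missing fields with None, so both cases are caught here.
    if None in row or any(row[col] in (None, "") for col in EXPECTED_COLUMNS):
        return False
    try:
        float(row["amount"])  # amount must be numeric
    except ValueError:
        return False
    return True


def extract(source_text):
    """Split exported CSV text into accepted records (converted to a
    uniform format for the transformation phase) and rejected rows."""
    reader = csv.DictReader(io.StringIO(source_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    accepted, rejected = [], []
    for row in reader:
        if matches_structure(row):
            accepted.append({
                "customer_id": row["customer_id"],
                "name": row["name"].strip(),
                "amount": float(row["amount"]),
            })
        else:
            rejected.append(row)
    return accepted, rejected
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to report them back to the owners of the source system.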
The transformation phase is applied to the extracted data to convert it into the data that will be loaded. Some data sources require very little manipulation. In other cases, more substantial transformations may be necessary, for example merging data from multiple sources, calculating totals from multiple rows of data, splitting one column into several, or rejecting a complete record when it contains invalid data.
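The transformations listed above can be sketched in Python as follows. The two inputs stand in for two hypothetical source systems (a customer list and billing rows), the field names are invented for illustration, and treating a negative amount as "invalid data" is purely an assumption.

```python
from collections import defaultdict


def transform(crm_rows, billing_rows):
    """crm_rows: dicts with customer_id and full_name (hypothetical source A).
    billing_rows: dicts with customer_id and amount (hypothetical source B)."""
    totals = defaultdict(float)
    rejected = []
    for row in billing_rows:
        # Reject the complete record when it holds invalid data
        # (assumed rule: amounts may not be negative).
        if row["amount"] < 0:
            rejected.append(row)
            continue
        # Calculate totals from multiple rows: sum the amounts per customer.
        totals[row["customer_id"]] += row["amount"]

    output = []
    for customer in crm_rows:
        # Split one column into several: full_name -> first_name, last_name.
        first, _, last = customer["full_name"].partition(" ")
        # Merge data from both sources on the shared customer_id key.
        output.append({
            "customer_id": customer["customer_id"],
            "first_name": first,
            "last_name": last,
            "total_amount": totals.get(customer["customer_id"], 0.0),
        })
    return output, rejected
```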
The loading phase is the moment at which the data from the transformation phase is loaded into the destination system. In some databases, old information is simply overwritten with new data. Data warehouses, by contrast, often keep a history of the records so that they can be audited and the entire history of a value can be traced over time. A rolling process is applied in cases where the decision is made to maintain several levels of granularity.
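The difference between overwriting and keeping history can be sketched with Python's built-in `sqlite3` module standing in for the destination system. The table and function names are hypothetical, and appending one row per load with a load date is only one simple way of keeping history; real warehouses often use more elaborate schemes.

```python
import sqlite3


def create_tables(conn):
    # One table holds only the latest value, the other keeps every version.
    conn.execute("CREATE TABLE customer_current (customer_id TEXT PRIMARY KEY, total_amount REAL)")
    conn.execute("CREATE TABLE customer_history (customer_id TEXT, total_amount REAL, load_date TEXT)")


def load_overwrite(conn, rows):
    # Old values are replaced by the new ones; no history is kept.
    conn.executemany(
        "INSERT OR REPLACE INTO customer_current (customer_id, total_amount) VALUES (?, ?)",
        rows,
    )


def load_history(conn, rows, load_date):
    # Every load appends a new version, so past values remain auditable.
    conn.executemany(
        "INSERT INTO customer_history (customer_id, total_amount, load_date) VALUES (?, ?, ?)",
        [(cid, amount, load_date) for cid, amount in rows],
    )
```

Running both loaders on the same two daily batches leaves a single current value in `customer_current` and both versions in `customer_history`.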
Other Points on the ETL Process
Modern ETL software supports parallel processing, which has enabled a number of methods for improving the overall performance of ETL processes on large volumes of data. Data virtualization can be used to advance ETL processing further. Extract, load, transform (ELT) is a variant of ETL in which the data is loaded into the target system first and transformed there, rather than before loading.
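To illustrate how ELT differs from ETL, here is a minimal sketch, again using Python's `sqlite3` as a stand-in for the target system: the raw rows are loaded first, unchanged, and the transformation (a total per customer) runs afterwards inside the target database as SQL. The function `elt` and the table names are hypothetical.

```python
import sqlite3


def elt(raw_rows, conn):
    # Load: the extracted rows land in a staging table exactly as received.
    conn.execute("CREATE TABLE raw_sales (customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_sales (customer_id, amount) VALUES (?, ?)", raw_rows)
    # Transform: done afterwards, inside the target database, with SQL.
    conn.execute(
        "CREATE TABLE sales_totals AS "
        "SELECT customer_id, SUM(amount) AS total_amount "
        "FROM raw_sales GROUP BY customer_id"
    )
    conn.commit()
```

A side effect of this design is that the raw data stays available in the target, so a different transformation can be rerun later without extracting again.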