ETL (extract, transform, load) is a process in which data from multiple structured data sources are merged into a target database. The process is known primarily for its use in operating a data warehouse, where large amounts of data from multiple operational databases need to be consolidated before they are stored in the warehouse.
ETL (Extract, transform, load) and Virtualization
As a general process of information integration, the ETL process can also be applied to other databases. The aim is to bring together heterogeneously structured data from different sources. The process must run efficiently, to minimize blocking times in the sources, and it must also safeguard data quality, so that the data can be kept complete and consistent in the data warehouse despite possible changes in the sources.
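The three steps can be illustrated with a minimal sketch in Python, using in-memory SQLite databases as stand-ins for two operational sources and a warehouse. All table names, column names and sample rows here (customers, clients, dim_customer) are assumptions made for illustration; they do not come from any particular system.

```python
import sqlite3


def extract(conn, query):
    """Read all rows with one short query to keep the source busy only briefly."""
    return conn.execute(query).fetchall()


def transform(crm_rows, shop_rows):
    """Map two different source layouts onto one layout: (source, customer_id, name)."""
    unified = []
    for cust_id, full_name in crm_rows:
        unified.append(("crm", cust_id, full_name.strip().title()))
    for cust_id, first, last in shop_rows:
        unified.append(("shop", cust_id, f"{first} {last}".strip().title()))
    return unified


def load(conn, rows):
    """Insert all rows in a single transaction so the target stays consistent."""
    with conn:
        conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)


def build_demo():
    """Create two hypothetical sources with different schemas and an empty target."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER, full_name TEXT)")
    crm.execute("INSERT INTO customers VALUES (1, ' ada lovelace ')")
    shop = sqlite3.connect(":memory:")
    shop.execute("CREATE TABLE clients (client_no INTEGER, first TEXT, last TEXT)")
    shop.execute("INSERT INTO clients VALUES (7, 'alan', 'turing')")
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE dim_customer (source TEXT, customer_id INTEGER, name TEXT)"
    )
    return crm, shop, warehouse


def run_etl():
    crm, shop, warehouse = build_demo()
    rows = transform(
        extract(crm, "SELECT id, full_name FROM customers"),
        extract(shop, "SELECT client_no, first, last FROM clients"),
    )
    load(warehouse, rows)
    return warehouse.execute(
        "SELECT source, customer_id, name FROM dim_customer ORDER BY source"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl())
```

Keeping extraction to a single read per source is one simple way to shorten the time a source is occupied; a production process would typically add validation and error handling around each step.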
Newer applications of data warehouses require data to be added more quickly. The focus of ETL is therefore increasingly on minimizing the latency until data from the source systems become available. This requires the process to be run more frequently. In general, a repository is involved at all stages; in particular, it receives the necessary data cleansing and transformation rules, as well as the schema data, as metadata and retains them over the long term.
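One possible way to picture frequent runs backed by a repository is the sketch below. It assumes a hypothetical Repository object that holds per-column cleansing rules and a high-water mark (the last processed row id), so that each run handles only rows that arrived since the previous one. The class, its fields and the row layout are illustrative assumptions, not a description of any specific ETL product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Repository:
    """Hypothetical metadata store: cleansing rules per column plus a watermark."""
    rules: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)
    last_seen_id: int = 0


def identity(value):
    """Default rule for columns that have no cleansing rule registered."""
    return value


def run_increment(source_rows: List[dict], repo: Repository, target: List[dict]) -> int:
    """Process only rows newer than the stored watermark, then advance it."""
    new_rows = [row for row in source_rows if row["id"] > repo.last_seen_id]
    for row in new_rows:
        cleaned = {col: repo.rules.get(col, identity)(val) for col, val in row.items()}
        target.append(cleaned)
    if new_rows:
        repo.last_seen_id = max(row["id"] for row in new_rows)
    return len(new_rows)


if __name__ == "__main__":
    repo = Repository(rules={"city": lambda v: v.strip().upper()})
    warehouse: List[dict] = []
    source = [{"id": 1, "city": " bonn "}]
    print(run_increment(source, repo, warehouse))  # first run picks up row 1
    source.append({"id": 2, "city": "ulm"})
    print(run_increment(source, repo, warehouse))  # second run picks up only row 2
    print(warehouse)
```

Because the rules and the watermark live in the repository rather than in the job itself, the same job can be scheduled as often as the required latency demands without reprocessing old rows.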
Typically, most ETL processing is performed outside peak processing periods, usually at night, when online transaction volume is much lower. These workloads typically do not need dedicated servers or VMs. For more efficient ETL processing, virtualization automation can start the ETL job on a preconfigured VM only when it is needed. The VM can then be released once the ETL processing is complete, making its resources available to other applications at other times. The ability to automatically create a new VM, run the ETL workload, and then release the VM is essentially a form of infrastructure on demand. This type of infrastructure management enables very efficient use of physical and virtual resources.
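The create, run, release cycle can be sketched as a Python context manager. The FakeProvisioner class below is a stand-in written for this example and only records what it is asked to do; its method names (create_vm, release_vm) and the template name are assumptions and do not correspond to the API of any real virtualization platform.

```python
from contextlib import contextmanager


class FakeProvisioner:
    """Stand-in for a virtualization API; it only tracks requested actions."""

    def __init__(self):
        self.running = set()
        self.log = []

    def create_vm(self, template):
        vm_id = f"{template}-{len(self.log)}"
        self.running.add(vm_id)
        self.log.append(("create", vm_id))
        return vm_id

    def release_vm(self, vm_id):
        self.running.discard(vm_id)
        self.log.append(("release", vm_id))


@contextmanager
def ephemeral_vm(provisioner, template):
    """Create a VM from a preconfigured template and always release it afterwards."""
    vm_id = provisioner.create_vm(template)
    try:
        yield vm_id
    finally:
        # Runs even if the ETL job raises, so resources are never left allocated.
        provisioner.release_vm(vm_id)


def run_nightly_etl(provisioner, job):
    """Run one ETL job on a VM that exists only for the duration of the job."""
    with ephemeral_vm(provisioner, "etl-template") as vm_id:
        return job(vm_id)


if __name__ == "__main__":
    provisioner = FakeProvisioner()
    print(run_nightly_etl(provisioner, lambda vm: f"ran on {vm}"))
    print(provisioner.log)
```

Putting the release in a finally clause is the design choice that matters here: the VM is handed back whether the workload succeeds or fails.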
More on ETL and Virtualization
In principle, an ETL process can be implemented with self-written programs in any programming language. However, more and more companies use existing ETL tools from third-party vendors. The main reasons for using standard tools are usually the following:
- Each standard tool supports access to the common database systems as well as ERP and file systems.
- Development is supported by suitable transformations, methods and procedures (such as visualization of the data flow, error handling, and scheduling).
- The prerequisites for high-performance loading are usually already implemented in the standard tool. Precise knowledge of the mechanisms of the target systems is therefore mostly unnecessary.
- Development and maintenance of ETL processes are usually simpler and less expensive with visual standard tools than with systems based on programs written in a programming language.