The term data virtualization refers to a set of approaches in data management that form a subset of data integration. These approaches make it possible to query and manipulate data from source systems without the querying system needing to know technical details such as the structure of the data source or its physical storage location.
Data virtualization can be seen as an alternative to the data warehouse approach and its ETL processes, in which data is extracted from the source systems, transformed, and finally loaded into the analytical system. With data virtualization, by contrast, the data remains in its original systems: the virtualization component accesses it directly and makes it available for further processing or for consumption by other applications.
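To make the contrast concrete, the following minimal sketch uses two in-memory SQLite databases as stand-ins for a source system and a warehouse; the table and function names are hypothetical. The ETL copy reflects the source only as of the last load, whereas the virtual query reads the source at request time.

```python
import sqlite3

# Hypothetical operational source system (in-memory SQLite as a stand-in).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

# ETL style: extract rows from the source and load a copy into a separate store.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders_copy (id INTEGER, amount REAL)")
rows = source.execute("SELECT id, amount FROM orders").fetchall()
warehouse.executemany("INSERT INTO orders_copy VALUES (?, ?)", rows)


def warehouse_total() -> float:
    # Reads the copy, which is only as fresh as the last ETL run.
    return warehouse.execute("SELECT SUM(amount) FROM orders_copy").fetchone()[0]


def virtual_total() -> float:
    # Virtualization style: no copy; the source is queried when the result is requested.
    return source.execute("SELECT SUM(amount) FROM orders").fetchone()[0]


# A new order arrives in the source system after the ETL load.
source.execute("INSERT INTO orders (id, amount) VALUES (3, 4.5)")
print(warehouse_total())  # 35.5 (stale until the next ETL run)
print(virtual_total())    # 40.0
```

The trade-off runs the other way for load: every call to `virtual_total()` hits the source system, which is why the design of the virtualization component matters.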
To resolve the heterogeneity of the data (differences in data sources, formats, and semantics), various abstraction and transformation techniques are used. Potential benefits of this approach include fewer data errors and, if the virtualization component is designed appropriately, a lower load on the systems involved. Depending on the implementation, data can also be written back to the source systems.
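One common abstraction technique is to map each source onto a shared logical schema at query time. The sketch below assumes two hypothetical sources, a CRM that stores revenue as integer cents and an ERP export that stores it as a decimal-comma string; the `Customer` type and the mapping functions are illustrative and not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical common logical schema exposed to consumers."""
    name: str
    revenue_eur: float


# Hypothetical source A: a CRM that stores revenue as integer cents.
crm_rows = [{"cust_name": "Acme", "rev_cents": 125000}]
# Hypothetical source B: an ERP export with revenue as a decimal-comma string.
erp_rows = [("Globex", "980,50")]


def from_crm(row: dict) -> Customer:
    # Rename the field and convert cents to euros.
    return Customer(name=row["cust_name"], revenue_eur=row["rev_cents"] / 100)


def from_erp(row: tuple) -> Customer:
    # Unpack the tuple and parse the decimal-comma string.
    name, revenue = row
    return Customer(name=name, revenue_eur=float(revenue.replace(",", ".")))


def customers() -> list[Customer]:
    # The mapping runs each time this is called; the source rows are never rewritten.
    return [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]


print(customers())
# [Customer(name='Acme', revenue_eur=1250.0), Customer(name='Globex', revenue_eur=980.5)]
```

Consumers only ever see `Customer` records, so differences in field names, units, and number formats stay hidden behind the mapping functions.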
---
Typical areas of application for the concept and the corresponding software include business intelligence, service-oriented architecture, cloud computing, enterprise search, and master data management.
Data Virtualization and Data Warehousing
Many enterprise system landscapes consist of disparate data sources, including multiple data warehouses, data marts, and/or data lakes. Data virtualization can bridge these source systems without requiring additional physical data storage. The existing data infrastructure continues to perform its core functions, while the data virtualization layer merely consumes the data from those sources. This can help increase data availability and usage.
Data virtualization can also be considered an alternative to ETL processes and data warehousing. The concept aims to deliver insights from multiple data sources quickly, without extensive ETL processes or additional data storage. However, data virtualization can also be extended and customized to meet data warehousing requirements. Doing so requires an understanding of data storage and historization requirements, together with planning and design, in order to select appropriate virtualization, integration, and storage strategies and to carry out infrastructure and performance optimizations (e.g., streaming, in-memory processing, hybrid storage).

Image credit: DataWerks GmbH
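One of the optimizations mentioned above, in-memory processing, is often realized as a cache placed in front of a virtual query. The following is a minimal sketch assuming a simple time-to-live (TTL) policy; `CachedView` and its parameters are hypothetical names. It trades data freshness for reduced load on the source systems.

```python
import time
from typing import Any, Callable, Optional


class CachedView:
    """Hypothetical in-memory cache in front of a virtual query, with a TTL."""

    def __init__(self, query: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query = query          # function that reads from the source system(s)
        self._ttl = ttl_seconds      # how long a cached result stays valid
        self._clock = clock          # injectable clock, useful for testing
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            self._value = self._query()  # only here is the source system queried
            self._fetched_at = now
        return self._value


# Usage with a fake clock and a stand-in query that counts its calls.
calls = []


def query_source() -> int:
    calls.append(1)
    return len(calls)


fake_now = [0.0]
view = CachedView(query_source, ttl_seconds=60, clock=lambda: fake_now[0])
print(view.get())  # 1 (first call queries the source)
print(view.get())  # 1 (served from the cache)
fake_now[0] = 61.0
print(view.get())  # 2 (TTL expired, source queried again)
```

The TTL would typically be chosen per view, depending on how stale a result its consumers can tolerate.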
Features of Data Virtualization
Data virtualization solutions offer some or all of the following features:
- Abstraction – abstracting the technical aspects of the stored data, such as location, storage structure, API, query language, and storage technology
- Virtualized data access – accessing different data sources and making the data available through a common logical access point
- Transformation – transforming, reformatting, and aggregating source data and improving its quality
- Data federation – combining result sets from multiple source systems
- Data delivery – publishing result sets as views and/or data services that client applications or users can access
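Federation and delivery can be illustrated together. This sketch assumes a hypothetical SQLite order database and a hypothetical CSV export of customer master data; `revenue_by_customer()` plays the role of a published virtual view that combines both sources when called and stores nothing itself.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational database holding orders.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (2, 40.0), (1, 60.0)])

# Hypothetical source 2: a CSV export from another system with customer master data.
CUSTOMERS_CSV = "id,name\n1,Acme\n2,Globex\n"


def read_customers() -> dict[int, str]:
    # Parse the CSV source into an id -> name lookup.
    reader = csv.DictReader(io.StringIO(CUSTOMERS_CSV))
    return {int(row["id"]): row["name"] for row in reader}


def revenue_by_customer() -> list[tuple[str, float]]:
    """Virtual view: federates both sources at query time and stores nothing."""
    # Push the aggregation down to the database so only summary rows are transferred.
    totals = orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    names = read_customers()
    # Join the two result sets on the customer id.
    return [(names[cid], total) for cid, total in totals]


print(revenue_by_customer())  # [('Acme', 160.0), ('Globex', 40.0)]
```

Pushing work down to the sources where possible is a common way to limit how much data the virtualization layer has to move and combine itself.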
In addition, data virtualization software may include development, operation, and/or management capabilities. When applied appropriately, the concept of data virtualization can deliver the following benefits:
- Fewer data errors
- Lower system load, since the data stays in the source systems
- Faster data access
- Less time required for development and support
- Better governance and reduced risk through consistent application of policies
- Smaller storage footprint
Disadvantages of Data Virtualization
- Response times of operational source systems can suffer, especially if those systems are not designed to handle unexpected queries.
- Data virtualization does not enforce a homogeneous data model. Unless it is combined with data federation and a business understanding of the data, users must interpret the data themselves.
- Data virtualization requires a defined governance approach to avoid budgeting issues across shared services.
- Data virtualization is not suitable for historizing data (keeping a history of changes over time); a data warehouse is better suited for this.
- Change management involves increased effort, because every change to the virtual data model must be accepted by all consuming applications and users.