The growing number of mobile devices, the rise of social networks and the digitization of everyday life are creating a huge flood of data. This data flow is commonly referred to as big data. Merely expanding the IT parameters – more memory, better servers and so on – is not enough to process this information. Instead, the entire IT infrastructure must be re-examined and new solutions must be found to meet the requirements.
What is in-memory computing?
To meet the requirements of the situation described above, the individual processing steps – from storage and network technology to database software – must be optimized. This is where in-memory computing comes in: it speeds up data processing significantly by increasing access speeds. Rather than giving a compact definition of in-memory computing, it is easier to describe what it does and where it is used. As a growing number of users pushes existing database systems to their limits, the need for RAM-based applications increases. In-memory computing is characterized by the fact that the primary memory (RAM) of a computer is used as the data store. The technology has been around for several years; niches that need to rely on real-time processing include telecom providers, social networks and trading platforms.
Basic principle of in-memory computing
In-memory computing presents specific challenges to hardware and applications across diverse requirements. Constant progress by hardware manufacturers and the rapid development of the IT infrastructure mean that the corresponding storage systems are becoming cheaper while offering ever higher capacities. It is now possible to equip a single server with large DRAM modules totaling several hundred GB of memory. The same applies to CPU development: with its latest multi-core designs, Intel offers 10 cores within a single processor. By combining different storage systems within the server, an appropriate storage tier is used for each requirement and retention time. Current, time-critical data is stored directly in RAM; this is the fastest option and provides the best basis for analytical real-time applications. Data that exceeds the size of the RAM is cached on a PCIe flash/SSD memory card and then saved to local hard disks based on SAS/SSD technology. Finally, the information is moved to a storage system via SAN technology for long-term archiving.
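The tiering idea described above – keep hot data in RAM and spill colder data to slower media – can be illustrated with a minimal Python sketch. This is an invented toy (class name, eviction policy and file layout are all assumptions for illustration), not any vendor's implementation; real systems use far more sophisticated caching and placement logic.

```python
import os
import pickle
from collections import OrderedDict

class TieredStore:
    """Toy two-tier store: hot entries in RAM, cold entries spilled to disk."""

    def __init__(self, ram_capacity, spill_dir):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()      # RAM tier: fastest access
        self.spill_dir = spill_dir    # stands in for the flash/disk tier

    def _spill_path(self, key):
        return os.path.join(self.spill_dir, f"{key}.bin")

    def put(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)     # mark as most recently used
        while len(self.ram) > self.ram_capacity:
            # Evict the least recently used entry to the slower tier.
            cold_key, cold_val = self.ram.popitem(last=False)
            with open(self._spill_path(cold_key), "wb") as f:
                pickle.dump(cold_val, f)

    def get(self, key):
        if key in self.ram:           # RAM hit: no disk access at all
            self.ram.move_to_end(key)
            return self.ram[key]
        # Cold read from the disk tier, then promote back into RAM.
        with open(self._spill_path(key), "rb") as f:
            value = pickle.load(f)
        self.put(key, value)
        return value
```

With a RAM capacity of two entries, inserting three keys spills the oldest one to disk; reading it again transparently brings it back into the RAM tier.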
In-memory analytics is about the fast processing of large data volumes in Business Intelligence (BI). It is important to make this data available quickly so that the company strategy can be adapted if necessary. With the in-memory variant, read accesses to the hard disk are reduced or eliminated entirely. This brings a performance gain, because access to data in RAM can be up to a factor of 10,000 faster than access to the hard disk. For RDBMSs, the bottleneck is access to the data: the read, write and response speed of the hard disk. This can be counteracted with SSDs, but SSDs support only a limited number of write cycles. As already shown, the growing size of the databases is also a problem. In-memory analytics is usually implemented on top of in-memory databases, which are explained in more detail below.
For most companies, maintaining customer information is the task of a relational database. There, structured data is stored in tables, as records with fields, on the hard disk of a server. When queries are made to the database, the data must usually be read from the hard disk.
In-memory technologies are expected to provide much shorter response times. Their approach is to load the whole database into RAM and to answer queries directly from memory. Since the data in an in-memory database is also compressed, the size of the database can be reduced to as little as one-tenth.
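How compression can shrink an in-memory data set is easy to demonstrate. The sketch below uses plain zlib on a repetitive column of sample data; real in-memory databases use techniques such as dictionary and columnar compression instead, so this is only an illustration of the principle, and the sample data is invented.

```python
import zlib

# A column of highly repetitive values, as is typical for status or
# category fields in a customer table (illustrative data only).
column = ("active;" * 9000 + "inactive;" * 1000).encode()

compressed = zlib.compress(column)
ratio = len(column) / len(compressed)
print(f"raw: {len(column)} bytes, "
      f"compressed: {len(compressed)} bytes, "
      f"ratio: {ratio:.0f}x")
```

Repetitive business data of this kind routinely compresses by more than a factor of ten, which is how a database loaded entirely into RAM can still fit into an affordable amount of memory.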
For BI on classical databases, OLAP (On-Line Analytical Processing) solutions are often offered. These are mostly necessary, for example, for company-wide evaluations, because such queries would otherwise load the database so heavily that no other operations would be possible. This is sometimes the case when the databases are not well modeled. When these OLAP cubes are fed, many queries are generated and their results stored, which can take a long time; this is the bottleneck of such applications, although these operations are not needed for well-modeled databases. With in-memory databases these precomputation steps are unnecessary, because the requested data is returned quickly enough directly.
For IMDBs there is a fundamental problem: RAM is volatile memory. If the server loses its power supply, the data in memory is lost. This is why large database vendors work with cyclic snapshot images written to the hard disk. In addition, transaction logs are written to disk; these record the changes to the database, for example that value x was changed to y. Usually, two database servers sharing a storage system are run as an active-standby cluster. If one server loses power, the second reads the snapshots and transaction logs from the hard disk and can thus restore the data of the previously active server. Data that was being processed at the moment of the crash cannot be restored, for example if an update was in progress when the power was lost. For this reason, applications also include functions that recognize this situation and, if necessary, re-execute the request. IMDBs on the market share many common features:
- Data storage: On startup, all data is read into RAM so that no data has to be reloaded later.
- Snapshot images: Changed data is periodically synchronized to the hard disk.
- Transaction logs: Between the individual checkpoint files, current changes are written to these logs so that a roll-forward can be performed after a crash.
- ACID principle: Like conventional RDBMSs, IMDBs can process all data according to the ACID principle (atomicity, consistency, isolation, durability).
- High availability: Usually, several database servers are used for one database to ensure data replication.
- Direct connect: The application can map the in-memory database directly into its address space to avoid any overhead.
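The interplay of snapshots and transaction logs described above can be sketched in a few lines of Python. This is a deliberately minimal toy (the class name, JSON file format and API are invented for illustration): every change is logged to disk before it is applied in RAM, a snapshot checkpoints the full state and truncates the log, and recovery loads the last snapshot and rolls forward through the remaining log entries.

```python
import json
import os

class MiniIMDB:
    """Toy sketch of snapshot + transaction-log durability (illustrative only)."""

    def __init__(self, snapshot_path, log_path):
        self.snapshot_path = snapshot_path
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _recover(self):
        # Step 1: load the last snapshot (checkpoint), if one exists.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        # Step 2: roll forward by replaying the transaction log.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]

    def set(self, key, value):
        # Write-ahead: persist the change to the log before applying it in RAM.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def snapshot(self):
        # Checkpoint the full state to disk and truncate the log.
        with open(self.snapshot_path, "w") as f:
            json.dump(self.data, f)
        open(self.log_path, "w").close()
```

If the process "crashes" (the object is simply thrown away), a new instance pointed at the same files rebuilds the exact in-memory state from the snapshot plus the log, which is precisely the roll-forward behavior the feature list describes.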
Use of in-memory computing
In-memory computing comprises several technologies, but by far the most important is the in-memory database (hereinafter referred to as IMDB), for which software products and practical deployments are available on the market. In-memory databases have existed since 1984; one of the first available products was the IBM TM1 OLAP database. However, limitations of the operating systems and available hardware meant that these systems could not exploit their advantages. In recent years, these limitations have gradually disappeared, and several IMDB solutions from major vendors have become available on the market.
IBM introduced solidDB as an IMDB in 2008; it provides data integrity through two separate but permanently synchronized database copies as well as through permanent logging to the hard disks. In the event of a failure, the entire database can be restored within seconds without loss of data. Oracle introduced TimesTen in 2009 as an IMDB that can be used as a cache for a traditional RDBMS or as a standalone database; TimesTen uses transaction logging and database checkpoints as data integrity measures. In 2010, SAP presented HANA, a High Performance Analytic Appliance, as its in-memory database technology.
SQLite is a program library containing a relational database system and is, thanks to its many database interfaces, the most widely used SQL database in the world. To use a SQLite database in main memory, pass the ":memory:" option in the database connection. Note that such a database exists only for the lifetime of the connection: as soon as you close it, the data is discarded rather than written to disk. In-memory databases are becoming more and more popular and are used mainly for time-critical applications, real-time data delivery and the analysis of large amounts of data. For example, Google, Twitter and Facebook all use customized in-memory databases to ensure fast response times with ever-increasing amounts of data.
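The SQLite in-memory mode mentioned above can be tried directly with Python's built-in sqlite3 module (the table and sample rows here are invented for the example):

```python
import sqlite3

# ":memory:" creates a purely in-memory SQLite database:
# no file is created on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)",
                 [("Alice",), ("Bob",)])

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 2

conn.close()  # the database is discarded here, not persisted to disk
```

Because no disk I/O is involved, this mode is popular for exactly the unit-test scenario mentioned later in this article: each test gets a fresh, fast, throwaway database.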
With regard to in-memory computing, IT experts speak of a paradigm shift, a new era of data processing and true real-time business. For about a year now, the largest German software company, SAP, has missed no opportunity to demonstrate the benefits that users can derive from the new technology. There is talk of completely new analysis and business applications, as well as a clearly simplified and more cost-effective IT infrastructure.
That these statements should not be taken quite so generally could be shown in the course of the research and experiments. Productive use of in-memory computing offers completely new possibilities in the fields of big data and real-time analysis, but the conversion and acquisition costs for both hardware and software are still very high, so that at present mainly large corporations can afford them.
In less complex areas, such as unit tests in software development, in-memory databases are already being used by small and medium-sized enterprises. In-memory computing technologies will gain importance in the near future due to steadily increasing data volumes and falling hardware prices.