OS-level virtualization is a method that allows multiple isolated instances of an operating system (“guests”) to share the kernel of a host system. In contrast to virtualization using a hypervisor, container virtualization restricts the kinds of guests it can run, since they must all use the host's kernel, but it is considered particularly resource-efficient.
The Docker software became popular in IT in 2013, helped in part by close cooperation with Red Hat and its integration into Red Hat's OpenShift product. Similar projects had existed before, however. On an ordinary operating system, any program can usually see and use all system resources, including:
- Usable hardware (components), such as CPU and network
- Storage (read/write), folder structures, and network storage
- Peripherals such as keyboard, webcam, scanner and printer.
The operating system may restrict access to such resources depending on the user and context under which the process is running. Containerization can be used to manage which system resources are allocated to the processes in the container.
In 1979, the developers of Unix introduced the chroot system call, which isolates part of the file system from the rest, and thereby took a first step towards virtualizing the operating system. For many years, the approach was used only sporadically for software testing and server protection, especially in the BSD Unix derivatives, which developed it further under the name Jails. In the late 1990s, Linux developers worked on User Mode Linux, which starts the operating system inside the operating system, but this approach received greater attention only in professional circles. In the mid-2000s, the open-source project OpenVZ and the Virtuozzo product based on it popularized software that allowed web hosts to run many Linux websites on a single server. The Solaris and BSD operating systems each had their own implementations of the principle.
Prompted by these developments, the developers of the Linux kernel took steps to build similar functions into their operating system. These include namespaces, cgroups, and capabilities. Many of these techniques are used together under the name LXC, but they still require detailed knowledge of how operating systems and distributions are structured. That changed in 2013 when the company then called dotCloud introduced Docker, which made it easier for application developers to package their software in containers. Several alternatives to Docker have since emerged, especially for Linux, including rkt (pronounced “rocket”) and the nspawn subproject of systemd. Some projects and vendors also bundle container virtualization into products that include further management software, such as orchestration or platform as a service. Examples are the projects Kubernetes and OpenShift.
Realizations of OS-Level Virtualization
Many projects and products implement the principle of container virtualization but differ in which system resources (for example, processes, file system, network interfaces) they virtualize and isolate from each other. Some realizations include:
- chroot isolates part of the file system from the rest, making it appear to a process as if the path passed as an argument were the root directory of the file system. However, the mechanism can be circumvented easily, which is why it is not suitable as real isolation.
- UML (User Mode Linux) runs a Linux kernel as a userland process.
- BSD Jails further develops the idea of chroot for the BSD derivatives of Unix.
- Solaris Zones is Sun Microsystems’ further development of the idea for its Unix derivative Solaris.
- OpenVZ is a substantial extension of the Linux kernel with features similar to what LXC offers today. The Linux kernel developers, however, have rewritten and generalized most of these functions (namespaces, cgroups). The proprietary variant of OpenVZ is distributed as Virtuozzo by Parallels, Inc.
- LXD is a Canonical product that is built on top of LXC, but adds many more features, some of which come from hypervisor virtualization.
- Docker is a container virtualization platform that uses many of the above techniques and complements them with easy-to-use tools and services. These include, for example, a description format for images (Dockerfiles) and a repository that manages such images. The company of the same name offers many more add-ons for the platform, some free of charge, others for a fee.
- systemd-nspawn is a subproject of the systemd framework and was originally intended to test systemd itself without rebooting the machine each time. It uses the Linux-specific namespaces and cgroups.
- rkt is an alternative implementation of the Docker approach from competitor CoreOS, which had criticized individual design decisions of Docker, primarily those concerning security.
- runC is an attempt to settle the looming schism between Docker and rkt. For this purpose, several vendors founded the Open Container Initiative under the umbrella of the Linux Foundation. runC is only a runtime environment and does not include many features of other container virtualization platforms.
- Podman is a container manager that can run without a daemon and is based on the concept of Kubernetes Pods. Podman is an evolution of Skopeo by Red Hat.
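Several of the realizations above rely on Linux namespaces and cgroups. The following is a minimal sketch, assuming a Linux host with /proc mounted, of how a process can look up which namespaces and cgroups it belongs to. The function names read_namespaces and read_cgroups are illustrative and not part of any of the listed tools; on other systems both functions simply return empty results.

```python
import os
from pathlib import Path


def read_namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process, or {} if unavailable."""
    ns_dir = Path("/proc") / str(pid) / "ns"
    namespaces = {}
    try:
        entries = sorted(ns_dir.iterdir())
    except OSError:
        # Not Linux, /proc not mounted, no such process, or no permission.
        return namespaces
    for entry in entries:
        try:
            # Each entry is a symlink whose target names the namespace
            # type and an identifier, for example "net:[<number>]".
            namespaces[entry.name] = os.readlink(entry)
        except OSError:
            continue
    return namespaces


def read_cgroups(pid="self"):
    """Return the raw lines of /proc/<pid>/cgroup, or [] if unavailable."""
    try:
        return Path(f"/proc/{pid}/cgroup").read_text().splitlines()
    except OSError:
        return []


if __name__ == "__main__":
    for name, ident in read_namespaces().items():
        print(f"{name:20} {ident}")
    for line in read_cgroups():
        print("cgroup:", line)
```

Two processes that report the same identifier for a namespace type share that resource view; a process inside a container typically reports different identifiers than a process on the host for the types its runtime isolates.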
Most realizations of container virtualization come from the Unix family of operating systems. Container virtualization became popular from 2013 onward, especially on Linux, through Docker. There are also implementations for the host operating systems Windows and macOS, which, however, ultimately use a lightweight hypervisor in addition to container virtualization to start a Linux kernel and then use it with Docker. There is also native container virtualization for operating systems other than Linux, but it has not yet become widely used.
Since all guests of container virtualization use the same kernel, the kernel must provide strong mechanisms to isolate the individual guests from each other. This is not easy with software as complex as the Linux kernel, which has several hundred system calls and various other ways of communicating with the kernel.
Because the file systems are isolated, each container uses its own set of system libraries. If vulnerabilities become known in them, such as the Heartbleed vulnerability in the SSL/TLS library OpenSSL, a system administrator must update every affected instance on a machine instead of updating just once per server.
Due to the variety of settings and configuration options, containers can easily be configured in ways that open up unintended access. For example, privileged containers allow more functions to be performed within the container, but weaken the isolation of the containers from the host.
As a service for containers, repositories have been created that offer ready-made images that can be run directly on the container platform. Some of these artefacts are of dubious quality and, whether through ignorance or ill will on the part of the providers, may contain vulnerabilities that go unnoticed if the images are not checked before they are downloaded and run.
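The patching effort can be illustrated with a small sketch. It assumes a hypothetical inventory that maps each container name to the OpenSSL version bundled in it; the names and versions below are invented for illustration. The version check is simplified to the releases 1.0.1 through 1.0.1f, which were affected by Heartbleed (fixed in 1.0.1g), and ignores pre-release builds.

```python
import re

# Heartbleed (CVE-2014-0160) affected OpenSSL 1.0.1 through 1.0.1f.
# Simplification: pre-release builds of 1.0.1 are not considered here.
_VULNERABLE = re.compile(r"^1\.0\.1[a-f]?$")


def is_heartbleed_vulnerable(version):
    """Return True if the given OpenSSL version string is in the affected range."""
    return bool(_VULNERABLE.match(version))


def instances_to_update(host_version, containers):
    """List every place that needs a patched library.

    `containers` maps a container name to its bundled OpenSSL version.
    The host counts as one more instance, labelled "<host>".
    """
    targets = [
        name
        for name, version in sorted(containers.items())
        if is_heartbleed_vulnerable(version)
    ]
    if is_heartbleed_vulnerable(host_version):
        targets.insert(0, "<host>")
    return targets


if __name__ == "__main__":
    # Hypothetical inventory: the host is already patched, two containers are not.
    inventory = {"web": "1.0.1e", "api": "1.0.1f", "batch": "1.0.1g", "legacy": "0.9.8y"}
    print(instances_to_update("1.0.1g", inventory))
```

Even with a patched host, every container that bundles an affected version still has to be rebuilt or updated separately.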
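A simple audit can catch some of these settings before a container is started. The sketch below assumes service definitions held as plain dictionaries whose keys are modeled on Docker Compose options (privileged, cap_add, network_mode, pid, volumes); the function audit_service and the particular rule set are illustrative assumptions, not an exhaustive or official check.

```python
def audit_service(name, service):
    """Return a list of findings for settings that weaken isolation from the host."""
    findings = []
    if service.get("privileged") is True:
        findings.append(f"{name}: runs privileged")
    for cap in service.get("cap_add", []):
        # Only two very broad capabilities are flagged in this sketch.
        if cap in ("ALL", "SYS_ADMIN"):
            findings.append(f"{name}: adds capability {cap}")
    if service.get("network_mode") == "host":
        findings.append(f"{name}: shares the host network namespace")
    if service.get("pid") == "host":
        findings.append(f"{name}: shares the host PID namespace")
    for volume in service.get("volumes", []):
        if not isinstance(volume, str):
            continue  # only the short "source:target" form is handled here
        source = volume.split(":", 1)[0]
        if source in ("/", "/var/run/docker.sock"):
            findings.append(f"{name}: mounts {source} from the host")
    return findings


if __name__ == "__main__":
    # Hypothetical service definitions for illustration.
    services = {
        "web": {"image": "nginx"},
        "agent": {
            "privileged": True,
            "pid": "host",
            "volumes": ["/var/run/docker.sock:/var/run/docker.sock"],
        },
    }
    for service_name, definition in services.items():
        for finding in audit_service(service_name, definition):
            print(finding)
```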
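One basic check before running downloaded content is to compare it against a digest obtained from a trusted source. The following sketch assumes digests written as "sha256:" followed by a hexadecimal string; the function name matches_digest is illustrative. A matching digest only shows that the content is the one that was expected; it says nothing about whether that content is free of vulnerabilities.

```python
import hashlib
import hmac


def matches_digest(blob, expected):
    """Check raw bytes against a digest string of the form "sha256:<hex>"."""
    algorithm, _, hex_digest = expected.partition(":")
    if algorithm != "sha256" or not hex_digest:
        raise ValueError(f"unsupported or malformed digest: {expected!r}")
    actual = hashlib.sha256(blob).hexdigest()
    # compare_digest avoids leaking where the first mismatch occurs.
    return hmac.compare_digest(actual, hex_digest.lower())


if __name__ == "__main__":
    # Stand-in bytes for a downloaded artefact; a real check would read the file.
    content = b"example layer contents"
    pinned = "sha256:" + hashlib.sha256(content).hexdigest()
    print(matches_digest(content, pinned))
    print(matches_digest(b"tampered contents", pinned))
```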