In the first part of this series on the analysis of Docker in DevOps, we discussed the basic terminology and concepts around Docker and DevOps. As previously planned, the implementation of Docker will be discussed after the specific requirements within software projects have been covered; those requirements are the subject of this part. IT has several requirements to fulfill, needs to go through various processes, and must exercise continuous control over its systems. As a result, it faces various problems and errors every day. For the sake of clarity, this series distinguishes between the requirements within development and those within operations. Methods must be implemented that represent a complete software architecture, ensure the continuous integration of innovation into the software, and prevent problems in software development.
Analysis of Docker in DevOps: Requirements
Already during the development process, so-called bugs are created in the software, and they have to be fixed as fast as possible, because otherwise they cause great problems and a lot of work in the further course of the system architecture. That is why it is important to respect certain processes and perform tests at such an early stage of development.
One of the biggest issues is that development and operations work in different environments. Within system management, the problem arises that development does not use the same environments as the production or testing systems. As a result, errors cannot be detected directly and take time to surface. Development should therefore use a system identical to production in order to avoid such errors.
Documentation plays a very important role in development; it provides clarity about the systems as well as better communication. The system requirements and metadata of the production and development environments must be documented and made freely accessible.
Another development requirement is ensuring security and minimizing system errors. To achieve this, it is advantageous to use different testing methodologies to continuously check the software for security issues and errors. These tests are an important part of the deployment process, which involves both continuous development and software delivery. In the optimal case, all tests can be carried out automatically without personnel, with protocols created and made available, depending on the topic, to both development and operations.
One of the most used and best-known kinds of tests are unit tests. They are created by developers and form the direct test of the code to be executed. For all functions offered by the software, tests simulate input, log the results, and check them. Unit tests thus support the modularity of the software, can check every part of the code, and can be performed easily since they follow a clear procedure.
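As a minimal sketch of this idea, the following hypothetical example simulates input to a function under test, checks the result, and logs it; the `add` function stands in for real application code:

```shell
#!/bin/sh
# Hypothetical function under test.
add() { echo $(( $1 + $2 )); }

# Simulate input, check the result against the expectation, log the outcome.
result=$(add 2 3)
if [ "$result" -eq 5 ]; then
  echo "PASS: add 2 3 -> $result"
else
  echo "FAIL: add 2 3 -> $result (expected 5)"
fi
```

In a real project, a test framework would run many such checks automatically and collect the logs for both development and operations.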
Another method to test the software is integration testing. An apt definition was given in the book “Software Reliability”: integration testing is the verification of the interfaces among system parts (modules, components, and subsystems). Integration testing tests multiple pieces of software within a complex system. This type of test exercises the components together to ensure quality in the production system. Interfaces receive particular attention, since they are responsible for a high volume of data exchange and their correct interaction is therefore indispensable.
Static code analysis is a method to test the code of the software for syntactic errors as well as error patterns. The entire code is searched for these errors, but not changed. This method is usually performed automatically, and various tools can annotate findings directly during development. The newest tools can also point out qualitative issues in code passages, such as duplicated code or performance drops.
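A rough illustration of the principle, assuming a hypothetical C source file: the code is only scanned for a known error pattern (here the unsafe `gets()` call), never modified:

```shell
#!/bin/sh
# Write a small hypothetical C file to scan.
cat > /tmp/sample.c <<'EOF'
#include <stdio.h>
int main(void) {
    char buf[8];
    gets(buf);   /* unsafe: this is the error pattern we look for */
    return 0;
}
EOF

# Count occurrences of the pattern; the source file itself is untouched.
matches=$(grep -c 'gets(' /tmp/sample.c)
echo "found $matches suspicious call(s) in sample.c"
```

Real static analyzers work on the syntax tree rather than raw text, but the workflow is the same: scan, report, leave the code unchanged.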
Another analysis to check the software for errors is dynamic analysis, which is divided into load testing and benchmarking. These methods are characterized by testing the software under various environments and inputs using runtime parameters. A load test simulates increased access to the software, for example on a web server. Within this test, the number of requests handled and the behavior of the software are logged and subsequently evaluated. Benchmarking works in a similar way to load testing, but it can also be applied to sub-processes or components, while load testing tends to focus on the software as a whole.
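The idea of a load test can be sketched as follows; the `request` function is a hypothetical stand-in for a real HTTP request against the system under test:

```shell
#!/bin/sh
# Stand-in for one request against the system under test.
request() { sleep 0.1; echo ok; }

# Fire N simulated requests in parallel and log each result.
N=20
: > /tmp/load.log
i=0
while [ $i -lt $N ]; do
  request >> /tmp/load.log &
  i=$((i + 1))
done
wait    # wait for all simulated requests to finish

# Evaluate the log afterwards, as a load test would.
handled=$(grep -c ok /tmp/load.log)
echo "$handled/$N simulated requests handled"
```

A real load test would replace `request` with actual traffic against the server and record response times alongside the success count.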
There are also software tests that focus more on the behavior of users in relation to the software. These are not discussed further in this series, as they are not directly related to DevOps and automated deployment processes.
Requirements within the operation
The release of new features via deployment systems is the interface between development and operations. The goals of deployment include low downtime and an automated, easy way to place new software releases in the production system. A widespread deployment strategy is the so-called blue-green deployment, with which it is possible to reduce downtime to a minimum. In this process, two production systems are operated in parallel, and at the end of the deployment a controller switches traffic from the obsolete system to the new one. The entire process from development to deployment in the live system is also called the deployment pipeline. It describes the exact interface of the DevOps architecture and its methods: the deployment pipeline is the place where the architectural aspects and the process aspects of DevOps intersect. It is divided into the phases of the development environment, continuous integration, the operational test environment, and the operating environment.
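The switch at the heart of blue-green deployment can be sketched as follows; in practice the pointer would be a load balancer rule or a symlink rather than a shell variable, and `health_check` stands in for real smoke tests:

```shell
#!/bin/sh
# Two production environments exist in parallel.
ACTIVE=blue          # currently serving traffic
CANDIDATE=green      # newly deployed release

# Stand-in for real smoke tests against the candidate system.
health_check() { true; }

# Only if the candidate is healthy is the single pointer flipped,
# which keeps the downtime close to zero.
if health_check "$CANDIDATE"; then
  ACTIVE=$CANDIDATE
fi
echo "active environment: $ACTIVE"
```

If the health check fails, traffic simply stays on the old system, which is the main appeal of the approach: the rollback path is doing nothing.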
Development environment: The development environment mainly performs unit testing, function debugging, and profiling. In this environment, all innovations are developed and tested at a low level. Subsequently, the developed blocks are handed over to continuous integration.
Continuous integration: Here the tests already discussed are carried out and the software is checked for correctness. Errors and failed tests are returned to development so they can be fixed.
Operational test environment: Subsequently, the software is deployed to a close-to-production test environment, characterized by its high equivalence to the production system. Within this environment, users can test the software extensively, load tests can be performed, and the system can be monitored.
Production system: In the final step of the deployment, the software is moved to the production system. At this stage, good error diagnosis is needed and monitoring of the system should be in place. In order to ensure high elasticity of the system, high scalability is almost unavoidable.
Release: At the end of the deployment pipeline, the new pieces of software are released. This step is also called the software release and describes the state in which all new features have gone through the deployment pipeline and are now publicly available. At this stage, deployment is complete; what follows is continuous monitoring of the system.
Requirements for Integration of Docker
Docker requires a 64-bit Linux environment with a minimum kernel version (historically 3.10 or newer); other runtime requirements are iptables, git, procps or a similar package, xz-utils, and a correctly configured cgroupfs. To run Docker on other operating systems, such as Windows or Mac OS X, a virtual machine that meets the above requirements is needed.
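A minimal sketch of checking these host requirements, assuming the version numbers above are what matters:

```shell
#!/bin/sh
# Check that we are on a Linux host with a sufficiently recent kernel.
os=$(uname -s)
kernel_major=$(uname -r | cut -d. -f1)

if [ "$os" = "Linux" ] && [ "$kernel_major" -ge 3 ]; then
  echo "host kernel looks suitable for Docker"
else
  echo "use a virtual machine providing a 64-bit Linux kernel"
fi
```

A real installation check would additionally verify the presence of iptables, git, procps, xz-utils, and the cgroupfs mount.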
Docker aims to solve the previously discussed issues through a “build – ship – run” approach, thereby reducing the risk and workload of deployments. A core concept of Docker is that containers do not change themselves and are thus stateless. Data that is generated in a particular session and is necessary for operation (e.g. an uploaded photo) must not be stored directly in the container. This has the advantage that containers can be destroyed and recreated at any time, since no important runtime data lives inside them. Some of the solution concepts covered by Docker’s functionalities and tools are explained below using everyday scenarios to illustrate their purpose.
Data volumes are directories suitable for storing files that either persist beyond the lifetime of a container or are shared by several containers at the same time. Data volumes do not use the union file system that Docker otherwise uses, so they are not layer-based.
Data volumes are initialized when a container is started
Data volumes can be used simultaneously by multiple containers
Data volumes are not based on layers, so changes are written directly to disk
Data volumes are never deleted automatically, even if all associated containers are deleted
The downside of data volumes is that the files reside on the host system; a container on host A cannot access the same data volume as a container on host B. However, Docker offers a solution here with volume plugins, which make it possible to mount NFS, iSCSI, and Fibre Channel drives in containers.
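Assuming a running Docker daemon and illustrative names, the volume behavior described above can be demonstrated like this:

```shell
# Create a named data volume (not layer-based).
docker volume create appdata

# Two containers can use the same volume at the same time.
docker run -d --name writer -v appdata:/data ubuntu \
    sh -c 'while true; do date >> /data/log; sleep 5; done'
docker run --rm -v appdata:/data ubuntu cat /data/log

# Destroying the container leaves the volume and its files intact.
docker rm -f writer
docker volume ls
```

The volume survives until it is removed explicitly with `docker volume rm appdata`.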
Another way to store data is through the use of external providers such as AWS S3 or Google Cloud Storage. The data is then stored completely in an external provider, so you do not have to worry about the scaling of the storage system.
There are basically two options for deployment: either the container image is built live from the Dockerfile, or a ready-built image is started. Since a built image runs on every Docker host, an image is usually built on a “build machine” and then pushed to a repository.
The big advantage is that the image, including all dependencies of the application, already exists. Dependencies that are resolved during live deployment represent a great risk. Some time ago, for example, some frequently used NodeJS packages were removed from the repositories, with the result that dependent applications could no longer be built. If this problem occurs while updating dependencies during development, it can be remedied without major trouble. If the dependencies are fetched live during deployment, however, the deployment fails and, depending on the process, may result in downtime. Docker's image layer construct makes each image consist of multiple layers, each representing the changes relative to the previous layer. Modifying a container therefore only results in an additional layer describing that change.
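The “build – ship – run” flow described above can be sketched as follows; the registry and image names are illustrative:

```shell
# Build once on a build machine: all dependencies are baked into the image.
docker build -t myapp:1.2.0 .

# Ship: tag for the repository and push it there.
docker tag myapp:1.2.0 registry.example.com/myapp:1.2.0
docker push registry.example.com/myapp:1.2.0

# Run on the production host: pull the ready-built image and start it.
docker pull registry.example.com/myapp:1.2.0
docker run -d --name myapp registry.example.com/myapp:1.2.0
```

Because the image is immutable, no dependency is resolved at deployment time, which removes the risk discussed above.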
Basic Docker tutorials show the commands used to start a container, for example starting a container based on the “ubuntu” image and running “Hello world”. Since Docker does not have the image locally, it queries the registry, downloads the image from there, and then starts it. If the command is run again later, Docker uses the existing local image directly.
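For reference, the commands in question look like this:

```shell
# First run: the ubuntu image is pulled from the registry, then started.
docker run ubuntu /bin/echo 'Hello world'

# Later runs reuse the local copy; it can be inspected with:
docker images ubuntu
```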
The use of a load balancer together with Docker can increase the available resources at short notice. A further advantage is provided by a management system in which Docker machines start automatically at high capacity (for example at an 80% utilization rate) and stop again when the number of visitors to the website drops. Personnel expenses are almost completely eliminated, since only further containers have to be started. Because every container should, according to the Docker principle, be “immutable”, i.e. unchangeable, such an autoscaler is relatively easy to implement: apart from the image located in the registry, no further runtime-generated data must be transmitted during scaling. All we have to do is use a load balancer and a management system. The load balancer forwards the traffic to X instances, while the management system monitors the load on each Docker machine and then starts or stops containers depending on how busy the instances are. To register the worker instances with the load balancer, there are several service-registration approaches: either the container registers itself fully automatically with the load balancer at start (and deregisters on stop), or the registration is taken over by the administrative entity that initiates the start of the container. Furthermore, there are so-called container clouds which, facilitated by the use of Docker Swarm, are easy to implement. Many large vendors offer such container clouds for easier use, including AWS Elastic Container Service, Google Container Engine, Rackspace Carina, Kubernetes-based offerings, and so on.
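The scaling decision itself can be sketched as plain logic; the 80% start threshold and the 30% stop threshold are hypothetical, and in a real autoscaler the two actions would start or stop actual containers via the management system:

```shell
#!/bin/sh
# Hypothetical autoscaler decision: thresholds are illustrative.
decide() {
  if [ "$1" -ge 80 ]; then
    echo "start container"     # scale out under high load
  elif [ "$1" -le 30 ]; then
    echo "stop container"      # scale in when the load drops
  else
    echo "do nothing"          # load is within the comfortable band
  fi
}

decide 85   # high utilization
decide 20   # low utilization
decide 50   # normal utilization
```

Keeping a gap between the two thresholds avoids flapping, i.e. containers being started and stopped in rapid succession around a single boundary.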
Through further abstraction, it is thus possible for any developer to build an automatically scaling environment entirely without administration knowledge. Google Container Engine, which is based on Kubernetes, simply creates a load balancer and a container cluster. To hand the scaling of the cluster over to Kubernetes, autoscaling is simply activated and a minimum/maximum number of containers configured. Kubernetes then takes over the automatic starting and stopping of containers based on, for example, their CPU utilization. Zero-downtime deployment is also abstracted away by Kubernetes: during an update, one container after the other is updated to the new image.
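Assuming an existing Kubernetes deployment named `myapp` (the name and image tag are illustrative), the configuration described above boils down to a few commands:

```shell
# Activate autoscaling with a min/max number of replicas,
# driven here by CPU utilization.
kubectl autoscale deployment myapp --min=2 --max=10 --cpu-percent=80

# Rolling update: Kubernetes replaces one container after the other
# with the new image, giving a zero-downtime deployment.
kubectl set image deployment/myapp myapp=registry.example.com/myapp:1.2.1
```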
To save on server costs, several systems are often installed on one server and distributed across different folders. When a new software release appears, an employee manually updates the respective versions of the systems on the server. If a so-called exploit, i.e. a security vulnerability, is published and third parties thereby gain access to the file system of the server, the security of all customers on the same server is no longer guaranteed, and 100% security of the systems cannot be promised. At this point, it is advisable to move the systems on the servers into Docker containers to keep all systems clean and secure. The Docker containers encapsulate the systems in such a way that it is not possible to access the executing server and its services from within a container. In addition, it is possible to limit customers’ resources so that congestion caused by one customer does not affect the others.
Docker containers use the kernel's namespace features and start each container in its own namespace. As a result, a container can neither see nor change another container's processes. Each container receives its own network stack and can only communicate with other containers if configured accordingly. Another core security feature is control groups, which allow the available resources such as CPU, memory, and disk I/O to be limited per container. Against breakouts between different containers, Docker is as well protected as virtual machines; only kernel exploits could enable an escape here. By far the greatest threat, however, comes from services on the host system, since from there the Docker daemon grants access to all containers. It is therefore advisable to operate only the Docker daemon and an SSH server on the host system and to move all other services into containers.
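The per-container resource limits mentioned above can be set directly at container start; the container name and the limit values below are illustrative:

```shell
# Limit one customer's container to 256 MB of memory and 1.5 CPUs
# so its congestion cannot affect the other customers on the host.
docker run -d --name customer-a --memory 256m --cpus 1.5 myapp

# Observe live resource usage per container.
docker stats customer-a
```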
Conclusion on Part II of Analysis of Docker in DevOps
In this article, we reviewed different types of requirements. However, this is not the end of the requirements: there are also matters like employee onboarding, platform independence, logging, configuration management, and so on. In the next and final part of this series, we will discuss those matters, briefly discuss the use of Docker, and draw a conclusion.