Data science, machine learning (ML), and artificial intelligence have generated real hype in recent years and received a lot of attention in the industry. Machine learning methods are used to increase either the productivity of users or the interactivity of an application. Numerous data science teams spend their time training machine learning models.
However, we observe two types of problems that arise in practice.
In this article, we explain how continuous delivery works for machine learning models and cover the basics of evaluating model performance. Above all, we show how to accommodate the data engineering, model engineering, and software engineering pipelines in a single CI/CD pipeline. In addition, we show how to automate the manual process of deploying ML models using DevOps practices.
Either the majority of ML models never make it into a software product, or model deployment takes too much time. We have identified several reasons for this problem:
- ML model deployment is a complex process. In general, it involves managing three pipelines: data engineering, model engineering, and software engineering.
- There are no standardized processes for bringing an ML model into the production environment. Machine learning operations (MLOps) is still in its early stages.
- Defining the right infrastructure stack to automate machine learning deployments currently requires trial and error. Many tools and systems for monitoring machine learning and AI models are still young commercial offerings.
- Data engineering, in which you first analyze whether the necessary data is available, or what effort is required to obtain the data and provide it with labels (for training the algorithms). Data engineering can also include further steps such as data integration, preparation, cleaning, and validation.
- Model engineering, in which different classification algorithms are trained on the data, producing different ML models. By evaluating their performance, you decide which ML model is best suited to the problem.
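The model engineering step above can be sketched as a small selection loop: score every candidate model on a held-out validation set and keep the best one. The following toy example uses plain Python, with trivial rule-based functions standing in for real trained classifiers; all names and data are illustrative.

```python
# Toy sketch of model engineering: several candidate "models" are scored
# on a validation set and the best-performing one is selected.
# The candidates here are trivial functions standing in for trained classifiers.

def always_travel(doc):
    return "travel"

def keyword_model(doc):
    return "hotel" if "hotel" in doc else "travel"

def accuracy(model, samples):
    """Fraction of validation samples the model labels correctly."""
    correct = sum(1 for text, label in samples if model(text) == label)
    return correct / len(samples)

validation_set = [
    ("hotel invoice for two nights", "hotel"),
    ("train ticket to Berlin", "travel"),
    ("hotel booking confirmation", "hotel"),
    ("taxi receipt", "travel"),
]

candidates = {"always_travel": always_travel, "keyword_model": keyword_model}
scores = {name: accuracy(m, validation_set) for name, m in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, scores[best_name])
```

In a real project, the candidates would be models produced by different training algorithms, and the score would be a task-appropriate metric rather than plain accuracy.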
The implementation of machine learning in software systems takes place in a few phases.
Even the best ML solution will fail if it does not solve a usage or business problem. Therefore, in this first phase, we try to find out who the users of our software are. What problems do they have? Which of these problems are best solved with ML? And most importantly, can we accept a nondeterministic solution through ML? Let’s take as an example software for billing professional travel expenses. This software could automatically detect some metadata fields, such as the category of a document, and thus facilitate the process of travel expense reporting.
This phase is crucial for the success of an ML project. However, it may take some time until the right problem is found.
After we have analyzed the workflows in software usage and identified use cases (tasks) suitable for ML, we can convert these use cases into ML projects. You then start the so-called Data Science Research. In this second phase, you go through a series of processes iteratively:
The mapping of the user problem to ML algorithms. You analyze and identify which ML algorithm is best suited to solve the problem. In our example, we chose supervised ML, because category recognition is a classification task.
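To illustrate why category recognition is a supervised classification task, here is a minimal, self-contained sketch: a tiny bag-of-words classifier that learns from labeled example documents and predicts the category of a new one. The categories, documents, and scoring rule are invented for illustration; a real system would use a proper ML library.

```python
from collections import Counter

# Minimal supervised text classifier: learn word counts per category
# from labeled training documents, then predict the category whose
# vocabulary overlaps most with a new document (a naive scoring rule).

training_data = [
    ("hotel invoice two nights breakfast", "hotel"),
    ("train ticket second class berlin", "transport"),
    ("flight booking economy return", "transport"),
    ("hotel booking confirmation double room", "hotel"),
]

def train(samples):
    """Build a word-frequency profile per category."""
    profiles = {}
    for text, label in samples:
        profiles.setdefault(label, Counter()).update(text.split())
    return profiles

def predict(profiles, text):
    """Score each category by summed counts of the document's words."""
    scores = {label: sum(counter[w] for w in text.split())
              for label, counter in profiles.items()}
    return max(scores, key=scores.get)

profiles = train(training_data)
print(predict(profiles, "hotel room invoice"))    # expected: hotel
print(predict(profiles, "return train ticket"))   # expected: transport
```

The key property that makes this supervised learning: the training documents carry known labels, and the model generalizes from them to unseen documents.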
Operationalization of Machine Learning
Training the ML models can take a lot of time and be challenging, but the real challenge is integrating an ML system into the production environment, i.e. into the software product that interacts with users. An ML system consists of three main elements: training data, ML model and code for model training. We use the DevOps principles for ML systems (MLOps) to combine ML development and operations. As an extension of DevOps, MLOps is dedicated to automating and monitoring (in all steps) the integration of ML systems into software projects.
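In a CI/CD pipeline, the MLOps practice described above often includes an automated quality gate: before a newly trained model is deployed, its evaluation metrics are compared against required thresholds. A minimal sketch of such a gate, with hypothetical metric names and threshold values (in a real pipeline the metrics would come from the training job's output artifact):

```python
import sys

# Minimal CI/CD quality gate for an ML model (sketch).
# In a real pipeline the metrics would be read from the training job's
# output (e.g. a metrics file); here they are hard-coded for illustration.

def passes_gate(metrics, thresholds):
    """Return True only if every tracked metric meets its threshold."""
    return all(metrics.get(name, 0.0) >= minimum
               for name, minimum in thresholds.items())

new_model_metrics = {"accuracy": 0.91, "f1": 0.88}   # hypothetical values
required = {"accuracy": 0.90, "f1": 0.85}            # hypothetical gate

if passes_gate(new_model_metrics, required):
    print("gate passed: model may be deployed")
else:
    print("gate failed: keeping the current model")
    sys.exit(1)
```

A non-zero exit code causes the pipeline stage to fail, which is how most CI systems block the deployment step.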
The measures may be applied for several purposes, including:
- Model evaluation: a developer may want to know how good the model is, i.e., how reliable its predictions are, how frequent the errors are, or the expected magnitude of the errors;
- Model comparison: a developer may want to compare two or more models to offer a client the option to choose between them;
- Out-of-sample and out-of-time comparisons: a developer may want to check a model’s performance on new data from the client to evaluate whether performance has worsened.
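An out-of-time comparison like the one above amounts to computing the same measure on an older and a newer data slice and flagging degradation. Everything in this sketch (the labels, the values, the 10-point tolerance) is illustrative:

```python
# Sketch of an out-of-time check: the same accuracy measure is computed
# on the original test slice and on a newer slice of client data, and
# degradation is flagged if performance dropped by more than a tolerance.

def accuracy(predictions, actuals):
    correct = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return correct / len(actuals)

# Hypothetical labels: model predictions vs. true values per sample.
old_preds, old_actuals = [1, 0, 1, 1, 0], [1, 0, 1, 0, 0]   # accuracy 0.8
new_preds, new_actuals = [1, 1, 0, 1, 0], [0, 1, 1, 1, 0]   # accuracy 0.6

old_acc = accuracy(old_preds, old_actuals)
new_acc = accuracy(new_preds, new_actuals)
degraded = new_acc < old_acc - 0.10   # illustrative 10-point tolerance

print(f"old={old_acc:.2f} new={new_acc:.2f} degraded={degraded}")
```

The same pattern applies to out-of-sample checks; only the source of the comparison slice changes.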
Most of the commonly used model-performance measures are based on comparing the model’s predictions with the values in a provided dataset. In real life, predictions and the dependent-variable values are not equal, and hence, for professional service delivery, we want to quantify this disagreement.
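Quantifying this disagreement means computing an error measure over prediction/actual pairs. A minimal sketch of two common measures for numeric outputs, in plain Python (real projects would typically use a library such as scikit-learn); the sample values are invented:

```python
import math

# Two standard measures of the disagreement between model predictions
# and the observed dependent-variable values.

def mean_absolute_error(predictions, actuals):
    """Average absolute deviation between prediction and actual value."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

def root_mean_squared_error(predictions, actuals):
    """Square root of the mean squared deviation (penalizes large errors)."""
    mse = sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)
    return math.sqrt(mse)

predictions = [100.0, 210.0, 150.0]   # hypothetical model outputs
actuals     = [110.0, 200.0, 150.0]   # hypothetical observed values

print(mean_absolute_error(predictions, actuals))
print(root_mean_squared_error(predictions, actuals))
```

For classification tasks, the analogous measures compare predicted and true labels (accuracy, error rate, and so on) rather than numeric deviations.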
The goal of this article is to point self-employed professionals toward these measures so they can offer better service to their clients.
Normally, we use rapid application development (RAD) tools for the experimental phase. The main focus of RAD tools is on creating ML prototypes and functional requirements. Very popular are, e.g., Jupyter Notebook or cloud-based environments such as Colab from Google. There are several advantages and disadvantages to using RAD tools. The data is prepared, new features are derived, and various machine learning models are trained. But how will a developer evaluate the performance of these models? For this purpose, so-called model performance monitoring is being introduced in the machine learning area.