Deployment of machine learning (ML) models means operationalizing your trained model to fulfill its intended business use case. If your model detects spam emails, operationalizing it means integrating it seamlessly into your company’s email workflow, so the next time a spam email arrives, it’s automatically categorized as such. This step is also known as putting models into production.
Without this step, your model is just a nice little pet project, useful for demonstration purposes. Models cannot serve large applications from a developer’s laptop. What happens when the laptop is turned off? Or if it’s stolen?
When are ML models deployed?
Machine learning models are deployed once they have been successful in the development stage, meaning their accuracy is acceptable on a dataset not used during development (also known as validation data). The model’s known faults should also be clearly documented before deployment.
Even if your spam detection model has 98% accuracy, it doesn’t mean it’s perfect. There will always be some rough edges, and that information needs to be clearly documented for future improvement. For example, emails with the words “save the date” in the subject line may always be predicted as spam, even when they aren’t. While this is not ideal, deploying with some of these known faults is not necessarily a deal breaker, as long as you’re able to improve the model’s performance over time.
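To make the validation step concrete, here is a minimal sketch in Python. The keyword rule is a toy stand-in for a real trained model, and the sample emails are invented for illustration; only the pattern of measuring held-out accuracy and noting known faults carries over.

```python
# Minimal sketch (assumed data, rule-based stand-in for a trained model):
# measure accuracy on a held-out validation set before deciding whether
# the model is ready to deploy, and note the faults you find.

def predict_spam(subject: str) -> bool:
    """Toy stand-in for a trained spam model."""
    spam_words = {"winner", "free", "prize"}
    return any(word in subject.lower() for word in spam_words)

# Validation data the model never saw during development.
validation_set = [
    ("You are a winner!", True),
    ("Free prize inside", True),
    ("Quarterly report attached", False),
    ("Save the date: team offsite", False),
    ("Limited time offer", True),   # known fault: slips past the model
]

correct = sum(predict_spam(subj) == label for subj, label in validation_set)
accuracy = correct / len(validation_set)  # 4 of 5 correct -> 0.8
```

Documenting the misclassified example alongside the accuracy figure is what makes the known faults actionable for future improvement.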
How do ML models integrate into applications?
Models can be integrated into applications in several ways. One is to run the model as a separate cloud service that applications access over a network. Another is to embed the model tightly within the application itself, where it shares much of the same computing resources.
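The contrast between the two styles can be sketched as follows. The endpoint URL, feature names, and scoring function are all assumptions for illustration, not a real API.

```python
# Hypothetical sketch of the two integration styles. The in-process call
# is a direct function invocation; the service call is shown as the JSON
# payload an application would POST to a model endpoint (URL assumed).
import json

def model_predict(features: dict) -> float:
    """Stand-in for an in-process model sharing the app's resources."""
    return 0.9 if features.get("contains_spam_words") else 0.1

# Style 1: tightly integrated -- a plain function call, no network hop.
score = model_predict({"contains_spam_words": True})

# Style 2: separate cloud service -- the app serializes its features and
# sends them over the network, e.g.:
request_body = json.dumps({"features": {"contains_spam_words": True}})
# requests.post("https://models.example.com/predict", data=request_body)
```

The trade-off: the in-process style avoids network latency but couples the model to the application’s runtime, while the service style isolates the model but adds a network hop.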
How the model integrates within your business systems requires careful planning. This should ideally happen before any development begins. The setup of the problem you are trying to solve and constraints under which models need to operate will dictate the best deployment strategy.
For example, when detecting fraudulent credit card transactions, we need immediate confirmation of a transaction’s legitimacy. You can’t have a model generate a prediction today that only becomes available tomorrow. With such a time constraint, the model needs to be tightly integrated into the credit card processing application and deliver predictions instantaneously; if it is accessed over a network, it should incur minimal latency.
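One way to keep a real-time constraint honest is to measure inference latency against an explicit budget. The 50 ms budget, the scoring rule, and the transaction fields below are all assumptions for illustration.

```python
# Hypothetical sketch: enforcing a latency budget on a fraud check.
# The 50 ms threshold and the toy model are assumed for illustration.
import time

LATENCY_BUDGET_S = 0.050  # decision must come back within 50 ms

def fraud_score(transaction: dict) -> float:
    """Stand-in for a tightly integrated fraud model."""
    return 0.95 if transaction["amount"] > 10_000 else 0.05

start = time.perf_counter()
score = fraud_score({"amount": 12_500})
elapsed = time.perf_counter() - start

within_budget = elapsed <= LATENCY_BUDGET_S
```

In production you would track `elapsed` across many requests (e.g. the 99th percentile), not a single call, but the principle of an explicit budget is the same.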
For some applications, time is not of the essence, so we can wait for a certain amount of data to “pile up” before the machine learning model is run on it. This is referred to as batch processing. For example, the recommendations you see from a shopping outlet may stay the same for a day or two because they are only periodically “refreshed.” Even if the machine learning models are sluggish, this has little impact as long as the recommendations are refreshed within the expected time range.
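A batch-processing pass can be sketched as scoring everything accumulated since the last refresh in one go. The most-frequent-item rule is a toy stand-in for a real recommendation model, and the user histories are invented.

```python
# Hypothetical sketch of batch scoring: events pile up in a buffer and
# the model scores the whole batch on a schedule (e.g. nightly), rather
# than scoring each event as it arrives.

def recommend(user_history: list[str]) -> str:
    """Toy stand-in for a recommendation model: most-viewed category."""
    return max(set(user_history), key=user_history.count)

# Interactions accumulated since the last refresh.
batch = {
    "user_1": ["shoes", "shoes", "hats"],
    "user_2": ["books", "pens", "books"],
}

# One periodic pass over the entire batch.
recommendations = {user: recommend(history) for user, history in batch.items()}
```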
One of the companies we worked with wanted to develop a model that operates without any network latency. This meant that it had to be tightly integrated into their web application. As a result, we had to ensure that they were careful with the programming language used and external resources relied upon for speed and compatibility with the web application.
3 things to consider when thinking about ML model deployment:
- How will your model integrate into your workflow, product, and business systems?
- Which machine learning algorithms, frameworks, and technologies will support your deployment constraints?
- What data dependencies arise when models are put into production, and what data pipelines will be needed to train and serve them?
The process of planning model deployment should start early on. Without this planning, you may end up with a lot of rework, including rewriting code or using alternative machine learning frameworks and algorithms.
ML deployment challenges
There are many factors that can impact machine learning model deployment. Some common challenges faced during deployment include:
- Time lag. It takes too long to get back results from the model.
- Too large to operationalize. Some models are so large that they cannot easily fit in memory and generate results.
- Poor results. Although the development results may be acceptable, the model performs poorly when it comes time to serve real use cases.
- Lack of metrics. You don’t know how to measure the quality of models that are in production.
- Concept drift. Although the model was initially working well in practice, the quality of predictions is no longer acceptable.
- Silent errors. Your model generates strange predictions, but the cause of those errors is not easy to pinpoint.
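For the concept drift and lack-of-metrics issues above, even a crude production check beats none. The sketch below compares the share of positive predictions in production against the rate seen during development; the baseline rate and the 10-point alert tolerance are assumed values for illustration.

```python
# Hypothetical drift check: alert when the fraction of positive
# predictions in production strays too far from the rate observed
# during development. Baseline and tolerance are assumptions.

DEV_POSITIVE_RATE = 0.20   # e.g. 20% of emails were flagged in development
ALERT_TOLERANCE = 0.10     # alert if production rate drifts >10 points

def drift_alert(recent_predictions: list[bool]) -> bool:
    prod_rate = sum(recent_predictions) / len(recent_predictions)
    return abs(prod_rate - DEV_POSITIVE_RATE) > ALERT_TOLERANCE

stable = [True] * 2 + [False] * 8    # 20% positive: matches development
drifted = [True] * 6 + [False] * 4   # 60% positive: likely drift
```

This only catches shifts in the prediction distribution, not label quality; pairing it with delayed ground-truth labels (where available) gives a fuller picture.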
Some of these issues can be avoided if more time is spent planning and fleshing out constraints and potential pitfalls before and during development. I’ve seen data scientists develop models in a couple of hours only to be dumbfounded when their models don’t work in practice.
If you’ve come across other issues during deployment, feel free to leave a comment below.