How To Measure The Stability Of A Machine Learning Algorithm
What is it? Why do we need to measure it? Which metrics can be used?
The metric that gets measured the most for a machine learning algorithm is its accuracy. New algorithms are being developed that have higher accuracy than the ones before them. These algorithms are used to train models on datasets that represent observations in the real world. What if we use the same algorithm to build a model using a slightly different dataset? Will it result in the same model?
Why do we need to measure stability?
An algorithm is designed to build a model that achieves a specific task. We need our machine learning system to be robust against small variations in the training dataset. Measuring the algorithm's stability lets us check whether it is.
Let's consider a supervised learning example. We need to build an image recognition system that can classify images of cars, bikes, and buses into separate categories. We use 30,000 images for training (10,000 for each category). The output is a model that can reasonably classify an input image into one of the three categories. To measure its performance, we compute its accuracy on a held-out set of images that were not used for training. It tells us how good the model is at classifying images.
As the next step, let's replace 100 images from each category with new images of that same category. We then go ahead and build a classifier model as before. Will this model be the same as before? If we keep repeating this experiment, will the model remain consistent and accurate?
We need this model to stay the same and do its job with the same level of accuracy. How can we measure it? This is where the concept of stability comes into the picture.
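The replace-and-retrain experiment described above can be sketched in a few lines. This is only a toy illustration under loud assumptions: synthetic 2-D points stand in for images, a nearest-centroid classifier stands in for the real image model, and the sizes (300 training samples per class, 10 replaced) are scaled-down, made-up numbers.

```python
import random

def make_samples(rng, label, center, n):
    """Draw n synthetic 2-D points around a class center (toy stand-in for images)."""
    return [((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)), label) for _ in range(n)]

def train_nearest_centroid(data):
    """'Training' here just computes one mean point per class."""
    sums = {}
    for (x, y), label in data:
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / c, sy / c) for label, (sx, sy, c) in sums.items()}

def predict(model, point):
    """Return the label of the closest class centroid."""
    return min(model, key=lambda lab: (point[0] - model[lab][0]) ** 2 + (point[1] - model[lab][1]) ** 2)

def accuracy(model, data):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, p) == label for p, label in data) / len(data)

rng = random.Random(0)
centers = {"car": (0.0, 0.0), "bike": (4.0, 0.0), "bus": (2.0, 4.0)}  # hypothetical class centers

train = {lab: make_samples(rng, lab, c, 300) for lab, c in centers.items()}
test = [s for lab, c in centers.items() for s in make_samples(rng, lab, c, 200)]

model_a = train_nearest_centroid([s for samples in train.values() for s in samples])

# Replace 10 samples per class with fresh samples of the same class, then retrain.
modified = {lab: samples[10:] + make_samples(rng, lab, centers[lab], 10) for lab, samples in train.items()}
model_b = train_nearest_centroid([s for samples in modified.values() for s in samples])

acc_a = accuracy(model_a, test)
acc_b = accuracy(model_b, test)
# How often do the two models give the same answer on the same held-out input?
agreement = sum(predict(model_a, p) == predict(model_b, p) for p, _ in test) / len(test)
print(f"accuracy before: {acc_a:.3f}, after: {acc_b:.3f}, prediction agreement: {agreement:.3f}")
```

Comparing the two accuracies tells you whether the model still does its job equally well. The agreement score tells you whether it is still, in practice, the same model.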
What is stability?
Stability is a concept in machine learning that aims to quantify how a model changes when we change the training dataset. A machine learning algorithm is said to be stable if the model doesn’t change much when the training dataset is modified. Now how do we define "much" in this case? We define it by using an upper limit.
A model tends to change when we change the training dataset. But if the algorithm is well designed, it shouldn’t change the model drastically. In fact, we don't want it to change more than a certain threshold. If it satisfies this condition, the algorithm is said to be "stable".
There are a few different ways in which the training dataset can change:
replacing the samples in the training dataset
reducing the size of the training dataset
choosing a different subset for training
presence of noise in the training dataset
missing data in the training dataset
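Each of the changes in the list above can be written as a small function that takes a training set and returns a modified copy. The sketch below assumes a hypothetical data layout (a list of `(features, label)` pairs with numeric features). The function names are made up for illustration and do not come from any library.

```python
import random

def replace_samples(data, fresh, k, rng):
    """Swap k randomly chosen training samples for k samples from `fresh`."""
    kept = rng.sample(data, len(data) - k)
    return kept + rng.sample(fresh, k)

def shrink(data, fraction, rng):
    """Keep only a random `fraction` of the training set."""
    return rng.sample(data, int(len(data) * fraction))

def different_subset(pool, size, rng):
    """Draw a new training set of `size` samples from a larger pool."""
    return rng.sample(pool, size)

def add_noise(data, sigma, rng):
    """Add Gaussian noise to every numeric feature."""
    return [([v + rng.gauss(0.0, sigma) for v in feats], label) for feats, label in data]

def drop_values(data, rate, rng):
    """Blank out each feature value with probability `rate` (None marks missing)."""
    return [([None if rng.random() < rate else v for v in feats], label) for feats, label in data]

rng = random.Random(0)
train = [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], "car") for _ in range(100)]
fresh = [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], "car") for _ in range(20)]

# One modified training set per kind of change; the original `train` is left untouched.
variants = {
    "replaced": replace_samples(train, fresh, 10, rng),
    "smaller": shrink(train, 0.5, rng),
    "subset": different_subset(train + fresh, 100, rng),
    "noisy": add_noise(train, 0.1, rng),
    "missing": drop_values(train, 0.05, rng),
}
```

Training the same algorithm on each entry of `variants` and comparing the resulting models is one simple way to probe how it reacts to each kind of change.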
How do we measure stability?
Let's say we have an algorithm and a training dataset. We create a set of models using different subsets of the dataset and then measure the error for each one. The goal of stability analysis is to come up with a reasonable upper limit on these errors and on how much they can differ from one model to the next. This upper limit is called the stability of that algorithm.
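Here is a rough empirical version of that procedure. It is a sketch under assumptions, not a formal stability bound: the data is synthetic, the "algorithm" is a 1-D least-squares line fit, and the subset size (150 out of 200) and number of repetitions (30) are arbitrary choices.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mean_squared_error(model, points):
    """Average squared difference between the line's prediction and the true y."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)

rng = random.Random(42)

def sample_point():
    """Synthetic data: y = 2x + 1 plus Gaussian noise (made-up ground truth)."""
    x = rng.uniform(0.0, 10.0)
    return x, 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)

train = [sample_point() for _ in range(200)]
test = [sample_point() for _ in range(200)]

# Train one model per random subset and record its error on the same test set.
errors = []
for _ in range(30):
    subset = rng.sample(train, 150)
    errors.append(mean_squared_error(fit_line(subset), test))

observed_upper_limit = max(errors)   # largest error seen across the models
spread = max(errors) - min(errors)   # how much the error moves between models
print(f"largest error: {observed_upper_limit:.3f}, spread: {spread:.3f}")
```

The numbers printed here only describe the subsets that happened to be drawn. A stability analysis aims for a limit that holds for any such subset.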
We aim to keep this upper limit as low as possible. Why is that? Let's consider an example. You want to get some repair work done on your house. You call a services firm to get an estimate of the cost. They tell you that they can't give an exact number, but that it's definitely going to be less than $9,000,000. That might be factually correct, but the information is useless. The upper limit needs to be reasonable so that you can decide whether to go ahead with this firm.
To measure stability, we need metrics that are easy to understand and calculate, so that we can estimate the stability of a given algorithm. There are many definitions of stability. You can use the one that's most relevant to your use case.
We assign a stability value to a machine learning algorithm with respect to a loss function. Measuring the stability depends on what loss function is being used to measure the performance of the algorithm. When you specify a stability value of an algorithm, you always need to specify the corresponding loss function that's being used in this context.
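To make the role of the loss function concrete, here is one standard formalization from the learning-theory literature, usually called uniform stability. It is not named in the text above, so treat it as an illustrative example. The symbols $A$, $S$, $m$, $\ell$, and $\beta$ are notation introduced here.

```latex
% Uniform stability of a learning algorithm A with respect to a loss \ell.
% S is a training set of m samples; S^{\setminus i} is S with the i-th sample removed;
% A_S is the model that A produces when trained on S; z is any sample.
\[
\forall S,\ \forall i \in \{1,\dots,m\}:\quad
\sup_{z}\ \bigl|\, \ell(A_S, z) - \ell(A_{S^{\setminus i}}, z) \,\bigr| \;\le\; \beta
\]
```

The stability value is the $\beta$ on the right-hand side. Because the loss $\ell$ appears inside the inequality, swapping in a different loss function generally gives a different $\beta$ for the same algorithm.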
Here are a few common measures of stability:
Leave-one-out cross-validation Stability
You can look up how these values are computed. Each metric aims to probe a different aspect of the given algorithm.
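As a rough illustration of the leave-one-out idea: train once on the full dataset, then retrain with one sample left out, and compare the loss of the two models on that left-out sample. Repeating this for every sample and taking the largest change gives an empirical estimate. The sketch below assumes synthetic data, a tiny 1-D ridge regression as the algorithm, and squared error as the loss. It is not the formal definition, which you should look up for the exact conditions.

```python
import random

def fit_ridge(points, lam=1.0):
    """1-D ridge regression without intercept: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in points) / (sum(x * x for x, _ in points) + lam)

def squared_loss(w, point):
    """Squared error of the prediction w*x against the true y."""
    x, y = point
    return (w * x - y) ** 2

rng = random.Random(7)
data = []
for _ in range(100):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + rng.gauss(0.0, 0.5)))  # made-up ground truth: y = 3x + noise

w_full = fit_ridge(data)

# For each sample: retrain without it, then compare both models' loss on that sample.
changes = []
for i, left_out in enumerate(data):
    w_loo = fit_ridge(data[:i] + data[i + 1:])
    changes.append(abs(squared_loss(w_full, left_out) - squared_loss(w_loo, left_out)))

loo_stability_estimate = max(changes)
print(f"largest leave-one-out change in loss: {loo_stability_estimate:.4f}")
```

A small value means that no single training sample has much influence on how the model treats that sample, which is the kind of behavior a stable algorithm should show.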
Where to go from here?
The goal of having stability metrics is to put an upper limit on the generalization error of a machine learning algorithm. This allows us to see how sensitive it is with respect to changes in the training dataset. Once we understand it, we can make the necessary changes to the algorithm to make it more robust.