Hey reader, welcome to the free edition of my weekly newsletter. I write about ML concepts, how to build ML products, and how to thrive at work. You can learn more about me here. Feel free to send me your questions and I'm happy to offer my thoughts. Subscribe to this newsletter to receive it in your inbox every week.
Let's say we want to predict the efficiency of a pump two days from now. We have historical data along with labels. To do this, we can build a machine learning model that predicts the pump's efficiency.
Now consider a harder problem: a system of 15 interconnected pumps, where we want to predict the efficiency of the whole system three months from now. There are more variables and dependencies to account for, and the time horizon is much longer. What do we do here?
In problems like this, many variables depend on each other, and simplifying assumptions about them can lead to large errors in the output. The difficulty usually comes from a combination of uncertainty and variability. The system of pumps above is an example where the degrees of freedom are coupled to each other. How do we model this problem? How do we estimate the outcome?
Why do we need Monte Carlo simulation?
In the above problem, we need to assess the impact of all the variables and predict the outcome under uncertainty. A single prediction isn't enough. We need a range of possible outcomes that depends on the operating conditions. This is where Monte Carlo simulation comes into the picture.
Monte Carlo simulation lets us obtain outputs through repeated random sampling. The core idea is to specify the range of input values, let the algorithm repeatedly select random values from that range, and compute the output for each selection. As we repeat this experiment more times, statistics computed from the simulated outputs (such as their average, or the fraction that falls below a threshold) get closer to their true values for the system being simulated. This gives us a range of possible outcomes, and we can assign a probability to each of them.
This helps us understand how our choice of inputs affects the outputs. The range of input values usually goes from conservative to aggressive. By estimating the outcomes at these extremes, we can plan accordingly.
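To make the convergence idea concrete, here is a minimal Python sketch using only the standard library. It uses a toy problem whose exact answer is known (the mean of x² for x drawn uniformly from [0, 1], which is 1/3), so we can watch the error shrink as the number of samples grows. The function name `mc_estimate_mean` is hypothetical and chosen for illustration.

```python
import random

def mc_estimate_mean(f, sampler, n, rng):
    """Monte Carlo estimate of E[f(X)]: average f over n random draws of X."""
    total = 0.0
    for _ in range(n):
        total += f(sampler(rng))
    return total / n

# Toy check: X ~ Uniform(0, 1) and f(x) = x**2, whose exact mean is 1/3.
rng = random.Random(42)  # fixed seed so the run is reproducible
estimates = {}
for n in (10, 1_000, 100_000):
    estimates[n] = mc_estimate_mean(lambda x: x * x, lambda r: r.random(), n, rng)
    print(f"n={n:>7}: estimate={estimates[n]:.4f}, error={abs(estimates[n] - 1/3):.4f}")
```

The error of a Monte Carlo average typically shrinks in proportion to 1/√n, so each extra digit of accuracy costs roughly 100 times more samples.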
Where is it used?
Monte Carlo simulation can be used wherever we can perform a deterministic computation on the input values. It's a popular technique that provides a range of possible outcomes along with the probability of each one occurring. It is particularly useful for modeling physical systems.
It finds applications in many fields, including the physical sciences, climate modeling, finance, energy, engineering, biology, logistics, games, and insurance. In addition to predicting pump efficiency, its use cases include:
predicting energy output from a wind energy farm
estimating variations in circuits
modeling radiation transport
determining the position of a robot in uncertain terrains
modeling fluid flows
How does it work?
Monte Carlo simulation applies to problems where the output can, in principle, be computed deterministically from the inputs. It works by feeding a range of input values into the system and recording the possible outcomes. This range of input values is described by a probability distribution.
In other words, when the algorithm samples the input space, it uses this probability distribution to choose values from that space. Once the values are chosen, it computes the result for that set of values. We keep repeating this experiment with different sets of values drawn randomly from the probability distribution.
Here's the outline of the method:
Specify a range of possible inputs
Choose a probability distribution
Generate random inputs (within our range) using the chosen probability distribution
Do a deterministic computation on the inputs
Gather the results to see how likely each outcome is
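Here is one way the five steps could look in Python, as a minimal sketch using only the standard library. Everything specific in it is an assumption for illustration: `pump_efficiency` is a made-up stand-in for a real engineering model, and the load range (60 to 140, most likely 90) and the triangular distribution are invented inputs, not values from any real pump.

```python
import random

# Hypothetical deterministic model: efficiency peaks at a load of 100 units
# and falls off quadratically on either side. A real system would use a
# model built by domain experts.
def pump_efficiency(load):
    return 0.9 - 0.00005 * (load - 100) ** 2

def simulate(n_runs=50_000, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: loads range from 60 to 140, most likely 90 (assumed),
    # modeled with a triangular distribution.
    low, high, mode = 60.0, 140.0, 90.0
    results = []
    for _ in range(n_runs):
        # Step 3: draw a random input from the chosen distribution.
        load = rng.triangular(low, high, mode)
        # Step 4: deterministic computation on that input.
        results.append(pump_efficiency(load))
    return results

# Step 5: gather the results into outcome probabilities.
results = simulate()
bins = {"< 0.85": 0, "0.85-0.88": 0, ">= 0.88": 0}
for eff in results:
    if eff < 0.85:
        bins["< 0.85"] += 1
    elif eff < 0.88:
        bins["0.85-0.88"] += 1
    else:
        bins[">= 0.88"] += 1
probabilities = {label: count / len(results) for label, count in bins.items()}
print(probabilities)
```

A triangular distribution is a convenient choice when all we know is a minimum, a maximum, and a most likely value; with enough historical data, we would fit a distribution to the data instead.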
Is it the same as "what-if" scenario planning?
Not exactly. In "what-if" scenario planning, we assign values to each uncertain variable ourselves. For example, we may choose three scenarios to account for: the best case, the worst case, and the average case.
In contrast, Monte Carlo simulation chooses the input values itself by sampling from a probability distribution. It produces thousands of possible outcomes, which are then analyzed to get the probability of each outcome occurring.
The "what-if" approach implicitly gives equal weight to every scenario. Monte Carlo simulation, on the other hand, draws few samples from low-probability regions, so each outcome counts in proportion to how likely it is. Instead of trying to come up with an exact number, it produces probabilities. This makes it a more realistic way of modeling the real world.
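A small sketch makes the difference in weighting visible. It uses a made-up pump model and an assumed input distribution (both hypothetical, for illustration only): the what-if approach evaluates three hand-picked loads, while the Monte Carlo approach estimates how often an outcome close to the worst case actually occurs.

```python
import random

# Hypothetical efficiency model: peak at a load of 100, quadratic falloff.
def pump_efficiency(load):
    return 0.9 - 0.00005 * (load - 100) ** 2

# "What-if" planning: we pick three loads ourselves and weigh them equally.
scenarios = {"best": 100.0, "average": 90.0, "worst": 140.0}
what_if = {name: pump_efficiency(load) for name, load in scenarios.items()}
print("what-if:", what_if, "(each weighted 1/3)")

# Monte Carlo: sample loads from an assumed triangular distribution
# (min 60, max 140, most likely 90) and count how often the outcome is
# within 0.01 of the worst-case scenario or worse.
rng = random.Random(1)
n_runs = 100_000
outcomes = [pump_efficiency(rng.triangular(60.0, 140.0, 90.0)) for _ in range(n_runs)]
threshold = what_if["worst"] + 0.01
p_near_worst = sum(eff <= threshold for eff in outcomes) / n_runs
print(f"P(efficiency <= {threshold:.2f}) ~ {p_near_worst:.4f}")
```

With these assumed inputs, outcomes near the worst case turn out to be rare, whereas the what-if table implicitly gives that scenario a one-in-three weight.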
There are no certainties in the real world. Only probabilities.
Monte Carlo methods are useful when there is a high degree of uncertainty in the inputs. When we are dealing with physical infrastructure like a system of pumps, we need to be very careful about how we choose our operating parameters.
Why not just estimate the underlying parameters governing the data?
When we have a set of variables, one approach is to estimate the underlying parameters that govern the data and capture our findings in a model. This is essentially what a machine learning model does.
Let's consider a simple example where we assume that the input data follows a Gaussian distribution. Given this assumption, we estimate the mean and variance of the data. This technique is called point estimation. But real-world data is rarely this well-behaved. It has a lot of uncertainty associated with it, and we usually don't know how the variables depend on each other.
Monte Carlo methods help us overcome this limitation. When we compute an outcome, we also get an estimate of how likely it is to occur, which lets us visualize the full range of possible outcomes. Point estimation, by contrast, doesn't show us how different input variables affect the outcomes.
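The sketch below contrasts the two ideas. It assumes a made-up nonlinear pump model and synthetic "historical" loads (both hypothetical). Point estimation fits a mean and standard deviation and plugs the mean into the model; Monte Carlo pushes the whole fitted distribution through the model, which also yields the probability of a poor outcome.

```python
import random
import statistics

# Hypothetical nonlinear model: efficiency peaks at a load of 100.
def pump_efficiency(load):
    return 0.9 - 0.00005 * (load - 100) ** 2

rng = random.Random(7)
# Synthetic historical loads; in practice these would be measurements.
observed = [rng.gauss(100, 20) for _ in range(500)]

# Point estimation: fit a Gaussian by estimating its mean and standard deviation.
mu_hat = statistics.fmean(observed)
sigma_hat = statistics.stdev(observed)

# Plugging the single estimated mean into the model gives one number.
point_output = pump_efficiency(mu_hat)

# Monte Carlo: propagate the whole fitted distribution through the model.
n_runs = 100_000
outputs = [pump_efficiency(rng.gauss(mu_hat, sigma_hat)) for _ in range(n_runs)]
mc_mean = statistics.fmean(outputs)
p_below = sum(eff < 0.85 for eff in outputs) / n_runs

print(f"point estimate output: {point_output:.4f}")
print(f"Monte Carlo mean:      {mc_mean:.4f}")
print(f"P(efficiency < 0.85):  {p_below:.3f}")
```

Because this model is curved, the output at the average input is not the average output: with these assumed numbers, the point-estimate output comes out noticeably higher than the Monte Carlo mean, and only the simulation tells us how often efficiency drops below 0.85.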
Where to go from here?
Monte Carlo simulation is a powerful technique for understanding physical systems. In physical systems, we can perform deterministic computations on the inputs. These computations are typically models built by domain experts. By running these models on many sampled inputs, we estimate the probabilities of various outcomes.
Machine learning, on the other hand, builds models from known observations and is well suited to making predictions on new input data. Monte Carlo simulation and machine learning are not mutually exclusive: a trained machine learning model can serve as the model inside a Monte Carlo simulation.
Infinite ML podcast
This podcast features conversations with amazing builders and practitioners in machine learning. It now has listeners in 250 cities across 50 countries. Subscribe to the podcast below and let me know what you think:
Apple Podcasts
Spotify
Google Podcasts
Web
ML Job Opportunities
Check out this job board for the open roles in Machine Learning.