What Is Gradient Boosting In Machine Learning
Why do we need it? How does it work? Where is it used?
Hey reader, welcome to the 💥 free edition 💥 of my weekly newsletter. I write about ML concepts, how to build ML products, and how to thrive at work. You can learn more about me here. Feel free to send me your questions and I’m happy to offer my thoughts. Subscribe to this newsletter to receive it in your inbox every week.
In last week's newsletter, we discussed how ensemble learning works and the various methods used to build ensemble models. Gradient Boosting has become really popular in machine learning and is now the go-to algorithm for many problems. What's the reason? Why do people favor it so much?
Why do we need boosting in the first place?
Let's say you want to buy a wireless speaker. There are many options available in the market, and you need to decide which one to buy given all your constraints. To do that, you build separate models, each of which estimates whether or not you should buy a given speaker under consideration.
Let's say you built 4 models where each model uses one parameter as shown below:
Model 1 only uses the price factor to determine what speaker to buy
Model 2 only uses the information that your friends provided you to determine what speaker to buy
Model 3 only uses online reviews to determine what speaker to buy
Model 4 only uses what the salesperson told you to determine what speaker to buy
The outcome from each of these models will be different. The price can be misleading. Online reviews could be biased. Your friends might not know anything about wireless speakers. And the salesperson might be trying to sell you something that's not right for you.
So how do we make the final decision?
If you make a decision based on just one model, there's a high likelihood that you'll make a suboptimal decision. Each model individually is a weak learner. But if we combine all of them, then you can make a decision with a high degree of confidence.
You can take the outcome of each model and then average it out to make the final decision. But that is still vulnerable because you are not strengthening each model. That’s where boosting comes into the picture.
You can take the outcome of the first model and let the second model learn from its mistakes. And then you can take the outcome of the second model and let the third model learn from its mistakes. This is how boosting works. A boosting algorithm combines weak learners sequentially to build a strong model. It allows each model to learn from the previous model's mistakes.
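To make that concrete, here's a tiny sketch (using scikit-learn and a made-up toy dataset, purely for illustration) where a second small model is trained on the first model's mistakes:

```python
# A toy sketch of sequential correction: model 2 learns from model 1's mistakes.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)  # toy target

model_1 = DecisionTreeRegressor(max_depth=1).fit(X, y)         # first weak learner
mistakes = y - model_1.predict(X)                               # where model 1 went wrong

model_2 = DecisionTreeRegressor(max_depth=1).fit(X, mistakes)   # second learner fits the mistakes
combined = model_1.predict(X) + model_2.predict(X)              # combined prediction

print("error of model 1 alone:", np.mean((y - model_1.predict(X)) ** 2))
print("error of models 1 + 2: ", np.mean((y - combined) ** 2))
```

Adding the second model's correction on top of the first model already lowers the error, and you can keep repeating this step with more weak learners.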
Bagging vs boosting
There are 2 main ways to combine weak learners to create the final model:
Parallel method (called bagging)
Sequential method (called boosting)
In the parallel method, you train all the weak learners simultaneously. You then combine them by averaging the outputs. The weak learners are trained independently. They don't learn from each other. Random Forests are built this way.
In the sequential method, you train the weak learners one after the other. The goal for a weak learner is to learn from the mistakes of the previous weak learner. This allows the overall model to become more robust. AdaBoost and Gradient Boosting models are built this way.
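Here's a rough side-by-side sketch of the two approaches using scikit-learn (the dataset and settings below are made up for illustration):

```python
# A rough bagging-vs-boosting comparison with scikit-learn on a toy dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: many trees trained independently (in parallel), then averaged.
bagging = RandomForestClassifier(n_estimators=100, random_state=0)
bagging.fit(X_train, y_train)

# Boosting: shallow trees trained one after the other, each correcting the previous ones.
boosting = GradientBoostingClassifier(n_estimators=100, max_depth=2, random_state=0)
boosting.fit(X_train, y_train)

print("random forest (bagging):", bagging.score(X_test, y_test))
print("gradient boosting:      ", boosting.score(X_test, y_test))
```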
What is Gradient Boosting?
When it comes to boosting, there are two main algorithms available: AdaBoost and Gradient Boosting. We’ll discuss XGBoost soon, which is an extended version of Gradient Boosting.
In the AdaBoost algorithm, each weak learner is a decision tree. The algorithm assigns a weight to every observation. It gives more weight to observations that are difficult to classify and less weight to those that are easy to classify. The goal here is to highlight the instances that are difficult to classify so that the subsequent weak learner can learn from them. This mechanism gives the final model a chance to learn how to classify them correctly.
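Here's a simplified sketch of that reweighting step (a stripped-down version of the update AdaBoost actually uses, on a made-up toy dataset):

```python
# A simplified sketch of AdaBoost's reweighting step on a toy dataset.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy labels

weights = np.full(len(y), 1 / len(y))          # start with equal weight on every observation
stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
predictions = stump.predict(X)

error = weights[predictions != y].sum()        # weighted error of this weak learner
alpha = 0.5 * np.log((1 - error) / error)      # how much say this learner gets in the final vote

weights *= np.exp(alpha * (predictions != y))  # increase the weight of misclassified points
weights /= weights.sum()                       # renormalize; the next stump focuses on the hard cases
```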
In Gradient Boosting, each weak learner is a decision tree as well, same as AdaBoost. So how are the two different? The difference lies in the way Gradient Boosting identifies where the model is making mistakes.
AdaBoost looks at which data points the current model misclassifies and gives them more weight, so that the subsequent weak learner can focus on them.
In Gradient Boosting, an incorrect result is not given a higher weight. Instead, the algorithm uses a loss function to measure how far off the model's predictions are. Why use a loss function? Because minimizing the loss function is the same as minimizing the error. In each round, Gradient Boosting fits a new weak learner to the negative gradient of that loss (for squared-error loss, this is simply the residual) and adds it to the model, reducing the error iteratively. Since it minimizes the loss function using gradient descent, it's called Gradient Boosting.
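Here's a bare-bones sketch of that loop for regression (toy data and settings, purely for illustration):

```python
# A bare-bones gradient boosting loop for regression with squared-error loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)  # toy regression target

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # start from a constant model
trees = []

for _ in range(100):
    # The negative gradient of 0.5 * (y - prediction)^2 w.r.t. prediction is the residual.
    negative_gradient = y - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X, negative_gradient)
    trees.append(tree)
    # Take a small step "downhill" by adding the new tree's correction.
    prediction += learning_rate * tree.predict(X)

print("training error after boosting:", np.mean((y - prediction) ** 2))
```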
Where to go from here?
XGBoost stands for Extreme Gradient Boosting and is an extended version of Gradient Boosting. It's a library that implements machine learning algorithms under the Gradient Boosting framework and heavily optimizes the training process. It has garnered a lot of attention for its high performance, versatility, and scalability.
If you want to see gradient boosting in action, you can check out the XGBoost library. It's easy to use, available in many languages, and lets you build models for many types of classification and regression tasks.
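For example, here's roughly what a quick experiment with XGBoost's Python package looks like (the dataset and parameters below are just placeholders):

```python
# A quick look at the XGBoost library on a toy classification task
# (assumes `pip install xgboost scikit-learn`).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=300,     # number of boosted trees
    max_depth=3,          # depth of each weak learner
    learning_rate=0.1,    # shrinkage applied to each tree's contribution
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```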
🎙 New Podcast Episode
The latest episode of Infinite Machine Learning features Mihail Eric. He is the founder of Pametan Data Innovation and Confetti AI. He has published papers at some of the world's top conferences including ACL, AAAI, and NeurIPS. He has helped start teams at innovative companies like RideOS and Amazon Alexa. You can listen to the episode on Apple Podcasts, Google Podcasts, or Spotify.
🔥 Featured Job Opportunities
Check out Infinite AI Job Board for the latest job opportunities in AI. It features a list of open roles in Machine Learning, Data Science, Computer Vision, and NLP at startups and big tech.
💁🏻‍♀️ 💁🏻‍♂️ How would you rate this week’s newsletter?
You can rate this newsletter to let me know what you think. Your feedback will help make it better.