What Is A Bayesian Classifier

Why do we need it. How it works. Use cases in the real world.

May 31, 2022

Hey reader, welcome to the 💥 free edition 💥 of my weekly newsletter. I write about ML concepts, how to build ML products, and how to thrive at work. You can learn more about me here. Feel free to send me your questions and I’m happy to offer my thoughts. Subscribe to this newsletter to receive it in your inbox every week.

A classifier is a machine learning model that can identify the category of a given input. The list of possible categories is known beforehand. We train the model to learn from the training dataset and predict the category of a new data point.

There are various real-world use cases for this kind of model, such as face recognition, detecting financial fraud, diagnosing illnesses, sentiment analysis, spam filtering, predicting whether or not it will rain, and more.

When a model makes a prediction, it often outputs a single answer. Consider a system that predicts whether or not it will rain tomorrow. Its output might simply be “it will rain tomorrow”. But how likely is it to rain? How do we build a classifier that can provide the likelihood of each possible outcome?

What are probabilistic classifiers?

When it comes to real world scenarios, we cannot predict the outcomes with absolute certainty. There are algorithms that simply output the best class. But how do we assess the likelihood of an event happening? Using probabilistic classification. Algorithms of this nature use probabilities to find the best answer to a given question.

These algorithms output the probability of the input belonging to each class. The predicted class is the one with the highest probability. Algorithms of this kind are called probabilistic classifiers.
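Here is a minimal sketch of that idea for the rain example. The probabilities and the `predict` helper are made up for illustration. They are not the output of a real model.

```python
def predict(class_probabilities):
    """Return the most probable class and its probability."""
    best_class = max(class_probabilities, key=class_probabilities.get)
    return best_class, class_probabilities[best_class]

# Hypothetical output of a probabilistic classifier for tomorrow's weather.
probabilities = {"rain": 0.8, "no rain": 0.2}

label, confidence = predict(probabilities)
print(f"Prediction: {label} (probability {confidence:.0%})")
```

The useful part is that the probability travels along with the answer, so a downstream system can treat an 80% prediction differently from a 51% one.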

What is Bayes' theorem?

Let's consider two events:

Event A: It will rain tomorrow
Event B: It will not rain tomorrow

Even though we cannot predict the outcome of these events with absolute certainty, we can definitely get a good estimate.

Bayes' theorem says that the probability of an event changes when information about a related event becomes available. For example, let's say that we need to assess the likelihood of Event A. With no other information, a neutral starting guess would be a 50% chance, treating rain and no rain as equally likely.

Now if I tell you that it has rained every single day in the last 7 days, will you change your answer? You certainly will. Bayes' theorem gives us a way to account for the available evidence and update our estimate. We use this theorem to build a machine learning model that accounts for all the evidence available.
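To make the update concrete, here is a small sketch of the rain example. Bayes' theorem says P(A | E) = P(E | A) × P(A) / P(E), where A is "it will rain tomorrow" and E is the evidence "it rained every day for the last 7 days". The 50% prior comes from the example above. The two likelihood values are assumptions I picked purely for illustration.

```python
def bayes_posterior(prior, likelihood_if_true, likelihood_if_false):
    """Return P(A | E) given P(A), P(E | A) and P(E | not A)."""
    # P(E) by the law of total probability.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

prior = 0.5                 # neutral starting guess for rain tomorrow
p_week_if_rain = 0.30       # assumed: P(rainy week | rain tomorrow)
p_week_if_no_rain = 0.05    # assumed: P(rainy week | no rain tomorrow)

posterior = bayes_posterior(prior, p_week_if_rain, p_week_if_no_rain)
print(f"Before the evidence: {prior:.2f}, after: {posterior:.2f}")
```

With these made-up numbers, the estimate moves from 0.50 to roughly 0.86. Different likelihoods would give a different posterior, but the mechanics stay the same.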

What is a Bayesian Classifier?

A Bayesian classifier is a type of probabilistic classifier. Many models have been built using Bayes' theorem. A Bayesian classifier builds a probabilistic model of the features and uses that model to predict the class.

The idea is that if a model knows the category, it can predict the values of the features. If it doesn’t know the category, Bayes' theorem can be used to work backwards and predict the category from the feature values that we do know.

The simplest case of a Bayesian Classifier is the Naive Bayes Classifier, which assumes that all the features are independent of each other once the category is known. It means that for a data point in a given category, the value of one feature doesn’t affect the value of any other feature.

How does a Naive Bayes Classifier work?

Let's consider an example of a deck of cards. You take a card out and put it back. Then you take out another card and put it back again. Now let's define two events:

Event X: The first card is greater than 7
Event Y: The second card is an even number

Events X and Y are totally independent. They don't impact each other. Now let's define another event:

Event Z: The sum of the two cards is greater than 10

Now events X and Z are dependent on each other. If you get 9 on the first card, there is definitely a higher chance that the sum of the two cards will be greater than 10. These two events are not independent.
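You can check this by brute force. The sketch below assumes each draw is a rank from 1 to 13 with equal probability (my simplification of a deck). It tests the textbook definition of independence: two events are independent when P(both) = P(first) × P(second).

```python
from fractions import Fraction
from itertools import product

# Assumption: ranks 1..13, equally likely, drawn with replacement.
RANKS = range(1, 14)
draws = list(product(RANKS, repeat=2))  # every (first, second) pair

def probability(event):
    """Exact probability of an event defined on (first, second)."""
    favorable = sum(1 for first, second in draws if event(first, second))
    return Fraction(favorable, len(draws))

def event_x(first, second):
    return first > 7             # first card is greater than 7

def event_y(first, second):
    return second % 2 == 0       # second card is an even number

def event_z(first, second):
    return first + second > 10   # sum of the two cards is greater than 10

def are_independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    both = probability(lambda f, s: a(f, s) and b(f, s))
    return both == probability(a) * probability(b)

print("X and Y independent:", are_independent(event_x, event_y))
print("X and Z independent:", are_independent(event_x, event_z))
```

Using `Fraction` keeps the arithmetic exact, so the equality check isn't thrown off by floating point rounding.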

In many real world scenarios, the occurrence of one feature depends on the occurrence of other features. Assuming that they are independent would be "naive". Hence the name Naive Bayes Classifier.

If you look at it from a practical standpoint, modeling how every feature depends on every other feature is a time-consuming task. To speed things up, the Naive Bayes Classifier assumes that the input features are independent of each other within each category and computes the probability of an unknown input belonging to each category.

The category with the highest probability is assigned to that input. To train the classifier, the distribution of each input feature within each category is learned from the data. Even with these simplifying assumptions, Naive Bayes Classifiers often give good results relative to many other approaches.
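Here is a rough from-scratch sketch of those steps for categorical features. The tiny weather dataset, the function names, and the use of add-one (Laplace) smoothing to avoid zero probabilities are all my own assumptions for illustration. This is not a production implementation.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count classes and per-class feature values from the training data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_proba(model, row):
    """Return P(class | row) for each class under the naive assumption."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one smoothing (an assumption here).
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score *= numerator / denominator
        scores[label] = score
    normalizer = sum(scores.values())
    return {label: score / normalizer for label, score in scores.items()}

# Hypothetical training data: (sky, humidity) -> did it rain?
rows = [("cloudy", "high"), ("cloudy", "high"), ("cloudy", "normal"),
        ("sunny", "high"), ("sunny", "normal"), ("sunny", "normal")]
labels = ["rain", "rain", "rain", "no rain", "no rain", "no rain"]

model = train(rows, labels)
probs = predict_proba(model, ("cloudy", "high"))
print(max(probs, key=probs.get), probs)
```

Notice that training is just counting. That is a big part of why Naive Bayes is fast: each feature is handled on its own, and the per-feature probabilities are simply multiplied together.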

Where to go from here?

Bayesian classifiers have use cases across a range of scenarios, and they are easy to implement as well. The biggest disadvantage of a Naive Bayes Classifier is the independence assumption. When the features are in fact strongly dependent on each other, this assumption can hurt the performance of the classifier. Most popular machine learning libraries, such as scikit-learn, come equipped with these models. You can take a supervised learning problem and implement a Bayesian classifier to get familiar with it.

