Hey reader, welcome to the free edition of my weekly newsletter. I write about ML concepts, how to build ML products, and how to thrive at work. You can learn more about me here. Feel free to send me your questions and I'm happy to offer my thoughts. Subscribe to this newsletter to receive it in your inbox every week.
The activation function is a key element of neural networks. It shapes how each neuron responds to its inputs, which in turn determines how the network learns from training data and how the model makes predictions. Deep learning research pays a lot of attention to how layers are designed, but what role do activation functions play in neural networks? And what types of activation functions are available?
Why do we need it?
A neural network consists of a network of neurons that process input data in a certain way and produce an output. These neurons "fire" as needed. You can think of a neuron firing as a bulb lighting up in a network of bulbs. Within a network, some neurons will fire and some won't, depending on the input data. And this process decides how the network learns and makes predictions.
What does a neuron do? A neuron is a mathematical unit that receives multiple input values and produces an output. It calculates a weighted sum of these inputs, adds a bias value, and then needs to decide whether it should fire or not. Since the inputs can take any value, this weighted sum can also take any value.
So how do we decide whether a neuron should fire or not? This is where activation functions come into the picture.
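To make that concrete, here's a minimal NumPy sketch of a single neuron. The input values, weights, and bias are made up for illustration, and the "fire if positive" rule at the end is just a stand-in for the activation functions discussed below.

```python
import numpy as np

# Hypothetical inputs, weights, and bias for one neuron (illustrative values)
x = np.array([0.5, -1.2, 3.0])   # input values
w = np.array([0.8, 0.1, -0.4])   # learned weights
b = 0.2                          # learned bias

# Weighted sum of inputs plus bias: this can be any real number
z = np.dot(w, x) + b

# A stand-in "fire or not" rule, which is the activation function's job
fires = z > 0
print(z, fires)
```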
What is an activation function?
The goal of an activation function is to take the value computed by a neuron and decide whether it should fire or not. Other neurons in the network need this information. If a neuron fires, we say that it's activated. Hence the name "activation function".
There are many types of activation functions. Many of these functions are nonlinear in nature. We use activation functions to introduce nonlinearity in the network, which enables the neural network to learn complex tasks.
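A quick way to see why nonlinearity matters: if you stack two layers with no activation function between them, the result is still a single linear transformation. This toy NumPy example (with made-up weight matrices and biases omitted for simplicity) shows the collapse.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # first "layer" weights (made up)
W2 = rng.normal(size=(2, 4))  # second "layer" weights (made up)
x = rng.normal(size=3)        # a random input

# Two linear layers with no activation in between...
two_layers = W2 @ (W1 @ x)

# ...collapse into one linear layer whose weights are W2 @ W1
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True: no extra expressive power
```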
A neural network has three types of layers:
Input layer: This layer accepts input data. It doesn't do any computation. The neurons just pass the input data to the first hidden layer.
Hidden layers: These layers take the input data from the input layer, process it, and pass it on to the next hidden layer. These neurons are not exposed to the outside world, which is why they're called "hidden" layers.
Output layer: This layer makes the predictions. The last hidden layer is connected to this layer. It provides a way for the network to produce an output based on what it has learnt.
All hidden layers in a neural network tend to use the same activation function. The output layer usually uses a different activation function. The choice of activation function depends on what we want to achieve.
Types of activation functions
There are many types of activation functions, but a few have gained prominence over the years.
Linear function: A linear function is just a straight line. Using it as the activation makes the whole network equivalent to linear regression, so it doesn't take advantage of the neural network architecture to build models that can learn complex tasks.
Sigmoid function: This is an 'S'-shaped curve that crosses the Y axis at 0.5. Every value that enters this function is mapped to a value between 0 and 1. That makes it a natural fit for binary classification (yes/no) tasks, and it can also be used as an activation function for hidden layers.
Tanh: It's the hyperbolic tangent function. All the values that enter this function are mapped to values ranging from -1 to 1. It's a scaled and shifted version of the sigmoid function that passes through the origin, which helps center the data around 0. This zero-centering makes it a better choice than the sigmoid function for hidden layers, where it works well as an activation function.
ReLU: It stands for Rectified Linear Unit and is one of the most popular activation functions for hidden layers. It's a straightforward function: the output equals the input if the input is positive, and 0 otherwise. It's computationally cheaper than the sigmoid and tanh functions because it only needs a comparison, with no exponentials to evaluate. And since it outputs exactly 0 for negative inputs, only some of the neurons are active at any given time, which makes the network's activations sparse. This makes ReLU an efficient choice.
Softmax: The softmax function is a generalized version of the sigmoid function. It converts a vector of numbers into probabilities based on their relative values. The sigmoid function is good for binary classification tasks, but it won't work for multiclass classification problems where the number of output categories is more than two. This is where softmax comes into play. It works great for multiclass problems and is generally used in the output layer.
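Here's a minimal NumPy sketch of the nonlinear functions described above. These are the textbook formulas, not the hardened versions you'd find inside a deep learning library, and the input vector is just an example.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real number into the range (-1, 1), centered at 0
    return np.tanh(z)

def relu(z):
    # Passes positive values through unchanged, zeros out the rest
    return np.maximum(0, z)

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1
    exp_z = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exp_z / exp_z.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z))
```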
ReLU and tanh are used for hidden layers. Linear and softmax are used for output layers. Sigmoid can be used for both hidden layers and the output layer. A popular combination in the real world is ReLU for the hidden layers and softmax for the output layer: it's fast to train and tends to generalize well across many tasks.
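As an example, here's what that popular combination might look like in Keras (assuming TensorFlow is installed). The layer sizes, the 20 input features, and the 10 output classes are made-up choices for illustration, not a recommendation.

```python
from tensorflow import keras

# A small classifier: ReLU in the hidden layers, softmax in the output layer.
# Input size (20 features) and output size (10 classes) are arbitrary here.
model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```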
Where to go from here
Understanding activation functions is critical when you're building neural networks. As neural networks learn to do more complex tasks, choosing a good activation function becomes even more important. All these activation functions are available as part of standard Python packages. You should experiment with these functions and see how they work so that you can choose the right one based on the task at hand.
Two new episodes on the Infinite ML pod
Thom Ives on his AI journey, creating educational content, establishing online presence, building a portfolio of projects, preventing burnout, how growth happens through cycles, storytelling for data scientists.
Duration: 41 mins
Apple Podcasts
Spotify
Google Podcasts
What's new in ML: I talk about the latest ML news, including brain-computer interfaces, nanomagnetic computing, counting microplastics using machine learning, quantum tunneling memory, wine reviews written by AI, spotting cavities, and cancer research.
Duration: 16 mins
Apple Podcasts
Spotify
Google Podcasts
Job Opportunities in AI
Check out this job board for the latest opportunities in AI. It features a list of open roles in Machine Learning, Data Science, Computer Vision, and NLP at startups and big tech.
How would you rate this week's newsletter?
You can rate this newsletter to let me know what you think. Your feedback will help make it better.