I had the same question for a long time. The confusion comes when you read that the activation function must be non-linear (curve-like), yet ReLU, the default choice for most neural networks, looks almost linear. Let me explain how I answered that question for myself. Although it might not be mathematically rigorous, it cleared up my confusion, so I hope it helps.
A single neural network unit without an activation function looks like this:
y = w*x + b
- x is the input feature
- w is the weight
- b is the bias
- y is the output
How far can we go with this function?
We can only model a linear relationship between x and y. However, most problems in the real world are non-linear, meaning the relationship between the variables cannot be expressed with just multiplication and addition. Hence, we need something else to express it.
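To see why, here is a minimal sketch (using NumPy; not from the original answer) showing that stacking two linear units without any activation in between collapses into a single linear function, so the extra layer adds no expressive power:

```python
import numpy as np

x = np.linspace(-3, 3, 7)          # some example inputs

# Layer 1: y1 = w1*x + b1
w1, b1 = 2.0, 1.0
# Layer 2: y2 = w2*y1 + b2
w2, b2 = -0.5, 3.0

y2 = w2 * (w1 * x + b1) + b2       # two "layers" applied in sequence

# The same result comes from one linear function with
# w = w2*w1 and b = w2*b1 + b2, so nothing non-linear was gained.
w, b = w2 * w1, w2 * b1 + b2
print(np.allclose(y2, w * x + b))  # True
```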
Adding `if-else` expressions to the model is one way to do that. These conditions allow the model to output different values based on some criteria. For example, if the image has an eye in it, output the human class; otherwise, output the alien class.

You can think of non-linearity as an `if-else` condition in the model. We can't directly put `if-else` conditions into our models, because the main point of the model is to learn these conditions by itself. Therefore, we need to give the model the capability to express `if-else` conditions on its own and learn the criteria.
Why is ReLU a non-linear activation function?
Let's see what ReLU looks like:
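In formula form, ReLU(x) = max(0, x). Here is a tiny illustrative sketch (my own, not from the original answer) of ReLU written as an explicit if-else, which makes its two pieces visible:

```python
def relu(x):
    # ReLU spelled out as an if-else instead of max(0, x)
    if x < 0:
        return 0.0   # left piece: negative inputs are cut to 0
    else:
        return x     # right piece: positive inputs pass through unchanged

print([relu(v) for v in (-2.0, -1.0, 0.0, 1.0, 2.0)])
# [0.0, 0.0, 0.0, 1.0, 2.0]
```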
It has two pieces: for inputs below 0 it outputs 0, and for inputs above 0 it outputs the input unchanged. Adding this function allows the model to output different values based on the input, imitating an `if-else` condition. So even though ReLU doesn't look like a curvy function, it does the job perfectly. In addition, ReLU keeps almost all the nice properties of a linear function.
Although `if-else` conditions do not describe non-linear activation functions with mathematical accuracy, you can at least interpret them that way.
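As a closing illustration, here is a toy sketch (again my own, not part of the original argument) showing that summing just two ReLU units already bends a straight line into a non-linear, V-shaped function:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

x = np.linspace(-3, 3, 7)
y = relu(x) + relu(-x)            # two ReLU "neurons" summed by an output layer

print(np.allclose(y, np.abs(x)))  # True: the combination reproduces |x|,
                                  # something no single linear unit can do
```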