Why is ReLU a non-linear activation function?

Asked 21/9, 2018 at 15:20 Answered 29/8, 2024 at 10:27

Solved machine-learning neural-network deep-learning activation-function relu

As I understand it, in a deep neural network, we use an activation function (g) after applying the weights (w) and bias(b) (z := w * X + b | a := g(z)). So there is a composition function of (g o z) and the activation function makes so our model can learn function other than linear functions. I see that Sigmoid and Tanh activation function makes our model non-linear, but I have some trouble seeing that a ReLu (which takes the max out of 0 and z) can make a model non-linear...

Let's say if every Z is always positive, then it would be as if there was no activation function...

So why does ReLu make a neural network model non-linear?

Thesda answered 21/9, 2018 at 15:20 Comment(1)

Bacause the function is... well... non-linear. Piece-wise linear is enough to make it non-linear. – Phonate 21/9, 2018 at 15:22

Deciding if a function is linear or not is of course not a matter of opinion or debate; there is a very simple definition of a linear function, which is roughly:

f(a*x + b*y) = a*f(x) + b*f(y)

for every x & y in the function domain and a & b constants.

The requirement "for every" means that, if we are able to find even a single example where the above condition does not hold, then the function is nonlinear.

Assuming for simplicity that a = b = 1, let's try x=-5, y=1 with f being the ReLU function:

f(-5 + 1) = f(-4) = 0
f(-5) + f(1) = 0 + 1 = 1

so, for these x & y (in fact for every x & y with x*y < 0) the condition f(x + y) = f(x) + f(y) does not hold, hence the function is nonlinear...

The fact that we may be able to find subdomains (e.g. both x and y being either negative or positive here) where the linearity condition holds is what defines some functions (such as ReLU) as piecewise-linear, which are still nonlinear nevertheless.

Now, to be fair to your question, if in a particular application the inputs happened to be always either all positive or all negative, then yes, in this case the ReLU would in practice end up behaving like a linear function. But for neural networks this is not the case, hence we can rely on it indeed to provide our necessary non-linearity...

Golconda answered 21/9, 2018 at 15:32 Comment(2)

Thanks for such an explanation, could you elaborate a bit on your last remarks "But for neural networks, this is not the case"... – Devout 8/1, 2021 at 10:49

@KhalidSaifullah inputs to NN layers are never always positive or always negative – Golconda 8/1, 2021 at 11:49

I have had the same question for a long time. The confusion comes when you read that the activation function must be non-linear (curve-like), and there is a weird ReLU function that is almost linear, which is the default choice for most Neural Networks. Let me explain how I answered that question. Although it might not be mathematically correct, It cleared my mind from the confusion, so I hope it helps.

A Single Neural Network Unit without an activation function looks like this:

y=w*x + b

x is the input feature
w is the weight
b is the bias
y is the output

How far can we go with this function?

We can only predict the linear relationship between x and y. However, most problems in the real world are non-linear, meaning the relationship between two variables is not only plus and multiplication. Hence, we need something else to express this relationship.

Adding if-else expressions in the model is one way to do that. These conditions allow the model to output different values based on some criteria. For example, if the image has an eye in it, output the human class; otherwise, output an alien.

You can think of non-linearity as an if-else condition in the model. We can't directly put if-else conditions into our models because the main point of the model is to learn these conditions by itself. Therefore, we need to give the model the capability to do the if-else conditions by itself and learn the criteria.

Why is ReLU a non-linear activation function? Let's see what ReLU looks like:

It has two cuts, one from the left to 0 and one to the right after 0. Adding this function allows the model to output separate values based on the input, imitating the if-else condition. Therefore, ReLU doesn't look like a curve function, But it does the job perfectly. In addition, ReLU has almost all the nice properties that a linear function has.

Although if-else conditions do not accurately describe non-linear activation functions. At least you can interpret them that way.

Rior answered 29/8, 2024 at 10:27 Comment(0)

Recommended topics

Hot tags