Neural Network composed of multiple activation functions

2

5

I am using the sknn package to build a neural network, and an evolutionary algorithm to optimize the network's parameters for my dataset. Since the package allows me to build a network in which each layer has a different activation function, I was wondering whether that is a practical choice, or whether I should just use one activation function per network. Does having multiple activation functions in a neural network harm it, make no difference, or benefit it?

Also, what is the maximum number of neurons per layer I should have, and the maximum number of layers per network?

Cameraman answered 21/6, 2016 at 14:42
6

A neural network is just a (big) mathematical function. You could even use different activation functions for different neurons in the same layer. Different activation functions provide different non-linearities, and some may work better than others for approximating a specific function. Using a sigmoid as opposed to a tanh will only make a marginal difference. What is more important is that the activation has a nice derivative. The reason tanh and sigmoid are usually used is that for values close to 0 they act like a linear function, while for large absolute values they saturate like a step function (the sigmoid approaches 0 or 1, tanh approaches -1 or 1), and they have a nice derivative. A more recently introduced one is the ReLU (max(x, 0)), which has a very simple derivative (0 for x < 0 and 1 for x > 0, undefined only at x = 0), is non-linear, and importantly is fast to compute, which makes it attractive for deep networks with long training times.
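As a rough illustration of the behaviour described above, here is a minimal sketch in plain Python of the three activations and their derivatives. The function names are my own and are not part of sknn, and the convention of returning 0 for the ReLU derivative at x = 0 is an assumption (the derivative is undefined there).

```python
import math

def sigmoid(x):
    # Logistic sigmoid: output lies in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative expressed through the function value itself.
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # Derivative of tanh, whose output lies in (-1, 1).
    return 1.0 - math.tanh(x) ** 2

def relu(x):
    # Rectified linear unit: max(x, 0).
    return max(x, 0.0)

def relu_grad(x):
    # 1 for x > 0, 0 for x < 0; the value at x == 0 is a convention (assumed 0 here).
    return 1.0 if x > 0 else 0.0

# Near 0 the sigmoid and tanh are almost linear; far from 0 they saturate.
for x in (-10.0, -0.1, 0.0, 0.1, 10.0):
    print(x, round(sigmoid(x), 4), round(math.tanh(x), 4), relu(x))
```

The saturation is also visible in the derivatives: `sigmoid_grad` and `tanh_grad` shrink towards 0 for large |x|, while `relu_grad` stays at 1 for any positive input.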

What it comes down to is that for overall performance this choice is not very important; what matters is the non-linearity and the capped range. The choice will matter when you want to squeeze out the last few percentage points, but then the best option depends mostly on your specific data. Just like the number of hidden layers and the number of neurons in those layers, it has to be found by cross-validation, although you could adapt your genetic operators to search over these as well.
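One way to let the genetic operators search over the activation function, the number of layers, and the number of neurons is to encode the architecture itself as the genome. The following is only a sketch under my own assumptions: the activation labels, the caps `MAX_LAYERS` and `MAX_UNITS`, and the mutation step size are arbitrary illustrative choices, and the fitness evaluation (for example a cross-validated score of the network built from a genome) is left out.

```python
import random

# Hypothetical search space; adjust the labels and caps to your own setup.
ACTIVATIONS = ("sigmoid", "tanh", "relu")
MAX_LAYERS = 4
MAX_UNITS = 128

def random_layer(rng):
    # One hidden layer is encoded as (activation, number of units).
    return (rng.choice(ACTIVATIONS), rng.randint(1, MAX_UNITS))

def random_genome(rng):
    # A genome is a list of hidden layers, from input side to output side.
    return [random_layer(rng) for _ in range(rng.randint(1, MAX_LAYERS))]

def mutate(genome, rng):
    # Apply one random change; the parent genome is left untouched.
    child = list(genome)
    op = rng.choice(("activation", "units", "add", "remove"))
    i = rng.randrange(len(child))
    act, units = child[i]
    if op == "activation":
        child[i] = (rng.choice(ACTIVATIONS), units)
    elif op == "units":
        new_units = units + rng.randint(-16, 16)
        child[i] = (act, min(MAX_UNITS, max(1, new_units)))
    elif op == "add" and len(child) < MAX_LAYERS:
        child.insert(i, random_layer(rng))
    elif op == "remove" and len(child) > 1:
        del child[i]
    return child

rng = random.Random(0)
parent = random_genome(rng)
child = mutate(parent, rng)
print("parent:", parent)
print("child: ", child)
```

Each genome would then be turned into a network and scored, so that selection favours architectures that validate well rather than ones fixed by hand.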

Gary answered 21/6, 2016 at 14:53
4

I was wondering if [having different activation functions on each layer] is a practical choice, or whether I should just use one activation function per net?

Short answer: it depends.

Longer answer: I'm trying to think of why you would want multiple activation functions. You don't say in your question, so I'll answer at a more theoretical level.

General Advice/Guidance

Neural networks are just approximations of a mathematical function, and the right design depends on your answers to the following questions:

  • How close does the approximation need to be, and how close can you train your network to approximate the function?
  • How well does the network generalize to datasets that it was not trained on? How well does it need to generalize?

Here's an extra one that I think is relevant to your question

  • How fast does the network need to perform? How does your choice of activation function hinder performance?

If you answer these questions, you'll have a better idea about your specific case.
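To put a number on the generalization question, the usual approach is to score the network only on data it was not trained on. Below is a minimal sketch of k-fold index splitting in plain Python; the helper name `k_fold_indices` is my own, it does not shuffle the data, and training and scoring the network on each split is left to whatever library you use.

```python
def k_fold_indices(n_samples, k):
    # Yield (train_indices, test_indices) pairs for k consecutive folds.
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop

# Every sample lands in exactly one test fold.
for train_idx, test_idx in k_fold_indices(10, 3):
    print("train:", train_idx, "test:", test_idx)
```

Averaging the score over the test folds gives an estimate of how a given architecture, including its choice of activation function, behaves on unseen data.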

My Opinion

Building a neural network with multiple activation functions is really muddying the waters and making the system more complicated than it needs to be. When I think of building good software, one of the first things I think of is cohesive design. In other words, does the system make sense as a whole or is it doing too much?

Pro tip: Don't build software Rube Goldberg machines.

Having multiple activation functions in the same network is not cohesive, in my opinion. If your problem really calls for it for some reason, rethink the problem and consider designing a system with multiple separate neural networks, each serving its own purpose with its own architecture (including its choice of activation function).

Need answered 21/6, 2016 at 15:6