UPDATE: The latest version of Theano has native support for ReLU,
T.nnet.relu, which should be preferred over custom solutions.
I decided to compare the speed of these solutions, since speed is very important for NNs. I compared the speed of the function itself and of its gradient: for the function, switch is fastest; for the gradient, x * (x > 0) is fastest.
All the computed gradients are correct.
import numpy
import theano
import theano.tensor as T

def relu1(x):
    return T.switch(x < 0, 0, x)

def relu2(x):
    return T.maximum(x, 0)

def relu3(x):
    return x * (x > 0)

z = numpy.random.normal(size=[1000, 1000])
for f in [relu1, relu2, relu3]:
    x = T.matrix()
    fun = theano.function([x], f(x))
    %timeit fun(z)  # IPython magic: run in IPython/Jupyter
    assert numpy.all(fun(z) == numpy.where(z > 0, z, 0))
Output (time to compute the ReLU function):
>100 loops, best of 3: 3.09 ms per loop
>100 loops, best of 3: 8.47 ms per loop
>100 loops, best of 3: 7.87 ms per loop
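A rough NumPy-only analogue, for anyone who wants to rerun the comparison without Theano or outside IPython. Here `np.where`, `np.maximum` and the mask multiplication stand in for the three Theano graphs (an assumption: NumPy evaluates them eagerly, so the timings will not match Theano's compiled functions), and `timeit.repeat` with a best-of-3 plays the role of `%timeit`. The names `relu_where`, `relu_maximum`, `relu_mask` and `best_of` are hypothetical.

```python
import timeit

import numpy as np


# Hypothetical NumPy stand-ins for relu1 (T.switch), relu2 (T.maximum), relu3 (mask).
def relu_where(x):
    return np.where(x < 0, 0, x)


def relu_maximum(x):
    return np.maximum(x, 0)


def relu_mask(x):
    return x * (x > 0)


def best_of(fn, arg, number=20, repeat=3):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    timings = timeit.repeat(lambda: fn(arg), number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    z = np.random.normal(size=(1000, 1000))
    expected = np.where(z > 0, z, 0)
    for f in (relu_where, relu_maximum, relu_mask):
        # All three formulations must agree (-0.0 from the mask compares equal to 0.0).
        assert np.array_equal(f(z), expected)
        print(f"{f.__name__}: {best_of(f, z) * 1e3:.2f} ms per call")
```

Run it as a plain script; it prints one timing line per formulation.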
for f in [relu1, relu2, relu3]:
    x = T.matrix()
    fun = theano.function([x], theano.grad(T.sum(f(x)), x))
    %timeit fun(z)
    assert numpy.all(fun(z) == (z > 0))
Output (time to compute the gradient):
>100 loops, best of 3: 8.3 ms per loop
>100 loops, best of 3: 7.46 ms per loop
>100 loops, best of 3: 5.74 ms per loop
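The assert above checks the gradient against `z > 0`. As an independent sanity check that this indicator really is the gradient of `sum(relu(x))`, here is a sketch using central finite differences in plain NumPy (not Theano). Because the sum is separable, perturbing every entry at once still yields each entry's own partial derivative. The helpers `relu`, `relu_grad` and `numeric_grad` are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0)


def relu_grad(x):
    # Analytic gradient of sum(relu(x)) w.r.t. x: 1 where x > 0, else 0
    # (at exactly x == 0 this picks 0, the same convention as `z > 0`).
    return (x > 0).astype(x.dtype)


def numeric_grad(f, x, h=1e-6):
    # Elementwise central difference; valid because each term of sum(f(x))
    # depends on a single entry of x.
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(100, 100))
    mask = np.abs(z) > 1e-3  # finite differences are unreliable right at the kink
    assert np.allclose(numeric_grad(relu, z)[mask], relu_grad(z)[mask])
    print("analytic and numeric gradients agree")
```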
Finally, let's compare with how the gradient should ideally be computed (the fastest way):
x = theano.tensor.matrix()
fun = theano.function([x], x > 0)
%timeit fun(z)
Output:
>100 loops, best of 3: 2.77 ms per loop
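Why a single comparison is the lower bound: the benchmarked cost is a sum of independent elementwise terms, so each partial derivative depends only on its own entry and reduces to an indicator (undefined exactly at zero, but normally distributed samples are almost surely nonzero, so that point does not affect the asserts):

$$
\frac{\partial}{\partial x_{ij}} \sum_{k,l} \max(x_{kl}, 0) = \mathbf{1}[x_{ij} > 0], \qquad x_{ij} \neq 0
$$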
So Theano generates suboptimal code for the gradient (5.74 ms at best, versus 2.77 ms for the comparison alone). IMHO, the switch version should be preferred today.