How is hard sigmoid defined?
I am working on deep nets using Keras. There is an activation called "hard sigmoid". What is its mathematical definition?

I know what the sigmoid is. Someone asked a similar question on Quora: https://www.quora.com/What-is-hard-sigmoid-in-artificial-neural-networks-Why-is-it-faster-than-standard-sigmoid-Are-there-any-disadvantages-over-the-standard-sigmoid

But I could not find the precise mathematical definition anywhere.

Clause answered 15/2, 2016 at 13:54 Comment(0)

It is

  clip((x + 1)/2, 0, 1) 

in coding parlance:

  max(0, min(1, (x + 1)/2)) 
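
For illustration, a minimal NumPy sketch of that definition (my own, not from any particular library):

import numpy as np

def hard_sigmoid(x):
    # clip((x + 1)/2, 0, 1): 0 for x <= -1, 1 for x >= 1, linear in between
    return np.clip((x + 1) / 2, 0, 1)

print(hard_sigmoid(np.array([-2.0, 0.0, 2.0])))  # [0.  0.5 1. ]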
Clause answered 28/2, 2018 at 13:18 Comment(1)
You should add some references and/or explanation to this. – Thou

Since Keras supports both TensorFlow and Theano, the exact implementation might differ between backends; I'll cover Theano only here. For the Theano backend, Keras uses T.nnet.hard_sigmoid, which is in turn a linear approximation of the standard sigmoid:

# excerpt from Theano's hard_sigmoid implementation (out_dtype is the output dtype)
slope = tensor.constant(0.2, dtype=out_dtype)
shift = tensor.constant(0.5, dtype=out_dtype)
x = (x * slope) + shift
x = tensor.clip(x, 0, 1)

i.e. it is: max(0, min(1, x*0.2 + 0.5))
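
As a quick sanity check of where this version saturates, a plain-Python snippet (my own, not from Keras or Theano):

def hs(x):
    # saturates at x = -2.5 (returns 0) and at x = 2.5 (returns 1)
    return max(0.0, min(1.0, 0.2 * x + 0.5))

print(hs(-2.5), hs(0.0), hs(2.5))  # 0.0 0.5 1.0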

Highland answered 23/2, 2016 at 14:51 Comment(1)
Keras' TensorFlow backend has the same math, though implemented by hand. github.com/fchollet/keras/blob/master/keras/backend/… – Caparison

For reference, the hard sigmoid function may be defined differently in different places. In Courbariaux et al. 2016 [1] it's defined as:

σ is the “hard sigmoid” function: σ(x) = clip((x + 1)/2, 0, 1) = max(0, min(1, (x + 1)/2))

The intent is to produce a probability value (hence the constraint to the range [0, 1]) for use in stochastic binarization of neural-network parameters (e.g. weights, activations, gradients). You use the probability p = σ(x) returned by the hard sigmoid to set the parameter x to +1 with probability p, or to -1 with probability 1 - p.
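
A minimal NumPy sketch of that binarization step (function names are my own):

import numpy as np

rng = np.random.default_rng(0)

def hard_sigmoid(x):
    return np.clip((x + 1) / 2, 0, 1)

def stochastic_binarize(x):
    # +1 with probability p = hard_sigmoid(x), -1 with probability 1 - p
    p = hard_sigmoid(x)
    return np.where(rng.random(np.shape(x)) < p, 1.0, -1.0)

print(stochastic_binarize(np.array([-2.0, 0.0, 2.0])))  # first entry is always -1, last always +1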

[1] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio, "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", https://arxiv.org/abs/1602.02830 (submitted 9 Feb 2016, last revised 17 Mar 2016, v3).

Oliveira answered 21/9, 2016 at 20:14 Comment(0)

The hard sigmoid is normally a piecewise linear approximation of the logistic sigmoid function. Depending on what properties of the original sigmoid you want to keep, you can use a different approximation.

I personally like to keep the function correct at zero, i.e. σ(0) = 0.5 (shift) and σ'(0) = 0.25 (slope). This could be coded as follows:

import numpy as np

def hard_sigmoid(x):
    # matches the logistic sigmoid at zero: sigma(0) = 0.5, sigma'(0) = 0.25
    return np.maximum(0, np.minimum(1, (x + 2) / 4))
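
A quick numerical check of those two properties:

print(hard_sigmoid(0.0))  # 0.5, matches sigma(0)
eps = 1e-6
print((hard_sigmoid(eps) - hard_sigmoid(-eps)) / (2 * eps))  # 0.25, matches sigma'(0)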
Yodle answered 13/9, 2018 at 12:49 Comment(0)

As of October 2023, the definition used in TensorFlow Keras seems to have changed slightly. The documentation for tf.keras.activations.hard_sigmoid states:

The hard sigmoid activation, defined as:

if x < -2.5: return 0
if x > 2.5: return 1
if -2.5 <= x <= 2.5: return 0.2 * x + 0.5
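
Written out directly in NumPy (my own sketch, equivalent to clip(0.2 * x + 0.5, 0, 1)):

import numpy as np

def hard_sigmoid(x):
    # 0 below x = -2.5, 1 above x = 2.5, linear in between
    return np.where(x < -2.5, 0.0, np.where(x > 2.5, 1.0, 0.2 * x + 0.5))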

Here is some code to plot the function.

import tensorflow as tf
import numpy
import pandas

# sample the input range and evaluate both activations
x = numpy.linspace(-10, 10, 100)
a = tf.constant(x, dtype=tf.float32)
h = tf.keras.activations.hard_sigmoid(a).numpy()
s = tf.keras.activations.sigmoid(a).numpy()

# plot both curves against the input (pandas plotting requires matplotlib)
df = pandas.DataFrame({
    'input': x,
    'sigmoid': s,
    'hardsigmoid': h,
}).set_index('input')
ax = df.plot()
ax.figure.savefig('hardsigmoid-keras.png')

Below is the output when run with tensorflow 2.14.0.

[Plot comparing the sigmoid and hard sigmoid activations over inputs from -10 to 10]

Luis answered 29/10, 2023 at 11:16 Comment(0)