For reference, the hard sigmoid function may be defined differently in different places. In Courbariaux et al. 2016 [1] it is defined as:

σ(x) = clip((x + 1)/2, 0, 1) = max(0, min(1, (x + 1)/2))
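As a quick illustration, here is a minimal NumPy sketch of that definition (the function name `hard_sigmoid` is my choice, not from the paper):

```python
import numpy as np

def hard_sigmoid(x):
    """Hard sigmoid per Courbariaux et al. 2016: clip((x + 1)/2, 0, 1)."""
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

# Saturates at 0 for x <= -1 and at 1 for x >= +1, linear in between.
print(hard_sigmoid(np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0])))
# -> [0.   0.   0.5  0.75 1.   1.  ]
```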
The intent is to produce a probability value (hence the constraint to lie between 0 and 1) for use in stochastic binarization of neural network parameters (e.g. weights, activations, gradients). You use the probability p = σ(x) returned by the hard sigmoid function to set the parameter x to +1 with probability p, or to -1 with probability 1 - p.
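To make that binarization rule concrete, here is a hedged sketch of how it could be implemented (the function name `stochastic_binarize` and the use of NumPy's random generator are my choices, not the paper's code; the hard sigmoid is repeated so the snippet is self-contained):

```python
import numpy as np

rng = np.random.default_rng(0)

def hard_sigmoid(x):
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def stochastic_binarize(x):
    """Map each parameter to +1 with probability p = hard_sigmoid(x),
    and to -1 with probability 1 - p."""
    p = hard_sigmoid(x)
    # Draw a uniform sample per element; compare against p.
    return np.where(rng.random(np.shape(x)) < p, 1.0, -1.0)

weights = np.array([-1.5, -0.2, 0.0, 0.3, 2.0])
print(stochastic_binarize(weights))
# e.g. [-1. -1.  1.  1.  1.]  (middle entries vary from run to run;
# the endpoints are deterministic because p saturates at 0 and 1 there)
```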
[1] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio, "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", arXiv:1602.02830, 2016. https://arxiv.org/abs/1602.02830