Is DropPath in TIMM just a Dropout?
The code below (taken from here) seems to implement only a simple Dropout, not DropPath or DropConnect. Is that true?

import torch


def drop_path(x, drop_prob: float = 0., training: bool = False):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output
Jalisajalisco asked 14/9, 2021 at 9:53 Comment(0)
No, it is different from Dropout:

import torch
from torch.nn.functional import dropout

torch.manual_seed(2021)

def drop_path(x, drop_prob: float = 0., training: bool = False):
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output

x = torch.rand(3, 2, 2, 2)

# DropPath
d1_out = drop_path(x, drop_prob=0.33, training=True)

# Dropout
d2_out = dropout(x, p=0.33, training=True)

Let's compare the outputs (I removed some line breaks between the channel dimensions for readability):

# DropPath
print(d1_out)
#  tensor([[[[0.1947, 0.7662],
#            [1.1083, 1.0685]],
#           [[0.8515, 0.2467],
#            [0.0661, 1.4370]]],
#
#          [[[0.0000, 0.0000],
#            [0.0000, 0.0000]],
#           [[0.0000, 0.0000],
#            [0.0000, 0.0000]]],
#
#          [[[0.7658, 0.4417],
#            [1.1692, 1.1052]],
#           [[1.2014, 0.4532],
#            [1.4840, 0.7499]]]])

# Dropout
print(d2_out)
#  tensor([[[[0.1947, 0.7662],
#            [1.1083, 1.0685]],
#           [[0.8515, 0.2467],
#            [0.0661, 1.4370]]],
#
#          [[[0.0000, 0.1480],
#            [1.2083, 0.0000]],
#           [[1.2272, 0.1853],
#            [0.0000, 0.5385]]],
#
#          [[[0.7658, 0.0000],
#            [1.1692, 1.1052]],
#           [[1.2014, 0.4532],
#            [0.0000, 0.7499]]]])

As you can see, they are different. DropPath drops entire samples from the batch, which effectively results in stochastic depth when used as in Eq. 2 of their paper. Dropout, on the other hand, drops random individual elements, as expected (from the docs):

During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call.

Also note that both scale the surviving values by 1 / (1 - p), so the non-zeroed elements are identical for the same p.
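Both points are easy to verify, reusing x, d1_out, and d2_out from above (this is just a sanity check, not part of the timm code):

# The mask drop_path builds internally: one Bernoulli draw per sample,
# broadcast over the remaining dimensions (shape (3, 1, 1, 1) for the x above)
keep_prob = 1 - 0.33
mask = (keep_prob + torch.rand((3, 1, 1, 1))).floor_()
print(mask.view(-1))  # e.g. tensor([1., 0., 1.]): whole samples kept or dropped

# Both outputs equal x / keep_prob wherever they are non-zero
print(torch.allclose(d1_out[d1_out != 0], (x / keep_prob)[d1_out != 0]))  # True
print(torch.allclose(d2_out[d2_out != 0], (x / keep_prob)[d2_out != 0]))  # True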

Lebanon answered 14/9, 2021 at 12:39 Comment(2)
So it drops samples randomly? What is the point of that? – Jalisajalisco
@inkzk A batch is a set of samples. DropPath drops whole samples from the batch (the actual number depends on the probability and the RNG, of course), which results in stochastic depth when applied as in Eq. 2 of the paper. Check the zeros in d1_out. – Lebanon
Usually, DropPath is used together with a residual connection. For example:

x = x + drop_path(block(x))

If the output of block(x) is dropped for a given sample, that is equivalent to skipping block() entirely for that sample: the input x is passed through unchanged.
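As a minimal sketch of that pattern (the block and sizes below are made up for illustration, and drop_path is the function from the question above):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Toy residual block with DropPath; names and sizes are illustrative."""
    def __init__(self, dim, drop_prob=0.2):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.drop_prob = drop_prob

    def forward(self, x):
        # For samples whose branch is zeroed, this reduces to x = x + 0,
        # i.e. the block is skipped entirely (stochastic depth)
        return x + drop_path(self.fc(x), self.drop_prob, self.training)

block = ResidualBlock(dim=8).train()
out = block(torch.rand(4, 8))  # some of the 4 samples skip self.fc

At inference (training=False), drop_path is the identity, so every sample always goes through the block.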

Tsushima answered 7/12, 2021 at 22:03 Comment(0)
