Why is the clip_grad_norm_ function used here?

I am learning LSTM with PyTorch from someone else's code. In the training loop of a two-layer LSTM he calls the clip_grad_norm_ function (second-to-last line of the loop below). I want to know why it is used here, so I can understand the whole code properly.

for x, y in get_batches(data, batch_size, seq_length):
    counter += 1

    # One-hot encode the characters and wrap the batch in tensors
    x = one_hot_encode(x, n_chars)
    inputs, targets = torch.from_numpy(x), torch.from_numpy(y)

    if train_on_gpu:
        inputs, targets = inputs.cuda(), targets.cuda()

    # Detach the hidden state so we don't backprop through the whole history
    h = tuple([each.data for each in h])
    net.zero_grad()

    # Forward pass, loss, and backward pass
    output, h = net(inputs, h)
    loss = criterion(output, targets.view(batch_size*seq_length).long())
    loss.backward()

    nn.utils.clip_grad_norm_(net.parameters(), clip)
    opt.step()

If you need more information about the question, please let me know.

Chile answered 23/4, 2021 at 20:29

torch.nn.utils.clip_grad_norm_ performs gradient clipping: it computes the combined norm of all the parameters' gradients and, if that norm exceeds the given threshold (clip in your code), rescales the gradients in place so their norm equals the threshold. It is used to mitigate the problem of exploding gradients, which is of particular concern for recurrent networks such as LSTMs.

Further details can be found in the original paper.
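
As a rough illustration, here is a minimal standalone sketch (not your training loop; the toy two-layer nn.LSTM, the tensor shapes and max_norm=1.0 are made-up values for demonstration) showing what the call does:

import torch
import torch.nn as nn

# Toy two-layer LSTM; sizes and shapes here are arbitrary.
net = nn.LSTM(input_size=8, hidden_size=16, num_layers=2)
x = torch.randn(5, 3, 8)              # (seq_len, batch, input_size)

out, _ = net(x)
out.sum().backward()                  # populate .grad on all parameters

# Rescale the gradients in place so their combined L2 norm is at most max_norm.
# The function returns the total norm as it was *before* clipping.
total_norm = nn.utils.clip_grad_norm_(net.parameters(), max_norm=1.0)
print(total_norm)

If total_norm comes out below 1.0, the gradients are left untouched; otherwise every gradient is scaled down by roughly max_norm / total_norm before opt.step() applies the update.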

Alesandrini answered 23/4, 2021 at 23:18