I am learning LSTM with PyTorch from someone else's code. He uses the clip_grad_norm_
function in the training loop of a two-layer LSTM (it appears in the second-to-last line
of the loop below). I want to know why he uses clip_grad_norm_ here, so I can understand the whole code properly.
for x, y in get_batches(data, batch_size, seq_length):
    counter += 1
    x = one_hot_encode(x, n_chars)
    inputs, targets = torch.from_numpy(x), torch.from_numpy(y)
    if train_on_gpu:
        inputs, targets = inputs.cuda(), targets.cuda()
    h = tuple([each.data for each in h])
    net.zero_grad()
    output, h = net(inputs, h)
    loss = criterion(output, targets.view(batch_size*seq_length).long())
    loss.backward()
    nn.utils.clip_grad_norm_(net.parameters(), clip)
    opt.step()
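To show what that line does in isolation, here is a minimal sketch (my own toy example, not part of the code above): clip_grad_norm_ rescales all gradients in place, after backward() and before the optimizer step, so that their combined norm does not exceed the given maximum.

```python
import torch
import torch.nn as nn

# A single parameter with a hand-set gradient whose L2 norm is 5.0
p = nn.Parameter(torch.zeros(2))
p.grad = torch.tensor([3.0, 4.0])

# Rescale the gradient in place so the total norm is at most 1.0;
# the function returns the total norm measured *before* clipping.
total_norm = nn.utils.clip_grad_norm_([p], max_norm=1.0)

print(float(total_norm))      # 5.0 (norm before clipping)
print(p.grad.norm().item())   # ~1.0 (norm after clipping)
```

In the training loop above, this is presumably there to guard against exploding gradients, which are common when backpropagating through time in RNNs/LSTMs.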
If you need more information about the question, please let me know.