Get positive and negative part of gradient for loss function in PyTorch

I want to implement non-negative matrix factorization using PyTorch. Here is my initial implementation:

import torch

def nmf(X, k, lr, epochs):
    # X: input matrix of size (m, n)
    # k: number of latent factors
    # lr: learning rate
    # epochs: number of training epochs
    m, n = X.shape
    W = torch.rand(m, k, requires_grad=True)  # initialize W randomly
    H = torch.rand(k, n, requires_grad=True)  # initialize H randomly
    # training loop
    for i in range(epochs):
        # compute reconstruction error
        loss = torch.norm(X - torch.matmul(W, H), p='fro')
        # compute gradients
        loss.backward()
        # update parameters using additive update rule
        with torch.no_grad():
            W -= lr * W.grad
            H -= lr * H.grad
            W.grad.zero_()
            H.grad.zero_()
        if i % 10 == 0:
            print(f"Epoch {i}: loss = {loss.item()}")
    return W.detach(), H.detach()
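
For context, I call it roughly like this (the matrix size and hyperparameters below are just placeholders):

X = torch.rand(100, 50)  # placeholder non-negative data matrix
W, H = nmf(X, k=10, lr=1e-2, epochs=100)
print(W.shape, H.shape)  # torch.Size([100, 10]) torch.Size([10, 50])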

Lee and Seung, in this paper, proposed using adaptive learning rates to avoid subtraction and thus the creation of negative elements. Here is the stats.SE thread where I got the idea. But I don't know how to implement the multiplicative update rule for W and H in PyTorch, since it requires separating the positive and negative parts of their gradients. Yes, I can implement that manually, but I want to leverage torch autograd for it.

[image: multiplicative update rule]
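
For reference, this is how I understand the rule (assuming the objective is the squared Frobenius norm $f(W,H) = \tfrac{1}{2}\lVert X - WH\rVert_F^2$): the gradient of $f$ splits into two elementwise non-negative terms,

$$
\nabla_W f = \underbrace{WHH^\top}_{[\nabla_W f]^+} - \underbrace{XH^\top}_{[\nabla_W f]^-},
\qquad
\nabla_H f = \underbrace{W^\top WH}_{[\nabla_H f]^+} - \underbrace{W^\top X}_{[\nabla_H f]^-},
$$

and the multiplicative updates multiply each factor by the elementwise ratio of the negative part to the positive part:

$$
W \leftarrow W \odot \frac{XH^\top}{WHH^\top},
\qquad
H \leftarrow H \odot \frac{W^\top X}{W^\top WH}.
$$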

Any idea how to do this? Thanks in advance.

Albion answered 15/3, 2023 at 9:14 Comment(0)

In the multiplicative update rule, the positive and negative parts of the gradient are separated, and each update is computed from the ratio of these two parts.

Note: a small value eps is added to the denominators to avoid division by zero.

import torch

def nmf(X, k, lr, epochs):
    # X: input matrix of size (m, n)
    # k: number of latent factors
    # lr: learning rate
    # epochs: number of training epochs
    m, n = X.shape
    W = torch.rand(m, k, requires_grad=True)  # initialize W randomly
    H = torch.rand(k, n, requires_grad=True)  # initialize H randomly
    eps = 1e-9  # small value to avoid division by zero
    # training loop
    for i in range(epochs):
        # compute reconstruction error
        loss = torch.norm(X - torch.matmul(W, H), p='fro')
        # compute gradients
        W_pos = torch.relu(W)  # separate positive and negative parts of W
        W_neg = torch.relu(-W)
        H_pos = torch.relu(H)  # separate positive and negative parts of H
        H_neg = torch.relu(-H)
        grad_W_pos = torch.matmul((torch.matmul(W_pos, H_pos) - X), H_pos.t())
        grad_W_neg = torch.matmul((torch.matmul(W_neg, H_pos) - X), H_pos.t())
        grad_H_pos = torch.matmul(W_pos.t(), (torch.matmul(W_pos, H_pos) - X))
        grad_H_neg = torch.matmul(W_pos.t(), (torch.matmul(W_pos, H_neg) - X))
        # update parameters using multiplicative update rule
        # (in-place updates of leaf tensors must run outside autograd tracking)
        with torch.no_grad():
            W *= torch.sqrt((grad_W_pos + eps) / (grad_W_neg + eps))
            H *= torch.sqrt((grad_H_pos + eps) / (grad_H_neg + eps))
        if i % 10 == 0:
            print(f"Epoch {i}: loss = {loss.item()}")
    return W.detach(), H.detach()

However, implementing adaptive learning rates for NMF in PyTorch can be more involved and may require additional code.
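
For comparison, here is a minimal sketch (not part of the code above) of how the positive and negative gradient parts can be obtained from torch autograd itself, assuming the squared Frobenius objective 0.5*||X - WH||^2: back-propagate its two terms separately, so the gradient of 0.5*||WH||^2 gives the positive part (W H H^T and W^T W H) and the gradient of <X, WH> gives the negative part (X H^T and W^T X), then apply the Lee-Seung multiplicative update. The function name nmf_autograd and its defaults are just illustrative.

import torch

def nmf_autograd(X, k, epochs, eps=1e-9):
    # Sketch: obtain the Lee-Seung gradient parts via autograd on the two loss terms
    m, n = X.shape
    W = torch.rand(m, k, requires_grad=True)
    H = torch.rand(k, n, requires_grad=True)
    for i in range(epochs):
        WH = torch.matmul(W, H)
        # 0.5*||X - WH||^2 = 0.5*||WH||^2 - <X, WH> + const
        loss_pos = 0.5 * (WH * WH).sum()  # gradient: W H H^T (w.r.t. W), W^T W H (w.r.t. H)
        loss_neg = (X * WH).sum()         # gradient: X H^T   (w.r.t. W), W^T X   (w.r.t. H)
        grad_W_pos, grad_H_pos = torch.autograd.grad(loss_pos, (W, H), retain_graph=True)
        grad_W_neg, grad_H_neg = torch.autograd.grad(loss_neg, (W, H))
        with torch.no_grad():
            # multiplicative update: multiply by (negative part) / (positive part)
            W *= grad_W_neg / (grad_W_pos + eps)
            H *= grad_H_neg / (grad_H_pos + eps)
        if i % 10 == 0:
            with torch.no_grad():
                print(f"Epoch {i}: loss = {torch.norm(X - torch.matmul(W, H), p='fro').item()}")
    return W.detach(), H.detach()

Because both gradient parts are non-negative whenever X, W, and H are non-negative, this update keeps W and H non-negative without an explicit learning rate or loss.backward() call.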

Concrete answered 22/3, 2023 at 5:30 Comment(4)
Can you explain how applying relu helps to get the positive and negative parts of the gradients? Actually, I am doing something similar, hence interested to know the solution! @RajeshKonthamLavinia
When relu is applied to a tensor, it sets negative values to zero while leaving positive values unchanged. In the code, relu is applied to W and -W (and to H and -H) to obtain the positive and negative parts of the factor matrices, which is what allows the gradient terms to be split into positive and negative components.Concrete
Sorry for the late response, @RajeshKontham. Using relu to separate the gradient parts seems promising. But I don't see how you backpropagate the gradients. Shouldn't there be a loss.backward()? I am new to PyTorch. After giving it a try I get: Epoch 0: loss = 11363.654656173509, Epoch 10: loss = nan, Epoch 20: loss = nan. It seems like something stops working after the first iteration.Albion
Backpropagation is not needed in this case. NaN after the first epoch suggests a numerical instability or a division by zero during the computation. One issue is the use of torch.sqrt in the update rule, which produces NaN when its argument is negative. A possible fix is to use torch.clamp to keep the factor matrices W and H non-negative, which prevents the negative values that cause the numerical issues. You can also try reducing the learning rate lr to improve stability.Concrete
