gradient-descent Questions

7

Solved

Sometimes I run into a problem like this: OOM when allocating tensor with shape (1024, 100, 160), where 1024 is my batch size and I don't know what the rest is. If ...
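
For a rough sense of scale, a back-of-the-envelope calculation for that tensor (assuming float32, i.e. 4 bytes per element):

elements = 1024 * 100 * 160   # 16,384,000 elements
bytes_needed = elements * 4   # float32 takes 4 bytes each
print(bytes_needed / 2**20)   # 62.5 MiB for a single copy
# Activations, gradients and optimizer state multiply this several times over.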

2

I am learning gradient descent for calculating coefficients. Below is what I am doing: #!/usr/bin/python import numpy as np # m denotes the number of examples here, not the number of features...
Callboard asked 25/6, 2014 at 14:24
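
A minimal NumPy sketch of the batch gradient descent the question describes (variable names are illustrative, not the asker's):

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    m = len(y)                                  # m = number of examples
    theta = np.zeros(X.shape[1])                # one coefficient per feature
    for _ in range(num_iters):
        h = X.dot(theta)                        # current predictions
        theta -= (alpha / m) * X.T.dot(h - y)   # simultaneous update of all coefficients
    return theta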

7

Solved

I've noticed that a frequent occurrence during training is NaNs being introduced. Often it seems to be introduced by weights in inner-product/fully-connected or convolution layers blowing up...
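
A common first debugging step is to pinpoint exactly when values blow up; a minimal PyTorch-flavored sketch of such a check (the model argument is a placeholder), meant to be called right after backward():

import torch

def check_finite(model, loss):
    # Fail fast the moment the loss or any gradient stops being finite.
    if not torch.isfinite(loss):
        raise RuntimeError(f"non-finite loss: {loss.item()}")
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            raise RuntimeError(f"non-finite gradient in {name}")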

2

Solved

Seems like a basic question, but I need to use feature scaling (take each feature value, subtract the mean then divide by the standard deviation) in my implementation of linear regression with grad...
Limpet asked 16/1, 2014 at 17:33
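
For reference, a minimal NumPy sketch of that standardization (per-feature mean and standard deviation):

import numpy as np

def standardize(X):
    mu = X.mean(axis=0)        # per-feature mean
    sigma = X.std(axis=0)      # per-feature standard deviation
    return (X - mu) / sigma, mu, sigma

Keeping mu and sigma matters: new inputs at prediction time must be scaled with the training statistics, not their own.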

8

Solved

Why does zero_grad() need to be called during training? The docstring says only: zero_grad(self): Sets gradients of all model parameters to zero.
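
The short answer is that backward() accumulates into .grad rather than overwriting it, so stale gradients leak into the next step unless cleared. A minimal demonstration:

import torch

x = torch.tensor(2.0, requires_grad=True)
(x ** 2).backward()
print(x.grad)    # tensor(4.) -- dy/dx = 2x
(x ** 2).backward()
print(x.grad)    # tensor(8.) -- the second backward ADDED another 4
x.grad.zero_()   # what optimizer.zero_grad() does for every parameter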

6

Solved

def gradient(X_norm,y,theta,alpha,m,n,num_it): temp=np.array(np.zeros_like(theta,float)) for i in range(0,num_it): h=np.dot(X_norm,theta) #temp[j]=theta[j]-(alpha/m)*( np.sum( (h-y)*X_norm[:,j]...
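
A sketch of how the truncated inner loop might look once vectorized over all coefficients j (assuming X_norm has shape (m, n) and theta shape (n,)):

import numpy as np

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    for _ in range(num_it):
        h = np.dot(X_norm, theta)                              # predictions, shape (m,)
        theta = theta - (alpha / m) * np.dot(X_norm.T, h - y)  # updates every theta[j] at once
    return theta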

7

Solved

Where is an explicit connection between the optimizer and the loss? How does the optimizer know where to get the gradients of the loss without a call like optimizer.step(loss)? More context: ...
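
The connection is indirect: loss.backward() writes gradients into each parameter's .grad attribute, and the optimizer, which was handed those same parameter objects at construction, reads them in step(). A minimal sketch:

import torch

w = torch.randn(3, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)   # the optimizer holds a reference to w

loss = (w ** 2).sum()
loss.backward()                      # fills w.grad; no optimizer involved
opt.step()                           # reads w.grad and updates w in place
opt.zero_grad()                      # clears w.grad for the next iteration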

1

My program for training a model in PyTorch converges worse than the TensorFlow implementation. When I switch to SGD instead of Adam, the losses are identical. With Adam, the losses are different st...
Ceric asked 24/5, 2021 at 20:56

1

I want to implement non-negative matrix factorization using PyTorch. Here is my initial implementation: def nmf(X, k, lr, epochs): # X: input matrix of size (m, n) # k: number of latent factors # lr:...
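
A minimal gradient-based NMF sketch in PyTorch, following the question's signature and clamping after each step to keep the factors non-negative (a simple projected-gradient heuristic, not the classic multiplicative-update algorithm):

import torch

def nmf(X, k, lr=0.01, epochs=1000):
    m, n = X.shape
    W = torch.rand(m, k, requires_grad=True)   # latent factor matrices
    H = torch.rand(k, n, requires_grad=True)
    opt = torch.optim.Adam([W, H], lr=lr)
    for _ in range(epochs):
        loss = torch.norm(X - W @ H) ** 2      # squared reconstruction error
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                  # project back onto the
            W.clamp_(min=0)                    # non-negative orthant
            H.clamp_(min=0)
    return W.detach(), H.detach()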

1

Solved

Loss functions in PyTorch use "mean" reduction by default. This means that the model gradient will have roughly the same magnitude for any batch size. It makes sense that you want to scale the le...
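
A small demonstration of why "mean" reduction keeps the gradient magnitude roughly batch-size independent (with reduction="sum" the printed gradient would grow linearly with the batch):

import torch

w = torch.ones(1, 1, requires_grad=True)
for batch_size in (4, 64):
    x = torch.ones(batch_size, 1)
    loss = torch.nn.functional.mse_loss(x @ w, torch.zeros(batch_size, 1),
                                        reduction="mean")
    loss.backward()
    print(batch_size, w.grad.item())  # 2.0 for both batch sizes
    w.grad = None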

1

Solved

I'm working on trying to compare the convergence rates of the SGD and GD algorithms for neural networks. In PyTorch, we often use the SGD optimizer as follows: train_dataloader = torch.utils.data.DataLoade...
Salvidor asked 4/6, 2022 at 0:16
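
One simple way to get full-batch GD out of the same training loop is to make a single batch span the whole dataset; a sketch with a toy dataset:

import torch

X, y = torch.randn(256, 10), torch.randn(256, 1)
train_dataset = torch.utils.data.TensorDataset(X, y)

# Mini-batch SGD: many noisy gradient steps per epoch.
sgd_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)

# Full-batch GD: one batch covering the dataset, so every step uses the exact gradient.
gd_loader = torch.utils.data.DataLoader(train_dataset, batch_size=len(train_dataset))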

2

Solved

I am trying to understand the inner workings of gradient accumulation in PyTorch. My question is somewhat related to these two: Why do we need to call zero_grad() in PyTorch? Why do we need to ex...
Caloric asked 28/5, 2020 at 14:35
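
For reference, a minimal accumulation loop: because backward() adds into .grad, stepping only every accum_steps batches simulates one larger batch (the toy model and data are illustrative):

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()
batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

accum_steps = 4
optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = criterion(model(x), y) / accum_steps  # rescale so the accumulated
    loss.backward()                              # gradient matches one big batch
    if (i + 1) % accum_steps == 0:
        optimizer.step()        # apply the gradient summed over accum_steps batches
        optimizer.zero_grad()   # reset .grad for the next virtual batch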

4

Solved

I am using scipy.optimize.fmin_l_bfgs_b to solve a Gaussian mixture problem. The means of the mixture distributions are modeled by regressions whose weights have to be optimized using the EM algorithm. si...
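
For reference, the basic calling convention of fmin_l_bfgs_b on a toy objective (the actual Gaussian-mixture objective is omitted): by default the callable must return the objective value and its gradient together.

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def objective(w):
    f = np.sum((w - 3.0) ** 2)   # scalar objective value
    g = 2.0 * (w - 3.0)          # gradient, same shape as w
    return f, g

w_opt, f_min, info = fmin_l_bfgs_b(objective, x0=np.zeros(5))
print(w_opt)   # all entries close to 3.0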

3

Solved

I use PyTorch. In the computation, I move some data and operators A to the GPU. In a middle step, I move the data and operators B to the CPU and continue the forward pass. My question is: my oper...
Fluid asked 23/8, 2021 at 4:11
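
Autograd does track operations across devices, so a forward pass that hops from GPU to CPU still backpropagates end to end; a minimal sketch (guarded so it also runs on CPU-only machines):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, requires_grad=True, device=device)
h = x * 2                      # "operator A" on the first device
h = h.to("cpu")                # .to() is itself a differentiable op
y = (h ** 3).sum()             # "operator B" on the CPU
y.backward()                   # gradients flow back across the device move
print(x.grad.device, x.grad)   # the gradient lands on x's device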

1

from math import exp import numpy as np from sklearn.linear_model import LogisticRegression I used the code below from How To Implement Logistic Regression From Scratch in Python: def predict(row, coef...
Sarawak asked 27/2, 2022 at 9:47
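
For context, the predict function in that tutorial's style is a sigmoid of a weighted sum, with coefficients[0] acting as the bias and the last column of row assumed to be the label:

from math import exp

def predict(row, coefficients):
    yhat = coefficients[0]                 # bias term
    for i in range(len(row) - 1):          # skip the label column
        yhat += coefficients[i + 1] * row[i]
    return 1.0 / (1.0 + exp(-yhat))        # sigmoid squashes to (0, 1)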

2

Solved

I have been learning about Artificial Neural Networks (ANNs) recently and have got code working and running in Python, based on mini-batch training. I followed Michael Nielsen's book Neural Ne...
Coldblooded asked 24/7, 2015 at 4:28

2

Solved

Suppose I have my custom loss function and I want to fit the solution of some differential equation with the help of my neural network. So in each forward pass, I am calculating the output of my neural...
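
The usual ingredient is torch.autograd.grad with create_graph=True, so the derivative of the network output can itself appear inside the loss; a minimal sketch for fitting y' = y with y(0) = 1 (the architecture is illustrative):

import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
t = torch.linspace(0, 1, 50).unsqueeze(1).requires_grad_(True)
y = net(t)
dy_dt, = torch.autograd.grad(y, t, grad_outputs=torch.ones_like(y),
                             create_graph=True)  # keep the graph for backward
residual = dy_dt - y                             # enforce the ODE y' = y
ic = net(torch.zeros(1, 1)) - 1.0                # initial condition y(0) = 1
loss = (residual ** 2).mean() + (ic ** 2).mean()
loss.backward()                                  # grads w.r.t. the network weights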

2

Solved

This could be quite a trivial question to answer, but I just wanted to be clearer. From the available literature and the discussion in What is the difference between Gradient Descent and Newton's Gr...
Alansen asked 18/1, 2020 at 6:43
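
For reference, the two update rules side by side, for minimizing f from the current iterate x_k:

x_{k+1} = x_k - \alpha \, \nabla f(x_k)                   % gradient descent, step size \alpha
x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \, \nabla f(x_k)   % Newton's method, using the Hessian

Newton's method rescales the step by local curvature, which buys faster convergence near a minimum at the cost of forming and inverting the Hessian.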

5

Solved

I understand what Gradient Descent does. Basically, it tries to move towards a local optimum by slowly moving down the curve. I am trying to understand what the actual difference is betwe...

4

Solved

What is the correct way to perform gradient clipping in PyTorch? I have an exploding gradients problem.
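
The standard pattern is to clip between backward() and step(); a minimal sketch using the built-in norm-clipping utility on a toy model:

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss = model(torch.randn(8, 10)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
# Rescale the gradients so their global L2 norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()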

1

Solved

PyTorch has new functionality torch.inference_mode as of v1.9 which is "analogous to torch.no_grad... Code run under this mode gets better performance by disabling view tracking and version co...
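
Usage mirrors torch.no_grad(); a minimal sketch of both context managers side by side:

import torch

model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)

with torch.no_grad():          # disables gradient tracking
    y1 = model(x)

with torch.inference_mode():   # also skips view and version-counter tracking,
    y2 = model(x)              # so tensors created here cannot re-enter autograd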

4

Solved

Can you please tell me the difference between Stochastic Gradient Descent (SGD) and back-propagation?

2

Solved

I can see what this code below from this video is trying to do. But the sum in y = torch.sum(x**2) confuses me. With the sum operation, y becomes a tensor with a single value. As I understand .backwa...
Ramshackle asked 2/8, 2019 at 6:12
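
The sum collapses the vector output into a scalar, so backward() needs no explicit gradient argument; element i of x.grad is then d(sum of x**2)/dx_i = 2*x_i. A minimal demonstration:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.sum(x ** 2)   # scalar: 1 + 4 + 9 = 14
y.backward()            # fine without arguments, since y is a scalar
print(x.grad)           # tensor([2., 4., 6.]) == 2 * x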

2

Solved

I am reading the Deep Learning with Python book. After reading chapter 4, Fighting Overfitting, I have two questions. Why might increasing the number of epochs cause overfitting? I know increa...
Mazonson asked 27/12, 2018 at 9:22

3

Solved

I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using vgg16). To do so I create a random image, feed it through the network up to the desired convol...
Mandrake asked 6/7, 2019 at 17:38
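
A minimal activation-maximization sketch in that spirit: gradient ascent on a random image to maximize one vgg16 feature map (the layer cutoff and filter index are illustrative):

import torch
from torchvision.models import vgg16

model = vgg16(weights="DEFAULT").features[:10].eval()  # truncate at a conv layer
for p in model.parameters():
    p.requires_grad_(False)          # only the image gets optimized

img = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.1)
for _ in range(50):
    act = model(img)[0, 5]           # feature map 5 of the chosen layer
    loss = -act.mean()               # ascend by minimizing the negative activation
    opt.zero_grad()
    loss.backward()
    opt.step()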
