gradient-descent Questions
7
Solved
Sometimes I run into a problem:
OOM when allocating tensor with shape
e.g.
OOM when allocating tensor with shape (1024, 100, 160)
Where 1024 is my batch size and I don't know what the rest is. If ...
Bigener asked 9/10, 2017 at 20:25
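A quick sanity check for errors like this is the raw size of one such tensor; the arithmetic below assumes float32 (4 bytes per element) and ignores gradients, activations, and framework workspace:

# rough memory estimate for one float32 tensor of shape (1024, 100, 160)
batch, h, w = 1024, 100, 160
bytes_per_elem = 4                                # float32
print(batch * h * w * bytes_per_elem / 1024**2)   # 62.5 MB per copy

Halving the batch size halves this figure, which is usually the first thing to try.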
2
I am learning gradient descent for calculating coefficients. Below is what I am doing:
#!/usr/bin/env python
import numpy as np
# m denotes the number of examples here, not the number of features...
Callboard asked 25/6, 2014 at 14:24
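For reference, a minimal vectorized sketch of batch gradient descent for linear-regression coefficients (names are illustrative, not the asker's):

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    m = len(y)                                  # m = number of examples
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        h = X @ theta                           # predictions for all examples
        theta -= (alpha / m) * (X.T @ (h - y))  # simultaneous update of all coefficients
    return theta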
7
Solved
I've noticed that a frequent occurrence during training is NaNs being introduced.
Often it seems to be caused by weights in inner-product/fully-connected or convolution layers blowing up...
Fowling asked 27/11, 2015 at 17:23
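The usual first-line fixes are a smaller learning rate and gradient clipping; a PyTorch-style sketch of the idea (loader, model, criterion, and optimizer are assumed from context):

import torch

for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    if torch.isnan(loss):
        raise RuntimeError("NaN loss: lower the learning rate or check the input data")
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap exploding gradients
    optimizer.step()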
2
Solved
Seems like a basic question, but I need to use feature scaling (take each feature value, subtract the mean, then divide by the standard deviation) in my implementation of linear regression with grad...
Limpet asked 16/1, 2014 at 17:33
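A common standardization sketch in NumPy, assuming rows are examples and columns are features:

import numpy as np

def scale_features(X):
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0             # guard against constant columns
    return (X - mu) / sigma, mu, sigma  # keep mu/sigma to scale test data identically

Keeping mu and sigma matters: inputs at prediction time must be scaled with the training-set statistics, not their own.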
8
Solved
Why does zero_grad() need to be called during training?
| zero_grad(self)
| Sets gradients of all model parameters to zero.
Shaky asked 28/12, 2017 at 4:31
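The short answer is that .backward() accumulates into each parameter's .grad rather than overwriting it, so stale gradients must be cleared every step; a minimal loop (loader, model, criterion, optimizer assumed from context):

for x, y in loader:
    optimizer.zero_grad()           # reset gradients accumulated by the previous step
    loss = criterion(model(x), y)
    loss.backward()                 # adds d(loss)/d(param) into each param.grad
    optimizer.step()                # the update reads param.grad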
6
Solved
def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)
        #temp[j]=theta[j]-(alpha/m)*( np.sum( (h-y)*X_norm[:,j]...
Burstone asked 22/7, 2013 at 9:55
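Assuming the commented line implements batch gradient descent for linear regression, a vectorized completion might look like this sketch (updating all coefficients j at once; not the asker's exact code):

import numpy as np

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    for _ in range(num_it):
        h = np.dot(X_norm, theta)                               # predictions, shape (m,)
        theta = theta - (alpha / m) * np.dot(X_norm.T, h - y)   # simultaneous update
    return theta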
7
Solved
Where is an explicit connection between the optimizer and the loss?
How does the optimizer know where to get the gradients of the loss without a call like optimizer.step(loss)?
-More context-
...
Seeley asked 30/12, 2018 at 6:30
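The connection is the parameters themselves: backward() writes gradients into each parameter's .grad, and the optimizer was constructed with references to those same parameters. A minimal demonstration:

import torch

w = torch.randn(3, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)  # optimizer holds a reference to w

loss = (w ** 2).sum()
loss.backward()      # autograd stores d(loss)/dw in w.grad
optimizer.step()     # step() reads w.grad; it never sees the loss object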
1
My program for training a model in PyTorch converges worse than the TensorFlow implementation. When I switch to SGD instead of Adam, the losses are identical. With Adam, the losses are different st...
Ceric asked 24/5, 2021 at 20:56
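One frequent culprit is differing default hyperparameters: to the best of my knowledge, tf.keras Adam defaults to epsilon=1e-7 while torch.optim.Adam uses eps=1e-8 (worth verifying against the docs for your versions). A sketch of aligning them (model assumed from context):

import torch

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-7,            # match the tf.keras default instead of PyTorch's 1e-8
)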
1
I want to implement non-negative matrix factorization using PyTorch. Here is my initial implementation:
def nmf(X, k, lr, epochs):
    # X: input matrix of size (m, n)
    # k: number of latent factors
    # lr:...
Albion asked 15/3, 2023 at 9:14
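One way to keep the factors non-negative under this signature is projected gradient descent, clamping after each step; a sketch under those assumptions, not a reference implementation:

import torch

def nmf(X, k, lr=1e-2, epochs=1000):
    m, n = X.shape
    W = torch.rand(m, k, requires_grad=True)
    H = torch.rand(k, n, requires_grad=True)
    opt = torch.optim.SGD([W, H], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.norm(X - W @ H) ** 2   # squared reconstruction error
        loss.backward()
        opt.step()
        with torch.no_grad():               # project back onto the non-negative orthant
            W.clamp_(min=0)
            H.clamp_(min=0)
    return W.detach(), H.detach()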
1
Solved
Loss functions in PyTorch use "mean" reduction by default. This means the model gradient will have roughly the same magnitude for any batch size. It makes sense that you want to scale the le...
Smolder asked 10/3, 2023 at 22:15
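This is easy to check empirically: with reduction="mean", the gradient norm stays on the same order of magnitude across batch sizes (toy example, not the asker's model):

import torch

w = torch.zeros(10, requires_grad=True)
for bs in (8, 512):
    X, y = torch.randn(bs, 10), torch.randn(bs)
    loss = torch.nn.functional.mse_loss(X @ w, y, reduction="mean")
    g, = torch.autograd.grad(loss, w)
    print(bs, g.norm().item())   # similar norms despite the 64x batch difference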
1
Solved
I'm working on comparing the convergence rates of the SGD and GD algorithms for neural networks. In PyTorch, we often use the SGD optimizer as follows.
train_dataloader = torch.utils.data.DataLoade...
Salvidor asked 4/6, 2022 at 0:16
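For comparison purposes, full-batch GD is just the same torch.optim.SGD with a single batch spanning the whole dataset; a sketch assuming a train_dataset object:

import torch

gd_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=len(train_dataset), shuffle=False)  # one full-batch step per epoch
sgd_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=32, shuffle=True)                   # stochastic mini-batches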
2
Solved
I am trying to understand the inner workings of gradient accumulation in PyTorch. My question is somewhat related to these two:
Why do we need to call zero_grad() in PyTorch?
Why do we need to ex...
Caloric asked 28/5, 2020 at 14:35
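A minimal accumulation loop (loader, model, criterion, optimizer assumed from context); the loss is divided by the number of accumulation steps so the summed gradients match one large batch:

accum_steps = 4   # effective batch size = loader batch size * accum_steps
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accum_steps
    loss.backward()                    # gradients add up in param.grad across iterations
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()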
4
Solved
I am using scipy.optimize.fmin_l_bfgs_b to solve a Gaussian mixture problem. The means of the mixture distributions are modeled by regressions whose weights have to be optimized using the EM algorithm.
si...
Passementerie asked 7/1, 2016 at 19:27
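For reference, the basic calling convention: when no separate fprime is given, the objective must return both the value and the gradient (toy quadratic, not the asker's mixture model):

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def func(w):
    value = np.sum((w - 3.0) ** 2)
    grad = 2.0 * (w - 3.0)
    return value, grad

w_opt, f_min, info = fmin_l_bfgs_b(func, x0=np.zeros(5))
print(w_opt)   # approximately [3. 3. 3. 3. 3.]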
3
Solved
I use PyTorch. In the computation, I move some data and operators A onto the GPU. In a middle step, I move the data and operators B to the CPU and continue the forward pass.
My question is:
My oper...
Fluid asked 23/8, 2021 at 4:11
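Autograd records operations across devices, so a forward pass that hops GPU -> CPU -> back still backpropagates through the whole graph; a small sketch (requires a CUDA device):

import torch

x = torch.randn(4, requires_grad=True)
if torch.cuda.is_available():
    a = (x.cuda() * 2).sum()   # operator A on the GPU
    b = a.cpu() ** 2           # operator B on the CPU
    b.backward()
    print(x.grad)              # gradients flow back to the original leaf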
1
from math import exp
import numpy as np
from sklearn.linear_model import LogisticRegression
I used the code below from How To Implement Logistic Regression From Scratch in Python
def predict(row, coef...
Sarawak asked 27/2, 2022 at 9:47
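For context, the tutorial's predict step is typically a dot product passed through a sigmoid; a sketch in that style (coefficients[0] acting as the bias, the last column of row assumed to be the label):

from math import exp

def predict(row, coefficients):
    yhat = coefficients[0]
    for i in range(len(row) - 1):
        yhat += coefficients[i + 1] * row[i]
    return 1.0 / (1.0 + exp(-yhat))   # sigmoid squashes the score into (0, 1)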
2
Solved
I have been learning about Artificial Neural Networks (ANN) recently and have a working Python implementation based on mini-batch training. I followed Michael Nielsen's book Neural Ne...
Coldblooded asked 24/7, 2015 at 4:28
2
Solved
Suppose I have a custom loss function and I want to fit the solution of some differential equation with the help of my neural network. So in each forward pass, I am calculating the output of my neural...
Tenner asked 12/9, 2021 at 5:8
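A common pattern is to penalize the residual of the equation itself, using torch.autograd.grad with create_graph=True so the loss stays differentiable; a sketch for the toy ODE u'(t) = -u(t) with u(0) = 1 (all names illustrative):

import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
t = torch.linspace(0, 2, 100).unsqueeze(1).requires_grad_(True)

for _ in range(2000):
    opt.zero_grad()
    u = net(t)
    du_dt, = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)
    residual = ((du_dt + u) ** 2).mean()               # enforce u' = -u on the grid
    ic = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()  # enforce u(0) = 1
    (residual + ic).backward()
    opt.step()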
2
Solved
This could be quite a trivial question to answer, but I just wanted to be clearer. From the available literature and the discussion in What is the difference between Gradient Descent and Newton's Gr...
Alansen asked 18/1, 2020 at 6:43
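The core contrast fits in two update rules (f is the objective, \alpha a fixed step size):

% gradient descent: first-order, fixed step size
x_{t+1} = x_t - \alpha \, \nabla f(x_t)
% Newton's method: second-order, step rescaled by the inverse Hessian
x_{t+1} = x_t - \left[\nabla^2 f(x_t)\right]^{-1} \nabla f(x_t)

Newton's method uses curvature to take better-scaled steps, at the cost of a Hessian solve per iteration.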
5
Solved
I understand what Gradient Descent does. Basically, it tries to move toward a local optimum by slowly moving down the curve. I am trying to understand the actual difference betwe...
Spectrum asked 22/8, 2012 at 5:27
4
Solved
What is the correct way to perform gradient clipping in pytorch?
I have an exploding gradients problem.
Peh asked 15/2, 2019 at 20:9
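The standard approach is torch.nn.utils.clip_grad_norm_, applied between backward() and step(); a sketch (loss, model, optimizer assumed from context):

loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # rescales the global grad norm in place
optimizer.step()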
1
Solved
PyTorch has new functionality, torch.inference_mode, as of v1.9, which is "analogous to torch.no_grad... Code run under this mode gets better performance by disabling view tracking and version co...
Godber asked 12/10, 2021 at 16:21
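Both disable gradient tracking; the practical difference is that tensors created under inference_mode also skip view and version-counter tracking, so they can never re-enter autograd later. A sketch (model assumed from context):

import torch

x = torch.randn(8, 3)
with torch.no_grad():
    y1 = model(x)        # no grad history, but y1 may still be used in autograd later
with torch.inference_mode():
    y2 = model(x)        # faster; y2 is permanently excluded from autograd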
4
Solved
Can you please tell me the difference between Stochastic Gradient Descent (SGD) and back-propagation?
Counterinsurgency asked 21/6, 2016 at 20:2
2
Solved
I can see what the code below from this video is trying to do, but the sum from y=torch.sum(x**2) confuses me. With the sum operation, y becomes a tensor with a single value. As I understand, .backwa...
Ramshackle asked 2/8, 2019 at 6:12
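The sum reduces x to a scalar precisely so that backward() needs no explicit gradient argument; each element then receives dy/dx_i = 2*x_i:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.sum(x ** 2)   # scalar output
y.backward()
print(x.grad)           # tensor([2., 4., 6.])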
2
Solved
I am reading the Deep Learning with Python book.
After reading chapter 4, Fighting Overfitting, I have two questions.
Why might increasing the number of epochs cause overfitting?
I know increa...
Mazonson asked 27/12, 2018 at 9:22
3
Solved
I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using vgg16). To do so, I create a random image and feed it through the network up to the desired convol...
Mandrake asked 6/7, 2019 at 17:38
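A sketch of that gradient-ascent loop (layer_idx and map_idx are hypothetical choices; the weights argument follows torchvision >= 0.13, older versions use pretrained=True):

import torch
import torchvision

vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)          # only the input image is optimized

layer_idx, map_idx = 10, 5           # hypothetical layer and feature-map indices
img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.1)

for _ in range(50):
    opt.zero_grad()
    x = img
    for i, layer in enumerate(vgg):  # run only up to the chosen layer
        x = layer(x)
        if i == layer_idx:
            break
    loss = -x[0, map_idx].mean()     # ascend the activation by descending its negative
    loss.backward()
    opt.step()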