gradient-descent Questions
7
Solved
Sometimes I run into a problem:
OOM when allocating tensor with shape
e.g.
OOM when allocating tensor with shape (1024, 100, 160)
Where 1024 is my batch size and I don't know what the rest is. If ...
Bigener asked 9/10, 2017 at 20:25
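A quick sanity check for errors like this is the raw size of one such tensor; the arithmetic below assumes float32 (4 bytes per element) and ignores gradients, activations, and framework workspace:

# rough memory estimate for one float32 tensor of shape (1024, 100, 160)
batch, h, w = 1024, 100, 160
bytes_per_elem = 4                                # float32
print(batch * h * w * bytes_per_elem / 1024**2)   # 62.5 MB per copy

Halving the batch size halves this figure, which is usually the first thing to try.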
2
I am learning gradient descent for calculating coefficients. Below is what I am doing:
#!/usr/bin/env python
import numpy as np
# m denotes the number of examples here, not the number of features...
Callboard asked 25/6, 2014 at 14:24
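For reference, a minimal vectorized sketch of batch gradient descent for linear-regression coefficients (names are illustrative, not the asker's):

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    m = len(y)                                  # m = number of examples
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        h = X @ theta                           # predictions for all examples
        theta -= (alpha / m) * (X.T @ (h - y))  # simultaneous update of all coefficients
    return theta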
7
Solved
I've noticed that a frequent occurrence during training is NaNs being introduced.
Often it seems to be caused by weights in inner-product/fully-connected or convolution layers blowing up...
Fowling asked 27/11, 2015 at 17:23
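The usual first-line fixes are a smaller learning rate and gradient clipping; a PyTorch-style sketch of the idea (loader, model, criterion, and optimizer are assumed from context):

import torch

for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    if torch.isnan(loss):
        raise RuntimeError("NaN loss: lower the learning rate or check the input data")
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap exploding gradients
    optimizer.step()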
2
Solved
Seems like a basic question, but I need to use feature scaling (take each feature value, subtract the mean, then divide by the standard deviation) in my implementation of linear regression with grad...
Limpet asked 16/1, 2014 at 17:33
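A common standardization sketch in NumPy, assuming rows are examples and columns are features:

import numpy as np

def scale_features(X):
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0             # guard against constant columns
    return (X - mu) / sigma, mu, sigma  # keep mu/sigma to scale test data identically

Keeping mu and sigma matters: inputs at prediction time must be scaled with the training-set statistics, not their own.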
8
Solved
Why does zero_grad() need to be called during training?
| zero_grad(self)
| Sets gradients of all model parameters to zero.
Shaky asked 28/12, 2017 at 4:31
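The short answer is that .backward() accumulates into each parameter's .grad rather than overwriting it, so stale gradients must be cleared every step; a minimal loop (loader, model, criterion, optimizer assumed from context):

for x, y in loader:
    optimizer.zero_grad()           # reset gradients accumulated by the previous step
    loss = criterion(model(x), y)
    loss.backward()                 # adds d(loss)/d(param) into each param.grad
    optimizer.step()                # the update reads param.grad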
6
Solved
def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)
        #temp[j]=theta[j]-(alpha/m)*( np.sum( (h-y)*X_norm[:,j]...
Burstone asked 22/7, 2013 at 9:55
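Assuming the commented line implements batch gradient descent for linear regression, a vectorized completion might look like this sketch (updating all coefficients j at once; not the asker's exact code):

import numpy as np

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    for _ in range(num_it):
        h = np.dot(X_norm, theta)                               # predictions, shape (m,)
        theta = theta - (alpha / m) * np.dot(X_norm.T, h - y)   # simultaneous update
    return theta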
7
Solved
Where is an explicit connection between the optimizer and the loss?
How does the optimizer know where to get the gradients of the loss without a call like optimizer.step(loss)?
-More context-
...
Seeley asked 30/12, 2018 at 6:30
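The connection is the parameters themselves: backward() writes gradients into each parameter's .grad, and the optimizer was constructed with references to those same parameters. A minimal demonstration:

import torch

w = torch.randn(3, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)  # optimizer holds a reference to w

loss = (w ** 2).sum()
loss.backward()      # autograd stores d(loss)/dw in w.grad
optimizer.step()     # step() reads w.grad; it never sees the loss object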
1
My program for training a model in PyTorch converges worse than the TensorFlow implementation. When I switch to SGD instead of Adam, the losses are identical. With Adam, the losses are different st...
Ceric asked 24/5, 2021 at 20:56
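One frequent culprit is differing default hyperparameters: to the best of my knowledge, tf.keras Adam defaults to epsilon=1e-7 while torch.optim.Adam uses eps=1e-8 (worth verifying against the docs for your versions). A sketch of aligning them (model assumed from context):

import torch

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-7,            # match the tf.keras default instead of PyTorch's 1e-8
)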
1
I want to implement non-negative matrix factorization using PyTorch. Here is my initial implementation:
def nmf(X, k, lr, epochs):
    # X: input matrix of size (m, n)
    # k: number of latent factors
    # lr:...
Albion asked 15/3, 2023 at 9:14
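One way to keep the factors non-negative under this signature is projected gradient descent, clamping after each step; a sketch under those assumptions, not a reference implementation:

import torch

def nmf(X, k, lr=1e-2, epochs=1000):
    m, n = X.shape
    W = torch.rand(m, k, requires_grad=True)
    H = torch.rand(k, n, requires_grad=True)
    opt = torch.optim.SGD([W, H], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.norm(X - W @ H) ** 2   # squared reconstruction error
        loss.backward()
        opt.step()
        with torch.no_grad():               # project back onto the non-negative orthant
            W.clamp_(min=0)
            H.clamp_(min=0)
    return W.detach(), H.detach()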
1
Solved
Loss functions in PyTorch use "mean" reduction by default. This means the model gradient will have roughly the same magnitude for any batch size. It makes sense that you want to scale the le...
Smolder asked 10/3, 2023 at 22:15
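This is easy to check empirically: with reduction="mean", the gradient norm stays on the same order of magnitude across batch sizes (toy example, not the asker's model):

import torch

w = torch.zeros(10, requires_grad=True)
for bs in (8, 512):
    X, y = torch.randn(bs, 10), torch.randn(bs)
    loss = torch.nn.functional.mse_loss(X @ w, y, reduction="mean")
    g, = torch.autograd.grad(loss, w)
    print(bs, g.norm().item())   # similar norms despite the 64x batch difference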
1
Solved
I'm working on comparing the convergence rates of the SGD and GD algorithms for neural networks. In PyTorch, we often use the SGD optimizer as follows.
train_dataloader = torch.utils.data.DataLoade...
Salvidor asked 4/6, 2022 at 0:16
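For comparison purposes, full-batch GD is just the same torch.optim.SGD with a single batch spanning the whole dataset; a sketch assuming a train_dataset object:

import torch

gd_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=len(train_dataset), shuffle=False)  # one full-batch step per epoch
sgd_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=32, shuffle=True)                   # stochastic mini-batches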
2
Solved
I am trying to understand the inner workings of gradient accumulation in PyTorch. My question is somewhat related to these two:
Why do we need to call zero_grad() in PyTorch?
Why do we need to ex...
Caloric asked 28/5, 2020 at 14:35
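A minimal accumulation loop (loader, model, criterion, optimizer assumed from context); the loss is divided by the number of accumulation steps so the summed gradients match one large batch:

accum_steps = 4   # effective batch size = loader batch size * accum_steps
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accum_steps
    loss.backward()                    # gradients add up in param.grad across iterations
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()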
4
Solved
I am using scipy.optimize.fmin_l_bfgs_b to solve a Gaussian mixture problem. The means of the mixture distributions are modeled by regressions whose weights have to be optimized using the EM algorithm.
si...
Passementerie asked 7/1, 2016 at 19:27
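For reference, the basic calling convention: when no separate fprime is given, the objective must return both the value and the gradient (toy quadratic, not the asker's mixture model):

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def func(w):
    value = np.sum((w - 3.0) ** 2)
    grad = 2.0 * (w - 3.0)
    return value, grad

w_opt, f_min, info = fmin_l_bfgs_b(func, x0=np.zeros(5))
print(w_opt)   # approximately [3. 3. 3. 3. 3.]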
3
Solved
I use PyTorch. In the computation, I move some data and operators A onto the GPU. In a middle step, I move the data and operators B to the CPU and continue the forward pass.
My question is:
My oper...
Fluid asked 23/8, 2021 at 4:11
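Autograd records operations across devices, so a forward pass that hops GPU -> CPU -> back still backpropagates through the whole graph; a small sketch (requires a CUDA device):

import torch

x = torch.randn(4, requires_grad=True)
if torch.cuda.is_available():
    a = (x.cuda() * 2).sum()   # operator A on the GPU
    b = a.cpu() ** 2           # operator B on the CPU
    b.backward()
    print(x.grad)              # gradients flow back to the original leaf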
1
from math import exp
import numpy as np
from sklearn.linear_model import LogisticRegression
I used the code below from How To Implement Logistic Regression From Scratch in Python
def predict(row, coef...
Sarawak asked 27/2, 2022 at 9:47
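For context, the tutorial's predict step is typically a dot product passed through a sigmoid; a sketch in that style (coefficients[0] acting as the bias, the last column of row assumed to be the label):

from math import exp

def predict(row, coefficients):
    yhat = coefficients[0]
    for i in range(len(row) - 1):
        yhat += coefficients[i + 1] * row[i]
    return 1.0 / (1.0 + exp(-yhat))   # sigmoid squashes the score into (0, 1)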
2
Solved
I have been learning about Artificial Neural Networks (ANN) recently and have a working Python implementation based on mini-batch training. I followed Michael Nielsen's book Neural Ne...
Coldblooded asked 24/7, 2015 at 4:28
2
Solved
Suppose I have a custom loss function and I want to fit the solution of some differential equation with the help of my neural network. So in each forward pass, I am calculating the output of my neural...
Tenner asked 12/9, 2021 at 5:8
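A common pattern is to penalize the residual of the equation itself, using torch.autograd.grad with create_graph=True so the loss stays differentiable; a sketch for the toy ODE u'(t) = -u(t) with u(0) = 1 (all names illustrative):

import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
t = torch.linspace(0, 2, 100).unsqueeze(1).requires_grad_(True)

for _ in range(2000):
    opt.zero_grad()
    u = net(t)
    du_dt, = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)
    residual = ((du_dt + u) ** 2).mean()               # enforce u' = -u on the grid
    ic = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()  # enforce u(0) = 1
    (residual + ic).backward()
    opt.step()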
2
Solved
This could be quite a trivial question to answer, but I just wanted to be clearer. From the available literature and the discussion in What is the difference between Gradient Descent and Newton's Gr...
Alansen asked 18/1, 2020 at 6:43
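The core contrast fits in two update rules (f is the objective, \alpha a fixed step size):

% gradient descent: first-order, fixed step size
x_{t+1} = x_t - \alpha \, \nabla f(x_t)
% Newton's method: second-order, step rescaled by the inverse Hessian
x_{t+1} = x_t - \left[\nabla^2 f(x_t)\right]^{-1} \nabla f(x_t)

Newton's method uses curvature to take better-scaled steps, at the cost of a Hessian solve per iteration.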
5
Solved
I understand what Gradient Descent does. Basically, it tries to move toward a local optimum by slowly moving down the curve. I am trying to understand the actual difference betwe...
Spectrum asked 22/8, 2012 at 5:27
4
Solved
What is the correct way to perform gradient clipping in pytorch?
I have an exploding gradients problem.
Peh asked 15/2, 2019 at 20:9
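The standard approach is torch.nn.utils.clip_grad_norm_, applied between backward() and step(); a sketch (loss, model, optimizer assumed from context):

loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # rescales the global grad norm in place
optimizer.step()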
1
Solved
PyTorch has new functionality, torch.inference_mode, as of v1.9, which is "analogous to torch.no_grad... Code run under this mode gets better performance by disabling view tracking and version co...
Godber asked 12/10, 2021 at 16:21
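Both disable gradient tracking; the practical difference is that tensors created under inference_mode also skip view and version-counter tracking, so they can never re-enter autograd later. A sketch (model assumed from context):

import torch

x = torch.randn(8, 3)
with torch.no_grad():
    y1 = model(x)        # no grad history, but y1 may still be used in autograd later
with torch.inference_mode():
    y2 = model(x)        # faster; y2 is permanently excluded from autograd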
4
Solved
Can you please tell me the difference between Stochastic Gradient Descent (SGD) and back-propagation?
Counterinsurgency asked 21/6, 2016 at 20:2
2
Solved
I can see what the code below from this video is trying to do, but the sum from y=torch.sum(x**2) confuses me. With the sum operation, y becomes a tensor with a single value. As I understand, .backwa...
Ramshackle asked 2/8, 2019 at 6:12
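The sum reduces x to a scalar precisely so that backward() needs no explicit gradient argument; each element then receives dy/dx_i = 2*x_i:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.sum(x ** 2)   # scalar output
y.backward()
print(x.grad)           # tensor([2., 4., 6.])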
2
Solved
I am reading the Deep Learning with Python book.
After reading chapter 4, Fighting Overfitting, I have two questions.
Why might increasing the number of epochs cause overfitting?
I know increa...
Mazonson asked 27/12, 2018 at 9:22
3
Solved
I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using vgg16). To do so, I create a random image and feed it through the network up to the desired convol...
Mandrake asked 6/7, 2019 at 17:38
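A sketch of that gradient-ascent loop (layer_idx and map_idx are hypothetical choices; the weights argument follows torchvision >= 0.13, older versions use pretrained=True):

import torch
import torchvision

vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)          # only the input image is optimized

layer_idx, map_idx = 10, 5           # hypothetical layer and feature-map indices
img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.1)

for _ in range(50):
    opt.zero_grad()
    x = img
    for i, layer in enumerate(vgg):  # run only up to the chosen layer
        x = layer(x)
        if i == layer_idx:
            break
    loss = -x[0, map_idx].mean()     # ascend the activation by descending its negative
    loss.backward()
    opt.step()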