Why do different methods for solving Xc=y in Python give different solutions when they should not?

I was trying to solve a square linear system Xc=y. The methods I know for solving it are:

  1. using the inverse, c = X^-1 y
  2. using Gaussian elimination
  3. using the pseudo-inverse
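
For a well-conditioned square system all three should agree to machine precision. A minimal sketch I'd use as a sanity check (with a made-up, well-conditioned 3x3 system, not the one from my experiment):

import numpy as np

A = np.array([[3., 1., 0.],
              [1., 4., 1.],
              [0., 1., 5.]])
b = np.array([1., 2., 3.])
c_inv   = np.dot(np.linalg.inv(A), b)    # method 1: explicit inverse
c_solve = np.linalg.solve(A, b)          # method 2: Gaussian elimination (LU)
c_pinv  = np.dot(np.linalg.pinv(A), b)   # method 3: pseudo-inverse
print(np.linalg.norm(c_inv - c_solve))   # ~1e-16
print(np.linalg.norm(c_solve - c_pinv))  # ~1e-16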

As far as I can tell, none of these match what I thought would be the ground truth.

  1. First, generate the truth parameters by fitting a polynomial of degree 30 to a cosine with frequency 5, so that y_truth = X*c_truth.
  2. Then I check whether the above three methods match that truth.

I tried it but the methods don't seem to match and I don't see why that should be the case.

I produced fully runnable, reproducible code:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

## some parameters
degree_target = 25
N_train = degree_target+1
lb,ub = -2000,2000
x = np.linspace(lb,ub,N_train)
## generate target polynomial model
freq_cos = 5
y_cos = np.cos(2*np.pi*freq_cos*x)
c_polyfit = np.polyfit(x,y_cos,degree_target)[::-1] ## reversed so the degree-0 coefficient comes first (matches PolynomialFeatures ordering)
## generate kernel matrix
poly_feat = PolynomialFeatures(degree=degree_target)
K = poly_feat.fit_transform(x.reshape(N_train,1)) # generates degree 0 first
## get target samples of the function
y = np.dot(K,c_polyfit)
## get pinv approximation of c_polyfit
c_pinv = np.dot( np.linalg.pinv(K), y)
## get Gaussian-elimination approximation of c_polyfit
c_GE = np.linalg.solve(K,y)
## get inverse matrix approximation of c_polyfit
i = np.linalg.inv(K)
c_mdl_i = np.dot(i,y)
## check rank to see if it's truly invertible
print('rank(K) = {}'.format( np.linalg.matrix_rank(K) ))
## compare parameters
print('--c_polyfit')
print('||c_polyfit-c_GE|| = {}'.format( np.linalg.norm(c_polyfit-c_GE) ))
print('||c_polyfit-c_pinv|| = {}'.format( np.linalg.norm(c_polyfit-c_pinv) ))
print('||c_polyfit-c_mdl_i|| = {}'.format( np.linalg.norm(c_polyfit-c_mdl_i) ))
print('||c_polyfit-c_polyfit|| = {}'.format( np.linalg.norm(c_polyfit-c_polyfit) ))
##
print('--c_GE')
print('||c_GE-c_GE|| = {}'.format( np.linalg.norm(c_GE-c_GE) ))
print('||c_GE-c_pinv|| = {}'.format( np.linalg.norm(c_GE-c_pinv) ))
print('||c_GE-c_mdl_i|| = {}'.format( np.linalg.norm(c_GE-c_mdl_i) ))
print('||c_GE-c_polyfit|| = {}'.format( np.linalg.norm(c_GE-c_polyfit) ))
##
print('--c_pinv')
print('||c_pinv-c_GE|| = {}'.format( np.linalg.norm(c_pinv-c_GE) ))
print('||c_pinv-c_pinv|| = {}'.format( np.linalg.norm(c_pinv-c_pinv) ))
print('||c_pinv-c_mdl_i|| = {}'.format( np.linalg.norm(c_pinv-c_mdl_i) ))
print('||c_pinv-c_polyfit|| = {}'.format( np.linalg.norm(c_pinv-c_polyfit) ))
##
print('--c_mdl_i')
print('||c_mdl_i-c_GE|| = {}'.format( np.linalg.norm(c_mdl_i-c_GE) ))
print('||c_mdl_i-c_pinv|| = {}'.format( np.linalg.norm(c_mdl_i-c_pinv) ))
print('||c_mdl_i-c_mdl_i|| = {}'.format( np.linalg.norm(c_mdl_i-c_mdl_i) ))
print('||c_mdl_i-c_polyfit|| = {}'.format( np.linalg.norm(c_mdl_i-c_polyfit) ))

and I get the result:

rank(K) = 4
--c_polyfit
||c_polyfit-c_GE|| = 4.44089220304006e-16
||c_polyfit-c_pinv|| = 1.000000000000001
||c_polyfit-c_mdl_i|| = 1.1316233165135605e-06
||c_polyfit-c_polyfit|| = 0.0
--c_GE
||c_GE-c_GE|| = 0.0
||c_GE-c_pinv|| = 1.0000000000000007
||c_GE-c_mdl_i|| = 1.1316233160694804e-06
||c_GE-c_polyfit|| = 4.44089220304006e-16
--c_pinv
||c_pinv-c_GE|| = 1.0000000000000007
||c_pinv-c_pinv|| = 0.0
||c_pinv-c_mdl_i|| = 0.9999988683985006
||c_pinv-c_polyfit|| = 1.000000000000001
--c_mdl_i
||c_mdl_i-c_GE|| = 1.1316233160694804e-06
||c_mdl_i-c_pinv|| = 0.9999988683985006
||c_mdl_i-c_mdl_i|| = 0.0
||c_mdl_i-c_polyfit|| = 1.1316233165135605e-06

Why is this? Is it a machine-precision thing? Or is it because the error accumulates (a lot) when the degree is large (greater than 1)? Honestly I don't know, but all those hypotheses seem silly to me. If anyone can spot my mistake, feel free to point it out. Otherwise, maybe I don't understand the linear algebra, which would be a lot more worrying.
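
One diagnostic that seems relevant here (my addition, not part of the run above) is the condition number of K. Roughly, solving Kc=y loses about log10(cond(K)) decimal digits, so once cond(K) is much larger than 1/eps ~ 1e16 there are no correct digits left in float64:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

degree_target = 25
x = np.linspace(-2000, 2000, degree_target + 1)
K = PolynomialFeatures(degree=degree_target).fit_transform(x.reshape(-1, 1))
## cond(K) here is astronomically large, far beyond 1/eps for float64
print('cond(K) = {:.3e}'.format(np.linalg.cond(K)))
print('eps(float64) = {:.3e}'.format(np.finfo(np.float64).eps))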

Also, suggestions for making this work would be highly appreciated. Should I:

  1. increase the size of the interval so the inputs are not less than 1 in magnitude?
  2. what is the largest polynomial degree I can use?
  3. use a different language, or increase the precision?

any suggestions are appreciated!
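
Regarding changing the interval: here is a rough sketch (mine, not verified beyond the prints shown) of the usual fix, which is to rescale the inputs to [-1, 1] before building the polynomial features, so the entries of K no longer span dozens of orders of magnitude:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

degree_target = 25
lb, ub = -2000, 2000
x = np.linspace(lb, ub, degree_target + 1)
x_scaled = 2 * (x - lb) / (ub - lb) - 1    # map [lb, ub] -> [-1, 1]
K = PolynomialFeatures(degree=degree_target).fit_transform(x_scaled.reshape(-1, 1))
y = np.cos(2 * np.pi * 5 * x)
c_solve = np.linalg.solve(K, y)
c_pinv = np.dot(np.linalg.pinv(K), y)
## conditioning is still not great at degree 25, but orders of magnitude better
print('cond(K) = {:.3e}'.format(np.linalg.cond(K)))
print('||c_solve - c_pinv|| = {:.3e}'.format(np.linalg.norm(c_solve - c_pinv)))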

Weinberg asked 21/10, 2017 at 3:46. Comments (2):
As a fun side question: if I were to train my polynomial model with GD or SGD, would I have the same numerical errors if I restrict my model to [-1,1], or is this just special to (pseudo-)inverting a matrix? – Weinberg
Another follow-up question: I've been told multiple times to just change the interval to, say, [-5,5]. But why doesn't the opposite issue arise then, i.e. instead of 1.0 + eps = 1.0 we get 1.0 + big_number = NaN, with big_number = number^30? Why do the numerical issues seem more sensitive when raising a number with magnitude less than 1 to the power of 30? – Weinberg

The issue is floating-point accuracy. You're raising numbers between zero and one to the 30th power and then adding them together. If you were doing this with infinite-precision arithmetic, the methods would recover the inputs exactly. With floating-point arithmetic, precision loss means they can't.
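
A tiny illustration of the precision loss being described (my own example, not from the answer): once a term is more than about 16 orders of magnitude smaller than what it's added to, float64 absorbs it completely, and no later step can recover it.

import numpy as np

print(np.finfo(np.float64).eps)     # ~2.2e-16, relative spacing of float64
print(1.0 + 0.1 ** 30 == 1.0)       # True: 1e-30 vanishes next to 1.0
print((1.0 + 0.1 ** 30) - 1.0)      # 0.0, the small contribution is gone for good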

Pejsach answered 21/10, 2017 at 3:53. Comments (10):
Is there a way to make these methods yield the same answer? Maybe by changing the range of the inputs, or by lowering the polynomial degree? – Weinberg
I've been playing around with it and it seems particularly bad for the pseudo-inverse, which is the one I care about most... – Weinberg
Also, is this a language-dependent issue? Would it still occur in MATLAB, for example? – Weinberg
I don't know of a way to easily make high-order polynomial regression numerically stable, whether in MATLAB, Python, or another language. What you could do is use better-behaved basis functions, e.g. a truncated Fourier series. – Pejsach
If I were to train my polynomial model with GD or SGD, would I have the same numerical errors if I restrict my model to [-1,1], or is this just special to (pseudo-)inverting a matrix? – Weinberg
Is there a way to know the largest polynomial degree I can try, for a given interval size, without running into these issues? – Weinberg
If you're using float64, a good rule of thumb is that you shouldn't add or subtract numbers that differ by more than a factor of 10^16. For float32, it's no more than 10^8. – Pejsach
Another follow-up question: I've been told multiple times to just change the interval to, say, [-5,5]. But why doesn't the opposite issue arise then, i.e. instead of 1.0 + eps = 1.0 we get 1.0 + big_number = NaN, with big_number = number^30? Why do the numerical issues seem more sensitive when raising a number with magnitude less than 1 to the power of 30? – Weinberg
Why is a truncated Fourier series better behaved than a polynomial basis? – Weinberg
Would changing the basis to orthogonal polynomials, like Hermite polynomials, have the same issue as the ones I'm using now? If not, do you know why not? – Weinberg
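
Regarding the last two comments: the reason bases like Chebyshev, Hermite, or a truncated Fourier series tend to behave better is that, roughly, their columns are much closer to orthogonal than 1, x, x^2, ..., so the design matrix is far better conditioned. A rough comparison (my sketch, not from the thread):

import numpy as np
from numpy.polynomial import chebyshev

degree = 25
x = np.linspace(-1, 1, degree + 1)
V_mono = np.vander(x, degree + 1, increasing=True)  # monomial basis 1, x, x^2, ...
V_cheb = chebyshev.chebvander(x, degree)            # Chebyshev basis T_0, T_1, ...
print('cond(monomial basis)  = {:.3e}'.format(np.linalg.cond(V_mono)))
print('cond(Chebyshev basis) = {:.3e}'.format(np.linalg.cond(V_cheb)))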
