I was trying to solve a linear system Xc=y
that was square. The methods I know to solve this are:
- using inverse
c=<X^-1,y>
- using Gaussian elimination
- using the pseudo-inverse
It seems as far as I can tell that these don't match what I thought would be the ground truth.
- First generate the truth parameters by fitting a polynomial of degree 30 to a cosine with frequency 5. So I have
y_truth = X*c_truth
. - Then I check if the above three methods match the truth
I tried it but the methods don't seem to match and I don't see why that should be the case.
I produced fully runnable reproducible code:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
## some parameters
degree_target = 25
N_train = degree_target+1
lb,ub = -2000,2000
x = np.linspace(lb,ub,N_train)
## generate target polynomial model
freq_cos = 5
y_cos = np.cos(2*np.pi*freq_cos*x)
c_polyfit = np.polyfit(x,y_cos,degree_target)[::-1] ## needs to me reverse to get highest power last
## generate kernel matrix
poly_feat = PolynomialFeatures(degree=degree_target)
K = poly_feat.fit_transform(x.reshape(N_train,1)) # generates degree 0 first
## get target samples of the function
y = np.dot(K,c_polyfit)
## get pinv approximation of c_polyfit
c_pinv = np.dot( np.linalg.pinv(K), y)
## get Gaussian-Elminiation approximation of c_polyfit
c_GE = np.linalg.solve(K,y)
## get inverse matrix approximation of c_polyfit
i = np.linalg.inv(K)
c_mdl_i = np.dot(i,y)
## check rank to see if its truly invertible
print('rank(K) = {}'.format( np.linalg.matrix_rank(K) ))
## comapre parameters
print('--c_polyfit')
print('||c_polyfit-c_GE||^2 = {}'.format( np.linalg.norm(c_polyfit-c_GE) ))
print('||c_polyfit-c_pinv||^2 = {}'.format( np.linalg.norm(c_polyfit-c_pinv) ))
print('||c_polyfit-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_polyfit-c_mdl_i) ))
print('||c_polyfit-c_polyfit||^2 = {}'.format( np.linalg.norm(c_polyfit-c_polyfit) ))
##
print('--c_GE')
print('||c_GE-c_GE||^2 = {}'.format( np.linalg.norm(c_GE-c_GE) ))
print('||c_GE-c_pinv||^2 = {}'.format( np.linalg.norm(c_GE-c_pinv) ))
print('||c_GE-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_GE-c_mdl_i) ))
print('||c_GE-c_polyfit||^2 = {}'.format( np.linalg.norm(c_GE-c_polyfit) ))
##
print('--c_pinv')
print('||c_pinv-c_GE||^2 = {}'.format( np.linalg.norm(c_pinv-c_GE) ))
print('||c_pinv-c_pinv||^2 = {}'.format( np.linalg.norm(c_pinv-c_pinv) ))
print('||c_pinv-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_pinv-c_mdl_i) ))
print('||c_pinv-c_polyfit||^2 = {}'.format( np.linalg.norm(c_pinv-c_polyfit) ))
##
print('--c_mdl_i')
print('||c_mdl_i-c_GE||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_GE) ))
print('||c_mdl_i-c_pinv||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_pinv) ))
print('||c_mdl_i-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_mdl_i) ))
print('||c_mdl_i-c_polyfit||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_polyfit) ))
and I get the result:
rank(K) = 4
--c_polyfit
||c_polyfit-c_GE||^2 = 4.44089220304006e-16
||c_polyfit-c_pinv||^2 = 1.000000000000001
||c_polyfit-c_mdl_i||^2 = 1.1316233165135605e-06
||c_polyfit-c_polyfit||^2 = 0.0
--c_GE
||c_GE-c_GE||^2 = 0.0
||c_GE-c_pinv||^2 = 1.0000000000000007
||c_GE-c_mdl_i||^2 = 1.1316233160694804e-06
||c_GE-c_polyfit||^2 = 4.44089220304006e-16
--c_pinv
||c_pinv-c_GE||^2 = 1.0000000000000007
||c_pinv-c_pinv||^2 = 0.0
||c_pinv-c_mdl_i||^2 = 0.9999988683985006
||c_pinv-c_polyfit||^2 = 1.000000000000001
--c_mdl_i
||c_mdl_i-c_GE||^2 = 1.1316233160694804e-06
||c_mdl_i-c_pinv||^2 = 0.9999988683985006
||c_mdl_i-c_mdl_i||^2 = 0.0
||c_mdl_i-c_polyfit||^2 = 1.1316233165135605e-06
Why is it? Is it a machine precision thing? Or is it because the error accumulates (a lot) when degree is large (greater than 1)? Honestly I don't know but all those hypothesis seem silly to me. If anyone can spot my mistake, feel free to point it out. Otherwise, I might not understand linear algebra or something...which is a lot more worrying.
Also if I can get suggestions for this to work, it would be highly appreciated. Do I:
- increase the size of the interval to no be less than 1 (in magnitude)?
- what is the largest polynomial size degree I may use?
- different language...? Or increase the precision?
any suggestions are appreciated!
[-1,1]
or is this just special to (pseudo) inverting a matrix? – Weinberg[-5,5]
. However, now my question is, why doesn't the opposite issue arise and then instead of1.0+eps = 1.0
that we get1.0+big_number = NaN
? ifbig_number = number^30
? Like why do numerical issue seem to be more sensitive when raising to the power of 30 a number with magnitude less than 1? – Weinberg