Asked 16/6, 2018 at 2:41 Answered 26/10, 2019 at 3:33

Solved python numpy scipy sparse-matrix svd

I was working on implementing a solver for sparse undetermined systems in Python (discussed here) and I was trying to rebuild the nullspace function that uses the standard numpy svd function (numpy.linalg.svd) in the SciPy cookbook using the scipy.sparse version of svd (scipy.sparse.linalg.svds) but it outputs different left and right singular vectors for the examples I ran - including the matrices:

[[1,1,0,0,0],[0,0,1,1,0],[1,1,1,1,1]]  
[[1,0,1],[1,1,0],[0,1,1]]

Why do these two solvers produce two different svd outputs for the matrices above? What can I do to ensure the same output?

Edit

Here's an example: table is a csc_matrix such that

table.todense()  = matrix([[1,1,0,0,0],[0,0,1,1,0],[1,1,1,1,1]],dtype=int64)

So, the following code outputs

numpy.linalg.svd(table.todense()) =  
[[ -3.64512933e-01   7.07106781e-01  -6.05912800e-01]  
[ -3.64512933e-01  -7.07106781e-01  -6.05912800e-01]  
[ -8.56890100e-01   2.32635116e-16   5.15499134e-01]]  
-----------------------------------------------------
[ 2.58873755  1.41421356  0.54629468]
-----------------------------------------------------
[[ -4.7181e-01 -4.7181e-01 -4.7181e-01 -4.7181e-01 -3.3101e-01]
[5e-01   5e-01  -5e-01  -5e-01 6.16450329e-17]
[-1.655e-01  -1.655e-01  -1.655e-01  -1.655e-01  9.436e-01]
[5e-01  -5e-01  -5e-01   5e-01 -1.77302319e-16]
[-5e-01  5e-01  -5e-01   5e-01 2.22044605e-16]]

And the following

scipy.sparse.linalg.svds(table,k=2)=  
[[  7.07106781e-01,  -3.64512933e-01],
[ -7.07106781e-01,  -3.64512933e-01],
[  2.73756255e-18,  -8.56890100e-01]]
-------------------------------------
[ 1.41421356,  2.58873755]
-------------------------------------
[[  5e-01,   5e-01,  -5e-01, -5e-01,   1.93574904e-18],
[ -4.71814e-01,  -4.71814e-01,  -4.71814e-01, -4.71814e-01,  -3.31006e-01]]

Note that there are quite a few values that overlap between the two solutions. Also, the scipy.sparse.linalg.svds function doesn't allow k to be greater than or equal to min(table.shape), which is why I chose k=2.

Lore answered 16/6, 2018 at 2:41 Comment(5)

Could you, please, show an example of input data, how you run these functions and output? It would highly improve this question. – Lydell 16/6, 2018 at 3:11

I would check out the linked question, I explain it better there. – Lore 16/6, 2018 at 14:36

For both functions, the singular values (s) is the main thing they promise. The k largest s values match. SVD is not unique. en.wikipedia.org/wiki/… – Speak 16/6, 2018 at 16:58

What fast algorithms exist for computing truncated SVD looks like a useful answer. It talks about truncated SVD, and mentions the ARPACK implementation, which svds also cites. MATLAB has both svd and svds. Also fa.bianp.net/blog/2012/singular-value-decomposition-in-scipy – Speak 16/6, 2018 at 19:20

Huh interesting. I didn't realize svd was not unique. That definitely throws a monkey wrench into my plans. – Lore 18/6, 2018 at 14:20

The output in the question you posted looks fine to me. In the numpy call you are calculating every singular value and in the scipy code you are calculating just the top k singular values, and they match the top k from the numpy output.

The sparse top k svd won't let you calculate every singular value because if you wanted to do that, then you could just use the full svd function.

Below I have included code for you to check this out yourself. The caveat is that while the numpy and scipy full svds can both recreate the original matrix well enough, the top k svd cannot. This is because you are throwing away data. Normally this is fine given that you are okay with being close enough. The issue is that SVD if used with top k can be used as a low rank approximation of the original matrix, not as a replacement.

For clarity, my experience on this comes from implementing a python, parallel version of this paper for the original author, A Sparse Plus Low-Rank Exponential Language Model for Limited Resource Scenarios.

import numpy as np                                                                                   
from scipy import linalg                                                                            
from scipy.sparse import linalg as slinalg                                                           

x = np.array([[1,1,0,0,0],[0,0,1,1,0],[1,1,1,1,1]],dtype=np.float64)                                 

npsvd = np.linalg.svd(x)                                                                             
spsvd = linalg.svd(x)                                                                                
sptop = slinalg.svds(x,k=2)                                                                          

print "np"                                                                                           
print "u: ", npsvd[0]                                                                                
print "s: ", npsvd[1]                                                                                
print "v: ", npsvd[2]                                                                                

print "\n=========================\n"                                                                

print "sp"                                                                                           
print "u: ", spsvd[0]                                                                                
print "s: ", spsvd[1]                                                                                
print "v: ", spsvd[2]                                                                                

print "\n=========================\n"                                                                

print "sp top k"                                                                                     
print "u: ", sptop[0]                                                                                
print "s: ", sptop[1]                                                                                
print "v: ", sptop[2]                                                                                

nptmp = np.zeros((npsvd[0].shape[1],npsvd[2].shape[0]))                                              
nptmp[np.diag_indices(np.min(nptmp.shape))] = npsvd[1]                                               
npreconstruct = np.dot(npsvd[0], np.dot(nptmp,npsvd[2]))                                             

print npreconstruct                                                                                  
print "np close? : ", np.allclose(npreconstruct, x)                                                  

sptmp = np.zeros((spsvd[0].shape[1],spsvd[2].shape[0]))                                              
sptmp[np.diag_indices(np.min(sptmp.shape))] = spsvd[1]                                               
spreconstruct = np.dot(spsvd[0], np.dot(sptmp,spsvd[2]))                                             

print spreconstruct                                                                                  
print "sp close? : ", np.allclose(spreconstruct, x)                                                  

sptoptmp = np.zeros((sptop[0].shape[1],sptop[2].shape[0]))                                           
sptoptmp[np.diag_indices(np.min(sptoptmp.shape))] = sptop[1]                                         
sptopreconstruct = np.dot(sptop[0], np.dot(sptoptmp,sptop[2]))                                       

print sptopreconstruct                                                                               
print "sp top close? : ", np.allclose(sptopreconstruct, x)

Condom answered 17/6, 2018 at 1:44 Comment(7)

This is quite interesting. Is it possible to calculate the nullspace of a matrix from this truncated svd, or not? If not, are there any full svd solvers for sparse matrices in python? – Lore 17/6, 2018 at 21:29

Off the top of my head, I don't believe that you can use a truncated svd to calculate the nullspace as you need all of the singular values. There are two main differences between the sparse version and full version. The full version is faster by a whole factor (O(n^3) v O(n^4), but scales by the size of the matrix in memory requirements. The sparse version scales in memory the number of non-zeros. In your case, as long at the matrix is not too large, I would use the full version. You can also look into things like sparsesvd if sparse is needed. – Condom 17/6, 2018 at 23:0

The examples I'm using make it nearly impossible to store the matrix in non-sparse form due to the size of data. Unfortunately, sparsesvd also computes a truncated svd, so I don't really have many options here. Do you know a way to hard code a sparse version of full svd? – Lore 19/6, 2018 at 21:49

Have you tried passing a sparse matrix to scipy.linalg.svd? – Condom 19/6, 2018 at 21:54

Yea I have... I got this: Sparse matrices are not supported by this function. Perhaps one of the scipy.sparse.linalg functions would work instead. – Lore 20/6, 2018 at 16:0

Have you thought about using something other than python? Looking around it seems like R is able to do this. stats.stackexchange.com/questions/41259/… – Condom 21/6, 2018 at 4:37

If it has full svd, I'm definitely willing to try it! – Lore 21/6, 2018 at 20:20

u,sigma,v = numpy.linalg.svd(A)

is equal to

u,sigma,v = scipy.sparse.linalg.svds(A,k=min(A.shape[0],A.shape[1]))
u = -u[:,::-1]
v = -v[::-1,:]
sigma = sigma[::-1]

Foundry answered 26/10, 2019 at 3:33 Comment(2)

How does this answer the question? – Lore 26/10, 2019 at 3:35

(This is not to say the above doesn't try to answer Why do these two solvers produce two different svd outputs or fails to do so correctly - but, please: explain. Heed How do I write a good answer?) – Play 26/10, 2019 at 4:47

Edit

Recommended topics

Hot tags