Is there a way to prevent numpy.linalg.svd running out of memory?
I have 1 million 3d points I am passing to numpy.linalg.svd but it runs out of memory very quickly. Is there a way to break down this operation into smaller chunks?

I don't know what it's doing internally, but am I only supposed to pass arrays that represent a 3x3 or 4x4 matrix? I have seen uses of it online where arrays with an arbitrary number of elements were passed.

Zig answered 2/11, 2013 at 15:32 Comment(2)
From the documentation, svd requires a 2-D array as input. It seems you need less memory if you pass full_matrices=False... – Diplocardiac
That helps, but would it give the exact same result? I am getting the same result, but the description of the argument doesn't make sense to me. – Zig
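A minimal sketch of the shape difference the comment describes, using a made-up 10000-point stand-in for the real data (the actual array and sizes are assumptions for illustration):

```python
import numpy as np

# Hypothetical stand-in for the asker's data: N 3-D points as an (N, 3) array.
points = np.random.randn(10000, 3)

# With full_matrices=True, U would be (N, N) -- 10000 x 10000 here, and
# 10^6 x 10^6 for a million points. full_matrices=False keeps U thin.
U, s, Vt = np.linalg.svd(points, full_matrices=False)

print(U.shape, s.shape, Vt.shape)  # (10000, 3) (3,) (3, 3)
```

The singular values and the reconstruction U @ diag(s) @ Vt are the same either way; only the extra orthonormal columns of U are dropped.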

If you have an MxN matrix (in your case 1000000x3), numpy.linalg.svd does not require M == N. In fact this is precisely where the SVD comes in, to compute things like rank and the pseudo-inverse. Methods such as linalg.inv require a square (and full-rank) matrix to have a defined result.

@Saullo Castro is right on. full_matrices=False can turn an intractable problem into a tractable one: instead of the U matrix being 1M x 1M elements, it is 1M x 3, a huge savings. I am not sure which reduced SVD algorithm numpy uses (it might be the compact SVD, or the thin one); a brief description of three widely used variants is in the Reduced SVDs section of http://en.wikipedia.org/wiki/Singular_value_decomposition. They all center on reducing the computation of the full U matrix to a reduced form, and this is sufficient for some, perhaps even many, problems. The savings is greatest when numberObservations >> numberFeatures.
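A back-of-envelope check of why this turns intractable into tractable for the question's 1000000x3 case (float64 assumed):

```python
M, N = 1_000_000, 3   # the question's 1 million 3-D points
bytes_per_float = 8   # float64

full_U = M * M * bytes_per_float  # U with full_matrices=True: (M, M)
thin_U = M * N * bytes_per_float  # U with full_matrices=False: (M, N)

print(f"full U: {full_U / 1e12:.0f} TB")  # 8 TB -- hopeless
print(f"thin U: {thin_U / 1e6:.0f} MB")   # 24 MB -- easily fits in memory
```

So the out-of-memory error almost certainly comes from allocating the full square U, not from the decomposition itself.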

Regarding whether you get the same result: the short answer is yes, depending on exactly what you will be doing with the SVD results. For example, you will get the same matrix back (to a floating-point tolerance) from the reduced form as from the original, as the code below shows. Note that in the top case the size of U is numberObservations x numberObservations, whereas with full_matrices=False the size of U is numberObservations x numberFeatures.

This code was adapted from the numpy.linalg.svd docs so you can experiment with arbitrary numbers of rows/columns and with how many singular values to keep.

One can always reduce the size of U to M x min(M, N). Further reductions may be possible depending on the structure of your data and how much noise is present. Just because np.allclose returns False does not mean the computed SVs are bad for all contexts. You can experiment with this using the mostSignificantSingularValues variable, which keeps only the top SVs from the full SVD.

import numpy as np

numberObservations = 900
numberFeatures = 600
mostSignificantSingularValues = 600

a = np.random.randn(numberObservations, numberFeatures) + 1j*np.random.randn(numberObservations, numberFeatures)

# Reconstruction based on full SVD:
U, s, V = np.linalg.svd(a, full_matrices=True)
print(U.shape, V.shape, s.shape)
S = np.zeros((numberObservations, numberFeatures), dtype=complex)
S[:mostSignificantSingularValues, :mostSignificantSingularValues] = np.diag(s[:mostSignificantSingularValues])
print(np.allclose(a, np.dot(U, np.dot(S, V))))  # True
d1 = a - np.dot(U, np.dot(S, V))

# Reconstruction based on reduced SVD:
U, s, V = np.linalg.svd(a, full_matrices=False)
print(U.shape, V.shape, s.shape)
S = np.diag(s)
print(np.allclose(a, np.dot(U, np.dot(S, V))))  # True
d2 = a - np.dot(U, np.dot(S, V))
Allottee answered 4/11, 2013 at 16:1 Comment(0)

Try using scipy.linalg.svd instead of numpy's function.
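A sketch of the scipy variant, again on a made-up 10000-point stand-in for the real data. scipy.linalg.svd accepts full_matrices=False just like numpy, plus overwrite_a=True, which lets LAPACK reuse the input buffer and save one array copy:

```python
import numpy as np
from scipy import linalg

points = np.random.randn(10000, 3)  # hypothetical stand-in for the point cloud

# overwrite_a=True may destroy `points`, so only use it when the
# original array is no longer needed after the decomposition.
U, s, Vt = linalg.svd(points, full_matrices=False, overwrite_a=True)
print(U.shape, s.shape, Vt.shape)  # (10000, 3) (3,) (3, 3)
```

For a tall, thin matrix the dominant memory cost is the full U anyway, so full_matrices=False matters far more than which library computes the decomposition.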

Negrito answered 3/8, 2016 at 12:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.