For all the up votes of the mmwrite answer, I'm surprised no one tried to answer the actual question. But since it has been reactivated, I'll give it a try.
This reproduces the OP case:
In [90]: x=sparse.csr_matrix(np.arange(10).reshape(2,5))
In [91]: np.save('save_sparse.npy',x)
In [92]: X=np.load('save_sparse.npy')
In [95]: X
Out[95]:
array(<2x5 sparse matrix of type '<type 'numpy.int32'>'
with 9 stored elements in Compressed Sparse Row format>, dtype=object)
In [96]: X[()].A
Out[96]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
In [94]: x
Out[94]:
<2x5 sparse matrix of type '<type 'numpy.int32'>'
    with 9 stored elements in Compressed Sparse Row format>
The [()] that user4713166 gave us is not a 'hard way' to extract the sparse array.
np.save and np.load are designed to operate on ndarrays. But a sparse matrix is not such an array, nor is it a subclass (as np.matrix is). It appears that np.save wraps the non-array object in an object dtype array, and saves it along with a pickled form of the object.
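A minimal sketch of that round trip, for anyone who wants to check it on a current install. It assumes a recent NumPy, where np.load refuses to unpickle object arrays unless allow_pickle=True is passed, and it uses toarray() rather than the older .A shorthand. The temporary file path is only for illustration.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

x = sparse.csr_matrix(np.arange(10).reshape(2, 5))

# Illustrative scratch location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "save_sparse.npy")

# x is not an ndarray, so np.save stores it as a pickled 0-d object array.
np.save(path, x)

# Recent NumPy versions need allow_pickle=True to read pickled objects back.
X = np.load(path, allow_pickle=True)

print(X.shape, X.dtype)      # a 0-d array with dtype object
restored = X[()]             # indexing with an empty tuple pulls out the element
print(type(restored))
print(restored.toarray())
```

X.item() should give the same object as X[()], if that reads more clearly to you.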
When I try to save a different kind of object, one that can't be pickled, I get an error message at:
403 # We contain Python objects so we cannot write out the data directly.
404 # Instead, we will pickle it out with version 2 of the pickle protocol.
--> 405 pickle.dump(array, fp, protocol=2)
So in answer to "Is Scipy smart enough to understand that it has loaded a sparse array?", no. np.load does not know about sparse arrays. But np.save is smart enough to punt when given something that isn't an array, and np.load does what it can with what it finds in the file.
As to alternative methods of saving and loading sparse arrays, the MATLAB-compatible io.savemat method has been mentioned. It would be my first choice. But this example also shows that you can use regular Python pickling. That might be better if you need to save a particular sparse format. And np.save isn't bad if you can live with the [()] extraction step. :)
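The pickling route can be sketched like this. It is just the standard pickle module applied to a sparse matrix; the coo format and the temporary file name are arbitrary choices for the illustration, not something the question requires.

```python
import os
import pickle
import tempfile

import numpy as np
from scipy import sparse

# Any sparse format could be used here; coo is an arbitrary pick.
x = sparse.coo_matrix(np.arange(10).reshape(2, 5))

# Illustrative scratch location.
path = os.path.join(tempfile.mkdtemp(), "save_sparse.pkl")

with open(path, "wb") as f:
    pickle.dump(x, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    y = pickle.load(f)

# The object comes back directly, in the format it was saved in,
# with no [()] step needed.
print(type(y), y.format)
print(y.toarray())
```

The usual pickle caveat applies: only load files you trust.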
https://github.com/scipy/scipy/blob/master/scipy/io/matlab/mio5.py
write_sparse - sparse matrices are saved in csc format. Along with headers it saves A.indices.astype('i4'), A.indptr.astype('i4'), A.data.real, and optionally A.data.imag.
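A rough sketch of the savemat / loadmat round trip follows. Since the writer converts to csc, the matrix you get back is not guaranteed to be in the format you saved, and which format loadmat hands back may depend on the SciPy version, so the sketch only checks the values. The variable name "x", the float data, and the temporary path are choices made for the illustration.

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse

# Float data is used here to stay close to what MATLAB sparse matrices hold.
x = sparse.csr_matrix(np.arange(10, dtype=float).reshape(2, 5))

# Illustrative scratch location.
path = os.path.join(tempfile.mkdtemp(), "save_sparse.mat")

# savemat takes a dict mapping variable names to values.
io.savemat(path, {"x": x})

# loadmat returns a dict; the sparse variable is stored under its name.
loaded = io.loadmat(path)["x"]

print(type(loaded), loaded.format)   # may differ from the csr format saved
print(loaded.toarray())
```

If you need a specific format afterwards, converting with tocsr(), tocoo() and so on is the straightforward fix.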
In quick tests I find that np.save/load handles all sparse formats, except dok, where the load complains about a missing shape. Otherwise I'm not finding any special pickling code in the sparse files.
scipy.io is the proper solution. I would add that if you want to go down the optimization road, you might consider numpy.load(mmap_mode='r'/'c'). Memory-mapping the files from disk gives instant load and can save memory, as the same memory-mapped array can be shared across multiple processes. – Chaworth