Problems installing nimfa (Python Matrix Factorization library)
I have a large (~25000 x 1000) matrix to factorize. I wrote my own code based on numpy, but it is inefficient and keeps raising MemoryError.

I've been trying to install and use nimfa (http://nimfa.biolab.si/), and the install itself (I tried easy_install, pip, and installing from a clone of the git repository) completes without errors. But when I import it with import nimfa, I get the error below. I checked the nimfa prerequisites, and they don't mention anything besides numpy and scipy.

I'm on Windows 8, using Python 2.7.5 with numpy and scipy installed. I've also tried installing (and subsequently uninstalling) MinGW and doing this.

Any ideas?

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import nimfa
  File "C:\Python27\lib\site-packages\nimfa-1.0-py2.7.egg\nimfa\__init__.py", line 18, in <module>
    from mf_run import *
  File "C:\Python27\lib\site-packages\nimfa-1.0-py2.7.egg\nimfa\mf_run.py", line 26, in <module>
    from utils import *
  File "C:\Python27\lib\site-packages\nimfa-1.0-py2.7.egg\nimfa\utils\__init__.py", line 8, in <module>
    import linalg
  File "C:\Python27\lib\site-packages\nimfa-1.0-py2.7.egg\nimfa\utils\linalg.py", line 15, in <module>
    import scipy.sparse.linalg as sla
  File "C:\Python27\lib\site-packages\scipy\sparse\linalg\__init__.py", line 100, in <module>
    from .isolve import *
  File "C:\Python27\lib\site-packages\scipy\sparse\linalg\isolve\__init__.py", line 6, in <module>
    from .iterative import *
  File "C:\Python27\lib\site-packages\scipy\sparse\linalg\isolve\iterative.py", line 7, in <module>
    from . import _iterative
ImportError: DLL load failed: The specified module could not be found.
Showy answered 24/7, 2013 at 8:28 Comment(1)
Have you checked whether the statement import scipy.sparse.linalg in the Python interpreter gives the same error? – Snapdragon
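The comment above suggests a useful first diagnostic: check whether scipy's sparse linear algebra imports on its own, outside nimfa. A minimal sketch of that check (the traceback shows nimfa failing inside scipy, so if this raises the same ImportError, the broken piece is the scipy build, not nimfa):

import scipy
import scipy.sparse.linalg

# If the import above succeeds, nimfa's prerequisite is fine;
# if it raises "DLL load failed", reinstall scipy from a binary
# build that matches your Python and architecture.
print(scipy.__version__)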

If your purpose is to factorize the matrix rather than to use nimfa specifically, I would suggest using dask instead. Dask is designed to run operations on arrays that fit on disk but not in memory, with minimal changes to your code. A working example:

import dask.array as da
import numpy as np

# Chunk along rows only: dask's QR (tsqr) requires a single
# chunk along the columns for a tall-and-skinny matrix.
mtx = da.from_array(np.random.normal(size=(25000, 1000)), chunks=(250, 1000))

q, r = da.linalg.qr(mtx)        # lazy result
q, r = q.compute(), r.compute() # materialize as numpy arrays

You may need to tune the chunks parameter to fit your computing resources (see the dask FAQ for advice on choosing chunk sizes).
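If a low-rank factorization (rather than QR specifically) is closer to what nimfa would have produced, dask also ships an out-of-core approximate SVD. A sketch, reusing the question's matrix size; the rank k=20 is an arbitrary assumption:

import dask.array as da
import numpy as np

# Same shape as in the question; general 2-D chunking is fine here.
x = da.from_array(np.random.normal(size=(25000, 1000)), chunks=(1000, 1000))

# Randomized approximate SVD: x ~ u @ diag(s) @ v, computed chunk by chunk.
u, s, v = da.linalg.svd_compressed(x, k=20)
print(u.shape, s.shape, v.shape)  # lazy arrays; call .compute() to materialize

Because the result is truncated to rank k, only the small factors ever need to fit in memory, which sidesteps the MemoryError the question describes.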

Teletypesetter answered 31/5, 2017 at 23:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.