How to efficiently convert Matlab engine arrays to numpy ndarray?

I am currently working on a project where I need to do some processing steps with legacy Matlab code (using the Matlab engine) and the rest in Python (numpy).

I noticed that converting the results from Matlab's matlab.mlarray.double to numpy's numpy.ndarray seems horribly slow.

Here is some example code for creating an ndarray with 1000 elements from another ndarray, a list and an mlarray:

import timeit

# Each setup string builds the same 1000-element input `x` in a different container.
setup_range = ("import numpy as np\n"
               "x = list(range(1000))")  # explicit list, also under Python 3
setup_arange = ("import numpy as np\n"
                "x = np.arange(1000)")
setup_matlab = ("import numpy as np\n"
                "import matlab.engine\n"
                "eng = matlab.engine.start_matlab()\n"
                "x = eng.linspace(0., 1000.-1., 1000.)")

# print(...) with a single argument behaves the same in Python 2 and 3.
print('From other array')
print(timeit.timeit('np.array(x)', setup=setup_arange, number=1000))
print('From list')
print(timeit.timeit('np.array(x)', setup=setup_range, number=1000))
print('From matlab')
print(timeit.timeit('np.array(x)', setup=setup_matlab, number=1000))

Which takes the following times:

From other array
0.00150722111994
From list
0.0705359556928
From matlab
7.0873282467

The conversion takes about 100 times as long as a conversion from list.

Is there any way to speed up the conversion?

Steato answered 8/12, 2015 at 12:23 Comment(1)
RobR's answer is more general; see it for N-dimensional (N > 2) arrays. - Hepner

Moments after posting the question I found the solution.

For one-dimensional arrays, access only the _data property of the Matlab array.

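import timeit

# setup_range and setup_matlab are the setup strings defined in the question.
print('From list')
print(timeit.timeit('np.array(x)', setup=setup_range, number=1000))
print('From matlab')
print(timeit.timeit('np.array(x)', setup=setup_matlab, number=1000))
# Convert the underlying flat buffer instead of the mlarray wrapper.
print('From matlab_data')
print(timeit.timeit('np.array(x._data)', setup=setup_matlab, number=1000))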

prints

From list
0.0719847538787
From matlab
7.12802865169
From matlab_data
0.118476275533

For multi-dimensional arrays you need to reshape the array afterwards. In the case of two-dimensional arrays this means calling

np.array(x._data).reshape(x.size[::-1]).T
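
The reshape can be checked without a Matlab session. The sketch below assumes that `_data` holds the elements in column-major order, which is what the expression above relies on; `flat` and `size` are stand-ins for `x._data` and `x.size`, not real engine objects.

```python
import numpy as np

# Stand-in for the 2x3 Matlab matrix [1 2 3; 4 5 6].
expected = np.array([[1., 2., 3.], [4., 5., 6.]])
size = (2, 3)                        # plays the role of x.size
flat = expected.flatten(order='F')   # plays the role of x._data (column-major)

# Reshape with the dimensions reversed, then transpose back.
via_transpose = np.array(flat).reshape(size[::-1]).T

# Same result when NumPy is told to read the buffer in column-major order.
via_order_f = np.array(flat).reshape(size, order='F')
```

Both expressions recover the original 2x3 layout. A plain `reshape(size)` without the transpose would not, because it reads the buffer row by row.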
Steato answered 8/12, 2015 at 12:27 Comment(6)
And if the data is complex, then use the _real and _imag properties (instead of _data). - Consonance
Or equivalently: np.array(x._data).reshape(x.size, order='F') - Burleigh
which is slightly faster - Burleigh
This solution appears to no longer work with Matlab R2022a. A support request has been submitted. - Disconnect
With MATLAB R2022a and later, you can and should pass the MATLAB object directly into the NumPy constructor, rather than using the undocumented _data attribute. Since the implementation of multidimensional arrays is now orders of magnitude faster (see the R2022a release notes), any workaround is unnecessary. Here's the output I get from the code in the main section of the post after replacing "x._data" by "x":
From other array 0.0007055000000000256
From list 0.09001790000000004
From matlab 0.005489099999998359
- Malvasia
Matlab support notes that in R2022a, _data changed from a Python array to a C++ Matlab Data Array object and is much faster. They provide the .noncomplex, .real and .imag calls on this object to retrieve the underlying data in a 1-D format. - Disconnect

Tim's answer works well for 2D arrays. To adapt it to N-dimensional arrays, use the order parameter of np.reshape():

np_x = np.array(x._data).reshape(x.size, order='F')
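
As a rough sketch, the one-liner can be wrapped in a small helper and exercised on a 3-D example. Both `mlarray_to_ndarray` and `FakeMlArray` are hypothetical names introduced here for illustration; the fake class only mimics the `_data` and `size` attributes used above so that the example runs without Matlab. It also assumes the pre-R2022a behaviour in which `_data` is a flat column-major sequence.

```python
import numpy as np

def mlarray_to_ndarray(x):
    """Hypothetical helper: rebuild an N-D array from a flat column-major buffer."""
    return np.array(x._data).reshape(x.size, order='F')

class FakeMlArray(object):
    """Stand-in for matlab.mlarray.double, for illustration only."""
    def __init__(self, values):
        values = np.asarray(values, dtype=float)
        self.size = values.shape                          # like x.size
        self._data = values.flatten(order='F').tolist()   # like x._data

# Round-trip a 2x3x4 array through the fake Matlab layout.
original = np.arange(24.).reshape(2, 3, 4)
converted = mlarray_to_ndarray(FakeMlArray(original))
```

With a real engine result, the call would simply be `mlarray_to_ndarray(x)`.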

Hipparch answered 7/9, 2016 at 22:42 Comment(2)
I think this should be np_x = np.array(x._data).reshape(x.size, order='F').T - Interlining
@RuslanShaydulin No, because order='F' has been explicitly specified. - Burleigh