scipy.io.loadmat nested structures (i.e. dictionaries)

Using the given routines (see "how to load Matlab .mat files with scipy"), I could not access deeper nested structures to recover them as dictionaries.

To present the problem I ran into in more detail, here is a toy example:

import scipy.io as spio
a = {'b':{'c':{'d': 3}}}
# my dictionary: a['b']['c']['d'] = 3
spio.savemat('xy.mat',a)

Now I want to read the mat-File back into python. I tried the following:

vig=spio.loadmat('xy.mat',squeeze_me=True)

If I now want to access the fields I get:

>> vig['b']
array(((array(3),),), dtype=[('c', '|O8')])
>> vig['b']['c']
array(array((3,), dtype=[('d', '|O8')]), dtype=object)
>> vig['b']['c']['d']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/<ipython console> in <module>()

ValueError: field named d not found.

However, by using the option struct_as_record=False the field could be accessed:

v=spio.loadmat('xy.mat',squeeze_me=True,struct_as_record=False)

Now it was possible to access it by

>> v['b'].c.d
array(3)
Eoin answered 10/8, 2011 at 9:32 Comment(1)
With the default settings, it is possible to dig down the nesting with an expression like vig['b']['c'].item()['d'].item(), which parses a mix of structured arrays and object arrays. While ['b'] is dictionary indexing, the others are field name indexing.Congruity

Here are functions that reconstruct the dictionaries. Just use this loadmat instead of scipy.io's loadmat:

import scipy.io as spio

def loadmat(filename):
    '''
    this function should be called instead of direct spio.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function check keys to cure all entries
    which are still mat-objects
    '''
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_keys(data)

def _check_keys(d):
    '''
    checks if entries in dictionary are mat-objects. If yes
    todict is called to change them to nested dictionaries
    '''
    # newer SciPy releases also expose this class as spio.matlab.mat_struct
    for key in d:
        if isinstance(d[key], spio.matlab.mio5_params.mat_struct):
            d[key] = _todict(d[key])
    return d

def _todict(matobj):
    '''
    A recursive function which constructs from matobjects nested dictionaries
    '''
    d = {}  # named d to avoid shadowing the built-in dict
    for strg in matobj._fieldnames:
        elem = matobj.__dict__[strg]
        if isinstance(elem, spio.matlab.mio5_params.mat_struct):
            d[strg] = _todict(elem)
        else:
            d[strg] = elem
    return d
Eoin answered 12/1, 2012 at 8:47 Comment(8)
This needs to be advertised better. The current implementation of scipy's loadmat is a real pain to work with. Fantastic job!Perinephrium
Actually, @jpapon's method below is even better, and necessary when working with arrays like images.Perinephrium
Thank you very much! This is great!Massa
Up, up and up you ought to go! Please send this to Mathworks and tell them to get their act together.Hydrogenize
This is by far the best answer, but still not perfect because it squeezes 1-element dimensions. I probably have the unusual need of this fix + needing to keep 1-element dimensions.Brill
I hadn't even properly understood what the problem I was having was, but when I stumbled across this I instantly comprehended it. Now that is a good stackoverflow answer.Insolvent
This saved me sooo much time! Thanks a bunch!Claudianus
If you use scipy 1.5.0 or later, see @Pardhu's answer. In that case there is an inbuilt functionality.Ibbie

Just an enhancement to mergen's answer, which unfortunately stops recursing when it reaches a cell array of objects. The following version makes lists of them instead and continues the recursion into the cell array elements where possible.

import scipy.io as spio
import numpy as np


def loadmat(filename):
    '''
    this function should be called instead of direct spio.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function check keys to cure all entries
    which are still mat-objects
    '''
    def _check_keys(d):
        '''
        checks if entries in dictionary are mat-objects. If yes
        todict is called to change them to nested dictionaries
        '''
        for key in d:
            if isinstance(d[key], spio.matlab.mat_struct):
                d[key] = _todict(d[key])
        return d

    def _todict(matobj):
        '''
        A recursive function which constructs from matobjects nested dictionaries
        '''
        d = {}
        for strg in matobj._fieldnames:
            elem = matobj.__dict__[strg]
            if isinstance(elem, spio.matlab.mat_struct):
                d[strg] = _todict(elem)
            elif isinstance(elem, np.ndarray):
                d[strg] = _tolist(elem)
            else:
                d[strg] = elem
        return d

    def _tolist(ndarray):
        '''
        A recursive function which constructs lists from cellarrays
        (which are loaded as numpy ndarrays), recursing into the elements
        if they contain matobjects.
        '''
        elem_list = []
        for sub_elem in ndarray:
            if isinstance(sub_elem, spio.matlab.mat_struct):
                elem_list.append(_todict(sub_elem))
            elif isinstance(sub_elem, np.ndarray):
                elem_list.append(_tolist(sub_elem))
            else:
                elem_list.append(sub_elem)
        return elem_list
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_keys(data)
Meagre answered 18/3, 2015 at 15:39 Comment(4)
Excellent job. It would be great if this could be incorporated into scipy.Perinephrium
This code converts a Matlab struct with fields that contain double arrays to a python dict with lists of lists of doubles, which may be the author's intention, but may not be what most people want. A better return value is a dict with ndarray as values.Bays
I've suggested an improved version that tests the array contents for structs before converting an ndarray to a list.Bays
_tolist() is converting all ndarrays to lists... This may be ok for cell arrays but not for matrices. I had to deactivate it for my application reading arrays.Knoxville
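
Following the comments above about _tolist(), here is a minimal sketch of a variant that only turns object arrays (how cell arrays are loaded) into lists and leaves numeric matrices as ndarrays. The names _convert and loadmat_nested are hypothetical and not part of scipy, and the try/except import is an assumption about where mat_struct lives in different SciPy releases. It is not the exact version suggested in the comments, just one way the idea could look.

```python
import os
import tempfile

import numpy as np
import scipy.io as spio

try:  # public location in newer SciPy releases (assumption)
    from scipy.io.matlab import mat_struct
except ImportError:  # location used by the answers above
    from scipy.io.matlab.mio5_params import mat_struct


def _convert(elem):
    """Recursively turn mat_struct objects into dicts and cell arrays into lists."""
    if isinstance(elem, mat_struct):
        return {name: _convert(getattr(elem, name)) for name in elem._fieldnames}
    if isinstance(elem, np.ndarray) and elem.dtype == object:
        if elem.ndim == 0:  # a single wrapped object: unwrap it first
            return _convert(elem.item())
        return [_convert(sub) for sub in elem]
    return elem  # numeric arrays, strings and scalars pass through unchanged


def loadmat_nested(filename):
    """Hypothetical wrapper: load a .mat file and convert nested structs to dicts."""
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return {key: _convert(value) for key, value in data.items()}


# Round trip: a struct holding a numeric matrix and a cell array of structs.
cells = np.empty(2, dtype=object)
cells[0] = {'x': 1}
cells[1] = {'x': 2}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'nested.mat')
    spio.savemat(path, {'s': {'m': np.arange(6.0).reshape(2, 3), 'c': cells}})
    out = loadmat_nested(path)

print(type(out['s']['m']))  # the matrix should still be an ndarray
print(out['s']['c'])        # the cell array should be a list of dicts
```

The trade-off is that a cell array holding only numbers still becomes a list, while a plain double matrix is never touched.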

As of scipy 1.5.0, this functionality is built in via the simplify_cells argument.

from scipy.io import loadmat

mat_dict = loadmat(file_name, simplify_cells=True)
Lungfish answered 19/6, 2022 at 17:56 Comment(0)
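
For reference, a small round trip with the toy dictionary from the question, as a sketch that assumes scipy >= 1.5.0 is installed. The temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

from scipy.io import loadmat, savemat

a = {'b': {'c': {'d': 3}}}

with tempfile.TemporaryDirectory() as tmp:
    file_name = os.path.join(tmp, 'xy.mat')
    savemat(file_name, a)
    mat_dict = loadmat(file_name, simplify_cells=True)

# Nested structs come back as plain nested dictionaries.
print(type(mat_dict['b']))
print(mat_dict['b']['c']['d'])
```

Note that the returned dictionary also contains the usual metadata keys such as __header__ next to the saved variables.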

I was advised on the scipy mailing list (https://mail.python.org/pipermail/scipy-user/) that there are two more ways to access this data.

This works:

import scipy.io as spio
vig=spio.loadmat('xy.mat')
print(vig['b'][0, 0]['c'][0, 0]['d'][0, 0])

Output on my machine: 3

The reason for this kind of access: "For historic reasons, in Matlab everything is at least a 2D array, even scalars." So scipy.io.loadmat mimics Matlab behavior by default.
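
A minimal sketch of what that means in practice, assuming the toy file from the question is recreated in a temporary directory: with the default options every level, including the scalar, should come back with shape (1, 1), which is why each step needs its own [0, 0].

```python
import os
import tempfile

import scipy.io as spio

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'xy.mat')
    spio.savemat(path, {'b': {'c': {'d': 3}}})
    vig = spio.loadmat(path)  # default options, no squeezing

b = vig['b']              # structured array for the outer struct
c = b[0, 0]['c']          # structured array for the inner struct
d = c[0, 0]['d']          # numeric array holding the scalar

print(b.shape, c.shape, d.shape)  # each level is 2D
print(d[0, 0])
```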

Grackle answered 27/2, 2017 at 10:25 Comment(1)
I had blindly stumbled upon the [0,0] thing myself having no idea why it was there, but l failed logically to extend it with cascaded [0,0]'s and so was quite stumped. So glad I found this page.Insolvent

Found a solution: the content of the "scipy.io.matlab.mio5_params.mat_struct" object can be accessed via:

v['b'].__dict__['c'].__dict__['d']
Eoin answered 14/9, 2011 at 15:1 Comment(1)
what options did you use in loadmat ?Cocaine
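
The expression above assumes v was loaded as at the end of the question, i.e. with squeeze_me=True and struct_as_record=False. A short sketch under that assumption, recreating the toy file in a temporary directory, showing that the __dict__ lookup and plain attribute access reach the same objects:

```python
import os
import tempfile

import numpy as np
import scipy.io as spio

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'xy.mat')
    spio.savemat(path, {'b': {'c': {'d': 3}}})
    v = spio.loadmat(path, squeeze_me=True, struct_as_record=False)

b = v['b']
print(b._fieldnames)             # field names of the outer struct
print(b.__dict__['c'] is b.c)    # both forms return the same object

value = v['b'].__dict__['c'].__dict__['d']
print(value)
```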

Another method that works:

import scipy.io as spio
vig=spio.loadmat('xy.mat',squeeze_me=True)
print(vig['b']['c'].item()['d'])

Output:

3

I learned this method on the scipy mailing list, too. I certainly don't understand (yet) why '.item()' has to be added in, and:

print(vig['b']['c']['d'])

will throw an error instead:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

but I'll be back to supplement the explanation when I know it. Explanation of numpy.ndarray.item (from the numpy reference): Copy an element of an array to a standard Python scalar and return it.

(Please notice that this answer is basically the same as the comment of hpaulj to the initial question, but I felt that the comment is not 'visible' or understandable enough. I certainly did not notice it when I searched for a solution for the first time, some weeks ago).

Grackle answered 27/2, 2017 at 10:41 Comment(1)
Reason why print vig['b']['c']['d'] does not work: vig['b']['c'] returns a numpy.void object, therefore python throws an error if you try to access items in it directly. The method item() returns the buffer object (numpy.org/doc/stable/reference/generated/…), and you then can access its content.Estabrook
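
To make the explanation above concrete, here is a small sketch that inspects each level of the squeezed result. It assumes the toy file from the question, recreated in a temporary directory, and the variable names outer, field and inner are only for illustration. The point is that vig['b']['c'] is an object array without named fields, so 'd' can only be looked up on the object that .item() pulls out of it.

```python
import os
import tempfile

import numpy as np
import scipy.io as spio

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'xy.mat')
    spio.savemat(path, {'b': {'c': {'d': 3}}})
    vig = spio.loadmat(path, squeeze_me=True)

outer = vig['b']       # structured array with the single field 'c'
field = outer['c']     # object array: a container, it has no field names
inner = field.item()   # the stored object, which does have the field 'd'

print(outer.dtype.names)   # names of the outer struct's fields
print(field.dtype.names)   # None, so field['d'] cannot work
print(inner.dtype.names)   # names of the inner struct's fields

err = None
try:
    field['d']             # string index on an array without fields
except (IndexError, ValueError) as exc:
    err = type(exc).__name__
print(err)

value = inner['d']
print(value)
```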
