Reading a complex MATLAB struct .mat file in Python

I know about the version issues with .mat files, which require different loading modules in Python, namely scipy.io and h5py. I have also searched many similar questions, such as "scipy.io.loadmat nested structures (i.e. dictionaries)" and "How to preserve Matlab struct when accessing in python?", but their solutions fail on more complex .mat files. The structure of my anno_bbox.mat file is as follows:

The first two levels:

[screenshot: anno_bbox]
[screenshot: bbox_test]

In size:

[screenshot: size]

In hoi:

[screenshot: hoi]

In hoi.bboxhuman:

[screenshot: bboxhuman]

When I use spio.loadmat('anno_bbox.mat', struct_as_record=False, squeeze_me=True), I only get the first-level information as a dictionary:

>>> anno_bbox.keys()
dict_keys(['__header__', '__version__', '__globals__', 'bbox_test', 
'bbox_train', 'list_action'])
>>> bbox_test = anno_bbox['bbox_test']
>>> bbox_test.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'keys'
>>> bbox_test
array([<scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8660ab128>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8660ab2b0>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8660ab710>,
   ...,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8622ec4a8>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8622ecb00>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8622f1198>], dtype=object)

I don't know what to do next. It is too complicated for me. The file is available at anno_bbox.mat (8.7MB)

Britton asked 25/2, 2018 at 6:15

Working from the shared file is a good idea in this case. Loading with:

from scipy import io
data = io.loadmat('../Downloads/anno_bbox.mat')

I get:

In [96]: data['bbox_test'].dtype
Out[96]: dtype([('filename', 'O'), ('size', 'O'), ('hoi', 'O')])
In [97]: data['bbox_test'].shape
Out[97]: (1, 9658)

I could have assigned bbox_test = data['bbox_test']. This variable is a (1, 9658) structured array: 9658 records, each with three fields of object dtype.
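With the default struct_as_record=True, a MATLAB struct array comes back as a NumPy structured array whose fields have object dtype. The minimal sketch below builds a hypothetical two-record stand-in (made-up filenames, not data from anno_bbox.mat) to show that you can index a record and then a field, or a field and then a record:

```python
import numpy as np

# Hypothetical stand-in for data['bbox_test']: a (1, 2) struct array with the
# same three object-dtype fields that loadmat reports (values are made up).
dt = [('filename', 'O'), ('size', 'O'), ('hoi', 'O')]
bbox_test = np.empty((1, 2), dtype=dt)
bbox_test['filename'][0, 0] = np.array(['img_a.jpg'])
bbox_test['filename'][0, 1] = np.array(['img_b.jpg'])

print(bbox_test.dtype.names)  # ('filename', 'size', 'hoi')
print(bbox_test.shape)        # (1, 2): a MATLAB 1xN row vector of structs

# Record first, then field: the value is the 1-element array holding the string.
rec = bbox_test[0, 1]
print(rec['filename'][0])     # img_b.jpg

# Field first, then record: reaches the same stored array.
print(bbox_test['filename'][0, 0][0])  # img_a.jpg
```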

So there's a filename (a string embedded in a 1-element array):

In [101]: data['bbox_test'][0,0]['filename']
Out[101]: array(['HICO_test2015_00000001.jpg'], dtype='<U26')

size is itself a struct with 3 fields, each a number embedded in an array (a 2-D MATLAB matrix):

In [102]: data['bbox_test'][0,0]['size']
Out[102]: 
array([[(array([[640]], dtype=uint16), array([[427]], dtype=uint16), array([[3]], dtype=uint8))]],
      dtype=[('width', 'O'), ('height', 'O'), ('depth', 'O')])
In [112]: data['bbox_test'][0,0]['size'][0,0].item()
Out[112]: 
(array([[640]], dtype=uint16),
 array([[427]], dtype=uint16),
 array([[3]], dtype=uint8))
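Because each leaf value is wrapped in a (1, 1) array, .item() unwraps it to a plain Python number. A sketch on a hand-built value shaped like the size output above; size_of is a hypothetical helper name, not part of scipy:

```python
import numpy as np

# Mimic data['bbox_test'][0, 0]['size']: a (1, 1) struct array whose
# fields each hold a (1, 1) numeric array, as loadmat returns them.
size_dt = [('width', 'O'), ('height', 'O'), ('depth', 'O')]
size = np.empty((1, 1), dtype=size_dt)
size['width'][0, 0] = np.array([[640]], dtype=np.uint16)
size['height'][0, 0] = np.array([[427]], dtype=np.uint16)
size['depth'][0, 0] = np.array([[3]], dtype=np.uint8)


def size_of(size_struct):
    """Hypothetical helper: return (width, height, depth) as Python ints."""
    rec = size_struct[0, 0]  # the single struct inside the 1x1 struct array
    # Each field holds a (1, 1) array; .item() extracts the lone element.
    return tuple(int(rec[name].item()) for name in ('width', 'height', 'depth'))


print(size_of(size))  # (640, 427, 3)
```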

hoi is more complicated:

In [103]: data['bbox_test'][0,0]['hoi']
Out[103]: 
array([[(array([[246]], dtype=uint8), array([[(array([[320]], dtype=uint16), array([[359]], dtype=uint16), array([[306]], dtype=uint16), array([[349]], dtype=uint16)),...
      dtype=[('id', 'O'), ('bboxhuman', 'O'), ('bboxobject', 'O'), ('connection', 'O'), ('invis', 'O')])


In [126]: data['bbox_test'][0,1]['hoi']['id']
Out[126]: 
array([[array([[132]], dtype=uint8), array([[140]], dtype=uint8),
        array([[144]], dtype=uint8)]], dtype=object)
In [130]: data['bbox_test'][0,1]['hoi']['bboxhuman'][0,0]
Out[130]: 
array([[(array([[226]], dtype=uint8), array([[340]], dtype=uint16), array([[18]], dtype=uint8), array([[210]], dtype=uint8))]],
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')])

So all the data shown in the MATLAB structures is there, in a nested structure of arrays (often of 2-D (1, 1) shape) that have either object dtype or multiple named fields.
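Since the nesting follows only a few patterns (structured arrays, object arrays, and small numeric or string arrays), a short recursive function can turn a default-loaded value into plain dicts, lists and scalars. This is a minimal sketch: unwrap is a hypothetical name, and it assumes any size-1 array should collapse to its single element, so MATLAB's 1x1 wrapping is lost:

```python
import numpy as np


def unwrap(obj):
    """Hypothetical helper: recursively convert loadmat's default output
    (struct_as_record=True, squeeze_me=False) into dicts, lists and scalars."""
    if isinstance(obj, np.ndarray):
        if obj.dtype.names is not None:            # struct array
            recs = [{name: unwrap(rec[name]) for name in obj.dtype.names}
                    for rec in obj.ravel()]
            return recs[0] if len(recs) == 1 else recs
        if obj.dtype == object:                    # cell array or object field
            items = [unwrap(x) for x in obj.ravel()]
            return items[0] if len(items) == 1 else items
        if obj.size == 1:                          # 1x1 number or string
            return obj.item()
        return obj                                 # leave real matrices alone
    return obj


# Usage on a hand-built value shaped like one bboxhuman entry:
bb = np.empty((1, 1), dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')])
for name, val in zip(bb.dtype.names, (226, 340, 18, 210)):
    bb[name][0, 0] = np.array([[val]], dtype=np.uint16)
print(unwrap(bb))  # {'x1': 226, 'x2': 340, 'y1': 18, 'y2': 210}
```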

Going back and reloading with squeeze_me=True, I get a simpler structure:

In [133]: data['bbox_test'][1]['hoi']['bboxhuman']
Out[133]: 
array([array((226, 340, 18, 210),
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')]),
       array((230, 356, 19, 212),
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')]),
       array((234, 342, 13, 202),
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')])],
      dtype=object)
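The squeezed boxes can be collected into an ordinary (N, 4) integer array. A sketch under these assumptions: boxes_to_array and _box_records are hypothetical helpers, the field names are the x1, x2, y1, y2 shown above, and a single box may arrive as a 0-d structured array rather than inside an object array, so the code flattens whatever nesting it finds:

```python
import numpy as np

FIELDS = ('x1', 'x2', 'y1', 'y2')


def _box_records(value):
    """Yield one structured record per box, however deeply the boxes are
    wrapped in object arrays (squeezed single boxes are 0-d arrays)."""
    arr = np.asarray(value)
    if arr.dtype.names is not None:      # structured array: each element is a box
        yield from arr.ravel()
    elif arr.dtype == object:            # object array holding further arrays
        for item in arr.ravel():
            yield from _box_records(item)


def boxes_to_array(bboxes):
    """Hypothetical helper: turn one record's squeezed 'bboxhuman' (or
    'bboxobject') value into an (N, 4) int array of x1, x2, y1, y2."""
    rows = [[int(rec[f]) for f in FIELDS] for rec in _box_records(bboxes)]
    return np.array(rows, dtype=int).reshape(-1, 4)
```

For example, boxes_to_array(data['bbox_test'][1]['hoi']['bboxhuman']) should give a (3, 4) array for the record above, assuming the same squeeze_me=True load.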

With struct_as_record=False (the boolean, not the string 'False', which would be truthy) and squeeze_me=True, I get:

In [136]: data['bbox_test'][1]
Out[136]: <scipy.io.matlab.mio5_params.mat_struct at 0x7f90841e9748>

Looking at the attributes of this rec, I see that I can access its fields by attribute name:

In [137]: rec = data['bbox_test'][1]
In [138]: rec.filename
Out[138]: 'HICO_test2015_00000002.jpg'
In [139]: rec.size
Out[139]: <scipy.io.matlab.mio5_params.mat_struct at 0x7f90841e9b38>

In [141]: rec.size.width
Out[141]: 640
In [142]: rec.hoi
Out[142]: 
array([<scipy.io.matlab.mio5_params.mat_struct object at 0x7f90841e9be0>,
       <scipy.io.matlab.mio5_params.mat_struct object at 0x7f90841e9e10>,
       <scipy.io.matlab.mio5_params.mat_struct object at 0x7f90841ee0b8>],
      dtype=object)

In [145]: rec.hoi[1].bboxhuman
Out[145]: <scipy.io.matlab.mio5_params.mat_struct at 0x7f90841e9f98>
In [146]: rec.hoi[1].bboxhuman.x1
Out[146]: 230

In [147]: vars(rec.hoi[1].bboxhuman)
Out[147]: 
{'_fieldnames': ['x1', 'x2', 'y1', 'y2'],
 'x1': 230,
 'x2': 356,
 'y1': 19,
 'y2': 212}

and so on.
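With struct_as_record=False the same attribute access can be used in a loop over every image. One thing to guard against: squeeze_me=True turns a 1-element struct array into a bare mat_struct, so for an image with a single interaction rec.hoi is one object rather than an array. np.atleast_1d normalizes both cases. A sketch, where iter_human_boxes is a hypothetical helper and the field names are the ones shown above:

```python
import numpy as np


def iter_human_boxes(bbox_test):
    """Hypothetical helper: yield (filename, hoi_id, x1, x2, y1, y2) for every
    human box, given data['bbox_test'] loaded with struct_as_record=False
    and squeeze_me=True."""
    for rec in np.atleast_1d(bbox_test):
        # A single interaction is squeezed to a bare struct, not an array.
        for hoi in np.atleast_1d(rec.hoi):
            # Likewise, a single human box is a bare struct.
            for box in np.atleast_1d(hoi.bboxhuman):
                yield (rec.filename, int(hoi.id),
                       int(box.x1), int(box.x2), int(box.y1), int(box.y2))
```

It relies only on attribute access, so it can be tried on types.SimpleNamespace objects before pointing it at the real file.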

Yeorgi answered 25/2, 2018 at 9:10

I've made changes to the answer at: https://mcmap.net/q/378550/-scipy-io-loadmat-nested-structures-i-e-dictionaries

import numpy as np
from scipy.io import loadmat
# SciPy >= 1.8 exposes mat_struct here; older releases: scipy.io.matlab.mio5_params.mat_struct
from scipy.io.matlab import mat_struct


def load_mat(filename):
    """
    Call this instead of scipy.io.loadmat directly: it converts the
    mat_struct objects that loadmat leaves behind into nested Python
    dictionaries, recursing into cell arrays and struct arrays as well.
    """

    def _check_vars(d):
        """
        Checks whether entries in the dictionary are mat_struct objects or
        arrays that may contain them, and converts those to nested dicts.
        """
        for key in d:
            if isinstance(d[key], mat_struct):
                d[key] = _todict(d[key])
            elif isinstance(d[key], np.ndarray):
                d[key] = _toarray(d[key])
        return d

    def _todict(matobj):
        """
        A recursive function which builds nested dictionaries from mat_structs.
        """
        d = {}
        for strg in matobj._fieldnames:
            elem = matobj.__dict__[strg]
            if isinstance(elem, mat_struct):
                d[strg] = _todict(elem)
            elif isinstance(elem, np.ndarray):
                d[strg] = _toarray(elem)
            else:
                d[strg] = elem
        return d

    def _toarray(ndarray):
        """
        A recursive function which walks object arrays (cell arrays and
        struct arrays), converting any mat_struct elements to dicts.
        Numeric and char arrays cannot hold structs, so they are returned
        unchanged without iterating over them.
        """
        if ndarray.dtype != object:
            return ndarray
        # Fill an object array of the same shape; this also handles 0-d
        # arrays and ragged contents that np.array(list) would reject.
        out = np.empty(ndarray.shape, dtype=object)
        for idx, sub_elem in np.ndenumerate(ndarray):
            if isinstance(sub_elem, mat_struct):
                out[idx] = _todict(sub_elem)
            elif isinstance(sub_elem, np.ndarray):
                out[idx] = _toarray(sub_elem)
            else:
                out[idx] = sub_elem
        return out

    data = loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_vars(data)

It also works through variables that are matrices or cell arrays of structs, and it is faster because it does not iterate over matrices that don't contain structs.

Devonadevondra answered 23/2, 2020 at 16:1
