how to read Mat v7.3 files in python ?
Asked Answered
F

3

6

I am trying to read the mat file given in the following website, ufldl.stanford.edu/housenumbers, in the file train.tar.gz, there is a mat file named digitStruct.mat.

when i used scipy.io to read the mat file, it alerts me with the message ' please use hdf reader for matlab v7.3 files '.

the original matlab file is provided as below

load digitStruct.mat
for i = 1:length(digitStruct)
    im = imread([digitStruct(i).name]);
    for j = 1:length(digitStruct(i).bbox)
        [height, width] = size(im);
        aa = max(digitStruct(i).bbox(j).top+1,1);
        bb = min(digitStruct(i).bbox(j).top+digitStruct(i).bbox(j).height, height);
        cc = max(digitStruct(i).bbox(j).left+1,1);
        dd = min(digitStruct(i).bbox(j).left+digitStruct(i).bbox(j).width, width);

        imshow(im(aa:bb, cc:dd, :));
        fprintf('%d\n',digitStruct(i).bbox(j).label );
        pause;
    end
end

as shown above, the mat file has the key 'digitStruct', and within 'digitStruct', key 'name' and 'bbox' can be found, I used h5py API to read the file.

import h5py
f = h5py.File('train.mat')
print len( f['digitStruct']['name'] ), len(f['digitStruct']['bbox']   )

I can read the array, however when I loop though the array, how can I read each item?

for i in f['digitStruct']['name']:
    print i # only print out the HDF5 ref
Friedly answered 3/11, 2015 at 21:12 Comment(1)
Hey, did you got solution? I also having same issue. ThanksAlginate
O
4

Writing in Matlab:

test = {'Hello', 'world!'; 'Good', 'morning'; 'See', 'you!'};
save('data.mat', 'test', '-v7.3') % v7.3 so that it is readable by h5py

enter image description here

Reading in Python (works for any number or rows or columns, but assumes that each cell is a string):

import h5py
import numpy as np

data = []
with h5py.File("data.mat") as f:
    for column in f['test']:
        row_data = []
        for row_number in range(len(column)):            
            row_data.append(''.join(map(unichr, f[column[row_number]][:])))   
        data.append(row_data)

print data
print np.transpose(data)

Output:

[[u'Hello', u'Good', u'See'], [u'world!', u'morning', u'you!']]

[[u'Hello' u'world!']
 [u'Good' u'morning']
 [u'See' u'you!']]
Ought answered 3/11, 2015 at 21:15 Comment(0)
S
1
import numpy as np
import cPickle as pickle
import h5py

f = h5py.File('train/digitStruct.mat')

metadata= {}
metadata['height'] = []
metadata['label'] = []
metadata['left'] = []
metadata['top'] = []
metadata['width'] = []

def print_attrs(name, obj):
    vals = []
        if obj.shape[0] == 1:
            vals.append(int(obj[0][0]))
        else:
            for k in range(obj.shape[0]):
                vals.append(int(f[obj[k][0]][0][0]))
        metadata[name].append(vals)

for item in f['/digitStruct/bbox']:
    f[item[0]].visititems(print_attrs)

with open('train_metadata.pickle','wb') as pf:
  pickle.dump(metadata, pf, pickle.HIGHEST_PROTOCOL)    

I modified it from https://discussions.udacity.com/t/how-to-deal-with-mat-files/160657/3. Honestly, I can't figure out exactly what's going on with visititmes(). The HDF5 file is just too hierarchy and too abstract.

This metadata is a dictionary. The content of each key is an embedded array. The array has 33402 items, which correspond to the png file with the name in order. Each item is an array with length from 1~6. I count the numbers of different digits, which is 5137, 18130, 8691, 1434, 9,1.

What surprises me is that the pickle file is only 9 MB, more than 20 times smaller than the mat file. I guess the HDS file sacrifice storing space for hierarchy structure.

Note: I have converted the values to integers, for the sake of slicing images. Now the train_metadata.pickle file has a size of only 2 MB, 100 times than the mat file.

Stereography answered 31/1, 2017 at 22:34 Comment(0)
H
0

I've created a small library to load MATLAB 7.3 files:

pip install mat73

To load a .mat 7.3 into Python as a dictionary:

import mat73
data_dict = mat73.loadmat('data.mat')

simple as that!

Haddix answered 18/7 at 9:54 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.