Caffe: Reading LMDB from Python

I extracted features using Caffe, which generated an .mdb file. Now I'm trying to read that file from Python and display the values as readable numbers.

import lmdb

lmdb_env = lmdb.open('caffefeat')
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()

for key, value in lmdb_cursor:
    print str(value)

This prints out a very long line of unreadable, broken characters.

Then I tried printing int(value), which returns the following:

ValueError: invalid literal for int() with base 10: '\x08\x80 \x10\x01\x18\x015\x8d\x80\xad?5'

float(value) gives the following:

ValueError: could not convert string to float:? 5????5

Is this a problem with the lmdb file itself, or does it have to do with conversion of data type?
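
The bytes quoted in the first ValueError already hint at the cause: they look like a serialized protobuf message rather than text, which is why int() and float() reject them. Below is a minimal sketch that decodes those bytes by hand. It assumes the value is a Caffe Datum message and that the field numbers follow caffe.proto (1 = channels, 2 = height, 3 = width, 6 = float_data). The helpers read_varint and decode_fields are illustrative names, not part of lmdb or Caffe. The sketch is written for Python 3 and uses only the standard library.

```python
import struct

# Leading bytes copied from the ValueError above. The trailing '5' is left out
# because the error message cuts the value off right after it.
raw = b'\x08\x80 \x10\x01\x18\x015\x8d\x80\xad?'

def read_varint(buf, pos):
    """Decode a protobuf base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7

def decode_fields(buf):
    """Return (field_number, value) pairs; handles varint and 32-bit fields only."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint (int32 fields such as channels)
            value, pos = read_varint(buf, pos)
        elif wire_type == 5:  # fixed 32 bits, read here as a little-endian float
            (value,) = struct.unpack_from('<f', buf, pos)
            pos += 4
        else:
            raise ValueError('wire type %d not handled in this sketch' % wire_type)
        fields.append((field_number, value))
    return fields

fields = decode_fields(raw)
for number, value in fields:
    print(number, value)
```

Under those assumptions the record reads as channels = 4096, height = 1, width = 1, followed by the first float_data entry (roughly 1.36). That would mean the file itself is fine and the values only need to be parsed as Datum messages.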

Juarez answered 14/10, 2015 at 5:53 Comment(0)

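Here's the working code I figured out. Each value in the LMDB is a serialized Caffe Datum protobuf message, so it has to be parsed before it can be printed: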

import caffe
import lmdb

lmdb_env = lmdb.open('directory_containing_mdb')
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()
datum = caffe.proto.caffe_pb2.Datum()

for key, value in lmdb_cursor:
    datum.ParseFromString(value)
    label = datum.label
    data = caffe.io.datum_to_array(datum)
    # datum.label is a single int, so it cannot be zipped with the array;
    # print it next to the whole decoded array instead.
    print label, data
Juarez answered 14/10, 2015 at 10:47 Comment(2)
I got the error ValueError: cannot reshape array of size 29367 into shape (0,0,0). I am using Python 2 under Anaconda2, and installed Caffe using conda install caffe. – Horrocks
Can I provide only the path to the .mdb file instead of its folder? – Miscellany

If you have encoded images in lmdb, you'll probably see this error when using @ytrewq's code

ValueError: total size of new array must be unchanged

Use this function instead:

import caffe
import lmdb
import PIL.Image
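from io import BytesIO
import numpy as np

def read_lmdb(lmdb_file):
    # Open the database read-only and iterate over every (key, value) record.
    cursor = lmdb.open(lmdb_file, readonly=True).begin().cursor()
    datum = caffe.proto.caffe_pb2.Datum()
    for _, value in cursor:
        datum.ParseFromString(value)
        # datum.data holds the encoded image bytes (e.g. JPEG or PNG), so wrap
        # them in a binary buffer (not a text StringIO) and let PIL decode them.
        s = BytesIO(datum.data)

        yield np.array(PIL.Image.open(s)), datum.label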

Example:

lmdb_dir = '/save/jobs/20160613-125532-958f/train_db/'
for im, label in read_lmdb(lmdb_dir):
    print label, im
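
To decide which of the two readers in this thread fits a given database, one option is to compare the payload size with the stored dimensions before reshaping anything. The following is a rough sketch, not Caffe's own logic: payload_kind is a hypothetical helper, and FakeDatum is a stand-in for caffe.proto.caffe_pb2.Datum so the example runs without Caffe installed. Its attribute names mirror the Datum fields, and the sample sizes are made up apart from 29367, which is taken from the error quoted in the comments.

```python
from collections import namedtuple

# Hypothetical stand-in for caffe.proto.caffe_pb2.Datum (same attribute names).
FakeDatum = namedtuple('FakeDatum', 'channels height width data float_data encoded')

def payload_kind(datum):
    """Guess how a Datum stores its content: 'float', 'raw' or 'encoded'."""
    expected = datum.channels * datum.height * datum.width
    if len(datum.data) == 0:
        return 'float'    # values live in float_data (e.g. extracted features)
    if datum.encoded or len(datum.data) != expected:
        return 'encoded'  # compressed image bytes, decode them with PIL
    return 'raw'          # one uint8 per element, safe to reshape

features = FakeDatum(4096, 1, 1, b'', [0.0] * 4096, False)
pixels = FakeDatum(3, 2, 2, b'\x00' * 12, [], False)
compressed = FakeDatum(0, 0, 0, b'\x00' * 29367, [], True)

for name, d in [('features', features), ('pixels', pixels), ('compressed', compressed)]:
    print(name, payload_kind(d))
```

With a real database, the same function could be called on the datum right after ParseFromString: 'float' and 'raw' records suit caffe.io.datum_to_array, while 'encoded' records need the PIL-based reader above.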
Geulincx answered 14/6, 2016 at 9:18 Comment(6)
Does the error you are solving here stem from an lmdb created with encoded images? – Haar
@Haar Yes, see the discussion here. – Geulincx
Thank you for linking to the relevant thread; it adds proper context here. Can you please edit your answer to reflect its relevance to encoded lmdbs? It is very good to state both the error message and the root cause: encoded images in lmdb. Thanks! – Haar
Done! Thank you for the advice. – Geulincx
Tried running this and got the error google.protobuf.message.DecodeError: Unexpected end-group tag. Any idea how to fix this? – Observant
This answer saved me; I was getting the error ValueError: cannot reshape array of size 29367 into shape (0,0,0). – Horrocks
