How to append data to an existing LMDB?

I have around 1 million images to put into this dataset, appended 10,000 at a time.

I'm sure the map_size is wrong; it is based on this article.

I used this line to create the set:

env = lmdb.open(Path+'mylmdb', map_size=int(1e12))

I use this line every 10,000 samples to write data to the file, where X and Y are placeholders for the data to be put into the LMDB:

env = create(env, X[:counter,:,:,:],Y,counter)


def create(env, X,Y,N):
    with env.begin(write=True) as txn:
        # txn is a Transaction object
        for i in range(N):
            datum = caffe.proto.caffe_pb2.Datum()
            datum.channels = X.shape[1]
            datum.height = X.shape[2]
            datum.width = X.shape[3]
            datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
            datum.label = int(Y[i])
            str_id = '{:08}'.format(i)

            # The encode is only essential in Python 3
            txn.put(str_id.encode('ascii'), datum.SerializeToString())
        #pdb.set_trace()
    return env

How can I edit this code so that new data is added to the LMDB instead of replacing existing entries? The present method overwrites the entries at the same positions. I have checked the length after generation with env.stat().

Iyar answered 16/1, 2016 at 0:32 Comment(2)
If you know the length and know that all existing records have ids less than the length, why can't you replace the line str_id = '{:08}'.format(i) by str_id = '{:08}'.format(existing_length + 1 + i)? - Katydid
Thank you, this worked :) @SudeepJuvekar - Iyar

Let me expand on my comment above.

All entries in LMDB are stored under unique keys, and your database already contains keys for i = 0, 1, 2, .... You need a way to generate a new unique key for each i. The simplest way to do that is to find the largest key in the existing DB and keep adding to it.

Assuming that the existing keys are consecutive and start at 0, the largest key is one less than the number of entries:

max_key = env.stat()["entries"] - 1

Otherwise, a more thorough approach is to iterate over all keys. (Check this.)

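max_key = -1  # stays -1 if the DB is empty, so the first new key is 0
with env.begin() as txn:  # cursors come from a transaction, not from env
    for key, value in txn.cursor():
        max_key = max(max_key, int(key))  # keys are ASCII bytes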

Finally, simply replace line 7 of your for loop,

str_id = '{:08}'.format(i)

by

str_id = '{:08}'.format(max_key + 1 + i)

to append to the existing database.
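
The key arithmetic can be checked without touching LMDB at all. The following is a minimal sketch in which a plain dict stands in for the database; batch_keys is a hypothetical helper name, not part of py-lmdb or Caffe, and max_key is taken to be the largest existing key (-1 for an empty store).

```python
def batch_keys(max_key, n):
    """Return n zero-padded ASCII keys that follow max_key (use -1 for an empty DB)."""
    return ['{:08}'.format(max_key + 1 + i).encode('ascii') for i in range(n)]


# A dict stands in for the LMDB here; this is an illustration only.
db = {}

# First batch: the store is empty, so max_key is -1 and keys start at 0.
for key in batch_keys(-1, 3):
    db[key] = b'first batch'

# Second batch: continue after the largest existing key instead of restarting at 0.
max_key = max(int(k) for k in db)
for key in batch_keys(max_key, 3):
    db[key] = b'second batch'

print(sorted(db))
```

Because the keys are zero-padded to a fixed width, their byte-wise order matches their numeric order, so the second batch sorts after the first and nothing is overwritten.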

Katydid answered 19/1, 2016 at 10:34 Comment(1)
As the keys are sorted, why not use last() then key() to find the largest key? - Attrahent
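
What this comment suggests could look like the sketch below. It assumes py-lmdb, where a cursor is obtained from a transaction and Cursor.last() returns False on an empty database, and it assumes the zero-padded ASCII integer keys used in the question. largest_key is a hypothetical helper name.

```python
def largest_key(env):
    """Return the largest integer key in env (expected: an lmdb.Environment), or -1 if empty.

    Assumes keys are zero-padded ASCII integers, so LMDB's byte-wise
    key ordering matches numeric ordering.
    """
    with env.begin() as txn:      # read-only transaction
        cursor = txn.cursor()
        if not cursor.last():     # move to the last key; False if the DB is empty
            return -1
        return int(cursor.key())  # e.g. b'00000041' -> 41
```

With such a helper, max_key = largest_key(env) would avoid scanning every record, and the new keys are again built from max_key + 1 + i.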
