I have around 1 million images to put in this dataset, appended to the set 10000 at a time.
I"m sure the map_size is wrong with ref from this article
used this line to create the set
    import lmdb

    env = lmdb.open(Path + 'mylmdb', map_size=int(1e12))
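To sanity-check the map_size, one rule of thumb I have seen is to budget roughly 10x the raw array bytes to leave headroom for LMDB pages and protobuf framing. The image shape below is a placeholder, not my actual data:

    import numpy as np

    # Placeholder shape: 1 million images of 3x256x256 uint8 (adjust to the real data)
    n_images, channels, height, width = 1000000, 3, 256, 256
    bytes_per_image = channels * height * width  # uint8: 1 byte per value

    # Rule of thumb: ~10x the raw payload for headroom
    needed = n_images * bytes_per_image * 10
    print(needed)  # ~2e12 for this placeholder shape; compare against int(1e12) above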
I use this line every 10000 samples to write the data to the file, where X and Y are placeholders for the data to be put into the LMDB:
    env = create(env, X[:counter, :, :, :], Y, counter)
    import caffe

    def create(env, X, Y, N):
        with env.begin(write=True) as txn:
            # txn is a Transaction object
            for i in range(N):
                datum = caffe.proto.caffe_pb2.Datum()
                datum.channels = X.shape[1]
                datum.height = X.shape[2]
                datum.width = X.shape[3]
                datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
                datum.label = int(Y[i])
                str_id = '{:08}'.format(i)
                # The encode is only essential in Python 3
                txn.put(str_id.encode('ascii'), datum.SerializeToString())
        return env
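For context, the surrounding loop looks roughly like this; `samples` and `load_image` are placeholders for my data pipeline, not real helpers:

    import numpy as np

    BATCH = 10000
    X = np.zeros((BATCH, 3, 256, 256), dtype=np.uint8)  # placeholder shape
    Y = np.zeros(BATCH, dtype=np.int64)

    counter = 0
    for filepath, label in samples:        # `samples` yields (path, label) pairs
        X[counter] = load_image(filepath)  # `load_image` is a placeholder
        Y[counter] = label
        counter += 1
        if counter == BATCH:
            # NOTE: with create() as written above, every flush re-uses keys
            # 00000000..00009999, which is exactly the overwrite problem below.
            env = create(env, X[:counter, :, :, :], Y, counter)
            counter = 0
    if counter > 0:
        env = create(env, X[:counter, :, :, :], Y, counter)  # flush the remainder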
How can I edit this code so that new data is added to this LMDB rather than replacing it? As it stands, each call writes to the same key positions, so the previous batch is overwritten. I have checked the length after generation with env.stat().
Do you mean replacing `str_id = '{:08}'.format(i)` by `str_id = '{:08}'.format(existing_length + 1 + i)`? – Katydid
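A minimal sketch along those lines, reading the current entry count from the open transaction with txn.stat()['entries'] and offsetting the keys by it. Note the offset should be entries + i rather than entries + 1 + i, since the first batch used keys 0 through N-1. The name create_append is mine:

    def create_append(env, X, Y, N):
        # Same as create(), but keys continue after the existing entries
        # instead of restarting at 00000000.
        with env.begin(write=True) as txn:
            offset = txn.stat()['entries']  # key/value pairs already stored
            for i in range(N):
                datum = caffe.proto.caffe_pb2.Datum()
                datum.channels = X.shape[1]
                datum.height = X.shape[2]
                datum.width = X.shape[3]
                datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
                datum.label = int(Y[i])
                # Previous batches used keys 0 .. offset-1, so the next
                # free key is `offset`, not `offset + 1`.
                str_id = '{:08}'.format(offset + i)
                txn.put(str_id.encode('ascii'), datum.SerializeToString())
        return env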