Creating datasets for training with Caffe

To create datasets for training with Caffe I tried both HDF5 and LMDB. However, creating an LMDB is very slow, even slower than HDF5. I am trying to write ~20,000 images.
Am I doing something terribly wrong? Is there something I am not aware of?
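For reference, the HDF5 version is essentially two h5py calls (a minimal sketch of what I do; I use the 'data'/'label' dataset names that Caffe's HDF5Data layer expects, and hdf5_path is a placeholder):

import h5py
import numpy as np

# Sketch: Caffe's HDF5Data layer looks up datasets by blob name,
# conventionally 'data' and 'label', stored as float32
with h5py.File(hdf5_path, 'w') as f:
    f.create_dataset('data', data=data.astype(np.float32))
    f.create_dataset('label', data=labels.astype(np.float32))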
This is my code for LMDB creation:
import lmdb
import caffe

DB_KEY_FORMAT = "{:0>10d}"  # zero-padded keys keep entries in insertion order

db = lmdb.open(path, map_size=int(1e12))
curr_idx = 0
commit_size = 1000
# Commit in batches of 1,000 puts per transaction
for curr_commit_idx in range(0, num_data, commit_size):
    with db.begin(write=True) as in_txn:
        for i in range(curr_commit_idx, min(curr_commit_idx + commit_size, num_data)):
            d, l = data[i], labels[i]
            # Serialize the image array and its label into a Caffe Datum
            im_dat = caffe.io.array_to_datum(d.astype(float), label=int(l))
            key = DB_KEY_FORMAT.format(curr_idx)
            in_txn.put(key.encode('ascii'), im_dat.SerializeToString())
            curr_idx += 1
db.close()
As you can see, I create a transaction for every 1,000 images, because I thought committing a transaction per image would add overhead, but it seems this doesn't affect performance much.
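One variant I have not benchmarked yet (sketch only; it deliberately trades durability for write speed): since the zero-padded keys already arrive in sorted order, py-lmdb's append=True put can skip the B-tree lookup, and the environment can be opened with flags that defer syncing to disk. Here key and im_dat are as in the loop above:

import lmdb

# Sketch: write-optimized environment flags
db = lmdb.open(path, map_size=int(1e12),
               writemap=True,    # write through a memory map
               map_async=True,   # let the OS flush the map asynchronously
               sync=False,       # no fsync per commit; sync once at the end
               metasync=False)   # skip the metadata fsync as well

with db.begin(write=True) as in_txn:
    # append=True is safe here because keys are inserted in sorted order
    in_txn.put(key.encode('ascii'), im_dat.SerializeToString(), append=True)

db.sync()  # one explicit flush before closing
db.close()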
Have you tried the convert_imageset tool? – Chase
I've used convert_imageset to work on ilsvrc12 (ImageNet), converting datasets of ~1M images; it takes a while, but it works. – Chase
Where do you load the data from? – Chase
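For reference, convert_imageset ships with Caffe and is typically invoked as below (paths, sizes, and the listfile name are illustrative; each listfile line is "relative/image/path.jpg label"):

./build/tools/convert_imageset --backend=lmdb --shuffle \
    --resize_height=256 --resize_width=256 \
    /path/to/images/ listfile.txt train_lmdb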