How do I count and enumerate the keys in an lmdb with python?

Asked 9/9, 2015 at 21:55 Answered 11/1, 2021 at 8:58

import lmdb
env = lmdb.open(path_to_lmdb)

Now I seem to need to create a transaction and a cursor, but how do I get a list of keys that I can iterate over?

Pagination answered 9/9, 2015 at 21:55 Comment(1)

I spotted an extra parenthesis there. – Aquiline 9/9, 2015 at 21:58

One way to get the total number of keys without enumerating them individually, also counting all sub-databases:

with env.begin() as txn:
    length = txn.stat()['entries']

Test result with a hand-made database of size 1000000 on my laptop:

the method above is instantaneous (0.0 s)
the iteration method takes about 1 second.
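
To cross-check the two approaches on your own database, something like the sketch below may help. The function names are mine, not part of lmdb, and I use env.stat() here instead of txn.stat(); as far as I know both report the entry count of the main database. The env argument is assumed to be an environment already opened with lmdb.open().

```python
def count_entries(env):
    """Read the entry count from the database statistics; no key is visited."""
    return env.stat()["entries"]


def count_entries_by_iteration(env):
    """Count by walking a key-only cursor; slow, but usable as a cross-check."""
    with env.begin() as txn:
        return sum(1 for _ in txn.cursor().iternext(keys=True, values=False))
```

For a database without named sub-databases, both functions should return the same number.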

Hortense answered 3/5, 2016 at 23:48 Comment(0)

Are you looking for something like this:

with env.begin() as txn:
    with txn.cursor() as curs:
        # do stuff
        print('key is:', curs.get(b'key'))  # keys must be bytes on Python 3

Update:

This may not be the fastest:

with env.begin() as txn:
    myList = [key for key, _ in txn.cursor()]
    print(myList)

Disclaimer: I don't know this library; I just searched its docs for "key".

Aquiline answered 9/9, 2015 at 22:3 Comment(3)

No. I'm aware of the documentation page. I want to know how to get the total number of keys without enumerating them individually. I would also like to know the best (fastest) way to enumerate all the key value pairs. The method you mentioned seems to take quite a while for me, but it could have something to do with the size of my db (about 1m entries). – Pagination 9/9, 2015 at 22:16

@Pagination I updated my answer to get the list of keys, by iterating the cursor. There might be a faster way though. – Aquiline 9/9, 2015 at 22:30

Apart from the fact that it would take a long time to iterate through the keys, are there any other disadvantages to reading a list of keys? – Une 12/9, 2020 at 16:2

As Sait pointed out, you can iterate over a cursor to collect all keys. However, this may be a bit inefficient, as it also loads the values. This can be avoided by calling the cursor's iternext() method with values=False.

with env.begin() as txn:
  keys = list(txn.cursor().iternext(values=False))
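
If holding a list of every key in memory is a concern, a lazy variant might look like this. It is only a sketch: iter_keys is a name I made up, and env is assumed to be an already opened lmdb environment.

```python
def iter_keys(env):
    """Yield keys one at a time, in key order, without returning values."""
    # The read transaction stays open until the generator is exhausted or closed.
    with env.begin() as txn:
        with txn.cursor() as cursor:
            for key in cursor.iternext(keys=True, values=False):
                yield key
```

You would then loop with for key in iter_keys(env): and stop early whenever you like, at the cost of keeping the read transaction open while the loop runs.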

I did a short benchmark between both methods for a DB with 2^20 entries, each with a 16 B key and 1024 B value.

Retrieving keys by iterating over the cursor (including values) took 874 ms on average over 7 runs, while the second method, where only the keys are returned, took 517 ms. These results may differ depending on the size of the keys and values.
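
For anyone who wants to repeat the comparison on their own data, here is a rough harness. It is not the exact script behind the numbers above; the helper names are hypothetical and env is assumed to be an lmdb environment you have already opened.

```python
import time


def keys_via_pairs(env):
    """Iterate (key, value) pairs and discard the values."""
    with env.begin() as txn:
        return [key for key, _ in txn.cursor()]


def keys_only(env):
    """Ask the cursor for keys alone, so values are never returned."""
    with env.begin() as txn:
        return list(txn.cursor().iternext(values=False))


def average_ms(func, env, runs=7):
    """Mean wall-clock time of func(env) in milliseconds over `runs` calls."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func(env)
        total += time.perf_counter() - start
    return total / runs * 1000.0
```

Calling average_ms(keys_via_pairs, env) and average_ms(keys_only, env) gives the two figures to compare; expect them to vary with caching and with key and value sizes.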

Delapaz answered 11/1, 2021 at 8:58 Comment(0)
