How do I count and enumerate the keys in an lmdb with python?

3

16
import lmdb
env = lmdb.open(path_to_lmdb)

Now I seem to need to create a transaction and a cursor, but how do I get a list of keys that I can iterate over?

Pagination answered 9/9, 2015 at 21:55 Comment(1)
I spotted an extra parenthesis there. - Aquiline
16

Here is a way to get the total number of keys without enumerating them individually (this also counts all sub-databases):

with env.begin() as txn:
    length = txn.stat()['entries']

Test results on my laptop with a hand-made database of 1,000,000 entries:

  • the method above is instantaneous (0.0 s)
  • the iteration method takes about 1 second.
Hortense answered 3/5, 2016 at 23:48 Comment(0)
9

Are you looking for something like this?

with env.begin() as txn:
    with txn.cursor() as curs:
        # do stuff; in Python 3, keys must be bytes
        print('key is:', curs.get(b'key'))

Update:

This may not be the fastest way, but it collects every key into a list:

with env.begin() as txn:
    # iterating a cursor yields (key, value) tuples
    my_list = [key for key, _ in txn.cursor()]
    print(my_list)

Disclaimer: I don't know anything about the library, just searched its docs and searched for key in the docs.

Aquiline answered 9/9, 2015 at 22:3 Comment(3)
No. I'm aware of the documentation page. I want to know how to get the total number of keys without enumerating them individually. I would also like to know the best (fastest) way to enumerate all the key value pairs. The method you mentioned seems to take quite a while for me, but it could have something to do with the size of my db (about 1m entries). - Pagination
@Pagination I updated my answer to get the list of keys, by iterating the cursor. There might be a faster way though. - Aquiline
Apart from the fact that it would take a long time to iterate through the keys, are there any other disadvantages to reading a list of keys? - Une
5

As Sait pointed out, you can iterate over a cursor to collect all keys. However, this can be inefficient, because it also loads the values. You can avoid that by calling cursor.iternext() with values=False.

with env.begin() as txn:
  keys = list(txn.cursor().iternext(values=False))

I did a short benchmark between both methods for a DB with 2^20 entries, each with a 16 B key and 1024 B value.

Retrieving keys by iterating over the cursor (including values) took 874 ms on average over 7 runs, while the second method, which returns only the keys, took 517 ms. These results may differ depending on the size of the keys and values.

Delapaz answered 11/1, 2021 at 8:58 Comment(0)
