I need a fast, reliable and memory-efficient key-value database for Linux. My keys are about 128 bytes, and the maximum value size can be 128K or 256K. The database subsystem shouldn't use more than about 1 MB of RAM. The total database size is 20 GB (!), but only a small random fraction of the data is accessed at a time. If necessary, I can move some data blobs out of the database (to regular files), so the size goes down to 2 GB at most. The database must survive a system crash without losing any data that wasn't recently modified. I'll have about 100 times more reads than writes. It is a plus if the database can use a block device (without a filesystem) as storage. I don't need client-server functionality, just a library. I need Python bindings (but I can implement them myself if they are not available).
Which solutions should I consider, and which one do you recommend?
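To make the access pattern concrete, here is a rough sketch of how I intend to use whichever library wins. open_kv() is a placeholder for the eventual binding, not a real API, and the dict-style interface is just an assumption for illustration:

    import binascii
    import os

    db = open_kv('/var/lib/mydb')            # hypothetical open call
    key = binascii.hexlify(os.urandom(64))   # a key of about 128 bytes
    value = os.urandom(200 * 1024)           # values can reach 128K or 256K
    db[key] = value                          # writes are rare (about 1 in 100 operations)
    assert db[key] == value                  # reads dominate
    db.close()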
Candidates I know of that could work:
- Tokyo Cabinet (Python bindings: pytc, see also the pytc example code and the sketch after this list; supports hash and B+tree databases, transaction log files and more; the size of the bucket array is fixed at database creation time; the writer must close the file to give other processes a chance; lots of small writes, reopening the file for each of them, are very slow; the Tyrant server can help with the many small writes; there is a speed comparison of Tokyo Cabinet, Tokyo Tyrant and Berkeley DB)
- VSDB (safe even on NFS, without locking; but what about write barriers?; updates are very slow, though not as slow as with cdb; last release in 2003)
- BerkeleyDB (provides crash recovery and transactions; the bsddb Python module provides bindings; see the sketch after this list)
- Samba's TDB (with transactions and Python bindings; some users have experienced corruption; sometimes mmap()s the whole file; the repack operation sometimes doubles the file size; produces mysterious failures if the database is larger than 2 GB (even on 64-bit systems); a cluster implementation (CTDB) is also available; the file grows too large after lots of modifications; the file becomes too slow after lots of hash contention; no built-in way to rebuild the file; very fast parallel updates by locking individual hash buckets)
- aodbm (append-only, so it survives a system crash; with Python bindings)
- hamsterdb (with Python bindings)
- C-tree (mature, versatile commercial solution with high performance, has a free edition with reduced functionality)
- the old TDB (from 2001)
- bitcask (log-structured, written in Erlang)
- various other DBM implementations (such as GDBM, NDBM, QDBM, Perl's SDBM or Ruby's; they probably don't have proper crash recovery)
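For Tokyo Cabinet, a minimal pytc session looks roughly like the following. This is a sketch based on the pytc example code mentioned above; the flag names come from the pytc module, but I haven't verified them against the latest release:

    import pytc

    db = pytc.HDB()                          # hash database handle
    db.open('casket.hdb', pytc.HDBOWRITER | pytc.HDBOCREAT)
    db.put('some key', 'some value')         # store a record
    print db.get('some key')                 # fetch it back
    db.close()                               # the writer must close the file
                                             # before other processes can open it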
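For BerkeleyDB, crash recovery needs a transactional environment rather than the simple bsddb.hashopen() interface. A sketch, assuming the bsddb module wraps a reasonably recent Berkeley DB; the set_cachesize() call is my attempt to honour the ~1 MB RAM budget, not a verified minimum:

    from bsddb import db

    env = db.DBEnv()
    env.set_cachesize(0, 1024 * 1024)        # cap the cache at about 1 MB
    env.open('/var/lib/mydb',                # environment home directory
             db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
             db.DB_INIT_LOG | db.DB_INIT_TXN | db.DB_RECOVER)
    d = db.DB(env)
    d.open('data.db', dbtype=db.DB_BTREE,
           flags=db.DB_CREATE | db.DB_AUTO_COMMIT)
    d.put('some key', 'some value')          # auto-committed in a transactional env
    print d.get('some key')
    d.close()
    env.close()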
I won't use these:
- MemcacheDB (client-server, uses BerkeleyDB as a backend)
- cdb (needs to regenerate the whole database upon each write)
- http://www.wildsparx.com/apbcdb/ (ditto)
- Redis (keeps the whole database in memory)
- SQLite (it becomes very slow without periodic vacuuming, see the autocompletion in the location bar in Firefox 3.0, even though versions 3.1 and later of SQLite allow auto_vacuuming (see the snippet after this list); beware: small write transactions can be very slow; beware: if a busy process is doing many transactions, other processes starve, and they can never get the lock)
- MongoDB (too heavy-weight, treats values as objects with internal structure)
- Firebird (SQL-based RDBMS, too heavy-weight)
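For reference, the auto_vacuum pragma mentioned above only takes effect if it is issued before the first table is created (otherwise a full VACUUM is needed afterwards). A minimal sqlite3 illustration; the kv table schema is just an example:

    import sqlite3

    conn = sqlite3.connect('kv.db')
    conn.execute('PRAGMA auto_vacuum = FULL')  # must run before any table exists
    conn.execute('CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB)')
    conn.commit()
    conn.close()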
FYI, a recent article about key-value databases in Linux Magazine.
FYI, an older software list
FYI, a speed comparison of MemcacheDB, Redis and Tokyo Cabinet Tyrant
Related questions on StackOverflow: