Choosing between Berkeley DB Core and Berkeley DB JE
H

5

11

I'm designing a Java-based web app and I need a key-value store. Berkeley DB seems like a good fit, but there appear to be two Berkeley DBs to choose from: Berkeley DB Core, which is implemented in C, and Berkeley DB Java Edition, which is implemented in pure Java.

The question is: how do I choose which one to use? For web apps, scalability and performance are quite important (who knows, maybe my idea will become the next YouTube), and I couldn't easily find any meaningful benchmarks comparing the two. I have yet to familiarize myself with Core's Java API, but I find it hard to believe that it could be much worse than Java Edition's, which seems to be quite nice.

If some other key-value store would be much better, feel free to recommend that too. I'm storing smallish binary blobs, and keys probably will be hashes of the data, or some other unique id.
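For illustration, here is a minimal sketch of the "keys will be hashes of the data" idea. The class and method names (`BlobKeys`, `keyFor`, `toHex`) are made up for this example, and SHA-256 is only an assumed choice of hash; nothing here depends on either Berkeley DB edition.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a store key from the blob's own content.
class BlobKeys {
    /** Returns the SHA-256 digest of the blob, usable as a fixed-size 32-byte key. */
    static byte[] keyFor(byte[] blob) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(blob);
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to ship SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Lowercase hex form of a key, handy for logging and debugging. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] blob = "some small binary blob".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(keyFor(blob)));
    }
}
```

One consequence of this scheme is that identical blobs always map to the same key, so storing the same data twice simply overwrites the existing entry.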

Hypermeter answered 7/4, 2010 at 15:13 Comment(0)
S
2

If you derive a common interface to these, and have a suitable set of unit tests, you should be able to swap between the two trivially at a later date (perhaps when you really need to make a decision based on hard facts that are not available right now).
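A rough sketch of what such a common interface could look like. The names `BlobStore` and `InMemoryBlobStore` are hypothetical and do not come from either Berkeley DB API; the in-memory class is only a stand-in so the unit tests have something to run against before a real backend is wired in.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction the web app codes against; each Berkeley DB
// edition would get its own implementation behind this interface.
interface BlobStore extends AutoCloseable {
    void put(byte[] key, byte[] value);

    /** Returns the stored value, or null when the key is absent. */
    byte[] get(byte[] key);

    /** Returns true if an entry was removed. */
    boolean delete(byte[] key);

    @Override
    void close();
}

// In-memory stand-in for tests; not a Berkeley DB binding.
class InMemoryBlobStore implements BlobStore {
    // ByteBuffer compares by content, unlike a raw byte[] used as a map key.
    private final Map<ByteBuffer, byte[]> data = new HashMap<>();

    @Override
    public void put(byte[] key, byte[] value) {
        // Defensive copies so later changes by the caller do not leak in.
        data.put(ByteBuffer.wrap(key.clone()), value.clone());
    }

    @Override
    public byte[] get(byte[] key) {
        byte[] value = data.get(ByteBuffer.wrap(key));
        return value == null ? null : value.clone();
    }

    @Override
    public boolean delete(byte[] key) {
        return data.remove(ByteBuffer.wrap(key)) != null;
    }

    @Override
    public void close() {
        data.clear();
    }
}
```

The same test suite can then be run against each implementation of `BlobStore`, which is what makes a later swap low-risk on the code side.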

Sverdlovsk answered 7/4, 2010 at 15:20 Comment(1)
A word of warning on this: the database files themselves are not portable between the two editions. If you go down this route, you'll need a migration strategy for the data itself should you ever want to swap implementations. For this reason, if data portability is important, you're better off going with Berkeley DB Core and its Java API rather than the Java Edition. - Paraprofessional
O
12

I have quite a bit of experience using both BDB-JE and BDB-core with Java. Deciding which one to use is quite simple: If you want concurrency, use BDB-JE. If you want scalability, use BDB-core.

BDB-JE breaks down performance-wise with large databases due to its file format and its reliance on Java garbage collection to clean up evicted cache entries. Expect long garbage collection pauses or spend a lot of time tuning magic GC settings. The file format has issues too, because the background cleaner threads have to spend a lot of time cleaning up garbage created by early cache evictions. If your database fits in RAM, BDB-JE works quite well.

BDB-core relies on a page-locking strategy, and highly concurrent applications experience a lot of deadlocks. If you can randomly order operations it reduces the deadlock potential, but it never eliminates it. Because BDB-core stores data in a more traditional way, it scales to super large sizes with predictable and expected performance degradation. Because its cache is not managed by a garbage collector, it can be quite large and not cause any pauses.
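Since deadlocks can be reduced but not eliminated, the application usually ends up retrying the failed operation. Below is a minimal sketch of such a retry loop. `DeadlockDetectedException` and `DeadlockRetry` are hypothetical names used only for this example; in real code you would catch whatever deadlock exception your chosen API throws, and the supplied operation is assumed to abort its own transaction before the exception propagates.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Hypothetical stand-in for the deadlock exception of the real API.
class DeadlockDetectedException extends RuntimeException {
    DeadlockDetectedException(String message) {
        super(message);
    }
}

class DeadlockRetry {
    /**
     * Runs the operation, retrying when it fails with a deadlock.
     * Gives up and rethrows after maxAttempts tries.
     */
    static <T> T run(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (DeadlockDetectedException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                sleepWithJitter(attempt);
            }
        }
    }

    // Randomized, growing pause so competing threads do not retry in lockstep.
    private static void sleepWithJitter(int attempt) {
        long millis = ThreadLocalRandom.current().nextLong(1, 10L * attempt + 1);
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }
}
```

A call would look like `DeadlockRetry.run(() -> doTransactionalWrite(), 5)`, where `doTransactionalWrite` is a placeholder for your own code.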

Olympe answered 25/12, 2010 at 4:53 Comment(0)
M
2

I faced the same problem and decided to go with the Java Edition, mainly because of its portability (I need something that will run even on mobile devices). There is also the Direct Persistence Layer (DPL) API, and the fact that the whole database is a single jar makes deployment fairly simple.

The recent version 4 brought in High Availability and performance improvements. Also, long-running Java applications can become optimized enough at runtime that they surpass the performance of native C applications in some scenarios.

It's a natural fit for any Java application - desktop or web.

Marguerite answered 7/4, 2010 at 15:28 Comment(0)
E
2

A while ago I had the same question. After running some benchmarks, I found that hash mode in the native edition is much faster and more storage-efficient than anything the Java Edition has to offer, so I decided to go with the native implementation.

I suggest you do your own benchmarks for the storage capacities you expect and decide if the Java edition is fast enough.

If it is, or if performance is not a big issue for you (it's critical for me), just go with the Java Edition. Otherwise, go with the native one (assuming you see the same performance boost for your own use case).

By the way, my benchmark tested the speed of querying random keys out of 20,000,000 records, where the key is a string and the value is an int (4 bytes). Inserts (populating the benchmark) were much faster with the native version, and queries were twice as fast.

(This is not due to a shortcoming of Java, but because the Java Edition is not at the same version as the native one: 4.0 vs 4.8, IIRC.)
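If you want to run a similar test yourself, the shape of the benchmark is roughly the following. This is only a sketch: a plain `java.util.Map` stands in for the real store, the class name `RandomReadBenchmark` and the `key-N` naming are invented for the example, and the record count is scaled down from 20,000,000 so it runs with a default heap. Replace the map with calls to each Berkeley DB edition to get numbers that mean anything.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class RandomReadBenchmark {
    // Written to after each run so the JIT cannot discard the lookups as dead code.
    static volatile long sink;

    /** Fills the store with string keys "key-0".."key-(records-1)" mapped to int values. */
    static void populate(Map<String, Integer> store, int records) {
        for (int i = 0; i < records; i++) {
            store.put("key-" + i, i);
        }
    }

    /** Looks up randomly chosen existing keys and returns the elapsed time in nanoseconds. */
    static long timeRandomReads(Map<String, Integer> store, int records, int queries, long seed) {
        Random random = new Random(seed); // fixed seed makes runs repeatable
        long checksum = 0;
        long start = System.nanoTime();
        for (int q = 0; q < queries; q++) {
            checksum += store.get("key-" + random.nextInt(records));
        }
        long elapsed = System.nanoTime() - start;
        sink = checksum;
        return elapsed;
    }

    public static void main(String[] args) {
        int records = 1_000_000; // scaled down from the 20,000,000 mentioned above
        int queries = 1_000_000;
        Map<String, Integer> store = new HashMap<>();

        long insertStart = System.nanoTime();
        populate(store, records);
        long insertNanos = System.nanoTime() - insertStart;

        timeRandomReads(store, records, queries, 1L); // warm-up pass, result ignored
        long readNanos = timeRandomReads(store, records, queries, 2L);

        System.out.printf("inserts: %d ms, random reads: %d ms%n",
                insertNanos / 1_000_000, readNanos / 1_000_000);
    }
}
```

For an on-disk store, also make sure the data set is larger than the cache if you care about the out-of-RAM case, since that is where the two editions are reported to differ most.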

Exacerbate answered 7/4, 2010 at 16:8 Comment(0)
A
1

I decided to go with the Java Edition, simply because it's possible to embed the database runtime within the same deployable. This was an important feature for my setup. I haven't benchmarked Core against JE, but I have seen great performance compared with the other key-value stores I tested when first evaluating database stores.

If you're creating a web-application though, then concurrency might be very important to you in the long run.

Afterdinner answered 22/6, 2011 at 11:14 Comment(0)