BerkeleyDB Concurrency
  • What's the optimal level of concurrency that the C++ implementation of BerkeleyDB can reasonably support?
  • How many threads can I have hammering away at the DB before throughput starts to suffer because of resource contention?

I've read the manual and know how to set the number of locks, lockers, database page size, etc. but I'd just like some advice from someone who has real-world experience with BDB concurrency.

My application is pretty simple: I'll be doing gets and puts of records that are about 1KB each. No cursors, no deleting.

Gothar answered 1/8, 2008 at 23:28 Comment(2)
@digito_evo "hammering" was not a typo. It's correct, "hammered" is not. A thread isn't something you hammer; the threads are what's doing the hammering, pounding on the database with constant requests from each thread. In the present tense, not past. Also, the question isn't about how to benchmark, so the [benchmarking] tag doesn't really apply. The top answer recommends benchmarking your use-case, but that's kind of different. IMO many performance-related questions shouldn't actually be tagged [benchmarking].Slather
@PeterCordes Thanks for the correction. I've been fixing typos and grammar in some of the oldest questions here. Mistakes happen (occasionally).Purgative

It depends on what kind of application you are building. Create a representative test scenario, and start hammering away. Then you will know the definitive answer.

Besides your use case, it also depends on CPU, memory, front-side bus, operating system, cache settings, etcetera.

Seriously, just test your own scenario.

If you need some numbers (which may well mean nothing in your scenario):

Nimwegen answered 3/8, 2008 at 12:34 Comment(1)
The latter paper also explicitly says that effects of concurrency are not tested.Knave

I strongly agree with Daan's point: create a test program, and make sure the way in which it accesses data mimics as closely as possible the patterns you expect your application to have. This is extremely important with BDB because different access patterns yield very different throughput.

Other than that, these are general factors I found to be of major impact on throughput:

  1. Access method (which in your case I guess is BTREE).

  2. The durability level you configured BDB with (for example, in my case the 'DB_TXN_WRITE_NOSYNC' environment flag improved write performance by an order of magnitude, but it compromises durability).

  3. Does the working set fit in cache?

  4. Number of reads vs. writes.

  5. How spread out your access is (remember that BTREE uses page-level locking, so accessing different pages from different threads is a big advantage).

  6. Access pattern - meaning how likely threads are to lock one another, or even deadlock, and what your deadlock resolution policy is (this one may be a killer).

  7. Hardware (disk & memory for cache).

This amounts to the following point: there are two key ways to scale a BDB-based solution for greater concurrency - minimize the number of lock conflicts in your design, or add more hardware.
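To make points 2, 3, 5 and 6 above concrete, here is a hedged sketch of an environment setup using the Berkeley DB C++ API (`db_cxx.h`). It is a configuration fragment, not a drop-in implementation: the environment path, cache size, and page size are illustrative assumptions, not recommendations, and you would need to link against `libdb_cxx` and add error handling.

```cpp
// Sketch only: flag and sizing choices are illustrative, not tuned values.
#include <db_cxx.h>

// Call with: DbEnv env(0); Db db(&env, 0); open_env_and_db(env, db);
void open_env_and_db(DbEnv &env, Db &db) {
    // Point 2: trade durability for write throughput. Transactions stay
    // atomic, but a crash can lose recently committed writes.
    env.set_flags(DB_TXN_WRITE_NOSYNC, 1);

    // Point 6: let BDB resolve deadlocks automatically by rejecting one locker.
    env.set_lk_detect(DB_LOCK_DEFAULT);

    // Point 3: size the cache so the working set fits (256MB is an assumption).
    env.set_cachesize(0, 256 * 1024 * 1024, 1);

    env.open("/path/to/env",  // environment directory (assumption)
             DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG |
             DB_INIT_MPOOL | DB_INIT_TXN | DB_THREAD | DB_RECOVER,
             0);

    // Point 5: BTREE locks at page granularity, so page size shapes contention.
    db.set_pagesize(4096);
    db.open(nullptr, "data.db", nullptr, DB_BTREE,
            DB_CREATE | DB_AUTO_COMMIT | DB_THREAD, 0);
}
```

Smaller pages mean more pages for the same data, so threads working on different keys are less likely to collide on the same page lock - at the cost of more overhead per page.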

Bunche answered 13/10, 2008 at 21:59 Comment(0)

Doesn't this depend on the hardware and the workload, as well as the number of threads?

I would make a simple test, run it with an increasing number of threads hammering away, and see what seems best.

Seer answered 2/8, 2008 at 18:21 Comment(0)

What I did when working against a database of unknown performance was to measure turnaround time on my queries. I kept raising the thread count as long as turnaround time held up, and backed it off as soon as turnaround time degraded (well, it was processes in my environment, but whatever).

There were moving averages and all sorts of metrics involved, but the take-away lesson was: just adapt to how things are working at the moment. You never know when the DBAs will improve performance or hardware will be upgraded, or perhaps another process will come along to load down the system while you're running. So adapt.

Oh, and another thing: avoid process switches if you can - batch things up.


Oh, I should make this clear: this all happened at run time, not during development.

Babylonia answered 4/8, 2008 at 7:45 Comment(0)

The way I understand things, Samba created tdb to allow "multiple concurrent writers" for any particular database file. So if your workload has multiple writers, your performance may be bad (as in, the Samba project chose to write its own system, apparently because it wasn't happy with Berkeley DB's performance in this case).

On the other hand, if your workload has lots of readers, then the question is how well your operating system handles multiple readers.

Gabbert answered 16/9, 2008 at 17:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.