I am having a hard time storing hundreds of millions of key/value pairs of 16/32bytes with a hash array on my SSD.
With Kyoto Cabinet: When it works fine, it inserts at 70000 record/s. Once it drops, it goes down to 10-500 records/s. With the default settings, the drop happens after around a million records. Looking at the documentation, that is the default number of buckets in the array, so it makes sense. I increased this number to 25 millions and indeed, it works fine until around 25 millions records. Problem is, as soon as I push the number of buckets to 30 millions or over, the insert rate is down to 10-500 records/s from the beginning. Kyoto Cabinet is not designed to increase the number of bucket after the database is created, so I cannot insert more than 25 millions records.
1/ Why would KC's insert rate get very low once the bucket number exceeds 25M ?
With Berkeley DB: The best speed I got is slightly lower than KC, closer to 50000 record/s, but still ok. With the default settings, just like KC, the speed drops suddenly after around a million records. I know BDB is designed to extend gradually its number of buckets. Regardless of that, It tried to increase the initial number, playing with HashNumElements and FillFactor, but any of these attempts made the situation worst. So I still cannot insert over 1-2 millions records with DBD. I tried activating non-synchronized transactions, tried different rates of checkpoints, increased caches. Nothing improves the drop down.
2/ What could cause BDB's insert rate to drop after 1-2 million inserts ?
Note: I'm working with java, and when the speed is dropping, the CPU usage lowers to 0-30% while at 100% when working at correct speed.
Note: Stopping the process and resuming the insertion changes nothing. So I don't think that is related to memory limits or garbage collection.
Thx.
database.put(transaction,k,v)
, read withdatabase.get(transaction,k,v,LockMode.DEFAULT)
, and checkpoint every 500000 inserts withenvironment.checkpoint(null)
. – Covariance