I'm building a system for data acquisition. Acquired data typically consists of 15 signals, each sampled at (say) 500 Hz. That is, approx 15 × 500 × 4 = 30,000 bytes (~30 KB) of 32-bit floats will arrive every second and have to be persisted.
The previous version was built on .NET (C#) using db4o for data storage. This was fairly efficient and performed well.
The new version will be Linux-based, using Python (or maybe Erlang) and ... yes, what is a suitable storage candidate?
I'm thinking MongoDB, storing each sample (or actually a bunch of them) as a BSON object. Each sample block will have a sample counter as a keyed (indexed) field, as well as a signal-source identifier.
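To make that concrete, here is a minimal sketch of the write side, assuming pymongo; the collection name and the field names (`source`, `sample_counter`, `samples`) are placeholders of my own:

```python
from pymongo import ASCENDING, MongoClient

client = MongoClient("localhost", 27017)
blocks = client.acquisition.blocks

# Compound index so counter-range queries per signal source stay fast.
blocks.create_index([("source", ASCENDING), ("sample_counter", ASCENDING)])

def store_block(source, first_counter, values):
    """Persist one block of consecutive samples from a single source."""
    blocks.insert_one({
        "source": source,                 # signal-source identifier
        "sample_counter": first_counter,  # counter of the block's first sample
        "samples": values,                # e.g. 500 floats = 1 s of data
    })
```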
The catch is that I have to be able to retrieve samples pretty quickly. When requested, up to 30 seconds of data have to be retrieved in much less than a second, using a sample counter range and requested signal sources. The current (C#/DB4O) version manages this OK, retrieving data in much less than 100 ms.
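The read side would then be a plain range query over that index, something like this (same hypothetical schema; for brevity it ignores the partial block that may start just before `first_counter`):

```python
def fetch_range(sources, first_counter, last_counter):
    """Fetch all blocks for the requested sources within a counter range."""
    cursor = blocks.find({
        "source": {"$in": sources},
        "sample_counter": {"$gte": first_counter, "$lte": last_counter},
    }).sort("sample_counter", ASCENDING)
    return list(cursor)

# e.g. 30 s of data at 500 Hz from two sources:
# fetch_range(["ecg", "emg"], 100000, 100000 + 30 * 500)
```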
I know that Python might not be ideal performance-wise, but we'll see about that later on.
The system ("server") will have multiple acquisition clients connected, so the architecture must scale well.
Edit: After further research I will probably go with HDF5 for the sample data and either CouchDB or MongoDB for the more document-like information. I'll keep you posted.
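For the record, HDF5 maps very naturally onto this problem. A minimal sketch, assuming h5py and one growable dataset per signal source (the dataset name and layout are my own):

```python
import numpy as np
import h5py

RATE = 500  # samples per second

with h5py.File("acquisition.h5", "w") as f:
    # One growable 1-D dataset per signal source: the sample counter is
    # simply the array index, so a counter range becomes a plain slice.
    dset = f.create_dataset(
        "signal_03",                      # hypothetical source name
        shape=(0,), maxshape=(None,),
        dtype="float32", chunks=(RATE,),  # one chunk per second
    )

    def append_block(values):
        """Append one block of consecutive samples."""
        n = dset.shape[0]
        dset.resize((n + len(values),))
        dset[n:] = values

    # Fake 60 s of data, then pull a 30 s window out by counter range.
    append_block(np.zeros(60 * RATE, dtype="float32"))
    window = dset[10 * RATE : 40 * RATE]  # samples 5000..19999
```

With this layout a 30 s window per signal is one contiguous read of 15,000 × 4 = 60 KB, which should comfortably fit the sub-100 ms target.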
Edit: The final solution was based on HDF5 and CouchDB. It performed just fine, implemented in Python, running on a Raspberry Pi.