Suitable data storage backend for Erlang application when data doesn't fit memory

I'm researching options for organizing data storage for an Erlang application. The data it is supposed to use is basically a huge collection of binary blobs indexed by short string ids. Each blob is under 10 KB, but there are many of them. I expect them to total up to 200 GB, so the data obviously cannot fit into memory. The typical operations on this data are reading a blob by its id, updating a blob by its id, or adding a new one. In any given period of the day only a subset of ids is in use, so storage access performance might benefit from an in-memory cache. Performance is quite critical: the target is around 500 reads and 500 updates per second on commodity hardware (say, an EC2 VM).

Any suggestions on what to use here? As I understand it, dets is out of the question as it is limited to 2 GB (or was it 4 GB?). Mnesia is probably out of the question too; my impression is that it was mainly designed for cases where the data fits in memory. I'm considering trying EDTK's Berkeley DB driver for the task. Would it work in the above scenario? Does anybody have experience using it in production under similar conditions?

Cardcarrying answered 13/11, 2008 at 15:25 Comment(0)
5

tcerl came out of facing the same size limit. I'm not using Erlang these days but it sounds like what you're looking for.

Rudderhead answered 25/11, 2008 at 17:22 Comment(1)
Thanks for the reply, though it is a bit too late: I'm already playing with tcerl in my application :) – Cardcarrying
1

Have you looked at what CouchDB is doing? It might not be quite what you are after as a drop-in product, but there is lots of Erlang code in there for storing data. There is also some talk of providing a native Erlang interface instead of the REST API.

Excaudate answered 25/11, 2008 at 16:58 Comment(0)
1

Is there any reason why you can't just use a file system, treating filename as your string id and file contents as a binary blob? You can choose one (filesystem) that fits your performance requirements, and you should get caching basically for free, provided by your OS.
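As a rough illustration of this layout (not the poster's implementation), here is a minimal sketch in Python. The function names `blob_path`, `put_blob`, and `get_blob` are hypothetical. Two choices are assumptions added here: the id is hashed to produce the filename, so ids with awkward characters are safe, and files are fanned out over two directory levels, since 200 GB of blobs under 10 KB each means at least 20 million files and many filesystems slow down with that many entries in one directory. In Erlang the same reads and writes would go through `file:read_file/1` and `file:write_file/2`.

```python
import hashlib
import os
import tempfile


def blob_path(root, blob_id):
    """Map an id to a file path, fanned out over two directory levels."""
    digest = hashlib.sha1(blob_id.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], digest)


def put_blob(root, blob_id, data):
    """Add or update a blob: write a temp file, then rename it into place."""
    path = blob_path(root, blob_id)
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Rename within one directory, so readers never see a partial blob.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def get_blob(root, blob_id):
    """Return the blob bytes, or None if the id is unknown."""
    try:
        with open(blob_path(root, blob_id), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

Recently read files stay in the OS page cache, which is the "caching for free" part; whether that is enough for 500 reads and 500 updates per second would have to be measured on the target filesystem.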

Landward answered 25/11, 2008 at 20:31 Comment(1)
Actually, I tried this and found it to be somewhat slower than the tcerl-based implementation. I didn't bother to tune the filesystem, though, and while it was slower than tcerl it was fast enough for my requirements, at least in basic benchmarks. – Cardcarrying
0

Mnesia can store data on disk just fine. There's also dets (disk based term storage) which is roughly analogous to Berkeley DB. It's in the standard lib: http://www.erlang.org/doc/apps/stdlib/index.html

Clo answered 17/11, 2008 at 18:34 Comment(2)
Dets is unusable in my project; to quote the documentation: "The size of Dets files cannot exceed 2 GB". Mnesia is based on dets too, so it inherits its restrictions. As a workaround one can partition the data, but I suspect performance will suffer. From my limited testing, dets is rather slow. – Cardcarrying
I'd guess the 2 GB dets limit only exists on 32-bit architectures... Ask the Erlang mailing list; it's probably a better place than here for Erlang questions anyhow. – Clo
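The partitioning workaround mentioned in the comments could look roughly like the sketch below, shown in Python for illustration only. It assumes a stable hash of the id picks one of a fixed number of table files; the partition count of 256 and the `blobs_NNN.dets` naming are hypothetical. With 200 GB spread over 256 partitions, each file averages under 1 GB, which leaves headroom below the 2 GB limit for uneven distribution. In Erlang, `erlang:phash2/2` could play the role of the hash.

```python
import zlib

NUM_PARTITIONS = 256  # hypothetical: 200 GB / 256 is roughly 0.8 GB per file


def partition_for(blob_id, num_partitions=NUM_PARTITIONS):
    """Pick a partition index from a stable hash of the id."""
    return zlib.crc32(blob_id.encode("utf-8")) % num_partitions


def partition_file(blob_id, num_partitions=NUM_PARTITIONS):
    """Name of the table file that would hold this id (naming is made up)."""
    return "blobs_%03d.dets" % partition_for(blob_id, num_partitions)
```

Every read or update then touches exactly one file, so the routing itself costs only a hash computation; the sketch says nothing about how fast each individual table is.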
0

I would recommend Apache CouchDB.

It's a great fit for Erlang, and from the sound of it (you mention ID-based blobs and don't mention any relational requirements) you're looking for a document-oriented database.

Since the interface is REST, you can very simply add a commodity HTTP cache in front of it if you need caching.

The documentation for CouchDB is of a very high quality.

It also has built-in Map-Reduce :)

Desuetude answered 7/1, 2009 at 16:42 Comment(0)
