Need a distributed key-value lookup system

I need a way to do key-value lookups across (potentially) hundreds of GB of data. Ideally something based on a distributed hashtable, that works nicely with Java. It should be fault-tolerant, and open source.

The store should be persistent, but would ideally cache data in memory to speed things up.

It should be able to support concurrent reads and writes from multiple machines (reads will be 100X more common though). Basically the purpose is to do a quick initial lookup of user metadata for a web-service.
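To make the "persistent but cached in memory" requirement concrete, here is a purely illustrative sketch of the caching layer: a bounded LRU map in plain Java (the persistent backing store is deliberately left out, and the class name and capacity are made up for the example):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of the in-memory cache layer described above: a
// bounded, access-ordered LRU map sitting in front of a persistent
// store (the store itself is out of scope here).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true -> LRU eviction order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least-recently-used entry
    }
}
```

A real system would add a read-through to the persistent store on a cache miss; the sketch only shows the eviction behaviour.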

Can anyone recommend anything?

Chary answered 13/10, 2008 at 15:33 Comment(3)
What are you optimizing for? For example, read throughput (concurrent reads from multiple machines), fault tolerance when machines become unavailable, a low number of machines... Do you also need writes?Ticket
Thanks, I've edited the question with this information.Chary
How do you want your data distributed? Should all of the data be available to/on/from every node or not? In the first case the next question is "why the distributed lookup?".Ticket

You might want to check out Hazelcast. It is distributed/partitioned, lightweight, easy to use, and free.

java.util.Map<String, String> map = Hazelcast.getMap("mymap");
map.put("key1", "value1");

Regards,

-talip

Sweitzer answered 29/10, 2008 at 17:10 Comment(0)

Open Chord is a Java implementation of the Chord protocol, a distributed hash table that should fit your needs well.
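For intuition, Chord's key-to-node mapping can be sketched in a few lines: nodes and keys hash onto the same identifier ring, and a key belongs to its successor, the first node at or clockwise after the key's ID. This is an illustrative sketch of the idea only, not Open Chord's actual API, and the node IDs are made up:

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of Chord's identifier ring: each key is owned by its
// successor node (first node ID >= key ID, wrapping around the ring).
class Ring {
    private final TreeMap<Integer, String> nodes = new TreeMap<>();

    void addNode(int id, String name) {
        nodes.put(id, name);
    }

    String successor(int keyId) {
        Map.Entry<Integer, String> e = nodes.ceilingEntry(keyId);
        // No node at or after keyId: wrap around to the first node.
        return (e != null) ? e.getValue() : nodes.firstEntry().getValue();
    }
}
```

The point of this scheme is that adding or removing one node only remaps the keys adjacent to it on the ring, which is what makes a Chord-style DHT fault-tolerant under churn.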

Derouen answered 13/10, 2008 at 15:40 Comment(0)

Depending on the use case, Terracotta may be just what you need.

Hoogh answered 15/10, 2008 at 1:11 Comment(0)

You should probably specify if it needs to be persistent or not, in memory or not, etc. You could try: http://www.danga.com/memcached/

Cram answered 13/10, 2008 at 15:37 Comment(2)
Thanks, I've added a note that it needs to be persistent, which I think rules out memcached.Chary
memcached was also my first thought, but "hundreds of GBs" is a bit too much for RAMCondyloma

Distributed hash tables include Tapestry, Chord, and Pastry. One of these should suit your needs.

Allowable answered 13/10, 2008 at 15:51 Comment(0)

OpenChord sounds promising, but I'd also consider BDB or any other non-SQL hash table. Making it distributed can be dead easy (at least if the number of storage nodes is nearly constant): just hash the key on the client to pick the appropriate server.
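The scheme this answer describes is tiny in code. A hedged sketch (server names are hypothetical), with the caveat the answer itself notes: this only works while the node list stays fixed, because changing the server count remaps almost every key, which is exactly what consistent hashing and Chord avoid:

```java
import java.util.List;

// Client-side static sharding: pick a storage node by hashing the key.
// Valid only while the server list never changes.
class StaticShards {
    private final List<String> servers;

    StaticShards(List<String> servers) {
        this.servers = servers;
    }

    String serverFor(String key) {
        // floorMod guards against negative hashCode values
        return servers.get(Math.floorMod(key.hashCode(), servers.size()));
    }
}
```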

Condyloma answered 13/10, 2008 at 15:55 Comment(0)

Open Source Cache Solutions in Java

Oracle Coherence (used to be Tangosol)

JCache JSR

Munafo answered 14/10, 2008 at 2:30 Comment(0)

nmdb sounds like it's exactly what you need: distributed, with an in-memory cache and persistent on-disk storage. Current back-ends include qdbm, Berkeley DB, and (recently added after a quick email to the developer) Tokyo Cabinet. Key/value size is limited, though I believe that limit can be lifted if you don't need TIPC support.

Diaspore answered 23/10, 2008 at 8:32 Comment(0)

Try the distributed Map structure from Redisson; it's based on the Redis server. Using a Redis cluster configuration you can split data across up to 1000 servers.

Usage example:

Redisson redisson = Redisson.create();

ConcurrentMap<String, SomeObject> map = redisson.getMap("anyMap");
map.put("123", new SomeObject());
map.putIfAbsent("323", new SomeObject());
map.remove("123");

...

redisson.shutdown();
Booklover answered 12/1, 2014 at 10:32 Comment(0)

DNS has the capability to do this. I don't know how large each of your records is (hundreds of GB made up of tons of small records?), but it may work.

Oleaster answered 13/10, 2008 at 15:37 Comment(1)
DNS assumes a hierarchical data structure, I'm afraid it won't do what I need.Chary
