Which NoSQL Implementation is Most Appropriate?
K

5

6

I'm new to NoSQL, and I'm scratching my head trying to figure out the most appropriate NoSQL implementation for the application I'm trying to build.

My Java application needs to have an in-memory hashmap containing millions to billions of entries as it models a single-layer neural network. Right now we're using Trove in order to be able to use primitives as keys and values to reduce the size of the map and increase the access speed. The map is a map of maps where the outer map's keys are longs and the inner maps have long/float key/values.

We need to be able to read the saved state from disk into the map of maps when the application starts up. Changes to the map of maps also need to be saved to disk, either continuously or at a scheduled interval.

I was at first drawn towards OrientDB because of its document and object DBs, although I'm still not sure at this point which would be better. Then I came across Redis, which is a key-value store that works with an in-memory dataset that can be dumped to disk, and that supports master-slave replication. However, it doesn't look like the values of the map can be anything other than Strings.

Am I looking in the right places for a solution to my needs? Right now, I like the in-memory and master-slave aspect of Redis, but I like the object/document capabilities of OrientDB as my data structures are more complicated than simple Strings and being able to use Trove with the primitive key/value types is very advantageous. It would be better if reading was cheap and writing was expensive rather than the other way around.

Thoughts?

Keratitis answered 20/2, 2013 at 14:4 Comment(0)
R
4

Why not just serialize the Trove data structures directly to disk? There appears to be some sort of support for that judging by the documentation (http://trove4j.sourceforge.net/javadocs/serialized-form.html), but it's hard to tell because it's all auto-generated cruft instead of lovingly-made tutorials. Still, for your use case it's not obvious why you need a proper database, so perhaps KISS applies.
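As a rough illustration of how little code the do-it-yourself route needs, here is a minimal sketch that writes a map of maps to a flat binary file and reads it back. It is an assumption-laden stand-in, not Trove's own serialization: it uses plain `java.util.HashMap` so that it runs without extra dependencies, and the class name `MapOfMapsStore` and the file layout are hypothetical. With Trove you would iterate the primitive maps instead of boxed entries, but the file format could stay the same.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: persists a long -> (long -> float) map of maps.
// java.util.HashMap stands in for the Trove maps used in the question.
class MapOfMapsStore {

    // Layout: outer count, then per outer entry:
    // outer key, inner count, then (inner key, value) pairs.
    // Counts are ints, so each individual map is limited to Integer.MAX_VALUE entries.
    static void save(Map<Long, Map<Long, Float>> data, Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(data.size());
            for (Map.Entry<Long, Map<Long, Float>> outer : data.entrySet()) {
                out.writeLong(outer.getKey());
                out.writeInt(outer.getValue().size());
                for (Map.Entry<Long, Float> inner : outer.getValue().entrySet()) {
                    out.writeLong(inner.getKey());
                    out.writeFloat(inner.getValue());
                }
            }
        }
    }

    // Reads the layout written by save() back into memory.
    static Map<Long, Map<Long, Float>> load(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int outerCount = in.readInt();
            Map<Long, Map<Long, Float>> data = new HashMap<>();
            for (int i = 0; i < outerCount; i++) {
                long outerKey = in.readLong();
                int innerCount = in.readInt();
                Map<Long, Float> inner = new HashMap<>();
                for (int j = 0; j < innerCount; j++) {
                    long innerKey = in.readLong();
                    float value = in.readFloat();
                    inner.put(innerKey, value);
                }
                data.put(outerKey, inner);
            }
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<Long, Map<Long, Float>> network = new HashMap<>();
        network.computeIfAbsent(1L, k -> new HashMap<>()).put(2L, 0.25f);
        network.computeIfAbsent(1L, k -> new HashMap<>()).put(3L, -1.5f);

        Path file = Files.createTempFile("network", ".bin");
        save(network, file);
        System.out.println(load(file).equals(network));
        Files.delete(file);
    }
}
```

A full rewrite like this suits a scheduled snapshot; saving changes continuously would need something closer to an append-only log of updates, which this sketch does not attempt.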

Radioactivity answered 20/2, 2013 at 14:13 Comment(1)
Thanks, I like this answer. I overlooked that in the docs, and I'll be writing some test code to try it out now. This will probably end up being the best solution. The downside is that I'll have to write my own persistence code, but in the end my application will be optimized. If I try to shoehorn it into a NoSQL framework, I'll probably have to make ugly compromises. - Keratitis
A
2

OrientDB has the most flexible engine, with indexes, graph support, transactions, and complex documents stored as JSON. Why not?

Apetalous answered 20/2, 2013 at 18:35 Comment(0)
L
2

Check out Java-Chronicle. It's a low latency persistence library. I think you may find it offers excellent performance for this type of data.

Lakeshialakey answered 21/2, 2013 at 11:59 Comment(1)
This looks pretty impressive, especially the writing to disk speeds. Wow. The docs and examples are pretty sparse though, and I'm not sure how I could implement my map of maps with it. - Keratitis
M
1

If you'd like to use Redis for this, you'd likely be best served by using either ZSETs or HASHes as the underlying structures (Redis supports data structures, not just string values). Unless you need to fetch parts of your maps based on the values or the sorted order of the values, HASHes would probably be best (in terms of memory and speed).

So you would probably want to use a long -> {long:float, ...} layout. That is, longs mapping to long/float maps. You can then fetch individual entries in the map with HGET, multiple entries with HMGET, or the full map with HGETALL. See the command reference: http://redis.io/commands
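To make the layout concrete, here is a minimal sketch of how one outer entry could be turned into a Redis HASH key plus string fields, and back again. Everything named here is an assumption for illustration: the class `RedisHashLayout`, the `nn:` key prefix, and the choice of decimal strings for the longs and floats. It deliberately does not talk to Redis; the resulting key and field map are what you would hand to a client library's HSET and HGETALL calls.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: encodes one outer entry of a long -> (long -> float)
// map as a Redis HASH key and its string field/value pairs.
class RedisHashLayout {

    // Assumed key prefix so these hashes do not collide with other keys.
    static final String KEY_PREFIX = "nn:";

    // Outer long key -> Redis key, e.g. 42 -> "nn:42".
    static String hashKey(long outerKey) {
        return KEY_PREFIX + outerKey;
    }

    // Inner long/float map -> HASH fields (field = inner key, value = float as text).
    static Map<String, String> toHashFields(Map<Long, Float> inner) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<Long, Float> e : inner.entrySet()) {
            fields.put(Long.toString(e.getKey()), Float.toString(e.getValue()));
        }
        return fields;
    }

    // HASH fields (as returned by an HGETALL-style call) -> inner long/float map.
    static Map<Long, Float> fromHashFields(Map<String, String> fields) {
        Map<Long, Float> inner = new HashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            inner.put(Long.parseLong(e.getKey()), Float.parseFloat(e.getValue()));
        }
        return inner;
    }

    public static void main(String[] args) {
        Map<Long, Float> inner = new HashMap<>();
        inner.put(7L, 0.5f);
        // Conceptually: HSET nn:42 7 0.5
        System.out.println(hashKey(42L) + " " + toHashFields(inner));
    }
}
```

`Float.toString` and `Float.parseFloat` round-trip a float exactly, so the text encoding loses no precision, at the cost of more bytes per value than a packed binary form.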

On the space saving side of things, depending on the expected size of your HASHes, you may be able to tune them to use less space with limited/no negative effects on performance.

On the persistence side of things, you can run Redis either with periodic snapshots or with incremental saving to an append-only file. You can see the persistence documentation here: http://redis.io/topics/persistence

If you'd like to ask more pointed questions, you should head over to the mailing list https://groups.google.com/forum/?fromgroups=#!topic/redis-db/33ZYReULius

Mensurable answered 20/2, 2013 at 19:20 Comment(1)
Thanks for the great detailed answer. I'm beginning to see how Redis might actually work for this. In order to get this to work with my current Java application, I could use the Jedis project. It looks like Jedis would communicate with Redis via a port. I'd have to do some benchmarking to compare a pure-Java Trove implementation with a Jedis/Redis implementation to see what's better. - Keratitis
L
1

Redis supports more complex data structures than simple strings, such as lists, (sorted) sets, and hashes, which might come in handy for your domain model. On the other hand, your neural network could benefit from the rich graph capabilities of OrientDB, depending on its structure.

Lyophilize answered 25/2, 2013 at 8:7 Comment(0)
