Which embedded DB written in Java for a simple key/value store? [closed]
I recently asked a question about Neo4j, which I got working and which seems nice. It's embeddable and it's written in Java and there aren't (too) many dependencies.

However, it's a graph DB and I don't know whether it's a good idea to use it as a simple key/value store.

Basically I've got a big map, which in Java would look like this:

Map<Integer,Map<String,String>>

I've got a few tens of millions of entries in the main map and each entry contains itself a map of property/values. The "inner" map is relatively small: about 20 entries.

I need a way to persist that map from one run of the webapp to the next.

Using Neo4j, what I did is create one node for every ID (integer) and then put one property for each entry inside the inner map. From my early testing it seems to work but I'm not sure it's a good way to proceed.

Which embeddable DB, written in Java, would you use?

The requirements are:

  • written in Java

  • embeddable (so nothing too big)

  • not SQL (*)

  • open source

  • easy to backup (I need to be able to make "live" backups, while the server is running)

My terminology may be a bit wrong too, so feel free to help me / correct me. For my "map of maps", the best fit would be a key/value pair DB right?

I'm a bit lost as to the difference between key/value pair DBs, document DBs, big tables, graph DBs, etc.

I'd also like to know whether it's a good idea to use a graph DB like Neo4j for my needs (I think performance really isn't going to be an issue, given the relatively small number of entries I'll have).

Of course I could simply persist my map of maps myself but I really don't want to reinvent any wheel here. I want to reuse a tried and tested DB...

(*) The reason I do not want SQL is that I'll always have this "map of maps" and that the inner map is going to constantly evolve, so I don't want something too structured.

Adamsun answered 19/3, 2012 at 14:32 Comment(3)
Is the inner map highly likely to be different in each entry of the main map, or will there be a substantial amount of overlap between the inner maps of the main map? There are a number of different routes you could take, but it really depends on the amount of referential replication within your structure.Ecstatic
@cdeszaq: thanks for your comment and help... The inner maps will mostly have the same number of properties and the same properties, but the value of each property is going to be a bit different. I'd say there's quite some overlap; however, I don't think performance is going to be that much of a concern: I'm more after something convenient/small/easy to back up. Do you think Neo4j would work here? I know there are several options: so many that I'm a bit lost : )Adamsun
@cdeszaq: I forgot to mention: the inner map's properties will "evolve" during the lifetime of the app: new properties are going to be added (and old entries that do not have these newer properties should fall back to a default value when queried for a nonexistent property). So there's overlap, but it's not "structured" in that there isn't really any fixed schema (if I get my terminology right).Adamsun
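For readers following along, the "default value for a nonexistent property" behavior described in this comment can be sketched in plain Java, independent of whichever store ends up persisting the data. The class name `PropertyStore` and its methods are hypothetical, made up for illustration; nothing here comes from Neo4j or any library mentioned in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory helper; not part of any library mentioned in this thread. */
class PropertyStore {
    // Outer map: entry ID -> (property name -> value), as in the question.
    private final Map<Integer, Map<String, String>> entries = new HashMap<>();
    // Defaults for properties that were introduced after older entries were written.
    private final Map<String, String> defaults = new HashMap<>();

    public void setDefault(String property, String value) {
        defaults.put(property, value);
    }

    public void put(int id, String property, String value) {
        entries.computeIfAbsent(id, k -> new HashMap<>()).put(property, value);
    }

    /** Returns the stored value, else the registered default, else null. */
    public String get(int id, String property) {
        Map<String, String> inner = entries.get(id);
        if (inner != null && inner.containsKey(property)) {
            return inner.get(property);
        }
        return defaults.get(property);
    }

    public static void main(String[] args) {
        PropertyStore store = new PropertyStore();
        store.put(1, "color", "red");        // old entry, written before "size" existed
        store.setDefault("size", "medium");  // property added later
        System.out.println(store.get(1, "color"));
        System.out.println(store.get(1, "size"));
    }
}
```

Keeping the defaults in one separate map means old entries never have to be rewritten when a new property appears, which is the schema-less behavior the comment asks for.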
A
15

There seem to be a couple of ports of Google's LevelDB into Java:

Then there is a whole list of embedded Java databases here:

Awning answered 19/3, 2012 at 14:57 Comment(2)
these are great links... Would there be any of them you'd recommend in my specific case?Adamsun
@CedricMartin I would definitely recommend LevelDB if you want a lightweight and extremely fast embedded database. To get the maximum performance from LevelDB, try to access your keys in sequential order by using the iterator rather than getting (Iterator.Seek vs DB.Get). LevelDB is very fast for random reads/writes, but it's highly optimized for sequential reads/writes. Furthermore, LevelDB is very resilient to failures and it has built in functions to repair the database if you can't open it.Awning
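To make the point about sequential access concrete: LevelDB keeps keys as byte arrays sorted by a bytewise comparator by default, so sequential iteration only helps if related keys sort next to each other. One possible way to flatten the map of maps (an assumption for illustration, not something this comment prescribes) is to build each key from the big-endian ID followed by the property name. The sketch below uses only the JDK and no LevelDB API; the class `OrderedKeys` is hypothetical and `Arrays.compareUnsigned` needs Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical key encoder for an ordered byte-array key/value store. */
class OrderedKeys {
    /** Encodes (id, property) so that all properties of one id sort next to each other. */
    static byte[] encode(int id, String property) {
        if (id < 0) {
            // Negative ints would sort after positive ones under unsigned byte order.
            throw new IllegalArgumentException("this sketch assumes non-negative ids");
        }
        byte[] name = property.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + name.length)
                .putInt(id)   // ByteBuffer is big-endian by default
                .put(name)
                .array();
    }

    /** Bytewise unsigned comparison, mimicking a lexicographic byte comparator. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] k1 = encode(255, "name");
        byte[] k2 = encode(256, "age");
        // Negative result: every key of id 255 sorts before every key of id 256.
        System.out.println(compare(k1, k2) < 0);
    }
}
```

With keys laid out like this, reading all ~20 properties of one entry becomes a short range scan starting at the 4-byte ID prefix rather than 20 independent lookups.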
W
14

For your use case I would recommend MapDB (http://www.mapdb.org)

It matches your requirements:

  • written in Java
  • embeddable - single jar with no dependencies
  • not SQL - gives you maps that are persisted to disk
  • open source (Apache 2 licence)
  • easy to backup (few files)

and has other nice features like transactions, concurrency and performance.

Watchtower answered 1/7, 2014 at 10:39 Comment(0)
K
8

Chronicle-Map is a nice new player in this field.

  • It is a Map implementation that resides off-heap (and can be persisted to disk by means of memory-mapped files)
  • Super-fast -- sustains millions of queries/updates per second, i.e. each query has sub-microsecond latency on average
  • Supports concurrent updates (it is meant to be a drop-in replacement for ConcurrentHashMap)
  • Special support for the property maps you mentioned, if the set of properties is fixed within the collection -- it allows updating specific properties of the value without any serialization/deserialization of the whole value (20 fields). This feature is called data value generation in the Chronicle/Lang project.
  • And many more...
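The idea behind the last two technical points (off-heap storage, and updating one property without re-serializing the whole value) can be illustrated with nothing but the JDK. This is not Chronicle-Map's API; it is a minimal sketch that assumes a fixed number of properties per entry and, for simplicity, `int` values instead of strings. The class `FixedRecordTable` is hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical fixed-layout table: every record has the same number of int fields. */
class FixedRecordTable {
    private final int fieldsPerRecord;
    private final ByteBuffer buffer; // direct buffer: memory allocated outside the Java heap

    FixedRecordTable(int records, int fieldsPerRecord) {
        this.fieldsPerRecord = fieldsPerRecord;
        this.buffer = ByteBuffer.allocateDirect(records * fieldsPerRecord * Integer.BYTES);
    }

    // Because the layout is fixed, the byte position of any field can be computed directly.
    private int offset(int record, int field) {
        return (record * fieldsPerRecord + field) * Integer.BYTES;
    }

    /** Overwrites a single field in place; the other fields of the record are untouched. */
    void set(int record, int field, int value) {
        buffer.putInt(offset(record, field), value);
    }

    int get(int record, int field) {
        return buffer.getInt(offset(record, field));
    }

    public static void main(String[] args) {
        FixedRecordTable table = new FixedRecordTable(1_000, 20);
        table.set(3, 2, 42);
        System.out.println(table.get(3, 2));
    }
}
```

The same offset arithmetic would work against a `MappedByteBuffer` obtained from `FileChannel.map`, which is roughly what "persisted by means of memory-mapped files" refers to, though a real library also has to deal with concurrency, variable-length values and crash consistency.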
Kyser answered 6/12, 2014 at 8:18 Comment(0)
W
5

You could look into Berkeley DB:

http://docs.oracle.com/cd/E17277_02/html/GettingStartedGuide/index.html

It is quite efficient at dealing with large amounts of data, and it's key/value. I can't really say more about it since I'm still discovering it myself, but it's worth a look if you have the time.

Wane answered 19/3, 2012 at 14:41 Comment(2)
I was going to answer you "yes, Berkeley is always nice but it's written in C" and then I realize there's nowadays a "Berkeley DB Java edition" entirely written in Java and open source... It may be interesting.Adamsun
It is pure Java ... the problem is that the licence is unfriendly to (small) software businesses.Drake
S
3

Check out www.jsondb.io

This is a pure Java, embeddable, lightweight database that stores its data as files, which makes it easy to back up.

Salve answered 25/12, 2016 at 10:43 Comment(1)
Very cool ! Thanks for making it! Plus supporting XPath o yeah!!!!!!!!!!!!Marucci
H
2

Late to the party, but you can use Tayzgrid. It's open source and its in-proc cache can be embedded in your application. It's basically an in-memory data grid or in-memory key/value store, but it also has the capability you want, i.e. to be a simple in-process embedded key/value store.

Henigman answered 8/9, 2015 at 12:38 Comment(0)
E
1

You could just stick with an XML or JSON file. Neither of those requires a schema, and both are fairly easy to move back and forth between disk and memory, especially if performance really doesn't matter too much (e.g. you only load configs every now and then).

The advantage is that XML and JSON are both very simple and deal with Maps pretty well.

You also have a much lighter dependency load on your application. An entire embedded DB-type system is pretty heavy if you are just persisting/un-persisting a big data structure when you need to and not using any of the query or similar capabilities most embedded solutions will add.

To tick off your requirements: it's built into Java for the most part, easy to back up since it's just a file, highly embeddable, very much open source, and not SQL. XML can be a bit verbose and unwieldy at times, but it's a well-known domain and has very rich tooling surrounding it, so you can deal with it outside your app if needed.
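As a rough illustration of the "just a file" route, here is a minimal sketch that uses the JDK's own `java.beans.XMLEncoder`/`XMLDecoder` (in the `java.desktop` module on Java 9+). The class `XmlMapFile` and its method names are hypothetical. Writing to a temporary file and then renaming it is one common way to avoid leaving a half-written file behind; it is an addition of this sketch, it does not force data to disk, and it is only practical for maps far smaller than tens of millions of entries.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;

/** Hypothetical helper that stores a map of maps as one XML file. */
class XmlMapFile {
    static void save(HashMap<Integer, HashMap<String, String>> data, Path target) {
        try {
            // Write next to the target so the final rename stays on one file system.
            Path dir = target.toAbsolutePath().getParent();
            Path tmp = Files.createTempFile(dir, "map", ".tmp");
            try (XMLEncoder enc = new XMLEncoder(
                    new BufferedOutputStream(Files.newOutputStream(tmp)))) {
                enc.writeObject(data);
            }
            // Whether an existing target is replaced is implementation specific;
            // on POSIX systems this maps to rename(), which replaces it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static HashMap<Integer, HashMap<String, String>> load(Path source) {
        try (XMLDecoder dec = new XMLDecoder(
                new BufferedInputStream(Files.newInputStream(source)))) {
            return (HashMap<Integer, HashMap<String, String>>) dec.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<Integer, HashMap<String, String>> data = new HashMap<>();
        HashMap<String, String> props = new HashMap<>();
        props.put("name", "example");
        data.put(1, props);

        Path file = Path.of(System.getProperty("java.io.tmpdir"), "map-of-maps.xml");
        save(data, file);
        System.out.println(load(file).equals(data));
    }
}
```

The whole map is rewritten on every save, which is why this approach suits occasional snapshots better than frequent small updates.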

Ecstatic answered 19/3, 2012 at 15:16 Comment(5)
well I did consider XStream and I'm sure there are other ways to do that however it seems a bit "low-level'ish". I'd have to deal with failed "transactions", potentially inconsistent state should the power cord be removed while writing an XML file, etc. Moreover I'll have a few tens of millions entries (as I wrote in my question), so I'm not sure XML or JSON would be that "lightweight" in this case (I'd either need a lot of XML files or put several entries in the same file). XML or JSON is an option but I do have Neo4j running right now and it seems relatively lightweight.Adamsun
Yes, it is a bit low-level, but it doesn't require any other dependencies. And if you are worried about power failures mid-write and transactions, etc., I would question the use of just about any embedded solution. Lastly, if you have something that works, then just use that until you can show that you need something else. If Neo4j works and meets your needs, use it and move on to more important issues. Get it out the door first, then iterate once you have real feedback. Until then, you are just guessing.Ecstatic
"if you are worried about power failures mid-write and transactions, etc., I would question the use of just about any embedded solution"... Kinda. But surely some of them must have better protection against such events than others. Thing is: I know I'm guessing. It took me a few hours to get Neo4j up and running and I was wondering what I could compare it with. Oh well, I'll follow your advice and stay with Neo4j for now, even though I do not need the "graph" feature. The next one I'll try if I run into issues will be "Berkeley DB Java edition" : )Adamsun
As an example of what is possible regarding failure handling with a tiny (one .jar) embeddable DB, here's what the Oracle site says about "Berkeley DB Java": "Berkeley DB Java Edition stores data reliably and ensures data integrity. In the event of a system failure, Berkeley DB Java Edition will recover transactional data and reset the system to a functional and consistent state from log and database information." This is the kind of thing I'd like to benefit from without having to reinvent the wheel : )Adamsun
@Ecstatic most of the good embedded databases have some sort of recovery functionality in case of a failure. LevelDB, for example, is designed to be extremely resilient to failures, it has really good recovery functionality when there is a failure and it recovers with minimal data loss.Awning
