Problem
Following up on this question, it seems that a file- or disk-based Map
implementation may be the right solution to the problems I mentioned there. Short version:
- Right now, I have a Map implemented as a ConcurrentHashMap.
- Entries are added to it continually, at a fairly fixed rate. Details on this later.
- Eventually, no matter what, this means the JVM runs out of heap space.
At work, it was (strongly) suggested that I solve this problem using SQLite, but after asking that previous question, I don't think that a database is the right tool for this job. So - let me know if this sounds crazy - I think a better solution would be a Map stored on disk.
Bad idea: implement this myself. Better idea: use someone else's library! Which one?
Requirements
Must-haves:
- Free.
- Persistent. The data needs to stick around between JVM restarts.
- Some sort of searchability. Yes, I need the ability to retrieve this darn data as well as put it away. Basic result set filtering is a plus.
- Platform-independent. Needs to be production-deployable on Windows or Linux machines.
- Purgeable. Disk space is finite, just like heap space. I need to get rid of entries that are n days old. It's not a big deal if I have to do this manually.
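To make the "purgeable" requirement concrete, here is a minimal sketch of a manual purge over an ordinary in-memory Map. It assumes each value is wrapped together with its insertion time; AgePurge, Timestamped, and purgeOlderThan are hypothetical names for illustration, not part of any library mentioned here. A disk-backed Map would need an equivalent operation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.Map;

// Hypothetical helper (not from any library): drops entries older than maxAge.
final class AgePurge {

    /** Assumed value wrapper that remembers when the entry was inserted. */
    static final class Timestamped<V> {
        final V value;
        final Instant insertedAt;

        Timestamped(V value, Instant insertedAt) {
            this.value = value;
            this.insertedAt = insertedAt;
        }
    }

    /** Removes every entry inserted before (now - maxAge); returns how many were removed. */
    static <K, V> int purgeOlderThan(Map<K, Timestamped<V>> map, Duration maxAge, Instant now) {
        Instant cutoff = now.minus(maxAge);
        int removed = 0;
        // Iterator.remove() is supported by both HashMap and ConcurrentHashMap value views.
        for (Iterator<Timestamped<V>> it = map.values().iterator(); it.hasNext(); ) {
            if (it.next().insertedAt.isBefore(cutoff)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Calling something like AgePurge.purgeOlderThan(map, Duration.ofDays(n), Instant.now()) from a scheduled job would cover the "do this manually" case.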
Nice-to-haves:
- Easy to use. It would be great if I could get this working by the end of the week. Better still: the end of the day. It would be really, really great if I could add one JAR to my classpath, change new ConcurrentHashMap<Foo, Bar>(); to new SomeDiskStoredMap<Foo, Bar>(); and be done.
- Decent scalability and performance. Worst case: new entries are added (on average) 3 times per second, every second, all day long, every day. However, inserts won't always happen that smoothly. It might be (no inserts for an hour) then (insert 10,000 objects at once).
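The "change one line and be done" idea only works if the rest of the code depends on the Map interface rather than on ConcurrentHashMap directly. A rough sketch of that arrangement follows; EntryStore is a hypothetical wrapper, and whether a given disk-backed library actually exposes a java.util.Map implementation is an assumption that would need checking per library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical wrapper: callers never see which Map implementation is in use.
final class EntryStore<K, V> {
    private final Map<K, V> backing;

    // The factory is the only place that names a concrete Map class.
    EntryStore(Supplier<Map<K, V>> factory) {
        this.backing = factory.get();
    }

    void put(K key, V value) {
        backing.put(key, value);
    }

    V get(K key) {
        return backing.get(key);
    }

    int size() {
        return backing.size();
    }

    public static void main(String[] args) {
        // Today: in-memory. A disk-backed Map would be swapped in on this line only.
        EntryStore<String, Integer> store = new EntryStore<String, Integer>(ConcurrentHashMap::new);
        store.put("a", 1);
        System.out.println(store.get("a"));
    }
}
```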
Possible Solutions
- Ehcache? I've never used it before. It was a suggested solution to my previous question.
- Berkeley DB? Again, I've never used it, and I really don't know anything about it.
- Hadoop (and which subproject)? Haven't used it. Based on these docs, its cross-platform-readiness is ambiguous to me. I don't need distributed operation in the foreseeable future.
- A SQLite JDBC driver after all?
- ???
Ehcache and Berkeley DB both look reasonable right now. Any particular recommendations in either direction?
LinkedBlockingQueue. – Agribusiness
-Xmx512m; this is a Java EE app so there's a lot else going on. The Map itself is about 128m when the OOME is thrown - after running for ~6 hours. That's with adding 1 entry/sec, not 3/sec. Even if I run this thing with a crap-ton of memory (I can't), I just won't be able to store as much data as I need to (at least a month's worth). Doing some basic math: after a month, adding 3 entries/sec (which is the worst-case rate), the Map would be ~43 gigabytes. – Osborn
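The "basic math" in that comment can be reproduced as a back-of-the-envelope calculation. The sketch below assumes linear growth, exactly 6 hours of observation, and a 30-day month; HeapEstimate and projectedMegabytes are hypothetical names, and the rounded inputs are why the result only lands in the same ballpark as the quoted figure.

```java
// Hypothetical helper: scales an observed Map size to a longer period and higher rate.
final class HeapEstimate {

    static double projectedMegabytes(double observedMb, double observedHours,
                                     double observedRatePerSec,
                                     double targetRatePerSec, double targetDays) {
        // Average footprint per entry, from what was observed before the OOME.
        double mbPerEntry = observedMb / (observedHours * 3600 * observedRatePerSec);
        // Number of entries inserted over the target period at the target rate.
        double targetEntries = targetDays * 86400 * targetRatePerSec;
        return mbPerEntry * targetEntries;
    }

    public static void main(String[] args) {
        // 128 MB after ~6 h at 1 entry/sec, projected to 30 days at 3 entries/sec.
        double mb = projectedMegabytes(128, 6, 1, 3, 30);
        System.out.printf("%.0f MB (about %.0f GB)%n", mb, mb / 1024);
    }
}
```

With these assumed inputs the projection comes to roughly 45 GB, consistent with the ~43 GB estimate above and far beyond a 512 MB heap.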