Looking for a drop-in replacement for a java.util.Map

Problem

Following up on this question, it seems that a file- or disk-based Map implementation may be the right solution to the problems I mentioned there. Short version:

  • Right now, I have a Map implemented as a ConcurrentHashMap.
  • Entries are added to it continually, at a fairly fixed rate. Details on this later.
  • Eventually, no matter what, this means the JVM runs out of heap space.

At work, it was (strongly) suggested that I solve this problem using SQLite, but after asking that previous question, I don't think that a database is the right tool for this job. So - let me know if this sounds crazy - I think a better solution would be a Map stored on disk.

Bad idea: implement this myself. Better idea: use someone else's library! Which one?

Requirements

Must-haves:

  • Free.
  • Persistent. The data needs to stick around between JVM restarts.
  • Some sort of searchability. Yes, I need the ability to retrieve this darn data as well as put it away. Basic result set filtering is a plus.
  • Platform-independent. Needs to be production-deployable on Windows or Linux machines.
  • Purgeable. Disk space is finite, just like heap space. I need to get rid of entries that are n days old. It's not a big deal if I have to do this manually.

Nice-to-haves:

  • Easy to use. It would be great if I could get this working by the end of the week -
    better still, by the end of the day. It would be really, really great if I could add
    one JAR to my classpath, change new ConcurrentHashMap<Foo, Bar>(); to
    new SomeDiskStoredMap<Foo, Bar>(); and be done.
  • Decent scalability and performance. Worst case: new entries are added (on average) 3 times per second, every second, all day long, every day. However, inserts won't always happen that smoothly. It might be (no inserts for an hour) then (insert 10,000 objects at once).

Possible Solutions

  • Ehcache? I've never used it before. It was a suggested solution to my previous question.
  • Berkeley DB? Again, I've never used it, and I really don't know anything about it.
  • Hadoop (and which subproject)? Haven't used it. Based on these docs, its cross-platform readiness is unclear to me. I don't need distributed operation in the foreseeable future.
  • A SQLite JDBC driver after all?
  • ???

Ehcache and Berkeley DB both look reasonable right now. Any particular recommendations in either direction?

Osborn answered 18/1, 2011 at 16:23 Comment(7)
Free as in speech or just free as in beer?Nashville
I would be surprised there is nothing you can do about the Map filling to the point of an OutOfMemoryError. How much data do you have and how much memory do you have?Platform
@Scott: free as in beer is fine.Osborn
When I asked a version of this question, the suggestions were ehcache, hadoop, a real DB, and roll-your-own subclass of LinkedBlockingQueue.Agribusiness
@Peter: I'm running with -Xmx512m; this is a Java EE app so there's a lot else going on. The Map itself is about 128m when the OOME is thrown - after running for ~6 hours. That's with adding 1 entry/sec, not 3/sec. Even if I run this thing with a crap-ton of memory (I can't) I just won't be able to store as much data as I need to (at least a month's worth). Doing some basic math: after a month, adding 3 entries/sec (which is the worst-case rate), the Map would be ~43 gigabytes.Osborn
@Matt Ball, use a database, which can do the simple maths, then take those results and do any complex bits in Java.Mongeau
@orangepips: if I'm going to use a database, it would probably be SQLite, in which case I'm back at my previous question. Any suggestions there? It really doesn't seem like the right way to do this - please convince me.Osborn

UPDATE (some 4 years after the first post...): beware that in newer versions of ehcache, persistence of cache items is only available in the paid product. Thanks @boday for pointing this out.

ehcache is great. It will give you the flexibility you need to keep the map in memory, on disk, or in memory with spillover to disk. With this very simple wrapper for java.util.Map, using it is blindingly easy:

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;

import net.sf.ehcache.Cache;
import net.sf.ehcache.Element;

import org.apache.log4j.Logger;

import com.google.common.collect.Sets;

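/**
 * Adapts a net.sf.ehcache.Cache to the java.util.Map interface.
 * Note that entrySet() and values() are unsupported.
 */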
public class EhCacheMapAdapter<K,V> implements Map<K,V> {
    @SuppressWarnings("unused")
    private final static Logger logger = Logger
            .getLogger(EhCacheMapAdapter.class);

    public Cache ehCache;

    public EhCacheMapAdapter(Cache ehCache) {
        super();
        this.ehCache = ehCache;
    } // end constructor

    @Override
    public void clear() {
        ehCache.removeAll();
    } // end method

    @Override
    public boolean containsKey(Object key) {
        return ehCache.isKeyInCache(key);
    } // end method

    @Override
    public boolean containsValue(Object value) {
        return ehCache.isValueInCache(value);
    } // end method

    @Override
    public Set<Entry<K, V>> entrySet() {
        throw new UnsupportedOperationException();
    } // end method

    @SuppressWarnings("unchecked")
    @Override
    public V get(Object key) {
        if( key == null ) return null;
        Element element = ehCache.get(key);
        if( element == null ) return null;
        return (V)element.getObjectValue();
    } // end method

    @Override
    public boolean isEmpty() {
        return ehCache.getSize() == 0;
    } // end method

    @SuppressWarnings("unchecked")
    @Override
    public Set<K> keySet() {
        List<K> l = ehCache.getKeys();
        return Sets.newHashSet(l);
    } // end method

    @Override
    public V put(K key, V value) {
        // Per the Map contract: store the new value even if the key is
        // already present, and return the previously mapped value (or null).
        V previous = this.get(key);
        ehCache.put(new Element(key, value));
        return previous;
    } // end method


    @Override
    public V remove(Object key) {
        V retObj = null;
        if( this.containsKey(key) ) {
            retObj = this.get(key);
        } // end if
        ehCache.remove(key);
        return retObj;
    } // end method

    @Override
    public int size() {
        return ehCache.getSize();
    } // end method

    @Override
    public Collection<V> values() {
        throw new UnsupportedOperationException();
    } // end method

    @Override
    public void putAll(Map<? extends K, ? extends V> m) {
        for( K key : m.keySet() ) {
            this.put(key, m.get(key));
        } // end for
    } // end method
} // end class
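
For context, here's a minimal sketch of building the Cache that gets handed to the adapter, assuming ehcache 2.x's programmatic constructor (the cache name, sizing numbers, and String key/value types are placeholders; the disk store location still comes from ehcache.xml, or from the bundled failsafe config if none is provided):

import java.util.Map;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

public class DiskMapFactory {

    // Builds a disk-backed Map<String, String> on top of the adapter above.
    public static Map<String, String> createDiskMap() {
        CacheManager manager = CacheManager.create(); // uses ehcache.xml if found on the classpath
        // ehcache 2.x constructor: name, maxElementsInMemory, overflowToDisk, eternal,
        // timeToLiveSeconds, timeToIdleSeconds, diskPersistent, diskExpiryThreadIntervalSeconds
        Cache diskCache = new Cache("fooBarMap", 10000, true, true, 0, 0, true, 120);
        manager.addCache(diskCache); // the cache must be registered before first use
        // Call manager.shutdown() on JVM exit so the disk store index gets flushed.
        return new EhCacheMapAdapter<String, String>(diskCache);
    }
}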
Substituent answered 18/1, 2011 at 18:41 Comment(8)
Yup, I just came across this very recipe and I'm working on getting ehcache set up right now.Osborn
Yeah, but mine is a drop-in replacement for Map. Which is what you asked for. ;-)Substituent
Indeed it is. Any idea where the best place to put ehcache.xml is, in a Java EE app (an EAR)?Osborn
Nope I'm a Spring fan. It has EhCacheFactoryBean which can be useful.Substituent
I'm going with Ehcache for now. Minor config details aside, this has been pretty painless. As best I can tell, it's satisfied every single one of my requirements, aside from searching, which is coming in 2.4 - I'll play with that tomorrow. Thank you.Osborn
I think your isEmpty method is backward. I may be mixing things up myself, but I think we are returning true if the cache has items.Injun
Also the put method doesn't match the Map specification: "If the map previously contained a mapping for the key, the old value is replaced by the specified value." This one just returns the old value without replacing it.Erdah
btw, EhCache is not a valid option because the persistence seems to be available for BigMemory Go only...which is not freeBaptista

Have you heard of prevalence frameworks?

EDIT: some clarifications on the term.

Like James Gosling now says, no SQL database is as efficient as in-memory storage. Prevalence frameworks (the best known being Prevayler and Space4J) are built on this idea of an in-memory store that can also be written to disk. How do they work? It's deceptively simple: a storage object holds all persistent entities, and that storage can only be modified by serializable operations (commands). Putting an object into storage, for example, is a Put command executed in an isolated context. Because the command is serializable, it can also (depending on configuration) be journaled to disk for long-term persistence. The main data repository remains memory, though, which gives undoubtedly fast access times at the cost of high memory usage.
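
To make that concrete, here is a minimal sketch of the command style using Prevayler's non-generic 2.x API; the MapSystem and PutCommand classes and the "prevalence-base" directory are placeholders invented for illustration:

import java.io.Serializable;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.prevayler.Prevayler;
import org.prevayler.PrevaylerFactory;
import org.prevayler.Transaction;

public class PrevalentMapDemo {

    // The prevalent system: a plain in-memory map that Prevayler snapshots and journals.
    public static class MapSystem implements Serializable {
        private static final long serialVersionUID = 1L;
        public final Map<String, String> data = new HashMap<String, String>();
    }

    // A serializable command; Prevayler writes it to the journal before applying it.
    public static class PutCommand implements Transaction {
        private static final long serialVersionUID = 1L;
        private final String key;
        private final String value;

        public PutCommand(String key, String value) {
            this.key = key;
            this.value = value;
        }

        public void executeOn(Object prevalentSystem, Date executionTime) {
            ((MapSystem) prevalentSystem).data.put(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        // "prevalence-base" is the on-disk journal/snapshot directory.
        Prevayler prevayler = PrevaylerFactory.createPrevayler(new MapSystem(), "prevalence-base");
        prevayler.execute(new PutCommand("answer", "42")); // journaled, then applied in memory
        MapSystem system = (MapSystem) prevayler.prevalentSystem();
        System.out.println(system.data.get("answer")); // survives a JVM restart via the journal
    }
}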

Another advantage is that, because of their obvious simplicity, these frameworks rarely contain more than ten or so classes.

Considering your question, Space4J immediately came to mind, since it supports "passivation" of rarely used objects: their index keys stay in memory, but the objects themselves are kept on disk as long as they're not used.

Note that you can also find some information at c2wiki.

Seducer answered 18/1, 2011 at 16:25 Comment(4)
Maybe "persistence frameworks"? Though searching for "prevalence frameworks" indirectly gave me this: prevayler.orgAgribusiness
@dkarp: maybe. A persistence framework is just something like Hibernate or EclipseLink, though...Osborn
It is a concept that some frameworks provide and is different than persistence frameworks. Here are some details: ibm.com/developerworks/library/wa-objprevRudolf
Actually passivation was removed from the framework in the latest versions. But it does support transparent cluster and indexation. Take a look: forum.space4j.org/posts/list/5.pageArouse

Berkeley DB Java Edition has a Collections API. Within that API, StoredMap in particular is a drop-in replacement for a ConcurrentHashMap. You'll need to create the Environment and Database before creating the StoredMap, but the Collections tutorial should make that pretty easy.
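
A bare-bones sketch of that setup, assuming a JE release whose Collections API is generified (the "bdb-data" directory, database names, and String key/value types are placeholders):

import java.io.File;

import com.sleepycat.bind.serial.SerialBinding;
import com.sleepycat.bind.serial.StoredClassCatalog;
import com.sleepycat.collections.StoredMap;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class StoredMapDemo {
    public static void main(String[] args) throws Exception {
        // The environment home directory must already exist.
        File home = new File("bdb-data");
        home.mkdirs();

        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        Environment env = new Environment(home, envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        Database catalogDb = env.openDatabase(null, "classCatalog", dbConfig);
        Database mapDb = env.openDatabase(null, "fooBarDb", dbConfig);

        // Serial bindings use plain Java serialization for keys and values.
        StoredClassCatalog catalog = new StoredClassCatalog(catalogDb);
        SerialBinding<String> keyBinding = new SerialBinding<String>(catalog, String.class);
        SerialBinding<String> valueBinding = new SerialBinding<String>(catalog, String.class);

        // writeAllowed = true gives a mutable, disk-backed java.util.Map view.
        StoredMap<String, String> map =
                new StoredMap<String, String>(mapDb, keyBinding, valueBinding, true);
        map.put("answer", "42");

        mapDb.close();
        catalogDb.close();
        env.close();
    }
}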

Per your requirements, Berkeley DB is designed to be easy to use, and I think you'll find that it has exceptional scalability and performance. Berkeley DB is available under an open source license, it's persistent, platform independent, and allows you to search for data. The data can certainly be purged/deleted as needed. Berkeley DB has a long list of other features which you may find highly useful to your application, especially as your requirements change and grow with the success of the application.

If you decide to use Berkeley DB Java Edition, please be sure to ask questions on the BDB JE Forum. There's an active developer community that's happy to help answer questions and resolve problems.

Mercurialize answered 19/1, 2011 at 1:18 Comment(0)

We have a similar solution implemented using Xapian. It's fast, it's scalable, it provides almost all of the search functionality you asked for, it's free, it's cross-platform, and of course it's purgeable.

Unapt answered 18/1, 2011 at 16:31 Comment(2)
How do I use Xapian with Java?Osborn
The Java bindings are documented here (svn.xapian.org/trunk/xapian-bindings/java/README?revision=HEAD).Unapt

I came across jdbm2 a few weeks ago. The usage is very simple; you should be able to get it working in half an hour. One drawback is that any object put into the map must be serializable, i.e. implement Serializable. Other cons are listed on their website.
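
A rough sketch of that usage pattern, assuming the RecordManager API shown in the jdbm2 README (the store and map names here are made up):

import java.io.IOException;
import java.util.Map;

import jdbm.RecordManager;
import jdbm.RecordManagerFactory;

public class Jdbm2Demo {
    public static void main(String[] args) throws IOException {
        // Backing files are created on disk under the given store name.
        RecordManager recMan = RecordManagerFactory.createRecordManager("mapStore");
        Map<Long, String> map = recMan.treeMap("entriesByTimestamp");

        map.put(System.currentTimeMillis(), "some serializable value");
        recMan.commit(); // changes are only durable after commit
        recMan.close();
    }
}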

However, no object-persistence database is a permanent solution for storing objects of your own Java classes. If you decide to change the fields of a class, you will no longer be able to retrieve previously stored objects from the map. It is best suited to storing standard serializable classes like String, Integer, etc.

Garretson answered 18/1, 2011 at 18:28 Comment(0)

The google-collections library, part of http://code.google.com/p/guava-libraries/, has some really useful Map tools. MapMaker in particular lets you make concurrent HashMaps with timed evictions, soft values that will be swept up by the garbage collector if you're running out of heap, and computing functions.

import java.util.Map;
import java.util.concurrent.TimeUnit;

import com.google.common.base.Function;
import com.google.common.collect.MapMaker;

Map<String, String> cache = new MapMaker()
    .softValues()
    .expiration(30, TimeUnit.MINUTES)
    .makeComputingMap(new Function<String, String>() {
        @Override
        public String apply(String input) {
            // Work out what the value should be
            return null;
        }
    });

That will give you a Map cache that will clean up after itself and can work out its values. If you're able to compute values like that then great, otherwise it would map perfectly onto http://redis.io/ which you'd be writing into (to be fair, redis would probably be fast enough on its own!).

Gelhar answered 18/1, 2011 at 22:48 Comment(2)
Unfortunately I really need to be able to store more data than will fit in RAM, so MapMaker alone won't cut it. I haven't heard of Redis. How is it used with Java? What makes Redis better than Ehcache or Berkeley DB?Osborn
Hi Matt. The .softValues() argument will tell the garbage collector to evict cache entries if it needs more memory. It will remove entries that have been least used, and can work them out again from the computing function if necessary.Gelhar
