Looking for a drop-in replacement for a java.util.Map

Problem

Following up on this question, it seems that a file- or disk-based Map implementation may be the right solution to the problems I mentioned there. Short version:

  • Right now, I have a Map implemented as a ConcurrentHashMap.
  • Entries are added to it continually, at a fairly fixed rate. Details on this later.
  • Eventually, no matter what, this means the JVM runs out of heap space.

At work, it was (strongly) suggested that I solve this problem using SQLite, but after asking that previous question, I don't think that a database is the right tool for this job. So - let me know if this sounds crazy - I think a better solution would be a Map stored on disk.

Bad idea: implement this myself. Better idea: use someone else's library! Which one?

Requirements

Must-haves:

  • Free.
  • Persistent. The data needs to stick around between JVM restarts.
  • Some sort of searchability. Yes, I need the ability to retrieve this darn data as well as put it away. Basic result set filtering is a plus.
  • Platform-independent. Needs to be production-deployable on Windows or Linux machines.
  • Purgeable. Disk space is finite, just like heap space. I need to get rid of entries that are n days old. It's not a big deal if I have to do this manually.

Nice-to-haves:

  • Easy to use. It would be great if I could get this working by the end of the week -
    better still, by the end of the day. It would be really, really great if I could add
    one JAR to my classpath, change new ConcurrentHashMap<Foo, Bar>(); to
    new SomeDiskStoredMap<Foo, Bar>(); and be done.
  • Decent scalability and performance. Worst case: new entries are added (on average) 3 times per second, every second, all day long, every day. However, inserts won't always happen that smoothly. It might be (no inserts for an hour) then (insert 10,000 objects at once).

Possible Solutions

  • Ehcache? I've never used it before. It was a suggested solution to my previous question.
  • Berkeley DB? Again, I've never used it, and I really don't know anything about it.
  • Hadoop (and which subproject)? Haven't used it. Based on these docs, its cross-platform readiness is unclear to me. I don't need distributed operation in the foreseeable future.
  • A SQLite JDBC driver after all?
  • ???

Ehcache and Berkeley DB both look reasonable right now. Any particular recommendations in either direction?

Osborn answered 18/1, 2011 at 16:23 Comment(7)
Free as in speech or just free as in beer?Nashville
I would be surprised there is nothing you can do about the Map filling to the point of an OutOfMemoryError. How much data do you have and how much memory do you have?Platform
@Scott: free as in beer is fine.Osborn
When I asked a version of this question, the suggestions were ehcache, hadoop, a real DB, and roll-your-own subclass of LinkedBlockingQueue.Agribusiness
@Peter: I'm running with -Xmx512m; this is a Java EE app so there's a lot else going on. The Map itself is about 128m when the OOME is thrown - after running for ~6 hours. That's with adding 1 entry/sec, not 3/sec. Even if I run this thing with a crap-ton of memory (I can't) I just won't be able to store as much data as I need to (at least a month's worth). Doing some basic math: after a month, adding 3 entries/sec (which is the worst-case rate), the Map would be ~43 gigabytes.Osborn
@Matt Ball, use a database, which can do the simple maths, then take those results and do any complex bits in Java.Mongeau
@orangepips: if I'm going to use a database, it would probably be SQLite, in which case I'm back at my previous question. Any suggestions there? It really doesn't seem like the right way to do this - please convince me.Osborn

UPDATE (some 4 years after the first post...): beware that in newer versions of ehcache, persistence of cache items is only available in the paid product. Thanks @boday for pointing this out.

ehcache is great. It will give you the flexibility you need to keep the map in memory, on disk, or in memory with spillover to disk. With this very simple wrapper for java.util.Map, using it is blindingly easy:

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;

import net.sf.ehcache.Cache;
import net.sf.ehcache.Element;

import org.apache.log4j.Logger;

import com.google.common.collect.Sets;

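/**
 * Adapts a net.sf.ehcache.Cache to the java.util.Map interface.
 * Note that entrySet() and values() are unsupported.
 */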
public class EhCacheMapAdapter<K,V> implements Map<K,V> {
    @SuppressWarnings("unused")
    private final static Logger logger = Logger
            .getLogger(EhCacheMapAdapter.class);

    public Cache ehCache;

    public EhCacheMapAdapter(Cache ehCache) {
        super();
        this.ehCache = ehCache;
    } // end constructor

    @Override
    public void clear() {
        ehCache.removeAll();
    } // end method

    @Override
    public boolean containsKey(Object key) {
        return ehCache.isKeyInCache(key);
    } // end method

    @Override
    public boolean containsValue(Object value) {
        return ehCache.isValueInCache(value);
    } // end method

    @Override
    public Set<Entry<K, V>> entrySet() {
        throw new UnsupportedOperationException();
    } // end method

    @SuppressWarnings("unchecked")
    @Override
    public V get(Object key) {
        if( key == null ) return null;
        Element element = ehCache.get(key);
        if( element == null ) return null;
        return (V)element.getObjectValue();
    } // end method

    @Override
    public boolean isEmpty() {
        return ehCache.getSize() == 0;
    } // end method

    @SuppressWarnings("unchecked")
    @Override
    public Set<K> keySet() {
        List<K> l = ehCache.getKeys();
        return Sets.newHashSet(l);
    } // end method

    @Override
    public V put(K key, V value) {
        // Per the Map contract: store the new value even if the key is
        // already present, and return the previously mapped value (or null).
        V previous = this.get(key);
        ehCache.put(new Element(key, value));
        return previous;
    } // end method


    @Override
    public V remove(Object key) {
        V retObj = null;
        if( this.containsKey(key) ) {
            retObj = this.get(key);
        } // end if
        ehCache.remove(key);
        return retObj;
    } // end method

    @Override
    public int size() {
        return ehCache.getSize();
    } // end method

    @Override
    public Collection<V> values() {
        throw new UnsupportedOperationException();
    } // end method

    @Override
    public void putAll(Map<? extends K, ? extends V> m) {
        for( K key : m.keySet() ) {
            this.put(key, m.get(key));
        } // end for
    } // end method
} // end class
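
For context, here's a minimal sketch of building the Cache that gets handed to the adapter, assuming ehcache 2.x's programmatic constructor (the cache name, sizing numbers, and String key/value types are placeholders; the disk store location still comes from ehcache.xml, or from the bundled failsafe config if none is provided):

import java.util.Map;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

public class DiskMapFactory {

    // Builds a disk-backed Map<String, String> on top of the adapter above.
    public static Map<String, String> createDiskMap() {
        CacheManager manager = CacheManager.create(); // uses ehcache.xml if found on the classpath
        // ehcache 2.x constructor: name, maxElementsInMemory, overflowToDisk, eternal,
        // timeToLiveSeconds, timeToIdleSeconds, diskPersistent, diskExpiryThreadIntervalSeconds
        Cache diskCache = new Cache("fooBarMap", 10000, true, true, 0, 0, true, 120);
        manager.addCache(diskCache); // the cache must be registered before first use
        // Call manager.shutdown() on JVM exit so the disk store index gets flushed.
        return new EhCacheMapAdapter<String, String>(diskCache);
    }
}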
Substituent answered 18/1, 2011 at 18:41 Comment(8)
Yup, I just came across this very recipe and I'm working on getting ehcache set up right now.Osborn
Yeah, but mine is a drop-in replacement for Map. Which is what you asked for. ;-)Substituent
Indeed it is. Any idea where the best place to put ehcache.xml is, in a Java EE app (an EAR)?Osborn
Nope I'm a Spring fan. It has EhCacheFactoryBean which can be useful.Substituent
I'm going with Ehcache for now. Minor config details aside, this has been pretty painless. As best I can tell, it's satisfied every single one of my requirements, aside from searching, which is coming in 2.4 - I'll play with that tomorrow. Thank you.Osborn
I think your isEmpty method is backward. I may be mixing things up myself, but I think we are returning true if the cache has items.Injun
Also the put method doesn't match the Map specification: "If the map previously contained a mapping for the key, the old value is replaced by the specified value." This one just returns the old value without replacing it.Erdah
btw, EhCache is not a valid option because the persistence seems to be available for BigMemory Go only...which is not freeBaptista

Have you heard of prevalence frameworks?

EDIT: some clarifications on the term.

Like James Gosling now says, no SQL database is as efficient as in-memory storage. Prevalence frameworks (the best known being Prevayler and Space4J) are built on this idea of an in-memory store that can also be written to disk. How do they work? It's deceptively simple: a storage object holds all persistent entities, and that storage can only be modified by serializable operations (commands). Putting an object into storage, for example, is a Put command executed in an isolated context. Because the command is serializable, it can also (depending on configuration) be journaled to disk for long-term persistence. The main data repository remains memory, though, which gives undoubtedly fast access times at the cost of high memory usage.
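
To make that concrete, here is a minimal sketch of the command style using Prevayler's non-generic 2.x API; the MapSystem and PutCommand classes and the "prevalence-base" directory are placeholders invented for illustration:

import java.io.Serializable;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.prevayler.Prevayler;
import org.prevayler.PrevaylerFactory;
import org.prevayler.Transaction;

public class PrevalentMapDemo {

    // The prevalent system: a plain in-memory map that Prevayler snapshots and journals.
    public static class MapSystem implements Serializable {
        private static final long serialVersionUID = 1L;
        public final Map<String, String> data = new HashMap<String, String>();
    }

    // A serializable command; Prevayler writes it to the journal before applying it.
    public static class PutCommand implements Transaction {
        private static final long serialVersionUID = 1L;
        private final String key;
        private final String value;

        public PutCommand(String key, String value) {
            this.key = key;
            this.value = value;
        }

        public void executeOn(Object prevalentSystem, Date executionTime) {
            ((MapSystem) prevalentSystem).data.put(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        // "prevalence-base" is the on-disk journal/snapshot directory.
        Prevayler prevayler = PrevaylerFactory.createPrevayler(new MapSystem(), "prevalence-base");
        prevayler.execute(new PutCommand("answer", "42")); // journaled, then applied in memory
        MapSystem system = (MapSystem) prevayler.prevalentSystem();
        System.out.println(system.data.get("answer")); // survives a JVM restart via the journal
    }
}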

Another advantage is that, because of their obvious simplicity, these frameworks rarely contain more than ten or so classes.

Considering your question, Space4J immediately came to mind, since it supports "passivation" of rarely used objects: their index keys stay in memory, but the objects themselves are kept on disk as long as they're not used.

Note that you can also find some information at c2wiki.

Seducer answered 18/1, 2011 at 16:25 Comment(4)
Maybe "persistence frameworks"? Though searching for "prevalence frameworks" indirectly gave me this: prevayler.orgAgribusiness
@dkarp: maybe. A persistence framework is just something like Hibernate or EclipseLink, though...Osborn
It is a concept that some frameworks provide and is different than persistence frameworks. Here are some details: ibm.com/developerworks/library/wa-objprevRudolf
Actually passivation was removed from the framework in the latest versions. But it does support transparent cluster and indexation. Take a look: forum.space4j.org/posts/list/5.pageArouse

Berkeley DB Java Edition has a Collections API. Within that API, StoredMap in particular is a drop-in replacement for a ConcurrentHashMap. You'll need to create the Environment and Database before creating the StoredMap, but the Collections tutorial should make that pretty easy.
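
A bare-bones sketch of that setup, assuming a JE release whose Collections API is generified (the "bdb-data" directory, database names, and String key/value types are placeholders):

import java.io.File;

import com.sleepycat.bind.serial.SerialBinding;
import com.sleepycat.bind.serial.StoredClassCatalog;
import com.sleepycat.collections.StoredMap;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class StoredMapDemo {
    public static void main(String[] args) throws Exception {
        // The environment home directory must already exist.
        File home = new File("bdb-data");
        home.mkdirs();

        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        Environment env = new Environment(home, envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        Database catalogDb = env.openDatabase(null, "classCatalog", dbConfig);
        Database mapDb = env.openDatabase(null, "fooBarDb", dbConfig);

        // Serial bindings use plain Java serialization for keys and values.
        StoredClassCatalog catalog = new StoredClassCatalog(catalogDb);
        SerialBinding<String> keyBinding = new SerialBinding<String>(catalog, String.class);
        SerialBinding<String> valueBinding = new SerialBinding<String>(catalog, String.class);

        // writeAllowed = true gives a mutable, disk-backed java.util.Map view.
        StoredMap<String, String> map =
                new StoredMap<String, String>(mapDb, keyBinding, valueBinding, true);
        map.put("answer", "42");

        mapDb.close();
        catalogDb.close();
        env.close();
    }
}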

Per your requirements, Berkeley DB is designed to be easy to use, and I think you'll find that it has exceptional scalability and performance. Berkeley DB is available under an open source license, it's persistent, platform independent, and allows you to search for data. The data can certainly be purged/deleted as needed. Berkeley DB has a long list of other features which you may find highly useful to your application, especially as your requirements change and grow with the success of the application.

If you decide to use Berkeley DB Java Edition, please be sure to ask questions on the BDB JE Forum. There's an active developer community that's happy to help answer questions and resolve problems.

Mercurialize answered 19/1, 2011 at 1:18 Comment(0)

We have a similar solution implemented using Xapian. It's fast, it's scalable, it provides almost all of the search functionality you asked for, it's free, it's cross-platform, and of course it's purgeable.

Unapt answered 18/1, 2011 at 16:31 Comment(2)
How do I use Xapian with Java?Osborn
The Java bindings are documented here (svn.xapian.org/trunk/xapian-bindings/java/README?revision=HEAD).Unapt

I came across jdbm2 a few weeks ago. The usage is very simple; you should be able to get it working in half an hour. One drawback is that any object put into the map must be serializable, i.e. implement Serializable. Other cons are listed on their website.
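
A rough sketch of that usage pattern, assuming the RecordManager API shown in the jdbm2 README (the store and map names here are made up):

import java.io.IOException;
import java.util.Map;

import jdbm.RecordManager;
import jdbm.RecordManagerFactory;

public class Jdbm2Demo {
    public static void main(String[] args) throws IOException {
        // Backing files are created on disk under the given store name.
        RecordManager recMan = RecordManagerFactory.createRecordManager("mapStore");
        Map<Long, String> map = recMan.treeMap("entriesByTimestamp");

        map.put(System.currentTimeMillis(), "some serializable value");
        recMan.commit(); // changes are only durable after commit
        recMan.close();
    }
}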

However, no object-persistence database is a permanent solution for storing objects of your own Java classes. If you decide to change the fields of a class, you will no longer be able to retrieve previously stored objects from the map. It is best suited to storing standard serializable classes like String, Integer, etc.

Garretson answered 18/1, 2011 at 18:28 Comment(0)

The google-collections library, part of http://code.google.com/p/guava-libraries/, has some really useful Map tools. MapMaker in particular lets you make concurrent HashMaps with timed evictions, soft values that will be swept up by the garbage collector if you're running out of heap, and computing functions.

import java.util.Map;
import java.util.concurrent.TimeUnit;

import com.google.common.base.Function;
import com.google.common.collect.MapMaker;

Map<String, String> cache = new MapMaker()
    .softValues()
    .expiration(30, TimeUnit.MINUTES)
    .makeComputingMap(new Function<String, String>() {
        @Override
        public String apply(String input) {
            // Work out what the value should be
            return null;
        }
    });

That will give you a Map cache that will clean up after itself and can work out its values. If you're able to compute values like that then great, otherwise it would map perfectly onto http://redis.io/ which you'd be writing into (to be fair, redis would probably be fast enough on its own!).

Gelhar answered 18/1, 2011 at 22:48 Comment(2)
Unfortunately I really need to be able to store more data than will fit in RAM, so MapMaker alone won't cut it. I haven't heard of Redis. How is it used with Java? What makes Redis better than Ehcache or Berkeley DB?Osborn
Hi Matt. The .softValues() argument will tell the garbage collector to evict cache entries if it needs more memory. It will remove entries that have been least used, and can work them out again from the computing function if necessary.Gelhar
