Map alternative for primitive values

I did some profiling on my application, and one of the results was that about 18% of the memory on the heap is used by objects of type Double. It turns out these objects are the values in Maps, where I cannot use the primitive type.

My reasoning is that the primitive type double consumes less memory than its boxed counterpart Double. Is there a way to have a map-like data structure that accepts any type as key and a primitive double as value?

Main operations would be:

  • insertion (probably only once)
  • lookup (contains by key)
  • retrieval (by key)
  • iteration

Typical maps that I have are:

  • HashMap<T, HashMap<NodeData<T>, Double>> graph
  • HashMap<Point2D, Boolean> onSea (though not a double value)
  • ConcurrentHashMap<Point2D, HashMap<Point2D, Double>>

All used with Java 8.

Addendum

I am mainly not interested in frameworks that offer a solution for these types of maps, but in what has to be considered when addressing these issues. If you want, what are the concepts/ideas/approaches behind any such framework? Or the solution may also be on another level, where the maps are replaced with objects following a certain pattern, as @Ilmari Karonen pointed out in his answer.

Defroster answered 18/1, 2017 at 17:57 Comment(10)
How many entries do you have in the map?Clay
I use multiple maps with Double as values. The largest map however seems to contain more than 32'000 elements.Defroster
Which implementation do you use? Have you considered using another one?Pigeon
So far I have used JDK maps, mostly HashMap and ConcurrentHashMap.Defroster
Which Java version? ConcurrentHashMap is way slower than the normal HashMap.Pigeon
What kind of data do you have? Depending on it, another implementation could be useful.Pigeon
@ppasler: I use Java 8. I also added the three maps that consume the most memory (due to their many entries).Defroster
"that about 18% is used by objects of type" 18% of total memory? Of heap memory? Of processing time? Numbers without context are meaningless.Denigrate
@jpmc26: It's about memory on the heap. Clarified the question.Defroster
It was not my intention to ask for any library or framework that would solve the issue; however, as it turns out, most answers do exactly that. I am more interested in how any such solution works, i.e. how it is possible to have a collection type that uses primitive types.Defroster
B
18

Others have already suggested several third-party implementations of primitive-valued maps. For completeness, I'd like to mention some ways to get rid of the maps entirely that you might want to consider. These solutions won't always be possible, but when they are, they will usually be both faster and more memory-efficient than any map can be.

Alternative 1: Use plain old arrays.

A simple double[] array may not be as sexy as a fancy map, but very little can beat it in compactness and speed of access.

Of course, arrays come with a bunch of limitations: their size is fixed (although you can always create a new array and copy the old one's content into it) and their keys can only be small positive integers which, for efficiency, should be reasonably dense (i.e. the total number of used keys should be a reasonably large fraction of the highest key value). But if that happens to be the case for your keys, or if you can arrange for it to be the case, arrays of primitive values can be very efficient.

In particular, if you can assign a unique small integer ID to each key object, then you can use that ID as an index into an array. Similarly, if you're already storing your objects in an array (e.g. as part of some more complex data structure) and looking them up by index, then you can simply use the same index to look up any additional metadata values in another array.
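
As a rough sketch of this idea (the class names Keyed and DoubleByKey are made up for illustration, not taken from the question's code), the per-key lookup then becomes a plain array access:

// Sketch: each key object carries a dense integer ID assigned at construction,
// so the "map" from key to double becomes a plain primitive array.
class Keyed {
    final int id;                      // unique, small, dense
    Keyed(int id) { this.id = id; }
}

class DoubleByKey {
    private final double[] values;     // values[key.id] instead of Map<Keyed, Double>

    DoubleByKey(int keyCount) {
        values = new double[keyCount];
    }

    void set(Keyed key, double value) { values[key.id] = value; }
    double get(Keyed key)             { return values[key.id]; }   // no boxing, no hashing
}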

You could even dispense with the ID uniqueness requirement, if you implemented some kind of a collision handling mechanism, but at that point you're well on your way towards implementing your own hash table. In some cases that might actually make sense, but usually at that point it's probably easier to use an existing third-party implementation.
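
Incidentally, this is also essentially what the third-party primitive-map libraries do internally: keys and values are stored in parallel arrays, with the values in a primitive double[], and collisions are resolved by open addressing. The following is a deliberately minimal sketch of that concept (no resizing, no removal, and not how any particular library actually implements it):

// Minimal sketch of a primitive-valued hash map: parallel key/value arrays,
// the value array being a primitive double[], so no Double objects are created.
// Open addressing with linear probing; resizing and removal are omitted.
class ObjectToDoubleMap<K> {
    private final Object[] keys;
    private final double[] values;

    ObjectToDoubleMap(int expectedSize) {
        keys = new Object[expectedSize * 2];    // keep the table at most half full
        values = new double[expectedSize * 2];
    }

    private int indexOf(Object key) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;          // linear probing on collision
        }
        return i;                               // slot of the key, or first empty slot
    }

    void put(K key, double value) {
        int i = indexOf(key);
        keys[i] = key;
        values[i] = value;
    }

    boolean containsKey(K key) {
        return keys[indexOf(key)] != null;
    }

    double getDouble(K key) {                   // returns a primitive, never boxes
        int i = indexOf(key);
        return keys[i] != null ? values[i] : 0.0d;
    }
}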

Alternative 2: Customize your objects.

Instead of maintaining a map from key objects into primitive values, why not just make those values into properties of the objects themselves? This is, after all, what object-oriented programming is all about — grouping related data into meaningful objects.

For example, instead of maintaining a HashMap<Point2D, Boolean> onSea, why not just give your points a boolean onSea property? Of course, you'll need to define your own custom point class for this, but there's no reason why you can't make it extend the standard Point2D class if you want, so that you can pass your custom points into any method that expects a Point2D.
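
Such a point class could look roughly like this (SeaPoint is just an illustrative name; it extends java.awt.geom.Point2D.Double so it is still a Point2D):

import java.awt.geom.Point2D;

// Illustrative sketch: the point carries its own primitive onSea flag,
// which replaces the HashMap<Point2D, Boolean> entirely.
class SeaPoint extends Point2D.Double {
    boolean onSea;

    SeaPoint(double x, double y, boolean onSea) {
        super(x, y);
        this.onSea = onSea;
    }
}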

Again, this approach may not always work directly, e.g. if you need to work with classes that you cannot modify (but see below), or if the values you want to store are associated with more than one object (as in your ConcurrentHashMap<Point2D, HashMap<Point2D, Double>>).

However, for the latter case, you may still be able to solve the problem by suitably redesigning your data representation. For example, instead of representing a weighted graph as a Map<Node, Map<Node, Double>>, you can define an Edge class like:

class Edge {
    Node a, b;
    double weight;
}

and then add an Edge[] (or Vector<Edge>) property to each node that contains any edges connected to that node.
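
Put together with the Edge class above, a node might then look something like this (the field and method names are only illustrative):

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: each node keeps its incident edges directly,
// replacing the nested Map<Node, Map<Node, Double>> of edge weights.
class Node {
    final List<Edge> edges = new ArrayList<>();

    double weightTo(Node other) {
        for (Edge e : edges) {          // assumes Edge's a, b, weight are accessible
            if (e.a == other || e.b == other) {
                return e.weight;        // primitive double, no boxing
            }
        }
        throw new IllegalArgumentException("no edge to the given node");
    }
}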

Alternative 3: Combine multiple maps into one.

If you have multiple maps with the same keys, and cannot just turn the values into new properties of the key objects as suggested above, consider grouping them into a single metadata class and creating a single map from the keys into objects of that class. For example, instead of a Map<Item, Double> accessFrequency and a Map<Item, Long> creationTime, consider defining a single metadata class like:

class ItemMetadata {
    double accessFrequency;
    long creationTime;
}

and having a single Map<Item, ItemMetadata> to store all the metadata values. This is more memory-efficient than having multiple maps, and may also save time by avoiding redundant map lookups.
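
Usage would then involve a single lookup per key (item here is an assumed variable of type Item):

Map<Item, ItemMetadata> metadata = new HashMap<>();

// one map lookup retrieves all metadata values at once
ItemMetadata meta = metadata.get(item);
double frequency = meta.accessFrequency;
long created = meta.creationTime;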

In some cases, for convenience, you may also wish to include in each metadata object a reference to its corresponding main object, so that you can access both through a single reference to the metadata object. Which naturally segues into...

Alternative 4: Use a decorator.

As a combination of the previous two alternatives, if you cannot directly add extra metadata properties into the key objects, consider instead wrapping them with decorators that can hold the extra values. Thus, for example, instead of directly creating your own point class with extra properties, you could simply do something like:

class PointWrapper {
    Point2D point;
    boolean onSea;
    // ...
}

If you like, you can even turn this wrapper into a full-blown decorator by implementing method forwarding, but even just a simple "dumb" wrapper may be sufficient for many purposes.
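
A forwarding version could look roughly like this (only the abstract methods of java.awt.geom.Point2D are shown; a real decorator would forward everything callers rely on):

import java.awt.geom.Point2D;

// Illustrative sketch: the decorator is itself a Point2D and forwards the
// abstract methods to the wrapped point, while carrying the extra flag.
class PointDecorator extends Point2D {
    private final Point2D delegate;
    boolean onSea;

    PointDecorator(Point2D delegate, boolean onSea) {
        this.delegate = delegate;
        this.onSea = onSea;
    }

    @Override public double getX() { return delegate.getX(); }
    @Override public double getY() { return delegate.getY(); }
    @Override public void setLocation(double x, double y) { delegate.setLocation(x, y); }
}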

This approach is most useful if you can then arrange to store and work with just the wrappers, so that you never need to look up the wrapper corresponding to an unwrapped object. Of course, if you do need to do that occasionally (e.g. because you're only receiving the unwrapped objects from other code), then you can do that with a single Map<Point2D, PointWrapper>, but then you're effectively back at the previous alternative.

Bertie answered 19/1, 2017 at 9:52 Comment(1)
Thank you. This is the kind of answer I was originally looking for. The decorator approach looks very promising, at least in some cases where the base object is just a reference. In such cases customizable objects would add additional memory overhead.Defroster
C
17

Eclipse Collections has object and primitive maps and has Mutable and Immutable versions for both.

MutableObjectDoubleMap<String> doubleMap = ObjectDoubleMaps.mutable.empty();
doubleMap.put("1", 1.0d);
doubleMap.put("2", 2.0d);

MutableObjectBooleanMap<String> booleanMap = ObjectBooleanMaps.mutable.empty();
booleanMap.put("ok", true);

ImmutableObjectDoubleMap<String> immutableMap = doubleMap.toImmutable();
Assert.assertEquals(doubleMap, immutableMap);

A MutableMap can be used as a factory for an ImmutableMap in Eclipse Collections by calling toImmutable as I have done in the example above. Both mutable and immutable maps share a common parent interface, which in the case of the MutableObjectDoubleMap and ImmutableObjectDoubleMap above, is named ObjectDoubleMap.
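
For the other main operations in the question (lookup, retrieval and iteration), the primitive maps also avoid boxing; for example, using containsKey, getIfAbsent and forEachKeyValue on the doubleMap from above:

// Lookup and retrieval return primitives; getIfAbsent supplies a default
// for missing keys instead of returning null.
boolean present = doubleMap.containsKey("1");
double value = doubleMap.getIfAbsent("1", 0.0d);

// Iteration without creating Double objects:
doubleMap.forEachKeyValue((key, each) ->
        System.out.println(key + " -> " + each));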

Eclipse Collections also has synchronized and unmodifiable versions for all mutable containers in the library. The following code will give you a synchronized view wrapped around the primitive maps.

MutableObjectDoubleMap<String> doubleMap = 
        ObjectDoubleMaps.mutable.<String>empty().asSynchronized();
doubleMap.put("1", 1.0d);
doubleMap.put("2", 2.0d);

MutableObjectBooleanMap<String> booleanMap = 
        ObjectBooleanMaps.mutable.<String>empty().asSynchronized();
booleanMap.put("ok", true);

This performance comparison of large maps was published a couple of years ago.

Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove – January 2015 version

GS Collections has since been migrated to the Eclipse Foundation and is now Eclipse Collections.

Note: I am a committer for Eclipse Collections.

Culicid answered 18/1, 2017 at 21:17 Comment(3)
I assume that the mutable map variants are not thread-safe, right? Regarding the memory-usage comparison shown on the product page for maps: does the map used for that comparison have both a primitive key and value, or mixed types?Defroster
Correct, the mutable maps are not thread-safe. There are also synchronized wrappers available for the mutable primitive maps by calling asSynchronized(). There will be trade-offs using these, and certain methods may need to be protected explicitly via a synchronized block.Culicid
The memory comparisons on the product page were between JDK HashMap and EC UnifiedMap which are both Map<Object, Object>.Culicid
T
13

What you are looking for is an Object2DoubleOpenHashMap from fastutil (a collections framework with a small memory footprint and fast access and insertion), which provides methods such as double getDouble(Object k) and double put(K k, double v).

For example:

// Create a Object2DoubleOpenHashMap instance
Object2DoubleMap<String> map = new Object2DoubleOpenHashMap<>();
// Put a new entry
map.put("foo", 12.50d);
// Access to the entry
double value = map.getDouble("foo");
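
As a small addition to this answer (assuming the standard fastutil API), you can also configure the value returned for missing keys and iterate over the entries without boxing:

// Configure what getDouble returns when the key is absent (0 by default):
map.defaultReturnValue(-1.0d);
double missing = map.getDouble("bar");    // -1.0, no NullPointerException

// Lookup and iteration without creating Double objects:
boolean present = map.containsKey("foo");
for (Object2DoubleMap.Entry<String> e : map.object2DoubleEntrySet()) {
    System.out.println(e.getKey() + " -> " + e.getDoubleValue());
}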

The class Object2DoubleOpenHashMap is a concrete Map implementation that is not thread-safe; however, you can still use the utility method Object2DoubleMaps.synchronize(Object2DoubleMap<K> m) to make it thread-safe by means of a decorator.

The creation code would then be:

// Create a thread safe Object2DoubleMap
Object2DoubleMap<String> map = Object2DoubleMaps.synchronize(
    new Object2DoubleOpenHashMap<>()
);
Traffic answered 18/1, 2017 at 18:3 Comment(0)
P
2

There are several implementations:

Here are questions related to best performance:

The actual implementation can also influence performance.

Pigeon answered 18/1, 2017 at 18:10 Comment(2)
Looks like the impact on memory consumption is not that significant, as I imagined.Defroster
fastutil, Koloboke and Eclipse Collections are generally better (by most useful metrics like correctness, speed, completeness, API, etc.) than Trove and HPPC.Withy
D
2

To get a better estimate of how these various libraries stack up against each other, I put together a small benchmark that checks:

  • total time for 300'000 insertions
  • average time for a containment check, using 1000 samples that are in the map
  • memory size of the data structure

I had a look at map-like structures that take a String as key and a double as value. The checked frameworks are Eclipse Collections, HPPC, Trove and FastUtil, as well as, for comparison, the JDK HashMap and ConcurrentHashMap.

In short, these are the results:

Filling in 300000 into the JDK HashMap took 107ms
Filling in 300000 into the JDK ConcurrentHashMap took 152ms
Filling in 300000 into the Eclipse map took 107ms
Filling in 300000 into the Trove map took 855ms
Filling in 300000 into the HPPC map took 93ms
Filling in 300000 into the FastUtil map took 163ms
1000 lookups average in JDK HashMap took: 550ns
1000 lookups average in JDK Concurrent HashMap took: 748ns
1000 lookups average in Eclipse Map took: 894ns
1000 lookups average in Trove Map took: 1033ns
1000 lookups average in HPPC Map took: 523ns
1000 lookups average in FastUtil Map took: 680ns
JDK HashMap:            43'809'895B
JDK Concurrent HashMap: 43'653'740B => save  0.36%
Eclipse Map:            35'755'084B => save 18.39%
Trove Map:              32'147'798B => save 26.62%
HPPC Map:               27'366'533B => save 37.53%
FastUtil Map:           31'560'889B => save 27.96%

For all the details, as well as the test application take a look at my blog entry.

Defroster answered 24/1, 2017 at 15:0 Comment(4)
Your benchmark is very problematic: no warmup, testing all variants in the same VM, much too short measurements, OSR. Trust me, Java benchmarking is much harder than expected; have a look at JMH.Rennin
@Rennin I am aware of these shortcomings, so the results are not that reliable; however, they point in a certain direction.Defroster
Being off by a factor of two or more would be no big surprise. There are two System.nanoTime calls per containsKey call, which is funny as the former is not much faster than the latter. Interleaving all the calls is funny as the optimizer may decide to inline some of the calls (but not all of them, as there are inlining limits). Similarly for register allocation and whatever. Your results do point in a certain direction, but it may be the opposite direction.Rennin
@maaartinus: Looking at the timings, the tests done to get them are certainly not solid enough to jump to conclusions. However, my main concern was the memory usage, and there the differences are much clearer.Defroster
