What is the most efficient Java Collections library? [closed]

12

138

A few years ago I wrote a lot of Java and had the impression that Trove was the best (most efficient) Java Collections implementation. But when I read the answers to the question "Most useful free Java libraries?" I noticed that Trove is hardly mentioned. So which Java Collections library is best now?

UPDATE: To clarify, I mostly want to know which library to use when I have to store millions of entries in a hash table etc. (I need low runtime overhead and a small memory footprint).

Experienced answered 10/3, 2009 at 11:48 Comment(6)

What are the keys and values in this table? If they're not primitives, what's wrong with the normal HashMap etc? – Apophasis 10/3, 2009 at 12:31

For a very large map you might want a probing implementation, or even inlined like a database table. – Regenaregency 10/3, 2009 at 12:51

Interestingly I see no mention of Colt here which was subsequently subsumed into Mahout. – Ozalid 10/2, 2012 at 0:3

It is worth mentioning a very nice collection library: GS Collections (github.com/goldmansachs/gs-collections). It has excellent documentation and an exhaustive set of mutable and immutable collections – Adopted 25/3, 2014 at 7:48

java.dzone.com/articles/time-memory-tradeoff-example – Interpolation 27/7, 2014 at 9:48

GS Collections was migrated to the Eclipse Foundation a little over a year ago and is now Eclipse Collections - eclipse.org/collections – Im 21/2, 2017 at 3:27

A

73

From inspection, it looks like Trove is just a library of collections for primitive types - it's not like it's meant to be adding a lot of functionality over the normal collections in the JDK.

Personally (and I'm biased) I love Guava (including the former Google Java Collections project). It makes various tasks (including collections) a lot easier, in a way which is at least reasonably efficient. Given that collection operations rarely form a bottleneck in my code (in my experience) this is "better" than a collections API which may be more efficient but doesn't make my code as readable.

Given that the overlap between Trove and Guava is pretty much nil, perhaps you could clarify what you're actually looking for from a collections library.

Apophasis answered 10/3, 2009 at 11:59 Comment(9)

should be mentioned that for most tasks google collections is too complex, and java collections are more than sufficient. – Bores 10/3, 2009 at 12:14

@Andreas: Can't say I agree. Not that it's a "one or the other" scenario - I use the regular collections (with helpers like the Lists class) and then use Iterables etc when I need to. Use the complexity only when it helps you. – Apophasis 10/3, 2009 at 12:30

after reading my own comment several months after using G-C extensively - I disagree with my past opinion, and agree fully with yours. use the helper methods/classes extensively, they make much of the code more readable and safer. – Bores 7/9, 2009 at 21:48

@Andreas: Thanks for coming back and saying so - I'm glad to hear that GJC is helping :) – Apophasis 8/9, 2009 at 5:19

Hey, Jon, Google Java Collections is now Guava. You might want to update your post for future references :) – Iselaisenberg 25/10, 2011 at 18:24

@ArturCzajka: I'm aware of Guava, it's just that posts two and a half years old do tend to get out of date... I'll edit now. – Apophasis 25/10, 2011 at 18:50

I've worked on quite a few data intensive projects where collections were a huge bottleneck. Java Collections are terribly inefficient (both memory and speed) especially if they store primitives. – Cavitation 25/9, 2014 at 16:7

@JayAskren: Yes, any situation where you're boxing a large amount of data is going to be a candidate for optimization. Unfortunately, the OP of this question didn't elaborate on what kind of data they were storing :( – Apophasis 25/9, 2014 at 16:13

The question asked what is the most efficient. There is zero evidence presented that Guava is the most efficient. – Cultism 9/5, 2017 at 14:33

A

105

The question is (now) about storing lots of data, which can be represented using primitive types like int, in a Map. Some of the answers here are very misleading in my opinion. Let's see why.

I modified the benchmark from Trove to measure both runtime and memory consumption. I also added PCJ to this benchmark, which is another collections library for primitive types (I use that one extensively). The 'official' Trove benchmark does not compare IntIntMaps to the Java Collections Map<Integer, Integer>, probably because storing Integers and storing ints is not the same from a technical point of view. But users might not care about this technical detail; they want to store data representable as ints efficiently.

First the relevant part of the code:

new Operation() {

    // Approximate heap in use. System.gc() is only a request, so readings can fluctuate.
    private long usedMem() {
        System.gc();
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    }

    // trove: primitive int -> int map, no boxing
    public void ours() {
        long mem = usedMem();
        TIntIntHashMap ours = new TIntIntHashMap(SET_SIZE);
        for (int i = dataset.size(); i-- > 0; ) {
            ours.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("trove " + mem + " bytes");
        ours.clear();
    }

    // pcj: primitive int -> int map, no boxing
    public void pcj() {
        long mem = usedMem();
        IntKeyIntMap map = new IntKeyIntOpenHashMap(SET_SIZE);
        for (int i = dataset.size(); i-- > 0; ) {
            map.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("pcj " + mem + " bytes");
        map.clear();
    }

    // java collections: every put auto-boxes the key and the value
    public void theirs() {
        long mem = usedMem();
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(SET_SIZE);
        for (int i = dataset.size(); i-- > 0; ) {
            map.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("java " + mem + " bytes");
        map.clear();
    }
}
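
For readers without Trove or PCJ on the classpath, the same measurement idea can be tried with the JDK alone. This is only a rough sketch: the class name MemoryProbe and the method measureBoxedMap are made up for illustration, and because System.gc() is merely a hint, the number it prints will vary between runs and JVMs.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoryProbe {

    // Best-effort estimate of the heap currently in use.
    // System.gc() is only a request to the JVM, so treat the result as approximate.
    static long usedMem() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Fills a boxed map with n entries and returns the approximate heap growth in bytes.
    static long measureBoxedMap(int n) {
        long before = usedMem();
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(n);
        for (int i = 0; i < n; i++) {
            map.put(i, i); // auto-boxes both the key and the value
        }
        long after = usedMem();
        // Use the map after the second reading so it is still reachable while measuring.
        if (map.size() != n) {
            throw new IllegalStateException("unexpected map size");
        }
        return after - before;
    }

    public static void main(String[] args) {
        System.err.println("java " + measureBoxedMap(100000) + " bytes");
    }
}
```

Running it several times and comparing the readings gives a feel for how noisy this kind of heap measurement is.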

I assume the data comes as primitive ints, which seems sane. But this implies a runtime penalty for java.util because of the auto-boxing, which is not necessary for the primitive collections frameworks.
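
To make the boxing point concrete, here is a minimal sketch of how a primitive int-to-int hash map can avoid Integer objects altogether by keeping keys and values in two int arrays with linear probing. It is not the actual Trove or PCJ implementation; the class name IntIntMapSketch, the reserved FREE key, and the 0.5 load factor are all assumptions made for this illustration.

```java
import java.util.Arrays;

public final class IntIntMapSketch {
    // Assumption of this sketch: one key value is reserved to mark free slots.
    private static final int FREE = Integer.MIN_VALUE;

    private int[] keys;
    private int[] values;
    private int size;

    public IntIntMapSketch(int expectedSize) {
        int capacity = 16;
        // Power-of-two capacity, at least twice the expected size (load factor <= 0.5).
        while (capacity < expectedSize * 2L && capacity < (1 << 30)) {
            capacity <<= 1;
        }
        allocate(capacity);
    }

    private void allocate(int capacity) {
        keys = new int[capacity];
        values = new int[capacity];
        Arrays.fill(keys, FREE);
        size = 0;
    }

    // Returns the slot holding the key, or the free slot where it would be inserted.
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;          // cheap multiplicative mixing
        int i = (h ^ (h >>> 16)) & mask;
        while (keys[i] != FREE && keys[i] != key) {
            i = (i + 1) & mask;            // linear probing
        }
        return i;
    }

    public void put(int key, int value) {
        if (key == FREE) {
            throw new IllegalArgumentException("reserved key");
        }
        if ((size + 1) * 2L > keys.length) {
            grow();
        }
        int i = slot(key);
        if (keys[i] == FREE) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    // Doubles the tables and re-inserts every entry (no overflow handling in this sketch).
    private void grow() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        allocate(oldKeys.length * 2);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != FREE) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }

    public boolean containsKey(int key) {
        return key != FREE && keys[slot(key)] == key;
    }

    // Returns the stored value, or 'missing' if the key is absent.
    public int get(int key, int missing) {
        if (key == FREE) {
            return missing;
        }
        int i = slot(key);
        return keys[i] == key ? values[i] : missing;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        IntIntMapSketch map = new IntIntMapSketch(100000);
        for (int i = 0; i < 100000; i++) {
            map.put(i, i); // no Integer objects are created here
        }
        System.out.println(map.size() + " entries, contains 42: " + map.containsKey(42));
    }
}
```

A put into this structure writes two ints into arrays, whereas HashMap<Integer, Integer> has to allocate an entry object plus (outside the small Integer cache) boxed key and value objects, which is where both the time and the memory go.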

The runtime results (without gc() calls, of course) on WinXP, jdk1.6.0_10:

                      100000 put operations      100000 contains operations 
java collections             1938 ms                        203 ms
trove                         234 ms                        125 ms
pcj                           516 ms                         94 ms

While this might already seem drastic, this is not the reason to use such a framework.

The reason is memory performance. The results for a Map containing 100000 int entries:

java collections        oscillates between 6644536 and 7168840 bytes
trove                                      1853296 bytes
pcj                                        1866112 bytes

Java Collections needs more than three times the memory of the primitive collection frameworks. In other words, you can keep three times as much data in memory without resorting to disk I/O, which lowers runtime performance by orders of magnitude. And this matters. Read highscalability to find out why.
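
The figures above can be turned into a per-entry cost with simple arithmetic. The small sketch below only divides the reported totals by the 100000 entries; the class name PerEntryFootprint is made up for illustration. It works out to roughly 66 to 72 bytes per entry for java.util versus about 18.5 bytes for Trove and PCJ, against 8 bytes of raw payload (two ints) per entry.

```java
public class PerEntryFootprint {

    // Average bytes per map entry for a measured total.
    static double bytesPerEntry(long totalBytes, int entries) {
        return (double) totalBytes / entries;
    }

    public static void main(String[] args) {
        int entries = 100000;
        // Totals as reported in the benchmark results above.
        long javaLow = 6644536L;
        long javaHigh = 7168840L;
        long trove = 1853296L;
        long pcj = 1866112L;

        System.out.println("java  low : " + bytesPerEntry(javaLow, entries) + " bytes/entry");
        System.out.println("java  high: " + bytesPerEntry(javaHigh, entries) + " bytes/entry");
        System.out.println("trove     : " + bytesPerEntry(trove, entries) + " bytes/entry");
        System.out.println("pcj       : " + bytesPerEntry(pcj, entries) + " bytes/entry");
        // Ratio of java.util to Trove, for the low and high java.util readings.
        System.out.println("java/trove: " + (double) javaLow / trove
                + " to " + (double) javaHigh / trove);
    }
}
```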

In my experience high memory consumption is the biggest performance issue with Java, which of course results in worse runtime performance as well. Primitive collection frameworks can really help here.

So: No, java.util is not the answer. And "adding functionality" to Java collections is not the point when asking about efficiency. Also the modern JDK collections do not "out-perform even the specialized Trove collections".

Disclaimer: The benchmark here is neither complete nor perfect. It is meant to drive home a point that I have experienced in many projects: primitive collections are useful enough to tolerate a fishy API, if you work with lots of data.

Abacist answered 10/3, 2009 at 22:23 Comment(5)

Actually, I think your answer is misleading. Storing ints vs Integers is very different, and most likely the main reason for the increased memory usage. I agree a primitive-type collection framework could be useful, but it doesn't make Trove or PCJ "better" than java.util. – Scavenger 10/3, 2009 at 22:47

The question is about storing int data efficiently. Not about storing Integers. For this task trove/pcj are more efficient, as I tried to show. Using Integers imposes runtime and memory inefficiencies. Since java.util doesn't allow usage of primitives, it is not the best choice for this task. – Abacist 11/3, 2009 at 9:12

(for Russian community) here goes another benchmark: total-holywar.blogspot.com/2011/07/… – Vankirk 13/8, 2011 at 11:17

What if we don't use int as the key, just a normal String? What would the benchmark results be in that case? – Ianiana 14/8, 2011 at 2:18

@ClarkBao (sorry for being late) Storing any object as key will use the object hashCode(). It gets you an int as the key. – Yarbrough 7/4, 2014 at 16:9