Maximum size of HashSet, Vector, LinkedList

5

31

What is the maximum size of HashSet, Vector, LinkedList? I know that ArrayList can store more than 3277000 numbers.

However, the size of a list depends on the available memory (heap) size. If the heap is exhausted, the JVM throws an OutOfMemoryError.

But I don't know the limit for the number of elements in HashSet, Vector and LinkedList.

Scrouge answered 3/10, 2011 at 7:25 Comment(0)
59

There is no specified maximum size of these structures.

The actual practical size limit is probably somewhere in the region of Integer.MAX_VALUE (i.e. 2147483647, roughly 2 billion elements), as that's the maximum size of an array in Java.

  • A HashSet uses a HashMap internally, so it has the same maximum size as that
    • A HashMap uses an array which always has a size that is a power of two, so it can have at most 2^30 = 1073741824 buckets (since the next power of two is bigger than Integer.MAX_VALUE).
    • Normally the number of elements is at most the number of buckets multiplied by the load factor (0.75 by default). However, when the HashMap stops resizing, then it will still allow you to add elements, exploiting the fact that each bucket is managed via a linked list. Therefore the only limit for elements in a HashMap/HashSet is memory.
  • A Vector uses an array internally which has a maximum size of exactly Integer.MAX_VALUE, so it can't support more than that many elements
  • A LinkedList doesn't use an array as the underlying storage, so the array size doesn't limit it. It uses a classical doubly linked list structure with no inherent limit, so its size is bounded only by the available memory. Note that a LinkedList will report the size wrongly if it is bigger than Integer.MAX_VALUE, because it uses an int field to store the size and the return type of size() is int as well.
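
The 2^30 bucket ceiling and the int overflow described above follow directly from Java's integer arithmetic; a minimal sketch:

```java
// Minimal sketch: the capacity ceilings discussed above, shown via
// Java's int arithmetic. 1 << 30 is the largest power-of-two table a
// HashMap can allocate; doubling it overflows int.
public class CapacityLimits {
    public static void main(String[] args) {
        int maxHashMapTable = 1 << 30;           // 1073741824 buckets
        System.out.println(maxHashMapTable);
        System.out.println(maxHashMapTable * 2); // overflows to -2147483648
        System.out.println(Integer.MAX_VALUE);   // 2147483647, the array-length ceiling
        // (in practice VMs reserve a few header words, so the real
        //  array limit is slightly below Integer.MAX_VALUE)
    }
}
```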

Note that the Collection API does define how a Collection with more than Integer.MAX_VALUE elements should behave. Most importantly, it states this in the size() documentation:

If this collection contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE.

Note that while HashMap, HashSet and LinkedList seem to support more than Integer.MAX_VALUE elements, none of those implement the size() method in this way (i.e. they simply let the internal size field overflow).

This leads me to believe that other operations also aren't well-defined in this condition.

So I'd say it's safe to use those general-purpose collections with up to Integer.MAX_VALUE elements. If you know that you'll need to store more than that, then you should switch to dedicated collection implementations that actually support this.
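
To illustrate the overflow mentioned above: an int size counter that is simply incremented wraps to a negative value instead of being pinned at Integer.MAX_VALUE as size() specifies. A minimal sketch:

```java
// Minimal sketch: an int counter incremented past Integer.MAX_VALUE
// wraps to a negative number, which is why collections that just
// maintain an int size field misreport sizes beyond 2^31 - 1.
public class SizeOverflow {
    public static void main(String[] args) {
        int size = Integer.MAX_VALUE;  // 2147483647
        size++;                        // one "element" too many
        System.out.println(size);      // prints -2147483648
    }
}
```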

Punkah answered 3/10, 2011 at 7:39 Comment(5)
HashMap uses an array for the first lookup. But if key collisions happen, these will be stored in a linked list. Therefore a HashMap could contain more than Integer.MAX_VALUE elements - in an unpredictable way.Flavorous
For the LinkedList (actually this goes for all Lists) the get(int) function also accepts an int, which means you cannot use it to retrieve elements beyond that index. In any case I wouldn't bet on the LinkedList behaving as expected above Integer.MAX_VALUE.Francenefrances
The limit for HashMap is the load-factor * one billion. After this point it will fail to grow the underlying array. A Vector will not grow to Integer.MAX_VALUE, you would have to create the vector with this size as an initial capacity. (unlikely) size() documents that Integer.MAX_VALUE is returned for sizes larger than this, so the size() for LinkedList isn't wrong IMHO.Wheelsman
I don't think that you've got it right for HashMap / HashSet. It is true that the hash array is limited to 2^30. However, you can keep adding elements to the table ad infinitum, since the hash chains are simple linked lists. (The performance will degrade as the hash chains grow, but that is a different issue.) See docjar.com/html/api/java/util/HashMap.java.html line 764Woodley
@StephenC, @A.H.: you are right, it simply ceases to resize once the limit is reached, so HashMap/HashSet acts the same way as a LinkedList after that (growing unboundedly). I'll update my answer.Punkah
9

In all cases, you're likely to be limited by the JVM heap size rather than anything else. Eventually you'll always get down to arrays, so I very much doubt that any of them will manage more than 2^31 - 1 elements, but you're very, very likely to run out of heap before then anyway.

Opacity answered 3/10, 2011 at 7:30 Comment(0)
7

It very much depends on the implementation details.

A HashSet uses an array as its underlying store, which by default it attempts to grow when the collection is 75% full. This means it will fail if you try to add more than about 805,000,000 entries (0.75 × 2^30). (It cannot grow the array from 2^30 to 2^31 entries)

Increasing the load factor increases the maximum size of the collection. e.g. a load factor of 10 allows 10 billion elements. (It is worth noting that HashSet is relatively inefficient past 100 million elements as the distribution of the 32-bit hashcode starts to look less random, and the number of collisions increases)
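
The load factor is set via the standard two-argument HashSet constructor; a small sketch (the capacity and element counts here are just illustrative):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: HashSet(initialCapacity, loadFactor). A load factor above 1
// lets the set hold more elements than buckets before resizing, at the
// cost of longer bucket chains and slower lookups.
public class HighLoadFactor {
    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>(1 << 10, 10f); // 1024 buckets, load factor 10
        for (int i = 0; i < 5_000; i++) {
            set.add(i); // ~5 entries per bucket on average, no resize yet
        }
        System.out.println(set.size()); // 5000
    }
}
```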

A Vector doubles its capacity and starts at 10. This means it will fail to grow above approx 1.34 billion. Changing the initial size to 2^n-1 gives you slightly more head room.
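
The doubling sequence can be checked with a few lines (the 1.34 billion figure is 10 * 2^27, the last doubling that still fits in an int):

```java
import java.util.Vector;

// Sketch: Vector's capacity doubles from the default of 10, so the
// largest capacity reachable by doubling is 10 * 2^27 = 1342177280.
// Passing an explicit initial capacity sidesteps the doubling sequence.
public class VectorGrowth {
    public static void main(String[] args) {
        long cap = 10;
        while (cap * 2 <= Integer.MAX_VALUE) {
            cap *= 2; // the growth steps a default Vector would take
        }
        System.out.println(cap); // 1342177280

        Vector<Integer> v = new Vector<>(1_000); // explicit initial capacity
        System.out.println(v.capacity());        // 1000
    }
}
```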

BTW: Use ArrayList rather than Vector if you can.

A LinkedList has no inherent limit and can grow beyond 2.1 billion. At this point size() could return Integer.MAX_VALUE, however some functions such as toArray will fail as they cannot put all objects into an array; it will instead give you the first Integer.MAX_VALUE elements rather than throw an exception.

As @Joachim Sauer notes, the current OpenJDK could return an incorrect result for sizes above Integer.MAX_VALUE. e.g. it could be a negative number.

Wheelsman answered 3/10, 2011 at 7:54 Comment(1)
Note: in the OpenJDK implementation of LinkedList (and I assume in the Oracle JDK as well) there's no provision for correctly returning Integer.MAX_VALUE once the size exceeds that value.Punkah
3

The maximum size depends on the memory settings of the JVM and of course the available system memory. Specific size of memory consumption per list entry also differs between platforms, so the easiest way might be to run simple tests.

Crus answered 3/10, 2011 at 7:29 Comment(0)
2

As stated in other answers, an array cannot reach 2^31 entries. Other data types are limited either by this or they will likely misreport their size() eventually. However, these theoretical limits cannot be reached on some systems:

On a 32 bit system, the number of addressable bytes never exceeds 2^32 - and that is assuming no operating system taking up memory. A 32 bit pointer is 4 bytes. Anything which does not rely on arrays must include at least one pointer per entry: this means that the maximum number of entries is 2^32/4 = 2^30 for things that do not utilize arrays.

A plain array can achieve its theoretical limit, but only as a byte array; a short array of length 2^31-1 would already use up about 2^32+38 bytes.

Some Java VMs have introduced a new memory model that uses compressed pointers. By adjusting pointer alignment, slightly more than 2^32 bytes may be referenced with 32-bit pointers - around four times more. This is enough to cause a LinkedList's size() to become negative, but not enough to allow it to wrap around to zero.

A sixty four bit system has sixty four bit pointers, making all pointers twice as big and non-array lists a bunch fatter. This also means that the maximum addressable capacity jumps to 2^64 bytes. This is enough for a 2D array to reach its theoretical maximum: byte[0x7fffffff][0x7fffffff] uses memory approximately equal to 40 + 40*(2^31-1) + (2^31-1)*(2^31-1) = 40 + 40*(2^31-1) + (2^62-2^32+1) bytes.
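
The arithmetic above can be checked exactly with BigInteger (the 40-byte per-array overhead is this answer's estimate, not a guaranteed VM constant):

```java
import java.math.BigInteger;

// Sketch: verifying the memory estimate for byte[0x7fffffff][0x7fffffff].
// The 40-byte array overhead is an assumption from the text above.
public class Array2DEstimate {
    public static void main(String[] args) {
        BigInteger n = BigInteger.valueOf(0x7fffffffL); // 2^31 - 1
        BigInteger overhead = BigInteger.valueOf(40);
        BigInteger total = overhead          // outer array header
                .add(overhead.multiply(n))   // one header per row
                .add(n.multiply(n));         // the byte payload itself
        System.out.println(total); // 4611686100031766529, about 4.6 exabytes
    }
}
```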

Florous answered 1/10, 2012 at 6:17 Comment(0)
