OutOfMemoryException when adding more items to a very large HashSet<Int32>
Asked Answered
S

4

14

Exception of type System.OutOfMemoryException was thrown while trying to add 23997908th item in a HashSet<Int32>.

We need maintain a high performance unique collection of integer sizeof Int32.MaxValue i.e. 2147483647. HashSet of Int32 can store only 23997907 items in it. Looking for a suggestion to resolve this issue.

Sox answered 27/12, 2011 at 6:42 Comment(4)
Just out of sheer curiosity: what do you need this for?Basel
web.archive.org/web/20121008002115/http://blog.mischel.com/2008/…Enravish
@MikeNakis: As mentioned in the post, we need maintain a high performance unique collection of integer to cater some of our requirementSox
@rfmodulator: its a simple WPF based WindowsApplicationSox
D
18

capacity of a HashSet(Of T) object is the number of elements that the object can hold. object's capacity automatically increases as elements are added to it.

if you are using 64 bit system, you can increase Hashset's max capacity upto 2 billion elements by setting the enabled attribute of the gcAllowVeryLargeObjects to true in runtime environment.

You can enable this settings from config file,

<configuration>
 <runtime>
   <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
 </configuration>

Check this MSDN link for setting the configuration.

Update:

Above config gcAllowVeryLargeObjects supports on .Net framework 4.5 only.

Dmitri answered 27/12, 2011 at 7:3 Comment(4)
I'm all of a sudden much more excited about .NET 4.5. I've bumped into the 2 GB limit far too many times.Nicholenicholl
@MitchWheat: yes i forgot to mention. updating my answer. ThanksDmitri
Above config gcAllowVeryLargeObjects supports on .Net framework 4.5 only. Why didn't I read this line.... why ....Dollydolman
@Dmitri I tried this gcAllowVeryLargeObjects parameter and it still generates OOM errors around the 2GB markStimulus
N
11

HashSet grows by doubling. So when you have 23,997,907 items in the list and try to add the next one, it tries to double the size of its backing array. And that allocation causes it to exceed available memory. I assume you're running this on a 32-bit system, because on a 64-bit system a HashSet<object> can hold upwards of 89 million items. The limit is about 61.7 million items in the 32-bit runtime.

What you need to do is pre-allocate the HashSet to hold as many items as you need. Unfortunately, there's no direct way to do that. HashSet doesn't have a constructor that will pre-allocate it with a given capacity.

You can, however, create a List, use it to initialize the HashSet, and then call Clear on the HashSet. That ends up giving you a HashSet that has no items in it, but a capacity of the max you requested. I showed how to do that in a blog post: More on .NET Collection Sizes.

The limits on HashSet size are due to the two gigabyte limit in .NET. No single object can be larger than two gigabytes. The number is actually slightly smaller, due to allocation overhead.

Nicholenicholl answered 27/12, 2011 at 7:4 Comment(2)
Dot net is allowing only 134,217,728 items to add in a List of Int32Sox
@Debasis: If you're running in 64-bit mode, I would expect a List<int> to give you more than 500 million entries. Your 134 million items works out to more than 512 megabytes of memory, which could easily be more than you can allocate in the 32-bit runtime. 134 million is pretty close to the largest HashSet you can build, even in 64-bit mode.Nicholenicholl
D
2

To get around this problem, I created a class that implements HashSet methods and properties (Contains, Add, Count, ...) and behind the scenes keeps an array of HashSets to store the actual data. The first implementation just maxed out each HashSet one-by-one and moved to the next in the array when full. The latest takes a mod of the hash key as the index to the internal HashSet array. This works well for me since the keys are pretty much random so the distribution of values to the HashSets array is pretty even.

Darkness answered 4/1, 2012 at 21:48 Comment(0)
B
0

At this point, I think you'd need to use a database to persist your items (or their hash keys) as this is too many items to store in the default .NET objects. You could also write a custom object that has the same properties as a HashSet, but that might be more trouble that just using a database table to store the hashes.

Bonze answered 27/12, 2011 at 7:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.