C# creating a list of random unique integers
Asked Answered
P

8

9

I need to create a list with one billion integers and they must all be unique. I also need this to be done extremely fast.

Creating a list and adding random numbers one by one and checking to see if each one is a duplicate is extremely slow.

It seems to be quite fast if I just populate a list with random numbers without checking if they are duplicates and then using distinct().toList(). I repeat this until there are no more duplicates. However the extra memory used by creating a new list is not optimal. Is there a way to get the performance of distinct() but instead of creating a new list it just modifies the source list?

Psychosis answered 2/11, 2011 at 14:12 Comment(11)
How large is the range in which you want to create the integers?Tyrannize
This was previously asked for the C language. The highest-voted answer (as opposed to the accepted answer) has some advice that can be ported to C#. #1608681Honorine
What datatype are you storing these ints in? A 32-bit Integer can only hold 2.1bn anyway (2,147,483,647), so there's a limit to how 'random' you can really be.Gustaf
There is a similiar post here (it's a list of 300 unique numbers) #5562242Cockeyed
than just add Numbers from 1 to int.MaxValue ,that will be unique for sure .Oxtail
from 1-1000000000. Perhaps all I need is an extremely fast and efficient shuffle algorithm because I cannot have them in order.Psychosis
Do 1 billion ints even fit into a single List<int>. I thought .net supports only 2GB arrays.Tyrannize
@CodeInChaos, memory is the only limit for array sizes in .NET. On most x86 Windows editions attempting to do this will result in an OutOfMemoryException but on x64 Windows editions it will work.Thundercloud
@Darin Did they add that feature in .net 4? In .net 2 the 2GB limit applied even to 64 bit applications. blogs.msdn.com/b/joshwil/archive/2005/08/10/450202.aspxTyrannize
@CodeInChaos, you seem to be correct. I didn't know about this restriction.Thundercloud
You have not included nearly enough information in the question to give a solution. For example, do you require the list to be in a random order, or merely to contain random integers? ("Pick Six" style lottery tickets choose six random distinct integers; the order does not matter. But when choosing six random songs from a list and playing them, the order should be random as well.) What is the range of the integers? Can they be negative? Can they be larger than what fits in an Int32? An Int64?Unanswerable
B
16

I found this the fastest while maintaining randomness:

        Random rand = new Random();
        var ints = Enumerable.Range(0, numOfInts)
                                     .Select(i => new Tuple<int, int>(rand.Next(numOfInts), i))
                                     .OrderBy(i => i.Item1)
                                     .Select(i => i.Item2);

...basically assigning a random id to each int and then sorting by that id and selecting the resulting list of ints.

Bethesda answered 18/10, 2015 at 4:37 Comment(0)
K
14

Do the integers need to be in a certain range? If so, you could create an array or list with all numbers in that range (for example from 1 to 1000000000) and shuffle that list.

Katykatya answered 2/11, 2011 at 14:15 Comment(0)
T
3

If the amount of possible integers from which you draw is significantly larger (say factor 2) than the amount of integers you want you can simply use a HashSet<T> to check for duplicates.

List<int> GetUniqueRandoms(Random random, int count)
{
  List<int> result = new List<int>(count);
  HashSet<int> set = new HashSet<int>(count);
  for(int i = 0; i < count; i++)
  {
    int num;

    do
    {
      num = random.NextInt();
    while(!set.Add(num));

    result.Add(num);
  }
  return result;
}

This allocates the collections with the correct capacity to avoid reallocation during growth. Since your collections are large this should be a large improvement.

You can also use Distinct a single time:

IEnumerable<int> RandomSequence(Random random)
{
    while(true)
    {
      yield return random.NextInt();
    }
}

RandomSequence(rand).Distinct().Take(1000000000).ToList();

But with both solutions you need enough memory for a HashSet<int> and a List<int>.


If the amount of possible integers from which you draw is about as large as the amount of integers you want, you can create an array that contains all of them, shuffle them and finally cut off those you're not interested in.

You can use Jon Skeet's shuffle implementation.

Tyrannize answered 2/11, 2011 at 14:16 Comment(1)
Why? I didn't say I want to iterate over the hashset. I'd only use it for checking for duplicates.Tyrannize
V
2

You can track duplicates in a separate HashSet<int>:

var set = new HashSet<int>();
var nums = new List<int>();

while(nums.Count < 1000000000) {
    int num;
    do {
        num = rand.NextInt();
    } while (!set.Contains(num));
    set.Add(num);
    list.Add(num);
}

You need a separate List<int> to store the numbers because a hashset will not preserve your random ordering.

Version answered 2/11, 2011 at 14:16 Comment(4)
This will be increasingly slower when the list gets more filled. It might literally take hours to find the last number that fits into the list.Katykatya
@CodeCaster: Yes. If shuffling is an option, it's a much better option.Version
Having two objects containing all the ints is not an option.Psychosis
The inner loop will run forever because set will initially be empty and will never contain num. Should remove the "!" from the do-loop condition.Casease
B
2

Taking the question literally (a list with one billion integers and they must all be unique):

Enumerable<int>.Range(0, 1000000000)

But along the lines of CodeCaster's answer, you can create the list and shuffle it at the same time:

var count = 1000000000;
var list = new List<int>(count);
var random = new Random();
list.Add(0);
for (var i = 1; i < count; i++)
{
    var swap = random.Next(i - 1);
    list.Add(list[swap]);
    list[swap] = i;
}
Baisden answered 2/11, 2011 at 14:57 Comment(0)
L
0

What if you created the list in a sorted but still random fashion (such as adding a random number to the last element of the list as the next element), and then shuffled the list with a Fisher-Yates-Durstenfeld? That would execute in linear time overall, which is pretty much as good as it gets for list generation. However, it might have some significant bias that would affect the distribution.

Laryngo answered 2/11, 2011 at 14:19 Comment(0)
A
0

You can trick LINQ to jumble the numbers for you by supplying a random number lambda to OrderBy:

Random rand = new Random();
var ints = Enumerable.Range(0, 1000000000).OrderBy(i => rand.Next());
Acus answered 20/7, 2023 at 1:29 Comment(0)
U
0

I couldn't find anything good enough on the internet so I ended up writing this method, which eliminates unnecessary generation of random numbers in case of duplicates and also does not keep a large list of numbers in memory in case you're working with large min-max ranges.

// Generates a list of random numbers between min (inclusive) and max (exclusive), without any duplicates

List<int> GenerateRandomNumbers(int min, int max, int count)
{
    if (max <= min || max - min < count)
    {
        throw new ArgumentException($"Invalid arguments: min {min}, max {max} and count {count}.");
    }

    List<int> numbers = new();
    Dictionary<int, int> dict = new();

    for (int i = 0; i < count; i++)
    {
        int r = RandomNumberGenerator.GetInt32(max - min - i);
        // int r = Random.Shared.Next(max - min - i);

        if (!dict.TryGetValue(r, out int n))
        {
            n = r;
        }

        numbers.Add(n + min);

        if (!dict.TryGetValue(max - min - i - 1, out int last))
        {
            last = max - min - i - 1;
        }

        dict[r] = last;
    }

    return numbers;
}
Ulphia answered 3/2 at 17:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.