Unique random number generation in an integer array [duplicate]
Asked Answered
K

9

31

Possible Duplicate:
Unique random numbers in O(1)?

How do I fill an integer array with unique values (no duplicates) in C?

int vektor[10];   

for (i = 0; i < 10; i++) {
    vektor[i] = rand() % 100 + 1;
}

//No uniqueness here
Katusha answered 22/10, 2009 at 15:51 Comment(2)
As an aside, just assigning the array index would meet the requirement of "unique value" while not addressing the implied "unique RANDOM value".Hypoderm
No, it isn't. The requirement to select only M out of N (as "10 out of 100" above) is an important detail.Misquote
S
81

There are several ways to solve your problem, each with its own advantages and disadvantages.

First I'd like to note that quite a few of the responses you already got do the following: they generate a random number, then check somehow whether it was already used in the array, and if it was, they just generate another number until they find an unused one. This is a naive and, truth be told, seriously flawed approach. The problem is the cyclic trial-and-error nature of the generation ("if already used, try again"). If the numeric range (say, [1..N]) is close to the length of the desired array (say, M), then towards the end the algorithm might spend a huge amount of time trying to find the next number. If the random number generator is even a little bit broken (say, never generates some number, or generates it very rarely), then with N == M the algorithm is guaranteed to loop forever (or for a very long time). Generally this trial-and-error approach is useless, or flawed at best.

Another approach already presented here is generating a random permutation in an array of size N. The idea of random permutation is a promising one, but doing it on an array of size N (when M << N) will certainly generate more heat than light, speaking figuratively.

Good solutions to this problem can be found, for example, in Bentley's "Programming Pearls" (and some of them are taken from Knuth).


  • The Knuth algorithm. This is a very simple algorithm with a complexity of O(N) (i.e. proportional to the numeric range), meaning that it is most usable when M is close to N. However, unlike the permutation-based variants already offered here, this algorithm doesn't require any extra memory beyond your vektor array: it takes O(M) memory, not O(N). That makes it a viable algorithm even for M << N cases.

The algorithm works as follows: iterate through all numbers from 1 to N and select the current number with probability rm / rn, where rm is how many numbers we still need to find, and rn is how many numbers we still need to iterate through. Here's a possible implementation for your case:

#define M 10
#define N 100

int in, im;

im = 0;

for (in = 0; in < N && im < M; ++in) {
  int rn = N - in;
  int rm = M - im;
  if (rand() % rn < rm)    
    /* Take it */
    vektor[im++] = in + 1; /* +1 since your range begins from 1 */
}

assert(im == M);

After this cycle we get the array vektor filled with randomly chosen numbers, in ascending order. The "ascending order" bit is what we don't need here. So, in order to "fix" that, we just make a random permutation of the elements of vektor and we are done. Note that this is an O(M) permutation requiring no extra memory. (I leave out the implementation of the permutation algorithm; plenty of links were given here already.)
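The permutation left out above can be a standard Fisher-Yates shuffle. A minimal sketch (the helper name `shuffle` is mine, and the answer's `rand() %` style of drawing is kept even though it has a slight modulo bias):

```c
#include <stdlib.h>

/* Fisher-Yates shuffle: permute the first m elements of a[] in place.
   rand() % (i + 1) has a slight modulo bias, kept here for brevity. */
void shuffle(int *a, int m) {
  int i;
  for (i = m - 1; i > 0; --i) {
    int j = rand() % (i + 1); /* pick j uniformly from [0, i] */
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
```

Calling `shuffle(vektor, M)` after the selection cycle destroys the ascending order without extra memory.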

If you look carefully at the permutation-based algorithms proposed here that operate on an array of length N, you'll see that most of them are pretty much this very same Knuth algorithm, but re-formulated for M == N. In that case the above selection cycle will choose each and every number in the [1..N] range with probability 1, effectively turning into initialization of an N-array with the numbers 1 to N. Taking this into account, I think it becomes rather obvious that running this algorithm for M == N and then truncating the result (possibly discarding most of it) makes much less sense than just running this algorithm in its original form for the original value of M and getting the result right away, without any truncation.


  • The Floyd algorithm (see here). This approach has a complexity of about O(M) (depending on the search structure used), so it is better suited when M << N. This approach keeps track of already generated random numbers, so it requires extra memory. However, the beauty of it is that it does not make any of those abominable trial-and-error iterations trying to find an unused random number. This algorithm is guaranteed to generate one unique random number after each call to the random number generator.

Here's a possible implementation of it for your case. (There are different ways to keep track of already used numbers; I'll just use an array of flags, assuming that N is not prohibitively large.)

#define M 10
#define N 100    

unsigned char is_used[N] = { 0 }; /* flags */
int in, im;

im = 0;

for (in = N - M; in < N && im < M; ++in) {
  int r = rand() % (in + 1); /* generate a random number 'r' */

  if (is_used[r])
    /* we already have 'r' */
    r = in; /* use 'in' instead of the generated number */

  assert(!is_used[r]);
  vektor[im++] = r + 1; /* +1 since your range begins from 1 */
  is_used[r] = 1;
}

assert(im == M);

Why the above works is not immediately obvious, but it works: exactly M numbers from the [1..N] range will be picked with uniform distribution.

Note that for large N you can use a search-based structure to store the "already used" numbers, thus getting a nice O(M log M) algorithm with an O(M) memory requirement.
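As a rough illustration of such a structure, here is a hypothetical sorted-array "set" (not part of the original answer): `set_contains` is an O(log M) binary search, while `set_add` shifts elements and is O(M) per insert, so a balanced tree or hash table would be needed to actually reach the O(M log M) bound.

```c
#include <string.h>

/* Tiny "set of ints" backed by a sorted array (illustrative helper).
   Lookup is O(log M); insertion shifts the tail, so it is O(M). */
typedef struct { int *data; int size; } IntSet;

/* Lower-bound binary search: first index whose value is >= v. */
static int lower_bound(const IntSet *s, int v) {
  int lo = 0, hi = s->size;
  while (lo < hi) {
    int mid = lo + (hi - lo) / 2;
    if (s->data[mid] < v) lo = mid + 1; else hi = mid;
  }
  return lo;
}

int set_contains(const IntSet *s, int v) {
  int i = lower_bound(s, v);
  return i < s->size && s->data[i] == v;
}

void set_add(IntSet *s, int v) {
  int i = lower_bound(s, v);
  memmove(&s->data[i + 1], &s->data[i],
          (s->size - i) * sizeof(int)); /* shift tail right by one */
  s->data[i] = v;
  s->size++;
}
```

With this in place, Floyd's `is_used[r]` test becomes `set_contains(&s, r)` and the marking becomes `set_add(&s, r)`.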

(There's one thing about this algorithm, though: while the resultant array will not be ordered, a certain "influence" of the original 1..N ordering will still be present in the result. For example, it is obvious that number N, if selected, can only be the very last member of the resultant array. If this "contamination" of the result by the unintended ordering is not acceptable, the resultant vektor array can be random-shuffled, just as in the Knuth algorithm.)


Note the very critical point observed in the design of these two algorithms: they never loop trying to find a new unused random number. Any algorithm that makes trial-and-error iterations with random numbers is flawed from a practical point of view. Also, the memory consumption of these algorithms is tied to M, not to N.

To the OP I would recommend Floyd's algorithm, since in his application M seems to be considerably less than N, and Floyd's doesn't (or may not) require the extra permutation pass. However, for such small values of N the difference might be negligible.

Sunroom answered 22/10, 2009 at 15:51 Comment(6)
I don't agree with your claim that "trial and error" is useless. The naive trial and error algorithm has a strong guarantee even when N==M (it completes in O(nlgn) time with high probability). For M<N/2, say, it completes in O(n) time with high probability.Inesita
I can only say that this guarantee doesn't project well to practice. For N==M case the chances of running into an infinite loop with bad or low-quality 'rand()' are rather high (and the chances of getting a very long search times for the last elements even with good rand() are higher). I don't know how you can reasonably expect O(n lg n) in practice. In ideal world, maybe...Misquote
The O(n lg n) probably comes from an analysis similar to that of the (surprising) coupon collector's problem: en.wikipedia.org/wiki/Coupon_collector%27s_problem While a poor rand() might make things worse, as long as the rand() actually hits all values it should only be off by a constant: I do not know of any rand() implementation for which this would not be true.Gaffrigged
Or, an informal summary of the (short) solution to the coupon collector's problem: it is true that near the end of the list you may need to call rand() for O(n) times to find a new element, but this is only true for about O(log n) of them, so it all works out. Whether this O(n log n) is actually good enough is another matter: don't underestimate those logarithmic factors!Gaffrigged
I too disagree that trial and error is 'useless', as most reliable hill climbing algorithms employ it at some point. An interesting google would be the enigma m4 project, where steckers were hill climbed in a distributed network. Yet, +1, this is clearly the best answer to the question.Ovoid
Haha, looks like I created Floyd algorithm from scratch on the interview today. Good info. +1Quillet
H
6

In your example (choose 10 unique random numbers between 1 and 100), you could create a list with the numbers 1 to 100, use the random number generator to shuffle the list, and then take the first 10 values from the list.

int list[100], vektor[10];
for (i = 0; i < 100; i++) {
    list[i] = i;
}
for (i = 0; i < 100; i++) {
    int j = i + rand() % (100 - i);
    int temp = list[i];
    list[i] = list[j];
    list[j] = temp;
}
for (i = 0; i < 10; i++) {
    vektor[i] = list[i];
}

Based on cobbal's comment below, it is even better to just say:

for (i = 0; i < 10; i++) {
    int j = i + rand() % (100 - i);
    int temp = list[i];
    list[i] = list[j];
    list[j] = temp;

    vektor[i] = list[i];
}

Now it is O(N) to set up the list but O(M) to choose the random elements.

Honoria answered 22/10, 2009 at 16:1 Comment(7)
Rather inefficient in the general case, i.e. when the range length (say, N) is notably greater than the required array length (say, M). An efficient algorithm should be close to O(M); this one is O(N).Misquote
I agree -- see the accepted answer in eyalim's link.Honoria
There is a small but not necessarily negligible bias in the random number mechanism used, but if you fix that, this is a good technique. Beware the upper bound in the middle loop; you can only swap list[99] with itself, which your code does, but it is a tad 'wasteful'.Chandless
it would be possible to run to i < 10, as once list[i] is assigned in the second loop, it doesn't change again.Igenia
@mobrule: The accepted answer at the link is only good for situations when you need, say, 1000 numbers out of a 1000-long range. For the OP's problem that method will just generate more heat than light.Misquote
I have 10 numbers and I want to choose among them randomly. I used int randomNumber = arc4random() % 10; it gives random numbers, but they repeat. Please help me.Bascule
Quite smart way! Most useful in Java if you are using an ArrayList, as it can be shuffled by Collections.shuffle.Valadez
R
3

I think this will do it (I've not tried to build it, so any remaining syntax errors are left as an exercise for the reader). There might be more elegant ways, but this is the brute force solution:

int vektor[10];    
int random;
int uniqueflag;
int i, j;

for(i = 0; i < 10; i++) {
     do {
        /* Assume things are unique... we'll reset this flag if not. */
        uniqueflag = 1;
        random = rand() % 100 + 1;
        /* This loop checks for uniqueness */
        for (j = 0; j < i && uniqueflag == 1; j++) {
           if (vektor[j] == random) {
              uniqueflag = 0;
           }
        }
     } while (uniqueflag != 1);
     vektor[i] = random;
}
Rideout answered 22/10, 2009 at 16:1 Comment(1)
Any algorithm that uses the "try again" approach has very limited practical value. Actually, I'd say that the shuffling approach is better, but the shuffling can be implemented better (see the Knuth method in my reply).Misquote
T
3

Simply generating random numbers and seeing whether they are OK is a poor way to solve this problem in general. This approach takes all the possible values, shuffles them and then takes the top ten. This is directly analogous to shuffling a deck of cards and dealing off the top.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define randrange(N) (rand() / (RAND_MAX/(N) + 1))

#define MAX 100        /* Values will be in the range (1 .. MAX) */
static int vektor[10];
int candidates[MAX];

int main (void) {
  int i;

  srand(time(NULL));   /* Seed the random number generator. */

  for (i=0; i<MAX; i++)
    candidates[i] = i;

  for (i = 0; i < MAX-1; i++) {
    int c = randrange(MAX-i);
    int t = candidates[i];
    candidates[i] = candidates[i+c];
    candidates[i+c] = t;
  }

  for (i=0; i<10; i++)
    vektor[i] = candidates[i] + 1;

  for (i=0; i<10; i++)
    printf("%i\n", vektor[i]);

  return 0;
}

For more information, see comp.lang.c FAQ list question 13.19 for shuffling and question 13.16 about generating random numbers.

Tango answered 22/10, 2009 at 17:34 Comment(0)
U
0

One way would be to check if the array already contains the new random number, and if it does, make a new one and try again.

This opens up the (random ;) ) possibility that you'd never get a number that is not already in the array. Therefore you should count how many times you check whether the number is already in the array, and if the count exceeds MAX_DUPLICATE_COUNT, throw an exception or so :) (EDIT: saw you're in C. Forget the exception part :) Return an error code instead :P )
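That retry-with-a-cap idea could look like the following sketch (the function name `fill_unique`, the return convention, and counting collisions rather than raw checks are all illustrative, not from the answer):

```c
#include <stdlib.h>

#define MAX_DUPLICATE_COUNT 1000 /* give up after this many collisions */

/* Fill out[0..m-1] with distinct values from [1, n].
   Returns 0 on success, -1 if the retry budget is exhausted. */
int fill_unique(int *out, int m, int n) {
  int filled = 0, retries = 0;
  while (filled < m) {
    int candidate = rand() % n + 1;
    int duplicate = 0, j;
    for (j = 0; j < filled; j++)       /* linear scan for a collision */
      if (out[j] == candidate) { duplicate = 1; break; }
    if (duplicate) {
      if (++retries > MAX_DUPLICATE_COUNT)
        return -1;                     /* "sorry, couldn't do it" */
      continue;
    }
    out[filled++] = candidate;
  }
  return 0;
}
```

For 10 numbers out of 100 the budget is essentially never hit, but the error path exists, which is exactly the objection raised in the comments below.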

Uninhibited answered 22/10, 2009 at 15:55 Comment(2)
Well, I'd be quite surprised if I gave a function a well-defined task with a well-defined solution, and the function returned with a "sorry, I just couldn't do it this time" error code :)Misquote
Haha, yeah, that would look awesome :)Uninhibited
M
0

A quick solution is to create a mask array over all possible numbers, initialized to zeros, and set an entry whenever that number is generated:

int rand_array[100] = {0};
int vektor[10];   
int i=0, rnd;
while(i<10) {
    rnd = rand() % 100 + 1;
    if ( rand_array[rnd-1] == 0 ) {
        vektor[i++] = rnd;
        rand_array[rnd-1] = 1;
    }
}
Morrie answered 22/10, 2009 at 16:4 Comment(0)
O
0

Generate first and second digits separately. Shuffle them later if required. (syntax from memory)

int vektor[10];
int i;

for (i = 0; i < 10; i++)
  vektor[i] = -1; /* mark every slot as empty; 0 is a legal value here */

i = 0;
while (i < 10) {
  int j = rand() % 10;
  if (vektor[j] == -1) { vektor[j] = rand() % 10 + j * 10; i++; }
}

However, the numbers will be stratified: slot j always holds a value from [j*10, j*10 + 9], so no two results ever come from the same decade.

Or else, you need to keep the numbers sorted (O(n log n)) so that each newly generated number can be quickly checked for presence (O(log n)).
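The sorted-plus-binary-search check can lean on the standard library's qsort and bsearch; a small illustrative sketch (the comparator and the `contains` helper are mine):

```c
#include <stdlib.h>

/* Comparator for qsort/bsearch on ints.
   (x > y) - (x < y) avoids the overflow risk of x - y. */
static int cmp_int(const void *a, const void *b) {
  int x = *(const int *)a, y = *(const int *)b;
  return (x > y) - (x < y);
}

/* Returns 1 if v is present in the sorted array a[0..n-1]. */
int contains(const int *a, int n, int v) {
  return bsearch(&v, a, n, sizeof(int), cmp_int) != NULL;
}
```

After sorting the picks with `qsort(a, n, sizeof(int), cmp_int)`, each presence check is a single `contains` call.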

Overscrupulous answered 24/10, 2009 at 0:17 Comment(0)
S
0

Here is an O(M) average-time method.

Method: If M <= N/2, use procedure S(M,N) (below) to generate the result array R, and return R. If M > N/2, use procedure S(N-M,N) to generate R, then compute X = {1..N}\R [the complement of R in {1..N}], shuffle X with a Fisher-Yates shuffle [in O(M) time], and return X.

In the M > N/2 case, where O(M) == O(N), there are several fast ways to compute the complement. In the code shown below, for brevity I have only included an example of procedure S(M,N) coded inline in main(). Fisher-Yates shuffle is O(M) and is illustrated in main answer to related question #196017. Other previous related questions: #158716 and #54059.
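One simple way to compute that complement (not included in the answer's code) is a flag array; a sketch assuming the values live in [1..N] with N small enough for a stack buffer, and a helper name of my own choosing:

```c
#include <string.h>

/* Given r[0..k-1], distinct values from [1, n] (n <= 1000 assumed
   for the stack buffer), write the complement {1..n} \ r into out[]
   and return its size, n - k. Runs in O(n) == O(M) when M > N/2. */
int complement(const int *r, int k, int n, int *out) {
  unsigned char in_r[1001];
  int i, m = 0;
  memset(in_r, 0, sizeof in_r);
  for (i = 0; i < k; i++) in_r[r[i]] = 1; /* mark sampled values */
  for (i = 1; i <= n; i++)
    if (!in_r[i]) out[m++] = i;           /* collect unmarked values */
  return m;
}
```

The result comes out in ascending order, which is why the method above follows it with a Fisher-Yates shuffle.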

The reason that S(M,N) takes O(M) time instead of O(N) time when M < N/2 is that, as described in the coupon collector's problem, the expected time to collect all k coupons is E(t_k) = k*H_k, so collecting just the first k/2 takes E(t_{k/2}) = k*(H_k - H_{k/2}), or about k*(ln(k) - ln(k/2) + O(1)) = k*(ln(2) + O(1)) = O(k).

Procedure S(k,N): [The body of this procedure is the dozen lines after the comment "Gen M distinct random numbers" in the code below.] Allocate and initialize three M+1-element integer arrays H, L, and V to all -1 values. For i=0 to M-1: Put a random value v into V[i] and into the sentinel node V[-1]. Get one of M list heads from H[v%M] and follow that list until finding a match to v. If the match is at V[-1] then v is a new value; so update list head H[v%M] and list link L[i]. If the match is not at V[-1], get and test another v, etc.

Each "follow the list" step has expected cost O(1) because at each step except the last, average list length is less than 1. (At end of processing, the M lists contain M elements, so average length gradually rises to exactly 1.)

 // randomMofN - jiw 8 Nov 2011     
 // Re: https://stackoverflow.com/questions/1608181/
 #include <stdlib.h>
 #include <stdio.h>
 int main(int argc, char *argv[]) {
   int h, i, j, tM, M, N, par=0, *H, *L, *V, cxc=0;
   // Get M and N values
   ++par; M = 42;  if (argc > par) M = atoi(argv[par]);
   ++par; N = 137; if (argc > par) N = atoi(argv[par]);
   tM = 3*M+3;
   H = malloc(tM*sizeof(int));
   printf ("M = %d,  N = %d  %s\n", M, N, H?"":"\nmem error");
   if (!H) exit(13);
   for (i=0; i<tM; ++i)           // Init arrays to -1's
     H[i] = -1;
   L = H+M;  V = L+M;

   // Gen M distinct random numbers
   for (i=0; i<M; ++i) {
     do {
       ++cxc;                     // complexity counter
       V[-1] = V[i] = random()%N;
       h = V[i]%M;                // h = list-head index
       j = H[h];
       while (V[j] != V[i])
         j = L[j];
     } while (j>=0);
     L[i] = H[h];
     H[h] = i;
   }

   // Print results
   for (j=i=0; i<M; ++i) {
     j += printf ("%4d ", V[i]);
     if (j>66) j = printf ("\n");
   }
   printf ("\ncxc %d\n", cxc);
   return 0;
 }
Step answered 9/11, 2011 at 8:40 Comment(1)
Question #2394246 is also related, and includes discussion of Robert Floyd's sampling algorithm.Step
D
0

I like the Floyd algorithm, but we can draw the random numbers from the full range 0 to N-1 every time (instead of from 0 to in):

#define M 10
#define N 100    

unsigned char is_used[N] = { 0 }; /* flags */
int in, im;

im = 0;

for (in = N - M; in < N && im < M; ++in) {
  int r = rand() % N; /* generate a random number 'r' in [0, N-1] */

  while (is_used[r])
  {
     /* we already have 'r' */
     r = rand() % N;
  }
  vektor[im++] = r + 1; /* +1 since your range begins from 1 */
  is_used[r] = 1;
}

assert(im == M);
Directory answered 9/11, 2011 at 11:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.