What does the integer passed to set.seed() mean?
C

3

18

I want to randomly select n rows from my data set using the sample() function in R. I was getting different output each time, so I used set.seed() to get the same output. I know that each integer passed to set.seed() gives a different output, and that the output is the same whenever I set the same seed. But I can't work out what the integer passed to set.seed() actually means. Is it just an index that goes into the random number generator algorithm, or does it refer to some part of the data where sampling starts? For example, what does the 2 in set.seed(2) mean?

Confront answered 4/2, 2013 at 10:7 Comment(0)
M
11

A random seed (or seed state, or just seed) is a number (or vector) used to initialize a pseudorandom number generator.

For a seed to be used in a pseudorandom number generator, it does not need to be random. Because of the nature of number generating algorithms, so long as the original seed is ignored, the rest of the values that the algorithm generates will follow probability distribution in a pseudorandom manner.

-- Wikipedia

So a random function could be implemented like this:

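#include <stdlib.h>  /* RAND_MAX */

/* The caller owns the state: *seed is read, advanced, and written back. */
int rand_r(unsigned int *seed)
{
    /* Linear congruential step; unsigned overflow wraps around. */
    *seed = *seed * 1103515245 + 12345;
    /* Reduce the new state to the range 0..RAND_MAX. */
    return (*seed % ((unsigned int)RAND_MAX + 1));
}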

(sample taken from glibc)
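
To make the point concrete, here is a minimal sketch of how a caller might use that recurrence to show that the seed fully determines the sequence. The names lcg_next and fill_sequence are made up for this illustration (the generator is renamed so it does not clash with the rand_r() that libc already declares), and none of this is what R does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same recurrence as the snippet above, under a hypothetical name. */
int lcg_next(unsigned int *seed)
{
    *seed = *seed * 1103515245u + 12345u;   /* unsigned overflow wraps */
    return (int)(*seed % ((unsigned int)RAND_MAX + 1u));
}

/* Fill out[0..count-1] with the values produced from one starting seed.
   The seed is passed by value, so the caller's variable is not changed. */
void fill_sequence(unsigned int seed, int *out, size_t count)
{
    assert(out != NULL);
    for (size_t i = 0; i < count; i++)
        out[i] = lcg_next(&seed);
}
```

Calling fill_sequence(2u, a, 5) twice fills both arrays with identical values, while fill_sequence(3u, c, 5) starts somewhere else in the sequence. The 2 is only the starting state of the recurrence.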

Mendive answered 4/2, 2013 at 10:22 Comment(2)
I get it. It's like a parameter that goes into the pseudo-random number generator, which returns a number or a series of numbers generated by the algorithm. -- Confront
Exactly. Moreover, you can calculate an arbitrary "random" number. A nice, short overview can be found here (link). Also be sure to call the seed function only once, or you could get the same series of random numbers (at least in C). -- Mendive
D
29

In the old days, there were books that contained pages and pages of random digits (in a random order, of course).

I like to think of set.seed(x) as telling the computer to start reading random numbers from page x in a huge book of random numbers. x has nothing to do with the data; it only determines where the algorithm for choosing random numbers begins.

This might be a bit facile, but I like the analogy.
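
If it helps, the analogy can be sketched in code. The following is a rough illustration in C, not R's actual sampling algorithm: toy_next and sample_rows are hypothetical names, and the generator is a toy one that only yields 15-bit values, so it assumes a small number of rows. Note that sample_rows never looks at the data, only at how many rows there are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy generator: advance the state, return 15 of its bits. */
unsigned int toy_next(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Pick n distinct row indices out of n_rows (partial Fisher-Yates shuffle).
   Writes them to picked[0..n-1]; returns 0 on success, -1 if malloc fails.
   The modulo step is slightly biased, which is fine for an illustration. */
int sample_rows(unsigned int seed, size_t n_rows, size_t n, size_t *picked)
{
    assert(n <= n_rows);
    if (n_rows == 0)
        return 0;

    size_t *idx = malloc(n_rows * sizeof *idx);
    if (idx == NULL)
        return -1;
    for (size_t i = 0; i < n_rows; i++)
        idx[i] = i;

    for (size_t i = 0; i < n; i++) {
        /* Swap a not-yet-picked index into position i. */
        size_t j = i + toy_next(&seed) % (n_rows - i);
        size_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
        picked[i] = idx[i];
    }
    free(idx);
    return 0;
}
```

With the same seed and the same n_rows, sample_rows picks the same row numbers every time, whatever the rows contain. Changing the seed is like opening the book at a different page.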

Delossantos answered 4/2, 2013 at 19:39 Comment(0)
F
5

It is just a number used to set the seed of the random number generator. It has nothing to do with your data. If you don't explicitly provide a seed, a new one is created from the current time and the process ID when one is first needed.

See the ?set.seed help page for plenty of details about it.
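
As a rough sketch of the "no seed given, so use the clock" idea, here it is in C with the standard srand()/rand() pair. The helper names seed_generator and draw are invented for this example, and it only uses the time, which is a simplification of what R does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: seed rand() with the given seed, or with the
   current time when explicit_seed is NULL. Returns the seed it used so
   that a run can be reproduced later. */
unsigned int seed_generator(const unsigned int *explicit_seed)
{
    unsigned int seed = explicit_seed ? *explicit_seed
                                      : (unsigned int)time(NULL);
    srand(seed);
    return seed;
}

/* Draw count values from rand() into out. */
void draw(int *out, size_t count)
{
    assert(out != NULL);
    for (size_t i = 0; i < count; i++)
        out[i] = rand();
}
```

Seeding again with the value seed_generator returned replays the same draws, which is what calling set.seed() with a fixed integer gives you in R.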

Frydman answered 4/2, 2013 at 10:11 Comment(0)
