Correct way to generate random numbers in Cython?
A

5

24

What is the most efficient and portable way to generate a random number in [0,1] in Cython? One approach is to use INT_MAX and rand() from the C library:

from libc.stdlib cimport rand
cdef extern from "limits.h":
    int INT_MAX
cdef float randnum = rand() / float(INT_MAX)

Is it OK to use INT_MAX in this way? I noticed that it's quite different from the constant you get from Python's max int:

import sys
print INT_MAX
print sys.maxint 

yields:

2147483647  (C max int)
9223372036854775807  (python max int)

Which is the right "normalization" number for rand()?

EDIT: Additionally, how can the random seed be set (e.g. seeded from the current time) if one uses the C approach of calling rand() from libc?

Aekerly answered 22/4, 2013 at 1:25 Comment(2)
Have you just tried taking the max and min of a few thousand samples, and seeing which scaling factor gets near 1.0?Carrillo
Please watch "rand() considered harmful"Oculo
Q
15

The C standard says rand returns an int in the range 0 to RAND_MAX inclusive, so dividing it by RAND_MAX (from stdlib.h) is the proper way to normalise it. In practice, RAND_MAX will almost always be equal to INT_MAX, but don't rely on that.

Because rand has been part of ISO C since C89, it's guaranteed to be available everywhere, but no guarantees are made regarding the quality of its random numbers. If portability is your main concern, though, it's your best option, unless you're willing to use Python's random module.

Python's sys.maxint is a different concept entirely; it's just the largest positive number Python can represent in its own int type; larger ones will have to be longs. Python's ints and longs aren't particularly related to C's.
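
A minimal sketch of that normalisation in Cython (the function name is mine, just for illustration):

from libc.stdlib cimport rand, RAND_MAX

cdef double random_unit():
    # rand() returns an int in [0, RAND_MAX], so the result lies in [0.0, 1.0].
    return rand() / <double>RAND_MAX

Declared as cdef double here; make it cpdef if you also want to call it from Python.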

Querist answered 22/4, 2013 at 2:21 Comment(5)
Thanks! Do you know how the seed can be set using the libc approach?Aekerly
I'd probably use random.randint(0, INT_MAX) for that, really. Overhead isn't an issue because it will only happen once.Querist
Overhead is a major issue since this is called many times in a loop that needs to generate random numbers (in part to sample from a multinomial). Can I seed with random.randint(0, INT_MAX) and then call C's rand() and have the C and Python seeds be "in sync"?Aekerly
to clarify: overhead is only an issue for the generation of numbers, of course as you say calling python back once to set the seed is not an issue at allAekerly
I meant use random.randint(0, INT_MAX) as an argument to libc's srand.Querist
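A sketch of what that last comment suggests, assuming a one-off call back into Python for the seed is acceptable:

import random
from libc.stdlib cimport srand

cdef extern from "limits.h":
    int INT_MAX

# Seed libc's generator once from Python's random module; every later
# draw via rand() then stays in C.
srand(<unsigned int>random.randint(0, INT_MAX))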
P
4

I'm not sure if drand48 is a new addition, but it seems to do exactly what you want while avoiding the costly division.

cdef extern from "stdlib.h":
    double drand48()
    void srand48(long int seedval)

cdef extern from "time.h":
    long int time(int)

# srand48(time(0))
srand48(100)
# TODO: this is a seed to reproduce bugs, put to line of code above for
# production
drand48() #This gives a float in range [0,1)

I came across this idea while researching whether your division method generates sufficient randomness. The source I found makes the good point that in my case I am comparing the random number to a value with two decimal digits, so I really only need about 3 decimal digits of precision, and dividing by INT_MAX is more than adequate for that. But it seems like drand48 saves the cost of the division, so it might be worth using.

Palacios answered 23/9, 2015 at 18:32 Comment(1)
By the way, this is not portable because srand48 and drand48 are only available on POSIX systemsOculo
C
3

C's stdlib rand() returns a number between 0 and RAND_MAX, which can be as low as 32767.

Is there any reason not to use Python's random()?

Generate random integers between 0 and 9
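
Presumably something along these lines with the standard random module (a plain-Python sketch, not code from the thread):

import random

random.seed()                 # seeds from OS entropy / the current time
print(random.randint(0, 9))   # random integer between 0 and 9, inclusive
print(random.random())        # float in [0.0, 1.0)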

Crud answered 22/4, 2013 at 2:17 Comment(4)
I am using this an inner loop in a Cython function and calling Python for this is too costlyAekerly
RAND_MAX is 2147483647 (2**31 - 1) on my system. It's guaranteed to be at least 2**15 - 1, but I don't think I've ever seen it that low in practice.Querist
Oh, apparently msvc's is 2**15-1. That's awful.Querist
Yes, there is a reason: speed. Using the native C rand() function was about 300 ms faster than using numpy's rand().Burdened
S
3

All of the above answers are correct, but I'd like to add a note that took me way too long to catch: the C rand() function is NOT thread-safe. So if you are running Cython in parallel without the GIL, the standard C rand() function can cause enormous slowdowns while the threads contend for its shared internal state. Just a warning.
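
If you do need draws inside a nogil section, one option (my suggestion, not something from this thread, and POSIX-only, so not available with MSVC) is rand_r, which keeps all of its state in a caller-supplied variable so each thread can own its own state:

from libc.stdlib cimport RAND_MAX

cdef extern from "stdlib.h" nogil:
    int rand_r(unsigned int *seedp)   # POSIX extension, not ISO C

cdef double threadsafe_random(unsigned int *state) nogil:
    # All generator state lives in *state, so giving each thread its own
    # seed variable avoids the hidden shared state inside plain rand().
    return rand_r(state) / <double>RAND_MAX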

Sadducee answered 13/7, 2022 at 11:0 Comment(2)
That's a good point. I think it could generate the same "random" numbers in each thread or just corrupt its internal state or something else. The slowdown you see is one way that "not thread-safe" could show up.Perforation
#40977380 covers this I thinkPerforation
S
0

As shown below, I see no reason not to prefer Python's random.random():

import numpy as np
from ext.random import random as rd
%timeit rd()
48.6 ns ± 0.396 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

np.unique([rd() for _ in range(10000000)]).size
Out[5]: 32768

from random import random as rd_python
%timeit rd_python()
33.2 ns ± 0.213 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

np.unique([rd_python() for _ in range(10000000)]).size
Out[8]: 10000000

For the Cython version:

from libc.stdlib cimport rand, RAND_MAX

cpdef float random():
    return float(rand()) / RAND_MAX 

The same values are produced by this version:

from libc.stdlib cimport rand, RAND_MAX

cdef float scale = 1.0 / RAND_MAX
cpdef float crandom():
    return rand() * scale
Showpiece answered 24/12, 2022 at 12:52 Comment(2)
This isn't a great test, though. It's showing that they take about the same length of time to call from Python. However, if you call them from Cython itself, you can get a significant speed-up calling a C function, but not when calling np.randomPerforation
@Perforation As I found, for larger counts of random numbers, calling np.random once for a big batch can be faster than calling the Cython random many times, so it is not so obviousShowpiece
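
A minimal sketch of the earlier comment's point about calling from Cython (function names are mine, for illustration): drawing inside a compiled Cython loop keeps the libc call entirely in C, whereas random.random() pays Python call overhead on every iteration.

import random
from libc.stdlib cimport rand, RAND_MAX

def sum_c_randoms(int n):
    # The whole loop compiles to C; no per-draw Python overhead.
    cdef int i
    cdef double total = 0.0
    for i in range(n):
        total += rand() / <double>RAND_MAX
    return total

def sum_py_randoms(int n):
    # Each iteration goes back through the Python call machinery.
    cdef int i
    cdef double total = 0.0
    for i in range(n):
        total += random.random()
    return total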
