I am learning how to seed NumPy 1.19's pseudo-random number generator for an analysis that uses Python 3.6's concurrent.futures.ProcessPoolExecutor.
After reading NumPy's documentation on Random sampling and Parallel Random Number Generation, I wrote the following script to test my understanding.
My Objective: I want to ensure that each concurrent process starts its random process from the same seed.
What I learnt from my results:
1. (a) Using a global seed, (b) predefining numpy.random.default_rng or numpy.random.SeedSequence with a seed before passing it into a concurrent process, and (c) passing a seed as an argument into the concurrent process all give the same results and ensure that each concurrent process uses the same seed to start the random process. That is, there isn't a need to recreate a BitGenerator for each concurrent process.
2. Using the spawned child seeds of a seeded numpy.random.SeedSequence() object cannot ensure that each concurrent process uses the same seed to start the random process. Is the job of the spawn() method of the SeedSequence() object to ensure that different parts of the BitGenerator results are used, so as to avoid repeats?
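The two observations above can be checked in a single process, without a process pool. This is only a minimal sketch: the helper name draw is hypothetical, and it compares streams with each other rather than against the values printed further down.

```python
from numpy.random import SeedSequence, default_rng

SEED = 1234


def draw(seed, n=5):
    """Build a fresh Generator from `seed` and return its first n floats."""
    return default_rng(seed).random(n).tolist()


# An integer seed and a SeedSequence built from the same integer
# initialise the Generator identically.
from_int = draw(SEED)
from_seedseq = draw(SeedSequence(SEED))
print(from_int == from_seedseq)

# Each spawned child SeedSequence initialises a different, independent stream.
children = SeedSequence(SEED).spawn(4)
from_children = [draw(child) for child in children]
print(len({tuple(stream) for stream in from_children}))

# Spawning again from a fresh SeedSequence(SEED) reproduces the same children,
# so the child streams differ from each other but are still reproducible.
respawned_first = draw(SeedSequence(SEED).spawn(4)[0])
print(respawned_first == from_children[0])
```

So spawn() does not pick "different parts" of one stream; as far as I understand it, each child carries its own spawn key and therefore seeds its own stream.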
Question: Are my conclusions correct?
Test Script:
import numpy as np
from numpy.random import default_rng, SeedSequence
import concurrent.futures as cf
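def random( loop ):
    # No seed: every call gets fresh OS entropy.
    rg = default_rng()
    return loop, [rg.random() for x in range(5)]

def random_global( loop ):
    # Seeded from the module-level SEED.
    rg = default_rng(SEED)
    return loop, [rg.random() for x in range(5)]

def random_rg( loop, rg ):
    # Uses a Generator that was created (and seeded) by the caller.
    return loop, [rg.random() for x in range(5)]

def random_wseed( loop, seed ):
    # seed may be an int or a SeedSequence.
    rg = default_rng( seed )
    return loop, [rg.random() for x in range(5)]

def printresults( futures ):
    # Results are printed in completion order, not submission order.
    for future in cf.as_completed( futures ):
        print( future.result() )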
SEED = 1234
nworkers = 4
nloops = 4
rg = default_rng(SEED)
ss = SeedSequence(SEED)
child_seeds = ss.spawn(nloops) # Spawn off 4 child SeedSequences to pass to child processes.
futures_noseed = []
futures_global = []
futures_rg = []
futures_wseed = []
futures_seedseq = []
futures_seedseq_childseeds = []
with cf.ProcessPoolExecutor( max_workers=nworkers ) as executor:
    for nl in range(nloops):
        futures_noseed.append( executor.submit( random, nl ) )
        futures_global.append( executor.submit( random_global, nl ) )
        futures_rg.append( executor.submit( random_rg, nl, rg ) )
        futures_wseed.append( executor.submit( random_wseed, nl, SEED ) )
        futures_seedseq.append( executor.submit( random_wseed, nl, ss) )
        futures_seedseq_childseeds.append( executor.submit( random_wseed, nl, child_seeds[nl]) )
print( f'\nNO SEED')
printresults(futures_noseed)
print( f'\nGLOBAL SEED')
printresults(futures_global)
print( f'\nRG PREDEFINED WITH SEED PASS INTO FUNCTION')
printresults(futures_rg)
print(f'\nPASS SEED INTO FUNCTION')
printresults(futures_wseed)
print(f'\nWITH SEEDSEQUENCE')
printresults(futures_seedseq)
print(f'\nWITH SEEDSEQUENCE CHILD SEEDS')
printresults(futures_seedseq_childseeds)
Output:
NO SEED
(0, [0.739015261152181, 0.14451069021561325, 0.350594672768367, 0.20752211613920601, 0.795523682962996])
(2, [0.7984800506892198, 0.8583726299238038, 0.06791593362457293, 0.53430686768646, 0.0961085560717182])
(3, [0.5277372591285804, 0.33460069291263295, 0.8784128027557904, 0.9050110393243033, 0.6994660907632239])
(1, [0.5819290163279096, 0.9126020141058546, 0.17326463037949713, 0.8475223328152056, 0.23048284365911964])
GLOBAL SEED
(3, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(2, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(1, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(0, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
RG PREDEFINED WITH SEED PASS INTO FUNCTION
(3, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(2, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(1, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(0, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
PASS SEED INTO FUNCTION
(1, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(0, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(2, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(3, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
WITH SEEDSEQUENCE
(2, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(3, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(1, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
(0, [0.9766997666981422, 0.3801957350196178, 0.9232462337639554, 0.2616924238635442, 0.31909705841419755])
WITH SEEDSEQUENCE CHILD SEEDS
(2, [0.07734677155697511, 0.8570271790573564, 0.10048845220790636, 0.0478704579870608, 0.30020477671271684])
(3, [0.22148724095124595, 0.09787195733339815, 0.17127991416955768, 0.4819142922814075, 0.7368117871750866])
(1, [0.7137868247717851, 0.5945483974175882, 0.3889492785448826, 0.32053552182074196, 0.6488990935363684])
(0, [0.5293458940996787, 0.2331172694518996, 0.7607005642504421, 0.9940522082501517, 0.6181026121532509])
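A possible explanation for why the RG PREDEFINED case matches the seeded cases: executor.submit pickles its arguments, so each task receives its own copy of rg, and the script never advances rg in the parent process. The sketch below imitates that with pickle directly instead of a process pool (an assumption made so that it runs in one process); the copy_* names are hypothetical.

```python
import pickle

from numpy.random import default_rng

rg = default_rng(1234)

# Two copies taken before the parent Generator is used share the same state,
# so they produce identical draws.
copy_a = pickle.loads(pickle.dumps(rg))
copy_b = pickle.loads(pickle.dumps(rg))
same_before = copy_a.random(5).tolist() == copy_b.random(5).tolist()
print(same_before)

# Once the parent is advanced, a new copy starts further along the stream
# and no longer matches a freshly seeded Generator.
rg.random()
copy_c = pickle.loads(pickle.dumps(rg))
fresh = default_rng(1234)
same_after = copy_c.random(5).tolist() == fresh.random(5).tolist()
print(same_after)
```

If that reading is right, passing a predefined Generator only gives identical results as long as the parent never draws from it between submissions; passing the seed itself does not have that caveat.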
random() function in which the 5 in range(5) varies. Hence, I thought it appropriate to use a different seed (i.e. a child seed) for each scenario, while every loop within a scenario uses the same seed. Is this approach reasonable or flawed? – Towardly
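A rough sketch of what that approach could look like, assuming a "scenario" means a different number of draws per call. The names scenario_sizes, scenario_seeds and run_loop are hypothetical, and the loops run in-process here; with a pool, each call would instead be submitted as executor.submit(run_loop, nl, seed, size).

```python
from numpy.random import SeedSequence, default_rng

SEED = 1234
scenario_sizes = [3, 5]  # hypothetical: number of draws in each scenario
nloops = 4

# One child SeedSequence per scenario.
scenario_seeds = SeedSequence(SEED).spawn(len(scenario_sizes))


def run_loop(loop, seed, ndraws):
    """Seed a fresh Generator and return the loop index with its draws."""
    rg = default_rng(seed)
    return loop, [rg.random() for _ in range(ndraws)]


# Every loop of a scenario receives that scenario's child seed.
results = {
    size: [run_loop(nl, seed, size) for nl in range(nloops)]
    for size, seed in zip(scenario_sizes, scenario_seeds)
}

for size, runs in results.items():
    print(size, runs)
```

Within a scenario all loops produce the same draws, while the two scenarios draw from different streams.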