Having problems keeping a simulation deterministic with random.Random(0) in Python

I have a very large simulation in Python with lots of modules, and I call a lot of random functions. To keep the random results reproducible, I have a variable keep_seed_random.

Like so:

import random

keep_seed_random = True

if keep_seed_random is False:
    fixed_seed = random.Random(0)
else:
    fixed_seed = random

Then I use fixed_seed all over the program, such as

fixed_seed.choice(['male', 'female'])
fixed_seed.randint(0, 10)
fixed_seed.gammavariate(3, 3)
fixed_seed.random()
fixed_seed.randrange(20, 40)

and so on...

It used to work well. But now that the program has grown large, something else is interfering and the results are no longer identical, even when I set keep_seed_random = False.

My question is whether there is any other source of randomness in Python that I am missing?

P.S. I import random just once.

EDITED

We have been trying to pinpoint the exact moment when the program went from producing identical results to producing different ones. It seems to be when we introduced a lot of database reads that have no connection to the random module.

The results now ALTERNATE between two similar values. That is, I run main.py once and get 8148.78 for GDP; I run it again and get 7851.49; again, 8148.78; again, 7851.49.

Also, in the working version, before the change, the first run (when we create the instances and save them with pickle) gives one result. Then, from the second run onwards, the results are the same. So I am guessing it is related to pickle saving/loading.

The question remains!

2nd EDITED

We partially found the problem. It occurs when we create the instances, dump them with pickle, and then load them with pickle.

We still cannot get the exact same results from a run that creates the instances and a run that just loads them. However, when loading repeatedly, the results are identical.

Thus, the problem is in PICKLE. Some randomization may occur when dumping and loading (I guess).

Thanks,

Hying answered 17/6, 2016 at 17:16 Comment(4)
I would try to narrow the program down to find the minimum code that can reproduce the problem. The question is difficult to answer in the way it is presented now. - Meridel
I know. That's the point. I have no idea where the randomness may be coming from... Thanks @mart0903 - Hying
There's a recent PyCon talk with this exact issue. - Luisluisa
Good talk. I watched it! But in the example in the video it worked for a very specific problem. It is probably a different one for me... - Hying

This is difficult to diagnose without a good reproducible case, as @mart0903 mentions. However, in general, there are several possible sources of non-determinism. A few things come to mind:

If, for example, you are using the multiprocessing and/or subprocess packages to spawn several parallel processes, you may be experiencing a race condition: different processes finish at different times each time you run the program. Perhaps you are combining the results in some way that depends on these processes finishing in a particular order.
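
If completion order turns out to be the culprit, one option is to collect results in submission order rather than completion order. Here is a minimal sketch, assuming a made-up simulate_region worker; threads stand in for processes to keep the example short, and ProcessPoolExecutor.map preserves input order in the same way:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def simulate_region(region_id):
    # Hypothetical worker: lower ids sleep longer, so they tend to finish last.
    time.sleep(0.05 * (3 - region_id))
    return region_id * 10


region_ids = [0, 1, 2]

with ThreadPoolExecutor(max_workers=3) as pool:
    # as_completed yields futures as they finish, so the order depends on timing.
    futures = [pool.submit(simulate_region, r) for r in region_ids]
    by_completion = [f.result() for f in as_completed(futures)]

    # map yields results in the order of the inputs, regardless of timing.
    by_submission = list(pool.map(simulate_region, region_ids))

print(by_completion)
print(by_submission)
```

Anything that sums or concatenates by_completion can change from run to run, while by_submission always lines up with region_ids.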

Perhaps you are simply looping over a dictionary or a set and expecting the keys to be in a certain order, when in fact they are not ordered (dictionaries only guarantee insertion order from Python 3.7 onwards, and sets never do). For example, run the following a couple of times in a row (I'm using Python 3.5, and the version matters here) and you'll notice that the key-value pairs can print out in a different order each time:

if __name__=='__main__':
    data = dict()
    data['a'] = 6
    data['b'] = 7
    data['c'] = 42
    for key in data:
        print(key + ' : ' + str(data[key]))
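
The run-to-run variation comes from string hash randomization, which has been on by default since Python 3.3 and is controlled by the PYTHONHASHSEED environment variable. Here is a minimal sketch that launches a child interpreter with different seeds; the helper name hash_in_child is made up for illustration:

```python
import os
import subprocess
import sys


def hash_in_child(seed):
    # Run a fresh interpreter with the given PYTHONHASHSEED and return hash('a').
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('a'))"],
        env=env, stdout=subprocess.PIPE, check=True,
    )
    return int(out.stdout.decode().strip())


# Same hash seed in two separate processes: the string hash is reproducible.
same_with_fixed_seed = hash_in_child(1) == hash_in_child(1)
# Different hash seeds: the string hash changes between processes.
differs_across_seeds = hash_in_child(1) != hash_in_child(2)

# Independent of hashing: iterate in sorted order whenever the order matters.
names = {"male", "female", "other"}
ordered = sorted(names)
print(same_with_fixed_seed, differs_across_seeds, ordered)
```

Running the whole simulation as PYTHONHASHSEED=0 python main.py disables hash randomization for that run, which is a quick way to check whether hash ordering is involved.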

You might even be using timestamps to set some value, or perhaps generating a uuid somewhere that you are using in a calculation.
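
If an identifier like that does feed into a calculation, it can be drawn from the seeded generator instead. Note that uuid.uuid4() takes its bytes from os.urandom and is not affected by any seed. A sketch, assuming a hypothetical make_id helper:

```python
import random
import uuid


def make_id(rng):
    # Build a version-4 style UUID from 128 bits of the seeded generator.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


rng_a = random.Random(0)
rng_b = random.Random(0)

id_a = make_id(rng_a)
id_b = make_id(rng_b)

print(id_a == id_b)                    # same seed, same id
print(uuid.uuid4() == uuid.uuid4())    # OS entropy, not tied to the seed
```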

The possibilities go on. But again, it is difficult to nail down without a simple reproducible case. It may just take some good ol' breakpoints and a lot of stepping through code.
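
One way to cut down on the stepping is to log a fingerprint of the generator state at a few checkpoints and compare the logs of two runs; the first checkpoint that differs brackets the code that consumed random numbers differently. A sketch only, with a made-up rng_fingerprint helper and made-up checkpoint labels:

```python
import hashlib
import random


def rng_fingerprint(rng):
    # Short digest of the generator's internal state (a tuple of ints).
    return hashlib.sha256(repr(rng.getstate()).encode()).hexdigest()[:12]


def run(extra_draw):
    rng = random.Random(0)
    log = []
    rng.random()
    log.append(("after setup", rng_fingerprint(rng)))
    if extra_draw:
        rng.random()  # stands in for a code path that only runs sometimes
    log.append(("after load", rng_fingerprint(rng)))
    return log


log_a = run(extra_draw=False)
log_b = run(extra_draw=True)

# First checkpoint where the two runs no longer agree.
first_diff = next(label for (label, a), (_, b) in zip(log_a, log_b) if a != b)
print(first_diff)
```

In a real program the two logs would come from two separate runs (for example, one that creates the instances and one that only loads them) and be written to files for comparison.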

Good luck!

Misology answered 17/6, 2016 at 18:32 Comment(5)
Thanks, but I do not use multiprocessing. Nor do I use dictionary keys as you suggested here. I guess the problem comes from importing one library or another. Specifically, I use osgeo, ogr, pandas, numpy, pickle, subprocess, sys, os, geopandas, shapely, ggplot, glob, itertools, operator. - Hying
Ah, yes. subprocess could get you into a race condition just as easily as the multiprocessing package if you have more than one of them running at a time. That would essentially be the same thing. If you have two processes going and your result depends on the order in which they finish, you could definitely run into the same thing. I'll update my answer above to include subprocess. - Misology
But subprocess is only called when I run the simulation many times. In a single run, main.py is not called from outside the program, so subprocess is not used... I'm thinking that the source of randomness may come from another library! - Hying
It could certainly be another library then, as you suggest. It's difficult to diagnose without a narrowed-down reproducible case. - Misology
Thanks! I have reformulated all my code, including setseed, saveseed, checked numpy.seed and still have problems. I also checked all solutions at #11527475 without results. The problem is that it is a full program with 25 modules and a number of databases that need to be read. Maybe I should share it on GitHub, but I have yet to submit any papers... - Hying
