How to make dill deterministic?

We intend to use dill to generate a hash of a function. Our previous approach hashed the function's bytecode, but that is slower, and it adds an unnecessary extra step if we later decide to unpickle the function. The problem is that successive calls produce different hashes:

import dill as d
from hashlib import md5
md5(d.dumps(lambda x: {"y": x+2})).hexdigest()
# output: 'f063cdd725f0e6f5a1d211925a1024b1'

import dill as d
from hashlib import md5
md5(d.dumps(lambda x: {"y": x+2})).hexdigest()
# output: 'ea85fa41e85f0c78c54bbe0e00e55798'
Succeed answered 26/11, 2021 at 19:19 Comment(0)

You can't. The dill result of a function includes the id of the function. If you define the function explicitly:

def fn(x):
    return {"y": x+2}

then you get the same dill result every time, UNTIL you add another function to the file. That causes this function's dill result to change.
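If the goal is only a hash that stays the same for identical function bodies, the bytecode approach mentioned in the question can be sketched with the standard library alone. This is a minimal sketch, not something dill provides: `code_fingerprint` and `bytecode_hash` are hypothetical helper names, and the choice of which code-object fields to include is an assumption. The result is only expected to be stable within one Python version, since bytecode changes between versions.

```python
import hashlib
import types


def code_fingerprint(code: types.CodeType) -> bytes:
    """Collect the parts of a code object that describe what it does.

    Hypothetical helper: file name and line numbers are deliberately
    left out so that moving the function does not change the result.
    """
    parts = [
        code.co_code,                      # raw bytecode
        repr(code.co_names).encode(),      # global/attribute names used
        repr(code.co_varnames).encode(),   # argument and local names
    ]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested functions/lambdas: recurse instead of using repr(),
            # because repr() of a code object contains a memory address.
            parts.append(code_fingerprint(const))
        else:
            parts.append(repr(const).encode())
    return b"\x00".join(parts)


def bytecode_hash(fn) -> str:
    """Return an md5 hex digest of the function's code fingerprint."""
    return hashlib.md5(code_fingerprint(fn.__code__)).hexdigest()


first = bytecode_hash(lambda x: {"y": x + 2})
second = bytecode_hash(lambda x: {"y": x + 2})
print(first == second)
```

Unlike the dill output, this ignores globals, closures and default arguments, so two functions that differ only in those would get the same hash.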

Counterfoil answered 26/11, 2021 at 19:31 Comment(1)
You are correct. That may not be such bad news for my project, if changes in the file are the only source of nondeterminism. The people behind this project seem to have gotten around the issue, but I couldn't replicate or understand whether they did anything about it: github.com/huggingface/datasets/pull/819 - Succeed
