Reproducible builds in Python

2

5

I need to ship a compiled version of a Python script and be able to prove (using a hash) that the compiled file really was built from the original source.

What we use so far is a simple:

find . -name "*.py" -print0 | xargs -0 python2 -m py_compile

The issue is that this is not reproducible: two runs do not give us the same .pyc for the same Python file, and I am not sure which factors fluctuate. This forces us to always ship the same compiled artifact, instead of giving the build script to anyone so they can produce an identical compiled version themselves.

Is there a way to achieve that?

Thanks

Supercharger asked 13/9, 2016 at 13:59 Comment(5)
Byte-level fluctuations between compilations can be expected. What is wrong with shipping compiled versions? - Sorbian
1) OS specifics 2) exact Python version 3) time-related fluctuations - Sorbian
We would need to store a zip file containing the compiled version on a server or drive and keep a copy of each version. That is the kind of headache I want to avoid, given that the code is hosted in git and we have a build script. It would be far easier if I could check out a specific revision, redo the build, and check that the result is the same. - Supercharger
You can store versions on your server, or S3, or any other storage. I think this battle against a compiler is not worth it... - Sorbian
If you're looking to create reproducible zips, even with hashed-source .pycs you've still got to fight zips' inclusion of file permissions, file order, and file modification timestamps. See github.com/bboe/deterministic_zip and Barriers to deterministic, reproducible zip files by Mark Rushakoff (2014) for more details. - Braud
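To illustrate the three zip issues named in the last comment (permissions, file order, timestamps), here is a minimal sketch using only the standard-library zipfile module. The function name write_deterministic_zip, the fixed date, and the 0o644 mode are assumptions chosen for illustration; this is not the approach taken by the linked deterministic_zip project. Entries are stored uncompressed so the output does not depend on the compression library.

```python
import os
import zipfile

# Hypothetical constant: the earliest timestamp the zip format can store.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def write_deterministic_zip(zip_path, root_dir):
    """Zip every file under root_dir with fixed metadata and sorted order."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Archive names always use forward slashes.
            rel = os.path.relpath(full, root_dir).replace(os.sep, "/")
            entries.append((rel, full))
    entries.sort()  # fixed file order, independent of directory listing order

    with zipfile.ZipFile(zip_path, "w") as zf:
        for rel, full in entries:
            info = zipfile.ZipInfo(rel, date_time=FIXED_DATE)  # fixed timestamp
            info.external_attr = 0o644 << 16  # fixed permissions (assumed mode)
            info.create_system = 3  # pin the "created on" field across platforms
            info.compress_type = zipfile.ZIP_STORED
            with open(full, "rb") as f:
                zf.writestr(info, f.read())
```

Two runs over the same tree should then produce byte-identical archives, regardless of the files' on-disk modification times.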
10

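Compiled Python files include a four-byte magic number followed by a four-byte timestamp. In the Python 2 format this timestamp is the modification time of the source file, so it changes whenever the source is freshly checked out or touched, even if its contents are identical. This probably accounts for the discrepancies you are seeing.

If you omit bytes 5-8 from the checksumming process then you should see constant checksums for a given version of Python. (Later Python versions use a longer header, so the range to skip differs there.)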

The format of the .pyc file is given in this blog post by Ned Batchelder.
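As a rough sketch of that idea, the helper below hashes a .pyc while leaving out the timestamp field. The name pyc_digest and the choice of SHA-256 are assumptions for illustration; the default offsets match the Python 2 layout described above and would need adjusting for other header formats.

```python
import hashlib


def pyc_digest(path, skip_start=4, skip_end=8):
    """SHA-256 of a .pyc file, ignoring header bytes [skip_start, skip_end).

    With the defaults this skips bytes 5-8 (1-based), i.e. the timestamp
    that follows the four-byte magic number.
    """
    with open(path, "rb") as f:
        data = f.read()
    h = hashlib.sha256()
    h.update(data[:skip_start])  # magic number: keeps the hash tied to the Python version
    h.update(data[skip_end:])    # everything after the timestamp field
    return h.hexdigest()
```

Anyone rebuilding from the same source with the same interpreter version can then compare pyc_digest values instead of hashes of the raw files.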

Lightsome answered 13/9, 2016 at 14:19 Comment(0)
7

2019 / python3.7+ update: since PEP 552

python -m compileall -f --invalidation-mode=checked-hash [file|dir]
# or
export SOURCE_DATE_EPOCH=1       # makes py_compile default to
python -m py_compile [file ...]  # py_compile.PycInvalidationMode.CHECKED_HASH

will create .pycs which will not change until their source code changes.
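The same mode can also be requested from Python code. The sketch below (Python 3.7+) compiles a file with py_compile.compile in CHECKED_HASH mode and hashes the result; compile_and_hash is a hypothetical helper name, and the demo file is a throwaway example. Note that the .pyc still records the source path, so comparisons are only meaningful when the build uses the same path and the same interpreter version.

```python
import hashlib
import os
import py_compile
import tempfile


def compile_and_hash(source_path, pyc_path):
    """Compile source_path to a hash-based .pyc and return its SHA-256 digest."""
    py_compile.compile(
        source_path,
        cfile=pyc_path,
        doraise=True,  # raise PyCompileError instead of printing on failure
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(pyc_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        with open(src, "w") as f:
            f.write("print('hello')\n")
        first = compile_and_hash(src, os.path.join(tmp, "first.pyc"))
        os.utime(src, (1, 1))  # change the source mtime, not its contents
        second = compile_and_hash(src, os.path.join(tmp, "second.pyc"))
        print("digests match:", first == second)
```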

Braud answered 21/11, 2019 at 20:38 Comment(0)
