Are executables produced with Cython really free of the source code?

I have read Making an executable in Cython and BuvinJ's answer to How to obfuscate Python code effectively? and would like to test whether source code compiled with Cython is really "no-more-there" after compilation. It is indeed a popular opinion that using Cython is a way to protect Python source code; see, for example, the article Protecting Python Sources With Cython.

Let's take this simple example test.pyx:

import json, time  # this lets us see what happens when we import a library
print(json.dumps({'key': 'hello world'}))
time.sleep(3)
print(1/0)  # division error!

Then let's use Cython:

cython test.pyx --embed

This produces a test.c. Let's compile it:

call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x64
cl test.c /I C:\Python37\include /link C:\Python37\libs\python37.lib

It works! It produces a 140KB test.exe executable, nice!

But in this answer How to obfuscate Python code effectively? it is implied that this "compilation" will hide the source code. This does not seem to be true: if you run test.exe, you will see:

Traceback (most recent call last):
  File "test.pyx", line 4, in init test
    print(1/0)  # division error!         <-- the source code and even the comments are still there!
ZeroDivisionError: integer division or modulo by zero

which shows that the source code in human-readable form is still there.

Question: Is there a way to compile code with Cython, such that the claim "the source code is no longer revealed" is true?

Note: I'm looking for a solution where neither the source code nor the bytecode (.pyc) is present (if the bytecode/.pyc is embedded, it's trivial to recover the source code with uncompyle6).
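To illustrate why an embedded .pyc offers little protection, here is a minimal Python sketch (an illustration, not part of the original question). It compiles the body of test.pyx as plain Python with compile(), serializes the code object with marshal (roughly what a .pyc stores after its header) and loads it back. Comments are dropped, but names, string constants and the full instruction stream survive, and that is what decompilers such as uncompyle6 work from. The file name test.py is only a label here.

```python
import dis
import marshal

# Body of test.pyx from the question, treated as plain Python source.
SOURCE = """\
import json, time
print(json.dumps({'key': 'hello world'}))
time.sleep(3)
print(1/0)  # division error!
"""

# compile() builds the same kind of code object that a .pyc file contains.
code = compile(SOURCE, "test.py", "exec")

# marshal is the serialization format used for the code object inside a .pyc.
payload = marshal.dumps(code)

# Anyone holding the payload can restore the code object without the .py file.
restored = marshal.loads(payload)

print(restored.co_names)   # imported modules, attributes and global names
print(restored.co_consts)  # literals such as 'key' and 'hello world'
dis.dis(restored)          # the complete instruction stream (nothing is executed)
```

Only the comment is lost; everything a decompiler needs to rebuild equivalent source is still in the payload.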


PS: I remembered making the same observation a few years ago but could not find it anymore; after deeper research, here it is: Is it possible to decompile a .dll/.pyd file to extract Python Source Code?

Azoic answered 15/6, 2020 at 12:57 Comment(5)
Note that even if the code was being "obfuscated", at best it is just being translated into another language, e.g. machine language. It's still entirely possible to reverse engineer the logic and all values from compiled code, by necessity, since that code needs to execute what you told it to. - Sonata
@Sonata It's true, but there's still a big difference between "here is the code in machine language, you can possibly reverse-engineer it" and "Here is the original source file with the original comments!" - Azoic
I think the problem is that you have the pyx-file next to your exe. If you delete/rename it, Python will no longer find the code. IIRC there must be a duplicate somewhere... - Syllabify
I would also recommend using the compiler/linker flags used by cythonize -i test.pyx (they are logged to the console). - Syllabify
#53359701 - One of these should be closed as a duplicate. I personally think this has the slightly better answer (and so the other question should be closed, despite being earlier). - Surety

The code is found in the original pyx-file next to your exe. Delete it, or don't distribute this pyx-file with your exe.
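One way to act on this before shipping is to scan the distribution folder for leftover sources. The sketch below is a hypothetical helper (the name find_source_files is made up, not part of Cython or any packaging tool). It flags Cython sources and the generated .c file, since the latter also carries the .pyx lines as comments.

```python
from pathlib import Path

# Cython sources, plus the generated C file, which repeats the .pyx lines as comments.
SOURCE_SUFFIXES = {".pyx", ".pxd", ".pxi", ".c"}


def find_source_files(dist_dir):
    """Return the files under dist_dir that look like Cython sources or generated C."""
    root = Path(dist_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SOURCE_SUFFIXES
    )
```

For a folder containing test.exe, test.pyx and test.c, this returns the paths of test.c and test.pyx; an empty list means none of these file types were found.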


When you look at the generated C-code, you will see why your executable shows the error message:

For a raised error, Cython emits code similar to the following:

__PYX_ERR(0, 11, __pyx_L3_error) 

where __PYX_ERR is a macro defined as:

#define __PYX_ERR(f_index, lineno, Ln_error) \
{ \
  __pyx_filename = __pyx_f[f_index]; __pyx_lineno = lineno; __pyx_clineno = __LINE__; goto Ln_error; \
}

and the variable __pyx_f is defined as

static const char *__pyx_f[] = {
  "test.pyx",
  "stringsource",
};

Basically, __pyx_f[0] tells where the original code can be found. Now, when an exception is raised, the (embedded) Python interpreter looks for your original pyx-file and finds the corresponding code (you can see this in __Pyx_AddTraceback, which is called when an error is raised).

Once this pyx-file is gone, the original source code is no longer known to the Python interpreter or to anybody else. However, the error trace will still show function names and line numbers, just no code snippets.
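The same mechanism can be reproduced in plain Python, as an analogy rather than with Cython itself. A code object records only a file name and line numbers, and the source text shown in a traceback is read from disk (through the linecache module) when the traceback is formatted. In this minimal sketch, the file name demo.pyx and the helper run_and_format are hypothetical.

```python
import linecache
import os
import tempfile
import traceback

SOURCE = "x = 1\nprint(1/0)  # division error!\n"


def run_and_format(code):
    """Execute a code object and return the formatted traceback of its error."""
    try:
        exec(code, {})
    except ZeroDivisionError:
        return traceback.format_exc()
    return ""


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pyx")  # hypothetical file name
    with open(path, "w") as f:
        f.write(SOURCE)

    # The code object keeps only the file name and line numbers, not the text.
    code = compile(SOURCE, path, "exec")

    with_file = run_and_format(code)  # the offending line is read from disk

    os.remove(path)
    linecache.checkcache(path)  # drop any cached copy of the deleted file
    without_file = run_and_format(code)  # only file name and line number remain

print(with_file)
print(without_file)
```

The second traceback still names demo.pyx and line 2 but no longer shows the offending line, which is the same effect as removing test.pyx from next to the executable.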

The resulting executable (or extension, if one creates one) doesn't contain any bytecode (as in pyc-files) and cannot be decompiled with tools like uncompyle. Bytecode is produced when a py-file is translated into Python opcodes, which are then evaluated in a huge loop in ceval.c. Builtin/Cython modules need no bytecode, because the resulting code uses Python's C-API directly, cutting out the need to have/evaluate the opcodes. These modules skip interpretation, which is one reason they are faster. Thus no bytecode will be in the executable.
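The difference can be observed from Python itself. In this sketch, math.sqrt merely stands in for code implemented in C (it is not a Cython module), and py_sqrt and has_bytecode are made-up names. An ordinary function carries a code object whose opcodes dis can list, whereas a C-implemented function has no such object to disassemble.

```python
import dis
import math


def py_sqrt(x):
    """Ordinary Python function: its body is compiled to bytecode."""
    return x ** 0.5


def has_bytecode(func):
    """True if func carries a code object with a non-empty instruction stream."""
    code = getattr(func, "__code__", None)
    return code is not None and len(code.co_code) > 0


print(has_bytecode(py_sqrt))    # True: run by the evaluation loop in ceval.c
print(has_bytecode(math.sqrt))  # False: implemented directly in C

dis.dis(py_sqrt)  # lists the opcodes the interpreter would execute
try:
    dis.dis(math.sqrt)
except TypeError as exc:
    print("nothing to disassemble:", exc)
```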

One important note, though: check that the linker doesn't include debug information (and thus the C-code, where the pyx-file content can be found as comments). MSVC with the /Z7 option is one such example.
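A crude way to run this check is to search the built files for a distinctive fragment of the source, such as a comment. The helper below is a hypothetical sketch (find_leaked_text is a made-up name). It only finds text stored as plain UTF-8 or UTF-16-LE bytes and will miss compressed or otherwise encoded copies. String literals like 'hello world' are expected to show up, because the program needs them at run time; comments and whole source lines should not.

```python
from pathlib import Path


def find_leaked_text(binary_path, needles):
    """Return the needles found in the file as UTF-8 or UTF-16-LE bytes."""
    data = Path(binary_path).read_bytes()
    found = []
    for needle in needles:
        encodings = (needle.encode("utf-8"), needle.encode("utf-16-le"))
        if any(encoded in data for encoded in encodings):
            found.append(needle)
    return found
```

For example, find_leaked_text("test.exe", ["division error!"]) returning an empty list suggests the comment did not end up in the executable; any debug files shipped alongside it are worth scanning the same way.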


However, the resulting executable can be disassembled, and the generated C-code can then be reverse engineered. So while cythonizing is OK for making the code hard to understand, it is not the right tool for concealing keys or security algorithms.

Syllabify answered 15/6, 2020 at 13:33 Comment(1)
Thanks again @Syllabify for this great answer. In the particular case where we write "normal Python code" (without types, so without Cython compiling) and produce an exe with Cython, what happens? Here is a question about this: #72275731 - Azoic
