Undecompilable Python

2

-2

It is possible to decompile .pyc files; see "Decompile Python 2.7 .pyc".

Is it possible to `compile` Python files into human-unreadable code, the way C++ is compiled into an .exe binary, as opposed to the plaintext .py files and the very easily recoverable .pyc files? (I don't mind if it can be cracked by brute force.)

Alcatraz answered 26/2, 2013 at 10:58 Comment(4)
The word you are looking for is "obfuscate". I don't know anything about successful Python code obfuscation, but you can try encrypting the Python source code and decrypting it on the fly during execution.Nicolettenicoli
But then you need to have the encryption key stored somewhere, so the attacker can use it to decrypt the code. Possibly even simpler would be for him to use a debugger to retrieve the decrypted version from RAM. I think distributing the byte-code would be more secure than a flawed encryption system...Exenterate
I don't understand why my question was downvoted.Alcatraz
I just don't want my clients to open, read/change and CTRL+S the code. It is possible to ship only .pyc files, but they are very easily decompiled to a .py equivalent, even with comments.Alcatraz
8

Python is a highly dynamic language, and supports many different levels of introspection. Because of that, obfuscating Python bytecode is a mountainous task.

Moreover, your embedded Python interpreter will still need to be able to execute the bytecode you ship with your product. And if the interpreter needs to be able to access the bytecode, then everyone else can too. Encryption won't help, because you still need to decrypt the bytecode yourself, and then everyone else can read the bytecode from memory. Obfuscation only makes the standard tools harder, not impossible, to use.
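To make the point above concrete, here is a minimal sketch, assuming CPython 3 and a made-up module (`SOURCE`, `secret` and `shipped_module.py` are hypothetical names, not from the question). It marshals a compiled module the way a .pyc payload is stored, then loads and disassembles it with nothing but the standard library:

```python
import dis
import io
import marshal

# Hypothetical stand-in for a module you would ship to clients.
SOURCE = "def secret(x):\n    return x * 4200 + 7\n"

# A .pyc file is essentially a short header followed by this marshalled code object.
module_code = compile(SOURCE, "shipped_module.py", "exec")
payload = marshal.dumps(module_code)

# Anyone holding the bytes can load them back, exactly as the interpreter must.
recovered = marshal.loads(payload)

# The function body is stored as a nested code object among the module's constants.
function_code = next(
    const for const in recovered.co_consts if hasattr(const, "co_code")
)

# Capture the disassembly as text so it can be printed or inspected.
buffer = io.StringIO()
dis.dis(function_code, file=buffer)
listing = buffer.getvalue()
print(listing)
```

The exact listing differs between Python versions, but the function name, its constants and its control flow are all there for the reading.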

With that said, here is what you'd have to do to make it really bloody hard to read your application's Python bytecode:

  • Reassign every Python opcode to a new value. Rewire the whole interpreter to use different byte values for different opcodes.

  • Remove as many introspection features as you can get away with. Your functions still need to have closures, and code objects still need constants, but to hell with the locals list in the code object, for example. Neuter the sys._getframe() function, slash traceback information.

Both these steps require in-depth knowledge of how the Python interpreter works, and how the Python object model fits together. You will most certainly introduce bugs that will be hard to solve.
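For a sense of what those two steps are up against, the sketch below (assuming a stock CPython 3; `sample`, `caller_name` and `outer` are hypothetical names used only for illustration) prints the public opcode table and a few of the introspection hooks mentioned above. It shows what an unmodified interpreter exposes, not how to remove it:

```python
import dis
import sys


def sample(a, b):
    total = a + b
    return total


code = sample.__code__

# The "locals list": argument and local variable names survive compilation.
print(code.co_varnames)

# Constants referenced by the function body.
print(code.co_consts)

# The opcode table that the first step would have to remap.
print(len(dis.opmap), "opcode names in this interpreter's table")


def caller_name():
    # sys._getframe(1) returns the caller's frame (a CPython-specific feature).
    return sys._getframe(1).f_code.co_name


def outer():
    return caller_name()


print(outer())
```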

In the end, you have to ask yourself if this is worth it. A determined hacker can still analyze your bytecode, do some frequency analysis to reconstruct the opcode table, and/or feed your program different opcodes to see what happens, and so decipher all the obfuscation. Once a translation table is created, decompiling your bytecode is a snap, and reconstructing your code is not far away.
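The frequency analysis mentioned above can be sketched roughly as follows. This assumes CPython 3.6 or later, where each instruction occupies two bytes and the opcode sits at the even offsets of `co_code`; `SAMPLE` and `opcode_histogram` are hypothetical names. Against a remapped interpreter the attacker would only have the raw byte values, so the `dis.opname` lookup here is purely to show what the counts correspond to on a stock interpreter:

```python
import collections
import dis
import types


def opcode_histogram(code, counts=None):
    """Count raw opcode byte values in a code object and its nested code objects."""
    counts = collections.Counter() if counts is None else counts
    # Opcodes sit at even offsets; odd offsets hold the instruction arguments.
    counts.update(code.co_code[::2])
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            opcode_histogram(const, counts)
    return counts


# Hypothetical sample program standing in for the shipped bytecode.
SAMPLE = """
def area(w, h):
    return w * h

def perimeter(w, h):
    return 2 * (w + h)
"""

histogram = opcode_histogram(compile(SAMPLE, "sample.py", "exec"))

# Compare the most frequent byte values with the names from the standard table.
for value, count in histogram.most_common(5):
    print(value, dis.opname[value], count)
```

Matching the shape of such a histogram against one built from ordinary Python code is what turns a remapped opcode table into a simple substitution cipher.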

If all you want to do is prevent bytecode files from being altered, embed checksums for your .pyc files, and check those on startup. Refuse to load if they don't match. Someone will patch your binary to remove the checksum check or replace the checksums, but you won't have to put in nearly as much effort to provide at least some token protection from tampering.
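A startup check along those lines might look like the sketch below. It assumes SHA-256 digests recorded at build time; `file_digest`, `tampered_files`, `EXPECTED` and `app.pyc` are hypothetical names, and the temporary file merely stands in for a shipped .pyc:

```python
import hashlib
import pathlib
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()


def tampered_files(expected):
    """Return the paths that are missing or whose digest differs from the recorded one."""
    return [
        path
        for path, digest in expected.items()
        if not pathlib.Path(path).is_file() or file_digest(path) != digest
    ]


with tempfile.TemporaryDirectory() as workdir:
    shipped = pathlib.Path(workdir) / "app.pyc"
    shipped.write_bytes(b"pretend bytecode")

    # In a real build this table would be generated once and embedded in the binary.
    EXPECTED = {str(shipped): file_digest(shipped)}
    clean_result = tampered_files(EXPECTED)

    # Simulate a client editing the file after delivery.
    shipped.write_bytes(b"patched bytecode")
    patched_result = tampered_files(EXPECTED)

print(clean_result, patched_result)
```

At startup the application would refuse to continue whenever `tampered_files(EXPECTED)` is non-empty. As noted above, this is only token protection, since the check itself can be patched out.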

Eonian answered 26/2, 2013 at 11:25 Comment(13)
You're the kind of guy that sneaks #defines into people's C/C++ code aren't you? :)Manolete
Not sure whether I should upvote. The answer shows deep understanding of the topic at hand, but might give OP too much information about an altogether rather inadvisable course of action.Sarre
@JonClements: I actually have seen an application do this, that is, rewire the whole opcode table. I was sorely tempted to reconstruct the table. It's just a classic substitution cipher, so frequency analysis should help, and you can feed the system your own constructed bytecode to crack it.Eonian
@Junuxx: I didn't give any details on how to do it, did I? I wanted to illustrate what lengths you'd have to go to, and that those lengths would still be futile.Eonian
@Martijn: True, with the possible result that he'll attempt it anyway and ask a bunch of questions about how to remove Python's introspection features etc :PSarre
@Junuxx: And we'll remind the OP at every step it is pointless to do so.Eonian
This is a rather overkill solution. I just don't want my clients to open, read/change and CTRL+S the code. It is possible to ship only .pyc files, but they are very easily decompiled to a .py equivalent, even with comments.Alcatraz
@Qwerty: You cannot avoid shipping with bytecode. The only options you have are to ship with the bytecode as is, or to obfuscate the bytecode. I merely tried to show you that that path is not going to be practical, nor foolproof.Eonian
@Qwerty: The last option you have is to not use Python. Your machine code (produced from your C++ source) can be decompiled, altered and saved too though, so your mileage may vary.Eonian
So, is it possible to compile Python files into human-unreadable code, unlike the plaintext .py and .pyc files? (I don't mind if it can be cracked by brute force.)Alcatraz
@Qwerty: .pyc files are not plaintext, they are not human readable. Disassembly is basically 'brute force' cracking. Otherwise, encrypt files using a key (which can be lifted from your binary again), or go the whole hog as I described in my answer.Eonian
@MartijnPieters You're right. I probably don't even know what I want. I had somehow learned that .pyc files are very easily recovered back into .py files, even with comments. The only thing I wanted was some kind of binary file, full of 1s and 0s and such, similar to C++ -> exe. But I will stick to the .pyc after all. I heard a rumour about Python and zip archives. Could that help, maybe?Alcatraz
@Qwerty: You can import from zip files, yes. See the zipimport documentation (docs.python.org/2/library/zipimport.html).Eonian
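A minimal sketch of the zip import mentioned in the last comment, assuming Python 3 (`bundle.zip`, `zipped_greeting` and `greet` are hypothetical names). Note that this is only a packaging convenience: the archive can be opened with any zip tool, so it does nothing to hide the code.

```python
import importlib
import pathlib
import sys
import tempfile
import zipfile

# Build a throwaway archive containing one module.
workdir = tempfile.mkdtemp()
archive = pathlib.Path(workdir) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as bundle:
    bundle.writestr(
        "zipped_greeting.py",
        "def greet():\n    return 'hello from the zip'\n",
    )

# Putting the archive itself on sys.path lets the import system look inside it.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

zipped_greeting = importlib.import_module("zipped_greeting")
message = zipped_greeting.greet()
print(message, zipped_greeting.__file__)
```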
3

Every system that encrypts its code can be attacked, because the decryption key must be present somewhere.

As such, it is just a question of effort.
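A toy illustration of why that is, using a repeating XOR as a stand-in for a real cipher (an assumption made purely to keep the sketch short; `KEY`, `xor_bytes` and `licensed` are hypothetical names). Whatever the loader does to run the code, an attacker holding the same shipped files can repeat:

```python
import itertools

# The key has to ship with the loader, otherwise the program could not start.
KEY = b"hypothetical-key"


def xor_bytes(data, key):
    """Toy symmetric 'cipher': XOR the data with a repeating key."""
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))


source = b"def licensed():\n    return 'paid feature'\n"
shipped_blob = xor_bytes(source, KEY)

# What the loader does at startup: decrypt, then execute.
namespace = {}
exec(xor_bytes(shipped_blob, KEY).decode(), namespace)
print(namespace["licensed"]())

# What an attacker does with the same key lifted from the loader.
recovered = xor_bytes(shipped_blob, KEY)
print(recovered.decode())
```

A stronger cipher changes nothing about the structure of the problem; it only changes how the key has to be dug out.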

Kumiss answered 26/2, 2013 at 11:8 Comment(4)
+1. You can make it hard for the attacker, but you can't make it impossible.Massenet
I just don't want my clients to open, read/change and CTRL+S the code. It is possible to ship only .pyc files, but they are very easily decompiled to a .py equivalent, even with comments.Alcatraz
Given "As such, it is just a question of effort.", can the effort be increased if you wrap the code in Java and ship a .jar file? I'm curious.Napoleon
@Napoleon Yes, the effort needed can be marginally increased by that, but I am not sure it is worth the added work.Kumiss
