How do I protect Python code from being read by users?
Asked Answered
A

28

794

I am developing a piece of software in Python that will be distributed to my employer's customers. My employer wants to limit the usage of the software with a time-restricted license file.

If we distribute the .py files or even .pyc files it will be easy to (decompile and) remove the code that checks the license file.

Another aspect is that my employer does not want the code to be read by our customers, fearing that the code may be stolen or at least the "novel ideas".

Is there a good way to handle this problem?

Allayne answered 4/11, 2008 at 11:57 Comment(5)
py2exe just stores the .pyc byte code files in a .zip archive, so this is definitely not a solution. Still, that can be useful when combined with a suitable starup script to make it run unter LinuxHoofbeat
like this: #15956448Precedence
This is the most comprehensive answer to your question: wiki.python.org/moin/Asking%20for%20Help/…Kava
The only thing you could do is use licensing and a remote backend to complete operations. The important code will be placed in the backend, so for the client app perspective the backend will act as a black box. Nobody knows what is behind those network calls, so your important code will be protected. If the license expires, network calls will be unauthenticated. This is the only solution I can think of to keep the important code hidden for end users.Thereof
You can always add value by connecting your software to a server, ie updates,Freshwater
A
422

Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use a exe-packager like py2exe, the layout of the executable is well-known, and the Python byte-codes are well understood.

Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.

If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.

Aga answered 4/11, 2008 at 12:0 Comment(10)
Even if the license-checking code were hard to reverse engineer because it's written in C, wouldn't it still be relatively easy to remove the calls to the license-checking code?Bedew
Yes it would, depending on where the license check is performed. If there are many calls to the extension, it could be difficult to eradicate. Or you can move some other crucial part of the application into the license check as well so that removing the call to the extension cripples the app.Aga
Really, all of this work is not about preventing modification, but about increasing its difficulty so that it's no longer worth it. Anything can be reverse-engineered and modified if there's enough benefit.Aga
@Blair Conrad: Not if the license-checking code hides functionality, too. E.g. mylicensedfunction(licenseblob liblob, int foo, int bar, std::string bash)Tweedsmuir
I think the clever way is implementing critical parts in C and implement all license checking stuff in there. (I use hardware dongle which can implement some calculation inside it. So it's almost impossible to reverse it back.)Duplessismornay
I've actually seen commercial python code shipped as embedded python inside of a C library. Instead of converting some parts of the code to C, they hide the entire python code inside a protective C layer. Then, if they want a module importable by python, they write a thin python extension on top of the C. Open source is a much easier way of life.Aerodrome
Is cython a viable alterative for this?Evolutionary
I know this is old, but I must note that storing crypto keys or anything sensitive inside an executable is a Bad Idea™. The fact that the executable is compiled to binary is irrelevant. You can simply run the strings command against it and you'll have the key right there in front of you.Mandy
A Windows fix would be to compile the file into a. Pyd and import it laterDegreeday
But using py2exe generate new bugs for you.Iritis
K
524

"Is there a good way to handle this problem?" No. Nothing can be protected against reverse engineering. Even the firmware on DVD machines has been reverse engineered and the AACS Encryption key exposed. And that's in spite of the DMCA making that a criminal offense.

Since no technical method can stop your customers from reading your code, you have to apply ordinary commercial methods.

  1. Licenses. Contracts. Terms and Conditions. This still works even when people can read the code. Note that some of your Python-based components may require that you pay fees before you sell software using those components. Also, some open-source licenses prohibit you from concealing the source or origins of that component.

  2. Offer significant value. If your stuff is so good -- at a price that is hard to refuse -- there's no incentive to waste time and money reverse engineering anything. Reverse engineering is expensive. Make your product slightly less expensive.

  3. Offer upgrades and enhancements that make any reverse engineering a bad idea. When the next release breaks their reverse engineering, there's no point. This can be carried to absurd extremes, but you should offer new features that make the next release more valuable than reverse engineering.

  4. Offer customization at rates so attractive that they'd rather pay you to build and support the enhancements.

  5. Use a license key which expires. This is cruel, and will give you a bad reputation, but it certainly makes your software stop working.

  6. Offer it as a web service. SaaS involves no downloads to customers.

Katlynkatmai answered 4/11, 2008 at 12:29 Comment(5)
Point 2 is even more important. If it's cheaper buy than reverse engineering, plus yearly updates, no one will try and even if it does, no one will pay a hacker instead the provider of the software.Alderman
That's true. Reverse engineering is doable but expensive in most situations. @S.Lott, I believe point 6 holds more importance based on the question. If the source code really needs to be protected then it should be remote from the end user.Nameplate
Question: "is there a good way to protect my family and myself from being murdered by intruders in our sleep?" Internet: "No. Anyone can be gotten to, and no dwelling is ever 100 percent impenetrable. A mortal human family is the wrong tool for the job."Photophobia
Point 5 could not be applied on the same assumption that it can be reverse engineered and cracked.Virchow
This is a good question. Arguments that "no code is 100% impenetrable" do not provide enough context to support decisions for or against the use of Python. Real-world organizations must protect IP, comply with HIPAA, DoD STIGs, Zero Trust Architectures, and do Fortify scans on all produced software. Commercial companies must also make payrolls so that their developers can pay bills and afford to work for their employers. Areas where Python code can be exposed to the light of day must be carefully managed. Apps distributed as binaries must also be secured against reverse engineering.Foti
A
422

Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use a exe-packager like py2exe, the layout of the executable is well-known, and the Python byte-codes are well understood.

Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.

If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.

Aga answered 4/11, 2008 at 12:0 Comment(10)
Even if the license-checking code were hard to reverse engineer because it's written in C, wouldn't it still be relatively easy to remove the calls to the license-checking code?Bedew
Yes it would, depending on where the license check is performed. If there are many calls to the extension, it could be difficult to eradicate. Or you can move some other crucial part of the application into the license check as well so that removing the call to the extension cripples the app.Aga
Really, all of this work is not about preventing modification, but about increasing its difficulty so that it's no longer worth it. Anything can be reverse-engineered and modified if there's enough benefit.Aga
@Blair Conrad: Not if the license-checking code hides functionality, too. E.g. mylicensedfunction(licenseblob liblob, int foo, int bar, std::string bash)Tweedsmuir
I think the clever way is implementing critical parts in C and implement all license checking stuff in there. (I use hardware dongle which can implement some calculation inside it. So it's almost impossible to reverse it back.)Duplessismornay
I've actually seen commercial python code shipped as embedded python inside of a C library. Instead of converting some parts of the code to C, they hide the entire python code inside a protective C layer. Then, if they want a module importable by python, they write a thin python extension on top of the C. Open source is a much easier way of life.Aerodrome
Is cython a viable alterative for this?Evolutionary
I know this is old, but I must note that storing crypto keys or anything sensitive inside an executable is a Bad Idea™. The fact that the executable is compiled to binary is irrelevant. You can simply run the strings command against it and you'll have the key right there in front of you.Mandy
A Windows fix would be to compile the file into a. Pyd and import it laterDegreeday
But using py2exe generate new bugs for you.Iritis
B
329

Python is not the tool you need

You must use the right tool to do the right thing, and Python was not designed to be obfuscated. It's the contrary; everything is open or easy to reveal or modify in Python because that's the language's philosophy.

If you want something you can't see through, look for another tool. This is not a bad thing, it is important that several different tools exist for different usages.

Obfuscation is really hard

Even compiled programs can be reverse-engineered so don't think that you can fully protect any code. You can analyze obfuscated PHP, break the flash encryption key, etc. Newer versions of Windows are cracked every time.

Having a legal requirement is a good way to go

You cannot prevent somebody from misusing your code, but you can easily discover if someone does. Therefore, it's just a casual legal issue.

Code protection is overrated

Nowadays, business models tend to go for selling services instead of products. You cannot copy a service, pirate nor steal it. Maybe it's time to consider to go with the flow...

Bullough answered 4/11, 2008 at 13:3 Comment(7)
Python is not the tool you need. Malbolge is. :)Oconner
Good answer, but "casual legal issue"? Really? Where do you live that you have any legal issues that are casual?Rosinweed
I think, if we have a frequency - how often expensive obfuscated code is hacked - we could say about practicability of using Python and obfuscated code.Lawrencelawrencium
If your code have interesting features, the one who was able to misuse it would redistribute it @MackeNameplate
How in the world would you "easily discover if someone does"?Mats
This is an opinion, not a technical answer. I agree that obfuscation doesn't mean your code is completely locked down, but it does prevent low level hacks and makes sense depending on your use case.Luigiluigino
With technologies and upgrade, everything is possible.Naturalize
J
180

Compile python and distribute binaries!

Sensible idea:

Use Cython, Nuitka, Shed Skin or something similar to compile python to C code, then distribute your app as python binary libraries (pyd) instead.

That way, no Python (byte) code is left and you've done any reasonable amount of obscurification anyone (i.e. your employer) could expect from regular Code, I think. (.NET or Java less safe than this case, as that bytecode is not obfuscated and can relatively easily be decompiled into reasonable source.)

Cython is getting more and more compatible with CPython, so I think it should work. (I'm actually considering this for our product.. We're already building some thirdparty libs as pyd/dlls, so shipping our own python code as binaries is not a overly big step for us.)

See This Blog Post (not by me) for a tutorial on how to do it. (thx @hithwen)

Crazy idea:

You could probably get Cython to store the C-files separately for each module, then just concatenate them all and build them with heavy inlining. That way, your Python module is pretty monolithic and difficult to chip at with common tools.

Beyond crazy:

You might be able to build a single executable if you can link to (and optimize with) the python runtime and all libraries (dlls) statically. That way, it'd sure be difficult to intercept calls to/from python and whatever framework libraries you use. This cannot be done if you're using LGPL code though.

Jamiejamieson answered 8/9, 2011 at 11:14 Comment(3)
#32578364Hylozoism
@mlvljr FWIW, IMHO compiling to binaries is a nice tradeoff between selling all your secrets and trying to protect against NSA-class reverse engineering. Esp if you have a big python code base and reasons to be paranoid. ;)Jamiejamieson
hithwen's POST is invalid now.Hartung
J
62

I understand that you want your customers to use the power of python but do not want expose the source code.

Here are my suggestions:

(a) Write the critical pieces of the code as C or C++ libraries and then use SIP or swig to expose the C/C++ APIs to Python namespace.

(b) Use cython instead of Python

(c) In both (a) and (b), it should be possible to distribute the libraries as licensed binary with a Python interface.

Jumper answered 6/11, 2008 at 7:41 Comment(1)
Other possibilities in the same vein: Shed Skin code.google.com/p/shedskin and Nuitka kayhayen24x7.homelinux.org/blog/nuitka-a-python-compilerSaffier
W
50

Have you had a look at pyminifier? It does Minify, obfuscate, and compress Python code. The example code looks pretty nasty for casual reverse engineering.

$ pyminifier --nonlatin --replacement-length=50 /tmp/tumult.py
#!/usr/bin/env python3
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ=ImportError
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱=print
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡=False
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨=object
try:
 import demiurgic
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: You're not demiurgic. Actually, I think that's normal.")
try:
 import mystificate
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: Dark voodoo may be unreliable.")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺬ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡
class ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨):
 def __init__(self,*args,**kwargs):
  pass
 def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ클(self,dactyl):
  ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐=demiurgic.palpitation(dactyl)
  ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲=mystificate.dark_voodoo(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐)
  return ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲
 def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯(self,whatever):
  ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱(whatever)
if __name__=="__main__":
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Forming...")
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚("epicaricacy","perseverate")
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ.ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯("Codswallop")
# Created by pyminifier (https://github.com/liftoff/pyminifier)
Wachtel answered 8/5, 2015 at 10:21 Comment(7)
The good point on this, is to demoralize anyone who try to decode functionallity. Combine that with Cython and some extra crypt over modules or internet calls, and you probably got prize.Alderman
The only thing this package managed to accomplish is to fool the 'obfuscator' that the code is obfuscated.Burkett
this was making errors when i tried. i think it mishandled the data, and didn't fully convert it.Bedclothes
doesn't work for whole project, or template engine since it needs variable name to display on templateThun
This library does not seem to be maintained, and gives me indentation errors. I am using Python 3.7Luigiluigino
yep. i can confirm pyminifier is deadOriane
pyminifier may be dead, but I found this repo last push was in May 2020... (how I found it: techgaun.github.io/active-forks/index.html#liftoff/… fork seems to add other people's fixes (probably by looking at the open pull requests on the original repo)...Gaiseric
H
38

Use Cython. It will compile your modules to high-performant C files, which can then be compiled to native binary libraries. This is basically un-reversable, compared to .pyc bytecode!

I've written a detailed article on how to set up Cython for a Python project, check it out:

Protecting Python Sources With Cython

Hymenopteran answered 1/9, 2017 at 6:1 Comment(0)
C
36

Is your employer aware that he can "steal" back any ideas that other people get from your code? I mean, if they can read your work, so can you theirs. Maybe looking at how you can benefit from the situation would yield a better return of your investment than fearing how much you could lose.

[EDIT] Answer to Nick's comment:

Nothing gained and nothing lost. The customer has what he wants (and paid for it since he did the change himself). Since he doesn't release the change, it's as if it didn't happen for everyone else.

Now if the customer sells the software, they have to change the copyright notice (which is illegal, so you can sue and will win -> simple case).

If they don't change the copyright notice, the 2nd level customers will notice that the software comes from you original and wonder what is going on. Chances are that they will contact you and so you will learn about the reselling of your work.

Again we have two cases: The original customer sold only a few copies. That means they didn't make much money anyway, so why bother. Or they sold in volume. That means better chances for you to learn about what they do and do something about it.

But in the end, most companies try to comply to the law (once their reputation is ruined, it's much harder to do business). So they will not steal your work but work with you to improve it. So if you include the source (with a license that protects you from simple reselling), chances are that they will simply push back changes they made since that will make sure the change is in the next version and they don't have to maintain it. That's win-win: You get changes and they can make the change themselves if they really, desperately need it even if you're unwilling to include it in the official release.

Connie answered 4/11, 2008 at 12:27 Comment(16)
+1 for stealing ideas back. Why limit your client-serving power to your in-house solutions, when you could see how others improve on your solution and accordingly improve your own product? "If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas."Meatus
What, if one of your customers re-releases your code or the ideas for free and anonymously? You can't tell who did it and sue them and because they didn't get benifit from it, you won't as well. This will ruine your work while one of you customers only paid the basic price for it. (obviously only works if you have more than one customer for your solution)Provoke
@Provoke How exactly would that work? Uploading your work on the Internet doesn't harm you. It would start to harm you if a lot of people would find it AND those people would be paying customers instead. Code theft is a myth. "My knowledge is for free, my time is expensive" (not sure who said that).Connie
How would I steal anything back? They just put the code into their product and don't tell anyone how it works and just sell it. How would I every find out, that they used my code in the first place?Mats
@Mats If you never noticed how would that be different from it not happening? It simply would have no effect on you at all. So to make a case here, you must know they "stole" your code.Connie
@AaronDigulla: I don't think this applies. The other people the thief sells to, might have become my customers, but I would never know. If someone A gives B some money to deliver to me unbeknownst to me. B steals the money by keeping it. I might not be sad, because I never knew, but I still have less money then if B had kept the promise.Mats
@Mats How does obfuscation help here? They can still sell your software unchanged to anyone without sharing with you. I think a good example is MongoDB and similar services where a large group of people has put a lot of effort in and GAFA "steals" their income by selling services around the products without giving back. Being open in your product is what creates a bigger market. Being able to provide services around the product generates most of the income. The product itself is just a door opener.Connie
@AaronDigulla: Protecting your code (not necessarily obfuscation) helps with two points: 1) Code is rarely transferable 100%. Usually one needs to tweak a couple of things. Often this is very little (so it is worth changing), but without these changes, the code is unsuable in the new project. Protecting your code, protects your work from being used in projects, in which you are not involved. 2) When you sell software, then what you are actually selling is the license key (as you most likely well know). Protecting your code ensures that the mechanism for needing the key cannot be circumvented.Mats
Regarding MongoDB: You example is an example for my point: They opened up the code and at the end of the day, GAFA steals their income. You want to say that selling services is worth it. Sure, but MongoDB didn't.Mats
Regarding "My knowledge is for free, my time is expensive": No it isn't. Some projects are finished by me in a couple of days, but it took years, if not decades, to acquire that knowledge. In fact, that is (in an ideal world) the reason why someone with a long education (say a PhD) gets paid more than someone who just received training on the job (e.g. a cleaner): They are reimbursed for the unpaid years in which they are expected to have acquired a lot of knowledge, while cleaning does not require a lot of training. (This is not meant to be disrespectful to the cleaner as a human though.)Mats
@Mats MongoDB is a point for both of us. It's an example where most people feel this is "obvious" theft. Obfuscation wouldn't have helped since GAFA can use code as is. That said, it's a corner case because it's a database and those are meant to be useful as is. I agree that you're selling a license key but that key is impossible to protect from code - any code based protection can be circumvented; it's just a question of effort. Therefore, the license key is protected by a contract (software license or good old commercial product).Connie
My core argument is: Obfuscating code is usually a waste of time. Set up a software license or get a lawyer. Adding some basic protection like a license file demonstrates that you want to protect your code against "theft" which is good enough in court (they care about intention, not the height of technical hurdles).Connie
Regarding "as is"-usefulness: Yes, that may be the case here, but for many software applications in the professional context this is not the case.Mats
"It is just a question of effort" - certainly! But if the cost of the effort is larger than the cost of my actual product or service, then I have protected my code well enough, because it is more economical to pay me than to steal from me. That is the hurdle I have to set up. The license of course will have an expiration date.Mats
"Obfuscating code is usually a waste of time." - when I talk about protection I mean something like compiling Python to C and C to bytecode, not using pyminifier. When I hear "obfuscating" I think of pyminifier, not the former approach. (Maybe we misunderstand each other.) Well I think that if using Nuitka takes me a couple of hours after a project that took me a year, than it does not seem to be a lot of time spend and if I protect my code this way enough (it the mentioned economical sense), it does not seem to be a waste to me.Mats
Winning a court case is expensive and difficult and harmful to everyone involved. The moment you need lawyers, you already are in deep trouble.Mats
B
24

Do not rely on obfuscation. As You have correctly concluded, it offers very limited protection. UPDATE: Here is a link to paper which reverse engineered obfuscated python code in Dropbox. The approach - opcode remapping is a good barrier, but clearly it can be defeated.

Instead, as many posters have mentioned make it:

  • Not worth reverse engineering time (Your software is so good, it makes sense to pay)
  • Make them sign a contract and do a license audit if feasible.

Alternatively, as the kick-ass Python IDE WingIDE does: Give away the code. That's right, give the code away and have people come back for upgrades and support.

Brickyard answered 4/11, 2008 at 18:53 Comment(3)
Like this extreme idea. Gets it out there in a huge way and massive market share, then you have a very big customer base for support and addons. I have also been grappling with this question and all the "licensing" answers are basically bull because it doesn't protect against widespread copying, yet doesn't give you any market share advantage.Personification
But, the upgrades are also just give-aways... so how would they charge for that? Wouldn't it just be the support?Mats
Regarding the WingIDE business model: Support is a service, software a product. Products scale, service don't. Support is only a good business model if there is no other business model - meaning, if nobody would buy your product (for whatever reason), you give the product away, so that you have a customer base that at least buys your service.Mats
P
20

Shipping .pyc files has its problems - they are not compatible with any other python version than the python version they were created with, which means you must know which python version is running on the systems the product will run on. That's a very limiting factor.

Piperidine answered 16/2, 2009 at 21:9 Comment(2)
Yes, but not if you distribute that exact Python version with your obfuscated code.Dihydrostreptomycin
autopy2exe compiles in and ships a portable python installation with the distributable in a single <application.exe> file format. Note: also Linux-compatible. It can be complex and a pain point to manage python installations on client computers.Gymno
R
18

In some circumstances, it may be possible to move (all, or at least a key part) of the software into a web service that your organization hosts.

That way, the license checks can be performed in the safety of your own server room.

Retentive answered 4/11, 2008 at 12:29 Comment(3)
Beaware that if your licensing webserver goes down or the customers internet access is down your customer will not be happy that they can't run thier business because of loss of access to licensing checks.Pentstemon
@Pentstemon There are solutions to this. You could implement a local key mechanism that allows temporary access when the software cannot reach the remote licensing server.Anthracite
@Jeffrey: That gets you right back to where you started - how to you protect that code. To be safer, you need to put some of the key functionality on your own server, so replacing it would involve substantially effort (at which point, why not just start an open-source competitor?)Retentive
R
18

I was surprised in not seeing pyconcrete in any answer. Maybe because it's newer than the question?

It could be exactly what you need(ed).

Instead of obfuscating the code, it encrypts it and decrypts at load time.

From pypi page:

Protect python script work flow

  • your_script.py import pyconcrete
  • pyconcrete will hook import module
  • when your script do import MODULE, pyconcrete import hook will try to find MODULE.pye first and then decrypt MODULE.pye via _pyconcrete.pyd and execute decrypted data (as .pyc content)
  • encrypt & decrypt secret key record in _pyconcrete.pyd (like DLL or SO) the secret key would be hide in binary code, can’t see it directly in HEX view
Rosado answered 17/3, 2018 at 0:43 Comment(2)
> (as .pyc content) .pyc files can be decompiled, right? So, probably, it saves this .pyc files to a cache directory. Then, what is the point of all this?Eckard
the pyc content is generated in memory after decryption - the file itself is encryptedRosado
I
17

Though there's no perfect solution, the following can be done:

  1. Move some critical piece of startup code into a native library.
  2. Enforce the license check in the native library.

If the call to the native code were to be removed, the program wouldn't start anyway. If it's not removed then the license will be enforced.

Though this is not a cross-platform or a pure-Python solution, it will work.

Intrepid answered 5/11, 2008 at 6:10 Comment(2)
The native library approach makes it much easier for someone to programmatically brute force your license key system as they can use your own code and API to validate their licenses.Comfortable
So? Use RSA to sign your licence and let them brute force your private key, say consisting of 1024 bits. It is possible, but takes a lot of time... and thus - money.Diwan
B
13

The reliable only way to protect code is to run it on a server you control and provide your clients with a client which interfaces with that server.

Bridesmaid answered 4/11, 2008 at 20:27 Comment(0)
P
13

I think there is one more method to protect your Python code; part of the Obfuscation method. I believe there was a game like Mount and Blade or something that changed and recompiled their own python interpreter (the original interpreter which i believe is open source) and just changed the OP codes in the OP code table to be different then the standard python OP codes.

So the python source is unmodified but the file extensions of the *.pyc files are different and the op codes don't match to the public python.exe interpreter. If you checked the games data files all the data was in Python source format.

All sorts of nasty tricks can be done to mess with immature hackers this way. Stopping a bunch of inexperienced hackers is easy. It's the professional hackers that you will not likely beat. But most companies don't keep pro hackers on staff long I imagine (likely because things get hacked). But immature hackers are all over the place (read as curious IT staff).

You could for example, in a modified interpreter, allow it to check for certain comments or doc strings in your source. You could have special OP codes for such lines of code. For example:

OP 234 is for source line "# Copyright I wrote this" or compile that line into op codes that are equivalent to "if False:" if "# Copyright" is missing. Basically disabling a whole block of code for what appears to be some obscure reason.

One use case where recompiling a modified interpreter may be feasible is where you didn't write the app, the app is big, but you are paid to protect it, such as when you're a dedicated server admin for a financial app.

I find it a little contradictory to leave the source or opcodes open for eyeballs, but use SSL for network traffic. SSL is not 100% safe either. But it's used to stop MOST eyes from reading it. A wee bit precaution is sensible.

Also, if enough people deem that Python source and opcodes are too visible, it's likely someone will eventually develop at least a simple protection tool for it. So the more people asking "how to protect Python app" only promotes that development.

Pentstemon answered 3/7, 2012 at 17:7 Comment(0)
J
10

Depending in who the client is, a simple protection mechanism, combined with a sensible license agreement will be far more effective than any complex licensing/encryption/obfuscation system.

The best solution would be selling the code as a service, say by hosting the service, or offering support - although that isn't always practical.

Shipping the code as .pyc files will prevent your protection being foiled by a few #s, but it's hardly effective anti-piracy protection (as if there is such a technology), and at the end of the day, it shouldn't achieve anything that a decent license agreement with the company will.

Concentrate on making your code as nice to use as possible - having happy customers will make your company far more money than preventing some theoretical piracy..

Jollanta answered 4/11, 2008 at 12:53 Comment(0)
B
7

Another attempt to make your code harder to steal is to use jython and then use java obfuscator.

This should work pretty well as jythonc translate python code to java and then java is compiled to bytecode. So ounce you obfuscate the classes it will be really hard to understand what is going on after decompilation, not to mention recovering the actual code.

The only problem with jython is that you can't use python modules written in c.

Bertilla answered 5/5, 2009 at 21:53 Comment(0)
P
6

You should take a look at how the guys at getdropbox.com do it for their client software, including Linux. It's quite tricky to crack and requires some quite creative disassembly to get past the protection mechanisms.

Plainsman answered 4/11, 2008 at 12:20 Comment(1)
but the fact that it was gotten past meant that they failed - the bottom line is just don't try, but go for legal protection.Rockribbed
P
6

The best you can do with Python is to obscure things.

  • Strip out all docstrings
  • Distribute only the .pyc compiled files.
  • freeze it
  • Obscure your constants inside a class/module so that help(config) doesn't show everything

You may be able to add some additional obscurity by encrypting part of it and decrypting it on the fly and passing it to eval(). But no matter what you do someone can break it.

None of this will stop a determined attacker from disassembling the bytecode or digging through your api with help, dir, etc.

Pantalets answered 4/11, 2008 at 18:45 Comment(0)
X
5

What about signing your code with standard encryption schemes by hashing and signing important files and checking it with public key methods?

In this way you can issue license file with a public key for each customer.

Additional you can use an python obfuscator like this one (just googled it).

Xenia answered 4/11, 2008 at 12:59 Comment(2)
Signing does not work in this context. It's always possible to bypass the signature-checking loader. The first thing you need for useful software protection is an opaque bootstrap mechanism. Not something that Python makes easy.Mahratta
Or validate the licence not only on startup but in several other places. Can be easily implemented, and can severely increase the time to bypass.Diwan
S
5

Idea of having time restricted license and check for it in locally installed program will not work. Even with perfect obfuscation, license check can be removed. However if you check license on remote system and run significant part of the program on your closed remote system, you will be able to protect your IP.

Preventing competitors from using the source code as their own or write their inspired version of the same code, one way to protect is to add signatures to your program logic (some secrets to be able to prove that code was stolen from you) and obfuscate the python source code so, it's hard to read and utilize.

Good obfuscation adds basically the same protection to your code, that compiling it to executable (and stripping binary) does. Figuring out how obfuscated complex code works might be even harder than actually writing your own implementation.

This will not help preventing hacking of your program. Even with obfuscation code license stuff will be cracked and program may be modified to have slightly different behaviour (in the same way that compiling code to binary does not help protection of native programs).

In addition to symbol obfuscation might be good idea to unrefactor the code, which makes everything even more confusing if e.g. call graphs points to many different places even if actually those different places does eventually the same thing.

Logical signature inside obfuscated code (e.g. you may create table of values which are used by program logic, but also used as signature), which can be used to determine that code is originated from you. If someone decides to use your obfuscated code module as part of their own product (even after reobfuscating it to make it seem different) you can show, that code is stolen with your secret signature.

September answered 7/6, 2010 at 5:7 Comment(0)
H
5

Neiher Cython nor Nuitka were not the answer, because when running the solution that is compiled with Nuitka or Cython into .pyd or .exe files a cache directory is generated and all .pyc files are copied into the cache directory, so an attacker simply can decompile .pyc files and see your code or change it.

Homesick answered 27/6, 2022 at 6:15 Comment(1)
Edit: You can use following code to prevent cache directory generation: import sys sys.dont_write_bytecode = trueHomesick
P
4

I have looked at software protection in general for my own projects and the general philosophy is that complete protection is impossible. The only thing that you can hope to achieve is to add protection to a level that would cost your customer more to bypass than it would to purchase another license.

With that said I was just checking google for python obsfucation and not turning up a lot of anything. In a .Net solution, obsfucation would be a first approach to your problem on a windows platform, but I am not sure if anyone has solutions on Linux that work with Mono.

The next thing would be to write your code in a compiled language, or if you really want to go all the way, then in assembler. A stripped out executable would be a lot harder to decompile than an interpreted language.

It all comes down to tradeoffs. On one end you have ease of software development in python, in which it is also very hard to hide secrets. On the other end you have software written in assembler which is much harder to write, but is much easier to hide secrets.

Your boss has to choose a point somewhere along that continuum that supports his requirements. And then he has to give you the tools and time so you can build what he wants. However my bet is that he will object to real development costs versus potential monetary losses.

Poulard answered 4/11, 2008 at 12:28 Comment(0)
C
3

It is possible to have the py2exe byte-code in a crypted resource for a C launcher that loads and executes it in memory. Some ideas here and here.

Some have also thought of a self modifying program to make reverse engineering expensive.

You can also find tutorials for preventing debuggers, make the disassembler fail, set false debugger breakpoints and protect your code with checksums. Search for ["crypted code" execute "in memory"] for more links.

But as others already said, if your code is worth it, reverse engineers will succeed in the end.

Crier answered 22/5, 2013 at 14:53 Comment(0)
W
2

Use the same way to protect binary file of c/c++, that is, obfuscate each function body in executable or library binary file, insert an instruction "jump" at the begin of each function entry, jump to special function to restore obfuscated code. Byte-code is binary code of Python script, so

  • First compile python script to code object
  • Then iterate each code object, obfuscate co_code of each code object as the following
    0   JUMP_ABSOLUTE            n = 3 + len(bytecode)

    3
    ...
    ... Here it's obfuscated bytecode
    ...

    n   LOAD_GLOBAL              ? (__pyarmor__)
    n+3 CALL_FUNCTION            0
    n+6 POP_TOP
    n+7 JUMP_ABSOLUTE            0
  • Save obfuscated code object as .pyc or .pyo file

Those obfuscated file (.pyc or .pyo) can be used by normal python interpreter, when those code object is called first time

  • First op is JUMP_ABSOLUTE, it will jump to offset n

  • At offset n, the instruction is to call a PyCFunction. This function will restore those obfuscated bytecode between offset 3 and n, and put the original byte-code at offset 0. The obfuscated code can be got by the following code

        char *obfucated_bytecode;
        Py_ssize_t len;
        PyFrameObject* frame = PyEval_GetFrame();
        PyCodeObject *f_code = frame->f_code;
        PyObject *co_code = f_code->co_code;      
        PyBytes_AsStringAndSize(co_code, &obfucated_bytecode, &len)
    
  • After this function returns, the last instruction is to jump to offset 0. The really byte-code now is executed.

There is a tool Pyarmor to obfuscate python scripts by this way.

Widdershins answered 14/12, 2017 at 0:35 Comment(0)
K
2

There is a comprehensive answer on concealing the python source code, which can be find here.

Possible techniques discussed are:
- use compiled bytecode (python -m compileall)
- executable creators (or installers like PyInstaller)
- software as an service (the best solution to conceal your code in my opinion)
- python source code obfuscators

Kava answered 17/12, 2019 at 6:40 Comment(1)
It seems that the bytecode could be easily uncompiled using uncompyle6.Cryogenics
S
1

If we focus on software licensing, I would recommend to take a look at another Stack Overflow answer I wrote here to get some inspiration of how a license key verification system can be constructed.

There is an open-source library on GitHub that can help you with the license verification bit.

You can install it by pip install licensing and then add the following code:

pubKey = "<RSAKeyValue><Modulus>sGbvxwdlDbqFXOMlVUnAF5ew0t0WpPW7rFpI5jHQOFkht/326dvh7t74RYeMpjy357NljouhpTLA3a6idnn4j6c3jmPWBkjZndGsPL4Bqm+fwE48nKpGPjkj4q/yzT4tHXBTyvaBjA8bVoCTnu+LiC4XEaLZRThGzIn5KQXKCigg6tQRy0GXE13XYFVz/x1mjFbT9/7dS8p85n8BuwlY5JvuBIQkKhuCNFfrUxBWyu87CFnXWjIupCD2VO/GbxaCvzrRjLZjAngLCMtZbYBALksqGPgTUN7ZM24XbPWyLtKPaXF2i4XRR9u6eTj5BfnLbKAU5PIVfjIS+vNYYogteQ==</Modulus><Exponent>AQAB</Exponent></RSAKeyValue>"

res = Key.activate(token="WyIyNTU1IiwiRjdZZTB4RmtuTVcrQlNqcSszbmFMMHB3aWFJTlBsWW1Mbm9raVFyRyJd",\
                   rsa_pub_key=pubKey,\
                   product_id=3349, key="ICVLD-VVSZR-ZTICT-YKGXL", machine_code=Helpers.GetMachineCode())

if res[0] == None not Helpers.IsOnRightMachine(res[0]):
    print("An error occured: {0}".format(res[1]))
else:
    print("Success")

You can read more about the way the RSA public key, etc are configured here.

Squeak answered 29/1, 2019 at 12:47 Comment(0)
C
0

First, here is a list of public Github repositories regarding obfuscation.

Among all choices out there I ended up using python-obfuscator followed by python-minifier to decrease also the script size. Be careful though to obfuscate first and then minify in order for the script to be still executable.

Calcareous answered 9/4, 2023 at 4:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.