Why compile Python code?
E

10

298

Why would you compile a Python script? You can run it directly from the .py file and it works fine, so is there a performance advantage or something?

I also notice that some files in my application get compiled into .pyc while others do not, why is this?

Energy answered 22/1, 2009 at 22:57 Comment(3)
Note that, besides the faster startup of your application, you also gain some security if you can't share your code because it's a corporate secret.Oleneolenka
@PSyLoCKe You really, really don't. Python bytecode is really readable, because the compiler doesn't need to obfuscate it to optimise it. (Not that it optimises it much...)Faires
The reason some files get compiled automatically is that they are imported; for instance, if you use import mylib, Python will compile mylib.py so that future import statements run a little faster. If you later change mylib.py, it will be recompiled the next time it is imported (Python uses the file's modification date to detect this).Roswald
M
327

It's compiled to bytecode which can be used much, much, much faster.

The reason some files aren't compiled is that the main script, which you invoke with python main.py, is recompiled every time you run it. All imported modules are compiled and the result is stored on disk.

Important addition by Ben Blank:

It's worth noting that while running a compiled script has a faster startup time (as it doesn't need to be compiled), it doesn't run any faster.
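A minimal sketch of the import-time caching described in this answer, assuming a stock CPython 3 where nothing else interferes with bytecode writing. The module name mylib_demo and the helper function are made up for illustration.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def import_and_find_cache(module_name="mylib_demo"):
    """Import a throwaway module and report whether a cached .pyc appeared."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / f"{module_name}.py"
        source.write_text("ANSWER = 6 * 7\n")
        # Where CPython would cache the bytecode for this source file.
        cached = Path(importlib.util.cache_from_source(str(source)))

        old_flag = sys.dont_write_bytecode
        sys.dont_write_bytecode = False  # make sure caching is not switched off
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            return module.ANSWER, cached.exists()
        finally:
            # Undo every change made to interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
            sys.dont_write_bytecode = old_flag


answer, was_cached = import_and_find_cache()
print(answer, was_cached)
```

The script you start with python main.py goes through the same compile step, but its bytecode is only kept in memory.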

Museology answered 22/1, 2009 at 23:6 Comment(10)
It's worth noting that while running a compiled script has a faster startup time (as it doesn't need to be compiled), it doesn't run any faster.Alben
In addition to not requiring compilation, the .pyc file is almost invariably smaller. Especially if you comment a lot. One of mine is 28419 as .py, but only 17879 as .pyc -- so load time is better as well. Finally, you can precompile top level scripts this way: python -m compileall myscript.pyRoswald
Is there any difference in memory consumption? I'm testing Python on embedded devices based on mips cpu with only 64MB of RAM, so is there any advantage in memory usage when starting a compiled version of python script?Tried
@valentt: Probably not. I don't know much about the Python internals, but I don't think that parsing to bytecode takes a lot of memory in Python. I cannot think of something that needs a lot of memory to remember some state.Headset
@BenBlank would there be a difference in run speed if you are dynamically importing a module, such as at the beginning of a function. I ask because when I had a circular import between two django models in different apps (there was a many to many relationship between the two and each had properties that did aggregations and such on the other) and I am unsure whether it is all compiled and loaded into memory anyway, or if calling that function would actually take longer with uncompiled Python.Hagler
@Hagler — Python imports modules only when the actual import line is encountered, so the hit for compiling would be incurred when the function is first executed. That's only the first time after the import's code has changed, however; it won't recompile as long as an up-to-date .pyc exists for it.Alben
@BenBlank so the instant it encounters it when the function is run it will create a .pyc and always use that in future?Hagler
@Hagler — Until and unless the import's .py file changes, yes.Alben
IMHO, the Important addition should be mentioned first as the first sentence of this answer is a little bit misleadingArawak
@valentt: Even if it doesn't compile to byte code on disk (because it's the main script, or because you used a switch or environment variable to disable generation of .pyc files), it still compiles to byte code in memory, so memory-wise it makes little difference; if it loads from the source file, it briefly uses a little more memory while compiling it, then keeps the compiled bytecode either way.Fluorescent
S
102

The .pyc file is Python that has already been compiled to byte-code. Python automatically loads a .pyc file if it finds an up-to-date one corresponding to a .py module you import.

"An Introduction to Python" says this about compiled Python files:

A program doesn't run any faster when it is read from a ‘.pyc’ or ‘.pyo’ file than when it is read from a ‘.py’ file; the only thing that's faster about ‘.pyc’ or ‘.pyo’ files is the speed with which they are loaded.

The advantage of running a .pyc file is that Python doesn't have to incur the overhead of compiling it before running it. Since Python would compile to byte-code before running a .py file anyway, there shouldn't be any performance improvement aside from that.

How much improvement can you get from using compiled .pyc files? That depends on what the script does. For a very brief script that simply prints "Hello World," compiling could constitute a large percentage of the total startup-and-run time. But the cost of compiling a script relative to the total run time diminishes for longer-running scripts.

The script you name on the command-line is never saved to a .pyc file. Only modules loaded by that "main" script are saved in that way.
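To get a feel for the "only loading is faster" point, here is a rough sketch that times compiling a source string against loading the already-compiled code object with marshal (the serialization CPython uses for the body of a .pyc). The generated source is synthetic test data, and the timings will vary by machine.

```python
import marshal
import time

# Synthetic source: many trivial assignments, purely for illustration.
source = "\n".join(f"x{i} = {i} * 2" for i in range(5000))

start = time.perf_counter()
code_obj = compile(source, "<generated>", "exec")  # the cost paid for a bare .py
compile_seconds = time.perf_counter() - start

blob = marshal.dumps(code_obj)  # roughly what a .pyc stores after its header

start = time.perf_counter()
loaded = marshal.loads(blob)  # the cost paid for an up-to-date .pyc
load_seconds = time.perf_counter() - start

# Both code objects run the same bytecode, so execution speed is identical.
ns_compiled, ns_loaded = {}, {}
exec(code_obj, ns_compiled)
exec(loaded, ns_loaded)

print(f"compile: {compile_seconds:.4f}s  load: {load_seconds:.4f}s")
```

Once loaded, the two code objects are equivalent, which is why the running program is no faster either way.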

Skyscraper answered 22/1, 2009 at 23:14 Comment(1)
In many cases it's hard to see a difference, but I have a particular python file with over 300,000 lines. (It's a bunch of math calculations generated by another script for testing) It takes 37 seconds to compile, and only 2 seconds to execute.Remediosremedy
R
75

Pluses:

First: mild, defeatable obfuscation.

Second: if compilation results in a significantly smaller file, you will get faster load times. Nice for the web.

Third: Python can skip the compilation step. Faster at initial load. Nice for the CPU and the web.

Fourth: the more you comment, the smaller the .pyc or .pyo file will be in comparison to the source .py file.

Fifth: an end user with only a .pyc or .pyo file in hand is much less likely to present you with a bug they caused by an un-reverted change they forgot to tell you about.

Sixth: if you're aiming at an embedded system, obtaining a smaller file to embed may be a significant plus, and the target environment is fixed, so drawback one, detailed below, does not come into play.
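The size claim for heavily commented files is easy to check. This is a small sketch using the standard py_compile module; the file names and the helper function are made up for the example.

```python
import os
import py_compile
import tempfile


def source_and_pyc_sizes(comment_lines=200):
    """Write a comment-heavy module, compile it, return (py_size, pyc_size)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "commented.py")
        pyc = os.path.join(tmp, "commented.pyc")
        with open(src, "w") as f:
            for i in range(comment_lines):
                f.write(f"# explanatory comment number {i}, never seen by the VM\n")
            f.write("def double(x):\n    return x * 2\n")
        # cfile= places the output next to the source instead of in __pycache__.
        py_compile.compile(src, cfile=pyc, doraise=True)
        return os.path.getsize(src), os.path.getsize(pyc)


py_size, pyc_size = source_and_pyc_sizes()
print(py_size, pyc_size)
```

For code with few comments the difference can be much smaller, or even go the other way.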

Top level compilation

It is useful to know that you can compile a top level python source file into a .pyc file this way:

python -m py_compile myscript.py

This removes comments. It leaves docstrings intact. If you'd like to get rid of the docstrings as well (you might want to seriously think about why you're doing that) then compile this way instead...

python -OO -m py_compile myscript.py

...and you'll get a .pyo file instead of a .pyc file (Python 3.5 and later write an .opt-2.pyc file instead); equally distributable in terms of the code's essential functionality, but smaller by the size of the stripped-out docstrings (and less easily understood for subsequent employment if it had decent docstrings in the first place). But see drawback three, below.

Note that python uses the .py file's date, if it is present, to decide whether it should execute the .py file as opposed to the .pyc or .pyo file, so edit your .py file, and the .pyc or .pyo is obsolete and whatever benefits you gained are lost. You need to recompile it in order to get the .pyc or .pyo benefits back again, such as they may be.
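The date check works because the compiled file records details of its source. A sketch that reads them back, assuming CPython 3.7 or later, where the header is 16 bytes (magic number, flags, source mtime, source size); the helper name is made up.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def inspect_pyc_header():
    """Compile a tiny module and compare its .pyc header with the source file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        pyc = os.path.join(tmp, "mod.pyc")
        with open(src, "w") as f:
            f.write("VALUE = 1\n")
        # Force timestamp-based invalidation so the header layout is predictable.
        py_compile.compile(
            src,
            cfile=pyc,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        with open(pyc, "rb") as f:
            header = f.read(16)
        magic, flags, mtime, size = struct.unpack("<4sIII", header)
        st = os.stat(src)
        return {
            "magic_matches": magic == importlib.util.MAGIC_NUMBER,
            "flags": flags,
            "mtime_matches": mtime == (int(st.st_mtime) & 0xFFFFFFFF),
            "size_matches": size == (st.st_size & 0xFFFFFFFF),
        }


header_info = inspect_pyc_header()
print(header_info)
```

If the stored mtime or size no longer matches the .py file, the interpreter recompiles. The magic number is tied to the interpreter's bytecode version, so a .pyc from a different Python version is rejected.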

Drawbacks:

First: There's a "magic cookie" (magic number) at the start of .pyc and .pyo files that identifies the bytecode format of the Python version that compiled the file, not the system architecture. If you distribute one of these files into an environment running a different Python version, it will break. If you distribute the .pyc or .pyo without the associated .py to recompile or touch so it supersedes the .pyc or .pyo, the end user can't fix it, either.

Second: If docstrings are skipped with the use of the -OO command line option as described above, no one will be able to get at that information, which can make use of the code more difficult (or impossible.)

Third: Python's -OO option also implements some optimizations as per the -O command line option; this may result in changes in operation. Known optimizations are:

  • sys.flags.optimize = 1 (2 with -OO)
  • assert statements are skipped
  • __debug__ = False

Fourth: if you had intentionally made your python script executable with something on the order of #!/usr/bin/python on the first line, this is stripped out in .pyc and .pyo files and that functionality is lost.

Fifth: with option -O, as well as -OO, assert statements are not compiled in, eliminating a source of runtime validation. You can compensate by checking the condition explicitly and raising an exception (if ...: raise ...), but this requires abandoning the assert statement in anything that will be compiled this way.

Sixth: somewhat obvious, but if you compile your code, not only can its use be impacted, but the potential for others to learn from your work is reduced, often severely.
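The effect of the optimization levels on asserts and docstrings can be tried without leaving the interpreter. This sketch uses the optimize argument of the built-in compile() as a stand-in for the -O and -OO switches; the function names are invented for the example.

```python
SOURCE = '''
def half_assert(n):
    """Return n // 2 for even n (validated with assert)."""
    assert n % 2 == 0, "n must be even"
    return n // 2

def half_explicit(n):
    """Return n // 2 for even n (validated with an explicit check)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    return n // 2
'''


def build(optimize):
    """Compile SOURCE at the given level: 0 = default, 1 = -O, 2 = -OO."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace


def outcome(func, arg):
    """Return the result, or the name of the exception that was raised."""
    try:
        return func(arg)
    except Exception as exc:
        return type(exc).__name__


report = {}
for level in (0, 1, 2):
    ns = build(level)
    report[level] = {
        "assert_version": outcome(ns["half_assert"], 3),
        "explicit_version": outcome(ns["half_explicit"], 3),
        "has_docstring": ns["half_assert"].__doc__ is not None,
    }

for level, row in report.items():
    print(level, row)
```

The explicit check behaves the same at every level, while the assert version silently accepts bad input once optimization is on.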

Roswald answered 23/4, 2014 at 22:26 Comment(2)
"the more you comment, the smaller" ;))Cleric
A colloquialism. ;) Guilty.Roswald
C
19

Something not touched upon is source-to-source compiling. For example, Nuitka translates Python code to C/C++, and compiles it to binary code which runs directly on the CPU, instead of Python bytecode which runs on the slower virtual machine.

This can lead to significant speedups, or it would let you work with Python while your environment depends on C/C++ code.

Corollaceous answered 23/12, 2017 at 19:30 Comment(0)
E
11

There is a performance increase in running compiled Python. However, when you import a .py file as a module, Python will compile and store it, and as long as the .py file does not change it will always use the compiled version.

With any interpreted language, when the file is used the process looks something like this:
1. File is processed by the interpreter.
2. File is compiled.
3. Compiled code is executed.

Obviously, by using pre-compiled code you can eliminate step 2; this applies to Python, PHP and others.
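The three steps can be walked through by hand with the standard library. This is only an illustration of the stages, not how the interpreter is implemented internally; the source string is made up.

```python
import ast

source = "result = sum(range(10))\n"

# Step 1: the source text is parsed into a syntax tree.
tree = ast.parse(source, filename="<example>")

# Step 2: the tree is compiled into a bytecode code object.
code_obj = compile(tree, "<example>", "exec")

# Step 3: the code object is executed by the virtual machine.
namespace = {}
exec(code_obj, namespace)

print(namespace["result"])
```

Loading a .pyc amounts to starting at step 3 with a code object read from disk.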

Here's an interesting blog post explaining the differences http://julipedia.blogspot.com/2004/07/compiled-vs-interpreted-languages.html
And here's an entry that explains the Python compile process http://effbot.org/zone/python-compile.htm

Enki answered 22/1, 2009 at 23:3 Comment(0)
I
9

As already mentioned, you can get a performance increase from having your python code compiled into bytecode. This is usually handled by python itself, for imported scripts only.

Another reason you might want to compile your Python code could be to protect your intellectual property from being copied and/or modified.

You can read more about this in the Python documentation.

Indorse answered 22/1, 2009 at 23:9 Comment(4)
In regards to protecting your code - compiling won't help a whole lot. Compiling obfuscates - but someone with the desire will get your code regardless.Thermodynamic
@josh that is always possible, if one can access the memory or watch the instructions to the cpu, with enough time and will they can re construct your app.Enki
Agreed, however as Unkwntech said, that will always be possible, if the person is determined enough. But I'm convinced it will suffice in most situations, where you typically just want to restrict people from "fixing" your code...Indorse
Languages that are compiled to bytecode are generally not all that hard to reverse-compile unless you take extra steps to obfuscate them - merely compiling generally won't be sufficient.Fricassee
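To illustrate how little bytecode hides, here is a short sketch using the standard dis and marshal modules. The function and its "secret" string are invented for the example.

```python
import dis
import marshal


def check_password(candidate):
    return candidate == "s3cret-value"


# Round-trip through marshal, the format used for the body of a .pyc file.
blob = marshal.dumps(check_password.__code__)
recovered = marshal.loads(blob)

# String literals survive compilation unchanged.
print(recovered.co_consts)

# The instruction stream is easy to list and read.
opnames = [ins.opname for ins in dis.get_instructions(recovered)]
print(opnames)
```

Anyone holding the .pyc can do the same, and decompilers go further by reconstructing source-like code.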
U
7

There's certainly a performance difference when running a compiled script. If you run normal .py scripts, the machine compiles them every time they are run and this takes time. On modern machines this is hardly noticeable, but as the script grows it may become more of an issue.

Undercharge answered 22/1, 2009 at 23:2 Comment(0)
B
4

We use compiled code to distribute to users who do not have access to the source code. Basically to stop inexperienced programmers accidentally changing something or fixing bugs without telling us.

Bloke answered 15/6, 2015 at 16:14 Comment(0)
A
2

Yep, performance is the main reason and, as far as I know, the only reason.

If some of your files aren't getting compiled, maybe Python isn't able to write to the .pyc file, perhaps because of the directory permissions or something. Or perhaps the uncompiled files just aren't ever getting loaded... (scripts/modules only get compiled when they first get loaded)
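A quick way to check the usual suspects is sketched below. The helper pyc_diagnosis is hypothetical (not part of any library); it only reports facts that the standard library exposes.

```python
import importlib.util
import os
import sys


def pyc_diagnosis(source_path):
    """Report facts that affect whether a .pyc gets written for source_path."""
    cache_path = importlib.util.cache_from_source(source_path)
    # __pycache__ may not exist yet, so walk up to the nearest real directory.
    directory = os.path.dirname(cache_path)
    while directory and not os.path.isdir(directory):
        parent = os.path.dirname(directory)
        if parent == directory:
            break
        directory = parent
    return {
        "cache_path": cache_path,
        "writing_disabled": sys.dont_write_bytecode,
        "directory_writable": os.access(directory or ".", os.W_OK),
        "cache_exists": os.path.exists(cache_path),
    }


print(pyc_diagnosis(os.path.join(os.getcwd(), "some_module.py")))
```

If writing_disabled is true (for example via the -B switch or PYTHONDONTWRITEBYTECODE) or the directory is not writable, imports still work but nothing is cached.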

Atlas answered 22/1, 2009 at 23:3 Comment(0)
S
1

Beginners assume Python is compiled because of .pyc files. The .pyc file is the compiled bytecode, which is then interpreted. So if you've run your Python code before and have the .pyc file handy, it will start faster the second time, as it doesn't have to re-compile the bytecode.

compiler: A compiler is a piece of code that translates the high level language into machine language

Interpreters: Interpreters also convert the high level language into machine readable binary equivalents. Each time an interpreter gets high level language code to execute, it converts the code into an intermediate code before converting it into machine code. Each part of the code is interpreted and then executed separately in sequence; if an error is found in a part of the code, interpretation stops without translating the remaining code.

Sources: http://www.toptal.com/python/why-are-there-so-many-pythons http://www.engineersgarage.com/contribution/difference-between-compiler-and-interpreter

Squally answered 8/1, 2014 at 14:13 Comment(2)
Your definition of "compiler" is incorrect. A compiler has never been required to compile to machine code. A compiler is merely a translator from one language to another. This is why we say that Python "compiles" to bytecode, Coffeescript "compiles" to Javascript, and so on and so forth.Mistral
I always use the term „compilation“ to refer to translation to a lower level language and „transpilation“ for translation between languages of similar level.Mezzotint
