If Python is interpreted, what are .pyc files?

12

1317

Python is an interpreted language. But why does my source directory contain .pyc files, which are identified by Windows as "Compiled Python Files"?

Pratt answered 8/6, 2010 at 14:27 Comment(5)

See #11434079 for a justification. In one word: speed. – Weswesa 19/8, 2015 at 19:49

@GregSchmit You're right that this question is not the duplicate, but MrBultitude is correct that the timing is irrelevant. "Usually a recent question will be closed as a duplicate of an older question, but this isn't an absolute rule. The general rule is to keep the question with the best collection of answers, and close the other one as a duplicate." – Gasman 3/10, 2017 at 22:45

Also see “All programs are interpreted”. How? – Tarsier 7/12, 2017 at 7:57

Does that mean that even Python is 'write once, run anywhere', just like Java? – Gingrich 28/2, 2019 at 8:30

@MrakVladar Even Java is "Write once, run anywhere [that you have a JVM]". Python is no different; it's "run anywhere you have a Python virtual machine". The big difference is that most Python implementations combine the compiler and the interpreter into one executable, rather than separating them like java and javac. – Lecia 7/11, 2019 at 16:37

768

They contain byte code, which is what the Python interpreter compiles the source to. This code is then executed by Python's virtual machine.

Python's documentation explains the definition like this:

Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run.
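To see the bytecode for yourself, here is a minimal sketch using the standard-library compile() built-in and the dis module. The source string and the "<example>" file name are made up for illustration, and the opcodes that dis prints differ between CPython versions.

```python
import dis

# A tiny, made-up piece of source code for illustration.
source = "x = 1 + 2\nprint(x)\n"

# compile() performs the source-to-bytecode step and returns a code object.
code_obj = compile(source, "<example>", "exec")

# co_code holds the raw bytecode as a bytes object.
raw_bytecode = code_obj.co_code
print(type(raw_bytecode), len(raw_bytecode))

# dis prints a human-readable listing of the bytecode instructions.
dis.dis(code_obj)

# The virtual machine executes the code object; exec() hands it over.
namespace = {}
exec(code_obj, namespace)
```

A .pyc file is essentially a small header followed by a serialized code object like code_obj.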

Lordan answered 8/6, 2010 at 14:28 Comment(14)

Interesting, thanks. So is Python considered a purely interpreted language? – Pratt 8/6, 2010 at 14:37

@froadie: a language is not "interpreted" or "compiled" as such. A specific implementation can be an interpreter or a compiler (or a hybrid or a JIT compiler). – Sparing 8/6, 2010 at 14:41

One test of 'compiled': is it compiled to actual machine instructions? Python bytecode is not machine instructions, and neither are Java 'JVM' instructions, so neither of these languages is compiled by that definition. But both are 'compiled' to an intermediate 'abstract machine' code, and both are far faster than running the program by more or less directly interpreting the source code (which is what old-school BASIC does). – Cristen 5/4, 2013 at 18:11

To be pedantic, 'compiled' means 'translated'. Python is then compiled to a bytecode. AFAIK, only Bash is really interpreted; all other popular "interpreted" languages are compiled to a bytecode. – Rausch 6/8, 2014 at 13:42

Actually, they are machine instructions, just not native machine instructions for the host's physical CPU. Hence why we call it a VM. Like Esperanto for assembly language, really. Nowadays we even have native code for fictional (but still emulated) CPUs (Mojang's effort to get the kiddies interested). Rexx was (or could be) truly interpreted, and BAT and CMD (and DCL) are interpreted. – Starfish 13/12, 2014 at 10:38

I am looking at some Python sources, and I see file.py, file.pyc and file.pyo. I need to do a quick fix without debugging. Can I just change file.py, or do I need to "compile" and regenerate everything? – Pastime 25/3, 2015 at 9:26

@Danijel: when importing a module, Python automatically detects whether the .py file has been modified and recompiles a new .pyc/.pyo as necessary. In most cases, you never need to worry about managing the .pyc/.pyo files. – Cooley 28/5, 2015 at 10:53

@Lordan - So if I have a git repository of Python files, should I ignore all *.pyc files (no need to keep them), or would it be more "efficient" to keep them around? – Ygerne 10/11, 2017 at 15:26

@GuyAvraham: Typically you want to add a line to your .gitignore to prevent it from tracking *.py[co] (and git rm any you've already committed so they don't appear if someone else clones it). The cost of tracking them in git far exceeds the benefits. The cost of compiling from source to bytecode, including all the disk I/O, is typically in the single digit millisecond range (a test of calling py_compile.compile on Python's built-in _collections_abc.py outputting to a junk file took about 8 ms for a 26 KB file). Paying that cost for a few dozen files once after cloning is trivial. – Borgia 26/2, 2019 at 20:2

@Cristen Nuitka can compile the entirety of Python to machine code via machine-dependent C++, and Cython can similarly compile Python to portable C code. – Photostat 8/8, 2019 at 16:46

@GuyAvraham more important than efficiency (.pyc files generally take milliseconds to generate), you don't want to put a file into source control which is built directly from another file or files that are also in source control, because such a file is not "source". Specific issues: (a) how will you make sure it's always consistent with the .py? (b) the format of .pyc depends on the Python release you're using; (c) whenever you merge changes from two different branches to a .py, git will want you to manually resolve the conflict on the binary .pyc file. – Cristen 1/11, 2019 at 13:47

@bfontaine, does that mean that all python *.py code is first "translated" to the *.pyc byte code and then interpreted (run to do its job)? – Hessney 17/12, 2019 at 7:2

@Hessney Yes, although that *.pyc file is not always written to disk. – Rausch 18/12, 2019 at 9:16

@Rausch .py is compiled and/or loaded when the PVM encounters an import (not "all .py code is first translated to .pyc"). – Koontz 28/8, 2020 at 16:7

1173

I've been given to understand that Python is an interpreted language...

This popular meme is incorrect, or, rather, constructed upon a misunderstanding of (natural) language levels: a similar mistake would be to say "the Bible is a hardcover book". Let me explain that simile...

"The Bible" is "a book" in the sense of being a class of (actual, physical objects identified as) books; the books identified as "copies of the Bible" are supposed to have something fundamental in common (the contents, although even those can be in different languages, with different acceptable translations, levels of footnotes and other annotations) -- however, those books are perfectly well allowed to differ in a myriad of aspects that are not considered fundamental -- kind of binding, color of binding, font(s) used in the printing, illustrations if any, wide writable margins or not, numbers and kinds of builtin bookmarks, and so on, and so forth.

It's quite possible that a typical printing of the Bible would indeed be in hardcover binding -- after all, it's a book that's typically meant to be read over and over, bookmarked at several places, thumbed through looking for given chapter-and-verse pointers, etc, etc, and a good hardcover binding can make a given copy last longer under such use. However, these are mundane (practical) issues that cannot be used to determine whether a given actual book object is a copy of the Bible or not: paperback printings are perfectly possible!

Similarly, Python is "a language" in the sense of defining a class of language implementations which must all be similar in some fundamental respects (syntax, most semantics except those parts of those where they're explicitly allowed to differ) but are fully allowed to differ in just about every "implementation" detail -- including how they deal with the source files they're given, whether they compile the sources to some lower level forms (and, if so, which form -- and whether they save such compiled forms, to disk or elsewhere), how they execute said forms, and so forth.

The classical implementation, CPython, is often called just "Python" for short -- but it's just one of several production-quality implementations, side by side with Microsoft's IronPython (which compiles to CLR codes, i.e., ".NET"), Jython (which compiles to JVM codes), and PyPy (which is written in Python itself and can compile to a huge variety of "back-end" forms including "just-in-time" generated machine language). They're all Python (=="implementations of the Python language") just like many superficially different book objects can all be Bibles (=="copies of The Bible").
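If you are unsure which implementation is running your code, a quick check along these lines may help. It is only a sketch using the standard platform and sys modules; the values in the comments are examples, not an exhaustive list.

```python
import platform
import sys

# Name of the running implementation, e.g. "CPython" or "PyPy".
impl_name = platform.python_implementation()

# sys.implementation describes the same thing in lowercase form, plus a
# cache_tag that is used in cached bytecode file names. The tag may be
# None on an implementation that does not cache bytecode to disk.
cache_tag = sys.implementation.cache_tag

print(impl_name, sys.implementation.name, cache_tag)
```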

If you're interested in CPython specifically: it compiles the source files into a Python-specific lower-level form (known as "bytecode"), does so automatically when needed (when there is no bytecode file corresponding to a source file, or the bytecode file is older than the source or compiled by a different Python version), and usually saves the bytecode files to disk (to avoid recompiling them in the future). OTOH IronPython will typically compile to CLR codes (saving them to disk or not, depending) and Jython to JVM codes (saving them to disk or not -- it will use the .class extension if it does save them).
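As a rough illustration of where CPython 3 puts those saved files, the sketch below compiles a throwaway module by hand with py_compile and asks importlib where the import system would cache it. The module name hello.py is hypothetical, and normally import does this step for you without any explicit call.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical throwaway module, created only for this demonstration.
    source_path = pathlib.Path(tmp) / "hello.py"
    source_path.write_text("GREETING = 'hello'\n")

    # Where the import system would look for cached bytecode for this file.
    # Raises NotImplementedError if the implementation has no cache tag.
    expected_pyc = importlib.util.cache_from_source(str(source_path))

    # Compile explicitly; py_compile returns the path of the file it wrote.
    written_pyc = py_compile.compile(str(source_path), doraise=True)

    pyc_exists = pathlib.Path(written_pyc).exists()
    pyc_name = pathlib.Path(written_pyc).name

# The file name embeds the implementation and version tag, which is how
# bytecode from different Python versions can sit side by side.
print(pyc_name)
```

On a default CPython 3 setup the file lands in a __pycache__ directory next to the source.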

These lower level forms are then executed by appropriate "virtual machines" also known as "interpreters" -- the CPython VM, the .Net runtime, the Java VM (aka JVM), as appropriate.
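To make the "executed by a VM" step concrete for CPython, here is a hedged sketch that reads a .pyc back from disk and runs it. It assumes the CPython 3.7+ file layout (a 16-byte header followed by a marshalled code object); the layout is an implementation detail that has changed before and may change again. The module name answer.py is made up.

```python
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical throwaway module, created only for this demonstration.
    source_path = pathlib.Path(tmp) / "answer.py"
    source_path.write_text("ANSWER = 6 * 7\n")
    pyc_path = py_compile.compile(str(source_path), doraise=True)
    data = pathlib.Path(pyc_path).read_bytes()

# The first four bytes identify the bytecode format of this Python version.
magic_matches = data[:4] == importlib.util.MAGIC_NUMBER

# Assumption: the header is 16 bytes long, as in CPython 3.7 and later.
code_obj = marshal.loads(data[16:])

# Hand the code object to the virtual machine.
namespace = {}
exec(code_obj, namespace)
print(namespace["ANSWER"])
```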

So, in this sense (what do typical implementations do), Python is an "interpreted language" if and only if C# and Java are: all of them have a typical implementation strategy of producing bytecode first, then executing it via a VM/interpreter.

More likely the focus is on how "heavy", slow, and high-ceremony the compilation process is. CPython is designed to compile as fast as possible, as lightweight as possible, with as little ceremony as feasible -- the compiler does very little error checking and optimization, so it can run fast and in small amounts of memory, which in turn lets it be run automatically and transparently whenever needed, without the user even needing to be aware that there is a compilation going on, most of the time. Java and C# typically accept more work during compilation (and therefore don't perform automatic compilation) in order to check errors more thoroughly and perform more optimizations. It's a continuum of gray scales, not a black or white situation, and it would be utterly arbitrary to put a threshold at some given level and say that only above that level you call it "compilation"!-)
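A small sketch of how little the CPython compiler checks: a reference to a name that does not exist anywhere compiles without complaint and only fails when the function is actually called, whereas a genuine syntax error is rejected at compile time. The function and variable names here are invented for the example.

```python
# The body refers to a name that is never defined anywhere.
source = "def f():\n    return undefined_name + 1\n"

# Compilation succeeds: the compiler does not resolve names.
code_obj = compile(source, "<example>", "exec")

# Running the module-level code just defines f; still no error.
namespace = {}
exec(code_obj, namespace)

# The problem only shows up when f is actually called.
try:
    namespace["f"]()
    outcome = "no error"
except NameError as exc:
    outcome = f"NameError at run time: {exc}"
print(outcome)

# Syntax errors, by contrast, are caught by the compiler.
try:
    compile("def broken(:\n    pass\n", "<example>", "exec")
    syntax_checked = False
except SyntaxError:
    syntax_checked = True
print(syntax_checked)
```

A Java or C# compiler would typically reject the equivalent of the first snippet before anything runs.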

Sonority answered 8/6, 2010 at 15:0 Comment(4)

Beautiful answer. Just a small correction to the last paragraph: Python is designed to compile as fast as possible (etc.). This time it really is the language, with its lack of static type system and stuff. When people talk about "interpreted" languages, they usually mean "dynamic" languages. – Twigg 14/12, 2015 at 21:58

@Elazar, actually, other implementations of Python, such as PyPy, which are in no hurry to compile, manage to do the more thorough analysis required by the lack of static typing and produce just-in-time compilation to machine code (thus speeding up long-running programs by many times). – Sonority 15/12, 2015 at 1:42

Where does Cython fit in here? Would you consider it a different language or is it a Python implementation? Also, is this meme of "interpreted" vs compiled perhaps just a terminology confusion because Python's VM is often referred to as its "interpreter"? It would be just as valid to call the JVM or the .NET runtime interpreters. They both mostly interpret bytecode into JIT machine code (with some caching optimization exceptions) – Albumen 6/4, 2018 at 12:28

Great answer, but I think the last paragraph could do a better job of emphasizing the main reason why Python is commonly considered an "interpreted" language: it's about the user (developer) experience - no separate build step. You execute your source code file(s) (with the Python executable as the interpreter) and immediately have a process doing what your source code says to do. And this is the default behavior - the normal or even standard behavior for Python implementations. You do vaguely gesture at this with "ceremony", but only vaguely while talking about the "work" aspect. – Caltanissetta 9/1, 2023 at 0:38