Option 1:
You can always create a code object from the module source code using compile. To do that within the module itself:
myvar = 1
print('hello from top level')
def myfunction():
    print('hello from function')
import inspect, sys
c = compile(inspect.getsource(sys.modules[__name__]), "mymodule", "exec")
print(c)
print(c.co_consts[1].upper())
Output:
hello from top level
<code object <module> at 0xcafef00d, file "mymodule", line 1>
HELLO FROM TOP LEVEL
This is probably your most sensible option, assuming the original source code still exists and is available to inspect.
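For comparison, the same idea can be applied from outside the module. This is only a minimal sketch and not part of the approach above: the helper name code_from_source is made up for illustration, and textwrap stands in for any imported module whose source is readable. The result is a freshly compiled code object, not the one that actually ran at import time.

```python
import inspect
import textwrap  # stand-in: any imported module with readable source would do
import types


def code_from_source(module):
    """Recompile a module's source into a new code object (hypothetical helper)."""
    # inspect.getsource raises if the source cannot be located.
    source = inspect.getsource(module)
    filename = getattr(module, "__file__", None) or "<unknown>"
    return compile(source, filename, "exec")


module_code = code_from_source(textwrap)

# Functions defined at the module's top level appear as nested code objects
# among the module code object's constants.
nested_names = [
    const.co_name
    for const in module_code.co_consts
    if isinstance(const, types.CodeType)
]
print(module_code.co_name)
print("dedent" in nested_names)
```

Because the source is recompiled, this fails in exactly the cases mentioned above, i.e. whenever the original source is no longer available.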
Option 2:
To access the identical code object of the module itself, rather than recreating a new one, I've found an "off-label" way by intentionally causing an exception within the module and handling it:
myvar = 1
print('hello from top level')
def myfunction():
    print('hello from function')
try:
    errorerrorerror
except NameError as e:
    c = e.__traceback__.tb_frame.f_code
print(c)
print(c.co_consts[:2])
Output:
hello from top level
<code object <module> at 0xdeadbeef, file "/path/to/mymodule.py", line 1>
(1, 'hello from top level')
This is sort of a diabolical trick and I wouldn't be surprised if it breaks on other implementations of Python, or even in future versions of CPython. But it works in some additional cases where inspecting the source code will not. See also Marcin's answer, which does something similar using inspect.currentframe().
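To check that the frame-based approaches really hand back the code object being executed, rather than a recompiled copy, here is a small sketch. It is an illustration only: it runs a stand-in "module" through exec so that the code object is known up front, and the names SOURCE, module_code and namespace are made up for the demo. It relies on CPython's frame introspection.

```python
SOURCE = '''
import inspect

# Variant from Marcin's answer: ask for the current frame directly.
via_currentframe = inspect.currentframe().f_code

# Variant from Option 2: provoke a NameError and read the frame off the traceback.
try:
    errorerrorerror
except NameError as e:
    via_traceback = e.__traceback__.tb_frame.f_code
'''

# Compile once, then execute that exact code object with a fresh namespace,
# which is roughly what happens to a module body at import time.
module_code = compile(SOURCE, "<demo module>", "exec")
namespace = {}
exec(module_code, namespace)

# Identity checks: both variants should return the object passed to exec.
print(namespace["via_currentframe"] is module_code)
print(namespace["via_traceback"] is module_code)
```

On CPython both checks print True, which is the difference from Option 1: no second code object is created.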
Option 3:
The import system caches the bytecode of an imported source file when creating the module instance, so that it doesn't need to be recompiled unless necessary. Option 3 involves loading the bytecode of an imported module from that cache. It only works for modules that can be imported, i.e. it won't work for scripts (files executed as __main__, like python myfile.py), nor in several other situations where the cached bytecode is unavailable for whatever reason (e.g. PYTHONDONTWRITEBYTECODE is enabled, Python was run with the -B option, the filesystem for laying down a cached .pyc file was read-only, etc.).
In your file:
# mymodule.py
myvar = 1
print('hello from top level')
def myfunction():
    print('hello from function')
From "outside":
>>> import mymodule
hello from top level
>>> import marshal
>>> with open(mymodule.__cached__, 'rb') as f:
...     f.read(16)
...     c = marshal.load(f)
...
b'\xcb\r\r\n\x00\x00\x00\x00\xe3\xa7|ej\x00\x00\x00'
>>> c.co_consts
(1, 'hello from top level', <code object myfunction at 0xdefeca7e, file "/path/to/mymodule.py", line 5>, None)
The statement import mymodule will compile the module and lay down a bytecode cache at /path/to/__pycache__/mymodule.cpython-312.pyc if a valid cache didn't already exist there, or if the existing cache was stale (i.e. the source has been modified since). This bytecode filename is specific to the Python minor version and implementation, so the filename will be different if you're not on CPython 3.12. It may also be in a different location if PYTHONPYCACHEPREFIX was set.
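The cache location can also be computed without importing anything, using importlib.util.cache_from_source. A minimal sketch, assuming the example path /path/to/mymodule.py used in this answer (nothing needs to exist at that path for the name to be computed):

```python
import importlib.util
import os
import sys

source_path = os.path.join(os.sep, "path", "to", "mymodule.py")

# optimization="" asks for the plain, non-optimized cache file name.
# This call raises NotImplementedError on implementations without a cache tag.
cache_path = importlib.util.cache_from_source(source_path, optimization="")

print(sys.implementation.cache_tag)  # implementation and version tag in the file name
print(cache_path)
print(os.path.exists(cache_path))  # whether a cache file is actually present
```

If sys.pycache_prefix is set (for example via PYTHONPYCACHEPREFIX), the returned path lies under that prefix instead of in a __pycache__ directory next to the source.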
Those 16 bytes "discarded" with f.read(16) are the .pyc header:
- 4 bytes magic number, indicating the Python minor version.
- 4 bytes flags, specifying the bytecode invalidation mode.
- Either a serialized version of the original .py file's info (4 bytes mtime + 4 bytes size) if using the TIMESTAMP invalidation mode (which is the default), otherwise an 8-byte SipHash of the original .py file's source code.
The marshaled code object follows the .pyc header. It can be easily deserialized into a code object as demonstrated.
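The 16-byte header can be decoded field by field instead of being discarded. The sketch below is an illustration only: it writes a throwaway copy of mymodule.py to a temporary directory and compiles it with py_compile in TIMESTAMP mode, so that a .pyc exists regardless of interpreter settings. The helper name read_pyc is hypothetical.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

SOURCE = (
    "myvar = 1\n"
    "print('hello from top level')\n"
    "def myfunction():\n"
    "    print('hello from function')\n"
)


def read_pyc(pyc_path):
    """Split a .pyc file into its 16-byte header fields and its code object."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    header = {"magic": data[:4], "flags": int.from_bytes(data[4:8], "little")}
    if header["flags"] & 0b1:
        # Hash-based pyc: the last 8 header bytes are a hash of the source.
        header["source_hash"] = data[8:16]
    else:
        # Timestamp-based pyc: 4 bytes mtime followed by 4 bytes source size.
        header["mtime"] = int.from_bytes(data[8:12], "little")
        header["size"] = int.from_bytes(data[12:16], "little")
    # Everything after the header is the marshaled module code object.
    return header, marshal.loads(data[16:])


with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "mymodule.py")
    with open(source_path, "wb") as f:
        f.write(SOURCE.encode("utf-8"))
    # py_compile only compiles; it does not execute the module body.
    pyc_path = py_compile.compile(
        source_path,
        cfile=os.path.join(tmp, "mymodule.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    header, code = read_pyc(pyc_path)

print(header)
print(header["magic"] == importlib.util.MAGIC_NUMBER)
print(code.co_consts)
```

The magic number is compared against importlib.util.MAGIC_NUMBER because a .pyc written by a different Python version cannot be assumed to unmarshal correctly.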
__main__, but for "normal" modules, you can read and unmarshal their __cached__ file to get a code object. Demo one-liner: import datetime; import io; import marshal; import pathlib; print(marshal.load(io.BytesIO(pathlib.Path(datetime.__cached__).read_bytes()[16:])).co_consts) – Hypothesize
sys._getframe(0).f_code would get the top script. I'm not sure about other modules. Once a module import completes, its code block will never run again. It would be reasonable for Python to discard it. – Euripus
inspect.currentframe().f_code, as already demonstrated in Marcin's answer. – Taxiplane
sys._getframe(0) refers to the top of the stack, i.e. the current frame. I wouldn't call it a shortcut, I'd call it an implementation detail. – Taxiplane
__main__ particularly), or afterwards (as makes sense for functions whose __code__ is set when they are created)? – Langouste