In Python is there a way to get the code object of top level code?

Is it possible to get the code object of top level code within a module? For example, if you have a python file like this:

myvar = 1
print('hello from top level')

def myfunction():
    print('hello from function')

and you want to access the code object for myfunction, then you can use myfunction.__code__. For example, myfunction.__code__.co_consts contains the string 'hello from function', among other constants.

Is there a way to get the code object for the top level code? That is, for the code:

myvar = 1

print('hello from top level')

I would like something like __main__.__code__.co_consts that will contain 'hello from top level', but I cannot find any way to get this. Does such a thing exist?

Decarbonate answered 15/12, 2023 at 18:20 Comment(8)
I'm not sure about __main__, but for "normal" modules, you can read and unmarshal their __cached__ file to get a code object. Demo one-liner: import datetime; import io; import marshal; import pathlib; print(marshal.load(io.BytesIO(pathlib.Path(datetime.__cached__).read_bytes()[16:])).co_consts) – Hypothesize
Interesting information here. In CPython, sys._getframe(0).f_code would get the top script. I'm not sure about other modules. Once a module import completes, its code block will never run again. It would be reasonable for Python to discard it. – Euripus
@Euripus That is just an undocumented way to do inspect.currentframe().f_code, as already demonstrated in Marcin's answer. – Taxiplane
@Taxiplane - depends on where you are in the stack. You could do currentframe, and then get its outer frames and then grab the one you want from there. This is a shortcut. – Euripus
@Euripus It's the same thing. The "0" in sys._getframe(0) refers to the top of the stack, i.e. the current frame. I wouldn't call it a shortcut, I'd call it an implementation detail. – Taxiplane
@win - oh, got that backwards. The answer I highlight showed both ways. The whole thing is an implementation detail. Even having a frame object at all is an implementation detail. – Euripus
Are you asking about getting this while the module is executing (as makes sense for __main__ particularly), or afterwards (as makes sense for functions whose __code__ is set when they are created)? – Langouste
Yes, I was asking about getting this while the module is executing. sys._getframe(0).f_code and Marcin's answer both worked for me. Thank you all! – Decarbonate
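
For anyone who wants to try the sys._getframe(0).f_code suggestion from the comments without creating a file, here is a minimal sketch. It assumes CPython (sys._getframe is an implementation detail, and inspect.currentframe() may return None elsewhere). SNIPPET, namespace and the variable names inside the snippet are made up for illustration, and exec of a compiled string stands in for running a real module.

```python
# Hypothetical stand-in for a module's top-level code.
SNIPPET = """
import inspect, sys
greeting = 'hello from top level'
via_getframe = sys._getframe(0).f_code       # CPython-specific shortcut
via_inspect = inspect.currentframe().f_code  # same frame, via inspect
"""

# Compile once so we hold the exact code object that will be executed.
code = compile(SNIPPET, "<snippet>", "exec")
namespace = {}
exec(code, namespace)

# Both lookups should hand back the very code object that is running.
print(namespace["via_getframe"] is code)
print(namespace["via_inspect"] is code)
print("hello from top level" in code.co_consts)
```

On CPython all three lines should print True, which matches the point made above that both spellings refer to the current frame.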

A module's top-level code is not exposed through an attribute the way a function's code is through __code__. It is compiled into a code object and executed once, when the module is imported or run, and the module object keeps no reference to it afterwards.

While that code is still running, however, its code object is reachable through the frame that is executing it. When Python runs a script, it first compiles it to bytecode and stores that in a code object. The top-level code (the __main__ module) has a code object too, but it is not directly exposed, so you need to use the inspect module to dig deeper:

import inspect

def get_top_level_code_object():
    frame = inspect.currentframe()

    # Go back to the top-level frame
    while frame.f_back:
        frame = frame.f_back

    # The code object is stored in f_code
    return frame.f_code

if __name__ == "__main__":
    top_level_code_obj = get_top_level_code_object()
    print(top_level_code_obj.co_consts) 

Running this as a script prints something like:

(0, None, <code object get_top_level_code_object at 0x7f970ad658f0, file "/tmp/test.py", line 3>, '__main__')
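
One caveat with walking f_back all the way: the outermost frame is typically the entry script, even if the helper is called from an imported module. If what you want is the module that is currently executing, a variation is to stop at the first frame whose code is named '<module>'. The following is only a sketch under that assumption; nearest_module_code, SOURCE and ns are hypothetical names, and exec of a compiled string stands in for a real module.

```python
import inspect


def nearest_module_code():
    """Return the code object of the closest enclosing '<module>' frame, or None."""
    frame = inspect.currentframe()  # may be None on non-CPython implementations
    try:
        while frame is not None:
            if frame.f_code.co_name == "<module>":
                return frame.f_code
            frame = frame.f_back
        return None
    finally:
        del frame  # drop the frame reference to avoid a reference cycle


# Hypothetical "module" whose top-level code asks for its own code object.
SOURCE = "marker = 'hello from top level'\nfound = nearest_module_code()\n"
code = compile(SOURCE, "<demo module>", "exec")
ns = {"nearest_module_code": nearest_module_code}
exec(code, ns)

print(ns["found"] is code)
print("hello from top level" in ns["found"].co_consts)
```

The check relies on compile(..., "exec") naming the resulting code object '<module>', which is how CPython labels module-level code.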
Paphian answered 15/12, 2023 at 18:51 Comment(0)

Option 1:

You can always create a code object from the module source code using compile. To do that within the module itself:

myvar = 1
print('hello from top level')

def myfunction():
    print('hello from function')

import inspect, sys
c = compile(inspect.getsource(sys.modules[__name__]), "mymodule", "exec")
print(c)
print(c.co_consts[1].upper())

Output:

hello from top level
<code object <module> at 0xcafef00d, file "mymodule", line 1>
HELLO FROM TOP LEVEL

This is probably your most sensible option, assuming the original source code still exists and is available to inspect.
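
A small sketch of what compile gives you, using a literal string in place of inspect.getsource so it runs anywhere (SOURCE, strings and nested are names made up here). Note that compile only builds the code object and does not run it, and that the exact layout of co_consts varies between Python versions, so this looks constants up by type rather than by index.

```python
import types

# Stand-in for the module source that inspect.getsource would return.
SOURCE = '''\
myvar = 1
print('hello from top level')

def myfunction():
    print('hello from function')
'''

# Build a fresh code object; nothing is executed, so nothing is printed yet.
code = compile(SOURCE, "mymodule", "exec")

# String constants of the top-level code.
strings = [c for c in code.co_consts if isinstance(c, str)]

# Nested code objects, e.g. the body of myfunction.
nested = [c for c in code.co_consts if isinstance(c, types.CodeType)]

print("hello from top level" in strings)
print(nested[0].co_name)
print("hello from function" in nested[0].co_consts)
```

The function's own constants live in its nested code object, not in the module-level co_consts, which is why the second string is looked up there.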

Option 2:

To access the identical code object of the module itself, rather than recreating a new one, I've found an "off-label" way by intentionally causing an exception within the module and handling it:

myvar = 1
print('hello from top level')

def myfunction():
    print('hello from function')

try:
    errorerrorerror
except NameError as e:
    c = e.__traceback__.tb_frame.f_code

print(c)
print(c.co_consts[:2])

Output:

hello from top level
<code object <module> at 0xdeadbeef, file "/path/to/mymodule.py", line 1>
(1, 'hello from top level')

This is sort of a diabolical trick and I wouldn't be surprised if it breaks on other implementations of Python, or even in future versions of CPython. But it works in some additional cases where inspecting the source code will not, for example when the original source is no longer available. See also Marcin's answer, which does something similar using inspect.currentframe().
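
To illustrate the "identical code object" claim, here is a hedged sketch that runs the same trick inside a compiled string, so the result can be compared against the object that was actually executed. SOURCE, ns, captured and recompiled are illustrative names only, and the behaviour shown is what I'd expect on CPython.

```python
# Hypothetical module source using the deliberate NameError trick.
SOURCE = """
myvar = 1
try:
    errorerrorerror
except NameError as e:
    captured = e.__traceback__.tb_frame.f_code
"""

code = compile(SOURCE, "<demo>", "exec")
ns = {}
exec(code, ns)

# Compiling the same source again yields a separate code object.
recompiled = compile(SOURCE, "<demo>", "exec")

print(ns["captured"] is code)  # the object that actually ran
print(recompiled is code)      # a fresh object, not the running one
```

This is the practical difference from Option 1: recompiling gives an equivalent but distinct code object, while the traceback's frame points at the one being executed.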

Option 3:

The import system caches the bytecode of an imported source file when creating the module instance, so that it doesn't need to be recompiled unless necessary. Option 3 involves loading the bytecode of an imported module from that cache. It only works for modules that can be imported, i.e. it won't work for scripts (files executed as __main__, like python myfile.py). It also won't work in other situations where the cached bytecode is unavailable, e.g. PYTHONDONTWRITEBYTECODE is set, Python was run with the -B option, or the directory where the cached .pyc file would be written is read-only.

In your file:

# mymodule.py
myvar = 1
print('hello from top level')

def myfunction():
    print('hello from function')

From "outside":

>>> import mymodule
hello from top level
>>> import marshal
>>> with open(mymodule.__cached__, 'rb') as f:
...     f.read(16)
...     c = marshal.load(f)
... 
b'\xcb\r\r\n\x00\x00\x00\x00\xe3\xa7|ej\x00\x00\x00'
>>> c.co_consts
(1, 'hello from top level', <code object myfunction at 0xdefeca7e, file "/path/to/mymodule.py", line 5>, None)

The statement import mymodule compiles the module and writes a bytecode cache to /path/to/__pycache__/mymodule.cpython-312.pyc, unless a valid cache already exists there. An existing cache is rebuilt if it is stale, i.e. the source has been modified since it was written. The bytecode filename is specific to the Python implementation and minor version, so it will differ if you're not on CPython 3.12. The cache may also be in a different location if PYTHONPYCACHEPREFIX is set.
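
If you'd rather not guess the cache filename, importlib.util.cache_from_source computes it for the running interpreter. A minimal sketch follows, reusing the placeholder path from above; the file does not need to exist, since the function only builds the path. On an implementation where sys.implementation.cache_tag is None it raises NotImplementedError.

```python
import importlib.util
import sys

# Placeholder source path; nothing is read from disk here.
source_path = "/path/to/mymodule.py"

# Where this interpreter would look for (or write) the cached bytecode.
cache_path = importlib.util.cache_from_source(source_path)

print(sys.implementation.cache_tag)  # e.g. an implementation-and-version tag
print(cache_path)
```

For an imported module, mymodule.__cached__ already holds this value, so the helper is mainly useful when you only have the source path.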

Those 16 bytes "discarded" with f.read(16) are the .pyc header:

  • 4 bytes magic number, indicating the Python minor version.
  • 4 bytes flags, specifying the bytecode invalidation mode.
  • 8 bytes that depend on the invalidation mode: either a serialized version of the original .py file's info (4 bytes mtime + 4 bytes size) if using the TIMESTAMP invalidation mode (which is the default), otherwise an 8-byte SipHash of the original .py file's source code.

The marshaled code object follows the .pyc header. It can be easily deserialized into a code object as demonstrated.
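
Instead of discarding the header, you can decode it. The sketch below assumes the 16-byte layout described above (as used since Python 3.7) and writes a throwaway module to a temporary directory so that it is self-contained. read_pyc and the variable names are hypothetical helpers, and py_compile is used here in place of a real import to produce the .pyc.

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile


def read_pyc(path):
    """Split a .pyc file into its header fields and its code object."""
    with open(path, "rb") as f:
        header = f.read(16)
        code = marshal.load(f)
    magic = header[:4]
    (flags,) = struct.unpack("<I", header[4:8])
    if flags & 0x1:
        # Hash-based pyc: the remaining 8 bytes are the source hash.
        info = {"source_hash": header[8:16]}
    else:
        # Timestamp-based pyc: 4 bytes mtime followed by 4 bytes size.
        mtime, size = struct.unpack("<II", header[8:16])
        info = {"mtime": mtime, "size": size}
    return magic, flags, info, code


SOURCE = "myvar = 1\nprint('hello from top level')\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mymodule.py")
    with open(src, "w", encoding="utf-8", newline="\n") as f:
        f.write(SOURCE)
    # Compile without importing, forcing the default timestamp mode.
    pyc = py_compile.compile(
        src,
        cfile=os.path.join(tmp, "mymodule.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    magic, flags, info, code = read_pyc(pyc)

print(magic == importlib.util.MAGIC_NUMBER)
print(flags, info)
print(code.co_consts)
```

Because py_compile only compiles, the module's print call never runs here; the string shows up only inside co_consts.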

Taxiplane answered 15/12, 2023 at 18:49 Comment(0)
