How to create a code object in python?

I'd like to create a new code object with types.CodeType().
There is almost no documentation about this, and what exists says it is "not for the faint of heart".
Please tell me what I need and give me some information about each argument passed to types.CodeType,
ideally with an example.

Note:
In normal use cases you will just need the builtin function compile()
You should use types.CodeType() only if you want to create new instructions that couldn't be obtained writing normal source code and that require direct access to bytecode.
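
For comparison, here is a minimal sketch of the usual route mentioned in the note: compile() builds a code object from source text and exec() runs it. The source string, the '<example>' filename label and the variable names are made up for illustration.

```python
import types

# Hypothetical source text; any valid Python source would do.
source = "result = a + b"

# compile() returns a code object; "<example>" is an arbitrary filename label.
code = compile(source, "<example>", "exec")

# Run the code object in a namespace that supplies the names it needs.
namespace = {"a": 2, "b": 3}
exec(code, namespace)

print(isinstance(code, types.CodeType))
print(namespace["result"])
```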

Ludovico answered 17/4, 2013 at 15:40 Comment(4)
Why are you trying this? It might be easier to accomplish via other means... - Dole
#6612949 - Jaws
FWIW, if it's not mentioned in the language reference, that means that the arguments it expects could be implementation dependent. - Dole
One wouldn't normally directly run the constructor. Instead, one would write code (or construct an Abstract Syntax Tree from AST nodes), then use compile(). - Princedom

–––––––––––
Disclaimer :
Documentation in this answer is not official and may be incorrect.

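This answer is valid only for Python 3.x up to 3.7; later versions changed the constructor's argument list (3.8, for example, added posonlyargcount after argcount).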

–––––––––––

In order to create a code object you have to pass to the function CodeType() the following arguments:

CodeType(
        argcount,             #   integer
        kwonlyargcount,       #   integer
        nlocals,              #   integer
        stacksize,            #   integer
        flags,                #   integer
        codestring,           #   bytes
        consts,               #   tuple
        names,                #   tuple
        varnames,             #   tuple
        filename,             #   string
        name,                 #   string
        firstlineno,          #   integer
        lnotab,               #   bytes
        freevars,             #   tuple
        cellvars              #   tuple
        )

Now I will try to explain the meaning of each argument.

argcount
Number of arguments to be passed to the function (*args and **kwargs are not included).

kwonlyargcount
Number of keyword-only arguments.

nlocals
Number of local variables,
namely all variables and parameters (*args and **kwargs included) except global names.
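
As a rough illustration of the three counters described so far, the sketch below reads them back from an ordinary function. The function name sample and its parameters are hypothetical, chosen only to have one parameter of each kind.

```python
def sample(a, b, *args, c, d=1, **kwargs):
    total = a + b  # 'total' is the only plain local variable
    return total

code = sample.__code__

print(code.co_argcount)        # positional parameters only: a, b
print(code.co_kwonlyargcount)  # keyword-only parameters: c, d
print(code.co_nlocals)         # a, b, c, d, args, kwargs, total
print(code.co_varnames)        # parameter names come first, then locals
```

Note that *args and **kwargs do not count towards co_argcount, but they do count towards co_nlocals.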

stacksize
The amount of stack (virtual machine stack) required by the code.
If you want to understand how it works, see the official documentation.

flags
A bitmap that says something about the code object:
1 –> the code was optimized
2 –> newlocals: there is a new local namespace (for example a function)
4 –> the code accepts an arbitrary number of positional arguments (*args is used)
8 –> the code accepts an arbitrary number of keyword arguments (**kwargs is used)
32 –> the code is a generator

Other flags are used in older Python versions or are set to record what is imported from __future__.
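
To avoid decoding the bitmap by hand, the flag values can be compared against the CO_* constants in the inspect module. This is only a sketch: the helper flag_names and the two sample functions are hypothetical, and only the five flags listed above are checked.

```python
import inspect

# Flag names checked here; each one is exposed as inspect.CO_<NAME>.
FLAG_NAMES = ("OPTIMIZED", "NEWLOCALS", "VARARGS", "VARKEYWORDS", "GENERATOR")

def flag_names(code):
    """Return the names from FLAG_NAMES whose bit is set in code.co_flags."""
    return [name for name in FLAG_NAMES
            if code.co_flags & getattr(inspect, "CO_" + name)]

def plain(a):
    return a

def variadic(*args, **kwargs):
    yield args  # the yield makes this a generator function

print(flag_names(plain.__code__))
print(flag_names(variadic.__code__))
```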

codestring
A sequence of bytes representing bytecode instructions.
If you want a better understanding, see the documentation (same as above).

consts
A tuple containing literals used by the bytecode (for example pre-computed numbers, tuples, and strings).

names
A tuple containing names used by the bytecode.
These names are global variables, functions and classes, and also attributes loaded from objects.

varnames
A tuple containing local names used by the bytecode (arguments first, then local variables)

filename
It is the filename from which the code was compiled.
It can be whatever you want; you are free to lie about this. ;)

name
It gives the name of the function. This can also be whatever you want, but be careful:
this is the name shown in the traceback, so if the name is unclear, the traceback could be unclear too.
Just think about how annoying lambdas can be.

firstlineno
The first line of the function (for debugging purposes, if you compiled source code).

lnotab
A bytes object that maps bytecode offsets to line numbers.
(I think this is also for debugging purposes; there is little documentation about it.)
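
The raw line table is awkward to read directly, so a reasonable way to see what it encodes is dis.findlinestarts(), which decodes it into (offset, line) pairs. A small sketch, assuming a two-line source string invented for the example:

```python
import dis

source = "x = 1\ny = 2\n"
code = compile(source, "<example>", "exec")

# findlinestarts() yields (bytecode offset, line number) pairs decoded from
# the code object's line table. Newer interpreters may also report line 0
# or None for bookkeeping instructions, so those entries are filtered out.
line_starts = [(offset, line)
               for offset, line in dis.findlinestarts(code)
               if line]

for offset, line in line_starts:
    print(f"offset {offset:3d} -> line {line}")
```

The exact offsets depend on the interpreter version, which is why none are shown here.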

freevars
A tuple containing the names of free variables.
Free variables are variables declared in the namespace where the code object was defined; they are used when nested functions are declared.
This doesn't happen at module level, because in that case free variables are also global variables.

cellvars
A tuple containing the names of local variables referenced by nested functions.

––––––––––––
Examples:
The following examples should clarify the meaning of what has been said above.

Note: in finished code objects, the attributes mentioned above have the co_ prefix,
and a function stores its executable body in the __code__ attribute.

––––––––––––
1st Example

def F(a,b):
    global c
    k=a*c
    w=10
    p=(1,"two",3)

print(F.__code__.co_argcount)
print(F.__code__.co_nlocals , F.__code__.co_varnames)
print(F.__code__.co_stacksize)
print(F.__code__.co_flags)
print(F.__code__.co_names)
print(F.__code__.co_consts)

Output:

2
5 ('a', 'b', 'k', 'w', 'p')
3
67
('c',)
(None, 10, 1, 'two', 3, (1, 'two', 3))

  1. there are two arguments passed to this function ("a", "b")

  2. this function has two parameters ("a", "b") and three local variables ("k", "w", "p")

  3. disassembling the function bytecode we obtain this:

    3         0 LOAD_FAST                0 (a)             #stack:  ["a"] 
              3 LOAD_GLOBAL              0 (c)             #stack:  ["a","c"]
              6 BINARY_MULTIPLY                            #stack:  [result of a*c]
              7 STORE_FAST               2 (k)             #stack:  []
    
    4        10 LOAD_CONST               1 (10)            #stack:  [10]
             13 STORE_FAST               3 (w)             #stack:  []
    
    5        16 LOAD_CONST               5 ((1, 'two', 3)) #stack:  [(1,"two",3)]
             19 STORE_FAST               4 (p)             #stack:  []
             22 LOAD_CONST               0 (None)          #stack:  [None]
             25 RETURN_VALUE                               #stack:  []
    

    As you can see, while executing the function we never have more than three elements on the stack (the tuple counts as its length in this case).

  4. the flags value is dec 67 = bin 1000011 = bin 1000000 + 10 + 1 = dec 64 + 2 + 1, so we understand that

    • the code is optimized (as most automatically generated code is)
    • while executing the function bytecode, the local namespace changes
    • 64 appears to be NOFREE (this is what dis.code_info reports): the code has no free or cell variables
  5. the only global name that is used in the function is "c"; it is stored in co_names

  6. every explicit literal we use is stored in co_consts:

    • None is the return value of the function
    • we explicitly assign the number 10 to w
    • we explicitly assign (1, 'two', 3) to p
    • if the tuple is a constant, each element of that tuple is a constant, so 1, "two", 3 are constants
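
The listing in point 3 can be reproduced on your own interpreter with the dis module. The sketch below reuses the function F from this example; expect the offsets, the opcode names and even the contents of co_consts to differ from the output shown above, since they depend on the Python version.

```python
import dis

def F(a, b):
    global c
    k = a * c
    w = 10
    p = (1, "two", 3)

# get_instructions() yields one named tuple per bytecode instruction.
instructions = list(dis.get_instructions(F))

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# The attributes discussed above, as seen by this interpreter.
print(F.__code__.co_names)
print(F.__code__.co_consts)
```

F is never called here, so the undefined global c is not a problem.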

––––––––––––
2nd example

ModuleVar="hi"

def F():
    FunctionVar=106
    UnusedVar=ModuleVar

    def G():
        return (FunctionVar,ModuleVar)

    print(G.__code__.co_freevars)
    print(G.__code__.co_names)

F()
print(F.__code__.co_cellvars)
print(F.__code__.co_freevars)
print(F.__code__.co_names)

Output:

('FunctionVar',)
('ModuleVar',)
('FunctionVar',)
()
('print', '__code__', 'co_freevars', 'co_names', 'ModuleVar')

The meaning of the output is this:

The first and second lines are printed when F is executed, so they show co_freevars and co_names of G's code:
"FunctionVar" is in the namespace of the function F, where G was created;
"ModuleVar" is instead a module variable, so it is considered global.

The following three lines are about the co_cellvars, co_freevars and co_names attributes of F's code:
"FunctionVar" is referenced in the nested function G, so it is marked as a cellvar;
"ModuleVar" is in the namespace where F was created, but it is a module variable,
so it is not marked as a freevar; it is found in the global names instead.
The builtin function print also appears in names, as do the names of all attributes used in F.
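
The cell/free relationship also shows up at run time through the function's __closure__ attribute. Here is a minimal sketch with a hypothetical make_counter function (not part of the example above):

```python
def make_counter():
    count = 0              # referenced by the nested function: a cell variable

    def increment():
        nonlocal count     # defined in the enclosing scope: a free variable
        count += 1
        return count

    return increment

counter = make_counter()
counter()
counter()

print(make_counter.__code__.co_cellvars)      # names owned by the outer function
print(counter.__code__.co_freevars)           # same names, seen from the inner one
print(counter.__closure__[0].cell_contents)   # current value stored in the cell
```

The entries of __closure__ line up with co_freevars, one cell per name.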

––––––––––––
3rd example

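This is a working code object initialization.
It is not useful in itself, but you can do everything you want with this function.
Note that the bytecode below uses the three-byte instruction format of the Python version this answer was written for, so it will not run unchanged on Python 3.6 or later.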

from types import CodeType

MyCode = CodeType(
        0,
        0,
        0,
        3,
        64,
        bytes([101, 0, 0,    #Load print function
               101, 1, 0,    #Load name 'a'
               101, 2, 0,    #Load name 'b'
               23,           #Take first two stack elements and store their sum
               131, 1, 0,    #Call first element in the stack with one positional argument
               1,            #Pop top of stack
               101, 0, 0,    #Load print function
               101, 1, 0,    #Load name 'a'
               101, 2, 0,    #Load name 'b'
               20,           #Take first two stack elements and store their product
               131, 1, 0,    #Call first element in the stack with one positional argument
               1,            #Pop top of stack
               100, 0, 0,    #Load constant None
               83]),         #Return top of stack
        (None,),
        ('print', 'a', 'b'),
        (),
        'PersonalCodeObject',
        'MyCode',
        1,
        bytes([14,1]),
        (),
        () )

a=2
b=3
exec(MyCode) # code prints the sum and the product of "a" and "b"

Output:

5
6
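
The opcode numbers hardcoded in this example are specific to the interpreter it was written for. As a hedged sketch, the lookup below asks the running interpreter for the numbers behind the opcode names used above; dis.opmap is the standard name-to-number mapping, and the list of names is taken from the comments in the example.

```python
import dis
import sys

# Opcode names used by the example above; the numbers written there
# belong to an older CPython release.
names = ["LOAD_NAME", "LOAD_CONST", "POP_TOP", "RETURN_VALUE",
         "BINARY_ADD", "BINARY_MULTIPLY", "CALL_FUNCTION"]

print("Python", sys.version_info[:2])

opcodes = {name: dis.opmap.get(name) for name in names}
for name, number in opcodes.items():
    # None means this interpreter does not define an opcode with that name.
    print(f"{name:16} {number}")
```

If any entry prints None, or a number different from the one in the example, the byte sequence above cannot be reused as-is on that interpreter.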
Ludovico answered 20/4, 2013 at 17:26 Comment(4)
Flag 64 appears to be NOFREE (according to output of dis.code_info). - Opaque
in core of python for anyone who wants to dig deeper. - Amoy
Alberto, I've got a question about the co_code element of your 3rd example. When you write operation, listID, 0 in your bytes call, is the listID, 0 pair there to emulate little endian, or does the 0 really stand for something? - Amoy
It seems that you have to fill in 16 arguments in types.CodeType with Python 3.12. - Lamb

Example usage of the CodeType constructor may be found in the standard library, specifically Lib/modulefinder.py. If you look there, you'll see it being used to redefine the read-only co_filename attribute on all the code objects in a file.

I recently ran into a similar use case where I had a function factory, but the generated functions always had the "generic" name in the traceback, so I had to regenerate the code objects to contain the desired name.

>>> def x(): raise NotImplementedError
...
>>> x.__name__
'x'
>>> x.__name__ = 'y'
>>> x.__name__
'y'
>>> x()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in x
NotImplementedError

>>> x.__code__.co_name
'x'
>>> x.__code__.co_name = 'y'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readonly attribute

>>> 'Gah!'
'Gah!'

But, wait, the function's __code__ member is not read-only, so we can do what the modulefinder does:

>>> from types import CodeType
>>> co = x.__code__
>>> x.__code__ = CodeType(co.co_argcount, co.co_kwonlyargcount,
             co.co_nlocals, co.co_stacksize, co.co_flags,
             co.co_code, co.co_consts, co.co_names,
             co.co_varnames, co.co_filename,
             'MyNewCodeName',
             co.co_firstlineno, co.co_lnotab, co.co_freevars,
             co.co_cellvars)
>>> x.__code__.co_name
'MyNewCodeName'
>>> x()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in MyNewCodeName
NotImplementedError

The thing to note in this example is that the traceback uses the co_name attribute, not the func.__name__ attribute when producing values in the stack trace.

One more note: the above is for Python 3 (up to 3.7). To make it Python 2 compatible, just leave out the second argument to the constructor (co_kwonlyargcount).

UPDATE: Victor Stinner added a new method, 'replace', to the CodeType class in Python 3.8, which simplifies the situation quite considerably. This was done to eliminate future compatibility issues, as 3.8 also added a new 'co_posonlyargcount' argument into the call list after 'co_argcount', so at least your 3.8 and later code will be somewhat future proofed if the argument list changes again.

>>> x.__code__ = x.__code__.replace(co_name='MyNewCodeName')
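
As a script rather than a REPL session, the same idea can be checked programmatically. This is a sketch assuming Python 3.8 or later (for replace); it inspects the traceback with traceback.extract_tb instead of reading it by eye.

```python
import traceback

def x():
    raise NotImplementedError

# Swap in a copy of the code object that differs only in its name.
x.__code__ = x.__code__.replace(co_name="MyNewCodeName")

try:
    x()
except NotImplementedError as exc:
    # Collect the function name recorded for each frame in the traceback.
    frame_names = [frame.name
                   for frame in traceback.extract_tb(exc.__traceback__)]

print(x.__name__)        # the function's own name is untouched
print(frame_names[-1])   # the innermost frame reports the code object's name
```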
Kermis answered 16/10, 2016 at 0:31 Comment(1)
my exact use-case! that replace is a lifesaver.Ginetteginevra

At the request of @KarlKnechtel, this answer is meant to supplement @AlbertoPerrella's answer with up-to-date (as of the time of this writing, for Python 3.11) information and documentation/sources for the CodeType constructor, the bytecode, and the new CodeType.replace method.

First of all, as the OP suggests, the official documentation for types.CodeType is extremely limited, with no real signature:

class types.CodeType(**kwargs)

So we have to piece together available information ourselves by looking at different but relevant sources.

Brief but reasonable explanations of all attributes of a CodeType object can be found in the code portion of the table in the Types and members section of the documentation for the inspect module.

However, that documentation has the attribute names listed in alphabetical order, rather than in the order the CodeType constructor expects them as positional arguments.

Since types.CodeType is defined in types.py as simply the type of the __code__ attribute of an ad-hoc function, with no signature definition written in Python:

def _f(): pass
CodeType = type(_f.__code__)

the only definitive way to obtain the real signature of CodeType is to look at its reference implementation in CPython, where all arguments are nicely validated in the argument clinic for the code constructor by their positions and expected types.

For example, from this line:

argcount = PyLong_AsInt(PyTuple_GET_ITEM(args, 0));

we can see that the first argument (position 0) is argcount as an int object because PyLong_AsInt is called.

And from these lines:

if (!PyBytes_Check(PyTuple_GET_ITEM(args, 6))) {
    _PyArg_BadArgument("code", "argument 7", "bytes", PyTuple_GET_ITEM(args, 6));
    goto exit;
}
code = PyTuple_GET_ITEM(args, 6);

we can see that the 7th argument (position 6) is code as a bytes object because PyBytes_Check is called, and ditto for the rest of the arguments.

Unfortunately, if we go through the arguments actually validated by the argument clinic, we'll notice some discrepancies with what's listed in the aforementioned inspect documentation, namely that the inspect doc is missing co_linetable and co_exceptiontable, and that the argument clinic no longer has the documented co_lnotab. With a quick search, we can find that per PEP-626, co_linetable has replaced co_lnotab, which is now generated on-the-fly by the new co_lines method, and that per issue 47236, co_exceptiontable is now required to accompany co_code to hold exception handling information.

While issue 47236 helpfully points to the documentation of the format of co_exceptiontable here, PEP-626 on the other hand refrains from documenting the format of co_linetable, which it claims will remain "opaque":

The co_linetable attribute will hold the line number information. The format is opaque, unspecified and may be changed without notice. The attribute is public only to support creation of new code objects.

But a look at the GitHub changeset committed for an implementation of PEP-626 reveals that the format of co_linetable is actually well documented in lnotab_notes.txt, and then later, for Python 3.11, in locations.md.

With all of the above pieces of information from various sources, we can now put together a full list of arguments in the positional order expected by the CodeType constructor, along with their types and descriptions:

Position  Argument            Type   Description
0         co_argcount         int    number of arguments (not including keyword only arguments, * or ** args)
1         co_posonlyargcount  int    number of positional only arguments
2         co_kwonlyargcount   int    number of keyword only arguments (not including ** arg)
3         co_nlocals          int    number of local variables
4         co_stacksize        int    virtual machine stack space required
5         co_flags            int    bitmap of CO_* flags, read more here
6         co_code             bytes  string of raw compiled bytecode
7         co_consts           tuple  tuple of constants used in the bytecode
8         co_names            tuple  tuple of names other than arguments and function locals
9         co_varnames         tuple  tuple of names of arguments and local variables
10        co_filename         str    name of file in which this code object was created
11        co_name             str    name with which this code object was defined
12        co_qualname         str    fully qualified name with which this code object was defined
13        co_firstlineno      int    number of first line in Python source code
14        co_linetable        bytes  bytecode address-to-line information encoded in a format specified here
15        co_exceptiontable   bytes  exception handling information encoded in a format specified here
16        co_freevars         tuple  tuple of names of free variables (referenced via a function’s closure)
17        co_cellvars         tuple  tuple of names of cell variables (referenced by containing scopes)
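
Since this list changes between releases, it can help to check which fields the running interpreter actually exposes before relying on any of them. A small sketch (the function _sample is just a throwaway to obtain a code object):

```python
def _sample():
    pass

code = _sample.__code__

# Every public name on the code object that starts with co_
# (this includes methods such as co_lines where they exist).
attributes = sorted(name for name in dir(code) if name.startswith("co_"))
print(attributes)

# Fields that, according to the text above, were added in later releases.
for name in ("co_posonlyargcount", "co_qualname",
             "co_linetable", "co_exceptiontable"):
    print(name, hasattr(code, name))
```

This only tells you which attributes exist, not the positional order the constructor expects; for that, the table above (or the CPython source) is still the reference.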

As for the documentation of bytecode, we can find it nicely laid out in the Python Bytecode Instructions section of the documentation of the dis module, with the option to switch to the documentation for an earlier Python version with the drop-down menu at the top of the page, while the actual mapping of each bytecode to its respective integer value can be found in _opcode_metadata.py.

Finally, although there has been an attempt to document the CodeType.replace method in bpo-37032, the end result in the official documentation of types is a very brief description with no examples:

replace(**kwargs)

Return a copy of the code object with new values for the specified fields.

Fortunately, we can find a good usage example in a unit test here:

def new_code(c):
    '''A new code object with a __class__ cell added to freevars'''
    return c.replace(co_freevars=c.co_freevars + ('__class__',), co_code=bytes([COPY_FREE_VARS, 1])+c.co_code)

As can be seen, the usage of CodeType.replace is fairly straightforward: any argument that the CodeType constructor takes can be passed to the replace method as a keyword with its replacement value. The idea is to eliminate the need to rebuild a new CodeType object with a long list of arguments from an existing CodeType object when only a few of the attributes need to change, so that code written for an older Python version does not need to be refactored when parameters are added to or removed from the CodeType constructor in a new Python version.
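
For a self-contained illustration that does not touch the bytecode itself, here is a sketch (assuming Python 3.8 or later) that derives a second function from an existing one by replacing two fields at once. The functions greet and farewell are hypothetical names invented for the example.

```python
import types

def greet():
    return "hello"

old_code = greet.__code__

# Swap one constant; every field not named in replace() is copied unchanged.
new_consts = tuple("goodbye" if const == "hello" else const
                   for const in old_code.co_consts)
new_code = old_code.replace(co_consts=new_consts, co_name="farewell")

# Wrap the new code object in a function so that it can be called.
farewell = types.FunctionType(new_code, greet.__globals__, "farewell")

print(greet())
print(farewell())
```

The original function is left untouched because replace returns a copy rather than modifying the code object in place.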

Hope this helps.

Menticide answered 22/9, 2023 at 7:0 Comment(0)
