Convert Python disassembly from dis.dis back to a code object

Is there any way to create a code object from its disassembly as produced by dis.dis?

For example, I compiled some code using co = compile('print("lol")', '<string>', 'exec') and then printed the disassembly using dis.dis(co). Now I want to "compile" the disassembly back into a code object (since it holds all the same data and nothing is lost).

Wsan answered 29/6, 2019 at 12:19 Comment(2)
dis.dis disassembles the bytecode in a human-readable form. It is meant to be read, not acted upon. (dis.dis does printing, and returns None). - Parasiticide
@heemayl, the question is if that human-readable form can be assembled back. I understand the purpose of dis.dis. - Wsan

Amazingly, yes there is - sort of.

However, there are a number of caveats you need to understand. The first is that Python bytecode, and by extension its assembly instructions, can change with every release. The second is that the text emitted by dis.dis() is incomplete with respect to what the Python interpreter needs, so you need some way to fill in the missing information.
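To make the second caveat concrete, here is a small sketch (not part of xasm) that captures the dis.dis() text for the question's example and then prints some code object attributes that the text does not carry as fields of their own. Which attributes to show is my own selection, and the exact instruction names in the listing depend on the Python version you run it with.

```python
import dis
import io

# The code object from the question.
co = compile('print("lol")', '<string>', 'exec')

# dis.dis() prints to a file-like object and returns None,
# so capture the text explicitly.
buf = io.StringIO()
result = dis.dis(co, file=buf)
listing = buf.getvalue()
print(listing)

# Attributes the interpreter needs but that the listing does not spell out.
# (Constants and names only show up as annotations in parentheses.)
for attr in (
    "co_argcount",
    "co_kwonlyargcount",
    "co_nlocals",
    "co_stacksize",
    "co_flags",
    "co_filename",
    "co_name",
    "co_firstlineno",
    "co_consts",
    "co_names",
    "co_varnames",
):
    print(attr, "=", repr(getattr(co, attr)))
```

An assembler has to get values such as the stack size and the flags from somewhere else, either from extra directives in the assembly file or by computing them.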

I have written a bytecode assembler, xasm, which converts a text file of assembly similar to what you have above into Python bytecode.

In your example you have a code object rather than the full information needed to create a bytecode file. Internally, though, xasm has to create the code objects first, before writing them out together with the additional information a bytecode file needs. This is done in the function create_code() in https://github.com/rocky/python-xasm/blob/master/xasm/assemble.py

To see the difference between what is in a code object and how that fits into a Python bytecode file, I'll use your example and then finish with how to create a bytecode file.

If I run your example in Python 3.6.10, I get:

  1           0 LOAD_NAME                0 (print)
              2 LOAD_CONST               0 ('lol')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE
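As an aside, the instruction columns of that listing are enough to rebuild the raw bytecode string, though not the rest of the code object. The following is only a toy sketch, not how xasm does it: the function name assemble_36 and the regular expression are my own, and the opcode table is hard-coded with the CPython 3.6 numbers for just these five instructions. It ignores EXTENDED_ARG, so arguments above 255 are not handled.

```python
import re

# Opcode numbers as defined by CPython 3.6 for the instructions used here.
# Other releases may number, or even name, them differently.
OPCODES_36 = {
    "POP_TOP": 1,
    "RETURN_VALUE": 83,
    "LOAD_CONST": 100,
    "LOAD_NAME": 101,
    "CALL_FUNCTION": 131,
}

# Optional source line number, bytecode offset, opcode name, optional argument.
LINE_RE = re.compile(r"^\s*(?:(\d+)\s+)?(\d+)\s+([A-Z_]+)(?:\s+(\d+))?")

LISTING = """\
  1           0 LOAD_NAME                0 (print)
              2 LOAD_CONST               0 ('lol')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE
"""


def assemble_36(listing):
    """Turn dis.dis() text for Python 3.6 into a co_code byte string."""
    out = bytearray()
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # blank or unrecognized line
        _lineno, offset, opname, arg = match.groups()
        if int(offset) != len(out):
            raise ValueError("unexpected offset in line: %r" % line)
        # Python 3.6 "wordcode": one opcode byte followed by one argument byte.
        out += bytes([OPCODES_36[opname], int(arg or 0)])
    return bytes(out)


raw = assemble_36(LISTING)
print(raw)
```

The result is only the co_code part. The constants ('lol', None), the names ('print',), the stack size and the flags still have to be supplied separately before a code object can be built.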

But if I put your Python code into a file, say foo.py, byte-compile it using py_compile.compile(source, bytecode, source), and then use pydisasm, the cross-version Python disassembler from xdis, I get:

  # pydisasm version 4.2.4
  # Python bytecode 3.6 (3379)
  # Disassembled from Python 3.6.10 (default, Jan 23 2020, 16:43:38) 
  # [GCC 7.4.0]
  # Timestamp in code: 1586703495 (2020-04-12 10:58:15)
  # Source code size mod 2**32: 13 bytes
  # Method Name:       <module>
  # Filename:          foo.py
  # Argument count:    0
  # Kw-only arguments: 0
  # Number of locals:  0
  # Stack size:        2
  # Flags:             0x00000040 (NOFREE)
  # First Line:        1
  # Constants:
  #    0: 'lol'
  #    1: None
  # Names:
  #    0: print
    1:           0 LOAD_NAME                 0 (print)
                 2 LOAD_CONST                0 ('lol')
                 4 CALL_FUNCTION             1
                 6 POP_TOP
                 8 LOAD_CONST                1 (None)
                10 RETURN_VALUE

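Notice that this listing carries a bit of additional information that the plain dis.dis() output does not show. The first three items below come from the header of the bytecode file; the rest are attributes of the code object that dis.dis() does not print: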

  • which bytecode version is being used (3.6, with magic number 3379),
  • a timestamp of when the code was created,
  • a size (mod 2**32) of the source code,
  • a method name,
  • a filename,
  • parameters to the code,
  • method flags, and
  • names of various sorts: constants, variables.
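If you want to look at that extra information yourself without xdis, the following sketch byte-compiles the example and unpacks the file header with the standard library. It assumes Python 3.7 or later, where the header is 16 bytes (magic, flags, timestamp, source size); in 3.6 there is no flags field and the header is 12 bytes. The temporary directory and variable names are just for illustration.

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile

# Write the example source to a scratch directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "foo.py")
pyc = os.path.join(workdir, "foo.pyc")
with open(src, "w", newline="\n") as f:
    f.write('print("lol")\n')

# Force a timestamp-based pyc so the header layout below applies.
py_compile.compile(
    src,
    cfile=pyc,
    dfile="foo.py",
    doraise=True,
    invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
)

with open(pyc, "rb") as f:
    data = f.read()

# Header (Python 3.7+): 4-byte magic, 4-byte flags, 4-byte mtime, 4-byte size.
magic, flags, mtime, source_size = struct.unpack("<4sIII", data[:16])
print("magic number:", int.from_bytes(magic[:2], "little"))
print("timestamp:", mtime)
print("source size:", source_size)

# Everything after the header is the marshalled code object.
code = marshal.loads(data[16:])
print("filename:", code.co_filename)
print("names:", code.co_names)
print("stack size:", code.co_stacksize)
```

Everything printed from the header exists only in the file. Everything read from `code` travels with the code object itself.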

Now let's put all of this into a file, say foo2.pyasm. To turn that into a bytecode file, simply run pyc-xasm:

  $ pyc-xasm foo2.pyasm
  Wrote foo2.pyc
  $ python foo2.pyc
  lol

I gave a demonstration of all of this in my lightning talk at PyColumbia 2018.

I should note that until the next release of xasm and xdis, Python 3.7 and above don't work, but 3.6 and earlier do.

Philae answered 12/4, 2020 at 15:46 Comment(2)
Impressive, but the question implies that only the dis.dis output is available, not the code object or even source to disassemble it with another tool. Does this approach work with the bare dis.dis output as well? - Sirajuddaula
Yes, it does - sort of. One of the things I was trying to convey is that the information given by dis.dis() isn't complete in and of itself; more needs to be known in order for the code to be run. In particular, you need to know what parameters are passed to the code object, the code flags, the maximum stack needed and so on. Function create_code() github.com/rocky/python-xasm/blob/… is the portion that handles just the code object. - Philae
