<2024-09-23 Mon> python tech

Python Bytecode

Some useful links, in no particular order:

  1. https://devguide.python.org/internals/interpreter/
  2. https://docs.python.org/3/library/dis.html
  3. https://stackoverflow.com/questions/59431770/why-cant-python-dis-module-disassembly-this-pyc-file
  4. https://docs.python.org/3/reference/datamodel.html#code-objects
  5. https://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html

To compile a .py file to a .pyc file manually:

python -m py_compile <file>.py

This creates a .pyc file in the __pycache__ folder next to the source file.
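The same step can also be driven from Python. This is a minimal sketch, assuming a throwaway source file called example.py (a hypothetical name) written to a temporary directory; py_compile.compile returns the path of the bytecode file it wrote.

```python
import os
import py_compile
import tempfile

# Write a tiny throwaway module (the name example.py is only for illustration).
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "example.py")
with open(source_path, "w") as f:
    f.write("x = 1 + 2\n")

# compile() returns the path of the .pyc it wrote, by default under __pycache__.
# doraise=True turns compile errors into exceptions instead of printed messages.
pyc_path = py_compile.compile(source_path, doraise=True)
print(pyc_path)
```

The file name includes a tag for the interpreter that produced it, which is a first hint that the contents are version specific.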

To disassemble a pyc file:

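import dis
import marshal
import sys

with open(sys.argv[1], 'rb') as f:
    # Skip the 16-byte header (magic, flags, mtime or hash) used since Python 3.7.
    f.seek(16)
    # The rest of the file is a marshalled code object.
    co = marshal.load(f)

dis.dis(co)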

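Run this script with the path to a .pyc file as its only argument, using the same Python version that produced the file, since both the header layout and the marshal format vary between versions. Alternatively, the online godbolt tool (Compiler Explorer) can show the bytecode for a piece of Python source.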
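The 16 bytes skipped by the script are the .pyc header. As a rough sketch, assuming Python 3.7 or later (the layout introduced by PEP 552) and a hypothetical throwaway module named example.py, the header can be decoded like this:

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

# Build a small .pyc to inspect (example.py is a made-up name).
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "example.py")
with open(source_path, "w") as f:
    f.write("x = 1 + 2\n")
pyc_path = py_compile.compile(source_path, doraise=True)

with open(pyc_path, "rb") as f:
    header = f.read(16)

# Bytes 0-3: magic number identifying the bytecode version.
magic = header[:4]
# Bytes 4-7: flags; 0 means the timestamp-based layout, nonzero means hash-based.
(flags,) = struct.unpack("<I", header[4:8])

print("magic matches this interpreter:", magic == importlib.util.MAGIC_NUMBER)
print("flags:", flags)
if flags == 0:
    # Bytes 8-15: source modification time and source size, both little-endian.
    mtime, size = struct.unpack("<II", header[8:16])
    print("source mtime:", mtime, "source size:", size)
```

A magic number that does not match the running interpreter is the usual reason marshal.load fails on a .pyc produced by a different Python version.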

The list of Python bytecode opcodes is in the dis documentation: https://docs.python.org/3/library/dis.html. Do note that, unlike some VMs (the JVM, for example), the Python one has no formal specification and is not guaranteed to remain stable. In fact, the bytecode changes between minor releases (for example, from 3.10 to 3.11)!
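One way to see this for yourself is to print the opcode table of the interpreter you are running and compare it across Python versions. A small sketch using only the dis module; the two opcode names looked up are just examples, and .get is used because a given name may not exist in every version:

```python
import dis
import sys

# dis.opmap maps opcode names to their numeric values for THIS interpreter only.
print("Python", sys.version_info[:2], "defines", len(dis.opmap), "opcodes")

# The numeric value behind a name is an implementation detail and may differ
# (or the name may be missing entirely) in another Python version.
for name in ("LOAD_CONST", "RETURN_VALUE"):
    print(name, dis.opmap.get(name))
```

Running this under two different Python releases will typically print different counts and different numbers.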

If you look at the 2-byte instructions (one byte of opcode, one byte of argument), you will see that they do not reference any registers. Instead, the Python VM is stack based: each operation pops its arguments off a value stack and pushes its result back on. The stack is not global; each executing frame has its own. (You can get a first-hand feel for this by trying to program in Forth, where ordinary "functions" all work on the stack.)
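To make the pop-and-push model concrete, here is a toy stack machine. It is only an illustration: the opcodes PUSH, ADD and MUL and the run function are invented for this sketch and are not CPython's instruction set or implementation.

```python
def run(program):
    """Evaluate a list of (opcode, argument) pairs on a value stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            # Operands come from the instruction's argument, not from registers.
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    # The final result is whatever is left on top of the stack.
    return stack.pop()


# 2 + 3 * 4, in the postfix order a compiler would emit for a stack machine.
program = [("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL", None), ("ADD", None)]
result = run(program)
print(result)  # 14
```

Note that ADD and MUL carry no operands at all; where their inputs live is implied entirely by the stack.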

See also other stack machines: https://en.wikipedia.org/wiki/Stack_machine#Virtual_stack_machines

Some other common stack-machine VMs are the JVM and WASM. On the other hand, Dalvik, BEAM and Lua (since 5.0) are all register based.

On this topic, Tailbiter is an excellent blog post on writing a compiler for Python in Python, by translating the AST directly into bytecode. In some sense, Byterun is the reverse of this: it is an interpreter (written in Python) for Python bytecode.
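For orientation, the two halves can be seen with CPython's own built-ins: compile turns an AST into a code object (the job Tailbiter reimplements), and exec runs that code object (the job Byterun reimplements). This sketch uses only the standard library, not code from either project:

```python
import ast

source = "result = 6 * 7"

# Front half: source text -> AST -> code object containing bytecode.
tree = ast.parse(source)
code = compile(tree, filename="<example>", mode="exec")

# Back half: the interpreter executes the bytecode in a namespace.
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 42
```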