Python Bytecode
Some useful links, in no particular order:
- https://devguide.python.org/internals/interpreter/
- https://docs.python.org/3/library/dis.html
- https://stackoverflow.com/questions/59431770/why-cant-python-dis-module-disassembly-this-pyc-file
- https://docs.python.org/3/reference/datamodel.html#code-objects
- https://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html
To compile a py file to pyc manually:
python -m py_compile <file>.py
This creates a .pyc file in the __pycache__ folder next to the source file.
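The same step can also be driven from Python through the standard `py_compile` module. A minimal sketch, assuming a throwaway source file (the name `example.py` and its contents are made up for illustration):

```python
import pathlib
import py_compile
import tempfile

# Hypothetical source file, created in a temp directory just for this demo.
workdir = pathlib.Path(tempfile.mkdtemp())
src = workdir / "example.py"
src.write_text("x = 1 + 2\n")

# With no explicit output path, the .pyc lands in __pycache__ beside the
# source and is named after the interpreter (e.g. example.cpython-312.pyc).
pyc_path = pathlib.Path(py_compile.compile(str(src), doraise=True))
print(pyc_path)
```

Passing `doraise=True` turns a compile error into an exception instead of a message on stderr, which is usually what a script wants.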
To disassemble a pyc file:
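import dis
import marshal
import sys

# Usage: python <this script> path/to/file.pyc
with open(sys.argv[1], 'rb') as f:
    # Skip the 16-byte header (Python 3.7+; older versions use a shorter one).
    f.seek(16)
    # What follows the header is the marshalled module code object.
    co = marshal.load(f)

dis.dis(co)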
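Run this script with the path to a .pyc file as its only argument. The .pyc should come from the same Python version that runs the script, since both the marshal data and the opcodes are version-specific. Alternatively, you can use the online godbolt tool (Compiler Explorer).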
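The 16 bytes skipped by `f.seek(16)` are the pyc header. A rough sketch of its layout, assuming the Python 3.7+ format (4-byte magic number, 4-byte flags field, then 8 bytes holding either mtime and source size or a source hash); the helper name `read_pyc` is made up for this example, and the pyc is built in memory so nothing touches disk:

```python
import importlib.util
import marshal
import struct

def read_pyc(data: bytes):
    """Split raw .pyc bytes into (magic, flags, code). Assumes the 3.7+ layout."""
    magic = data[:4]
    (flags,) = struct.unpack("<I", data[4:8])
    # Bytes 8..16 hold mtime + source size (flags == 0) or a source hash.
    code = marshal.loads(data[16:])
    return magic, flags, code

# Build a pyc image in memory: header (magic, flags=0, mtime=0, size=0) + code.
demo_code = compile("print('hi')", "<demo>", "exec")
header = importlib.util.MAGIC_NUMBER + struct.pack("<III", 0, 0, 0)
magic, flags, code = read_pyc(header + marshal.dumps(demo_code))

print(magic, flags, code.co_filename)
```

The magic number changes whenever the bytecode format does, which is how the interpreter notices a stale or foreign .pyc.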
The list of Python bytecode opcodes can be found here: https://docs.python.org/3/library/dis.html. Note that, unlike VMs that come with a formal specification (the JVM, for example), the Python one is not specified and is not guaranteed to remain stable. In fact, the bytecode changes between minor releases (3.10 to 3.11, for instance)!
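One way to see this instability is to ask the running interpreter which opcode names it knows. A small sketch (the helper `opcode_numbers` is made up for illustration; the four names are chosen because, to my knowledge, the first of each pair was dropped in 3.11 in favour of the second):

```python
import dis
import sys

def opcode_numbers(names):
    """Map each opcode name to its number on this interpreter, or None if absent."""
    return {name: dis.opmap.get(name) for name in names}

# BINARY_ADD and CALL_FUNCTION exist up to 3.10; 3.11 replaced them
# with BINARY_OP and CALL. Which ones show up depends on your version.
table = opcode_numbers(["BINARY_ADD", "BINARY_OP", "CALL_FUNCTION", "CALL"])
print(sys.version_info[:2], table)
```

Running this under two different Python versions should print different tables.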
If you look at the opcodes (each instruction is 2 bytes wide since Python 3.6), you will see that none of them reference registers. Instead, the Python VM is stack based: each operation pops its arguments off a value stack (one per frame, rather than a single global stack) and pushes its result back. (You can get a first-hand feel for this by programming in Forth, where ordinary "functions" all operate on a stack.)
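To make the pop-and-push model concrete, here is a toy stack machine. It is only an illustration of the idea, not how CPython is implemented; the instruction names `PUSH`, `ADD` and `MUL` are invented for this sketch.

```python
def run(program):
    """Evaluate a list of (op, arg) pairs on a toy stack machine."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            # Operands come off the stack; the result goes back on.
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "MUL":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()

# (2 + 3) * 4: no instruction names a register, only the stack is used.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)  # 20
```

Note how `ADD` and `MUL` carry no operands of their own: everything they need is already on the stack.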
See also, other Stack Machines: https://en.wikipedia.org/wiki/Stack_machine#Virtual_stack_machines
Other common stack-based VMs are the JVM and WASM. Dalvik, BEAM, and Lua (5.0 and later), on the other hand, are register based.
On this topic, Tailbiter is an excellent blog post on writing a compiler for Python in Python, translating the AST directly into bytecode. In some sense, Byterun is the reverse: an interpreter for Python bytecode, itself written in Python.
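The two stages these projects reimplement can be seen with CPython's own machinery. The sketch below uses the built-in `compile` and `eval`, not Tailbiter's or Byterun's code: the first step (AST to code object) is the part Tailbiter rewrites, the last step (executing the code object) is the part Byterun rewrites. The expression and variable values are arbitrary examples.

```python
import ast
import dis

# Source text -> AST.
tree = ast.parse("a * b + 1", mode="eval")

# AST -> code object (the stage a compiler like Tailbiter implements).
code = compile(tree, "<ast-demo>", "eval")

# Opcode names vary by Python version, so no expected listing is shown.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

# Code object -> result (the stage an interpreter like Byterun implements).
value = eval(code, {"a": 2, "b": 3})
print(value)  # 7
```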