Protobuf Infodump
I am getting more into the habit of infodumping, so here is another one in this series, this time on Protobuf.
The official Protobuf library does not include a C implementation, but third-party implementations are easy to find.
I also found this interesting C++ version, https://github.com/mapbox/protozero, which claims to be a high-performance library because it operates at a lower level than other libraries. Their tutorial is particularly interesting: it seems that you have to manually examine tags and cast the data into the right form.
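To make "manually examine tags" concrete, here is a minimal Python sketch of what reading one field by hand looks like at the wire level. This is not protozero's API; the helper names read_varint and read_key are made up for illustration, and the sample bytes are the classic "field 1 = 150" message.

```python
from typing import Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def read_key(buf: bytes, pos: int) -> Tuple[int, int, int]:
    """Split a field key into (field_number, wire_type, next_pos)."""
    key, pos = read_varint(buf, pos)
    return key >> 3, key & 0x07, pos


# 0x08 is field 1 with wire type 0 (varint); 0x96 0x01 is the varint for 150.
message = bytes.fromhex("089601")
field_number, wire_type, pos = read_key(message, 0)
value, pos = read_varint(message, pos)
print(field_number, wire_type, value)  # 1 0 150
```

The caller has to look at the field number and wire type and then decide how to interpret the bytes that follow, which is roughly the kind of work a low-level reader leaves to you.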
Also look at the Nim version: https://github.com/PMunch/protobuf-nim which is an in-language approach that doesn't need any external compilers etc.
While playing with the Nim version, I ran into a problem decoding Protobufs from an external source. Trying to debug that led me to protoscope.
This tool is pretty useful. My workflow became:
- Try to deserialize the message from my Nim program. This didn't work.
- Dump a hex representation of the protobuf message binary from Nim into a file.
- Run the following to output Protoscope's understanding of the message:
  xxd -r -ps <file with hex data> | protoscope
- If the above step fails (i.e., the output does not match what you expect), you know that the message itself is incorrectly formed, that is, not a valid protobuf message.
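The hex round trip in the steps above can be sketched in Python as follows. The function names are hypothetical; the only assumption is that the dump is plain hex with no offsets or ASCII column, which is the form xxd -r -ps expects.

```python
def to_plain_hex(data: bytes) -> str:
    """Render bytes as plain hex: two lowercase digits per byte, no offsets."""
    return data.hex()


def from_plain_hex(text: str) -> bytes:
    """Reverse a plain hex dump, ignoring any whitespace or line breaks."""
    return bytes.fromhex("".join(text.split()))


raw = bytes.fromhex("089601")         # stand-in for a serialized message
dumped = to_plain_hex(raw)            # this is what would go into the file
assert from_plain_hex(dumped) == raw  # the dump reverses cleanly
print(dumped)                         # 089601
```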
In the good scenario, you would get the skeleton structure from protoscope, something like the following:
1: {
  1: {
    1: {"__name__"}
    2: {"sample_metric"}
  }
  1: {
    1: {"key"}
    2: {"value1"}
  }
  2: {
    1: 289.0   # 0x4072100000000000i64
    2: 1692785281000
  }
}
What is this?
- There is one top level field with tag 1. This is a submessage.
- Inside it are 2 submessages of tag 1 and one of tag 2.
- The inner submessages are obvious.
(In case this looks vaguely familiar, this is the protobuf structure used in Prometheus Remote Writes.)
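To see how that skeleton maps to bytes, here is a rough Python sketch that builds the same message by hand. The helper names are made up for illustration, and the comments naming the fields (labels, samples, timestamp) are my assumption based on the Prometheus remote write schema; the skeleton itself only shows field numbers.

```python
import struct


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def key(field: int, wire_type: int) -> bytes:
    """Field key: field number in the high bits, wire type in the low 3 bits."""
    return varint((field << 3) | wire_type)


def length_delimited(field: int, payload: bytes) -> bytes:
    """Wire type 2: strings and submessages, prefixed with their length."""
    return key(field, 2) + varint(len(payload)) + payload


def double(field: int, value: float) -> bytes:
    """Wire type 1: 8 bytes, little-endian IEEE 754."""
    return key(field, 1) + struct.pack("<d", value)


def uint(field: int, value: int) -> bytes:
    """Wire type 0: plain varint."""
    return key(field, 0) + varint(value)


def label(name: str, value: str) -> bytes:
    """Assumed Label message: name in field 1, value in field 2."""
    return length_delimited(1, name.encode()) + length_delimited(2, value.encode())


# Assumed Sample message: value (double) in field 1, timestamp in field 2.
sample = double(1, 289.0) + uint(2, 1692785281000)

timeseries = (
    length_delimited(1, label("__name__", "sample_metric"))
    + length_delimited(1, label("key", "value1"))
    + length_delimited(2, sample)
)
write_request = length_delimited(1, timeseries)  # the single top-level field 1
print(write_request.hex())
```

Feeding the printed hex through xxd -r -ps and protoscope should give back a skeleton with the same shape as the one above.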
Finally, protoscope can regenerate messages from this skeleton format.
- Save the above output to a text file.
- Make changes.
- Run the following to get the hex dump of the new proto message:
  protoscope -s <filename> | xxd -p > output.txt
One more thing. I had 2 such hexdump files that I wanted to compare. Normal diff doesn't really work here, since it compares whole lines. git-diff to the rescue: you don't have to be in a git repo to use it. The following shows a character-level difference between the 2 text files (in this case containing data in hex format) with the nice green/red highlighting:
git diff --no-index --word-diff=color --word-diff-regex=. ver1.txt ver2.txt
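If git is not at hand, a similar character-level comparison can be sketched with Python's difflib. This is not what git does internally, just a small stand-in; char_diff is a hypothetical helper and the two hex strings are made-up examples.

```python
import difflib


def char_diff(old: str, new: str) -> list:
    """Return (operation, offset in old, old text, new text) for each change."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, i1, old[i1:i2], new[j1:j2]))
    return changes


# Two short hex dumps that differ in two characters.
for change in char_diff("0a036b6579", "0a046b6578"):
    print(change)
```

Since every byte is two hex characters, dividing the reported offset by 2 gives the byte position in the original message.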
While on this topic, I also found the following slightly unrelated things:
- https://github.com/dbcode/protobuf-nginx: protobuf encoding/decoding for Nginx modules using Nginx specific data structures.
- This MICRO'21 paper from Google about Protobuf deserialization in Hardware: https://dl.acm.org/doi/pdf/10.1145/3466752.3480051