<2024-07-24 Wed>systems

Packing NaNs for fun

After going through Bartosz's excellent blog on all things floating point, I had the same thought as the linked blog "the secret life of nans". I wanted to quickly experiment with this concept, so here it is.

Specifically, the idea is this. The IEEE 754 specification defines how a float is represented and how its bits are interpreted. For example, a 32-bit float has 1 sign bit, 8 exponent bits, and 23 significand bits. The linked blog goes into this in far more depth than I can match, so I recommend reading it in full and then coming back here.
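To make the layout concrete, here is a small sketch that splits a value into those three fields. The helper name float32_fields is my own, not something from the linked blog; it just reinterprets the 4 packed bytes as an unsigned integer and masks out each field.

```python
import struct

def float32_fields(x):
    """Return the (sign, exponent, significand) bit fields of x as a 32-bit float."""
    # Pack as a big-endian float32, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # top bit
    exponent = (bits >> 23) & 0xFF    # next 8 bits
    significand = bits & 0x7FFFFF     # low 23 bits
    return sign, exponent, significand

print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-2.5))   # (1, 128, 2097152)
```

The exponent is stored with a bias of 127, which is why 1.0 shows up with an exponent field of 127.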

As mentioned there, the space dedicated to NaNs is surprisingly large. Compare it with infinity, which is encoded with all exponent bits set to 1 and all significand bits set to 0; the sign bit then selects plus or minus inf. So only 2 values out of the full space are set aside for Inf.
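As a quick check of the two infinity patterns, the sketch below builds them from raw bits. bits_to_float32 is a hypothetical helper name used only for this illustration.

```python
import math
import struct

def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as a big-endian float32."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_inf = bits_to_float32(0x7F800000)  # sign 0, exponent all 1s, significand 0
neg_inf = bits_to_float32(0xFF800000)  # sign 1, exponent all 1s, significand 0
print(pos_inf, neg_inf)  # inf -inf
```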

For NaNs, the rule is different. If the exponent bits are all 1s and the significand has at least one bit set to 1, the number is a NaN. Whatever those significand bits are, the number is still just labelled a NaN.
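The rule can be written directly against the raw bit pattern. This is a minimal sketch; is_nan_bits is a name I made up, and math.isnan is only there to cross-check the result against what Python itself reports.

```python
import math
import struct

def is_nan_bits(bits):
    """Apply the NaN rule to a raw 32-bit pattern."""
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    # All exponent bits set and at least one significand bit set.
    return exponent == 0xFF and significand != 0

# Infinity, two different NaN patterns, and the ordinary value 1.0.
for bits in (0x7F800000, 0x7F800001, 0xFFF04869, 0x3F800000):
    as_float = struct.unpack(">f", struct.pack(">I", bits))[0]
    print(hex(bits), is_nan_bits(bits), math.isnan(as_float))
```

Both columns should agree on every row: only the two middle patterns count as NaN.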

Think about this: there is not 1 NaN, but over 8 million different NaN bit patterns (2^23 - 1 for each value of the sign bit). A 32-bit float that is a NaN has 23 bits of information sitting there, just ignored.
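The count follows from the rule: every 23-bit significand except all-zeros (which is infinity) is a NaN. A tiny sketch of the arithmetic, with constant names of my own choosing:

```python
SIGNIFICAND_BITS = 23

# Every significand pattern except all-zeros, which is reserved for infinity.
nan_per_sign = 2**SIGNIFICAND_BITS - 1
print(nan_per_sign)       # 8388607
print(2 * nan_per_sign)   # 16777214, counting both values of the sign bit
```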

The second blog links to a way of getting some use out of this space, called NaN boxing. I wanted to do something similar for fun, so here it is, in Python.

import struct
import pickle

struct handles all things binary packing/unpacking in Python. (I have used it before here to construct custom NTP packets by hand.) pickle is optional: it writes the encoded floats to disk or prepares them for transmission.

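def encode(msg):
    msg_bytes = bytes(msg, encoding="ascii")

    out = []

    i = 0
    while i < len(msg_bytes):
        # Sign bit, all 8 exponent bits, and the top 3 significand bits set.
        data = b'\xff\xf0'
        # Two payload bytes per float; pad an odd-length message with NUL.
        pack = [msg_bytes[i]]
        if i+1 == len(msg_bytes):
            pack.append(0)
        else:
            pack.append(msg_bytes[i+1])
        data += bytes(pack)
        # Reinterpret the 4 bytes as a big-endian 32-bit float.
        res = struct.unpack(">f", data)
        out.append(res[0])
        i += 2

    return out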

Here:

  1. We need to construct a valid float out of 4 bytes. The NaN rule is clear (all exponent bits set to 1, at least 1 significand bit set to 1), and the first blog makes the bit order clear: the sign and exponent come first, as the first 9 bits (in big-endian format). So we start with `z11111111`, where z can be 0 or 1.
  2. That is 9 bits, and we need at least 1 more bit set to 1. Since I don't care about packing efficiency here, we can simply round those 10 bits up to 16 and set the first 2 bytes to 0xfff0 (so z is 1 and the top 3 significand bits are set), which satisfies all of our requirements.
  3. The remaining 2 bytes are free: the payload. This is where we will store our characters.
  4. You can extend this logic to double, where you will have more space.
  5. Since we construct the bytes in big-endian order, the endianness must be set explicitly in the unpack call (the ">" in ">f").
  6. Each float carries 2 bytes, so when the message runs out partway through a float, I just pack in the NUL 0x00.

>>> encode("Hi")
[nan]
>>> encode("Hi there.")
[nan, nan, nan, nan, nan]

Our super secret encoding outputs nothing but a list of NaNs. Not to worry: hidden within those bits is our message, which we can decode.
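To see the hidden bytes, you can pack a float back into its 4 bytes and print them as hex. This is a small sketch with a helper name of my own (float32_hex); it assumes, as the rest of this post does, that the platform keeps the NaN payload intact when converting between 32-bit and 64-bit floats.

```python
import struct

def float32_hex(f):
    """Hex string of f packed as a big-endian 32-bit float."""
    return struct.pack(">f", f).hex()

# The 0xfff0 prefix followed by the ASCII bytes of "Hi" (0x48, 0x69).
nan_hi = struct.unpack(">f", b"\xff\xf0Hi")[0]
print(nan_hi)               # nan
print(float32_hex(nan_hi))  # fff04869
```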

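def decode(msg):
    out = bytearray()
    for f in msg:
        # Back to 4 big-endian bytes; drop the 2-byte 0xfff0 prefix.
        res = struct.pack(">f", f)
        out += res[2:]

    # Strip the NUL padding that encode adds to odd-length messages.
    # The "out and" guard keeps an empty list from raising IndexError.
    if out and out[-1] == 0:
        out.pop()
    return out.decode("ascii")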

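Decoding is a straightforward reverse of the encoding. One thing it quietly relies on: a Python float is a double, so the ">f" conversions widen and narrow the value, and the message survives only because that conversion keeps the NaN's significand bits, which is what common hardware does.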

>>> a = encode("Hi there.")
>>> a
[nan, nan, nan, nan, nan]
>>> decode(a)
'Hi there.'
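As the list above notes, the same logic extends to doubles. Here is a hedged sketch of what that could look like; encode64, decode64, PREFIX and CHUNK are names I made up for this illustration. A double has 1 sign bit, 11 exponent bits and 52 significand bits, so the prefix 0xfff8 sets the sign, the whole exponent and the top significand bit, leaving 6 free bytes per float.

```python
import struct

PREFIX = b"\xff\xf8"   # sign + 11 exponent bits + top significand bit, all set
CHUNK = 6              # payload bytes left in each 8-byte double

def encode64(msg):
    data = msg.encode("ascii")
    data += b"\x00" * (-len(data) % CHUNK)   # pad with NULs to a multiple of 6
    return [struct.unpack(">d", PREFIX + data[i:i + CHUNK])[0]
            for i in range(0, len(data), CHUNK)]

def decode64(floats):
    # Drop the 2-byte prefix of each double and join the payloads.
    raw = b"".join(struct.pack(">d", f)[2:] for f in floats)
    return raw.rstrip(b"\x00").decode("ascii")

packed = encode64("Hi there.")
print(len(packed))        # 2
print(decode64(packed))   # Hi there.
```

The 9-character message now fits in 2 NaNs instead of 5. Like the original, this cannot tell padding apart from a message that genuinely ends in NUL.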

You can also write the encoded floats to a pickle file, transmit them, etc., since they are ordinary Python floats and nothing special is going on.

msg = "Hi there..."

enc_msg = encode(msg)

with open("out.bin", 'wb') as f:
    pickle.dump(enc_msg, f)

with open("out.bin", 'rb') as f:
    obj = pickle.load(f)
    print("Decoded: ", decode(obj))
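One caveat worth checking, sketched below with an in-memory round trip: this works because pickle's default protocol is binary and, as far as I understand it, stores each float as its 8 raw bytes. The old text-based protocol 0 writes the float out as text instead, so the payload should not survive it. The payload helper is a hypothetical name for this example.

```python
import pickle
import struct

nan_hi = struct.unpack(">f", b"\xff\xf0Hi")[0]

def payload(f):
    """The 2 payload bytes of f when packed as a big-endian 32-bit float."""
    return struct.pack(">f", f)[2:]

binary_copy = pickle.loads(pickle.dumps([nan_hi]))              # default binary protocol
text_copy = pickle.loads(pickle.dumps([nan_hi], protocol=0))    # text-based protocol 0

print(payload(binary_copy[0]))  # b'Hi'
print(payload(text_copy[0]))    # not b'Hi': the float was written as the text "nan"
```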

Having used the NaN payload slots, it is worth noting that we could have done something similar with the bits of ordinary, non-NaN floats too. But, like I said, this is just a fun hack.