1.3. Configuring structs

When working with binary data, you need to account for how the bytes are ordered, particularly in multi-byte fields. The endianness of the data specifies that byte order: either big-endian (most significant byte first) or little-endian (least significant byte first). In this section, we’ll look at how to configure the endianness of structs in Caterpillar.

What is endianness?

You might find these resources helpful: Mozilla Docs, StackOverflow, or Wikipedia.
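To make the difference concrete, here is a minimal sketch that uses only Python’s built-in int methods, independent of Caterpillar. The value 2835 is an arbitrary example chosen for illustration.

```python
# The same 32-bit integer serialized in both byte orders.
value = 0x00000B13  # 2835 in decimal

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex(" "))     # 00 00 0b 13
print(little.hex(" "))  # 13 0b 00 00

# Decoding big-endian bytes with the wrong byte order gives a different number.
wrong = int.from_bytes(big, byteorder="little")
print(wrong)  # 319488000
```

The bytes are the same in both cases; only their order differs, which is why a parser has to be told which order to expect.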

Let’s take a look at another chunk from the PNG format: the pHYs chunk. It contains two 4-byte unsigned integers that represent pixel density, followed by a 1-byte unit specifier. Since PNG files use big-endian encoding for integers, we need to configure the struct to handle this correctly.

Configuring a struct-wide endianness
@struct(order=BigEndian)        # <-- extra argument to apply the order to all fields.
class PHYSChunk:
    pixels_per_unit_x: uint32   # <-- same definition as above
    pixels_per_unit_y: uint32
    unit: uint8                 # <-- endianness is meaningless here, only one byte
Configuring a struct-wide endianness (alternative API using endian=BIG_ENDIAN)
@struct(endian=BIG_ENDIAN)   # <-- extra argument to apply the order to all fields.
class PHYSChunk:
    pixels_per_unit_x: u32   # <-- same definition as above
    pixels_per_unit_y: u32
    unit: u8                 # <-- endianness is meaningless here, only one byte

Note

Even though <le uint32> will still be visible in the annotations of the class, the created struct stores the modified big-endian integer atom.

In both cases, the pixels_per_unit_x and pixels_per_unit_y fields are 4 bytes long, so they will be interpreted using big-endian encoding. The unit field is only 1 byte long, so endianness doesn’t affect it.
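As a cross-check of the layout described above, the same nine bytes can be decoded with Python’s standard struct module. This is only a sketch for comparison and does not use Caterpillar; the payload bytes are a made-up example (2835 pixels per unit on both axes, unit set to 1).

```python
import struct

# Hypothetical pHYs payload: two 4-byte integers followed by one unit byte.
payload = bytes.fromhex("00000b13" "00000b13" "01")

# ">" selects big-endian without padding; I = 4-byte unsigned, B = 1-byte unsigned.
x, y, unit = struct.unpack(">IIB", payload)
print(x, y, unit)  # 2835 2835 1

# With little-endian ("<"), the 4-byte fields change but the single byte does not.
x_le, y_le, unit_le = struct.unpack("<IIB", payload)
print(x_le, y_le, unit_le)  # 319488000 319488000 1
```

The unit value is identical under both byte orders, which matches the observation that endianness only matters for multi-byte fields.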

In addition to configuring the endianness, you can also specify the architecture associated with the struct using the Arch class with the arch keyword.

Challenge

You can try to implement the struct for the tIME chunk as a challenge.

Solution

Example implementation

@struct(order=BigEndian)
class TIMEChunk:
    year: uint16        # <-- we could also use: BigEndian + uint16
    month: uint8
    day: uint8
    hour: uint8
    minute: uint8
    second: uint8

# Alternative API using endian=BIG_ENDIAN
@struct(endian=BIG_ENDIAN)
class TIMEChunk:
    year: u16        # <-- we could also use: BIG_ENDIAN + u16
    month: u8
    day: u8
    hour: u8
    minute: u8
    second: u8

As you can see, the struct is fairly simple. The year field is 2 bytes long, and the remaining fields are single bytes. Applying BigEndian (or BIG_ENDIAN) to the struct ensures that the only multi-byte field, year, is handled in big-endian order.
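To see what this layout looks like on the wire, the following sketch packs a timestamp with Python’s standard struct module rather than Caterpillar. The date used here is an arbitrary example.

```python
import struct

# ">" = big-endian, H = 2-byte unsigned (year), B = 1-byte unsigned (other fields).
TIME_FORMAT = ">HBBBBB"

# Example timestamp: 2024-01-31 12:30:00
packed = struct.pack(TIME_FORMAT, 2024, 1, 31, 12, 30, 0)
print(packed.hex(" "))  # 07 e8 01 1f 0c 1e 00

# Unpacking with the same format restores the original values.
year, month, day, hour, minute, second = struct.unpack(TIME_FORMAT, packed)
print(year, month, day, hour, minute, second)  # 2024 1 31 12 30 0
```

Only the first two bytes (07 e8, the year) depend on the byte order; the five single-byte fields would be encoded identically in little-endian.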