2.1.3. Configuring structs
When working with binary data, it’s essential to account for how the data is ordered, particularly when dealing with multi-byte fields. The endianness of the data specifies the byte order: either big-endian (most significant byte first) or little-endian (least significant byte first). In this section, we’ll look at how to configure the endianness for structs in Caterpillar.
What is endianness?
You might find these resources helpful: Mozilla Docs, StackOverflow or Wikipedia
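To make the difference concrete, here is a minimal sketch that uses only Python's standard-library struct module (not Caterpillar) to serialize the same 32-bit integer in both byte orders. The value 0x0A0B0C0D is an arbitrary example chosen so that each byte is easy to spot.

```python
import struct

value = 0x0A0B0C0D  # arbitrary example value, one distinct byte per position

# ">" selects big-endian, "<" selects little-endian; "I" is a 4-byte unsigned int.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

print(big.hex())     # 0a0b0c0d  (most significant byte first)
print(little.hex())  # 0d0c0b0a  (least significant byte first)

# Reading the bytes back with the wrong order yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0xd0c0b0a
```

The same four bytes therefore decode to two different integers depending on the byte order assumed by the reader, which is why a struct definition has to state it.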
Let’s take a look at another chunk from the PNG format: the pHYs chunk. It contains two 4-byte unsigned integers that represent pixel density, followed by a 1-byte unit specifier. Since PNG files use big-endian encoding for integers, we need to configure the struct to handle this correctly.
@struct(order=BigEndian)       # <-- extra argument to apply the order to all fields
class PHYSChunk:
    pixels_per_unit_x: uint32  # <-- same definition as above
    pixels_per_unit_y: uint32
    unit: uint8                # <-- endianness is meaningless, only one byte
The pixels_per_unit_x and pixels_per_unit_y fields are each 4 bytes long,
so they are interpreted using big-endian encoding. The unit field is only 1 byte, so
endianness doesn’t affect it.
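As an independent cross-check of this layout, the following sketch decodes a 9-byte pHYs payload with the standard-library struct module rather than Caterpillar. The payload bytes are a hand-built assumption: 2835 pixels per metre on both axes (roughly 72 dpi) with unit 1 (metre).

```python
import struct

# Hypothetical pHYs payload: two big-endian uint32 values plus one unit byte.
payload = bytes.fromhex("00000b13" "00000b13" "01")

# ">" = big-endian, "I" = uint32, "B" = uint8; mirrors the three struct fields.
x, y, unit = struct.unpack(">IIB", payload)
print(x, y, unit)  # 2835 2835 1

# Decoding the first field as little-endian gives a nonsensical density.
wrong_x = struct.unpack("<I", payload[:4])[0]
print(wrong_x == x)  # False
```

A struct configured with the correct byte order should produce the same three values for this payload.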
In addition to configuring the endianness, you can also specify the architecture associated
with the struct by passing an Arch instance via the arch keyword.
Challenge
You can try to implement the struct for the tIME chunk as a challenge.
Solution
Example implementation
@struct(order=BigEndian)
class TIMEChunk:
    year: uint16  # <-- we could also use: BigEndian + uint16
    month: uint8
    day: uint8
    hour: uint8
    minute: uint8
    second: uint8
As you can see, the struct is fairly simple. The year field is 2 bytes, and the rest are
single-byte fields. By applying BigEndian or BIG_ENDIAN to the struct,
we ensure that the only field that depends on byte order, year, is handled correctly.
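To see why only year is sensitive to byte order, here is a small standard-library sketch (again not Caterpillar code) that decodes a 7-byte tIME payload. The timestamp 2024-02-29 13:45:30 is an arbitrary example.

```python
import struct

# Hypothetical tIME payload: year as big-endian uint16, then five single bytes.
payload = bytes([0x07, 0xE8, 2, 29, 13, 45, 30])

# "H" = uint16, "5B" = five uint8 values.
year, *rest = struct.unpack(">H5B", payload)
print(year, rest)  # 2024 [2, 29, 13, 45, 30]

# The single-byte fields decode identically under either byte order.
same = struct.unpack(">5B", payload[2:]) == struct.unpack("<5B", payload[2:])
print(same)  # True

# The 2-byte year does not.
year_le = struct.unpack("<H", payload[:2])[0]
print(year_le)  # 59399
```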