1.3. Configuring structs
When working with binary data, it’s essential to account for how the data is ordered, particularly when dealing with multi-byte fields. The endianness of the data specifies the byte order: either big-endian (most significant byte first) or little-endian (least significant byte first). In this section, we’ll look at how to configure the endianness of structs in Caterpillar.
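To make the two byte orders concrete, here is a small sketch in plain Python (no Caterpillar involved) using the built-in int.to_bytes and int.from_bytes methods. The same 4-byte value is laid out in opposite orders, and reading the bytes back with the wrong order produces a different number.

```python
# Plain-Python illustration of byte order; Caterpillar is not used here.
value = 0x12345678

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Decoding big-endian bytes as if they were little-endian yields another integer.
wrong = int.from_bytes(big, byteorder="little")
print(hex(wrong))    # 0x78563412
```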
What is endianness?
You might find these resources helpful: Mozilla Docs, StackOverflow or Wikipedia
Let’s take a look at another chunk from the PNG format: the pHYs chunk. It contains two 4-byte unsigned integers that represent the pixel density, followed by a 1-byte unit specifier. Since PNG files use big-endian encoding for integers, we need to configure the struct to handle this correctly.
@struct(order=BigEndian) # <-- extra argument to apply the order to all fields.
class PHYSChunk:
    pixels_per_unit_x: uint32 # <-- same definition as above
    pixels_per_unit_y: uint32
    unit: uint8 # <-- endianness meaningless, only one byte
@struct(endian=BIG_ENDIAN) # <-- extra argument to apply the order to all fields.
class PHYSChunk:
    pixels_per_unit_x: u32 # <-- same definition as above
    pixels_per_unit_y: u32
    unit: u8 # <-- endianness meaningless, only one byte
Note
Even though <le uint32> will still be visible in the annotations of the class, the created struct stores the modified big-endian integer atom.
In both cases, the pixels_per_unit_x and pixels_per_unit_y fields are 4 bytes long, so the struct-wide byte order applies to them and they are interpreted using big-endian encoding. The unit field is only 1 byte, so endianness doesn’t affect it.
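For comparison, the same 9-byte pHYs layout can be decoded with Python's standard-library struct module (unrelated to Caterpillar's @struct decorator), where the ">" prefix selects big-endian and "<" little-endian. The payload below is made-up sample data: 2835 pixels per metre on both axes with unit 1 (metre), which is roughly 72 DPI. Misreading it as little-endian scrambles only the two 4-byte fields.

```python
import struct  # Python's standard-library module, not Caterpillar

# Hypothetical pHYs payload: x = y = 2835 pixels per unit, unit = 1 (metre).
payload = (2835).to_bytes(4, "big") * 2 + bytes([1])

# ">IIB": big-endian, two unsigned 32-bit ints and one unsigned byte (9 bytes, no padding).
x, y, unit = struct.unpack(">IIB", payload)
print(x, y, unit)  # 2835 2835 1

# Decoding with the wrong byte order changes the 4-byte fields but not the 1-byte unit.
lx, ly, lunit = struct.unpack("<IIB", payload)
print(lx, ly, lunit)  # 319488000 319488000 1
```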
In addition to configuring the endianness, you can also specify the architecture associated with the struct by using the Arch class together with the arch keyword argument.
Challenge
You can try to implement the struct for the tIME chunk as a challenge.
Solution
Example implementation
@struct(order=BigEndian)
class TIMEChunk:
    year: uint16 # <-- we could also use: BigEndian + uint16
    month: uint8
    day: uint8
    hour: uint8
    minute: uint8
    second: uint8
@struct(endian=BIG_ENDIAN)
class TIMEChunk:
    year: u16 # <-- we could also use: BIG_ENDIAN + u16
    month: u8
    day: u8
    hour: u8
    minute: u8
    second: u8
As you can see, the struct is fairly simple. The year field is 2 bytes, and the rest are single-byte fields. By applying BigEndian or BIG_ENDIAN to the struct, we ensure that the only multi-byte field, year, is decoded in big-endian order; the single-byte fields are unaffected by the byte order.
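As a cross-check of the solution, the 7-byte tIME layout corresponds to the standard-library format string ">HBBBBB". The sketch below packs and unpacks a hypothetical timestamp with Python's built-in struct module (not Caterpillar); the format string and sample date are illustrative choices, and only year depends on the byte order.

```python
import struct  # standard library, independent of Caterpillar

TIME_FORMAT = ">HBBBBB"  # big-endian: year (u16), then month, day, hour, minute, second (u8 each)

# Hypothetical timestamp used only for illustration.
raw = struct.pack(TIME_FORMAT, 2024, 3, 15, 12, 30, 45)
print(len(raw), raw[:2].hex())  # 7 07e8

year, month, day, hour, minute, second = struct.unpack(TIME_FORMAT, raw)
print(year, month, day, hour, minute, second)  # 2024 3 15 12 30 45
```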