2.3. Byte Sequences
When working with binary data, you sometimes need to deal with raw byte sequences. Caterpillar provides several structs to handle them efficiently, whether they are exposed as memory views, converted to bytes objects, or prefixed with length information.
2.3.1. Memory
The Memory struct is ideal when you need to handle data that can be wrapped by a memoryview. It allows you to define fields with a specified size (static or dynamic) and is especially useful for printing out unpacked objects in a readable way.
>>> m = F(Memory(5)) # static size; dynamic size is allowed too
>>> pack(bytes([i for i in range(5)]), m)
b'\x00\x01\x02\x03\x04'
>>> unpack(m, _)
<memory at 0x00000204FDFA4411>
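The unpacked value above is a memoryview, whose repr shows only an address. The following is a minimal plain-Python sketch (it does not use Caterpillar at all) of how such a view behaves and how you might inspect its content; the variable names are purely illustrative.

```python
# Plain Python, no Caterpillar: what a memoryview offers once you have one.
data = bytearray(range(5))
view = memoryview(data)

# Slicing a memoryview creates another view on the same buffer, not a copy.
head = view[:2]
data[0] = 0xFF  # the change is visible through both views

# repr(view) only shows an address, so convert explicitly for display.
as_bytes = view.tobytes()
as_hex = view.hex()

print(as_bytes)        # b'\xff\x01\x02\x03\x04'
print(as_hex)          # ff01020304
print(head.tobytes())  # b'\xff\x01'
```

Because slices share the underlying buffer, a view is cheap to pass around even for large fields; a copy is made only when you call tobytes().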
2.3.2. Bytes
If you need direct access to byte sequences, the Bytes struct is the solution. This struct converts a memoryview to bytes for easy manipulation. You can define fields with static, dynamic, or greedy sizes based on your needs.
>>> bytes_obj = Bytes(5) # static, dynamic and greedy size allowed
>>> b = octetstring(5) # static, dynamic size allowed
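To make the three size modes concrete, here is a rough sketch in plain Python of how a reader could resolve a static, dynamic, or greedy size. It is not Caterpillar's implementation: read_field, the context dictionary, and the use of None to mean "greedy" are assumptions made for this illustration only.

```python
from typing import Callable, Tuple, Union

# Hypothetical size spec: int = static, callable = dynamic, None = greedy.
Size = Union[int, Callable[[dict], int], None]


def read_field(data: bytes, offset: int, size: Size, context: dict) -> Tuple[bytes, int]:
    """Return (field_bytes, new_offset) for the given size spec."""
    if size is None:
        # Greedy: consume everything that is left.
        length = len(data) - offset
    elif callable(size):
        # Dynamic: the length depends on values parsed earlier.
        length = size(context)
    else:
        # Static: a fixed number of bytes.
        length = size
    end = offset + length
    if length < 0 or end > len(data):
        raise ValueError("not enough data for field")
    return data[offset:end], end


payload = bytes(range(10))
static, pos = read_field(payload, 0, 2, {})
dynamic, pos = read_field(payload, pos, lambda ctx: ctx["length"] - 4, {"length": 7})
greedy, pos = read_field(payload, pos, None, {})
```

Here static holds the first two bytes, dynamic the next three (7 - 4), and greedy the remaining five.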
Let’s implement a struct for the fDAT chunk of the PNG format, which stores frame data. In this case, we use the Memory struct to handle the frame data.
@struct(order=BigEndian)  # <-- endianness as usual
class FDATChunk:
    sequence_number: uint32
    # We use a Memory instance here rather than Bytes()
    frame_data: Memory(parent.length - 4)

# Same struct written with the library's other API (u32, octetstring)
parent = ContextPath("parent.obj")

@struct(endian=BIG_ENDIAN)
class FDATChunk:
    sequence_number: u32
    frame_data: octetstring(parent.length - 4)
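For comparison, the same layout can be parsed by hand with the standard library. This is only a sketch to show what the struct definition describes (a 4-byte big-endian sequence number followed by length - 4 bytes of frame data); parse_fdat is a hypothetical helper and not part of Caterpillar.

```python
import struct


def parse_fdat(chunk_data: bytes):
    """Split the data field of an fDAT chunk into (sequence_number, frame_data)."""
    if len(chunk_data) < 4:
        raise ValueError("fDAT chunk data must be at least 4 bytes long")
    # ">I" is a big-endian unsigned 32-bit integer.
    (sequence_number,) = struct.unpack_from(">I", chunk_data, 0)
    # Keep the frame data as a view, mirroring the Memory-based field.
    frame_data = memoryview(chunk_data)[4:]
    return sequence_number, frame_data


chunk = struct.pack(">I", 7) + b"\xde\xad\xbe\xef"
seq, frame = parse_fdat(chunk)
print(seq, len(frame))  # 7 4
```

The declarative version expresses the same length arithmetic through parent.length - 4 instead of manual slicing.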
Challenge
If you feel ready for a more advanced structure, try implementing the zTXt chunk for compressed textual data.
Solution
Python API only:
@struct  # <-- actually, we don't need a specific byteorder
class ZTXTChunk:
    keyword: CString(...)  # <-- variable length
    compression_method: uint8
    # Okay, we haven't introduced this struct yet, but Memory() or Bytes()
    # would have been okay, too.
    text: ZLibCompressed(parent.length - lenof(this.keyword) - 1)
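If you want to check your solution against real data, a hand-written parser can serve as a reference. The sketch below uses only the standard library and assumes the zTXt layout from the PNG specification (Latin-1 keyword, null separator, one compression-method byte where 0 means zlib, then the compressed text); parse_ztxt is a hypothetical helper, not a Caterpillar function.

```python
import zlib


def parse_ztxt(chunk_data: bytes):
    """Return (keyword, compression_method, text) from zTXt chunk data."""
    keyword, separator, rest = chunk_data.partition(b"\x00")
    if not separator or not rest:
        raise ValueError("malformed zTXt chunk")
    compression_method = rest[0]
    if compression_method != 0:
        raise ValueError("unsupported compression method")
    # Everything after the method byte is a zlib stream.
    text = zlib.decompress(rest[1:])
    return keyword.decode("latin-1"), compression_method, text.decode("latin-1")


sample = b"Comment\x00\x00" + zlib.compress("hello zTXt".encode("latin-1"))
keyword, method, text = parse_ztxt(sample)
print(keyword, method, text)  # Comment 0 hello zTXt
```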