1. First Steps
In this section, we present examples designed to provide a relaxed introduction to caterpillar without going into technical details yet. Most code snippets use interpreter prompts. To reproduce the examples, simply enter everything after the prompt.
In the context of this tutorial, we focus on the implementation of the PNG file format. While we won’t list every available chunk, you’re encouraged to explore and implement additional chunks independently.
The PNG format organizes data into chunks, each following a common format. The global chunk definition will be introduced later due to the impossibility of forward declarations in Python. [*]
Important
Assume that each code block starts with an import of all fields and the struct function from caterpillar.model.
1.1. Defining structs
Given the important role of structs in this library, let’s start by understanding their definition. Our starting point is the PLTE chunk, which uses three-byte entries for its data.
from caterpillar.py import *  # <-- just import everything

@struct  # <-- just decorate the class with the struct() function
class RGB:
    r: uint8  # <-- a field can be defined just like this
    g: uint8
    b: uint8
from caterpillar.c import *  # <-- just import everything

@struct  # <-- just decorate the class with the struct() function
class RGB:
    r: u8  # <-- a field can be defined just like this
    g: u8
    b: u8
With this simple annotation, the struct class becomes universally applicable. You can integrate it into other struct definitions or instantiate objects of the class.
>>> obj = RGB(1, 2, 3)
Note
To reduce memory usage and speed up attribute access, you have to explicitly enable the S_SLOTS option. More information can be found in Options.
Wow, that’s it? That was less than expected! Let’s move directly to working with the defined class.
1.2. Working with structs
Instantiating an object of your class involves providing all required fields as arguments. They must be passed as keyword arguments if any of the defined fields has a default value. Default and constant values are applied automatically, so you do not need to supply them yourself.
1.2.1. Packing data
This library’s packing and unpacking is similar to Python’s struct module. When packing data, a struct and an input object are needed.
Thanks to the RGB class encapsulating its struct instance, explicitly stating the struct to use becomes unnecessary.
>>> obj = RGB(r=1, g=2, b=3)
>>> pack(obj) # equivalent to pack(obj, RGB)
b'\x01\x02\x03'
>>> obj = RGB(r=1, g=2, b=3)
>>> pack(obj, RGB) # required as of version 2.2.0
b'\x01\x02\x03'
1.2.2. Unpacking data
Recreating objects from binary streams is as easy as serializing them. Here, you must pass either the struct itself or the struct class.
>>> unpack(RGB, b"\x01\x02\x03")
RGB(r=1, g=2, b=3)
>>> unpack(b"\x01\x02\x03", RGB)
RGB(r=1, g=2, b=3)
And no, we’re not done yet - we’ve just wrapped up the warm-up!
1.3. Configuring structs
Now, let’s take a look at another chunk from the PNG format: pHYs. It holds two four-byte unsigned integers followed by a one-byte unit specifier. Because PNG files encode numbers in big-endian byte order, we must configure the struct so that these integer fields are decoded correctly.
What is endianness?
You might find these resources helpful: Mozilla Docs, StackOverflow or Wikipedia
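As a quick illustration of why byte order matters, the sketch below encodes one integer in both orders using only Python built-ins. The value 2835 is just a sample (it roughly corresponds to 72 dpi expressed in pixels per metre) and is not taken from the tutorial.

```python
# Sample value only; any integer that needs more than one byte works.
value = 2835

big = value.to_bytes(4, "big")        # most significant byte first
little = value.to_bytes(4, "little")  # least significant byte first

print(big)     # b'\x00\x00\x0b\x13'
print(little)  # b'\x13\x0b\x00\x00'

# Decoding with the wrong byte order silently yields a different number.
misread = int.from_bytes(big, "little")
print(misread == value)  # False
```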
@struct(order=BigEndian)  # <-- extra argument to apply the order to all fields
class PHYSChunk:
    pixels_per_unit_x: uint32  # <-- same definition as above
    pixels_per_unit_y: uint32
    unit: uint8  # <-- endianness is meaningless here, only one byte
# same struct with the caterpillar.c types
@struct(endian=BIG_ENDIAN)  # <-- extra argument to apply the order to all fields
class PHYSChunk:
    pixels_per_unit_x: u32  # <-- same definition as above
    pixels_per_unit_y: u32
    unit: u8  # <-- endianness is meaningless here, only one byte
Note
Even though <le uint32> will still be visible in the annotations of the class, the created struct stores the modified big-endian integer atom.
If your structs depend on the architecture associated with the binary, you can also specify a struct-wide Arch.
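To make the resulting byte layout of PHYSChunk tangible, here is a hedged sketch of the same nine-byte payload built with the standard-library struct module instead of caterpillar. PHYS_FORMAT is a hypothetical name and the field values are sample data.

```python
import struct

# ">" selects big-endian without padding; "I" is a uint32, "B" is a uint8.
PHYS_FORMAT = struct.Struct(">IIB")

# Sample values: 2835 pixels per unit on both axes, unit 1 (metre).
payload = PHYS_FORMAT.pack(2835, 2835, 1)

print(PHYS_FORMAT.size)  # 9
print(payload.hex())     # 00000b1300000b1301

# Round trip: the three fields come back in declaration order.
x, y, unit = PHYS_FORMAT.unpack(payload)
```

The single ">" prefix plays the same role as the struct-wide order argument: it applies to every field in the format.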
Challenge
You can try to implement the struct for the tIME chunk as a challenge.
Solution
Example implementation
@struct(order=BigEndian)
class TIMEChunk:
    year: uint16  # <-- we could also use: BigEndian + uint16
    month: uint8
    day: uint8
    hour: uint8
    minute: uint8
    second: uint8
# same struct with the caterpillar.c types
@struct(endian=BIG_ENDIAN)
class TIMEChunk:
    year: u16  # <-- we could also use: BIG_ENDIAN + u16
    month: u8
    day: u8
    hour: u8
    minute: u8
    second: u8
Note that we can integrate this struct later on.
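If you want to check your tIME solution against known bytes, the following standard-library sketch encodes and decodes the same seven-byte layout. It is only a cross-check under the assumption that the fields appear in the order shown above; TIME_FORMAT, pack_time and unpack_time are hypothetical helper names, not caterpillar functions.

```python
import struct
from datetime import datetime

# Big-endian: one uint16 (year) followed by five uint8 fields.
TIME_FORMAT = struct.Struct(">HBBBBB")


def pack_time(moment: datetime) -> bytes:
    """Encode a datetime as year, month, day, hour, minute, second."""
    return TIME_FORMAT.pack(
        moment.year, moment.month, moment.day,
        moment.hour, moment.minute, moment.second,
    )


def unpack_time(data: bytes) -> datetime:
    """Decode the seven bytes back into a datetime."""
    return datetime(*TIME_FORMAT.unpack(data))


stamp = datetime(2024, 1, 2, 3, 4, 5)
encoded = pack_time(stamp)
print(encoded.hex())  # 07e80102030405
```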
1.3.1. Documenting structs
To keep changes to your codebase small and to require as little adaptation as possible from users of this library, there is a documentation feature. Because options can be applied globally, you only need the following code:
from caterpillar.shortcuts import opt
opt.set_struct_flags(opt.S_REPLACE_TYPES)
from caterpillar.c import *
STRUCT_OPTIONS.add(S_REPLACE_TYPES)
Tip
If you are working with Sphinx, you might need to set autodoc_member_order = 'bysource' to display all struct members in the correct order.
1.4. Next Steps
With the fundamentals of defining and using structs covered, we’re ready to move on to more advanced topics. The upcoming sections will explore basic structs, array definitions, enum inclusion, and much more.