1. First Steps

In this section, we present examples designed to provide a relaxed introduction to caterpillar without getting into technical details yet. Most of the code snippets use interpreter prompts. To reproduce the examples, simply enter everything after the prompt.

In the context of this tutorial, we focus on the implementation of the PNG file format. While we won’t list every available chunk, you’re encouraged to explore and implement additional chunks independently.

The PNG format organizes data into chunks, each following a common format. The global chunk definition will be introduced later due to the impossibility of forward declarations in Python. [*]
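For orientation, the common chunk layout can be sketched with Python's standard library alone. This sketch does not use caterpillar; `build_chunk` and `parse_chunk` are hypothetical helper names, and the layout (length, type, data, CRC) is taken from the PNG specification rather than from this tutorial.

```python
import struct
import zlib


def build_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32."""
    # The length counts only the data bytes and is stored big-endian.
    length = struct.pack(">I", len(data))
    # The CRC covers the type and data fields, but not the length field.
    crc = struct.pack(">I", zlib.crc32(chunk_type + data))
    return length + chunk_type + data + crc


def parse_chunk(raw: bytes) -> tuple[bytes, bytes]:
    """Split one serialized chunk into (type, data) and verify its CRC."""
    (length,) = struct.unpack_from(">I", raw, 0)
    chunk_type = raw[4:8]
    data = raw[8 : 8 + length]
    (crc,) = struct.unpack_from(">I", raw, 8 + length)
    if crc != zlib.crc32(chunk_type + data):
        raise ValueError("CRC mismatch")
    return chunk_type, data


# The empty IEND chunk terminates every PNG file.
iend = build_chunk(b"IEND", b"")
print(iend.hex())  # 0000000049454e44ae426082
```

Writing this by hand for every chunk type gets repetitive quickly, which is the part a declarative struct definition is meant to take over.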

Important

Assume that each code block starts with an import of all fields and the struct function from caterpillar.model.

1.1. Defining structs

Given the important role of structs in this library, let’s start by understanding their definition. Our starting point is the PLTE chunk, which uses three-byte entries for its data.

RGB struct for the PLTE chunk
from caterpillar.py import * # <-- just import everything

@struct         # <-- just decorate the class with the struct() function
class RGB:
    r: uint8    # <-- a field can be defined just like this
    g: uint8
    b: uint8

RGB struct for the PLTE chunk (using Caterpillar C)
from caterpillar.c import * # <-- just import everything

@struct         # <-- just decorate the class with the struct() function
class RGB:
    r: u8       # <-- a field can be defined just like this
    g: u8
    b: u8

With this simple annotation, the struct class becomes universally applicable. You can integrate it into other struct definitions or instantiate objects of the class.

>>> obj = RGB(1, 2, 3)

Note

To save memory and speed up attribute access, you have to explicitly enable the S_SLOTS option. See Options for more information.

Wow, that’s it? That was less than expected! Let’s move directly to working with the defined class.

1.2. Working with structs

Instantiating an object of your class involves providing all required fields as arguments. They must be keyword arguments if one of the defined fields contains a default value. Defaults and constant values are applied automatically, so you don’t have to pass them yourself.

1.2.1. Packing data

Packing and unpacking in this library work much like Python’s struct module. When packing data, a struct and an input object are needed.

Thanks to the RGB class encapsulating its struct instance, explicitly stating the struct to use becomes unnecessary.

>>> obj = RGB(r=1, g=2, b=3)
>>> pack(obj) # equivalent to pack(obj, RGB)
b'\x01\x02\x03'

The same call using Caterpillar C:
>>> obj = RGB(r=1, g=2, b=3)
>>> pack(obj, RGB) # required as of version 2.2.0
b'\x01\x02\x03'
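
For comparison, the same three bytes can be produced with Python's built-in struct module. This is a stdlib-only sketch; the format string `"3B"` (three unsigned bytes) is the hand-written counterpart of the RGB definition.

```python
import struct

# "3B" = three unsigned 8-bit integers, matching the fields r, g and b.
packed = struct.pack("3B", 1, 2, 3)
print(packed)  # b'\x01\x02\x03'

# The format string also determines the size of the packed data.
print(struct.calcsize("3B"))  # 3
```

With the struct module, the field names live only in your head; the class-based definition keeps names and layout in one place.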

1.2.2. Unpacking data

Recreating data from binary streams is as easy as serializing objects. Here, you have to provide either the struct itself or our struct class.

>>> unpack(RGB, b"\x01\x02\x03")
RGB(r=1, g=2, b=3)

The same call using Caterpillar C:
>>> unpack(b"\x01\x02\x03", RGB)
RGB(r=1, g=2, b=3)
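
To see what unpacking does for a whole PLTE payload, here is a stdlib-only sketch that decodes a run of three-byte entries. It does not use caterpillar; `PaletteEntry` and `read_palette` are hypothetical names chosen for this example.

```python
import struct
from typing import NamedTuple


class PaletteEntry(NamedTuple):
    r: int
    g: int
    b: int


def read_palette(data: bytes) -> list[PaletteEntry]:
    """Decode PLTE chunk data into a list of three-byte entries."""
    if len(data) % 3 != 0:
        raise ValueError("PLTE data length must be a multiple of 3")
    # iter_unpack walks the buffer in steps of the format size (3 bytes).
    return [PaletteEntry(*entry) for entry in struct.iter_unpack("3B", data)]


palette = read_palette(b"\x01\x02\x03\xff\x00\x00")
print(palette)  # [PaletteEntry(r=1, g=2, b=3), PaletteEntry(r=255, g=0, b=0)]
```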

And no, we’re not done yet - we’ve just wrapped up the warm-up!

1.3. Configuring structs

Now, let’s take a look at another chunk from the PNG format: pHYs. It specifies two four-byte unsigned integers followed by a one-byte unit specifier. Given that PNG files encode numbers in big-endian, we must configure the struct to correctly decode these integer fields.

What is endianness?

You might find these resources helpful: Mozilla Docs, StackOverflow or Wikipedia.

Configuring a struct-wide endianness
@struct(order=BigEndian)        # <-- extra argument to apply the order to all fields.
class PHYSChunk:
    pixels_per_unit_x: uint32   # <-- same definition as above
    pixels_per_unit_y: uint32
    unit: uint8                 # <-- endianness meaningless, only one byte

Configuring a struct-wide endianness (using Caterpillar C)
@struct(endian=BIG_ENDIAN)   # <-- extra argument to apply the order to all fields.
class PHYSChunk:
    pixels_per_unit_x: u32   # <-- same definition as above
    pixels_per_unit_y: u32
    unit: u8                 # <-- endianness meaningless, only one byte
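
The effect of the byte order is easiest to see on the raw bytes. The following stdlib-only sketch (not caterpillar code) packs the same integer both ways and then lays out a pHYs payload by hand; the value 2835 is just an example, roughly 72 DPI expressed in pixels per metre.

```python
import struct

value = 2835  # example value: pixels per metre, roughly 72 DPI

# Same number, opposite byte order.
big = struct.pack(">I", value)
little = struct.pack("<I", value)
print(big.hex())     # 00000b13
print(little.hex())  # 130b0000

# pHYs layout: two big-endian uint32 values plus a one-byte unit specifier.
phys = struct.pack(">IIB", value, value, 1)
print(len(phys))  # 9

# Decoding with the wrong byte order silently yields a different number.
(wrong,) = struct.unpack("<I", big)
print(wrong == value)  # False
```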

Note

Even though <le uint32> will still be visible in the annotations of the class, the created struct stores the modified big-endian integer atom.

If your structs depend on the architecture associated with the binary, you can also specify a struct-wide Arch.

Challenge

You can try to implement the struct for the tIME chunk as a challenge.

Solution

Example implementation

@struct(order=BigEndian)
class TIMEChunk:
    year: uint16        # <-- we could also use: BigEndian + uint16
    month: uint8
    day: uint8
    hour: uint8
    minute: uint8
    second: uint8

Example implementation (using Caterpillar C)

@struct(endian=BIG_ENDIAN)
class TIMEChunk:
    year: u16        # <-- we could also use: BIG_ENDIAN + u16
    month: u8
    day: u8
    hour: u8
    minute: u8
    second: u8

Note that we can integrate this struct later on.
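
To check a tIME implementation against known bytes, the following stdlib-only sketch may help. It does not use caterpillar; `parse_time_chunk` is a hypothetical helper, and the UTC time zone follows the PNG specification's recommendation for this chunk.

```python
import struct
from datetime import datetime, timezone


def parse_time_chunk(data: bytes) -> datetime:
    """Decode the 7-byte tIME payload into a timezone-aware datetime."""
    # One big-endian uint16 (year) followed by five single bytes.
    year, month, day, hour, minute, second = struct.unpack(">H5B", data)
    # Note: PNG permits second == 60 (leap second), which datetime rejects.
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)


raw = struct.pack(">H5B", 2024, 1, 2, 3, 4, 5)
print(raw.hex())  # 07e80102030405
print(parse_time_chunk(raw).isoformat())  # 2024-01-02T03:04:05+00:00
```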

1.3.1. Documenting structs

The library offers a documentation feature that keeps changes to your codebase small and requires as little adaptation as possible from users of this library. Because options can be applied globally, you only need the following code:

Enable documentation feature
from caterpillar.shortcuts import opt

opt.set_struct_flags(opt.S_REPLACE_TYPES)

Enable documentation feature (using Caterpillar C)
from caterpillar.c import *

STRUCT_OPTIONS.add(S_REPLACE_TYPES)

Tip

If you are working with Sphinx, you might need to enable autodoc_member_order = 'bysource' to display all struct members in the correct order.
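
In a Sphinx project, that setting goes into conf.py. A minimal fragment might look like this, assuming the autodoc extension is what renders your struct classes; the rest of your configuration stays as it is.

```python
# conf.py (Sphinx configuration file)

# autodoc must be enabled for the member-order setting to have any effect.
extensions = ["sphinx.ext.autodoc"]

# List members in the order they appear in the source file instead of
# sorting them alphabetically, so struct fields keep their declared order.
autodoc_member_order = "bysource"
```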

1.4. Next Steps

With the fundamentals of defining and using structs covered, we’re ready to move on to more advanced topics. The upcoming sections will explore basic structs, array definitions, enum inclusion, and much more.