Welcome to Caterpillar’s documentation!#

caterpillar is a Python framework for navigating structured binary data. It works as a declarative parser and builder driven by standard Python class definitions and annotations.

Whetting Your Appetite#

If you do reverse engineering or work with custom or proprietary file formats, you often need a way to describe structured data that is both easy and efficient.

🐛 lets you define standard structs with a fixed size while also supporting dynamic structs that adjust their size based on the current context. This framework enables you to write complex structures in a compact and readable manner.
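To make the difference between fixed-size and dynamic structs concrete, here is a minimal sketch that uses only the standard-library `struct` module, not caterpillar itself. The helper `read_prefixed_strings` and the chosen layout are hypothetical and exist only for illustration: the fixed record has a size known before any data is read, while the length-prefixed array can only be sized by reading its prefix first.

```python
import struct

# Fixed-size layout: a little-endian uint16 followed by a uint32.
# Its size is known before any data is read.
FIXED = struct.Struct("<HI")
print(FIXED.size)  # 6


def read_prefixed_strings(data: bytes, offset: int = 0):
    """Read a big-endian uint32 count, then that many NUL-terminated strings.

    Hypothetical helper for illustration; returns (strings, new_offset).
    """
    (count,) = struct.unpack_from(">I", data, offset)
    offset += 4
    strings = []
    for _ in range(count):
        end = data.index(b"\x00", offset)  # locate the terminator
        strings.append(data[offset:end].decode("utf-8"))
        offset = end + 1  # skip the terminator
    return strings, offset


# The total size of this payload depends on the values it contains.
payload = struct.pack(">I", 2) + b"Bar\x00Baz\x00"
strings, consumed = read_prefixed_strings(payload)
```

A declarative framework generates this kind of bookkeeping from the class definition, so the offsets and loops do not have to be written by hand.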

Simple example of a custom struct#
@struct
class Format:
   magic: b"Foo"                       # constant values
   name: CString(...)                  # C-String without a fixed length
   value: le + uint16                  # little endian encoding
   entries: be + CString[uint32::]     # arrays with big-endian prefixed length
Simple example of a custom struct (using caterpillar.c)#
from caterpillar.c import BIG_ENDIAN as be

@struct
class Format:
   magic: b"Foo"                         # constant values
   name: cstring(...)                    # C-String without a fixed length
   value: u16                            # little endian encoding
   entries: cstring(...)[be + u32::]     # arrays with big-endian prefixed length
   value2: bool                          # python type mapping configurable

Hold up, wait a minute!

How does this even work? Is this still Python? Answers to these questions are given in the general Introduction.

Working with defined classes is as straightforward as working with normal classes. All constant values are created automatically!

>>> obj = Format(name="Hello, World!", value=10, entries=["Bar", "Baz"])
>>> print(obj)
Format(magic=b'Foo', name='Hello, World!', value=10, entries=['Bar', 'Baz'])

Packing and unpacking have never been easier:

>>> pack(obj)
b'FooHello, World!\x00\n\x00\x00\x00\x00\x02Bar\x00Baz\x00'
>>> unpack(Format, _)
Format(magic=b'Foo', name='Hello, World!', value=10, entries=['Bar', 'Baz'])
>>> # The same round trip with the caterpillar.c variant of Format
>>> pack(obj, Format)
b'FooHello, World!\x00\n\x00\x00\x00\x00\x02Bar\x00Baz\x00\x01'
>>> unpack(_, Format)
<Format object at 0x...>
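To show what the declarative definition encodes, the sketch below rebuilds the first `pack(obj)` result by hand with the standard-library `struct` module. The function `pack_format` is a hypothetical name introduced here and is not part of caterpillar; it assumes the layout of the first `Format` class (constant magic, NUL-terminated name, little-endian uint16, big-endian uint32 count followed by NUL-terminated entries).

```python
import struct


def pack_format(name: str, value: int, entries: list[str]) -> bytes:
    """Hand-written equivalent of the first Format layout (illustrative only)."""
    out = bytearray(b"Foo")                  # magic constant
    out += name.encode("utf-8") + b"\x00"    # NUL-terminated string
    out += struct.pack("<H", value)          # little-endian uint16
    out += struct.pack(">I", len(entries))   # big-endian uint32 element count
    for entry in entries:
        out += entry.encode("utf-8") + b"\x00"
    return bytes(out)


packed = pack_format("Hello, World!", 10, ["Bar", "Baz"])
print(packed)
```

The printed bytes match the first `pack(obj)` output shown above. Every field in the struct definition corresponds to one line of manual packing here.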
  • What about documentation? There are specialized options created only for documentation purposes, so you don’t have to worry about documenting fields. Just apply the documentation as usual.

  • Want to reduce memory usage? No problem! The memory occupied by unpacked objects can be reduced by up to a factor of four. More information is provided in the discussion of the available configuration Options.
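The text above does not say how the memory saving is achieved. One general Python technique for shrinking per-instance memory is `__slots__`. The sketch below shows that technique in plain Python; that caterpillar's option works this way is an assumption, and the class names are hypothetical.

```python
import sys


class WithDict:
    """Ordinary class: every instance carries its own __dict__."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class WithSlots:
    """Slotted class: attributes live in fixed slots, with no per-instance dict."""

    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value


a = WithDict("Hello", 10)
b = WithSlots("Hello", 10)

print(hasattr(a, "__dict__"), hasattr(b, "__dict__"))  # True False

# Shallow size comparison; exact numbers vary between Python versions.
size_a = sys.getsizeof(a) + sys.getsizeof(a.__dict__)
size_b = sys.getsizeof(b)
print(size_a, size_b)
```

The actual saving depends on the interpreter version and the number of fields, so consult the configuration Options for what the framework really provides.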

Where to start?#

We recommend starting with the explanation of the internal Data Model to become familiar with all forms of structs. There is also detailed documentation on the available configuration options. Alternatively, you can follow the Tutorial.

  • Tutorial: User-friendly tutorial to caterpillar. (tutorial/index.html)

  • Reference: Information about the internal data model of caterpillar. (reference/index.html)

  • Library: Source code documentation generated by Sphinx. (library/index.html)

  • Development: Changelog and contributing information. (development/index.html)

  • C Reference: Caterpillar C API reference and extension development. (reference/capi/extension.html)
