2.2. String Types#

In binary file formats, string handling can be complex due to varying lengths, encodings, and termination methods. Caterpillar provides several specialized string types to manage these intricacies with ease.

2.2.1. Default Strings#

The standard type for regular string handling, requiring you to specify a fixed or dynamic length.

Python

>>> s = String(100 or this.length) # static integer or context lambda

Caterpillar C

>>> # takes static length, context lambda or ... for greedy parsing
>>> s = string(100)

2.2.2. CString#

The CString type is used to handle strings that end with a null byte. It extends beyond simple C-style strings. Here’s how you might define a structure using a CString:

Python

The tEXt chunk structure#

from caterpillar.py import *
from caterpillar.shortcuts import lenof

@struct
class TEXTChunk:
    # dynamic sized string that ends with a null-byte
    keyword: CString(encoding="ISO-8859-1")
    # static sized string based on the current context. some notes:
    #   - parent.length is the current chunkt's length
    #   - lenof(...) is the runtime length of the context variable
    #   - 1 because of the extra null-byte that is stripped from keyword
    text: CString(encoding="ISO-8859-1", length=parent.length - lenof(this.keword) - 1)

Caterpillar C

The tEXt chunk structure#

from caterpillar.c import *                 # <-- main difference
from caterpillar.shortcuts import lenof
# NOTE: lenof works here, because Caterpillar C's Context implements
# the 'Context Protocol'.

parent = ContextPath("parent.obj")
this = ContextPath("obj")

@struct
class TEXTChunk:
    # dynamic sized string that ends with a null-byte
    keyword: cstring(encoding="ISO-8859-1")
    # static sized string based on the current context. some notes:
    #   - parent.length is the current chunkt's length
    #   - lenof(...) is the runtime length of the context variable
    #   - 1 because of the extra null-byte that is stripped from keyword
    text: cstring(encoding="ISO-8859-1", length=parent.length - lenof(this.keword) - 1)

Challenge

Try implementing the iTXt chunk from the PNG format. This chunk uses a combination of strings and fixed-length fields. Here’s a possible solution:

You can also customize the string’s termination character if needed:

Python

>>> struct = CString(pad="\x0A")

Caterpillar C

>>> s = cstring(sep="\x0A")

2.2.3. Pascal Strings#

The Prefixed class implements Pascal strings, where the length of the string is prefixed to the actual data. This is useful when dealing with raw bytes or strings with a length indicator.

Python

>>> s = Prefixed(uint8, encoding="utf-8")
>>> pack("Hello, World!", s, as_field=True)
b'\rHello, World!'
>>> unpack(s, _, as_field=True)
'Hello, World!'

Caterpillar C

>>> s = pstring(u8)
>>> pack("Hello, World!", s)
b'\rHello, World!'
>>> unpack(_, s)
'Hello, World!'

2.2. String Types#

2.2.1. Default Strings#

2.2.2. CString#

2.2.3. Pascal Strings#

This Page