2.2.2. String Types

In binary file formats, string handling can be complex due to varying lengths, encodings, and termination methods. Caterpillar provides several specialized string types to manage these intricacies with ease.

2.2.2.1. Default Strings

String is the standard type for regular string handling. It requires a length, which can be a fixed integer, a value taken from the context at runtime, or ... to consume all remaining data.

>>> # takes a static length, a context lambda, or ... for greedy parsing
>>> s = String(100)          # static integer
>>> s = String(this.length)  # context lambda, resolved at runtime
>>> s = String(...)          # greedy: consume the remaining data
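To make the three length modes concrete, here is a minimal plain-Python sketch. It does not use Caterpillar and is not how the library is implemented; the helper `read_string` and the dictionary used as a context are hypothetical names chosen for illustration.

```python
import io


def read_string(stream, length, context=None, encoding="utf-8"):
    """Read a string whose length is an int, a callable, or ... (greedy)."""
    if length is ...:
        # greedy: consume everything that is left in the stream
        raw = stream.read()
    elif callable(length):
        # dynamic: ask the (hypothetical) context for the length
        raw = stream.read(length(context))
    else:
        # static: a fixed number of bytes
        raw = stream.read(length)
    return raw.decode(encoding)


data = b"abcdefgh"
fixed = read_string(io.BytesIO(data), 3)
dynamic = read_string(io.BytesIO(data), lambda ctx: ctx["length"], {"length": 5})
greedy = read_string(io.BytesIO(data), ...)
```

The dynamic case is the one that matters most in practice: the length usually comes from a field that was parsed earlier in the same structure.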

2.2.2.2. CString

The CString type handles strings that end with a null byte. It goes beyond simple C-style strings: as the examples in this section show, it also accepts an explicit length and a custom termination character. Here’s how you might define a structure using a CString:

The tEXt chunk structure
from caterpillar.py import *
from caterpillar.shortcuts import lenof

@struct
class TEXTChunk:
    # dynamic sized string that ends with a null-byte
    keyword: CString(encoding="ISO-8859-1")
    # static sized string based on the current context. some notes:
    #   - parent.length is the current chunk's length
    #   - lenof(...) is the runtime length of the context variable
    #   - 1 because of the extra null-byte that is stripped from keyword
    text: CString(encoding="ISO-8859-1", length=parent.length - lenof(this.keyword) - 1)
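The length arithmetic in the comments above can be checked by hand. The following is a plain-Python sketch of the same layout, written without Caterpillar; `parse_text_chunk` is a hypothetical helper and the sample bytes are made up for illustration.

```python
def parse_text_chunk(body):
    """Split a tEXt chunk body into (keyword, text)."""
    # the keyword runs up to, but does not include, the first null byte
    end = body.index(b"\x00")
    keyword = body[:end].decode("ISO-8859-1")
    # remaining length = chunk length - len(keyword) - 1 (the null byte)
    text_length = len(body) - len(keyword) - 1
    start = end + 1
    text = body[start:start + text_length].decode("ISO-8859-1")
    return keyword, text


body = b"Title\x00Hello PNG"
keyword, text = parse_text_chunk(body)
```

Because ISO-8859-1 maps every character to exactly one byte, the character count of the keyword equals its byte count, which is what makes the subtraction valid here.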

Challenge

Try implementing the iTXt chunk from the PNG format. This chunk uses a combination of strings and fixed-length fields. Here’s a possible solution:

Solution

This solution serves as an example and isn’t the only way to approach it!

@struct
class ITXTChunk:
    keyword: CString(encoding="utf-8")
    compression_flag: uint8
    # we actually don't need an Enum here
    compression_method: uint8
    language_tag: CString(encoding="ASCII")
    translated_keyword: CString(encoding="utf-8")
    # text length = parent.length minus every preceding field:
    #   - the three null-terminated strings and their three null bytes
    #   - the two uint8 fields (3 + 2 = 5 fixed bytes)
    text: CString(
        encoding="utf-8",
        length=parent.length
        - lenof(this.keyword)
        - lenof(this.language_tag)
        - lenof(this.translated_keyword)
        - 5,
    )
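As a cross-check on the byte accounting, here is a plain-Python sketch that walks an uncompressed iTXt body field by field. It does not use Caterpillar; `read_cstring` and `parse_itxt` are hypothetical helpers, the sample body is invented, and the encodings simply mirror the solution above.

```python
def read_cstring(buf, offset, encoding):
    """Return (decoded string, offset just past the null terminator)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode(encoding), end + 1


def parse_itxt(body):
    keyword, pos = read_cstring(body, 0, "utf-8")
    compression_flag = body[pos]        # one byte
    compression_method = body[pos + 1]  # one byte
    pos += 2
    language_tag, pos = read_cstring(body, pos, "ascii")
    translated_keyword, pos = read_cstring(body, pos, "utf-8")
    # assumption: compression_flag is 0, so the rest is plain UTF-8 text
    text = body[pos:].decode("utf-8")
    return keyword, language_tag, translated_keyword, text


body = b"Title\x00" + b"\x00\x00" + b"en\x00" + b"Titel\x00" + b"Hello"
keyword, language_tag, translated_keyword, text = parse_itxt(body)

# every field before the text counts: three strings, three nulls, two bytes
remaining = (len(body) - len(keyword) - len(language_tag)
             - len(translated_keyword) - 5)
```

The sample uses ASCII-only strings, so character counts and byte counts coincide; with multi-byte UTF-8 characters the subtraction would have to use encoded lengths instead.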

You can also customize the string’s termination character if needed:

>>> s = CString(pad="\x0A")  # named s so the @struct decorator is not shadowed
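Conceptually, a custom terminator only changes which byte ends the string. A minimal plain-Python sketch of that idea, independent of Caterpillar and using the hypothetical helper `read_terminated`:

```python
def read_terminated(data, terminator=b"\x00", encoding="utf-8"):
    """Return (decoded string, remaining bytes after the terminator)."""
    end = data.index(terminator)  # raises ValueError if no terminator exists
    return data[:end].decode(encoding), data[end + len(terminator):]


# 0x0A is the line feed character, so this reads exactly one line
line, rest = read_terminated(b"first line\nsecond", terminator=b"\x0A")
```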

2.2.2.3. Pascal Strings

The Prefixed class implements Pascal strings, where the length of the string is prefixed to the actual data. This is useful when dealing with raw bytes or strings with a length indicator.

>>> s = Prefixed(uint8, encoding="utf-8")
>>> pack("Hello, World!", s, as_field=True)
b'\rHello, World!'
>>> unpack(s, _, as_field=True)
'Hello, World!'
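The leading `\r` in the packed output is the length byte: the string is 13 bytes long, and 13 is 0x0D, the carriage return. A plain-Python sketch of the same wire format, assuming a one-byte length prefix as with uint8 (the helpers `pack_pascal` and `unpack_pascal` are hypothetical and not part of Caterpillar):

```python
def pack_pascal(text, encoding="utf-8"):
    """Encode text as a one-byte length prefix followed by the data."""
    raw = text.encode(encoding)
    if len(raw) > 255:
        raise ValueError("string too long for a one-byte length prefix")
    return bytes([len(raw)]) + raw


def unpack_pascal(data, encoding="utf-8"):
    """Decode a string stored as a one-byte length prefix plus data."""
    length = data[0]
    return data[1:1 + length].decode(encoding)


packed = pack_pascal("Hello, World!")
restored = unpack_pascal(packed)
```

Note that the prefix counts encoded bytes, not characters, so a wider prefix type is needed once the payload can exceed 255 bytes.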