.. _tutorial-basics-strings: ************ String Types ************ In binary file formats, string handling can be complex due to varying lengths, encodings, and termination methods. *Caterpillar* provides several specialized string types to manage these intricacies with ease. Default Strings --------------- The standard type for regular string handling, requiring you to specify a fixed or dynamic length. .. tab-set:: .. tab-item:: Python >>> s = String(100 or this.length) # static integer or context lambda .. tab-item:: Caterpillar C >>> # takes static length, context lambda or ... for greedy parsing >>> s = string(100) CString ------- The :code:`CString` type is used to handle strings that end with a null byte. It extends beyond simple C-style strings. Here's how you might define a structure using a :code:`CString`: .. tab-set:: .. tab-item:: Python .. code-block:: python :caption: The `tEXt <https://www.w3.org/TR/png/#11tEXt>`_ chunk structure from caterpillar.py import * from caterpillar.shortcuts import lenof @struct class TEXTChunk: # dynamic sized string that ends with a null-byte keyword: CString(encoding="ISO-8859-1") # static sized string based on the current context. some notes: # - parent.length is the current chunkt's length # - lenof(...) is the runtime length of the context variable # - 1 because of the extra null-byte that is stripped from keyword text: CString(encoding="ISO-8859-1", length=parent.length - lenof(this.keword) - 1) .. tab-item:: Caterpillar C .. code-block:: python :caption: The `tEXt` chunk structure from caterpillar.c import * # <-- main difference from caterpillar.shortcuts import lenof # NOTE: lenof works here, because Caterpillar C's Context implements # the 'Context Protocol'. parent = ContextPath("parent.obj") this = ContextPath("obj") @struct class TEXTChunk: # dynamic sized string that ends with a null-byte keyword: cstring(encoding="ISO-8859-1") # static sized string based on the current context. some notes: # - parent.length is the current chunkt's length # - lenof(...) is the runtime length of the context variable # - 1 because of the extra null-byte that is stripped from keyword text: cstring(encoding="ISO-8859-1", length=parent.length - lenof(this.keword) - 1) .. admonition:: Challenge Try implementing the `iTXt <https://www.w3.org/TR/png/#11iTXt>`_ chunk from the PNG format. This chunk uses a combination of strings and fixed-length fields. Here's a possible solution: .. dropdown:: Solution :icon: check This solution serves as an example and isn't the only way to approach it! .. tab-set:: .. tab-item:: Python .. code-block:: python :linenos: @struct class ITXTChunk: keyword: CString(encoding="utf-8") compression_flag: uint8 # we actually don't need an Enum here compression_method: uint8 language_tag: CString(encoding="ASCII") translated_keyword: CString(encoding="utf-8") # length is calculated with parent.length - len(keyword)+len(b"\x00") - ... text: CString( encoding="utf-8", length=parent.length - lenof(this.translated_keyword) - lenof(this.keyword) - 5, ) .. tab-item:: Caterpillar C .. code-block:: python :linenos: from caterpillar.c import * # <-- main difference from caterpillar.shortcuts import lenof parent = ContextPath("parent.obj") this = ContextPath("obj") @struct class ITXTChunk: keyword: cstring() # default encoding is "utf-8" compression_flag: u8 # we actually don't need an Enum here compression_method: u8 language_tag: cstring(encoding="ASCII") translated_keyword: cstring(...) # explicit greedy parsing # length is calculated with parent.length - len(keyword)+len(b"\x00") - ... text: cstring( parent.length - lenof(this.translated_keyword) - lenof(this.keyword) - 5, ) You can also customize the string's termination character if needed: .. tab-set:: .. tab-item:: Python >>> struct = CString(pad="\x0A") .. tab-item:: Caterpillar C >>> s = cstring(sep="\x0A") Pascal Strings -------------- The :class:`~caterpillar.py.Prefixed` class implements Pascal strings, where the length of the string is prefixed to the actual data. This is useful when dealing with raw bytes or strings with a length indicator. .. tab-set:: .. tab-item:: Python >>> s = Prefixed(uint8, encoding="utf-8") >>> pack("Hello, World!", s, as_field=True) b'\rHello, World!' >>> unpack(s, _, as_field=True) 'Hello, World!' .. tab-item:: Caterpillar C >>> s = pstring(u8) >>> pack("Hello, World!", s) b'\rHello, World!' >>> unpack(_, s) 'Hello, World!'