// SPDX-FileCopyrightText: 2023 Philippe Proulx
// SPDX-License-Identifier: CC-BY-SA-4.0

// Show ToC at a specific location for a GitHub rendering
ifdef::env-github[]
:toc: macro
endif::env-github[]

ifndef::env-github[]
:toc: left
endif::env-github[]

// This is to mimic what GitHub does so that anchors work in an offline
// rendering too.
:idprefix:
:idseparator: -

// Other attributes
:py3: Python{nbsp}3

= Normand
Philippe Proulx

image::normand-logo.png[]

[.normal]
image:https://img.shields.io/pypi/v/normand.svg?label=Latest%20version[link="https://pypi.python.org/pypi/normand"]

[.lead]
_**Normand**_ is a text-to-binary processor with its own language.

This package offers both a portable {py3} module and a command-line tool.

WARNING: This version of Normand is 0.23, meaning both the Normand language and the module/CLI interface aren't stable.

ifdef::env-github[]
// ToC location for a GitHub rendering
toc::[]
endif::env-github[]

== Introduction

The purpose of Normand is to consume human-readable text representing bytes and to produce the corresponding binary data.

.Simple bytes input.
====
Consider the following Normand input:

----
4f 55 32 bb $167 fe %10100111 a9 $-32
----

The generated nine bytes are:

----
4f 55 32 bb a7 fe a7 a9 e0
----
====

As you can see in the last example, the fundamental unit of the Normand language is the _byte_. The order in which you list bytes will be the order of the generated data.

The Normand language is more than simple lists of bytes, though.
Its main features are: Comments, including a bunch of insignificant symbols which may improve readability:: + Input: + ---- ff bb %1101:0010 # This is a comment 78 29 af $192 # This too # 99 $-80 fe80::6257:18ff:fea3:4229 60:57:18:a3:42:29 10839636-5d65-4a68-8e6a-21608ddf7258 ---- + Output: + ---- ff bb d2 78 29 af c0 99 b0 fe 80 62 57 18 ff fe a3 42 29 60 57 18 a3 42 29 10 83 96 36 5d 65 4a 68 8e 6a 21 60 8d df 72 58 ---- Hexadecimal, decimal, and binary byte constants:: + Input: + ---- aa bb $247 $-89 %0011_0010 %11.01= 10/10 ---- + Output: + ---- aa bb f7 a7 32 da ---- Strings:: + Input: + ---- "hello world!" 00 u16le"stress\nverdict 🤣" s:latin3{hex(ICITTE)} ---- + Output: + ---- 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 73 00 74 ┆ hello world!•s•t 00 72 00 65 00 73 00 73 00 0a 00 76 00 65 00 72 ┆ •r•e•s•s•••v•e•r 00 64 00 69 00 63 00 74 00 20 00 3e d8 23 dd 30 ┆ •d•i•c•t• •>•#•0 78 32 66 ┆ x2f ---- Labels: special variables holding the offset where they're defined:: + ---- b2 52 e3 bc 91 05 $100 $50 33 9f fe 25 e9 89 8a ---- Variables:: + ---- 5e 65 {tower = 47} c6 7f f2 c4 44 {hurl = tower - 14} b5 {tower = hurl} 26 2d ---- + The value of a variable assignment is the evaluation of a valid {py3} expression which may include label and variable names. Fixed-length number with a given length (8{nbsp}bits to 64{nbsp}bits) and byte order:: + Input: + ---- {strength = 4} !be 67 44 $178 [(end - lbl) * 8 + strength : 16] $99 !le [-1993 : 32] [-3.141593 : 64be] ---- + Output: + ---- 67 44 b2 00 2c 63 37 f8 ff ff c0 09 21 fb 82 c2 bd 7f ---- + The encoded number is the evaluation of a valid {py3} expression which may include label and variable names. https://en.wikipedia.org/wiki/LEB128[LEB128] integer:: + Input: + ---- aa bb cc [-1993 : sleb128] dd ee ff [meow * 199 : uleb128] ---- + Output: + ---- aa bb cc b7 70 dd ee ff e3 07 ---- + The encoded integer is the evaluation of a valid {py3} expression which may include label and variable names. 
Conditional:: + Input: + ---- aa bb cc ( "foo" !if {ICITTE > 10} "bar" !else "fight" !end ) * 4 ---- + Output: + ---- aa bb cc 66 6f 6f 66 69 67 68 74 66 6f 6f 66 69 ┆ •••foofightfoofi 67 68 74 66 6f 6f 62 61 72 66 6f 6f 62 61 72 ┆ ghtfoobarfoobar ---- Repetition:: + Input: + ---- aa bb * 5 cc "yeah\0" * {zoom * 3} !repeat 3 ff ee "juice" !end ---- + Output: + ---- aa bb bb bb bb bb cc 79 65 61 68 00 79 65 61 68 ┆ •••••••yeah•yeah 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah• 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 ┆ yeah•yeah•yeah•y 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 ┆ eah•yeah•yeah•ye 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 ┆ ah•yeah•yeah•yea 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 ┆ h•yeah•yeah•yeah 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah• ff ee 6a 75 69 63 65 ff ee 6a 75 69 63 65 ff ee ┆ ••juice••juice•• 6a 75 69 63 65 ┆ juice ---- Alignment:: + Input: + ---- !be [199:32] @64 [43:64] @16 [-123:16] @32~255 [5584:32] ---- + Output: + ---- 00 00 00 c7 00 00 00 00 00 00 00 00 00 00 00 2b ff 85 ff ff 00 00 15 d0 ---- Filling:: + Input: + ---- !le [0xdeadbeef:32] [-1993:16] [9:16] +0x40 [ICITTE:8] "meow mix" +200~FFh [ICITTE:8] ---- + Output: + ---- ef be ad de 37 f8 09 00 00 00 00 00 00 00 00 00 ┆ ••••7••••••••••• 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• 40 6d 65 6f 77 20 6d 69 78 ff ff ff ff ff ff ff ┆ @meow mix••••••• ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• ff ff ff ff ff ff ff ff c8 ┆ ••••••••• ---- Transformation:: + Input: + ---- "end of file @ " [end:8] !transform gzip "this part will be gzipped" !end ---- + Output: + ---- 65 6e 64 20 6f 66 20 66 69 6c 65 20 40 20 3c 1f ┆ end of file @ <• 8b 08 00 7b 7b 26 65 02 ff 2b c9 c8 2c 56 28 48 ┆ •••{{&e••+••,V(H 2c 2a 51 28 cf cc c9 51 48 4a 55 48 af ca 2c 28 ┆ ,*Q(•••QHJUH••,( 48 4d 01 00 d4 cc 5b 8a 19 00 00 00 ┆ HM••••[••••• ---- Multilevel grouping:: + Input: + ---- ff ((aa bb "zoom" cc) * 5) * 3 $-34 * 4 ---- + Output: + ---- ff aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa ┆ •••zoom•••zoom•• bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a ┆ •zoom•••zoom•••z 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f ┆ oom•••zoom•••zoo 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc ┆ m•••zoom•••zoom• aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb ┆ ••zoom•••zoom••• 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f ┆ zoom•••zoom•••zo 6f 6d cc aa bb 7a 6f 6f 6d cc de de de de ┆ om•••zoom••••• ---- Macros:: + Input: + ---- !macro hello(world) "hello" !if world " world" !end !end !repeat 17 ff ff ff ff m:hello({ICITTE > 15 and ICITTE < 60}) !end ---- + Output: + ---- ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c ┆ ••••hello••••hel 6c 6f ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c ┆ lo••••hello worl 64 ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c 64 ┆ d••••hello world ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c 64 ff ┆ ••••hello world• ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c ┆ •••hello••••hell 6f ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 ┆ o••••hello••••he 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ff ff ff ff ┆ llo••••hello•••• 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ff ff ┆ hello••••hello•• ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ┆ ••hello••••hello ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c ┆ ••••hello••••hel 6c 6f ff ff ff ff 68 65 6c 6c 6f ┆ lo••••hello ---- Precise error reporting:: + ---- 
/tmp/meow.normand:10:24 - Expecting a bit (`0` or `1`). ---- + ---- /tmp/meow.normand:32:6 - Unexpected character `k`. ---- + ---- /tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`ICITTE`, `mix`, `zoom`}. ---- + ---- /tmp/meow.normand:32:19 - While expanding the macro `meow`: /tmp/meow.normand:35:5 - While expanding the macro `zzz`: /tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE`. ---- You can use Normand to track data source files in your favorite VCS instead of raw binary files. The binary files that Normand generates can be used to test file format decoding, including malformatted data, for example, as well as for education. See <> to explore all the Normand features. == Install Normand Normand requires Python ≥ 3.4. To install Normand: ---- $ python3 -m pip install --user normand ---- See https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-to-the-user-site[Installing to the User Site] to learn more about a user site installation. [NOTE] ==== Normand has a single module file, `normand.py`, which you can copy as is to your project to use it (both the <> function and the <>). `normand.py` has _no external dependencies_, but if you're using Python{nbsp}3.4 or Python{nbsp}3.5, you'll need a local copy of the standard `typing` module. ==== == Design goals The design goals of Normand are: Portability:: We're making sure `normand.py` works with Python{nbsp}≥{nbsp}3.4 and doesn't have any external dependencies so that you may just copy the module as is to your own project. Ease of use:: The most basic Normand input is a sequence of hexadecimal constants (for example, `4e6f726d616e64`) which produce exactly what you'd expect. 
+ Most Normand features map to programming language concepts you already know and understand: constant integers, literal strings, variables, conditionals, repetitions/loops, and the rest. Concise and readable input:: We could have chosen XML or YAML as the input format, but having a DSL here makes a Normand input compact and easy to read, two important traits when using Normand to write tests, for example. + Compare the following Normand input and some hypothetical XML equivalent, for example: + .Actual Normand input. ---- ff dd 01 ab $192 $-128 %1101:0011 [end:8] {iter = 1} !if {not something} # five times because xyz !repeat 5 "hello world " [iter:8] {iter = iter + 1} !end !end ---- + .Hypothetical Normand XML input. [source,xml] ---- hello world ---- == Learn Normand A Normand text input is a sequence of items which represent a sequence of raw bytes. [[state]] During the processing of items to data, Normand relies on a current state: [%header%autowidth] |=== |State variable |Description |Initial value: <> |Initial value: <> |[[cur-offset]] Current offset | The current offset has an effect on the value of <> and of the special `ICITTE` name in <>, <>, <>, <>, <>, <>, <>, <>, and <> expression evaluation. Each generated byte increments the current offset. A <> may change the current offset without generating data. An <> generates padding bytes to make the current offset satisfy a given alignment. |`init_offset` parameter of the `parse()` function. |`--offset` option. |[[cur-bo]] Current byte order | The current byte order can have an effect on the encoding of <>. A <> may change the current byte order. |`init_byte_order` parameter of the `parse()` function. |`--byte-order` option. |<> |Mapping of label names to integral values. |`init_labels` parameter of the `parse()` function. |One or more `--label` options. |<> |Mapping of variable names to integral or floating point number values. |`init_variables` parameter of the `parse()` function. 
|One or more `--var` or `--var-str` options.
|===

The available items are:

* A <> representing one or more constant bytes.

* A <> representing a constant sequence of bytes encoding UTF-8, UTF-16, UTF-32, or Latin-1 to Latin-10 data.

* A <> (big or little endian).

* A <> (integer or floating point), possibly using the <>, and of which the value is the result of a {py3} expression.

* An <> of which the value is the result of a {py3} expression.

* A <> representing a sequence of bytes encoding UTF-8, UTF-16, UTF-32, or Latin-1 to Latin-10 data, and of which the value is the result of a {py3} expression.

* A <>.

* A <>.

* A <>.

* A <>, that is, a named constant holding the current offset.
+
This is similar to an assembly label.

* A <> associating a name to the integral result of an evaluated {py3} expression.

* A <>, that is, a scoped sequence of items.

* A <>.

* A <>.

* A <>.

* A <>.

* A <>.

Moreover, you can repeat many items above a constant or variable number of times with the ``pass:[*]`` operator _after_ the item to repeat. This is called a <>.

A Normand comment may exist pretty much anywhere between tokens.

A comment is anything between two ``pass:[#]`` characters on the same line, or from ``pass:[#]`` until the end of the line. Whitespace is also considered a comment. The following symbols are also considered comments around and between items, as well as between hexadecimal nibbles and binary bits of <>:

----
& , - . / : ; = ? \ _ |
----

The latter serve to improve readability so that you may write, for example, a MAC address or a UUID as is.

[[const-int]]
Many items require a _constant integer_, which may start with `-` when it's negative.

A positive constant integer is any of:

Decimal::
    One or more digits (`0` to `9`).

Hexadecimal::
    One of:
+
* The `0x` or `0X` prefix followed by one or more hexadecimal digits (`0` to `9`, `a` to `f`, or `A` to `F`).
* One or more hexadecimal digits followed by the `h` or `H` suffix.
Octal:: One of: + * The `0o` or `0O` prefix followed with one or more octal digits (`0` to `7`). * One or more octal digits followed with the `o`, `O`, `q`, or `Q` suffix. Binary:: One of: + * The `0b` or `0B` prefix followed with one or more bits (`0` or `1`). * One or more bits followed with the `b` or `B` suffix. In general, anything between `pass:[{]` and `}` is a {py3} expression. You can test the examples of this section with the `normand` <> as such: ---- $ normand file | hexdump -C ---- where `file` is the name of a file containing the Normand input. === Byte constant A _byte constant_ represents one or more constant bytes. A byte constant is: Hexadecimal form:: Two consecutive hexadecimal digits representing a single byte. Decimal form:: One or more digits after the `$` prefix representing a single byte. Binary form:: {empty} + -- . __**N**__ `%` prefixes (at least one). + The number of `%` characters is the number of subsequent expected bytes. . __**N**__{nbsp}×{nbsp}8 bits (`0` or `1`). -- ==== Input: ---- ab cd (3d 8F) CC ---- Output: ---- ab cd 3d 8f cc ---- ==== ==== Input: ---- $192 %1100/0011 $ -77 ---- Output: ---- c0 c3 b3 ---- ==== ==== Input: ---- 58f64689-6316-4d55-8a1a-04cada366172 fe80::6257:18ff:fea3:4229 ---- Output: ---- 58 f6 46 89 63 16 4d 55 8a 1a 04 ca da 36 61 72 ┆ X•F•c•MU•••••6ar fe 80 62 57 18 ff fe a3 42 29 ┆ ••bW••••B) ---- ==== ==== Input: ---- %01110011 %01100001 %01101100 %01110101 %01110100 %%%1101:0010 11111111 #A#11 #B#00 #C#011 #D#1 ---- Output: ---- 73 61 6c 75 74 d2 ff c7 ┆ salut••• ---- ==== === Literal string A _literal string_ represents the encoded bytes of a literal string using the UTF-8, UTF-16, UTF-32, or Latin-1 to Latin-10 encoding. The string to encode isn't implicitly null-terminated: use `\0` at the end of the string to add a null character. A literal string is: . **Optional**: one of the following encodings instead of the default UTF-8: + -- [horizontal] `s:u8`:: `u8`:: UTF-8. 
`s:u16be`:: `u16be`:: UTF-16BE. `s:u16le`:: `u16le`:: UTF-16LE. `s:u32be`:: `u32be`:: UTF-32BE. `s:u32le`:: `u32le`:: UTF-32LE. `s:latin1`:: ISO/IEC 8859-1. `s:latin2`:: ISO/IEC 8859-2. `s:latin3`:: ISO/IEC 8859-3. `s:latin4`:: ISO/IEC 8859-4. `s:latin5`:: ISO/IEC 8859-9. `s:latin6`:: ISO/IEC 8859-10. `s:latin7`:: ISO/IEC 8859-13. `s:latin8`:: ISO/IEC 8859-14. `s:latin9`:: ISO/IEC 8859-15. `s:latin10`:: ISO/IEC 8859-16. -- . The ``pass:["]`` prefix. . A sequence of zero or more characters, possibly containing escape sequences. + An escape sequence is the ``\`` character followed by one of: + -- [horizontal] `0`:: Null (U+0000) `a`:: Alert (U+0007) `b`:: Backspace (U+0008) `e`:: Escape (U+001B) `f`:: Form feed (U+000C) `n`:: End of line (U+000A) `r`:: Carriage return (U+000D) `t`:: Character tabulation (U+0009) `v`:: Line tabulation (U+000B) ``\``:: Reverse solidus (U+005C) ``pass:["]``:: Quotation mark (U+0022) -- . The ``pass:["]`` suffix. ==== Input: ---- "coucou tout le monde!" ---- Output: ---- 63 6f 75 63 6f 75 20 74 6f 75 74 20 6c 65 20 6d ┆ coucou tout le m 6f 6e 64 65 21 ┆ onde! ---- ==== ==== Input: ---- u16le"I am not young enough to know everything." 
---- Output: ---- 49 00 20 00 61 00 6d 00 20 00 6e 00 6f 00 74 00 ┆ I• •a•m• •n•o•t• 20 00 79 00 6f 00 75 00 6e 00 67 00 20 00 65 00 ┆ •y•o•u•n•g• •e• 6e 00 6f 00 75 00 67 00 68 00 20 00 74 00 6f 00 ┆ n•o•u•g•h• •t•o• 20 00 6b 00 6e 00 6f 00 77 00 20 00 65 00 76 00 ┆ •k•n•o•w• •e•v• 65 00 72 00 79 00 74 00 68 00 69 00 6e 00 67 00 ┆ e•r•y•t•h•i•n•g• 2e 00 ┆ .• ---- ==== ==== Input: ---- s:u32be "\"illusion is the first\nof all pleasures\" 🦉" ---- Output: ---- 00 00 00 22 00 00 00 69 00 00 00 6c 00 00 00 6c ┆ •••"•••i•••l•••l 00 00 00 75 00 00 00 73 00 00 00 69 00 00 00 6f ┆ •••u•••s•••i•••o 00 00 00 6e 00 00 00 20 00 00 00 69 00 00 00 73 ┆ •••n••• •••i•••s 00 00 00 20 00 00 00 74 00 00 00 68 00 00 00 65 ┆ ••• •••t•••h•••e 00 00 00 20 00 00 00 66 00 00 00 69 00 00 00 72 ┆ ••• •••f•••i•••r 00 00 00 73 00 00 00 74 00 00 00 0a 00 00 00 6f ┆ •••s•••t•••••••o 00 00 00 66 00 00 00 20 00 00 00 61 00 00 00 6c ┆ •••f••• •••a•••l 00 00 00 6c 00 00 00 20 00 00 00 70 00 00 00 6c ┆ •••l••• •••p•••l 00 00 00 65 00 00 00 61 00 00 00 73 00 00 00 75 ┆ •••e•••a•••s•••u 00 00 00 72 00 00 00 65 00 00 00 73 00 00 00 22 ┆ •••r•••e•••s•••" 00 00 00 20 00 01 f9 89 ┆ ••• •••• ---- ==== ==== Input: ---- s:latin1 "Paul Piché" ---- Output: ---- 50 61 75 6c 20 50 69 63 68 e9 ┆ Paul Pich• ---- ==== === Current byte order setting This special item sets the <>. The two accepted forms are: [horizontal] `!be`:: Set the current byte order to big endian. `!le`:: Set the current byte order to little endian. === Fixed-length number A _fixed-length number_ represents a fixed number of bytes encoding either: * An unsigned or signed integer (two's complement). + The available lengths are 8, 16, 24, 32, 40, 48, 56, and 64. * A floating point number (https://standards.ieee.org/standard/754-2008.html[IEEE{nbsp}754-2008]). + The available lengths are 32 (_binary32_) and 64 (_binary64_). The value is the result of evaluating a {py3} expression. 
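
The two encodings above can be sketched in plain {py3}. This is only an illustration of two's complement and IEEE{nbsp}754 encoding, not the code of `normand.py`: the `encode_int()` and `encode_float()` names are hypothetical, and accepting both the signed and the unsigned range for a given length is an assumption.

```python
import struct


def encode_int(value, bits, byte_order):
    """Encode `value` on `bits` bits (two's complement when negative)."""
    if bits not in (8, 16, 24, 32, 40, 48, 56, 64):
        raise ValueError("unsupported integer length: {}".format(bits))

    # Assumption: both the signed and the unsigned ranges are acceptable.
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError("{} is outside the {}-bit range".format(value, bits))

    # The modulo maps a negative value to its two's complement bit pattern.
    return (value % (1 << bits)).to_bytes(bits // 8, byte_order)


def encode_float(value, bits, byte_order):
    """Encode `value` as an IEEE 754 binary32 or binary64 number."""
    fmt = {32: "f", 64: "d"}[bits]
    prefix = ">" if byte_order == "big" else "<"
    return struct.pack(prefix + fmt, value)


print(encode_int(345, 16, "little").hex())           # 5901
print(encode_int(-0xabcd, 32, "big").hex())          # ffff5433
print(encode_float(2 * 0.0529, 32, "little").hex())  # acadd83d
```

The three printed values correspond to the Normand inputs `[345:16le]`, `[-0xabcd:32be]`, and `[2 * 0.0529 : 32le]`.
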
The byte order to use to encode the value is either directly specified or is the <>.

A fixed-length number is:

. The `[` prefix.

. A valid {py3} expression.
+
For a fixed-length number at some source location{nbsp}__**L**__, this expression may contain the name of any accessible <> (not within a nested group), including the name of a label defined after{nbsp}__**L**__ (except within a <>), as well as the name of any <> known at{nbsp}__**L**__.
+
The value of the special name `ICITTE` (`int` type) in this expression is the <> (before encoding the number).

. The `:` character.

. An encoding length in bits amongst:
+
--
The expression evaluates to an `int` or `bool` value::
    `8`, `16`, `24`, `32`, `40`, `48`, `56`, and `64`.
+
NOTE: Normand automatically converts a `bool` value to `int`.

The expression evaluates to a `float` value::
    `32` and `64`.
--

. **Optional**: a suffix of the previous encoding length, without any whitespace, amongst:
+
--
[horizontal]
`be`:: Encode in big endian.
`le`:: Encode in little endian.
--
+
Without this suffix, the encoding byte order is the <> which must be defined if the encoding length is greater than eight.

. The `]` suffix.

====
Input:

----
[345:16le]
[-0xabcd:32be]
----

Output:

----
59 01 ff ff 54 33
----
====

====
Input:

----
!be

# String length in bits
[8 * (str_end - str_beg) : 16]

# String
<str_beg> "hello world!" <str_end>
----

Output:

----
00 60 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 ┆ •`hello world!
----
====

====
Input:

----
[20 - ICITTE : 8] * 10
----

Output:

----
14 13 12 11 10 0f 0e 0d 0c 0b
----
====

====
Input:

----
[2 * 0.0529 : 32le]
----

Output:

----
ac ad d8 3d
----
====

=== LEB128 integer

An _LEB128 integer_ represents a variable number of bytes encoding an unsigned or signed integer which is the result of evaluating a {py3} expression following the https://en.wikipedia.org/wiki/LEB128[LEB128] format.

An LEB128 integer is:

. The `[` prefix.

. A valid {py3} expression of which the evaluation result type is `int` or `bool` (automatically converted to `int`).
+
For an LEB128 integer at some source location{nbsp}__**L**__, this expression may contain:
+
--
* The name of any <> defined before{nbsp}__**L**__ which isn't within a nested group.
* The name of any <> known at{nbsp}__**L**__.
--
+
The value of the special name `ICITTE` (`int` type) in this expression is the <> (before encoding the integer).

. The `:` character.

. One of:
+
--
[horizontal]
`uleb128`:: Use the unsigned LEB128 format.
`sleb128`:: Use the signed LEB128 format.
--

. The `]` suffix.

====
Input:

----
[624485 : uleb128]
----

Output:

----
e5 8e 26
----
====

====
Input:

----
aa bb cc dd
<meow>
ee ff
[-981238311 + (meow * -23) : sleb128]
"hello"
----

Output:

----
aa bb cc dd ee ff fd fa 8d ac 7c 68 65 6c 6c 6f ┆ ••••••••••|hello
----
====

=== String

A _string_ represents a variable number of bytes encoding a string which is the result of evaluating a {py3} expression using the UTF-8, UTF-16, UTF-32, or Latin-1 to Latin-10 encoding.

A string has two possible forms:

Encoding prefix form:: {empty}
+
. An encoding amongst:
+
--
[horizontal]
`s:u8`::
`u8`::
    UTF-8.

`s:u16be`::
`u16be`::
    UTF-16BE.

`s:u16le`::
`u16le`::
    UTF-16LE.

`s:u32be`::
`u32be`::
    UTF-32BE.

`s:u32le`::
`u32le`::
    UTF-32LE.

`s:latin1`::
    ISO/IEC 8859-1.

`s:latin2`::
    ISO/IEC 8859-2.

`s:latin3`::
    ISO/IEC 8859-3.

`s:latin4`::
    ISO/IEC 8859-4.

`s:latin5`::
    ISO/IEC 8859-9.

`s:latin6`::
    ISO/IEC 8859-10.

`s:latin7`::
    ISO/IEC 8859-13.

`s:latin8`::
    ISO/IEC 8859-14.

`s:latin9`::
    ISO/IEC 8859-15.

`s:latin10`::
    ISO/IEC 8859-16.
--

. The ``pass:[{]`` prefix.

. A valid {py3} expression of which the evaluation result type is `bool`, `int`, `float`, or `str` (the first three automatically converted to `str`).
+
For a string at some source location{nbsp}__**L**__, this expression may contain:
+
--
* The name of any <> defined before{nbsp}__**L**__ which isn't within a nested group.
* The name of any <> known at{nbsp}__**L**__. -- + The value of the special name `ICITTE` (`int` type) in this expression is the <> (before encoding the string). . The `}` suffix. Encoding suffix form:: {empty} + . The `[` prefix. . A valid {py3} expression of which the evaluation result type is `bool`, `int`, `float`, or `str` (the first three automatically converted to `str`). + For a string at some source location{nbsp}__**L**__, this expression may contain: + -- * The name of any <> defined before{nbsp}__**L**__ which isn't within a nested group. * The name of any <> known at{nbsp}__**L**__. -- + The value of the special name `ICITTE` (`int` type) in this expression is the <> (before encoding the string). . The `:` character. . A string encoding amongst: + -- [horizontal] `s:u8`:: UTF-8. `s:u16be`:: UTF-16BE. `s:u16le`:: UTF-16LE. `s:u32be`:: UTF-32BE. `s:u32le`:: UTF-32LE. `s:latin1`:: ISO/IEC 8859-1. `s:latin2`:: ISO/IEC 8859-2. `s:latin3`:: ISO/IEC 8859-3. `s:latin4`:: ISO/IEC 8859-4. `s:latin5`:: ISO/IEC 8859-9. `s:latin6`:: ISO/IEC 8859-10. `s:latin7`:: ISO/IEC 8859-13. `s:latin8`:: ISO/IEC 8859-14. `s:latin9`:: ISO/IEC 8859-15. `s:latin10`:: ISO/IEC 8859-16. -- . The `]` suffix. ==== Input: ---- {iter = 1} !repeat 10 u8{iter} " " {iter = iter + 1} !end ---- Output: ---- 31 20 32 20 33 20 34 20 35 20 36 20 37 20 38 20 ┆ 1 2 3 4 5 6 7 8 39 20 31 30 20 ┆ 9 10 ---- ==== ==== Input: ---- {meow = 'salut jérémie'} [meow.upper() : s:latin1] ---- Output: ---- 53 41 4c 55 54 20 4a c9 52 c9 4d 49 45 ┆ SALUT J•R•MIE ---- ==== === Current offset setting This special item sets the <>. A current offset setting is: . The `<` prefix. . A <> which is the new current offset. . The `>` suffix. 
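
The effect of a current offset setting on the state can be sketched with a small {py3} model. The `State` class below is hypothetical and only mirrors the rules stated in this document: each generated byte increments the current offset, whereas a current offset setting changes it without generating data. The sketch replays the Normand input `[ICITTE : 8] * 8 <0x61> [ICITTE : 8] * 8`.

```python
class State:
    """Hypothetical model of the generated data and the current offset."""

    def __init__(self, init_offset=0):
        self.offset = init_offset
        self.data = bytearray()

    def emit(self, byte):
        # Each generated byte increments the current offset.
        self.data.append(byte)
        self.offset += 1

    def set_offset(self, new_offset):
        # A current offset setting generates no data.
        self.offset = new_offset


state = State()

for _ in range(8):        # [ICITTE : 8] * 8
    state.emit(state.offset)

state.set_offset(0x61)    # <0x61>

for _ in range(8):        # [ICITTE : 8] * 8
    state.emit(state.offset)

print(state.data.hex())                     # 00010203040506076162636465666768
print(len(state.data), hex(state.offset))   # 16 0x69
```

Only 16 bytes are generated even though the final current offset is `0x69`: the offset is bookkeeping for labels and `ICITTE`, not a position in the output.
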
====
Input:

----
[ICITTE : 8] * 8
<0x61>
[ICITTE : 8] * 8
----

Output:

----
00 01 02 03 04 05 06 07 61 62 63 64 65 66 67 68 ┆ ••••••••abcdefgh
----
====

====
Input:

----
aa bb cc dd <meow> ee ff
<12> 11 22 33 <mix> 44 55
[meow : 8] [mix : 8]
----

Output:

----
aa bb cc dd ee ff 11 22 33 44 55 04 0f ┆ •••••••"3DU••
----
====

=== Current offset alignment

A _current offset alignment_ represents zero or more padding bytes to make the <> meet a given https://en.wikipedia.org/wiki/Data_structure_alignment[alignment] value.

More specifically, for an alignment value of{nbsp}__**N**__{nbsp}bits, a current offset alignment represents the required padding bytes until the current offset is a multiple of __**N**__{nbsp}/{nbsp}8.

A current offset alignment is:

. The `@` prefix.

. A <> which is the alignment value in _bits_.
+
This value must be greater than zero and a multiple of{nbsp}8.

. **Optional**:
+
--
. The ``pass:[~]`` prefix.
. A <> which is the value of the byte to use as padding to align the <>.
--
+
Without this section, the padding byte value is zero.

====
Input:

----
11 22 (@32 aa bb cc) * 3
----

Output:

----
11 22 00 00 aa bb cc 00 aa bb cc 00 aa bb cc
----
====

====
Input:

----
!le
77 88
@32~0xcc [-893.5:32]
@128~0x55 "meow"
----

Output:

----
77 88 cc cc 00 60 5f c4 55 55 55 55 55 55 55 55 ┆ w••••`_•UUUUUUUU
6d 65 6f 77                                     ┆ meow
----
====

====
Input:

----
aa bb cc <29> @64~255 "zoom"
----

Output:

----
aa bb cc ff ff ff 7a 6f 6f 6d ┆ ••••••zoom
----
====

=== Filling

A _filling_ represents zero or more padding bytes to make the <> reach a given value.

A filling is:

. The ``pass:[+]`` prefix.

. One of:

** A <> which is the current offset target.

** The ``pass:[{]`` prefix, a valid {py3} expression of which the evaluation result type is `int` or `bool` (automatically converted to `int`), and the `}` suffix.
+
For a filling at some source location{nbsp}__**L**__, this expression may contain:
+
--
* The name of any <> defined before{nbsp}__**L**__ which isn't within a nested group.
* The name of any <> known at{nbsp}__**L**__. -- + The value of the special name `ICITTE` (`int` type) in this expression is the <> (before handling the items to repeat). ** A valid {py3} name. + For the name `__NAME__`, this is equivalent to the `pass:[{]__NAME__}` form above. + This value must be greater than or equal to the current offset where it's used. . **Optional**: + -- . The ``pass:[~]`` prefix. . A <> which is the value of the byte to use as padding to reach the current offset target. -- + Without this section, the padding byte value is zero. ==== Input: ---- aa bb cc dd +0x40 "hello world" ---- Output: ---- aa bb cc dd 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• 68 65 6c 6c 6f 20 77 6f 72 6c 64 ┆ hello world ---- ==== ==== Input: ---- !macro part(iter, fill) <0> "particular security " [ord('0') + iter : 8] +fill~0x80 !end {iter = 1} !repeat 5 m:part(iter, {32 + 4 * iter}) {iter = iter + 1} !end ---- Output: ---- 70 61 72 74 69 63 75 6c 61 72 20 73 65 63 75 72 ┆ particular secur 69 74 79 20 31 80 80 80 80 80 80 80 80 80 80 80 ┆ ity 1••••••••••• 80 80 80 80 70 61 72 74 69 63 75 6c 61 72 20 73 ┆ ••••particular s 65 63 75 72 69 74 79 20 32 80 80 80 80 80 80 80 ┆ ecurity 2••••••• 80 80 80 80 80 80 80 80 80 80 80 80 70 61 72 74 ┆ ••••••••••••part 69 63 75 6c 61 72 20 73 65 63 75 72 69 74 79 20 ┆ icular security 33 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ 3••••••••••••••• 80 80 80 80 80 80 80 80 70 61 72 74 69 63 75 6c ┆ ••••••••particul 61 72 20 73 65 63 75 72 69 74 79 20 34 80 80 80 ┆ ar security 4••• 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ •••••••••••••••• 80 80 80 80 80 80 80 80 70 61 72 74 69 63 75 6c ┆ ••••••••particul 61 72 20 73 65 63 75 72 69 74 79 20 35 80 80 80 ┆ ar security 5••• 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 
80 ┆ •••••••••••••••• 80 80 80 80 80 80 80 80 80 80 80 80 ┆ •••••••••••• ---- ==== === Label A _label_ associates a name to the <>. All the labels of a whole Normand input must have unique names. A label must not share the name of a <> name. A label is: . The `<` prefix. . A valid {py3} name which is not `ICITTE`. . The `>` suffix. === Variable assignment A _variable assignment_ associates a name to the integral result of an evaluated {py3} expression. A variable assignment is: . The ``pass:[{]`` prefix. . A valid {py3} name which is not `ICITTE`. . The `=` character. . A valid {py3} expression of which the evaluation result type is `int`, `float`, or `bool` (automatically converted to `int`), or `str`. + For a variable assignment at some source location{nbsp}__**L**__, this expression may contain: + -- * The name of any <> defined before{nbsp}__**L**__ which isn't within a nested group. * The name of any <> known at{nbsp}__**L**__. -- + The value of the special name `ICITTE` (`int` type) in this expression is the <>. . The `}` suffix. ==== Input: ---- {mix = 101} !le {meow = 42} 11 22 [meow:8] 33 {meow = ICITTE + 17} "yooo" [meow + mix : 16] ---- Output: ---- 11 22 2a 33 79 6f 6f 6f 7a 00 ┆ •"*3yoooz• ---- ==== === Group A _group_ is a scoped sequence of items. The <> within a group aren't visible outside of it. The main purpose of a group is to <> more than a single item and to isolate labels. A group is: . The `(`, `!group`, or `!g` opening. . Zero or more items except, recursively, a macro definition block. . Depending on the group opening: + -- `(`:: The `)` closing. `!group`:: `!g`:: The `!end` closing. 
--

====
Input:

----
((aa bb cc) dd () ee) "leclerc"
----

Output:

----
aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc
----
====

====
Input:

----
!group
  (aa bb cc) * 3 dd ee
!end * 5
----

Output:

----
aa bb cc aa bb cc aa bb cc dd ee
aa bb cc aa bb cc aa bb cc dd ee
aa bb cc aa bb cc aa bb cc dd ee
aa bb cc aa bb cc aa bb cc dd ee
aa bb cc aa bb cc aa bb cc dd ee
----
====

====
Input:

----
!be

(
  <str_beg> u16le"sébastien diaz"
  [ICITTE - str_beg : 8]
  [(end - str_beg) * 5 : 24]
) * 3

<end>
----

Output:

----
73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 e0 ┆ n• •d•i•a•z•••••
73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 40 ┆ n• •d•i•a•z••••@
73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 00 a0 ┆ n• •d•i•a•z•••••
----
====

=== Conditional block

A _conditional block_ represents either the bytes of zero or more items if some expression is true, or the bytes of zero or more other items if it's false.

A conditional block is:

. The `!if` opening.

. One of:

** The ``pass:[{]`` prefix, a valid {py3} expression of which the evaluation result type is `int` or `bool` (automatically converted to `int`), and the `}` suffix.
+
For a conditional block at some source location{nbsp}__**L**__, this expression may contain:
+
--
* The name of any <> defined before{nbsp}__**L**__ which isn't within a nested group.
* The name of any <> known at{nbsp}__**L**__.
--
+
The value of the special name `ICITTE` (`int` type) in this expression is the <> (before handling the contained items).

** A valid {py3} name.
+
For the name `__NAME__`, this is equivalent to the `pass:[{]__NAME__}` form above.

. Zero or more items to be handled when the condition is true except, recursively, a macro definition block.

. **Optional**:

.. The `!else` opening.
.. Zero or more items to be handled when the condition is false except, recursively, a macro definition block.

. The `!end` closing.

====
Input:

----
{at = 1}
{rep_count = 9}

!repeat rep_count
  "meow "

  !if {ICITTE > 25}
    "mix"
  !else
    "zoom"
  !end

  !if {at < rep_count} 20 !end

  {at = at + 1}
!end
----

Output:

----
6d 65 6f 77 20 7a 6f 6f 6d 20 6d 65 6f 77 20 7a ┆ meow zoom meow z
6f 6f 6d 20 6d 65 6f 77 20 7a 6f 6f 6d 20 6d 65 ┆ oom meow zoom me
6f 77 20 6d 69 78 20 6d 65 6f 77 20 6d 69 78 20 ┆ ow mix meow mix 
6d 65 6f 77 20 6d 69 78 20 6d 65 6f 77 20 6d 69 ┆ meow mix meow mi
78 20 6d 65 6f 77 20 6d 69 78 20 6d 65 6f 77 20 ┆ x meow mix meow 
6d 69 78                                        ┆ mix
----
====

====
Input:

----
<str_beg>
u16le"meow mix!"
<str_end>

!if {str_end - str_beg > 10}
  " BIG"
!end
----

Output:

----
6d 00 65 00 6f 00 77 00 20 00 6d 00 69 00 78 00 ┆ m•e•o•w• •m•i•x•
21 00 20 42 49 47                               ┆ !• BIG
----
====

=== Repetition block

A _repetition block_ represents the bytes of one or more items repeated a given number of times.

A repetition block is:

. The `!repeat` or `!r` opening.

. One of:

** A <> which is the number of times to repeat the contained items.

** The ``pass:[{]`` prefix, a valid {py3} expression of which the evaluation result type is `int` or `bool` (automatically converted to `int`), and the `}` suffix.
+
For a repetition block at some source location{nbsp}__**L**__, this expression may contain:
+
--
* The name of any <> defined before{nbsp}__**L**__ which isn't within a nested group.
* The name of any <> known at{nbsp}__**L**__.
--
+
The value of the special name `ICITTE` (`int` type) in this expression is the <> (before handling the items to repeat).

** A valid {py3} name.
+
For the name `__NAME__`, this is equivalent to the `pass:[{]__NAME__}` form above.

. Zero or more items except, recursively, a macro definition block.

. The `!end` closing.

You may also use a <> after some items. The form ``!repeat{nbsp}__X__{nbsp}__ITEMS__{nbsp}!end`` is equivalent to ``(__ITEMS__){nbsp}pass:[*]{nbsp}__X__``.
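
As a rough mental model, a repetition handles its items again on each iteration, so an expression which uses `ICITTE` sees a different current offset every time. The {py3} sketch below illustrates this; the `repeat()` helper and the idea of representing an item as a callable are assumptions made for the illustration, not how `normand.py` is written.

```python
def repeat(count, items, init_offset=0):
    """Concatenate the bytes of `items`, handled `count` times.

    Each item is a callable which receives the current offset (what an
    expression would see as `ICITTE`) and returns the bytes it generates.
    """
    data = bytearray()

    for _ in range(count):
        for item in items:
            # The current offset grows with each generated byte.
            data += item(init_offset + len(data))

    return bytes(data)


# !repeat 10 [20 - ICITTE : 8] !end
countdown = repeat(10, [lambda icitte: bytes([20 - icitte])])
print(countdown.hex())  # 14131211100f0e0d0c0b

# !repeat 3 ff ee "juice" !end
juice = repeat(3, [lambda _: b"\xff\xee", lambda _: b"juice"])
print(juice == b"\xff\xeejuice" * 3)  # True
```

The first call generates the same bytes as `[20 - ICITTE : 8] * 10`, which is consistent with the equivalence between the two repetition forms.
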

====
Input:

----
!repeat 0o400
  [end - ICITTE - 1 : 8]
!end

<end>
----

Output:

----
ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"! 
1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
----
====

====
Input:

----
{times = 1}

aa bb cc dd

!repeat 3
  <here>

  !repeat {here + 1}
    ee ff
  !end

  11 22 !repeat times 33 !end

  {times = times + 1}
!end

"coucou!"
----

Output:

----
aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
33 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
----
====

=== Transformation block

A _transformation block_ represents the bytes of one or more items
transformed into other bytes by a function.

As of this version, Normand only offers a predetermined set of
transformation functions.

A transformation block is:

. The `!transform` or `!t` opening.

. A transformation function name amongst:
+
--
[horizontal]
`base64`::
`b64`::
    Standard
    https://datatracker.ietf.org/doc/html/rfc4648.html#section-4[Base64].

`base64u`::
`b64u`::
    URL-safe Base64, using `-` instead of `pass:[+]` and `_` instead of
    `/`.

`base32`::
`b32`::
    Standard
    https://datatracker.ietf.org/doc/html/rfc4648.html#section-6[Base32].

`base16`::
`b16`::
    Standard
    https://datatracker.ietf.org/doc/html/rfc4648.html#section-8[Base16].

`ascii85`::
`a85`::
    https://en.wikipedia.org/wiki/Ascii85[Ascii85] without padding.

`ascii85p`::
`a85p`::
    Ascii85 with padding.

`base85`::
`b85`::
    https://en.wikipedia.org/wiki/Ascii85[Base85] (like Git-style binary
    diffs) without padding.

`base85p`::
`b85p`::
    Base85 with padding.

`quopri`::
`qp`::
    MIME
    https://datatracker.ietf.org/doc/html/rfc2045#section-6.7[quoted-printable]
    without quoted whitespaces.

`quoprit`::
`qpt`::
    MIME quoted-printable with quoted whitespaces.

`gzip`::
`gz`::
    https://en.wikipedia.org/wiki/Gzip[gzip].

`bzip2`::
`bz2`::
    https://en.wikipedia.org/wiki/Bzip2[bzip2].
--

. Zero or more items except, recursively, a macro definition block.
+
Any {py3} expression within any of those items may not refer to a
future label.
+
The value of the special name `ICITTE` in any {py3} expression within
any of those items is the current offset _before_ Normand applies the
transformation function. Therefore, labels defined within those items
also have the current offset value _before_ Normand applies the
transformation function.

. The `!end` closing.

The current offset after having handled the last item of a
transformation block is the value of the current offset before handling
the first item plus the size of the generated (transformed) bytes. In
other words, changes to the current offset within the items of the block
have no impact outside said block.

====
Input:

----
aa bb cc dd
"size of compressed section: " [end - start : 8]

<start>

!transform bzip2
  "this will be compressed!"
  89*100
  00*5000
!end

<end>

"yes!"
----

Output:

----
aa bb cc dd 73 69 7a 65 20 6f 66 20 63 6f 6d 70 ┆ ••••size of comp
72 65 73 73 65 64 20 73 65 63 74 69 6f 6e 3a 20 ┆ ressed section: 
52 42 5a 68 39 31 41 59 26 53 59 68 e1 8c fc 00 ┆ RBZh91AY&SYh••••
00 33 d1 e0 c0 00 60 00 5e 66 dc 80 00 20 00 80 ┆ •3••••`•^f••• ••
00 08 20 00 31 40 d3 43 23 26 20 ca 87 a9 a1 e8 ┆ •• •1@•C#& •••••
18 29 44 80 9c 80 49 bf cc b3 e8 45 ed e2 76 ad ┆ •)D•••I••••E••v•
0f 12 8b 8a d6 cd 40 04 7e 2e e4 8a 70 a1 20 d1 ┆ ••••••@•~.••p• •
c3 19 f8 79 65 73 21 ┆ •••yes!
----
====

====
Input:

----
88*16

!t a85
  "I am determined to be cheerful and happy in whatever situation "
  "I may find myself. For I have learned that the greater part of "
  "our misery or unhappiness is determined not by our circumstance "
  "but by our disposition."
!end

@128~99h

!t qp
  [ICITTE - beg : 8] * 50
!end
----

Output:

----
88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 ┆ ••••••••••••••••
38 4b 5f 47 59 2b 43 6f 26 2a 41 54 44 58 25 44 ┆ 8K_GY+Co&*ATDX%D
49 6d 3f 24 46 44 69 3a 32 41 4b 59 4a 72 41 53 ┆ Im?$FDi:2AKYJrAS
23 6d 6f 46 5f 69 31 2f 44 49 61 6c 27 40 3b 70 ┆ #moF_i1/DIal'@;p
31 32 2b 44 47 5e 39 47 41 28 45 2c 41 54 68 58 ┆ 12+DG^9GA(E,AThX
2a 2b 45 4d 37 3d 46 5e 5d 42 2b 44 66 2d 5b 68 ┆ *+EM7=F^]B+Df-[h
2b 44 6b 50 34 2b 44 2c 3e 2a 41 30 3e 60 37 46 ┆ +DkP4+D,>*A0>`7F
28 4b 30 22 2f 67 2a 57 25 45 5a 64 70 72 42 4f ┆ (K0"/g*W%EZdprBO
51 27 71 2b 44 62 55 74 45 63 2c 48 21 2b 45 56 ┆ Q'q+DbUtEc,H!+EV
3a 2a 46 3c 47 5b 3d 41 4b 59 57 2b 41 52 54 5b ┆ :*F<G[=AKYW+ART[
63 2e 46 3c 47 25 3c 2b 45 29 43 43 2b 43 66 2c ┆ c.F<G%<+E)CC+Cf,
[...]
----
====

=== Macro definition block

A _macro definition block_ associates a name and parameter names to a
group of items.

A macro definition block doesn't lead to generated data itself: a
<<macro-expansion,macro expansion>> does so.

A macro definition may only exist at the root level, that is, not within
a group, a <<conditional-block,conditional block>>, a
<<repetition-block,repetition block>>, a
<<transformation-block,transformation block>>, or another macro
definition block.

All macro definitions must have unique names.

A macro definition is:

. The `!macro` or `!m` opening.

. A valid {py3} name (the macro name).

. The `(` parameter name list prefix.

. A comma-separated list of zero or more unique parameter names, each
  one being a valid {py3} name.

. The `)` parameter name list suffix.

. Zero or more items except, recursively, a macro definition block.

. The `!end` closing.

====
----
!macro bake()
  !le [ICITTE * 8 : 16]
  u16le"predict explode"
!end
----
====

====
----
!macro nail(rep, with_extra, val)
  {iter = 1}

  !repeat rep
    [val + iter : uleb128]
    [0xdeadbeef : 32]
    {iter = iter + 1}
  !end

  !if with_extra
    "meow mix\0"
  !end
!end
----
====

=== Macro expansion

A _macro expansion_ expands the items of a defined
<<macro-definition-block,macro>>.

The macro to expand must be defined _before_ the expansion.

The state before handling the first item of the chosen macro is:

Current offset::
    Unchanged.

Current byte order::
    Unchanged.

Variables::
    The only available variables initially are the macro parameters.

Labels::
    None.

The state after having handled the last item of the chosen macro is:

Current offset::
    The one before handling the first item of the macro plus the size of
    the generated data of the macro expansion.
+
IMPORTANT: This means that items which set the current offset within the
expanded macro don't impact the final current offset.

Current byte order::
    The one before handling the first item of the macro.

Variables::
    The ones before handling the first item of the macro.

Labels::
    The ones before handling the first item of the macro.

A macro expansion is:

. The `m:` prefix.

. A valid {py3} name (the name of the macro to expand).

. The `(` parameter value list prefix.

. A comma-separated list of zero or more unique parameter values.
+
The number of parameter values must match the number of parameter names
of the definition of the chosen macro.
+
A parameter value is one of:
+
--
* A constant integer, possibly negative.

* A constant floating point number.

* The ``pass:[{]`` prefix, a valid {py3} expression of which the
  evaluation result type is `int` or `bool` (automatically converted to
  `int`), and the `}` suffix.
+
For a macro expansion at some source location{nbsp}__**L**__, this
expression may contain:

** The name of any label defined before{nbsp}__**L**__ which isn't
   within a nested group.
** The name of any variable known at{nbsp}__**L**__.

+
The value of the special name `ICITTE` (`int` type) in this expression
is the current offset (before handling the items of the chosen macro).

* A valid {py3} name.
+
For the name `__NAME__`, this is equivalent to the
`pass:[{]__NAME__pass:[}]` form above.
--

. The `)` parameter value list suffix.
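
The state rules above can be summarized with a small {py3} model. This is a sketch under stated assumptions, not Normand's actual implementation: the `State` class, the `expand_macro()` function, and the `body` callback are hypothetical names introduced only for illustration, and the byte order is modelled as a plain string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Union


@dataclass
class State:
    # Hypothetical stand-in for the processor state described above.
    offset: int = 0
    byte_order: Optional[str] = None
    variables: Dict[str, Union[int, float]] = field(default_factory=dict)
    labels: Dict[str, int] = field(default_factory=dict)


def expand_macro(state: State,
                 params: Dict[str, Union[int, float]],
                 body: Callable[[State], bytes]) -> bytes:
    # The macro body starts with the caller's offset and byte order,
    # only the parameters as variables, and no labels.
    inner = State(offset=state.offset, byte_order=state.byte_order,
                  variables=dict(params), labels={})
    data = body(inner)

    # Only the size of the generated data affects the caller's state;
    # whatever `body` did to `inner` is discarded.
    state.offset += len(data)
    return data


def body(inner: State) -> bytes:
    # Pretend the macro switches the byte order, sets the offset, and
    # assigns a variable before generating two bytes.
    inner.byte_order = "le"
    inner.offset = 1000
    inner.variables["iter"] = inner.variables["rep"] + 1
    return b"\x01\x02"


state = State(offset=7, byte_order="be", variables={"zoom": 3})
data = expand_macro(state, {"rep": 5}, body)
```

In this model, `state.offset` ends at 9 (7 plus the two generated bytes) while the byte order, variables, and labels of the caller are left as they were before the expansion.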
==== Input: ---- !macro bake() !le [ICITTE * 8 : 16] u16le"predict explode" !end "hello [" m:bake() "] world" m:bake() * 5 ---- Output: ---- 68 65 6c 6c 6f 20 5b 38 00 70 00 72 00 65 00 64 ┆ hello [8•p•r•e•d 00 69 00 63 00 74 00 20 00 65 00 78 00 70 00 6c ┆ •i•c•t• •e•x•p•l 00 6f 00 64 00 65 00 5d 20 77 6f 72 6c 64 70 01 ┆ •o•d•e•] worldp• 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 02 ┆ e•x•p•l•o•d•e•p• 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 03 ┆ e•x•p•l•o•d•e•p• 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 04 ┆ e•x•p•l•o•d•e•p• 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 05 ┆ e•x•p•l•o•d•e•p• 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 ┆ e•x•p•l•o•d•e• ---- ==== ==== Input: ---- !macro A(val, is_be) !le !if is_be !be !end [val : 16] !end !macro B(rep, is_be) {iter = 1} !repeat rep m:A({iter * 3}, is_be) {iter = iter + 1} !end !end m:B(5, 1) m:B(3, 0) ---- Output: ---- 00 03 00 06 00 09 00 0c 00 0f 03 00 06 00 09 00 ---- ==== ==== Input: ---- !macro flt32be(val) !be [val : 32] !end "CHEETOS" m:flt32be(-42.17) m:flt32be(56.23e-4) ---- Output: ---- 43 48 45 45 54 4f 53 c2 28 ae 14 3b b8 41 25 ┆ CHEETOS•(••;•A% ---- ==== === Post-item repetition A _post-item repetition_ represents the bytes of an item repeated a given number of times. A post-item repetition is: . One of those items: ** A <>. ** A <>. ** A <>. ** An <>. ** A <>. ** A <>. ** A <>. ** A <>. . The ``pass:[*]`` character. . One of: ** A positive integer (hexadecimal starting with `0x` or `0X` accepted) which is the number of times to repeat the previous item. 
** The ``pass:[{]`` prefix, a valid {py3} expression of which the
evaluation result type is `int` or `bool` (automatically converted to
`int`), and the `}` suffix.
+
For a post-item repetition at some source location{nbsp}__**L**__, this
expression may contain:
+
--
* The name of any label defined before{nbsp}__**L**__ which isn't
  within a nested group and which isn't part of the repeated item.
* The name of any variable known at{nbsp}__**L**__ which isn't part of
  the repeated item.
--
+
The value of the special name `ICITTE` (`int` type) in this expression
is the current offset (before handling the items to repeat).

** A valid {py3} name.
+
For the name `__NAME__`, this is equivalent to the
`pass:[{]__NAME__pass:[}]` form above.

You may also use a <<repetition-block,repetition block>>. The form
``__ITEM__{nbsp}pass:[*]{nbsp}__X__`` is equivalent to
``!repeat{nbsp}__X__{nbsp}__ITEM__{nbsp}!end``.

====
Input:

----
[end - ICITTE - 1 : 8] * 0x100 <end>
----

Output:

----
ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"! 
1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
----
====

====
Input:

----
{times = 1}

aa bb cc dd

(
  <here>

  (ee ff) * {here + 1}
  11 22 33 * {times}
  {times = times + 1}
) * 3

"coucou!"
----

Output:

----
aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
33 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
----
====

== Command-line tool

If you install the `normand` package, then you can use the `normand`
command-line tool:

----
$ normand <<< '"ma gang de malades"' | hexdump -C
----

----
00000000 6d 61 20 67 61 6e 67 20 64 65 20 6d 61 6c 61 64 |ma gang de malad|
00000010 65 73 |es|
----

If you copy the `normand.py` module to your own project, then you can
run the module itself:

----
$ python3 -m normand <<< '"ma gang de malades"' | hexdump -C
----

----
00000000 6d 61 20 67 61 6e 67 20 64 65 20 6d 61 6c 61 64 |ma gang de malad|
00000010 65 73 |es|
----

Without a path argument, the `normand` tool reads from the standard
input.

The `normand` tool prints the generated binary data to the standard
output.

Various options control the initial state of the processor: use the
`--help` option to learn more.

== {py3} API

The whole `normand` package/module public API is:

[source,python]
----
# Byte order.
class ByteOrder(enum.Enum):
    # Big endian.
    BE = ...

    # Little endian.
    LE = ...


# Text location.
class TextLocation:
    # Line number.
    @property
    def line_no(self) -> int:
        ...

    # Column number.
    @property
    def col_no(self) -> int:
        ...


# Parsing error message.
class ParseErrorMessage:
    # Message text.
    @property
    def text(self) -> str:
        ...

    # Source text location.
    @property
    def text_location(self) -> TextLocation:
        ...


# Parsing error.
class ParseError(RuntimeError):
    # Parsing error messages.
    #
    # The first message is the most _specific_ one.
    @property
    def messages(self) -> typing.List[ParseErrorMessage]:
        ...


# Variables dictionary type (for type hints).
VariablesT = typing.Dict[str, typing.Union[int, float]]


# Labels dictionary type (for type hints).
LabelsT = typing.Dict[str, int]


# Parsing result.
class ParseResult:
    # Generated data.
    @property
    def data(self) -> bytearray:
        ...

    # Updated variable values.
    @property
    def variables(self) -> VariablesT:
        ...

    # Updated main group label values.
    @property
    def labels(self) -> LabelsT:
        ...

    # Final offset.
    @property
    def offset(self) -> int:
        ...

    # Final byte order.
    @property
    def byte_order(self) -> typing.Optional[ByteOrder]:
        ...


# Parses the `normand` input using the initial state defined by
# `init_variables`, `init_labels`, `init_offset`, and `init_byte_order`,
# and returns the corresponding parsing result.
def parse(normand: str,
          init_variables: typing.Optional[VariablesT] = None,
          init_labels: typing.Optional[LabelsT] = None,
          init_offset: int = 0,
          init_byte_order: typing.Optional[ByteOrder] = None) -> ParseResult:
    ...
----

The `normand` parameter is the actual Normand input while the other
parameters control the initial state.

The `parse()` function raises a `ParseError` instance should it fail to
parse the `normand` string for any reason.

== Development

Normand is a https://python-poetry.org/[Poetry] project.

To develop it, install it through Poetry and enter the virtual
environment:

----
$ poetry install
$ poetry shell
$ normand <<< '"lol" * 10 0a'
----

`normand.py` is processed by:

* https://microsoft.github.io/pyright/[Pyright]
* https://github.com/psf/black[Black]
* https://pycqa.github.io/isort/[isort]

Licensing and copyright follow the
https://reuse.software/tutorial/[REUSE] specification and are checked
with the https://github.com/fsfe/reuse-tool[reuse tool].

=== Testing

Use https://docs.pytest.org/[pytest] to test Normand once the package is
part of your virtual environment, for example:

----
$ poetry install
$ poetry run pip3 install pytest
$ poetry run pytest
----

The `pytest` project is currently not a development dependency in
`pyproject.toml` due to backward compatibility issues with
Python{nbsp}3.4.

In the `tests` directory, each `*.nt` file is a test. The file name
prefix indicates what it's meant to test:

`pass-`::
    Everything above the `---` line is the valid Normand input to test.
+
Everything below the `---` line is the expected data
(whitespace-separated hexadecimal bytes).

`fail-`::
    Everything above the `---` line is the invalid Normand input to
    test.
+
Everything below the `---` line is the expected error message having
this form:
+
----
LINE:COL - MESSAGE
----

=== Contributing

Normand uses
https://review.lttng.org/admin/repos/normand,general[Gerrit] for code
review.

To report a bug,
https://github.com/efficios/normand/issues/new[create a GitHub issue].
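
When contributing a test, it can help to check a `pass-` file by hand. The following {py3} sketch splits such a file according to the layout described under Testing. It's an illustration only, not the project's test harness: the `split_pass_test()` name is hypothetical, and the code assumes that the separator is the first line containing exactly `---`.

```python
from typing import Tuple


def split_pass_test(text: str) -> Tuple[str, bytes]:
    # Assumption: the first line which is exactly `---` separates the
    # Normand input (above) from the expected data (below).
    lines = text.splitlines()
    sep_index = lines.index("---")
    normand_input = "\n".join(lines[:sep_index])

    # The expected data is a sequence of whitespace-separated
    # hexadecimal bytes, possibly spanning many lines.
    hex_tokens = " ".join(lines[sep_index + 1:]).split()
    expected = bytes(int(token, 16) for token in hex_tokens)
    return normand_input, expected


# Hypothetical contents of a `pass-` test file.
sample = '"lol" * 2 0a\n---\n6c 6f 6c 6c\n6f 6c 0a\n'
normand_input, expected = split_pass_test(sample)
```

One could then compare `expected` with the `data` property of the result of `normand.parse(normand_input)`.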