1 // Show ToC at a specific location for a GitHub rendering
2 ifdef::env-github[]
3 :toc: macro
4 endif::env-github[]
5
6 ifndef::env-github[]
7 :toc: left
8 endif::env-github[]
9
10 // This is to mimic what GitHub does so that anchors work in an offline
11 // rendering too.
12 :idprefix:
13 :idseparator: -
14
15 // Other attributes
16 :py3: Python{nbsp}3
17
18 = Normand
19 Philippe Proulx
20
21 image::normand-logo.png[]
22
23 [.normal]
24 image:https://img.shields.io/pypi/v/normand.svg?label=Latest%20version[link="https://pypi.python.org/pypi/normand"]
25
26 [.lead]
27 _**Normand**_ is a text-to-binary processor with its own language.
28
29 This package offers both a portable {py3} module and a command-line
30 tool.
31
32 WARNING: This version of Normand is 0.15, meaning both the Normand
33 language and the module/CLI interface aren't stable.
34
35 ifdef::env-github[]
36 // ToC location for a GitHub rendering
37 toc::[]
38 endif::env-github[]
39
40 == Introduction
41
42 The purpose of Normand is to consume human-readable text representing
43 bytes and to produce the corresponding binary data.
44
45 .Simple bytes input.
46 ====
47 Consider the following Normand input:
48
49 ----
50 4f 55 32 bb $167 fe %10100111 a9 $-32
51 ----
52
53 The generated nine bytes are:
54
55 ----
56 4f 55 32 bb a7 fe a7 a9 e0
57 ----
58 ====
59
60 As you can see in the last example, the fundamental unit of the Normand
61 language is the _byte_. The order in which you list bytes will be the
62 order of the generated data.
63
64 The Normand language is more than simple lists of bytes, though. Its
65 main features are:
66
67 Comments, including a bunch of insignificant symbols which may improve readability::
68 +
69 Input:
70 +
71 ----
72 ff bb %1101:0010 # This is a comment
73 78 29 af $192 # This too # 99 $-80
74 fe80::6257:18ff:fea3:4229
75 60:57:18:a3:42:29
76 10839636-5d65-4a68-8e6a-21608ddf7258
77 ----
78 +
79 Output:
80 +
81 ----
82 ff bb d2 78 29 af c0 99 b0 fe 80 62 57 18 ff fe
83 a3 42 29 60 57 18 a3 42 29 10 83 96 36 5d 65 4a
84 68 8e 6a 21 60 8d df 72 58
85 ----
86
87 Hexadecimal, decimal, and binary byte constants::
88 +
89 Input:
90 +
91 ----
92 aa bb $247 $-89 %0011_0010 %11.01= 10/10
93 ----
94 +
95 Output:
96 +
97 ----
98 aa bb f7 a7 32 da
99 ----
100
101 UTF-8, UTF-16, and UTF-32 literal strings::
102 +
103 Input:
104 +
105 ----
106 "hello world!" 00
107 u16le"stress\nverdict 🤣"
108 ----
109 +
110 Output:
111 +
112 ----
113 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 73 00 74 ┆ hello world!•s•t
114 00 72 00 65 00 73 00 73 00 0a 00 76 00 65 00 72 ┆ •r•e•s•s•••v•e•r
115 00 64 00 69 00 63 00 74 00 20 00 3e d8 23 dd ┆ •d•i•c•t• •>•#•
116 ----
117
118 Labels: special variables holding the offset where they're defined::
119 +
120 ----
121 <beg> b2 52 e3 bc 91 05
122 $100 $50 <chair> 33 9f fe
123 25 e9 89 8a <end>
124 ----
125
126 Variables::
127 +
128 ----
129 5e 65 {tower = 47} c6 7f f2 c4
130 44 {hurl = tower - 14} b5 {tower = hurl} 26 2d
131 ----
132 +
133 The value of a variable assignment is the evaluation of a valid {py3}
134 expression which may include label and variable names.
135
136 Fixed-length number with a given length (8{nbsp}bits to 64{nbsp}bits) and byte order::
137 +
138 Input:
139 +
140 ----
141 {strength = 4}
142 {be} 67 <lbl> 44 $178 {(end - lbl) * 8 + strength : 16} $99 <end>
143 {le} {-1993 : 32}
144 {-3.141593 : 64}
145 ----
146 +
147 Output:
148 +
149 ----
150 67 44 b2 00 2c 63 37 f8 ff ff 7f bd c2 82 fb 21
151 09 c0
152 ----
153 +
154 The encoded number is the evaluation of a valid {py3} expression which
155 may include label and variable names.
156
157 https://en.wikipedia.org/wiki/LEB128[LEB128] integer::
158 +
159 Input:
160 +
161 ----
162 aa bb cc {-1993 : sleb128} <meow> dd ee ff
163 {meow * 199 : uleb128}
164 ----
165 +
166 Output:
167 +
168 ----
169 aa bb cc b7 70 dd ee ff e3 07
170 ----
171 +
172 The encoded integer is the evaluation of a valid {py3} expression which
173 may include label and variable names.
174
175 Conditional::
176 +
177 Input:
178 +
179 ----
180 aa bb cc
181
182 (
183 "foo"
184
185 !if {ICITTE > 10}
186 "bar"
187 !else
188 "fight"
189 !end
190 ) * 4
191 ----
192 +
193 Output:
194 +
195 ----
196 aa bb cc 66 6f 6f 66 69 67 68 74 66 6f 6f 62 61 ┆ •••foofightfooba
197 72 66 6f 6f 62 61 72 66 6f 6f 62 61 72 ┆ rfoobarfoobar
198 ----
199
200 Repetition::
201 +
202 Input:
203 +
204 ----
205 aa bb * 5 cc <zoom> "yeah\0" * {zoom * 3}
206
207 !repeat 3
208 ff ee "juice"
209 !end
210 ----
211 +
212 Output:
213 +
214 ----
215 aa bb bb bb bb bb cc 79 65 61 68 00 79 65 61 68 ┆ •••••••yeah•yeah
216 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
217 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 ┆ yeah•yeah•yeah•y
218 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 ┆ eah•yeah•yeah•ye
219 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 ┆ ah•yeah•yeah•yea
220 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 ┆ h•yeah•yeah•yeah
221 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
222 ff ee 6a 75 69 63 65 ff ee 6a 75 69 63 65 ff ee ┆ ••juice••juice••
223 6a 75 69 63 65 ┆ juice
224 ----
225
226 Alignment::
227 +
228 Input:
229 +
230 ----
231 {be}
232
233 {199:32}
234 @64 {43:64}
235 @16 {-123:16}
236 @32~255 {5584:32}
237 ----
238 +
239 Output:
240 +
241 ----
242 00 00 00 c7 00 00 00 00 00 00 00 00 00 00 00 2b
243 ff 85 ff ff 00 00 15 d0
244 ----
245
246 Filling::
247 +
248 Input:
249 +
250 ----
251 {le}
252 {0xdeadbeef:32}
253 {-1993:16}
254 {9:16}
255 +0x40
256 {ICITTE:8}
257 "meow mix"
258 +200~FFh
259 {ICITTE:8}
260 ----
261 +
262 Output:
263 +
264 ----
265 ef be ad de 37 f8 09 00 00 00 00 00 00 00 00 00 ┆ ••••7•••••••••••
266 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
267 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
268 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
269 40 6d 65 6f 77 20 6d 69 78 ff ff ff ff ff ff ff ┆ @meow mix•••••••
270 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
271 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
272 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
273 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
274 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
275 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
276 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
277 ff ff ff ff ff ff ff ff c8 ┆ •••••••••
278 ----
279
280 Multilevel grouping::
281 +
282 Input:
283 +
284 ----
285 ff ((aa bb "zoom" cc) * 5) * 3 $-34 * 4
286 ----
287 +
288 Output:
289 +
290 ----
291 ff aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa ┆ •••zoom•••zoom••
292 bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a ┆ •zoom•••zoom•••z
293 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f ┆ oom•••zoom•••zoo
294 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc ┆ m•••zoom•••zoom•
295 aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb ┆ ••zoom•••zoom•••
296 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f ┆ zoom•••zoom•••zo
297 6f 6d cc aa bb 7a 6f 6f 6d cc de de de de ┆ om•••zoom•••••
298 ----
299
300 Macros::
301 +
302 Input:
303 +
304 ----
305 !macro hello(world)
306 "hello"
307 !if world " world" !end
308 !end
309
310 !repeat 17
311 ff ff ff ff
312 m:hello({ICITTE > 15 and ICITTE < 60})
313 !end
314 ----
315 +
316 Output:
317 +
318 ----
319 ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c ┆ ••••hello••••hel
320 6c 6f ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c ┆ lo••••hello worl
321 64 ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c 64 ┆ d••••hello world
322 ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c 64 ff ┆ ••••hello world•
323 ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c ┆ •••hello••••hell
324 6f ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 ┆ o••••hello••••he
325 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ff ff ff ff ┆ llo••••hello••••
326 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ff ff ┆ hello••••hello••
327 ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ┆ ••hello••••hello
328 ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c ┆ ••••hello••••hel
329 6c 6f ff ff ff ff 68 65 6c 6c 6f ┆ lo••••hello
330 ----
331
332 Precise error reporting::
333 +
334 ----
335 /tmp/meow.normand:10:24 - Expecting a bit (`0` or `1`).
336 ----
337 +
338 ----
339 /tmp/meow.normand:32:6 - Unexpected character `k`.
340 ----
341 +
342 ----
343 /tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`ICITTE`, `mix`, `zoom`}.
344 ----
345 +
346 ----
347 /tmp/meow.normand:32:19 - While expanding the macro `meow`:
348 /tmp/meow.normand:35:5 - While expanding the macro `zzz`:
349 /tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE`.
350 ----
351
352 You can use Normand to track human-readable data source files in your
353 favorite VCS instead of raw binary files. The binary files which Normand
354 generates are useful, for example, to test file format decoders
355 (including with malformed data) and for education.
356
357 See <<learn-normand>> to explore all the Normand features.
358
359 == Install Normand
360
361 Normand requires Python ≥ 3.4.
362
363 To install Normand:
364
365 ----
366 $ python3 -m pip install --user normand
367 ----
368
369 See
370 https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-to-the-user-site[Installing to the User Site]
371 to learn more about a user site installation.
372
373 [NOTE]
374 ====
375 Normand has a single module file, `normand.py`, which you can copy as is
376 to your project to use it (both the <<python3-api,`normand.parse()`>>
377 function and the <<command-line-tool,command-line tool>>).
378
379 `normand.py` has _no external dependencies_, but if you're using
380 Python{nbsp}3.4, you'll need a local copy of the standard `typing`
381 module.
382 ====
383
384 == Learn Normand
385
386 A Normand text input is a sequence of items which represent a sequence
387 of raw bytes.
388
389 [[state]] During the processing of items to data, Normand relies on a
390 current state:
391
392 [%header%autowidth]
393 |===
394 |State variable |Description |Initial value: <<python3-api,{py3} API>> |Initial value: <<command-line-tool,CLI>>
395
396 |[[cur-offset]] Current offset
397 |
398 The current offset has an effect on the value of <<label,labels>> and of
399 the special `ICITTE` name in <<fixed-length-number,fixed-length
400 number>>, <<leb128-integer,LEB128 integer>>,
401 <<variable-assignment,variable assignment>>,
402 <<conditional-block,conditional block>>, <<repetition-block,repetition
403 block>>, <<macro-expansion,macro expansion>>, and
404 <<post-item-repetition,post-item repetition>> expression evaluation.
405
406 Each generated byte increments the current offset.
407
408 A <<current-offset-setting,current offset setting>> may change the
409 current offset without generating data.
410
411 A <<current-offset-alignment,current offset alignment>> generates
412 padding bytes to make the current offset satisfy a given alignment.
413 |`init_offset` parameter of the `parse()` function.
414 |`--offset` option.
415
416 |[[cur-bo]] Current byte order
417 |
418 The current byte order has an effect on the encoding of
419 <<fixed-length-number,fixed-length numbers>>.
420
421 A <<current-byte-order-setting,current byte order setting>> may change
422 the current byte order.
423 |`init_byte_order` parameter of the `parse()` function.
424 |`--byte-order` option.
425
426 |<<label,Labels>>
427 |Mapping of label names to integral values.
428 |`init_labels` parameter of the `parse()` function.
429 |One or more `--label` options.
430
431 |<<variable-assignment,Variables>>
432 |Mapping of variable names to integral or floating point number values.
433 |`init_variables` parameter of the `parse()` function.
434 |One or more `--var` options.
435 |===
436
437 The available items are:
438
439 * A <<byte-constant,constant integer>> representing a single byte.
440
441 * A <<literal-string,literal string>> representing a sequence of bytes
442 encoding UTF-8, UTF-16, or UTF-32 data.
443
444 * A <<current-byte-order-setting,current byte order setting>> (big or
445 little endian).
446
447 * A <<fixed-length-number,fixed-length number>> (integer or
448 floating point) using the <<cur-bo,current byte order>> and of which
449 the value is the result of a {py3} expression.
450
451 * An <<leb128-integer,LEB128 integer>> of which the value is the result
452 of a {py3} expression.
453
454 * A <<current-offset-setting,current offset setting>>.
455
456 * A <<current-offset-alignment,current offset alignment>>.
457
458 * A <<filling,filling>>.
459
460 * A <<label,label>>, that is, a named constant holding the current
461 offset.
462 +
463 This is similar to an assembly label.
464
465 * A <<variable-assignment,variable assignment>> associating a name to
466 the integral or floating point result of an evaluated {py3} expression.
467
468 * A <<group,group>>, that is, a scoped sequence of items.
469
470 * A <<conditional-block,conditional block>>.
471
472 * A <<repetition-block,repetition block>>.
473
474 * A <<macro-definition-block,macro definition block>>.
475
476 * A <<macro-expansion,macro expansion>>.
477
478 Moreover, you can repeat many items above a constant or variable number
479 of times with the ``pass:[*]`` operator _after_ the item to repeat. This
480 is called a <<post-item-repetition,post-item repetition>>.
481
482 A Normand comment may exist:
483
484 * Between items, possibly within a group.
485 * Between the nibbles of a constant hexadecimal byte.
486 * Between the bits of a constant binary byte.
487 * Between the last item and the ``pass:[*]`` character of a post-item
488 repetition, and between that ``pass:[*]`` character and the following
489 number or expression.
490 * Between the ``!repeat``/``!r`` block opening and the following
491 constant integer, name, or expression of a repetition block.
492 * Between the ``!if`` block opening and the following name or expression
493 of a conditional block.
494
495 A comment is anything between two ``pass:[#]`` characters on the same
496 line, or from ``pass:[#]`` until the end of the line. Whitespaces and
497 the following symbol characters are also considered comments where a
498 comment may exist:
499
500 ----
501 / \ ? & : ; . , [ ] _ = | -
502 ----
503
504 The latter serve to improve readability so that you may write, for
505 example, a MAC address or a UUID as is.
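
To make the comment rules concrete, here's a minimal {py3} sketch which is _not_ part of Normand. The hypothetical `hex_bytes()` function removes `#` comments, whitespaces, and the symbol characters above from a single line, and then decodes what remains as hexadecimal byte constants. It assumes that the line contains nothing but hexadecimal byte constants and comments; real Normand input has many other kinds of items.

```python
import re

# Symbol characters which count as comments where a comment may exist
INSIGNIFICANT = set("/\\?&:;.,[]_=|-")


def hex_bytes(line):
    """Decode a line containing only hexadecimal byte constants."""
    # Drop `# ... #` comments and `# ...` comments running to the end
    line = re.sub(r"#[^#\n]*#?", " ", line)

    # Drop whitespaces and insignificant symbols
    digits = "".join(
        ch for ch in line if ch not in INSIGNIFICANT and not ch.isspace()
    )

    # What remains must be pairs of hexadecimal digits
    return bytes.fromhex(digits)


print(hex_bytes("60:57:18:a3:42:29").hex())  # 605718a34229
```

With this sketch, a MAC address, an IPv6 address, or a UUID written as is decodes to its bytes, in order.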
506
507 [[const-int]] Many items require a _constant integer_, which may
508 start with `-` to make it negative. A positive constant integer is any
509 of:
510
511 Decimal::
512 One or more digits (`0` to `9`).
513
514 Hexadecimal::
515 One of:
516 +
517 * The `0x` or `0X` prefix followed by one or more hexadecimal digits
518 (`0` to `9`, `a` to `f`, or `A` to `F`).
519 * One or more hexadecimal digits followed by the `h` or `H` suffix.
520
521 Octal::
522 One of:
523 +
524 * The `0o` or `0O` prefix followed by one or more octal digits
525 (`0` to `7`).
526 * One or more octal digits followed by the `o`, `O`, `q`, or `Q`
527 suffix.
528
529 Binary::
530 One of:
531 +
532 * The `0b` or `0B` prefix followed by one or more bits (`0` or `1`).
533 * One or more bits followed by the `b` or `B` suffix.
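
The constant integer forms above can be sketched in {py3} as follows. This is an illustration only, not the Normand parser: `parse_const_int()` is a hypothetical name, and the order in which it tries the forms is an assumption.

```python
import re

# (pattern, base) pairs for the positive constant integer forms
_FORMS = [
    (re.compile(r"0[xX]([0-9a-fA-F]+)"), 16),  # `0x`/`0X` prefix
    (re.compile(r"([0-9a-fA-F]+)[hH]"), 16),   # `h`/`H` suffix
    (re.compile(r"0[oO]([0-7]+)"), 8),         # `0o`/`0O` prefix
    (re.compile(r"([0-7]+)[oOqQ]"), 8),        # `o`/`O`/`q`/`Q` suffix
    (re.compile(r"0[bB]([01]+)"), 2),          # `0b`/`0B` prefix
    (re.compile(r"([01]+)[bB]"), 2),           # `b`/`B` suffix
    (re.compile(r"([0-9]+)"), 10),             # decimal
]


def parse_const_int(text):
    """Return the value of the constant integer `text`."""
    negative = text.startswith("-")
    body = text[1:] if negative else text

    # Assumption: try the forms in the order of the list above
    for pattern, base in _FORMS:
        match = pattern.fullmatch(body)

        if match:
            value = int(match.group(1), base)
            return -value if negative else value

    raise ValueError("Not a constant integer: {!r}".format(text))
```

For example, `parse_const_int("0x40")` and `parse_const_int("40h")` both return 64, while `parse_const_int("-17q")` returns -15.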
534
535 You can test the examples of this section with the `normand`
536 <<command-line-tool,command-line tool>> as such:
537
538 ----
539 $ normand file | hexdump -C
540 ----
541
542 where `file` is the name of a file containing the Normand input.
543
544 === Byte constant
545
546 A _byte constant_ represents a single byte.
547
548 A byte constant is:
549
550 Hexadecimal form::
551 Two consecutive hexadecimal digits.
552
553 Decimal form::
554 One or more digits after the `$` prefix.
555
556 Binary form::
557 Eight bits after the `%` prefix.
558
559 ====
560 Input:
561
562 ----
563 ab cd [3d 8F] CC
564 ----
565
566 Output:
567
568 ----
569 ab cd 3d 8f cc
570 ----
571 ====
572
573 ====
574 Input:
575
576 ----
577 $192 %1100/0011 $ -77
578 ----
579
580 Output:
581
582 ----
583 c0 c3 b3
584 ----
585 ====
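
You can cross-check the decimal and binary forms of the example above with a few lines of {py3}. This is a sketch, not Normand code: `decimal_byte()` and `binary_byte()` are hypothetical helpers, and the accepted decimal range (-128 to 255) is an assumption based on the examples of this document.

```python
def decimal_byte(value):
    """Return the byte value of a decimal byte constant."""
    # Assumption: both the signed and unsigned 8-bit ranges are accepted
    if not -128 <= value <= 255:
        raise ValueError("{} doesn't fit in a single byte".format(value))

    # A negative value becomes its two's complement
    return value & 0xFF


def binary_byte(bits):
    """Return the byte value of the eight bits of a binary byte constant."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("Expecting exactly eight bits")

    return int(bits, 2)


print(hex(decimal_byte(192)), hex(binary_byte("11000011")), hex(decimal_byte(-77)))
```

The three values correspond to the `c0 c3 b3` output above.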
586
587 ====
588 Input:
589
590 ----
591 58f64689-6316-4d55-8a1a-04cada366172
592 fe80::6257:18ff:fea3:4229
593 ----
594
595 Output:
596
597 ----
598 58 f6 46 89 63 16 4d 55 8a 1a 04 ca da 36 61 72 ┆ X•F•c•MU•••••6ar
599 fe 80 62 57 18 ff fe a3 42 29 ┆ ••bW••••B)
600 ----
601 ====
602
603 ====
604 Input:
605
606 ----
607 %01110011 %01100001 %01101100 %01110101 %01110100
608 ----
609
610 Output:
611
612 ----
613 73 61 6c 75 74 ┆ salut
614 ----
615 ====
616
617 === Literal string
618
619 A _literal string_ represents the UTF-8-, UTF-16-, or UTF-32-encoded
620 bytes of a string.
621
622 The string to encode isn't implicitly null-terminated: use `\0` at the
623 end of the string to add a null character.
624
625 A literal string is:
626
627 . **Optional**: one of the following encodings instead of UTF-8:
628 +
629 --
630 [horizontal]
631 `u16be`:: UTF-16BE.
632 `u16le`:: UTF-16LE.
633 `u32be`:: UTF-32BE.
634 `u32le`:: UTF-32LE.
635 --
636
637 . The ``pass:["]`` prefix.
638
639 . A sequence of zero or more characters, possibly containing escape
640 sequences.
641 +
642 An escape sequence is the ``\`` character followed by one of:
643 +
644 --
645 [horizontal]
646 `0`:: Null (U+0000)
647 `a`:: Alert (U+0007)
648 `b`:: Backspace (U+0008)
649 `e`:: Escape (U+001B)
650 `f`:: Form feed (U+000C)
651 `n`:: End of line (U+000A)
652 `r`:: Carriage return (U+000D)
653 `t`:: Character tabulation (U+0009)
654 `v`:: Line tabulation (U+000B)
655 ``\``:: Reverse solidus (U+005C)
656 ``pass:["]``:: Quotation mark (U+0022)
657 --
658
659 . The ``pass:["]`` suffix.
660
661 ====
662 Input:
663
664 ----
665 "coucou tout le monde!"
666 ----
667
668 Output:
669
670 ----
671 63 6f 75 63 6f 75 20 74 6f 75 74 20 6c 65 20 6d ┆ coucou tout le m
672 6f 6e 64 65 21 ┆ onde!
673 ----
674 ====
675
676 ====
677 Input:
678
679 ----
680 u16le"I am not young enough to know everything."
681 ----
682
683 Output:
684
685 ----
686 49 00 20 00 61 00 6d 00 20 00 6e 00 6f 00 74 00 ┆ I• •a•m• •n•o•t•
687 20 00 79 00 6f 00 75 00 6e 00 67 00 20 00 65 00 ┆ •y•o•u•n•g• •e•
688 6e 00 6f 00 75 00 67 00 68 00 20 00 74 00 6f 00 ┆ n•o•u•g•h• •t•o•
689 20 00 6b 00 6e 00 6f 00 77 00 20 00 65 00 76 00 ┆ •k•n•o•w• •e•v•
690 65 00 72 00 79 00 74 00 68 00 69 00 6e 00 67 00 ┆ e•r•y•t•h•i•n•g•
691 2e 00 ┆ .•
692 ----
693 ====
694
695 ====
696 Input:
697
698 ----
699 u32be "\"illusion is the first\nof all pleasures\" 🦉"
700 ----
701
702 Output:
703
704 ----
705 00 00 00 22 00 00 00 69 00 00 00 6c 00 00 00 6c ┆ •••"•••i•••l•••l
706 00 00 00 75 00 00 00 73 00 00 00 69 00 00 00 6f ┆ •••u•••s•••i•••o
707 00 00 00 6e 00 00 00 20 00 00 00 69 00 00 00 73 ┆ •••n••• •••i•••s
708 00 00 00 20 00 00 00 74 00 00 00 68 00 00 00 65 ┆ ••• •••t•••h•••e
709 00 00 00 20 00 00 00 66 00 00 00 69 00 00 00 72 ┆ ••• •••f•••i•••r
710 00 00 00 73 00 00 00 74 00 00 00 0a 00 00 00 6f ┆ •••s•••t•••••••o
711 00 00 00 66 00 00 00 20 00 00 00 61 00 00 00 6c ┆ •••f••• •••a•••l
712 00 00 00 6c 00 00 00 20 00 00 00 70 00 00 00 6c ┆ •••l••• •••p•••l
713 00 00 00 65 00 00 00 61 00 00 00 73 00 00 00 75 ┆ •••e•••a•••s•••u
714 00 00 00 72 00 00 00 65 00 00 00 73 00 00 00 22 ┆ •••r•••e•••s•••"
715 00 00 00 20 00 01 f9 89 ┆ ••• ••••
716 ----
717 ====
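
The encoding prefixes map naturally to standard {py3} codecs, so you can verify the outputs above without Normand. The following is a sketch: the `CODECS` mapping and the `encode_literal()` function are hypothetical names, and the function expects a string in which the escape sequences are already resolved.

```python
# Hypothetical mapping of Normand encoding prefixes to Python codec names
CODECS = {
    "": "utf-8",
    "u16be": "utf-16-be",
    "u16le": "utf-16-le",
    "u32be": "utf-32-be",
    "u32le": "utf-32-le",
}


def encode_literal(text, prefix=""):
    """Encode `text` (escape sequences already resolved)."""
    # No implicit null terminator: add "\0" to `text` if you need one
    return text.encode(CODECS[prefix])


# U+1F989 (owl) encoded as UTF-32BE
print(encode_literal("\U0001F989", "u32be").hex())  # 0001f989
```

Note how a character outside the Basic Multilingual Plane takes four bytes in UTF-16 (a surrogate pair) as well as in UTF-32.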
718
719 === Current byte order setting
720
721 This special item sets the <<cur-bo,_current byte order_>>.
722
723 The two accepted forms are:
724
725 [horizontal]
726 ``pass:[{be}]``:: Set the current byte order to big endian.
727 ``pass:[{le}]``:: Set the current byte order to little endian.
728
729 === Fixed-length number
730
731 A _fixed-length number_ represents a fixed number of bytes encoding
732 either:
733
734 * An unsigned or signed integer (two's complement).
735 +
736 The available lengths are 8, 16, 24, 32, 40, 48, 56, and 64.
737
738 * A floating point number
739 (https://standards.ieee.org/standard/754-2008.html[IEEE{nbsp}754-2008]).
740 +
741 The available lengths are 32 (_binary32_) and 64 (_binary64_).
742
743 The value is the result of evaluating a {py3} expression; Normand
744 encodes it using the <<cur-bo,current byte order>>.
745
746 A fixed-length number is:
747
748 . The ``pass:[{]`` prefix.
749
750 . A valid {py3} expression.
751 +
752 For a fixed-length number at some source location{nbsp}__**L**__, this
753 expression may contain the name of any accessible <<label,label>> (not
754 within a nested group), including the name of a label defined
755 after{nbsp}__**L**__, as well as the name of any
756 <<variable-assignment,variable>> known at{nbsp}__**L**__.
757 +
758 The value of the special name `ICITTE` (`int` type) in this expression
759 is the <<cur-offset,current offset>> (before encoding the number).
760
761 . The `:` character.
762
763 . An encoding length in bits amongst:
764 +
765 --
766 The expression evaluates to an `int` or `bool` value::
767 `8`, `16`, `24`, `32`, `40`, `48`, `56`, and `64`.
768 +
769 NOTE: Normand automatically converts a `bool` value to `int`.
770
771 The expression evaluates to a `float` value::
772 `32` and `64`.
773 --
774
775 . The `}` suffix.
776
777 ====
778 Input:
779
780 ----
781 {le} {345:16}
782 {be} {-0xabcd:32}
783 ----
784
785 Output:
786
787 ----
788 59 01 ff ff 54 33
789 ----
790 ====
791
792 ====
793 Input:
794
795 ----
796 {be}
797
798 # String length in bits
799 {8 * (str_end - str_beg) : 16}
800
801 # String
802 <str_beg>
803 "hello world!"
804 <str_end>
805 ----
806
807 Output:
808
809 ----
810 00 60 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 ┆ •`hello world!
811 ----
812 ====
813
814 ====
815 Input:
816
817 ----
818 {20 - ICITTE : 8} * 10
819 ----
820
821 Output:
822
823 ----
824 14 13 12 11 10 0f 0e 0d 0c 0b
825 ----
826 ====
827
828 ====
829 Input:
830
831 ----
832 {le}
833 {2 * 0.0529 : 32}
834 ----
835
836 Output:
837
838 ----
839 ac ad d8 3d
840 ----
841 ====
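
The encoding itself can be approximated with the standard {py3} library. This is a sketch under assumptions, not the Normand implementation: `encode_fixed()` is a hypothetical function, its `byte_order` parameter takes the strings `"be"` and `"le"` for illustration, and the accepted integer range (signed minimum to unsigned maximum for the given length) is a guess.

```python
import struct


def encode_fixed(value, bits, byte_order):
    """Encode `value` on `bits` bits; `byte_order` is "be" or "le"."""
    if isinstance(value, float):
        if bits not in (32, 64):
            raise ValueError("A floating point number needs 32 or 64 bits")

        # binary32 is `f` and binary64 is `d` for the `struct` module
        fmt = (">" if byte_order == "be" else "<") + ("f" if bits == 32 else "d")
        return struct.pack(fmt, value)

    value = int(value)  # Also converts a `bool` value to `int`

    if bits not in range(8, 65, 8):
        raise ValueError("Unsupported length: {}".format(bits))

    # Assumption: accept both the signed and unsigned ranges
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError("Value {} is outside the {}-bit range".format(value, bits))

    # Masking yields the two's complement of a negative value
    masked = value & ((1 << bits) - 1)
    return masked.to_bytes(bits // 8, "big" if byte_order == "be" else "little")


print(encode_fixed(-0xabcd, 32, "be").hex())  # ffff5433
```

For example, `encode_fixed(345, 16, "le")` returns the bytes `59 01`, as in the first example of this section.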
842
843 === LEB128 integer
844
845 An _LEB128 integer_ represents a variable number of bytes encoding an
846 unsigned or signed integer which is the result of evaluating a {py3}
847 expression following the https://en.wikipedia.org/wiki/LEB128[LEB128]
848 format.
849
850 An LEB128 integer is:
851
852 . The ``pass:[{]`` prefix.
853
854 . A valid {py3} expression of which the evaluation result type
855 is `int` or `bool` (automatically converted to `int`).
856 +
857 For an LEB128 integer at some source location{nbsp}__**L**__, this
858 expression may contain:
859 +
860 --
861 * The name of any <<label,label>> defined before{nbsp}__**L**__
862 which isn't within a nested group.
863 * The name of any <<variable-assignment,variable>> known
864 at{nbsp}__**L**__.
865 --
866 +
867 The value of the special name `ICITTE` (`int` type) in this expression
868 is the <<cur-offset,current offset>> (before encoding the integer).
869
870 . The `:` character.
871
872 . One of:
873 +
874 --
875 [horizontal]
876 `uleb128`:: Use the unsigned LEB128 format.
877 `sleb128`:: Use the signed LEB128 format.
878 --
879
880 . The `}` suffix.
881
882 ====
883 Input:
884
885 ----
886 {624485 : uleb128}
887 ----
888
889 Output:
890
891 ----
892 e5 8e 26
893 ----
894 ====
895
896 ====
897 Input:
898
899 ----
900 aa bb cc dd
901 <meow>
902 ee ff
903 {-981238311 + (meow * -23) : sleb128}
904 "hello"
905 ----
906
907 Output:
908
909 ----
910 aa bb cc dd ee ff fd fa 8d ac 7c 68 65 6c 6c 6f ┆ ••••••••••|hello
911 ----
912 ====
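
If you're not familiar with LEB128, the following {py3} sketch shows the usual encoding algorithm (seven bits of payload per byte, with the most significant bit set on every byte except the last). The `uleb128()` and `sleb128()` functions are written for this document as an illustration; they're not taken from the Normand source.

```python
def uleb128(value):
    """Encode the non-negative integer `value` as unsigned LEB128."""
    if value < 0:
        raise ValueError("Unsigned LEB128 cannot encode a negative integer")

    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7

        if value == 0:
            out.append(byte)  # Last byte: continuation bit cleared
            return bytes(out)

        out.append(byte | 0x80)


def sleb128(value):
    """Encode the integer `value` as signed LEB128."""
    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7  # Arithmetic shift: keeps the sign

        # Done when what remains is pure sign extension of bit 6
        if (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40):
            out.append(byte)
            return bytes(out)

        out.append(byte | 0x80)


print(uleb128(624485).hex())  # e58e26
```

In the second example above, `meow` is 4, so the encoded value is -981238311 + (4 * -23), that is, -981238403.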
913
914 === Current offset setting
915
916 This special item sets the <<cur-offset,_current offset_>>.
917
918 A current offset setting is:
919
920 . The `<` prefix.
921
922 . A <<const-int,positive constant integer>> which is the new current
923 offset.
924
925 . The `>` suffix.
926
927 ====
928 Input:
929
930 ----
931 {ICITTE : 8} * 8
932 <0x61> {ICITTE : 8} * 8
933 ----
934
935 Output:
936
937 ----
938 00 01 02 03 04 05 06 07 61 62 63 64 65 66 67 68 ┆ ••••••••abcdefgh
939 ----
940 ====
941
942 ====
943 Input:
944
945 ----
946 aa bb cc dd <meow> ee ff
947 <12> 11 22 33 <mix> 44 55
948 {meow : 8} {mix : 8}
949 ----
950
951 Output:
952
953 ----
954 aa bb cc dd ee ff 11 22 33 44 55 04 0f ┆ •••••••"3DU••
955 ----
956 ====
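
To see how the current offset, labels, and current offset settings interact, here's a small {py3} simulation of the last example. The tuple-based item representation and the `run()` function are hypothetical; they only model the three kinds of items involved.

```python
def run(items, init_offset=0):
    """Process `items` and return `(data, labels, final_offset)`."""
    offset = init_offset
    labels = {}
    data = bytearray()

    for kind, arg in items:
        if kind == "bytes":
            # Each generated byte increments the current offset
            data += arg
            offset += len(arg)
        elif kind == "label":
            # A label holds the current offset where it's defined
            labels[arg] = offset
        elif kind == "set_offset":
            # Changes the current offset without generating data
            offset = arg
        else:
            raise ValueError("Unknown item kind: {}".format(kind))

    return bytes(data), labels, offset


data, labels, offset = run([
    ("bytes", bytes.fromhex("aabbccdd")),
    ("label", "meow"),
    ("bytes", bytes.fromhex("eeff")),
    ("set_offset", 12),
    ("bytes", bytes.fromhex("112233")),
    ("label", "mix"),
    ("bytes", bytes.fromhex("4455")),
])
print(labels["meow"], labels["mix"])  # 4 15
```

The `mix` label is 15 (and not 9) because the `<12>` item moved the current offset before the three following bytes were generated.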
957
958 === Current offset alignment
959
960 A _current offset alignment_ represents zero or more padding bytes to
961 make the <<cur-offset,current offset>> meet a given
962 https://en.wikipedia.org/wiki/Data_structure_alignment[alignment] value.
963
964 More specifically, for an alignment value of{nbsp}__**N**__{nbsp}bits,
965 a current offset alignment represents the required padding bytes until
966 the current offset is a multiple of __**N**__{nbsp}/{nbsp}8.
967
968 A current offset alignment is:
969
970 . The `@` prefix.
971
972 . A <<const-int,positive constant integer>> which is the alignment value
973 in _bits_.
974 +
975 This value must be greater than zero and a multiple of{nbsp}8.
976
977 . **Optional**:
978 +
979 --
980 . The ``pass:[~]`` prefix.
981 . A <<const-int,positive constant integer>> which is the value of the
982 byte to use as padding to align the <<cur-offset,current offset>>.
983 --
984 +
985 Without this section, the padding byte value is zero.
986
987 ====
988 Input:
989
990 ----
991 11 22 (@32 aa bb cc) * 3
992 ----
993
994 Output:
995
996 ----
997 11 22 00 00 aa bb cc 00 aa bb cc 00 aa bb cc
998 ----
999 ====
1000
1001 ====
1002 Input:
1003
1004 ----
1005 {le}
1006 77 88
1007 @32~0xcc {-893.5:32}
1008 @128~0x55 "meow"
1009 ----
1010
1011 Output:
1012
1013 ----
1014 77 88 cc cc 00 60 5f c4 55 55 55 55 55 55 55 55 ┆ w••••`_•UUUUUUUU
1015 6d 65 6f 77 ┆ meow
1016 ----
1017 ====
1018
1019 ====
1020 Input:
1021
1022 ----
1023 aa bb cc <29> @64~255 "zoom"
1024 ----
1025
1026 Output:
1027
1028 ----
1029 aa bb cc ff ff ff 7a 6f 6f 6d ┆ ••••••zoom
1030 ----
1031 ====
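
The number of padding bytes follows directly from the definition above. As a sketch in {py3} (the `alignment_padding()` function is a hypothetical helper, not part of the Normand API):

```python
def alignment_padding(offset, align_bits, pad_byte=0):
    """Return the padding bytes which align `offset` to `align_bits` bits."""
    if align_bits <= 0 or align_bits % 8 != 0:
        raise ValueError("Alignment must be a positive multiple of 8 bits")

    align_bytes = align_bits // 8

    # Distance from `offset` up to the next multiple of `align_bytes`
    return bytes([pad_byte]) * (-offset % align_bytes)


print(alignment_padding(29, 64, 255).hex())  # ffffff
```

In the last example, the current offset is 29 when Normand handles `@64~255`, so three `ff` bytes bring it to 32, the next multiple of 8.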
1032
1033 === Filling
1034
1035 A _filling_ represents zero or more padding bytes to make the
1036 <<cur-offset,current offset>> reach a given value.
1037
1038 A filling is:
1039
1040 . The ``pass:[+]`` prefix.
1041
1042 . One of:
1043
1044 ** A <<const-int,positive constant integer>> which is the current offset
1045 target.
1046
1047 ** The ``pass:[{]`` prefix, a valid {py3} expression of which the
1048 evaluation result type is `int` or `bool` (automatically converted to
1049 `int`), and the ``pass:[}]`` suffix.
1050 +
1051 For a filling at some source location{nbsp}__**L**__, this expression
1052 may contain:
1053 +
1054 --
1055 * The name of any <<label,label>> defined before{nbsp}__**L**__
1056 which isn't within a nested group.
1057 * The name of any <<variable-assignment,variable>> known
1058 at{nbsp}__**L**__.
1059 --
1060 +
1061 The value of the special name `ICITTE` (`int` type) in this expression
1062 is the <<cur-offset,current offset>> (before generating the padding
1063 bytes).
1064
1065 ** A valid {py3} name.
1066 +
1067 For the name `__NAME__`, this is equivalent to the
1068 `pass:[{]__NAME__pass:[}]` form above.
1069
1070 +
1071 This value must be greater than or equal to the current offset where
1072 it's used.
1073
1074 . **Optional**:
1075 +
1076 --
1077 . The ``pass:[~]`` prefix.
1078 . A <<const-int,positive constant integer>> which is the value of the
1079 byte to use as padding to reach the current offset target.
1080 --
1081 +
1082 Without this section, the padding byte value is zero.
1083
1084 ====
1085 Input:
1086
1087 ----
1088 aa bb cc dd
1089 +0x40
1090 "hello world"
1091 ----
1092
1093 Output:
1094
1095 ----
1096 aa bb cc dd 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
1097 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
1098 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
1099 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
1100 68 65 6c 6c 6f 20 77 6f 72 6c 64 ┆ hello world
1101 ----
1102 ====
1103
1104 ====
1105 Input:
1106
1107 ----
1108 !macro part(iter, fill)
1109 <0> "particular security " {ord('0') + iter : 8} +fill~0x80
1110 !end
1111
1112 {iter = 1}
1113
1114 !repeat 5
1115 m:part(iter, {32 + 4 * iter})
1116 {iter = iter + 1}
1117 !end
1118 ----
1119
1120 Output:
1121
1122 ----
1123 70 61 72 74 69 63 75 6c 61 72 20 73 65 63 75 72 ┆ particular secur
1124 69 74 79 20 31 80 80 80 80 80 80 80 80 80 80 80 ┆ ity 1•••••••••••
1125 80 80 80 80 70 61 72 74 69 63 75 6c 61 72 20 73 ┆ ••••particular s
1126 65 63 75 72 69 74 79 20 32 80 80 80 80 80 80 80 ┆ ecurity 2•••••••
1127 80 80 80 80 80 80 80 80 80 80 80 80 70 61 72 74 ┆ ••••••••••••part
1128 69 63 75 6c 61 72 20 73 65 63 75 72 69 74 79 20 ┆ icular security
1129 33 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ 3•••••••••••••••
1130 80 80 80 80 80 80 80 80 70 61 72 74 69 63 75 6c ┆ ••••••••particul
1131 61 72 20 73 65 63 75 72 69 74 79 20 34 80 80 80 ┆ ar security 4•••
1132 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ ••••••••••••••••
1133 80 80 80 80 80 80 80 80 70 61 72 74 69 63 75 6c ┆ ••••••••particul
1134 61 72 20 73 65 63 75 72 69 74 79 20 35 80 80 80 ┆ ar security 5•••
1135 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ ••••••••••••••••
1136 80 80 80 80 80 80 80 80 80 80 80 80 ┆ ••••••••••••
1137 ----
1138 ====
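
The last example can be reproduced with a short {py3} sketch. The `filling()` and `part()` functions below are hypothetical stand-ins for the filling item and for the `part` macro; they only mirror the behavior described in this section.

```python
def filling(offset, target, pad_byte=0):
    """Return the padding bytes to go from `offset` to `target`."""
    if target < offset:
        raise ValueError("Target offset is before the current offset")

    return bytes([pad_byte]) * (target - offset)


def part(iteration, fill):
    """Mimic one expansion of the `part` macro (starts at offset 0)."""
    data = b"particular security " + bytes([ord("0") + iteration])
    return data + filling(len(data), fill, 0x80)


# Same as the `!repeat 5` block: `iter` goes from 1 to 5
out = b"".join(part(i, 32 + 4 * i) for i in range(1, 6))
print(len(out))  # 220
```

Because each expansion resets the current offset with `<0>`, the filling target is relative to the beginning of each part: the parts are 36, 40, 44, 48, and 52 bytes long.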
1139
1140 === Label
1141
1142 A _label_ associates a name to the <<cur-offset,current offset>>.
1143
1144 All the labels of a whole Normand input must have unique names.
1145
1146 A label must not share its name with a
1147 <<variable-assignment,variable>>.
1148
1149 A label is:
1150
1151 . The `<` prefix.
1152
1153 . A valid {py3} name which is not `ICITTE`.
1154
1155 . The `>` suffix.
1156
1157 === Variable assignment
1158
1159 A _variable assignment_ associates a name to the integral or floating
1160 point result of an evaluated {py3} expression.
1161
1162 A variable assignment is:
1163
1164 . The ``pass:[{]`` prefix.
1165
1166 . A valid {py3} name which is not `ICITTE`.
1167
1168 . The `=` character.
1169
1170 . A valid {py3} expression of which the evaluation result type
1171 is `int`, `float`, or `bool` (automatically converted to `int`).
1172 +
1173 For a variable assignment at some source location{nbsp}__**L**__, this
1174 expression may contain:
1175 +
1176 --
1177 * The name of any <<label,label>> defined before{nbsp}__**L**__
1178 which isn't within a nested group.
1179 * The name of any <<variable-assignment,variable>> known
1180 at{nbsp}__**L**__.
1181 --
1182 +
1183 The value of the special name `ICITTE` (`int` type) in this expression
1184 is the <<cur-offset,current offset>>.
1185
1186 . The `}` suffix.
1187
1188 ====
1189 Input:
1190
1191 ----
1192 {mix = 101} {le}
1193 {meow = 42} 11 22 {meow:8} 33 {meow = ICITTE + 17}
1194 "yooo" {meow + mix : 16}
1195 ----
1196
1197 Output:
1198
1199 ----
1200 11 22 2a 33 79 6f 6f 6f 7a 00 ┆ •"*3yoooz•
1201 ----
1202 ====
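
The following {py3} sketch walks through the example above to show which value each expression sees. It's only a model: the `evaluate()` function is hypothetical, and using the built-in `eval()` with an empty set of builtins is an assumption made for illustration, not a description of how Normand evaluates expressions.

```python
def evaluate(expr, names, offset):
    """Evaluate `expr` with the known `names` and `ICITTE` set to `offset`."""
    scope = dict(names)
    scope["ICITTE"] = offset

    # Empty builtins: an illustrative restriction, not a security boundary
    return eval(expr, {"__builtins__": {}}, scope)


names = {}
names["mix"] = evaluate("101", names, 0)           # {mix = 101}
names["meow"] = evaluate("42", names, 0)           # {meow = 42}
first = evaluate("meow", names, 2)                 # {meow:8} after `11 22`
names["meow"] = evaluate("ICITTE + 17", names, 4)  # After `11 22 2a 33`
last = evaluate("meow + mix", names, 8)            # After "yooo"

print(first, names["meow"], last)  # 42 21 122
```

The final value, 122, is `7a 00` once encoded as a little-endian 16-bit integer, which matches the end of the output above.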
1203
1204 === Group
1205
1206 A _group_ is a scoped sequence of items.
1207
1208 The <<label,labels>> within a group aren't visible outside of it.
1209
1210 The main purpose of a group is to <<post-item-repetition,repeat>> more
1211 than a single item and to isolate labels.
1212
1213 A group is:
1214
1215 . The `(`, `!group`, or `!g` opening.
1216
1217 . Zero or more items.
1218
1219 . Depending on the group opening:
1220 +
1221 --
1222 `(`::
1223 The `)` closing.
1224
1225 `!group`::
1226 `!g`::
1227 The `!end` closing.
1228 --
1229
1230 ====
1231 Input:
1232
1233 ----
1234 ((aa bb cc) dd () ee) "leclerc"
1235 ----
1236
1237 Output:
1238
1239 ----
1240 aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc
1241 ----
1242 ====
1243
1244 ====
1245 Input:
1246
1247 ----
1248 !group
1249 (aa bb cc) * 3 dd ee
1250 !end * 5
1251 ----
1252
1253 Output:
1254
1255 ----
1256 aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa bb
1257 cc aa bb cc dd ee aa bb cc aa bb cc aa bb cc dd
1258 ee aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa
1259 bb cc aa bb cc dd ee
1260 ----
1261 ====
1262
1263 ====
1264 Input:
1265
1266 ----
1267 {be}
1268 (
1269 <str_beg> u16le"sébastien diaz" <str_end>
1270 {ICITTE - str_beg : 8}
1271 {(end - str_beg) * 5 : 24}
1272 ) * 3
1273 <end>
1274 ----
1275
1276 Output:
1277
1278 ----
1279 73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
1280 6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 e0 ┆ n• •d•i•a•z•••••
1281 73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
1282 6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 40 ┆ n• •d•i•a•z••••@
1283 73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
1284 6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 00 a0 ┆ n• •d•i•a•z•••••
1285 ----
1286 ====
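
The last example combines per-iteration labels (`str_beg`, `str_end`)
with the `end` label defined after the group. The following {py3} sketch
reproduces the arithmetic under the assumption that the initial offset
is 0. `build_group()` is a hypothetical helper and not part of Normand.
It takes the final value of `end` as a parameter because that value is
only known once the size of one iteration is known.

```python
def build_group(str_beg: int, end: int) -> bytes:
    # u16le"sébastien diaz"
    text = "sébastien diaz".encode("utf-16-le")

    # {ICITTE - str_beg : 8}: the offset just after the string, minus
    # the offset of the beginning of the string.
    icitte = str_beg + len(text)
    length = (icitte - str_beg).to_bytes(1, "big")

    # {(end - str_beg) * 5 : 24} with the big-endian byte order.
    scaled = ((end - str_beg) * 5).to_bytes(3, "big")
    return text + length + scaled


# The size of one iteration doesn't depend on the label values.
group_size = len(build_group(0, 0))
end = 3 * group_size

data = b"".join(build_group(i * group_size, end) for i in range(3))
print(data[28:32].hex(" "), data[60:64].hex(" "), data[92:96].hex(" "))
```

Because `str_beg` grows by one group size per iteration while `end`
stays the same, the 24-bit number shrinks from `0x1e0` to `0x140` to
`0xa0`.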
1287
1288 === Conditional block
1289
1290 A _conditional block_ represents either the bytes of zero or more items
1291 if some expression is true, or the bytes of zero or more other items if
1292 it's false.
1293
1294 A conditional block is:
1295
1296 . The `!if` opening.
1297
1298 . One of:
1299
1300 ** The ``pass:[{]`` prefix, a valid {py3} expression of which the
1301 evaluation result type is `int` or `bool` (automatically converted to
1302 `int`), and the ``pass:[}]`` suffix.
1303 +
1304 For a conditional block at some source location{nbsp}__**L**__, this
1305 expression may contain:
1306 +
1307 --
1308 * The name of any <<label,label>> defined before{nbsp}__**L**__
1309 which isn't within a nested group.
1310 * The name of any <<variable-assignment,variable>> known
1311 at{nbsp}__**L**__.
1312 --
1313 +
1314 The value of the special name `ICITTE` (`int` type) in this expression
1315 is the <<cur-offset,current offset>> (before handling the contained
1316 items).
1317
1318 ** A valid {py3} name.
1319 +
1320 For the name `__NAME__`, this is equivalent to the
1321 `pass:[{]__NAME__pass:[}]` form above.
1322
1323 . Zero or more items to be handled when the condition is true.
1324
1325 . **Optional**:
1326
1327 .. The `!else` opening.
1328 .. Zero or more items to be handled when the condition is false.
1329
1330 . The `!end` closing.
1331
1332 ====
1333 Input:
1334
----
{at = 1}
{rep_count = 9}

!repeat rep_count
  "meow "

  !if {ICITTE > 25}
    "mix"
  !else
    "zoom"
  !end

  !if {at < rep_count} 20 !end

  {at = at + 1}
!end
----
1353
1354 Output:
1355
1356 ----
1357 6d 65 6f 77 20 7a 6f 6f 6d 20 6d 65 6f 77 20 7a ┆ meow zoom meow z
1358 6f 6f 6d 20 6d 65 6f 77 20 7a 6f 6f 6d 20 6d 65 ┆ oom meow zoom me
1359 6f 77 20 6d 69 78 20 6d 65 6f 77 20 6d 69 78 20 ┆ ow mix meow mix
1360 6d 65 6f 77 20 6d 69 78 20 6d 65 6f 77 20 6d 69 ┆ meow mix meow mi
1361 78 20 6d 65 6f 77 20 6d 69 78 20 6d 65 6f 77 20 ┆ x meow mix meow
1362 6d 69 78 ┆ mix
1363 ----
1364 ====
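
To see why the first three iterations above produce `zoom` and the
remaining ones produce `mix`, it can help to trace the current offset by
hand. The {py3} loop below is such a trace, assuming an initial offset
of 0. It's a model of the expected result, not of how Normand works
internally.

```python
# Hand-written trace of the !repeat / !if example above.
out = bytearray()
rep_count = 9
at = 1

for _ in range(rep_count):
    out += b"meow "

    # !if {ICITTE > 25}: the condition sees the offset right after "meow ".
    out += b"mix" if len(out) > 25 else b"zoom"

    # !if {at < rep_count} 20 !end: no separator after the last iteration.
    if at < rep_count:
        out += b"\x20"

    at += 1

print(out.decode())
```

The offsets seen by the first condition are 5, 15, 25, 35, and so on:
25 isn't greater than 25, which is why the third iteration still yields
`zoom`.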
1365
1366 ====
1367 Input:
1368
1369 ----
1370 <str_beg>
1371 u16le"meow mix!"
1372 <str_end>
1373
1374 !if {str_end - str_beg > 10}
1375 " BIG"
1376 !end
1377 ----
1378
1379 Output:
1380
1381 ----
1382 6d 00 65 00 6f 00 77 00 20 00 6d 00 69 00 78 00 ┆ m•e•o•w• •m•i•x•
1383 21 00 20 42 49 47 ┆ !• BIG
1384 ----
1385 ====
1386
1387 === Repetition block
1388
A _repetition block_ represents the bytes of zero or more items repeated
a given number of times.
1391
1392 A repetition block is:
1393
1394 . The `!repeat` or `!r` opening.
1395
1396 . One of:
1397
** A <<const-int,positive constant integer>> which is the number of
times to repeat the contained items.
1400
1401 ** The ``pass:[{]`` prefix, a valid {py3} expression of which the
1402 evaluation result type is `int` or `bool` (automatically converted to
1403 `int`), and the ``pass:[}]`` suffix.
1404 +
1405 For a repetition block at some source location{nbsp}__**L**__, this
1406 expression may contain:
1407 +
1408 --
1409 * The name of any <<label,label>> defined before{nbsp}__**L**__
1410 which isn't within a nested group.
1411 * The name of any <<variable-assignment,variable>> known
1412 at{nbsp}__**L**__.
1413 --
1414 +
1415 The value of the special name `ICITTE` (`int` type) in this expression
1416 is the <<cur-offset,current offset>> (before handling the items to
1417 repeat).
1418
1419 ** A valid {py3} name.
1420 +
1421 For the name `__NAME__`, this is equivalent to the
1422 `pass:[{]__NAME__pass:[}]` form above.
1423
1424 . Zero or more items.
1425
1426 . The `!end` closing.
1427
1428 You may also use a <<post-item-repetition,post-item repetition>> after
1429 some items. The form ``!repeat{nbsp}__X__{nbsp}__ITEMS__{nbsp}!end``
1430 is equivalent to ``(__ITEMS__){nbsp}pass:[*]{nbsp}__X__``.
1431
1432 ====
1433 Input:
1434
1435 ----
1436 !repeat 0o400
1437 {end - ICITTE - 1 : 8}
1438 !end
1439
1440 <end>
1441 ----
1442
1443 Output:
1444
1445 ----
1446 ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
1447 ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
1448 df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
1449 cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
1450 bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
1451 af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
1452 9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
1453 8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
1454 7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
1455 6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
1456 5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
1457 4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
1458 3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
1459 2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"!
1460 1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
1461 0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
1462 ----
1463 ====
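
In the example above, each iteration generates a single byte, so the
current offset is also the iteration index. A {py3} one-liner giving
the same 256 bytes, assuming an initial offset of 0 (so that `end` is
`0o400`), could look like this:

```python
end = 0o400  # value of the <end> label: 256 bytes were generated

# {end - ICITTE - 1 : 8} evaluated at offsets 0, 1, ..., 255.
data = bytes(end - offset - 1 for offset in range(end))

print(data[:4].hex(" "), "...", data[-4:].hex(" "))
```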
1464
1465 ====
1466 Input:
1467
----
{times = 1}

aa bb cc dd

!repeat 3
  <here>

  !repeat {here + 1}
    ee ff
  !end

  11 22 !repeat times 33 !end

  {times = times + 1}
!end

"coucou!"
----
1487
1488 Output:
1489
1490 ----
1491 aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
1492 33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
1493 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1494 ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
1495 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1496 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1497 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1498 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1499 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1500 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1501 ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
1502 33 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
1503 ----
1504 ====
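
The nested repetition above is easier to follow as an explicit loop.
This {py3} sketch models it with an initial offset of 0. The names
mirror the Normand variables and labels, but the code itself is only an
illustration.

```python
# Hand-written model of the nested !repeat example above.
out = bytearray.fromhex("aa bb cc dd")
times = 1

for _outer in range(3):
    here = len(out)                        # <here>: offset of this iteration

    for _inner in range(here + 1):         # !repeat {here + 1}
        out += bytes.fromhex("ee ff")

    out += bytes.fromhex("11 22")
    out += bytes.fromhex("33") * times     # !repeat times 33 !end
    times += 1                             # {times = times + 1}

out += b"coucou!"
print(len(out))
```

The `here` label takes the values 4, 17, and 57, so the inner block
repeats 5, 18, and 58 times.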
1505
1506 === Macro definition block
1507
1508 A _macro definition block_ associates a name and parameter names to
1509 a group of items.
1510
1511 A macro definition block doesn't lead to generated bytes itself: a
1512 <<macro-expansion,macro expansion>> does so.
1513
1514 A macro definition may only exist at the root level, that is, not within
1515 a <<group,group>>, a <<repetition-block,repetition block>>, a
1516 <<conditional-block,conditional block>>, or another
1517 <<macro-definition-block,macro definition block>>.
1518
1519 All macro definitions must have unique names.
1520
1521 A macro definition is:
1522
1523 . The `!macro` or `!m` opening.
1524
1525 . A valid {py3} name (the macro name).
1526
1527 . The `(` parameter name list prefix.
1528
1529 . A comma-separated list of zero or more unique parameter names,
1530 each one being a valid {py3} name.
1531
1532 . The `)` parameter name list suffix.
1533
1534 . Zero or more items except, recursively, a macro definition block.
1535
1536 . The `!end` closing.
1537
1538 ====
1539 ----
1540 !macro bake()
1541 {le} {ICITTE * 8 : 16}
1542 u16le"predict explode"
1543 !end
1544 ----
1545 ====
1546
1547 ====
----
!macro nail(rep, with_extra, val)
  {iter = 1}

  !repeat rep
    {val + iter : uleb128}
    {0xdeadbeef : 32}
    {iter = iter + 1}
  !end

  !if with_extra
    "meow mix\0"
  !end
!end
----
1563 ====
1564
1565 === Macro expansion
1566
1567 A _macro expansion_ expands the items of a defined
1568 <<macro-definition-block,macro>>.
1569
1570 The macro to expand must be defined _before_ the expansion.
1571
1572 The <<state,state>> before handling the first item of the chosen macro
1573 is:
1574
1575 <<cur-offset,Current offset>>::
1576 Unchanged.
1577
1578 <<cur-bo,Current byte order>>::
1579 Unchanged.
1580
1581 Variables::
1582 The only available variables initially are the macro parameters.
1583
1584 Labels::
1585 None.
1586
1587 The state after having handled the last item of the chosen macro is:
1588
1589 Current offset::
1590 The one before handling the first item of the macro plus the size
1591 of the generated data of the macro expansion.
1592 +
1593 IMPORTANT: This means <<current-offset-setting,current offset setting>>
1594 items within the expanded macro don't impact the final current offset.
1595
1596 Current byte order::
1597 The one before handling the first item of the macro.
1598
1599 Variables::
1600 The ones before handling the first item of the macro.
1601
1602 Labels::
1603 The ones before handling the first item of the macro.
1604
1605 A macro expansion is:
1606
1607 . The `m:` prefix.
1608
1609 . A valid {py3} name (the name of the macro to expand).
1610
1611 . The `(` parameter value list prefix.
1612
. A comma-separated list of zero or more parameter values.
1614 +
1615 The number of parameter values must match the number of parameter
1616 names of the definition of the chosen macro.
1617 +
1618 A parameter value is one of:
1619 +
1620 --
1621 * A <<const-int,constant integer>>, possibly negative.
1622
1623 * The ``pass:[{]`` prefix, a valid {py3} expression of which the
1624 evaluation result type is `int` or `bool` (automatically converted to
1625 `int`), and the ``pass:[}]`` suffix.
1626 +
1627 For a macro expansion at some source location{nbsp}__**L**__, this
1628 expression may contain:
1629
1630 ** The name of any <<label,label>> defined before{nbsp}__**L**__
1631 which isn't within a nested group.
1632 ** The name of any <<variable-assignment,variable>> known
1633 at{nbsp}__**L**__.
1634
1635 +
1636 The value of the special name `ICITTE` (`int` type) in this expression
1637 is the <<cur-offset,current offset>> (before handling the items of the
1638 chosen macro).
1639
1640 * A valid {py3} name.
1641 +
1642 For the name `__NAME__`, this is equivalent to the
1643 `pass:[{]__NAME__pass:[}]` form above.
1644 --
1645
1646 . The `)` parameter value list suffix.
1647
1648 ====
1649 Input:
1650
1651 ----
1652 !macro bake()
1653 {le} {ICITTE * 8 : 16}
1654 u16le"predict explode"
1655 !end
1656
1657 "hello [" m:bake() "] world"
1658
1659 m:bake() * 5
1660 ----
1661
1662 Output:
1663
1664 ----
1665 68 65 6c 6c 6f 20 5b 38 00 70 00 72 00 65 00 64 ┆ hello [8•p•r•e•d
1666 00 69 00 63 00 74 00 20 00 65 00 78 00 70 00 6c ┆ •i•c•t• •e•x•p•l
1667 00 6f 00 64 00 65 00 5d 20 77 6f 72 6c 64 70 01 ┆ •o•d•e•] worldp•
1668 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1669 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 02 ┆ e•x•p•l•o•d•e•p•
1670 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1671 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 03 ┆ e•x•p•l•o•d•e•p•
1672 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1673 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 04 ┆ e•x•p•l•o•d•e•p•
1674 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1675 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 05 ┆ e•x•p•l•o•d•e•p•
1676 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1677 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 ┆ e•x•p•l•o•d•e•
1678 ----
1679 ====
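
The example above shows that `ICITTE` within an expanded macro is the
current offset of the expansion site. The following {py3} sketch models
the `bake` macro as a function of that offset. `bake()` is a stand-in
written for this illustration, with an initial offset of 0 assumed.

```python
def bake(offset: int) -> bytes:
    # {le} {ICITTE * 8 : 16}
    header = (offset * 8).to_bytes(2, "little")

    # u16le"predict explode"
    return header + "predict explode".encode("utf-16-le")


out = bytearray(b"hello [")
out += bake(len(out))            # m:bake()
out += b"] world"

for _ in range(5):               # m:bake() * 5
    out += bake(len(out))

print(out[7:9].hex(" "), out[46:48].hex(" "), len(out))
```

Every expansion is 32 bytes long, so the five repeated expansions start
at offsets 46, 78, 110, 142, and 174.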
1680
1681 ====
1682 Input:
1683
----
!macro A(val, is_be)
  {le}

  !if is_be
    {be}
  !end

  {val : 16}
!end

!macro B(rep, is_be)
  {iter = 1}

  !repeat rep
    m:A({iter * 3}, is_be)
    {iter = iter + 1}
  !end
!end

m:B(5, 1)
m:B(3, 0)
----
1707
1708 Output:
1709
1710 ----
1711 00 03 00 06 00 09 00 0c 00 0f 03 00 06 00 09 00
1712 ----
1713 ====
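
Macros which expand other macros behave much like nested function
calls. The two {py3} functions below mirror macros `A` and `B` of the
example above. `macro_a()` and `macro_b()` are hypothetical names, and
the sketch only reproduces the expected bytes.

```python
def macro_a(val: int, is_be: int) -> bytes:
    # {le}, then {be} if `is_be` is true, then {val : 16}.
    return val.to_bytes(2, "big" if is_be else "little")


def macro_b(rep: int, is_be: int) -> bytes:
    out = bytearray()
    it = 1                                 # {iter = 1}

    for _ in range(rep):                   # !repeat rep
        out += macro_a(it * 3, is_be)      # m:A({iter * 3}, is_be)
        it += 1                            # {iter = iter + 1}

    return bytes(out)


data = macro_b(5, 1) + macro_b(3, 0)       # m:B(5, 1) m:B(3, 0)
print(data.hex(" "))
```

The local `it` counter restarts at 1 on each call, just as the `iter`
variable of an expansion doesn't survive the end of that expansion.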
1714
1715 === Post-item repetition
1716
1717 A _post-item repetition_ represents the bytes of an item repeated a
1718 given number of times.
1719
1720 A post-item repetition is:
1721
1722 . One of those items:
1723
1724 ** A <<byte-constant,byte constant>>.
1725 ** A <<literal-string,literal string>>.
1726 ** A <<fixed-length-number,fixed-length number>>.
1727 ** An <<leb128-integer,LEB128 integer>>.
** A <<macro-expansion,macro expansion>>.
1729 ** A <<group,group>>.
1730
1731 . The ``pass:[*]`` character.
1732
1733 . One of:
1734
1735 ** A positive integer (hexadecimal starting with `0x` or `0X` accepted)
1736 which is the number of times to repeat the previous item.
1737
1738 ** The ``pass:[{]`` prefix, a valid {py3} expression of which the
1739 evaluation result type is `int` or `bool` (automatically converted to
1740 `int`), and the ``pass:[}]`` suffix.
1741 +
1742 For a post-item repetition at some source location{nbsp}__**L**__, this
1743 expression may contain:
1744 +
1745 --
* The name of any <<label,label>> defined before{nbsp}__**L**__
which isn't within a nested group and
which isn't part of the repeated item.
* The name of any <<variable-assignment,variable>> known
at{nbsp}__**L**__ which isn't part of the repeated item.
1752 --
1753 +
1754 The value of the special name `ICITTE` (`int` type) in this expression
1755 is the <<cur-offset,current offset>> (before handling the items to
1756 repeat).
1757
1758 ** A valid {py3} name.
1759 +
1760 For the name `__NAME__`, this is equivalent to the
1761 `pass:[{]__NAME__pass:[}]` form above.
1762
1763 You may also use a <<repetition-block,repetition block>>. The form
1764 ``__ITEM__{nbsp}pass:[*]{nbsp}__X__`` is equivalent to
1765 ``!repeat{nbsp}__X__{nbsp}__ITEM__{nbsp}!end``.
1766
1767 ====
1768 Input:
1769
1770 ----
1771 {end - ICITTE - 1 : 8} * 0x100 <end>
1772 ----
1773
1774 Output:
1775
1776 ----
1777 ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
1778 ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
1779 df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
1780 cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
1781 bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
1782 af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
1783 9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
1784 8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
1785 7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
1786 6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
1787 5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
1788 4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
1789 3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
1790 2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"!
1791 1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
1792 0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
1793 ----
1794 ====
1795
1796 ====
1797 Input:
1798
1799 ----
1800 {times = 1}
1801 aa bb cc dd
1802 (
1803 <here>
1804 (ee ff) * {here + 1}
1805 11 22 33 * {times}
1806 {times = times + 1}
1807 ) * 3
1808 "coucou!"
1809 ----
1810
1811 Output:
1812
1813 ----
1814 aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
1815 33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
1816 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1817 ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
1818 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1819 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1820 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1821 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1822 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1823 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1824 ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
1825 33 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
1826 ----
1827 ====
1828
1829 == Command-line tool
1830
1831 If you <<install-normand,installed>> the `normand` package, then you
1832 can use the `normand` command-line tool:
1833
1834 ----
1835 $ normand <<< '"ma gang de malades"' | hexdump -C
1836 ----
1837
1838 ----
1839 00000000 6d 61 20 67 61 6e 67 20 64 65 20 6d 61 6c 61 64 |ma gang de malad|
1840 00000010 65 73 |es|
1841 ----
1842
1843 If you copy the `normand.py` module to your own project, then you can
1844 run the module itself:
1845
1846 ----
1847 $ python3 -m normand <<< '"ma gang de malades"' | hexdump -C
1848 ----
1849
1850 ----
1851 00000000 6d 61 20 67 61 6e 67 20 64 65 20 6d 61 6c 61 64 |ma gang de malad|
1852 00000010 65 73 |es|
1853 ----
1854
1855 Without a path argument, the `normand` tool reads from the standard
1856 input.
1857
1858 The `normand` tool prints the generated binary data to the standard
1859 output.
1860
1861 Various options control the initial <<state,state>> of the processor:
1862 use the `--help` option to learn more.
1863
1864 == {py3} API
1865
1866 The whole `normand` package/module public API is:
1867
[source,python]
----
import enum
import typing


# Byte order.
class ByteOrder(enum.Enum):
    # Big endian.
    BE = ...

    # Little endian.
    LE = ...


# Text location.
class TextLocation:
    # Line number.
    @property
    def line_no(self) -> int:
        ...

    # Column number.
    @property
    def col_no(self) -> int:
        ...


# Parsing error message.
class ParseErrorMessage:
    # Message text.
    @property
    def text(self):
        ...

    # Source text location.
    @property
    def text_location(self):
        ...


# Parsing error.
class ParseError(RuntimeError):
    # Parsing error messages.
    #
    # The first message is the most _specific_ one.
    @property
    def messages(self):
        ...


# Variables dictionary type (for type hints).
VariablesT = typing.Dict[str, typing.Union[int, float]]


# Labels dictionary type (for type hints).
LabelsT = typing.Dict[str, int]


# Parsing result.
class ParseResult:
    # Generated data.
    @property
    def data(self) -> bytearray:
        ...

    # Updated variable values.
    @property
    def variables(self) -> VariablesT:
        ...

    # Updated main group label values.
    @property
    def labels(self) -> LabelsT:
        ...

    # Final offset.
    @property
    def offset(self) -> int:
        ...

    # Final byte order.
    @property
    def byte_order(self) -> typing.Optional[ByteOrder]:
        ...


# Parses the `normand` input using the initial state defined by
# `init_variables`, `init_labels`, `init_offset`, and `init_byte_order`,
# and returns the corresponding parsing result.
def parse(normand: str,
          init_variables: typing.Optional[VariablesT] = None,
          init_labels: typing.Optional[LabelsT] = None,
          init_offset: int = 0,
          init_byte_order: typing.Optional[ByteOrder] = None) -> ParseResult:
    ...
----
1961
1962 The `normand` parameter is the actual <<learn-normand,Normand input>>
1963 while the other parameters control the initial <<state,state>>.
1964
1965 The `parse()` function raises a `ParseError` instance should it fail to
1966 parse the `normand` string for any reason.
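
A minimal usage sketch follows. It relies only on the names listed
above (`parse()`, `ParseError`, `messages`, `text`, `text_location`,
`line_no`, `col_no`, `data`). The `generate()` and `format_messages()`
helpers are hypothetical and written for this illustration. The
`normand` import sits inside `generate()` so that the snippet still
loads where the package isn't installed.

```python
import typing


def format_messages(messages: typing.Iterable[typing.Any]) -> typing.List[str]:
    # Each message is expected to offer `text` and `text_location`
    # (with `line_no` and `col_no`), like `ParseErrorMessage` above.
    return [
        "{}:{} - {}".format(
            msg.text_location.line_no, msg.text_location.col_no, msg.text
        )
        for msg in messages
    ]


def generate(text: str, offset: int = 0) -> bytearray:
    # Deferred import: an assumption of this sketch, not a requirement.
    import normand

    try:
        return normand.parse(text, init_offset=offset).data
    except normand.ParseError as exc:
        # Report every message, the most specific one first.
        raise SystemExit("\n".join(format_messages(exc.messages)))
```

With the package available, `generate('"lol" * 10 0a')` would return
the generated bytes as a `bytearray`.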
1967
1968 == Development
1969
1970 Normand is a https://python-poetry.org/[Poetry] project.
1971
1972 To develop it, install it through Poetry and enter the virtual
1973 environment:
1974
1975 ----
1976 $ poetry install
1977 $ poetry shell
1978 $ normand <<< '"lol" * 10 0a'
1979 ----
1980
1981 `normand.py` is processed by:
1982
1983 * https://microsoft.github.io/pyright/[Pyright]
1984 * https://github.com/psf/black[Black]
1985 * https://pycqa.github.io/isort/[isort]
1986
1987 === Testing
1988
1989 Use https://docs.pytest.org/[pytest] to test Normand once the package is
1990 part of your virtual environment, for example:
1991
1992 ----
1993 $ poetry install
1994 $ poetry run pip3 install pytest
1995 $ poetry run pytest
1996 ----
1997
The `pytest` project is currently not a development dependency in
`pyproject.toml` due to backward compatibility issues with
Python{nbsp}3.4.
2001
2002 In the `tests` directory, each `*.nt` file is a test. The file name
2003 prefix indicates what it's meant to test:
2004
2005 `pass-`::
2006 Everything above the `---` line is the valid Normand input
2007 to test.
2008 +
2009 Everything below the `---` line is the expected data
2010 (whitespace-separated hexadecimal bytes).
2011
2012 `fail-`::
2013 Everything above the `---` line is the invalid Normand input
2014 to test.
2015 +
2016 Everything below the `---` line is the expected error message having
2017 this form:
2018 +
2019 ----
2020 LINE:COL - MESSAGE
2021 ----
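
To make the test file layout described above concrete, here's a small
{py3} sketch which splits the content of a `pass-` test file and decodes
the expected bytes. Both function names are hypothetical, and the sketch
assumes that the first line containing only `---` is the separator. The
actual test runner of the project may proceed differently.

```python
import typing


def split_test_file(content: str) -> typing.Tuple[str, str]:
    # Everything above the `---` line is the Normand input;
    # everything below is the expectation.
    lines = content.splitlines()
    sep = lines.index("---")
    return "\n".join(lines[:sep]), "\n".join(lines[sep + 1:])


def expected_bytes(expectation: str) -> bytes:
    # Whitespace-separated hexadecimal bytes.
    return bytes(int(token, 16) for token in expectation.split())


source, expectation = split_test_file('"ab" 0a\n---\n61 62\n0a\n')
print(repr(source), expected_bytes(expectation))
```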
2022
2023 === Contributing
2024
2025 Normand uses https://review.lttng.org/admin/repos/normand,general[Gerrit]
2026 for code review.
2027
2028 To report a bug, https://github.com/efficios/normand/issues/new[create a
2029 GitHub issue].