1 // Show ToC at a specific location for a GitHub rendering
2 ifdef::env-github[]
3 :toc: macro
4 endif::env-github[]
5
6 ifndef::env-github[]
7 :toc: left
8 endif::env-github[]
9
10 // This is to mimic what GitHub does so that anchors work in an offline
11 // rendering too.
12 :idprefix:
13 :idseparator: -
14
15 // Other attributes
16 :py3: Python{nbsp}3
17
18 = Normand
19 Philippe Proulx
20
21 image::normand-logo.png[]
22
23 [.normal]
24 image:https://img.shields.io/pypi/v/normand.svg?label=Latest%20version[link="https://pypi.python.org/pypi/normand"]
25
26 [.lead]
27 _**Normand**_ is a text-to-binary processor with its own language.
28
29 This package offers both a portable {py3} module and a command-line
30 tool.
31
32 WARNING: This version of Normand is 0.15, meaning both the Normand
33 language and the module/CLI interface aren't stable.
34
35 ifdef::env-github[]
36 // ToC location for a GitHub rendering
37 toc::[]
38 endif::env-github[]
39
40 == Introduction
41
42 The purpose of Normand is to consume human-readable text representing
43 bytes and to produce the corresponding binary data.
44
45 .Simple bytes input.
46 ====
47 Consider the following Normand input:
48
49 ----
50 4f 55 32 bb $167 fe %10100111 a9 $-32
51 ----
52
53 The generated nine bytes are:
54
55 ----
56 4f 55 32 bb a7 fe a7 a9 e0
57 ----
58 ====
59
As the previous example shows, the fundamental unit of the Normand
language is the _byte_. The order in which you list bytes is the
order of the generated data.
63
64 The Normand language is more than simple lists of bytes, though. Its
65 main features are:
66
67 Comments, including a bunch of insignificant symbols which may improve readability::
68 +
69 Input:
70 +
71 ----
72 ff bb %1101:0010 # This is a comment
73 78 29 af $192 # This too # 99 $-80
74 fe80::6257:18ff:fea3:4229
75 60:57:18:a3:42:29
76 10839636-5d65-4a68-8e6a-21608ddf7258
77 ----
78 +
79 Output:
80 +
81 ----
82 ff bb d2 78 29 af c0 99 b0 fe 80 62 57 18 ff fe
83 a3 42 29 60 57 18 a3 42 29 10 83 96 36 5d 65 4a
84 68 8e 6a 21 60 8d df 72 58
85 ----
86
87 Hexadecimal, decimal, and binary byte constants::
88 +
89 Input:
90 +
91 ----
92 aa bb $247 $-89 %0011_0010 %11.01= 10/10
93 ----
94 +
95 Output:
96 +
97 ----
98 aa bb f7 a7 32 da
99 ----
100
101 UTF-8, UTF-16, and UTF-32 literal strings::
102 +
103 Input:
104 +
105 ----
106 "hello world!" 00
107 u16le"stress\nverdict 🤣"
108 ----
109 +
110 Output:
111 +
112 ----
113 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 73 00 74 ┆ hello world!•s•t
114 00 72 00 65 00 73 00 73 00 0a 00 76 00 65 00 72 ┆ •r•e•s•s•••v•e•r
115 00 64 00 69 00 63 00 74 00 20 00 3e d8 23 dd ┆ •d•i•c•t• •>•#•
116 ----
117
118 Labels: special variables holding the offset where they're defined::
119 +
120 ----
121 <beg> b2 52 e3 bc 91 05
122 $100 $50 <chair> 33 9f fe
123 25 e9 89 8a <end>
124 ----
125
126 Variables::
127 +
128 ----
129 5e 65 {tower = 47} c6 7f f2 c4
130 44 {hurl = tower - 14} b5 {tower = hurl} 26 2d
131 ----
132 +
133 The value of a variable assignment is the evaluation of a valid {py3}
134 expression which may include label and variable names.
135
136 Fixed-length number with a given length (8{nbsp}bits to 64{nbsp}bits) and byte order::
137 +
138 Input:
139 +
140 ----
141 {strength = 4}
142 {be} 67 <lbl> 44 $178 {(end - lbl) * 8 + strength : 16} $99 <end>
143 {le} {-1993 : 32}
144 {-3.141593 : 64}
145 ----
146 +
147 Output:
148 +
149 ----
150 67 44 b2 00 2c 63 37 f8 ff ff 7f bd c2 82 fb 21
151 09 c0
152 ----
153 +
154 The encoded number is the evaluation of a valid {py3} expression which
155 may include label and variable names.
156
157 https://en.wikipedia.org/wiki/LEB128[LEB128] integer::
158 +
159 Input:
160 +
161 ----
162 aa bb cc {-1993 : sleb128} <meow> dd ee ff
163 {meow * 199 : uleb128}
164 ----
165 +
166 Output:
167 +
168 ----
169 aa bb cc b7 70 dd ee ff e3 07
170 ----
171 +
172 The encoded integer is the evaluation of a valid {py3} expression which
173 may include label and variable names.
174
175 Conditional::
176 +
177 Input:
178 +
179 ----
180 aa bb cc
181
182 (
183 "foo"
184
185 !if {ICITTE > 10}
186 "bar"
187 !else
188 "fight"
189 !end
190 ) * 4
191 ----
192 +
193 Output:
194 +
195 ----
aa bb cc 66 6f 6f 66 69 67 68 74 66 6f 6f 62 61 ┆ •••foofightfooba
72 66 6f 6f 62 61 72 66 6f 6f 62 61 72 ┆ rfoobarfoobar
198 ----
199
200 Repetition::
201 +
202 Input:
203 +
204 ----
205 aa bb * 5 cc <zoom> "yeah\0" * {zoom * 3}
206
207 !repeat 3
208 ff ee "juice"
209 !end
210 ----
211 +
212 Output:
213 +
214 ----
215 aa bb bb bb bb bb cc 79 65 61 68 00 79 65 61 68 ┆ •••••••yeah•yeah
216 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
217 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 ┆ yeah•yeah•yeah•y
218 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 ┆ eah•yeah•yeah•ye
219 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 ┆ ah•yeah•yeah•yea
220 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 ┆ h•yeah•yeah•yeah
221 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
222 ff ee 6a 75 69 63 65 ff ee 6a 75 69 63 65 ff ee ┆ ••juice••juice••
223 6a 75 69 63 65 ┆ juice
224 ----
225
226 Alignment::
227 +
228 Input:
229 +
230 ----
231 {be}
232
233 {199:32}
234 @64 {43:64}
235 @16 {-123:16}
236 @32~255 {5584:32}
237 ----
238 +
239 Output:
240 +
241 ----
242 00 00 00 c7 00 00 00 00 00 00 00 00 00 00 00 2b
243 ff 85 ff ff 00 00 15 d0
244 ----
245
246 Filling::
247 +
248 Input:
249 +
250 ----
251 {le}
252 {0xdeadbeef:32}
253 {-1993:16}
254 {9:16}
255 +0x40
256 {ICITTE:8}
257 "meow mix"
258 +200~FFh
259 {ICITTE:8}
260 ----
261 +
262 Output:
263 +
264 ----
265 ef be ad de 37 f8 09 00 00 00 00 00 00 00 00 00 ┆ ••••7•••••••••••
266 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
267 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
268 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
269 40 6d 65 6f 77 20 6d 69 78 ff ff ff ff ff ff ff ┆ @meow mix•••••••
270 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
271 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
272 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
273 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
274 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
275 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
276 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ ••••••••••••••••
277 ff ff ff ff ff ff ff ff c8 ┆ •••••••••
278 ----
279
280 Multilevel grouping::
281 +
282 Input:
283 +
284 ----
285 ff ((aa bb "zoom" cc) * 5) * 3 $-34 * 4
286 ----
287 +
288 Output:
289 +
290 ----
291 ff aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa ┆ •••zoom•••zoom••
292 bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a ┆ •zoom•••zoom•••z
293 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f ┆ oom•••zoom•••zoo
294 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc ┆ m•••zoom•••zoom•
295 aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb ┆ ••zoom•••zoom•••
296 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f ┆ zoom•••zoom•••zo
297 6f 6d cc aa bb 7a 6f 6f 6d cc de de de de ┆ om•••zoom•••••
298 ----
299
300 Macros::
301 +
302 Input:
303 +
304 ----
305 !macro hello(world)
306 "hello"
307 !if world " world" !end
308 !end
309
310 !repeat 17
311 ff ff ff ff
312 m:hello({ICITTE > 15 and ICITTE < 60})
313 !end
314 ----
315 +
316 Output:
317 +
318 ----
319 ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c ┆ ••••hello••••hel
320 6c 6f ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c ┆ lo••••hello worl
321 64 ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c 64 ┆ d••••hello world
322 ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c 64 ff ┆ ••••hello world•
323 ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c ┆ •••hello••••hell
324 6f ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 ┆ o••••hello••••he
325 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ff ff ff ff ┆ llo••••hello••••
326 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ff ff ┆ hello••••hello••
327 ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ┆ ••hello••••hello
328 ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c ┆ ••••hello••••hel
329 6c 6f ff ff ff ff 68 65 6c 6c 6f ┆ lo••••hello
330 ----
331
332 Precise error reporting::
333 +
334 ----
335 /tmp/meow.normand:10:24 - Expecting a bit (`0` or `1`).
336 ----
337 +
338 ----
339 /tmp/meow.normand:32:6 - Unexpected character `k`.
340 ----
341 +
342 ----
343 /tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`ICITTE`, `mix`, `zoom`}.
344 ----
345 +
346 ----
347 /tmp/meow.normand:32:19 - While expanding the macro `meow`:
348 /tmp/meow.normand:35:5 - While expanding the macro `zzz`:
349 /tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE`.
350 ----
351
You can use Normand to track data source files in your favorite VCS
instead of raw binary files. The binary files that Normand generates are
useful to test file format decoders, including against malformed data,
as well as for education.
356
357 See <<learn-normand>> to explore all the Normand features.
358
359 == Install Normand
360
361 Normand requires Python ≥ 3.4.
362
363 To install Normand:
364
365 ----
366 $ python3 -m pip install --user normand
367 ----
368
369 See
370 https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-to-the-user-site[Installing to the User Site]
371 to learn more about a user site installation.
372
373 [NOTE]
374 ====
375 Normand has a single module file, `normand.py`, which you can copy as is
376 to your project to use it (both the <<python3-api,`normand.parse()`>>
377 function and the <<command-line-tool,command-line tool>>).
378
379 `normand.py` has _no external dependencies_, but if you're using
380 Python{nbsp}3.4, you'll need a local copy of the standard `typing`
381 module.
382 ====
383
384 == Design goals
385
386 The design goals of Normand are:
387
388 Portability::
389 We're making sure `normand.py` works with Python{nbsp}≥{nbsp}3.4 and
390 doesn't have any external dependencies so that you may just copy the
391 module as is to your own project.
392
393 Ease of use::
394 The most basic Normand input is a sequence of hexadecimal constants
395 (for example, `4e6f726d616e64`) which produce exactly what you'd
396 expect.
397 +
398 Most Normand features map to programming language concepts you already
399 know and understand: constant integers, literal strings, variables,
400 conditionals, repetitions/loops, and the rest.
401
402 Concise and readable input::
403 We could have chosen XML or YAML as the input format, but having a
404 DSL here makes a Normand input compact and easy to read, two
405 important traits when using Normand to write tests, for example.
406 +
407 Compare the following Normand input and some hypothetical XML
408 equivalent, for example:
409 +
.Actual Normand input.
411 ----
412 ff dd 01 ab $192 $-128 %1101:0011
413
414 {end:8}
415
416 {iter = 1}
417
418 !if {not something}
419 # five times because xyz
420 !repeat 5
421 "hello world " {iter:8}
422 {iter = iter + 1}
423 !end
424 !end
425
426 <end>
427 ----
428 +
429 .Hypothetical Normand XML input.
430 [source,xml]
431 ----
432 <?xml version="1.0" encoding="utf-8" ?>
433 <group>
434 <byte base="x" val="ff" />
435 <byte base="x" val="dd" />
436 <byte base="x" val="1" />
437 <byte base="x" val="ab" />
438 <byte base="d" val="192" />
439 <byte base="d" val="-128" />
440 <byte base="b" val="11010011" />
441 <fixed-len-num expr="end" len="8" />
442 <var-assign name="iter" expr="1" />
443 <cond expr="not something">
444 <!-- five times because xyz -->
445 <repeat expr="5">
446 <str>hello world </str>
447 <fixed-len-num expr="iter" len="8" />
448 <var-assign name="iter" expr="iter + 1" />
449 </repeat>
450 </cond>
451 <label name="end" />
452 </group>
453 ----
454
455 == Learn Normand
456
457 A Normand text input is a sequence of items which represent a sequence
458 of raw bytes.
459
460 [[state]] During the processing of items to data, Normand relies on a
461 current state:
462
463 [%header%autowidth]
464 |===
465 |State variable |Description |Initial value: <<python3-api,{py3} API>> |Initial value: <<command-line-tool,CLI>>
466
467 |[[cur-offset]] Current offset
468 |
469 The current offset has an effect on the value of <<label,labels>> and of
470 the special `ICITTE` name in <<fixed-length-number,fixed-length
number>>, <<leb128-integer,LEB128 integer>>,
472 <<filling,filling>>, <<variable-assignment,variable assignment>>,
473 <<conditional-block,conditional block>>, <<repetition-block,repetition
474 block>>, <<macro-expansion,macro expansion>>, and
475 <<post-item-repetition,post-item repetition>> expression evaluation.
476
477 Each generated byte increments the current offset.
478
479 A <<current-offset-setting,current offset setting>> may change the
480 current offset without generating data.
481
A <<current-offset-alignment,current offset alignment>> generates
483 padding bytes to make the current offset satisfy a given alignment.
484 |`init_offset` parameter of the `parse()` function.
485 |`--offset` option.
486
487 |[[cur-bo]] Current byte order
488 |
489 The current byte order has an effect on the encoding of
490 <<fixed-length-number,fixed-length numbers>>.
491
492 A <<current-byte-order-setting,current byte order setting>> may change
493 the current byte order.
494 |`init_byte_order` parameter of the `parse()` function.
495 |`--byte-order` option.
496
497 |<<label,Labels>>
498 |Mapping of label names to integral values.
499 |`init_labels` parameter of the `parse()` function.
500 |One or more `--label` options.
501
502 |<<variable-assignment,Variables>>
503 |Mapping of variable names to integral or floating point number values.
504 |`init_variables` parameter of the `parse()` function.
505 |One or more `--var` options.
506 |===
507
508 The available items are:
509
510 * A <<byte-constant,constant integer>> representing a single byte.
511
512 * A <<literal-string,literal string>> representing a sequence of bytes
513 encoding UTF-8, UTF-16, or UTF-32 data.
514
515 * A <<current-byte-order-setting,current byte order setting>> (big or
516 little endian).
517
518 * A <<fixed-length-number,fixed-length number>> (integer or
519 floating point) using the <<cur-bo,current byte order>> and of which
520 the value is the result of a {py3} expression.
521
522 * An <<leb128-integer,LEB128 integer>> of which the value is the result
523 of a {py3} expression.
524
525 * A <<current-offset-setting,current offset setting>>.
526
527 * A <<current-offset-alignment,current offset alignment>>.
528
529 * A <<filling,filling>>.
530
531 * A <<label,label>>, that is, a named constant holding the current
532 offset.
533 +
534 This is similar to an assembly label.
535
* A <<variable-assignment,variable assignment>> associating a name to
  the integral or floating point result of an evaluated {py3} expression.
538
539 * A <<group,group>>, that is, a scoped sequence of items.
540
541 * A <<conditional-block,conditional block>>.
542
543 * A <<repetition-block,repetition block>>.
544
545 * A <<macro-definition-block,macro definition block>>.
546
547 * A <<macro-expansion,macro expansion>>.
548
549 Moreover, you can repeat many items above a constant or variable number
550 of times with the ``pass:[*]`` operator _after_ the item to repeat. This
551 is called a <<post-item-repetition,post-item repetition>>.
552
553 A Normand comment may exist:
554
555 * Between items, possibly within a group.
556 * Between the nibbles of a constant hexadecimal byte.
557 * Between the bits of a constant binary byte.
558 * Between the last item and the ``pass:[*]`` character of a post-item
559 repetition, and between that ``pass:[*]`` character and the following
560 number or expression.
561 * Between the ``!repeat``/``!r`` block opening and the following
562 constant integer, name, or expression of a repetition block.
563 * Between the ``!if`` block opening and the following name or expression
564 of a conditional block.
565
566 A comment is anything between two ``pass:[#]`` characters on the same
line, or from ``pass:[#]`` until the end of the line. Whitespace and
the following symbol characters are also considered comments where a
comment may exist:
570
571 ----
572 / \ ? & : ; . , [ ] _ = | -
573 ----
574
575 The latter serve to improve readability so that you may write, for
576 example, a MAC address or a UUID as is.
577
[[const-int]] Many items require a _constant integer_, which may start
with `-` when it's negative. A positive constant integer is any of:
581
582 Decimal::
One or more digits (`0` to `9`).
584
585 Hexadecimal::
586 One of:
587 +
588 * The `0x` or `0X` prefix followed with one or more hexadecimal digits
589 (`0` to `9`, `a` to `f`, or `A` to `F`).
590 * One or more hexadecimal digits followed with the `h` or `H` suffix.
591
592 Octal::
593 One of:
594 +
595 * The `0o` or `0O` prefix followed with one or more octal digits
596 (`0` to `7`).
597 * One or more octal digits followed with the `o`, `O`, `q`, or `Q`
598 suffix.
599
600 Binary::
601 One of:
602 +
603 * The `0b` or `0B` prefix followed with one or more bits (`0` or `1`).
604 * One or more bits followed with the `b` or `B` suffix.
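
As an illustration only, the following {py3} sketch recognizes the
constant integer forms listed above. It isn't Normand's parser:
`parse_const_int()` and the `_FORMS` table are hypothetical names, and
the way the sketch resolves corner cases (for example, it reads `0b` as
the bit `0` with the `b` suffix) is an assumption.

```python
import re

# Hypothetical table: one (pattern, base) pair per documented form.
_FORMS = [
    (re.compile(r"0[xX]([0-9a-fA-F]+)"), 16),  # `0x`/`0X` prefix
    (re.compile(r"([0-9a-fA-F]+)[hH]"), 16),   # `h`/`H` suffix
    (re.compile(r"0[oO]([0-7]+)"), 8),         # `0o`/`0O` prefix
    (re.compile(r"([0-7]+)[oOqQ]"), 8),        # `o`/`O`/`q`/`Q` suffix
    (re.compile(r"0[bB]([01]+)"), 2),          # `0b`/`0B` prefix
    (re.compile(r"([01]+)[bB]"), 2),           # `b`/`B` suffix
    (re.compile(r"([0-9]+)"), 10),             # decimal
]


def parse_const_int(text):
    """Return the value of a constant integer, possibly starting with `-`."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text

    for pattern, base in _FORMS:
        match = pattern.fullmatch(digits)

        if match:
            value = int(match.group(1), base)
            return -value if negative else value

    raise ValueError("not a constant integer: {!r}".format(text))


print(parse_const_int("0x40"), parse_const_int("FFh"), parse_const_int("1101b"))
# 64 255 13
```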
605
606 You can test the examples of this section with the `normand`
607 <<command-line-tool,command-line tool>> as such:
608
609 ----
610 $ normand file | hexdump -C
611 ----
612
613 where `file` is the name of a file containing the Normand input.
614
615 === Byte constant
616
617 A _byte constant_ represents a single byte.
618
619 A byte constant is:
620
621 Hexadecimal form::
622 Two consecutive hexadecimal digits.
623
624 Decimal form::
One or more decimal digits, optionally preceded by `-`, after the `$` prefix.
626
627 Binary form::
628 Eight bits after the `%` prefix.
629
630 ====
631 Input:
632
633 ----
634 ab cd [3d 8F] CC
635 ----
636
637 Output:
638
639 ----
640 ab cd 3d 8f cc
641 ----
642 ====
643
644 ====
645 Input:
646
647 ----
648 $192 %1100/0011 $ -77
649 ----
650
651 Output:
652
653 ----
654 c0 c3 b3
655 ----
656 ====
657
658 ====
659 Input:
660
661 ----
662 58f64689-6316-4d55-8a1a-04cada366172
663 fe80::6257:18ff:fea3:4229
664 ----
665
666 Output:
667
668 ----
669 58 f6 46 89 63 16 4d 55 8a 1a 04 ca da 36 61 72 ┆ X•F•c•MU•••••6ar
670 fe 80 62 57 18 ff fe a3 42 29 ┆ ••bW••••B)
671 ----
672 ====
673
674 ====
675 Input:
676
677 ----
678 %01110011 %01100001 %01101100 %01110101 %01110100
679 ----
680
681 Output:
682
683 ----
684 73 61 6c 75 74 ┆ salut
685 ----
686 ====
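
To see why `$-77` becomes `b3`, here's a minimal {py3} sketch of the
decimal and binary forms. The helper names are hypothetical, and the
accepted decimal range (-128 to 255) is an assumption, not something
this document specifies.

```python
def byte_from_decimal(value):
    """Return the byte for a `$` constant (assumed range: -128 to 255)."""
    if not -128 <= value <= 255:
        raise ValueError("{} is outside the 8-bit range".format(value))

    # Keeping the low 8 bits turns a negative value into its two's complement.
    return value & 0xFF


def byte_from_bits(bits):
    """Return the byte for a `%` constant given its eight bits as text."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expecting exactly eight bits")

    return int(bits, 2)


# Same values as `$192 %1100/0011 $ -77` above
data = bytes([
    byte_from_decimal(192),
    byte_from_bits("11000011"),
    byte_from_decimal(-77),
])
print(data.hex())  # c0c3b3
```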
687
688 === Literal string
689
690 A _literal string_ represents the UTF-8-, UTF-16-, or UTF-32-encoded
691 bytes of a string.
692
693 The string to encode isn't implicitly null-terminated: use `\0` at the
694 end of the string to add a null character.
695
696 A literal string is:
697
698 . **Optional**: one of the following encodings instead of UTF-8:
699 +
700 --
701 [horizontal]
702 `u16be`:: UTF-16BE.
703 `u16le`:: UTF-16LE.
704 `u32be`:: UTF-32BE.
705 `u32le`:: UTF-32LE.
706 --
707
708 . The ``pass:["]`` prefix.
709
710 . A sequence of zero or more characters, possibly containing escape
711 sequences.
712 +
713 An escape sequence is the ``\`` character followed by one of:
714 +
715 --
716 [horizontal]
717 `0`:: Null (U+0000)
718 `a`:: Alert (U+0007)
719 `b`:: Backspace (U+0008)
720 `e`:: Escape (U+001B)
721 `f`:: Form feed (U+000C)
722 `n`:: End of line (U+000A)
723 `r`:: Carriage return (U+000D)
724 `t`:: Character tabulation (U+0009)
725 `v`:: Line tabulation (U+000B)
726 ``\``:: Reverse solidus (U+005C)
727 ``pass:["]``:: Quotation mark (U+0022)
728 --
729
730 . The ``pass:["]`` suffix.
731
732 ====
733 Input:
734
735 ----
736 "coucou tout le monde!"
737 ----
738
739 Output:
740
741 ----
742 63 6f 75 63 6f 75 20 74 6f 75 74 20 6c 65 20 6d ┆ coucou tout le m
743 6f 6e 64 65 21 ┆ onde!
744 ----
745 ====
746
747 ====
748 Input:
749
750 ----
751 u16le"I am not young enough to know everything."
752 ----
753
754 Output:
755
756 ----
757 49 00 20 00 61 00 6d 00 20 00 6e 00 6f 00 74 00 ┆ I• •a•m• •n•o•t•
758 20 00 79 00 6f 00 75 00 6e 00 67 00 20 00 65 00 ┆ •y•o•u•n•g• •e•
759 6e 00 6f 00 75 00 67 00 68 00 20 00 74 00 6f 00 ┆ n•o•u•g•h• •t•o•
760 20 00 6b 00 6e 00 6f 00 77 00 20 00 65 00 76 00 ┆ •k•n•o•w• •e•v•
761 65 00 72 00 79 00 74 00 68 00 69 00 6e 00 67 00 ┆ e•r•y•t•h•i•n•g•
762 2e 00 ┆ .•
763 ----
764 ====
765
766 ====
767 Input:
768
769 ----
770 u32be "\"illusion is the first\nof all pleasures\" 🦉"
771 ----
772
773 Output:
774
775 ----
776 00 00 00 22 00 00 00 69 00 00 00 6c 00 00 00 6c ┆ •••"•••i•••l•••l
777 00 00 00 75 00 00 00 73 00 00 00 69 00 00 00 6f ┆ •••u•••s•••i•••o
778 00 00 00 6e 00 00 00 20 00 00 00 69 00 00 00 73 ┆ •••n••• •••i•••s
779 00 00 00 20 00 00 00 74 00 00 00 68 00 00 00 65 ┆ ••• •••t•••h•••e
780 00 00 00 20 00 00 00 66 00 00 00 69 00 00 00 72 ┆ ••• •••f•••i•••r
781 00 00 00 73 00 00 00 74 00 00 00 0a 00 00 00 6f ┆ •••s•••t•••••••o
782 00 00 00 66 00 00 00 20 00 00 00 61 00 00 00 6c ┆ •••f••• •••a•••l
783 00 00 00 6c 00 00 00 20 00 00 00 70 00 00 00 6c ┆ •••l••• •••p•••l
784 00 00 00 65 00 00 00 61 00 00 00 73 00 00 00 75 ┆ •••e•••a•••s•••u
785 00 00 00 72 00 00 00 65 00 00 00 73 00 00 00 22 ┆ •••r•••e•••s•••"
786 00 00 00 20 00 01 f9 89 ┆ ••• ••••
787 ----
788 ====
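
The encoding prefixes above correspond to standard {py3} codecs which
don't emit a byte order mark. The following sketch only shows that
mapping: `ENCODINGS` and `encode_literal()` are hypothetical names, and
the sketch doesn't handle Normand escape sequences.

```python
# Hypothetical mapping from a Normand prefix to a Python codec name
ENCODINGS = {
    "": "utf-8",
    "u16be": "utf-16-be",
    "u16le": "utf-16-le",
    "u32be": "utf-32-be",
    "u32le": "utf-32-le",
}


def encode_literal(text, prefix=""):
    """Encode `text` like a literal string having the prefix `prefix`."""
    return text.encode(ENCODINGS[prefix])


# No implicit null terminator: add one explicitly if needed.
print(encode_literal("hello\0").hex())              # 68656c6c6f00
print(encode_literal("\U0001F989", "u32be").hex())  # 0001f989
```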
789
790 === Current byte order setting
791
792 This special item sets the <<cur-bo,_current byte order_>>.
793
794 The two accepted forms are:
795
796 [horizontal]
797 ``pass:[{be}]``:: Set the current byte order to big endian.
798 ``pass:[{le}]``:: Set the current byte order to little endian.
799
800 === Fixed-length number
801
802 A _fixed-length number_ represents a fixed number of bytes encoding
803 either:
804
805 * An unsigned or signed integer (two's complement).
806 +
807 The available lengths are 8, 16, 24, 32, 40, 48, 56, and 64.
808
809 * A floating point number
810 (https://standards.ieee.org/standard/754-2008.html[IEEE{nbsp}754-2008]).
811 +
The available lengths are 32 (_binary32_) and 64 (_binary64_).
813
The value is the result of evaluating a {py3} expression; Normand
encodes it using the <<cur-bo,current byte order>>.
816
817 A fixed-length number is:
818
819 . The ``pass:[{]`` prefix.
820
821 . A valid {py3} expression.
822 +
823 For a fixed-length number at some source location{nbsp}__**L**__, this
824 expression may contain the name of any accessible <<label,label>> (not
825 within a nested group), including the name of a label defined
826 after{nbsp}__**L**__, as well as the name of any
827 <<variable-assignment,variable>> known at{nbsp}__**L**__.
828 +
829 The value of the special name `ICITTE` (`int` type) in this expression
830 is the <<cur-offset,current offset>> (before encoding the number).
831
832 . The `:` character.
833
834 . An encoding length in bits amongst:
835 +
836 --
837 The expression evaluates to an `int` or `bool` value::
838 `8`, `16`, `24`, `32`, `40`, `48`, `56`, and `64`.
839 +
840 NOTE: Normand automatically converts a `bool` value to `int`.
841
842 The expression evaluates to a `float` value::
843 `32` and `64`.
844 --
845
846 . The `}` suffix.
847
848 ====
849 Input:
850
851 ----
852 {le} {345:16}
853 {be} {-0xabcd:32}
854 ----
855
856 Output:
857
858 ----
859 59 01 ff ff 54 33
860 ----
861 ====
862
863 ====
864 Input:
865
866 ----
867 {be}
868
869 # String length in bits
870 {8 * (str_end - str_beg) : 16}
871
872 # String
873 <str_beg>
874 "hello world!"
875 <str_end>
876 ----
877
878 Output:
879
880 ----
881 00 60 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 ┆ •`hello world!
882 ----
883 ====
884
885 ====
886 Input:
887
888 ----
889 {20 - ICITTE : 8} * 10
890 ----
891
892 Output:
893
894 ----
895 14 13 12 11 10 0f 0e 0d 0c 0b
896 ----
897 ====
898
899 ====
900 Input:
901
902 ----
903 {le}
904 {2 * 0.0529 : 32}
905 ----
906
907 Output:
908
909 ----
910 ac ad d8 3d
911 ----
912 ====
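
The encodings above are the usual two's complement and IEEE{nbsp}754
layouts, which the {py3} standard library can reproduce. The following
is a sketch, not Normand's code: `encode_fixed()` is a hypothetical
helper and its `byte_order` parameter takes the {py3} names `"big"` and
`"little"`.

```python
import struct


def encode_fixed(value, length_bits, byte_order):
    """Encode `value` on `length_bits` bits; `byte_order` is "big" or "little"."""
    if isinstance(value, float):
        # Only binary32 and binary64 exist for floating point numbers.
        fmt = {32: "f", 64: "d"}[length_bits]
        return struct.pack((">" if byte_order == "big" else "<") + fmt, value)

    value = int(value)  # also converts `bool` to `int`

    # A negative value uses two's complement; to_bytes() raises
    # OverflowError when the value doesn't fit.
    return value.to_bytes(length_bits // 8, byte_order, signed=value < 0)


print(encode_fixed(345, 16, "little").hex())         # 5901
print(encode_fixed(-0xabcd, 32, "big").hex())        # ffff5433
print(encode_fixed(2 * 0.0529, 32, "little").hex())  # acadd83d
```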
913
914 === LEB128 integer
915
An _LEB128 integer_ represents a variable number of bytes encoding, in
the https://en.wikipedia.org/wiki/LEB128[LEB128] format, an unsigned or
signed integer which is the result of evaluating a {py3} expression.
920
921 An LEB128 integer is:
922
923 . The ``pass:[{]`` prefix.
924
925 . A valid {py3} expression of which the evaluation result type
926 is `int` or `bool` (automatically converted to `int`).
927 +
928 For an LEB128 integer at some source location{nbsp}__**L**__, this
929 expression may contain:
930 +
931 --
932 * The name of any <<label,label>> defined before{nbsp}__**L**__
933 which isn't within a nested group.
934 * The name of any <<variable-assignment,variable>> known
935 at{nbsp}__**L**__.
936 --
937 +
938 The value of the special name `ICITTE` (`int` type) in this expression
939 is the <<cur-offset,current offset>> (before encoding the integer).
940
941 . The `:` character.
942
943 . One of:
944 +
945 --
946 [horizontal]
947 `uleb128`:: Use the unsigned LEB128 format.
948 `sleb128`:: Use the signed LEB128 format.
949 --
950
951 . The `}` suffix.
952
953 ====
954 Input:
955
956 ----
957 {624485 : uleb128}
958 ----
959
960 Output:
961
962 ----
963 e5 8e 26
964 ----
965 ====
966
967 ====
968 Input:
969
970 ----
971 aa bb cc dd
972 <meow>
973 ee ff
974 {-981238311 + (meow * -23) : sleb128}
975 "hello"
976 ----
977
978 Output:
979
980 ----
981 aa bb cc dd ee ff fd fa 8d ac 7c 68 65 6c 6c 6f ┆ ••••••••••|hello
982 ----
983 ====
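
For reference, here's a compact {py3} sketch of the LEB128 algorithm
itself: seven payload bits per byte, with the high bit set on every
byte except the last. The function names are hypothetical; this isn't
taken from `normand.py`.

```python
def encode_uleb128(value):
    """Encode a non-negative integer using unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")

    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7

        if value == 0:
            out.append(byte)
            return bytes(out)

        out.append(byte | 0x80)  # more bytes follow


def encode_sleb128(value):
    """Encode an integer using signed LEB128."""
    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: keeps the sign
        sign_bit = byte & 0x40

        # Done when the remaining bits are only sign extension.
        if (value == 0 and not sign_bit) or (value == -1 and sign_bit):
            out.append(byte)
            return bytes(out)

        out.append(byte | 0x80)


print(encode_uleb128(624485).hex())  # e58e26
print(encode_sleb128(-1993).hex())   # b770
```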
984
985 === Current offset setting
986
987 This special item sets the <<cur-offset,_current offset_>>.
988
989 A current offset setting is:
990
991 . The `<` prefix.
992
993 . A <<const-int,positive constant integer>> which is the new current
994 offset.
995
996 . The `>` suffix.
997
998 ====
999 Input:
1000
1001 ----
1002 {ICITTE : 8} * 8
1003 <0x61> {ICITTE : 8} * 8
1004 ----
1005
1006 Output:
1007
1008 ----
1009 00 01 02 03 04 05 06 07 61 62 63 64 65 66 67 68 ┆ ••••••••abcdefgh
1010 ----
1011 ====
1012
1013 ====
1014 Input:
1015
1016 ----
1017 aa bb cc dd <meow> ee ff
1018 <12> 11 22 33 <mix> 44 55
1019 {meow : 8} {mix : 8}
1020 ----
1021
1022 Output:
1023
1024 ----
1025 aa bb cc dd ee ff 11 22 33 44 55 04 0f ┆ •••••••"3DU••
1026 ----
1027 ====
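
The second example shows that the current offset and the length of the
generated data are two separate things. The following {py3} sketch
replays that example by hand; the `Output` class is purely
illustrative and doesn't exist in Normand.

```python
class Output:
    """Hypothetical model: generated data plus a separate current offset."""

    def __init__(self, offset=0):
        self.data = bytearray()
        self.offset = offset
        self.labels = {}

    def emit(self, raw):
        # Each generated byte increments the current offset.
        self.data += raw
        self.offset += len(raw)

    def set_offset(self, offset):
        # Changes the current offset without generating data.
        self.offset = offset

    def label(self, name):
        self.labels[name] = self.offset


out = Output()
out.emit(bytes.fromhex("aabbccdd"))
out.label("meow")                  # offset 4
out.emit(bytes.fromhex("eeff"))
out.set_offset(12)                 # <12>
out.emit(bytes.fromhex("112233"))
out.label("mix")                   # offset 15, although only 9 bytes exist
out.emit(bytes.fromhex("4455"))
out.emit(bytes([out.labels["meow"], out.labels["mix"]]))
print(out.data.hex())  # aabbccddeeff1122334455040f
```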
1028
1029 === Current offset alignment
1030
1031 A _current offset alignment_ represents zero or more padding bytes to
1032 make the <<cur-offset,current offset>> meet a given
1033 https://en.wikipedia.org/wiki/Data_structure_alignment[alignment] value.
1034
1035 More specifically, for an alignment value of{nbsp}__**N**__{nbsp}bits,
1036 a current offset alignment represents the required padding bytes until
1037 the current offset is a multiple of __**N**__{nbsp}/{nbsp}8.
1038
1039 A current offset alignment is:
1040
1041 . The `@` prefix.
1042
1043 . A <<const-int,positive constant integer>> which is the alignment value
1044 in _bits_.
1045 +
1046 This value must be greater than zero and a multiple of{nbsp}8.
1047
1048 . **Optional**:
1049 +
1050 --
1051 . The ``pass:[~]`` prefix.
1052 . A <<const-int,positive constant integer>> which is the value of the
1053 byte to use as padding to align the <<cur-offset,current offset>>.
1054 --
1055 +
1056 Without this section, the padding byte value is zero.
1057
1058 ====
1059 Input:
1060
1061 ----
1062 11 22 (@32 aa bb cc) * 3
1063 ----
1064
1065 Output:
1066
1067 ----
1068 11 22 00 00 aa bb cc 00 aa bb cc 00 aa bb cc
1069 ----
1070 ====
1071
1072 ====
1073 Input:
1074
1075 ----
1076 {le}
1077 77 88
1078 @32~0xcc {-893.5:32}
1079 @128~0x55 "meow"
1080 ----
1081
1082 Output:
1083
1084 ----
1085 77 88 cc cc 00 60 5f c4 55 55 55 55 55 55 55 55 ┆ w••••`_•UUUUUUUU
1086 6d 65 6f 77 ┆ meow
1087 ----
1088 ====
1089
1090 ====
1091 Input:
1092
1093 ----
1094 aa bb cc <29> @64~255 "zoom"
1095 ----
1096
1097 Output:
1098
1099 ----
1100 aa bb cc ff ff ff 7a 6f 6f 6d ┆ ••••••zoom
1101 ----
1102 ====
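
The number of padding bytes follows directly from the definition
above: it's the distance from the current offset to the next multiple
of __**N**__{nbsp}/{nbsp}8. A minimal {py3} sketch, with the
hypothetical helper `alignment_padding()`:

```python
def alignment_padding(offset, align_bits, pad_byte=0):
    """Return the padding bytes which align `offset` to `align_bits` bits."""
    if align_bits <= 0 or align_bits % 8 != 0:
        raise ValueError("alignment must be a positive multiple of 8")

    align_bytes = align_bits // 8

    # (-offset) % n is the distance to the next multiple of n (0 if aligned).
    return bytes([pad_byte]) * (-offset % align_bytes)


# Replay `11 22 (@32 aa bb cc) * 3`; the offset starts at 0 here, so it
# equals the length of the data generated so far.
data = bytearray.fromhex("1122")

for _ in range(3):
    data += alignment_padding(len(data), 32)
    data += bytes.fromhex("aabbcc")

print(data.hex())  # 11220000aabbcc00aabbcc00aabbcc
```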
1103
1104 === Filling
1105
1106 A _filling_ represents zero or more padding bytes to make the
1107 <<cur-offset,current offset>> reach a given value.
1108
1109 A filling is:
1110
1111 . The ``pass:[+]`` prefix.
1112
1113 . One of:
1114
1115 ** A <<const-int,positive constant integer>> which is the current offset
1116 target.
1117
1118 ** The ``pass:[{]`` prefix, a valid {py3} expression of which the
1119 evaluation result type is `int` or `bool` (automatically converted to
1120 `int`), and the ``pass:[}]`` suffix.
1121 +
1122 For a filling at some source location{nbsp}__**L**__, this expression
1123 may contain:
1124 +
1125 --
1126 * The name of any <<label,label>> defined before{nbsp}__**L**__
1127 which isn't within a nested group.
1128 * The name of any <<variable-assignment,variable>> known
1129 at{nbsp}__**L**__.
1130 --
1131 +
The value of the special name `ICITTE` (`int` type) in this expression
is the <<cur-offset,current offset>> (before generating the padding
bytes).
1135
1136 ** A valid {py3} name.
1137 +
1138 For the name `__NAME__`, this is equivalent to the
1139 `pass:[{]__NAME__pass:[}]` form above.
1140
1141 +
1142 This value must be greater than or equal to the current offset where
1143 it's used.
1144
1145 . **Optional**:
1146 +
1147 --
1148 . The ``pass:[~]`` prefix.
1149 . A <<const-int,positive constant integer>> which is the value of the
1150 byte to use as padding to reach the current offset target.
1151 --
1152 +
1153 Without this section, the padding byte value is zero.
1154
1155 ====
1156 Input:
1157
1158 ----
1159 aa bb cc dd
1160 +0x40
1161 "hello world"
1162 ----
1163
1164 Output:
1165
1166 ----
1167 aa bb cc dd 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
1168 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
1169 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
1170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ ••••••••••••••••
1171 68 65 6c 6c 6f 20 77 6f 72 6c 64 ┆ hello world
1172 ----
1173 ====
1174
1175 ====
1176 Input:
1177
1178 ----
1179 !macro part(iter, fill)
1180 <0> "particular security " {ord('0') + iter : 8} +fill~0x80
1181 !end
1182
1183 {iter = 1}
1184
1185 !repeat 5
1186 m:part(iter, {32 + 4 * iter})
1187 {iter = iter + 1}
1188 !end
1189 ----
1190
1191 Output:
1192
1193 ----
1194 70 61 72 74 69 63 75 6c 61 72 20 73 65 63 75 72 ┆ particular secur
1195 69 74 79 20 31 80 80 80 80 80 80 80 80 80 80 80 ┆ ity 1•••••••••••
1196 80 80 80 80 70 61 72 74 69 63 75 6c 61 72 20 73 ┆ ••••particular s
1197 65 63 75 72 69 74 79 20 32 80 80 80 80 80 80 80 ┆ ecurity 2•••••••
1198 80 80 80 80 80 80 80 80 80 80 80 80 70 61 72 74 ┆ ••••••••••••part
1199 69 63 75 6c 61 72 20 73 65 63 75 72 69 74 79 20 ┆ icular security
1200 33 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ 3•••••••••••••••
1201 80 80 80 80 80 80 80 80 70 61 72 74 69 63 75 6c ┆ ••••••••particul
1202 61 72 20 73 65 63 75 72 69 74 79 20 34 80 80 80 ┆ ar security 4•••
1203 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ ••••••••••••••••
1204 80 80 80 80 80 80 80 80 70 61 72 74 69 63 75 6c ┆ ••••••••particul
1205 61 72 20 73 65 63 75 72 69 74 79 20 35 80 80 80 ┆ ar security 5•••
1206 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ ••••••••••••••••
1207 80 80 80 80 80 80 80 80 80 80 80 80 ┆ ••••••••••••
1208 ----
1209 ====
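
Unlike an alignment, a filling targets an absolute offset. The
following {py3} sketch, with the hypothetical helper `filling()`,
computes the padding and checks the sizes of the two examples above.

```python
def filling(offset, target, pad_byte=0):
    """Return the padding bytes needed to go from `offset` to `target`."""
    if target < offset:
        raise ValueError("target offset is before the current offset")

    return bytes([pad_byte]) * (target - offset)


# First example: four bytes, `+0x40`, then the string
data = bytes.fromhex("aabbccdd")
data += filling(len(data), 0x40)
data += b"hello world"
print(len(data))  # 75

# Second example: each macro expansion resets the offset to 0 with `<0>`,
# so its size is exactly the filling target, 32 + 4 * iter.
total = 0

for it in range(1, 6):
    part = b"particular security " + bytes([ord("0") + it])
    part += filling(len(part), 32 + 4 * it, 0x80)
    total += len(part)

print(total)  # 220
```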
1210
1211 === Label
1212
1213 A _label_ associates a name to the <<cur-offset,current offset>>.
1214
1215 All the labels of a whole Normand input must have unique names.
1216
A label must not have the same name as a
<<variable-assignment,variable>>.
1219
1220 A label is:
1221
1222 . The `<` prefix.
1223
1224 . A valid {py3} name which is not `ICITTE`.
1225
1226 . The `>` suffix.
1227
1228 === Variable assignment
1229
A _variable assignment_ associates a name to the integral or floating
point result of an evaluated {py3} expression.
1232
1233 A variable assignment is:
1234
1235 . The ``pass:[{]`` prefix.
1236
1237 . A valid {py3} name which is not `ICITTE`.
1238
1239 . The `=` character.
1240
. A valid {py3} expression of which the evaluation result type
is `int`, `float`, or `bool` (a `bool` result is automatically
converted to `int`).
1243 +
1244 For a variable assignment at some source location{nbsp}__**L**__, this
1245 expression may contain:
1246 +
1247 --
1248 * The name of any <<label,label>> defined before{nbsp}__**L**__
1249 which isn't within a nested group.
1250 * The name of any <<variable-assignment,variable>> known
1251 at{nbsp}__**L**__.
1252 --
1253 +
1254 The value of the special name `ICITTE` (`int` type) in this expression
1255 is the <<cur-offset,current offset>>.
1256
1257 . The `}` suffix.
1258
1259 ====
1260 Input:
1261
1262 ----
1263 {mix = 101} {le}
1264 {meow = 42} 11 22 {meow:8} 33 {meow = ICITTE + 17}
1265 "yooo" {meow + mix : 16}
1266 ----
1267
1268 Output:
1269
1270 ----
1271 11 22 2a 33 79 6f 6f 6f 7a 00 ┆ •"*3yoooz•
1272 ----
1273 ====
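To see where each byte of the output above comes from, the following
{py3} sketch replays the example by hand, without the `normand` module.
It assumes an initial offset of 0, so that `ICITTE` equals the number of
bytes generated so far; the `out` name is only illustrative.

```python
# Hand-written model of the example above (not the normand module).
out = bytearray()

mix = 101                                  # {mix = 101}
meow = 42                                  # {meow = 42}
out += bytes([0x11, 0x22])                 # 11 22
out += meow.to_bytes(1, "little")          # {meow:8}
out += bytes([0x33])                       # 33
meow = len(out) + 17                       # {meow = ICITTE + 17}: 4 + 17 = 21
out += "yooo".encode("utf-8")              # "yooo"
out += (meow + mix).to_bytes(2, "little")  # {meow + mix : 16} under {le}

print(out.hex(" "))  # 11 22 2a 33 79 6f 6f 6f 7a 00
```

The second assignment to `meow` replaces the first value, which is why
the last two bytes encode 21 + 101 = 122 (`7a 00`) rather than 42 + 101.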
1274
1275 === Group
1276
1277 A _group_ is a scoped sequence of items.
1278
1279 The <<label,labels>> within a group aren't visible outside of it.
1280
The main purposes of a group are to <<post-item-repetition,repeat>>
more than a single item and to isolate labels.
1283
1284 A group is:
1285
1286 . The `(`, `!group`, or `!g` opening.
1287
1288 . Zero or more items.
1289
1290 . Depending on the group opening:
1291 +
1292 --
1293 `(`::
1294 The `)` closing.
1295
1296 `!group`::
1297 `!g`::
1298 The `!end` closing.
1299 --
1300
1301 ====
1302 Input:
1303
1304 ----
1305 ((aa bb cc) dd () ee) "leclerc"
1306 ----
1307
1308 Output:
1309
1310 ----
1311 aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc
1312 ----
1313 ====
1314
1315 ====
1316 Input:
1317
1318 ----
1319 !group
1320 (aa bb cc) * 3 dd ee
1321 !end * 5
1322 ----
1323
1324 Output:
1325
1326 ----
1327 aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa bb
1328 cc aa bb cc dd ee aa bb cc aa bb cc aa bb cc dd
1329 ee aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa
1330 bb cc aa bb cc dd ee
1331 ----
1332 ====
1333
1334 ====
1335 Input:
1336
1337 ----
1338 {be}
1339 (
1340 <str_beg> u16le"sébastien diaz" <str_end>
1341 {ICITTE - str_beg : 8}
1342 {(end - str_beg) * 5 : 24}
1343 ) * 3
1344 <end>
1345 ----
1346
1347 Output:
1348
1349 ----
1350 73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
1351 6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 e0 ┆ n• •d•i•a•z•••••
1352 73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
1353 6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 40 ┆ n• •d•i•a•z••••@
1354 73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
1355 6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 00 a0 ┆ n• •d•i•a•z•••••
1356 ----
1357 ====
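The last example uses `end`, a label which is only defined after the
repeated group, while `str_beg` gets a new value for each repetition.
The following {py3} sketch reproduces the arithmetic by hand. It assumes
an initial offset of 0 and computes the size of one repetition up front
to know the value of `end`; this says nothing about how Normand itself
resolves such a label.

```python
# Hand-written model of the example above (not the normand module).
text = "sébastien diaz".encode("utf-16-le")  # u16le string: 14 characters, 28 bytes
iteration_size = len(text) + 1 + 3           # string + 8-bit value + 24-bit value
end = 3 * iteration_size                     # offset of the <end> label: 96

out = bytearray()

for _ in range(3):
    str_beg = len(out)                               # <str_beg>
    out += text
    out += (len(out) - str_beg).to_bytes(1, "big")   # {ICITTE - str_beg : 8}
    out += ((end - str_beg) * 5).to_bytes(3, "big")  # {(end - str_beg) * 5 : 24}

# The last four bytes of each repetition:
print(out[28:32].hex(" "))  # 1c 00 01 e0
print(out[60:64].hex(" "))  # 1c 00 01 40
print(out[92:96].hex(" "))  # 1c 00 00 a0
```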
1358
1359 === Conditional block
1360
1361 A _conditional block_ represents either the bytes of zero or more items
1362 if some expression is true, or the bytes of zero or more other items if
1363 it's false.
1364
1365 A conditional block is:
1366
1367 . The `!if` opening.
1368
1369 . One of:
1370
1371 ** The ``pass:[{]`` prefix, a valid {py3} expression of which the
1372 evaluation result type is `int` or `bool` (automatically converted to
1373 `int`), and the ``pass:[}]`` suffix.
1374 +
1375 For a conditional block at some source location{nbsp}__**L**__, this
1376 expression may contain:
1377 +
1378 --
1379 * The name of any <<label,label>> defined before{nbsp}__**L**__
1380 which isn't within a nested group.
1381 * The name of any <<variable-assignment,variable>> known
1382 at{nbsp}__**L**__.
1383 --
1384 +
1385 The value of the special name `ICITTE` (`int` type) in this expression
1386 is the <<cur-offset,current offset>> (before handling the contained
1387 items).
1388
1389 ** A valid {py3} name.
1390 +
1391 For the name `__NAME__`, this is equivalent to the
1392 `pass:[{]__NAME__pass:[}]` form above.
1393
1394 . Zero or more items to be handled when the condition is true.
1395
1396 . **Optional**:
1397
1398 .. The `!else` opening.
1399 .. Zero or more items to be handled when the condition is false.
1400
1401 . The `!end` closing.
1402
1403 ====
1404 Input:
1405
1406 ----
1407 {at = 1}
1408 {rep_count = 9}
1409
1410 !repeat rep_count
1411 "meow "
1412
1413 !if {ICITTE > 25}
1414 "mix"
1415 !else
1416 "zoom"
1417 !end
1418
1419 !if {at < rep_count} 20 !end
1420
1421 {at = at + 1}
1422 !end
1423 ----
1424
1425 Output:
1426
1427 ----
1428 6d 65 6f 77 20 7a 6f 6f 6d 20 6d 65 6f 77 20 7a ┆ meow zoom meow z
1429 6f 6f 6d 20 6d 65 6f 77 20 7a 6f 6f 6d 20 6d 65 ┆ oom meow zoom me
1430 6f 77 20 6d 69 78 20 6d 65 6f 77 20 6d 69 78 20 ┆ ow mix meow mix
1431 6d 65 6f 77 20 6d 69 78 20 6d 65 6f 77 20 6d 69 ┆ meow mix meow mi
1432 78 20 6d 65 6f 77 20 6d 69 78 20 6d 65 6f 77 20 ┆ x meow mix meow
1433 6d 69 78 ┆ mix
1434 ----
1435 ====
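The switch from `zoom` to `mix` in the output above depends on the value
of `ICITTE` right before each conditional block. This {py3} sketch is a
hand-written model of the same input, assuming an initial offset of 0
(the `out` name is only illustrative):

```python
# Hand-written model of the example above (not the normand module).
out = bytearray()
at = 1                                   # {at = 1}
rep_count = 9                            # {rep_count = 9}

for _ in range(rep_count):               # !repeat rep_count
    out += b"meow "
    # ICITTE is the offset right before the conditional block.
    out += b"mix" if len(out) > 25 else b"zoom"
    if at < rep_count:                   # !if {at < rep_count} 20 !end
        out += b"\x20"
    at += 1                              # {at = at + 1}

print(out.decode("ascii"))
```

The offsets tested are 5, 15, and 25 for the first three repetitions
(none is greater than 25), then 35 and up, which gives three `zoom`
followed by six `mix`.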
1436
1437 ====
1438 Input:
1439
1440 ----
1441 <str_beg>
1442 u16le"meow mix!"
1443 <str_end>
1444
1445 !if {str_end - str_beg > 10}
1446 " BIG"
1447 !end
1448 ----
1449
1450 Output:
1451
1452 ----
1453 6d 00 65 00 6f 00 77 00 20 00 6d 00 69 00 78 00 ┆ m•e•o•w• •m•i•x•
1454 21 00 20 42 49 47 ┆ !• BIG
1455 ----
1456 ====
1457
1458 === Repetition block
1459
A _repetition block_ represents the bytes of zero or more items repeated
a given number of times.
1462
1463 A repetition block is:
1464
1465 . The `!repeat` or `!r` opening.
1466
1467 . One of:
1468
** A <<const-int,positive constant integer>> which is the number of
times to repeat the items of the block.
1471
1472 ** The ``pass:[{]`` prefix, a valid {py3} expression of which the
1473 evaluation result type is `int` or `bool` (automatically converted to
1474 `int`), and the ``pass:[}]`` suffix.
1475 +
1476 For a repetition block at some source location{nbsp}__**L**__, this
1477 expression may contain:
1478 +
1479 --
1480 * The name of any <<label,label>> defined before{nbsp}__**L**__
1481 which isn't within a nested group.
1482 * The name of any <<variable-assignment,variable>> known
1483 at{nbsp}__**L**__.
1484 --
1485 +
1486 The value of the special name `ICITTE` (`int` type) in this expression
1487 is the <<cur-offset,current offset>> (before handling the items to
1488 repeat).
1489
1490 ** A valid {py3} name.
1491 +
1492 For the name `__NAME__`, this is equivalent to the
1493 `pass:[{]__NAME__pass:[}]` form above.
1494
1495 . Zero or more items.
1496
1497 . The `!end` closing.
1498
1499 You may also use a <<post-item-repetition,post-item repetition>> after
1500 some items. The form ``!repeat{nbsp}__X__{nbsp}__ITEMS__{nbsp}!end``
1501 is equivalent to ``(__ITEMS__){nbsp}pass:[*]{nbsp}__X__``.
1502
1503 ====
1504 Input:
1505
1506 ----
1507 !repeat 0o400
1508 {end - ICITTE - 1 : 8}
1509 !end
1510
1511 <end>
1512 ----
1513
1514 Output:
1515
1516 ----
1517 ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
1518 ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
1519 df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
1520 cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
1521 bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
1522 af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
1523 9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
1524 8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
1525 7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
1526 6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
1527 5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
1528 4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
1529 3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
1530 2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"!
1531 1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
1532 0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
1533 ----
1534 ====
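In the example above, each repetition emits a single byte, so the `end`
label is at offset `0o400` (256) when the initial offset is 0. Under that
assumption, the same data can be sketched in {py3} as:

```python
# Hand-written model of the example above (not the normand module).
end = 0o400  # 256 one-byte items, so <end> is at offset 256

# {end - ICITTE - 1 : 8} for each offset from 0 to 255
out = bytes(end - offset - 1 for offset in range(end))

print(out[:4].hex(" "), "...", out[-4:].hex(" "))  # ff fe fd fc ... 03 02 01 00
```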
1535
1536 ====
1537 Input:
1538
1539 ----
1540 {times = 1}
1541
1542 aa bb cc dd
1543
1544 !repeat 3
1545 <here>
1546
1547 !repeat {here + 1}
1548 ee ff
1549 !end
1550
1551 11 22 !repeat times 33 !end
1552
1553 {times = times + 1}
1554 !end
1555
1556 "coucou!"
1557 ----
1558
1559 Output:
1560
1561 ----
1562 aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
1563 33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
1564 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1565 ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
1566 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1567 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1568 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1569 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1570 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1571 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1572 ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
1573 33 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
1574 ----
1575 ====
1576
1577 === Macro definition block
1578
1579 A _macro definition block_ associates a name and parameter names to
1580 a group of items.
1581
1582 A macro definition block doesn't lead to generated bytes itself: a
1583 <<macro-expansion,macro expansion>> does so.
1584
1585 A macro definition may only exist at the root level, that is, not within
1586 a <<group,group>>, a <<repetition-block,repetition block>>, a
1587 <<conditional-block,conditional block>>, or another
1588 <<macro-definition-block,macro definition block>>.
1589
1590 All macro definitions must have unique names.
1591
1592 A macro definition is:
1593
1594 . The `!macro` or `!m` opening.
1595
1596 . A valid {py3} name (the macro name).
1597
1598 . The `(` parameter name list prefix.
1599
1600 . A comma-separated list of zero or more unique parameter names,
1601 each one being a valid {py3} name.
1602
1603 . The `)` parameter name list suffix.
1604
1605 . Zero or more items except, recursively, a macro definition block.
1606
1607 . The `!end` closing.
1608
1609 ====
1610 ----
1611 !macro bake()
1612 {le} {ICITTE * 8 : 16}
1613 u16le"predict explode"
1614 !end
1615 ----
1616 ====
1617
1618 ====
1619 ----
1620 !macro nail(rep, with_extra, val)
1621 {iter = 1}
1622
1623 !repeat rep
1624 {val + iter : uleb128}
1625 {0xdeadbeef : 32}
1626 {iter = iter + 1}
1627 !end
1628
1629 !if with_extra
1630 "meow mix\0"
1631 !end
1632 !end
1633 ----
1634 ====
1635
1636 === Macro expansion
1637
1638 A _macro expansion_ expands the items of a defined
1639 <<macro-definition-block,macro>>.
1640
1641 The macro to expand must be defined _before_ the expansion.
1642
1643 The <<state,state>> before handling the first item of the chosen macro
1644 is:
1645
1646 <<cur-offset,Current offset>>::
1647 Unchanged.
1648
1649 <<cur-bo,Current byte order>>::
1650 Unchanged.
1651
1652 Variables::
1653 The only available variables initially are the macro parameters.
1654
1655 Labels::
1656 None.
1657
1658 The state after having handled the last item of the chosen macro is:
1659
1660 Current offset::
1661 The one before handling the first item of the macro plus the size
1662 of the generated data of the macro expansion.
1663 +
1664 IMPORTANT: This means <<current-offset-setting,current offset setting>>
1665 items within the expanded macro don't impact the final current offset.
1666
1667 Current byte order::
1668 The one before handling the first item of the macro.
1669
1670 Variables::
1671 The ones before handling the first item of the macro.
1672
1673 Labels::
1674 The ones before handling the first item of the macro.
1675
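The state rules above can be modelled with a few lines of {py3}. This is
only a sketch of the described behaviour: `State`, `expand_macro()`, and
`body()` are hypothetical names which don't exist in the `normand`
module.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class State:
    # Hypothetical stand-in for the processor state (not a normand class).
    offset: int = 0
    byte_order: Optional[str] = None
    variables: Dict[str, int] = field(default_factory=dict)
    labels: Dict[str, int] = field(default_factory=dict)


def expand_macro(state: State, params: Dict[str, int],
                 body: Callable[[State], bytes]) -> bytes:
    # The macro body starts with the same offset and byte order, but it
    # only sees its parameters as variables, and no labels at all.
    inner = State(state.offset, state.byte_order, dict(params), {})
    data = body(inner)

    # Afterwards, only the offset changes, and only by the size of the
    # generated data: whatever `body` did to `inner` is dropped.
    state.offset += len(data)
    return data


def body(inner: State) -> bytes:
    inner.byte_order = "le"      # like {le}
    inner.offset = 1000          # like a current offset setting item
    inner.variables["tmp"] = 1   # like a variable assignment
    return inner.variables["val"].to_bytes(2, "little")


state = State(offset=7, byte_order="be", variables={"mix": 101})
data = expand_macro(state, {"val": 3}, body)
print(data.hex(" "), state.offset, state.byte_order, state.variables)
```
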
1676 A macro expansion is:
1677
1678 . The `m:` prefix.
1679
1680 . A valid {py3} name (the name of the macro to expand).
1681
1682 . The `(` parameter value list prefix.
1683
. A comma-separated list of zero or more parameter values.
1685 +
1686 The number of parameter values must match the number of parameter
1687 names of the definition of the chosen macro.
1688 +
1689 A parameter value is one of:
1690 +
1691 --
1692 * A <<const-int,constant integer>>, possibly negative.
1693
1694 * The ``pass:[{]`` prefix, a valid {py3} expression of which the
1695 evaluation result type is `int` or `bool` (automatically converted to
1696 `int`), and the ``pass:[}]`` suffix.
1697 +
1698 For a macro expansion at some source location{nbsp}__**L**__, this
1699 expression may contain:
1700
1701 ** The name of any <<label,label>> defined before{nbsp}__**L**__
1702 which isn't within a nested group.
1703 ** The name of any <<variable-assignment,variable>> known
1704 at{nbsp}__**L**__.
1705
1706 +
1707 The value of the special name `ICITTE` (`int` type) in this expression
1708 is the <<cur-offset,current offset>> (before handling the items of the
1709 chosen macro).
1710
1711 * A valid {py3} name.
1712 +
1713 For the name `__NAME__`, this is equivalent to the
1714 `pass:[{]__NAME__pass:[}]` form above.
1715 --
1716
1717 . The `)` parameter value list suffix.
1718
1719 ====
1720 Input:
1721
1722 ----
1723 !macro bake()
1724 {le} {ICITTE * 8 : 16}
1725 u16le"predict explode"
1726 !end
1727
1728 "hello [" m:bake() "] world"
1729
1730 m:bake() * 5
1731 ----
1732
1733 Output:
1734
1735 ----
1736 68 65 6c 6c 6f 20 5b 38 00 70 00 72 00 65 00 64 ┆ hello [8•p•r•e•d
1737 00 69 00 63 00 74 00 20 00 65 00 78 00 70 00 6c ┆ •i•c•t• •e•x•p•l
1738 00 6f 00 64 00 65 00 5d 20 77 6f 72 6c 64 70 01 ┆ •o•d•e•] worldp•
1739 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1740 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 02 ┆ e•x•p•l•o•d•e•p•
1741 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1742 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 03 ┆ e•x•p•l•o•d•e•p•
1743 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1744 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 04 ┆ e•x•p•l•o•d•e•p•
1745 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1746 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 05 ┆ e•x•p•l•o•d•e•p•
1747 70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• •
1748 65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 ┆ e•x•p•l•o•d•e•
1749 ----
1750 ====
1751
1752 ====
1753 Input:
1754
1755 ----
1756 !macro A(val, is_be)
1757 {le}
1758
1759 !if is_be
1760 {be}
1761 !end
1762
1763 {val : 16}
1764 !end
1765
1766 !macro B(rep, is_be)
1767 {iter = 1}
1768
1769 !repeat rep
1770 m:A({iter * 3}, is_be)
1771 {iter = iter + 1}
1772 !end
1773 !end
1774
1775 m:B(5, 1)
1776 m:B(3, 0)
1777 ----
1778
1779 Output:
1780
1781 ----
1782 00 03 00 06 00 09 00 0c 00 0f 03 00 06 00 09 00
1783 ----
1784 ====
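One way to read the last example is to think of each macro as a function
returning bytes. The following {py3} sketch does so by hand; the
`macro_a()` and `macro_b()` names are illustrative and the code doesn't
use the `normand` module.

```python
# Hand-written model of the example above (not the normand module).
def macro_a(val: int, is_be: int) -> bytes:
    # {le}, then {be} if `is_be` is true, then {val : 16}
    return val.to_bytes(2, "big" if is_be else "little")


def macro_b(rep: int, is_be: int) -> bytes:
    out = bytearray()
    it = 1                              # {iter = 1}
    for _ in range(rep):                # !repeat rep
        out += macro_a(it * 3, is_be)   # m:A({iter * 3}, is_be)
        it += 1                         # {iter = iter + 1}
    return bytes(out)


out = macro_b(5, 1) + macro_b(3, 0)     # m:B(5, 1) m:B(3, 0)
print(out.hex(" "))  # 00 03 00 06 00 09 00 0c 00 0f 03 00 06 00 09 00
```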
1785
1786 === Post-item repetition
1787
1788 A _post-item repetition_ represents the bytes of an item repeated a
1789 given number of times.
1790
1791 A post-item repetition is:
1792
1793 . One of those items:
1794
1795 ** A <<byte-constant,byte constant>>.
1796 ** A <<literal-string,literal string>>.
1797 ** A <<fixed-length-number,fixed-length number>>.
1798 ** An <<leb128-integer,LEB128 integer>>.
** A <<macro-expansion,macro expansion>>.
1800 ** A <<group,group>>.
1801
1802 . The ``pass:[*]`` character.
1803
1804 . One of:
1805
1806 ** A positive integer (hexadecimal starting with `0x` or `0X` accepted)
1807 which is the number of times to repeat the previous item.
1808
1809 ** The ``pass:[{]`` prefix, a valid {py3} expression of which the
1810 evaluation result type is `int` or `bool` (automatically converted to
1811 `int`), and the ``pass:[}]`` suffix.
1812 +
1813 For a post-item repetition at some source location{nbsp}__**L**__, this
1814 expression may contain:
1815 +
1816 --
1817 * The name of any <<label,label>> defined before{nbsp}__**L**__
1818 which isn't within a nested group and
1819 which isn't part of the repeated item.
* The name of any <<variable-assignment,variable>> known
at{nbsp}__**L**__ which isn't part of the repeated item.
1823 --
1824 +
1825 The value of the special name `ICITTE` (`int` type) in this expression
1826 is the <<cur-offset,current offset>> (before handling the items to
1827 repeat).
1828
1829 ** A valid {py3} name.
1830 +
1831 For the name `__NAME__`, this is equivalent to the
1832 `pass:[{]__NAME__pass:[}]` form above.
1833
1834 You may also use a <<repetition-block,repetition block>>. The form
1835 ``__ITEM__{nbsp}pass:[*]{nbsp}__X__`` is equivalent to
1836 ``!repeat{nbsp}__X__{nbsp}__ITEM__{nbsp}!end``.
1837
1838 ====
1839 Input:
1840
1841 ----
1842 {end - ICITTE - 1 : 8} * 0x100 <end>
1843 ----
1844
1845 Output:
1846
1847 ----
1848 ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
1849 ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
1850 df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
1851 cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
1852 bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
1853 af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
1854 9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
1855 8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
1856 7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
1857 6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
1858 5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
1859 4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
1860 3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
1861 2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"!
1862 1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
1863 0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
1864 ----
1865 ====
1866
1867 ====
1868 Input:
1869
1870 ----
1871 {times = 1}
1872 aa bb cc dd
1873 (
1874 <here>
1875 (ee ff) * {here + 1}
1876 11 22 33 * {times}
1877 {times = times + 1}
1878 ) * 3
1879 "coucou!"
1880 ----
1881
1882 Output:
1883
1884 ----
1885 aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
1886 33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
1887 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1888 ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
1889 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1890 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1891 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1892 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1893 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1894 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1895 ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
1896 33 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
1897 ----
1898 ====
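The size of the output above grows quickly because `here` is the offset
at the beginning of each repetition of the outer group. The following
{py3} sketch models the input by hand, assuming an initial offset of 0.
Note that `pass:[*] {times}` only applies to the `33` byte which
immediately precedes it.

```python
# Hand-written model of the example above (not the normand module).
out = bytearray.fromhex("aa bb cc dd")
times = 1                                       # {times = 1}

for _ in range(3):                              # ( ... ) * 3
    here = len(out)                             # <here>
    out += bytes.fromhex("ee ff") * (here + 1)  # (ee ff) * {here + 1}
    out += bytes.fromhex("11 22")
    out += bytes.fromhex("33") * times          # 33 * {times}
    times += 1                                  # {times = times + 1}

out += b"coucou!"
print(len(out))  # 185
```

The values of `here` are 4, 17, and 57, so the inner group is repeated
5, 18, and 58 times.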
1899
1900 == Command-line tool
1901
1902 If you <<install-normand,installed>> the `normand` package, then you
1903 can use the `normand` command-line tool:
1904
1905 ----
1906 $ normand <<< '"ma gang de malades"' | hexdump -C
1907 ----
1908
1909 ----
1910 00000000 6d 61 20 67 61 6e 67 20 64 65 20 6d 61 6c 61 64 |ma gang de malad|
1911 00000010 65 73 |es|
1912 ----
1913
1914 If you copy the `normand.py` module to your own project, then you can
1915 run the module itself:
1916
1917 ----
1918 $ python3 -m normand <<< '"ma gang de malades"' | hexdump -C
1919 ----
1920
1921 ----
1922 00000000 6d 61 20 67 61 6e 67 20 64 65 20 6d 61 6c 61 64 |ma gang de malad|
1923 00000010 65 73 |es|
1924 ----
1925
1926 Without a path argument, the `normand` tool reads from the standard
1927 input.
1928
1929 The `normand` tool prints the generated binary data to the standard
1930 output.
1931
1932 Various options control the initial <<state,state>> of the processor:
1933 use the `--help` option to learn more.
1934
1935 == {py3} API
1936
1937 The whole `normand` package/module public API is:
1938
[source,python]
----
# Byte order.
class ByteOrder(enum.Enum):
    # Big endian.
    BE = ...

    # Little endian.
    LE = ...


# Text location.
class TextLocation:
    # Line number.
    @property
    def line_no(self) -> int:
        ...

    # Column number.
    @property
    def col_no(self) -> int:
        ...


# Parsing error message.
class ParseErrorMessage:
    # Message text.
    @property
    def text(self):
        ...

    # Source text location.
    @property
    def text_location(self):
        ...


# Parsing error.
class ParseError(RuntimeError):
    # Parsing error messages.
    #
    # The first message is the most _specific_ one.
    @property
    def messages(self):
        ...


# Variables dictionary type (for type hints).
VariablesT = typing.Dict[str, typing.Union[int, float]]


# Labels dictionary type (for type hints).
LabelsT = typing.Dict[str, int]


# Parsing result.
class ParseResult:
    # Generated data.
    @property
    def data(self) -> bytearray:
        ...

    # Updated variable values.
    @property
    def variables(self) -> VariablesT:
        ...

    # Updated main group label values.
    @property
    def labels(self) -> LabelsT:
        ...

    # Final offset.
    @property
    def offset(self) -> int:
        ...

    # Final byte order.
    @property
    def byte_order(self) -> typing.Optional[ByteOrder]:
        ...


# Parses the `normand` input using the initial state defined by
# `init_variables`, `init_labels`, `init_offset`, and `init_byte_order`,
# and returns the corresponding parsing result.
def parse(normand: str,
          init_variables: typing.Optional[VariablesT] = None,
          init_labels: typing.Optional[LabelsT] = None,
          init_offset: int = 0,
          init_byte_order: typing.Optional[ByteOrder] = None) -> ParseResult:
    ...
----
2032
2033 The `normand` parameter is the actual <<learn-normand,Normand input>>
2034 while the other parameters control the initial <<state,state>>.
2035
2036 The `parse()` function raises a `ParseError` instance should it fail to
2037 parse the `normand` string for any reason.
2038
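As a minimal sketch of how calling code might report a `ParseError`, the
hypothetical `format_parse_error()` helper below turns each message into
a `LINE:COL - MESSAGE` line using only the properties listed above.
The `generate()` wrapper is also illustrative; it assumes the `normand`
module is importable.

```python
from typing import List


def format_parse_error(exc) -> List[str]:
    # `exc` is expected to look like a `normand.ParseError`: a `messages`
    # sequence of objects having `text` and `text_location` properties.
    return [
        "{}:{} - {}".format(msg.text_location.line_no,
                            msg.text_location.col_no, msg.text)
        for msg in exc.messages
    ]


def generate(text: str) -> bytes:
    # Imported here so that the helper above is usable on its own.
    import normand

    try:
        return bytes(normand.parse(text).data)
    except normand.ParseError as exc:
        # Exit with one line per message, the most specific one first.
        raise SystemExit("\n".join(format_parse_error(exc)))
```
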
2039 == Development
2040
2041 Normand is a https://python-poetry.org/[Poetry] project.
2042
2043 To develop it, install it through Poetry and enter the virtual
2044 environment:
2045
2046 ----
2047 $ poetry install
2048 $ poetry shell
2049 $ normand <<< '"lol" * 10 0a'
2050 ----
2051
2052 `normand.py` is processed by:
2053
2054 * https://microsoft.github.io/pyright/[Pyright]
2055 * https://github.com/psf/black[Black]
2056 * https://pycqa.github.io/isort/[isort]
2057
2058 === Testing
2059
2060 Use https://docs.pytest.org/[pytest] to test Normand once the package is
2061 part of your virtual environment, for example:
2062
2063 ----
2064 $ poetry install
2065 $ poetry run pip3 install pytest
2066 $ poetry run pytest
2067 ----
2068
The `pytest` project is currently not a development dependency in
`pyproject.toml` due to backward compatibility issues with
Python{nbsp}3.4.
2072
2073 In the `tests` directory, each `*.nt` file is a test. The file name
2074 prefix indicates what it's meant to test:
2075
2076 `pass-`::
2077 Everything above the `---` line is the valid Normand input
2078 to test.
2079 +
2080 Everything below the `---` line is the expected data
2081 (whitespace-separated hexadecimal bytes).
2082
2083 `fail-`::
2084 Everything above the `---` line is the invalid Normand input
2085 to test.
2086 +
2087 Everything below the `---` line is the expected error message having
2088 this form:
2089 +
2090 ----
2091 LINE:COL - MESSAGE
2092 ----
2093
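To make the `pass-` layout concrete, here's a small {py3} sketch which
splits the contents of such a file into its Normand input and its
expected bytes. The `split_pass_test()` name is hypothetical, the sketch
assumes the separator line is exactly `---`, and it isn't taken from the
actual test suite.

```python
from typing import Tuple


def split_pass_test(contents: str) -> Tuple[str, bytes]:
    # Everything above the `---` line is the Normand input; everything
    # below is the expected data as whitespace-separated hex bytes.
    lines = contents.splitlines()
    sep = lines.index("---")
    normand_input = "\n".join(lines[:sep])
    expected = bytes(int(tok, 16) for tok in " ".join(lines[sep + 1:]).split())
    return normand_input, expected


text, expected = split_pass_test('"lol" * 2 0a\n---\n6c 6f 6c 6c 6f 6c\n0a\n')
print(text)      # "lol" * 2 0a
print(expected)  # b'lollol\n'
```
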
2094 === Contributing
2095
2096 Normand uses https://review.lttng.org/admin/repos/normand,general[Gerrit]
2097 for code review.
2098
2099 To report a bug, https://github.com/efficios/normand/issues/new[create a
2100 GitHub issue].