// Show ToC at a specific location for a GitHub rendering
ifdef::env-github[]
:toc: macro
endif::env-github[]

ifndef::env-github[]
:toc: left
endif::env-github[]

// This is to mimic what GitHub does so that anchors work in an offline
// rendering too.
:idprefix:
:idseparator: -

// Other attributes
:py3: Python{nbsp}3

= Normand
Philippe Proulx

image::normand-logo.png[]

[.normal]
image:https://img.shields.io/pypi/v/normand.svg?label=Latest%20version[link="https://pypi.python.org/pypi/normand"]

[.lead]
_**Normand**_ is a text-to-binary processor with its own language.

This package offers both a portable {py3} module and a command-line
tool.

WARNING: This version of Normand is 0.6, meaning both the Normand
language and the module/CLI interface aren't stable.

ifdef::env-github[]
// ToC location for a GitHub rendering
toc::[]
endif::env-github[]

== Introduction

The purpose of Normand is to consume human-readable text representing
bytes and to produce the corresponding binary data.

.Simple bytes input.
====
Consider the following Normand input:

----
4f 55 32 bb $167 fe %10100111 a9 $-32
----

The generated nine bytes are:

----
4f 55 32 bb a7 fe a7 a9 e0
----
====

As you can see in the last example, the fundamental unit of the Normand
language is the _byte_. The order in which you list bytes will be the
order of the generated data.

The Normand language is more than simple lists of bytes, though. Its
main features are:

Comments, including a bunch of insignificant symbols which may improve readability::
+
Input:
+
----
ff bb %1101:0010 # This is a comment
78 29 af $192 # This too # 99 $-80
fe80::6257:18ff:fea3:4229
60:57:18:a3:42:29
10839636-5d65-4a68-8e6a-21608ddf7258
----
+
Output:
+
----
ff bb d2 78 29 af c0 99 b0 fe 80 62 57 18 ff fe
a3 42 29 60 57 18 a3 42 29 10 83 96 36 5d 65 4a
68 8e 6a 21 60 8d df 72 58
----

Hexadecimal, decimal, and binary byte constants::
+
Input:
+
----
aa bb $247 $-89 %0011_0010 %11.01= 10/10
----
+
Output:
+
----
aa bb f7 a7 32 da
----

UTF-8, UTF-16, and UTF-32 literal strings::
+
Input:
+
----
"hello world!" 00
u16le"stress\nverdict 🤣"
----
+
Output:
+
----
68 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 73 00 74 ┆ hello world!•s•t
00 72 00 65 00 73 00 73 00 0a 00 76 00 65 00 72 ┆ •r•e•s•s•••v•e•r
00 64 00 69 00 63 00 74 00 20 00 3e d8 23 dd ┆ •d•i•c•t• •>•#•
----

Labels: special variables holding the offset where they're defined::
+
----
<beg> b2 52 e3 bc 91 05
$100 $50 <chair> 33 9f fe
25 e9 89 8a <end>
----

Variables::
+
----
5e 65 {tower = 47} c6 7f f2 c4
44 {hurl = tower - 14} b5 {tower = hurl} 26 2d
----
+
The value of a variable assignment is the evaluation of a valid {py3}
expression which may include label and variable names.

Fixed-length number with a given length (8{nbsp}bits to 64{nbsp}bits) and byte order::
+
Input:
+
----
{strength = 4}
{be} 67 <lbl> 44 $178 {(end - lbl) * 8 + strength : 16} $99 <end>
{le} {-1993 : 32}
{-3.141593 : 64}
----
+
Output:
+
----
67 44 b2 00 2c 63 37 f8 ff ff 7f bd c2 82 fb 21
09 c0
----
+
The encoded number is the evaluation of a valid {py3} expression which
may include label and variable names.

https://en.wikipedia.org/wiki/LEB128[LEB128] integer::
+
Input:
+
----
aa bb cc {-1993 : sleb128} <meow> dd ee ff
{meow * 199 : uleb128}
----
+
Output:
+
----
aa bb cc b7 70 dd ee ff e3 07
----
+
The encoded integer is the evaluation of a valid {py3} expression which
may include label and variable names.

Repetition::
+
Input:
+
----
aa bb * 5 cc <zoom> "yeah\0" * {zoom * 3}
----
+
Output:
+
----
aa bb bb bb bb bb cc 79 65 61 68 00 79 65 61 68 ┆ •••••••yeah•yeah
00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 ┆ yeah•yeah•yeah•y
65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 ┆ eah•yeah•yeah•ye
61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 ┆ ah•yeah•yeah•yea
68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 ┆ h•yeah•yeah•yeah
00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
----

Multilevel grouping::
+
Input:
+
----
ff ((aa bb "zoom" cc) * 5) * 3 $-34 * 4
----
+
Output:
+
----
ff aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa ┆ •••zoom•••zoom••
bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a ┆ •zoom•••zoom•••z
6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f ┆ oom•••zoom•••zoo
6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc ┆ m•••zoom•••zoom•
aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb ┆ ••zoom•••zoom•••
7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f ┆ zoom•••zoom•••zo
6f 6d cc aa bb 7a 6f 6f 6d cc de de de de ┆ om•••zoom•••••
----

Precise error reporting::
+
----
/tmp/meow.normand:10:24 - Expecting a bit (`0` or `1`).
----
+
----
/tmp/meow.normand:32:6 - Unexpected character `k`.
----
+
----
/tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`mix`, `zoom`}.
----
+
----
/tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE` at byte offset 45.
----

You can use Normand to track data source files in your favorite VCS
instead of raw binary files. The binary files that Normand generates
can then serve, for example, to test file format decoding (including
with malformed data) or for education.

See <<learn-normand>> to explore all the Normand features.

== Install Normand

Normand requires Python ≥ 3.4.

To install Normand:

----
$ python3 -m pip install --user normand
----

See
https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-to-the-user-site[Installing to the User Site]
to learn more about a user site installation.

[NOTE]
====
Normand has a single module file, `normand.py`, which you can copy as is
to your project to use it (both the <<python3-api,`normand.parse()`>>
function and the <<command-line-tool,command-line tool>>).

`normand.py` has _no external dependencies_, but if you're using
Python{nbsp}3.4, you'll need a local copy of the standard `typing`
module.
====

== Learn Normand

A Normand text input is a sequence of items which represent a sequence
of raw bytes.

[[state]] During the processing of items to data, Normand relies on a
current state:

[%header%autowidth]
|===
|State variable |Description |Initial value: <<python3-api,{py3} API>> |Initial value: <<command-line-tool,CLI>>

|[[cur-offset]] Current offset
|
The current offset has an effect on the value of <<label,labels>> and of
the special `ICITTE` name in <<fixed-length-number,fixed-length
number>>, <<leb128-integer,LEB128 integer>>, and
<<variable-assignment,variable assignment>> expression evaluation.

Each generated byte increments the current offset.

A <<current-offset-setting,current offset setting>> may change the
current offset.
|`init_offset` parameter of the `parse()` function.
|`--offset` option.

|[[cur-bo]] Current byte order
|
The current byte order has an effect on the encoding of
<<fixed-length-number,fixed-length numbers>>.

A <<current-byte-order-setting,current byte order setting>> may change
the current byte order.
|`init_byte_order` parameter of the `parse()` function.
|`--byte-order` option.

|<<label,Labels>>
|Mapping of label names to integral values.
|`init_labels` parameter of the `parse()` function.
|One or more `--label` options.

|<<variable-assignment,Variables>>
|Mapping of variable names to integral values.
|`init_variables` parameter of the `parse()` function.
|One or more `--var` options.
|===

The available items are:

* A <<byte-constant,constant integer>> representing a single byte.

* A <<literal-string,literal string>> representing a sequence of bytes
encoding UTF-8, UTF-16, or UTF-32 data.

* A <<current-byte-order-setting,current byte order setting>> (big or
little endian).

* A <<fixed-length-number,fixed-length number>> (integer or
floating point) using the <<cur-bo,current byte order>> and of which
the value is the result of a {py3} expression.

* An <<leb128-integer,LEB128 integer>> of which the value is the result
of a {py3} expression.

* A <<current-offset-setting,current offset setting>>.

* A <<label,label>>, that is, a named constant holding the current
offset.
+
This is similar to an assembly label.

* A <<variable-assignment,variable assignment>> associating a name to
the integral result of an evaluated {py3} expression.

* A <<group,group>>, that is, a scoped sequence of items.

Moreover, you can <<repetition,repeat>> any item above, except a current
offset setting or a label, a given fixed or variable number of times.
This is called a repetition.

A Normand comment may exist:

* Between items, possibly within a group.
* Between the nibbles of a constant hexadecimal byte.
* Between the bits of a constant binary byte.
* Between the last item and the ``pass:[*]`` character of a repetition,
and between that ``pass:[*]`` character and the following number
or expression.

A comment is anything between two ``pass:[#]`` characters on the same
line, or from ``pass:[#]`` until the end of the line. Whitespace
characters and the following symbol characters are also considered
comments where a comment may exist:

----
! @ / \ ? & : ; . , + [ ] _ = | -
----

The latter serve to improve readability so that you may write, for
example, a MAC address or a UUID as is.
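
The following {py3} sketch applies the comment rules above to the
simplest possible input: one made exclusively of hexadecimal byte
constants. `strip_hex_input()` is a hypothetical helper, not part of
the `normand` module, and it deliberately ignores the other item types
(for example, `-` and `=` also appear in decimal constants and variable
assignments, where they aren't comments).

```python
import re

# Hypothetical helper (not part of the `normand` module): handles only
# inputs made of hexadecimal byte constants, comments, whitespace, and
# the insignificant symbol characters.
_COMMENT_RE = re.compile(r"#[^#\n]*(?:#|$)", re.MULTILINE)
_SYMBOLS = frozenset("!@/\\?&:;.,+[]_=|-")


def strip_hex_input(text: str) -> bytes:
    # Replace `# ... #` comments and `# ...` comments (until the end of
    # the line) with a space.
    text = _COMMENT_RE.sub(" ", text)

    # Drop whitespace and insignificant symbol characters.
    hexits = "".join(
        ch for ch in text if not ch.isspace() and ch not in _SYMBOLS
    )

    # What remains must be pairs of hexits, one pair per byte.
    if len(hexits) % 2 != 0:
        raise ValueError("odd number of hexits: {!r}".format(hexits))

    # `bytes.fromhex()` raises `ValueError` for any non-hexit character.
    return bytes.fromhex(hexits)
```

With this sketch, `strip_hex_input("fe80::6257:18ff:fea3:4229")` yields
the same ten bytes as the IPv6 address line of the comment example in
the introduction.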

You can test the examples of this section with the `normand`
<<command-line-tool,command-line tool>> as follows:

----
$ normand file | hexdump -C
----

where `file` is the name of a file containing the Normand input.

=== Byte constant

A _byte constant_ represents a single byte.

A byte constant is one of:

Hexadecimal form::
Two consecutive hexits.

Decimal form::
A decimal number, possibly negative (encoded as two's complement),
after the `$` prefix.

Binary form::
Eight bits after the `%` prefix.

====
Input:

----
ab cd [3d 8F] CC
----

Output:

----
ab cd 3d 8f cc
----
====

====
Input:

----
$192 %1100/0011 $ -77
----

Output:

----
c0 c3 b3
----
====

====
Input:

----
58f64689-6316-4d55-8a1a-04cada366172
fe80::6257:18ff:fea3:4229
----

Output:

----
58 f6 46 89 63 16 4d 55 8a 1a 04 ca da 36 61 72 ┆ X•F•c•MU•••••6ar
fe 80 62 57 18 ff fe a3 42 29 ┆ ••bW••••B)
----
====

====
Input:

----
%01110011 %01100001 %01101100 %01110101 %01110100
----

Output:

----
73 61 6c 75 74 ┆ salut
----
====
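
Here's a {py3} sketch of how the value of a single byte constant follows
from the three forms above. `byte_constant_value()` is hypothetical; it
assumes that comments and insignificant symbols are already removed
(except the `-` sign of a decimal constant) and that a decimal constant
must be within the -128 to 255 range, an assumption which the examples
above only partly confirm.

```python
import string


# Hypothetical helper (not part of the `normand` module). Assumes that
# comments and insignificant symbols are already removed, except the
# `-` sign of a decimal constant, and that a decimal constant must be
# within [-128, 255] (an assumption).
def byte_constant_value(token: str) -> int:
    # Whitespace may separate `$` from its number, as in `$ -77`.
    token = "".join(token.split())

    if token.startswith("$"):
        # Decimal form: a negative value wraps as two's complement.
        value = int(token[1:], 10)

        if not -128 <= value <= 255:
            raise ValueError("{} is outside the 8-bit range".format(value))

        return value & 0xFF

    if token.startswith("%"):
        # Binary form: exactly eight bits.
        bits = token[1:]

        if len(bits) != 8 or any(b not in "01" for b in bits):
            raise ValueError("expecting eight bits: {!r}".format(bits))

        return int(bits, 2)

    # Hexadecimal form: exactly two hexits, in either case.
    if len(token) != 2 or any(c not in string.hexdigits for c in token):
        raise ValueError("expecting two hexits: {!r}".format(token))

    return int(token, 16)
```

For instance, `byte_constant_value("$ -77")` returns `0xb3`, matching
the second example above.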

=== Literal string

A _literal string_ represents the UTF-8-, UTF-16-, or UTF-32-encoded
bytes of a string.

The string to encode isn't implicitly null-terminated: use `\0` at the
end of the string to add a null character.

A literal string is:

. **Optional**: one of the following encodings instead of UTF-8:
+
--
[horizontal]
`u16be`:: UTF-16BE.
`u16le`:: UTF-16LE.
`u32be`:: UTF-32BE.
`u32le`:: UTF-32LE.
--

. The ``pass:["]`` prefix.

. A sequence of zero or more characters, possibly containing escape
sequences.
+
An escape sequence is the ``\`` character followed by one of:
+
--
[horizontal]
`0`:: Null (U+0000)
`a`:: Alert (U+0007)
`b`:: Backspace (U+0008)
`e`:: Escape (U+001B)
`f`:: Form feed (U+000C)
`n`:: End of line (U+000A)
`r`:: Carriage return (U+000D)
`t`:: Character tabulation (U+0009)
`v`:: Line tabulation (U+000B)
``\``:: Reverse solidus (U+005C)
``pass:["]``:: Quotation mark (U+0022)
--

. The ``pass:["]`` suffix.

====
Input:

----
"coucou tout le monde!"
----

Output:

----
63 6f 75 63 6f 75 20 74 6f 75 74 20 6c 65 20 6d ┆ coucou tout le m
6f 6e 64 65 21 ┆ onde!
----
====

====
Input:

----
u16le"I am not young enough to know everything."
----

Output:

----
49 00 20 00 61 00 6d 00 20 00 6e 00 6f 00 74 00 ┆ I• •a•m• •n•o•t•
20 00 79 00 6f 00 75 00 6e 00 67 00 20 00 65 00 ┆  •y•o•u•n•g• •e•
6e 00 6f 00 75 00 67 00 68 00 20 00 74 00 6f 00 ┆ n•o•u•g•h• •t•o•
20 00 6b 00 6e 00 6f 00 77 00 20 00 65 00 76 00 ┆  •k•n•o•w• •e•v•
65 00 72 00 79 00 74 00 68 00 69 00 6e 00 67 00 ┆ e•r•y•t•h•i•n•g•
2e 00 ┆ .•
----
====

====
Input:

----
u32be "\"illusion is the first\nof all pleasures\" 🦉"
----

Output:

----
00 00 00 22 00 00 00 69 00 00 00 6c 00 00 00 6c ┆ •••"•••i•••l•••l
00 00 00 75 00 00 00 73 00 00 00 69 00 00 00 6f ┆ •••u•••s•••i•••o
00 00 00 6e 00 00 00 20 00 00 00 69 00 00 00 73 ┆ •••n••• •••i•••s
00 00 00 20 00 00 00 74 00 00 00 68 00 00 00 65 ┆ ••• •••t•••h•••e
00 00 00 20 00 00 00 66 00 00 00 69 00 00 00 72 ┆ ••• •••f•••i•••r
00 00 00 73 00 00 00 74 00 00 00 0a 00 00 00 6f ┆ •••s•••t•••••••o
00 00 00 66 00 00 00 20 00 00 00 61 00 00 00 6c ┆ •••f••• •••a•••l
00 00 00 6c 00 00 00 20 00 00 00 70 00 00 00 6c ┆ •••l••• •••p•••l
00 00 00 65 00 00 00 61 00 00 00 73 00 00 00 75 ┆ •••e•••a•••s•••u
00 00 00 72 00 00 00 65 00 00 00 73 00 00 00 22 ┆ •••r•••e•••s•••"
00 00 00 20 00 01 f9 89 ┆ ••• ••••
----
====
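
The encodings above map onto standard {py3} codecs, none of which emit
a byte order mark. The following sketch, with hypothetical `unescape()`
and `encode_literal()` helpers (not part of the `normand` module),
processes the escape sequences of the table above and then encodes the
result.

```python
# Hypothetical escape table, built from the list of escape sequences.
_ESCAPES = {
    "0": "\0", "a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v", "\\": "\\", '"': '"',
}

# Hypothetical mapping from the optional encoding prefix to a Python
# codec name; no prefix means UTF-8.
_CODECS = {
    "": "utf_8",
    "u16be": "utf_16_be",
    "u16le": "utf_16_le",
    "u32be": "utf_32_be",
    "u32le": "utf_32_le",
}


def unescape(body: str) -> str:
    # Replace each `\X` escape sequence with its character.
    out = []
    i = 0

    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body) or body[i + 1] not in _ESCAPES:
                raise ValueError("invalid escape sequence at index {}".format(i))

            out.append(_ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1

    return "".join(out)


def encode_literal(prefix: str, text: str) -> bytes:
    # No null terminator is added implicitly.
    return text.encode(_CODECS[prefix])
```

For example, `encode_literal("u32be", "🦉")` gives the last four bytes
of the third example above (`00 01 f9 89`).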

=== Current byte order setting

This special item sets the <<cur-bo,_current byte order_>>.

The two accepted forms are:

[horizontal]
``pass:[{be}]``:: Set the current byte order to big endian.
``pass:[{le}]``:: Set the current byte order to little endian.

=== Fixed-length number

A _fixed-length number_ represents a fixed number of bytes encoding
either:

* An unsigned or signed integer (two's complement).
+
The available lengths are 8, 16, 24, 32, 40, 48, 56, and 64.

* A floating point number
(https://standards.ieee.org/standard/754-2008.html[IEEE{nbsp}754-2008]).
+
The available lengths are 32 (_binary32_) and 64 (_binary64_).

The value is the result of evaluating a {py3} expression, and Normand
encodes it using the <<cur-bo,current byte order>>.

A fixed-length number is:

. The ``pass:[{]`` prefix.

. A valid {py3} expression.
+
For a fixed-length number at some source location{nbsp}__**L**__, this
expression may contain the name of any accessible <<label,label>> (not
within a nested group), including the name of a label defined
after{nbsp}__**L**__, as well as the name of any
<<variable-assignment,variable>> known at{nbsp}__**L**__.
+
The value of the special name `ICITTE` (`int` type) in this expression
is the <<cur-offset,current offset>> (before encoding the number).

. The `:` character.

. An encoding length in bits amongst:
+
--
The expression evaluates to an `int` value::
`8`, `16`, `24`, `32`, `40`, `48`, `56`, and `64`.

The expression evaluates to a `float` value::
`32` and `64`.
--

. The `}` suffix.

====
Input:

----
{le} {345:16}
{be} {-0xabcd:32}
----

Output:

----
59 01 ff ff 54 33
----
====

====
Input:

----
{be}

# String length in bits
{8 * (str_end - str_beg) : 16}

# String
<str_beg>
"hello world!"
<str_end>
----

Output:

----
00 60 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 ┆ •`hello world!
----
====

====
Input:

----
{20 - ICITTE : 8} * 10
----

Output:

----
14 13 12 11 10 0f 0e 0d 0c 0b
----
====

====
Input:

----
{le}
{2 * 0.0529 : 32}
----

Output:

----
ac ad d8 3d
----
====
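
The following {py3} sketch shows one way to produce the bytes of a
fixed-length number from an evaluated value, an encoding length, and a
byte order, using `int.to_bytes()` for integers and the standard
`struct` module for _binary32_ and _binary64_ floating point numbers.
`encode_fixed()` and its `"be"`/`"le"` byte order strings are
hypothetical (the `normand` module exposes a `ByteOrder` enumeration
instead), and the range check accepting both the unsigned and the
signed ranges is an assumption based on the examples above.

```python
import struct


# Hypothetical helper (not part of the `normand` module): `value` is the
# evaluated expression, `length` the encoding length in bits, and
# `byte_order` either "be" or "le" (an assumption).
def encode_fixed(value, length: int, byte_order: str) -> bytes:
    if byte_order not in ("be", "le"):
        raise ValueError("unknown byte order {!r}".format(byte_order))

    if isinstance(value, float):
        # IEEE 754 binary32 (`f`) or binary64 (`d`).
        float_fmts = {32: "f", 64: "d"}

        if length not in float_fmts:
            raise ValueError("invalid floating point length {}".format(length))

        prefix = ">" if byte_order == "be" else "<"
        return struct.pack(prefix + float_fmts[length], value)

    if length not in range(8, 65, 8):
        raise ValueError("invalid integer length {}".format(length))

    # Accept both the signed and the unsigned ranges (assumption).
    if not -(1 << (length - 1)) <= value < (1 << length):
        raise ValueError(
            "value {} is outside the {}-bit range".format(value, length)
        )

    # Reducing modulo 2^length yields the two's complement of a
    # negative value.
    return (value % (1 << length)).to_bytes(
        length // 8, "big" if byte_order == "be" else "little"
    )
```

For example, `encode_fixed(-0xabcd, 32, "be")` returns the bytes
`ff ff 54 33`, as in the first example above.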

=== LEB128 integer

An _LEB128 integer_ represents a variable number of bytes encoding,
following the https://en.wikipedia.org/wiki/LEB128[LEB128] format, an
unsigned or signed integer which is the result of evaluating a {py3}
expression.

An LEB128 integer is:

. The ``pass:[{]`` prefix.

. A valid {py3} expression.
+
For an LEB128 integer at some source location{nbsp}__**L**__, this
expression may contain:
+
--
* The name of any <<label,label>> defined before{nbsp}__**L**__.
* The name of any <<variable-assignment,variable>> known at{nbsp}__**L**__
which doesn't, directly or indirectly, refer to a label
defined after{nbsp}__**L**__.
--
+
The value of the special name `ICITTE` (`int` type) in this expression
is the <<cur-offset,current offset>> (before encoding the integer).

. The `:` character.

. One of:
+
--
[horizontal]
`uleb128`:: Use the unsigned LEB128 format.
`sleb128`:: Use the signed LEB128 format.
--

. The `}` suffix.

====
Input:

----
{624485 : uleb128}
----

Output:

----
e5 8e 26
----
====

====
Input:

----
aa bb cc dd
<meow>
ee ff
{-981238311 + (meow * -23) : sleb128}
"hello"
----

Output:

----
aa bb cc dd ee ff fd fa 8d ac 7c 68 65 6c 6c 6f ┆ ••••••••••|hello
----
====
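
For reference, here's a standard LEB128 encoder in {py3}, following the
algorithm which the linked Wikipedia page describes; it isn't
necessarily how `normand.py` implements it, and `uleb128()` and
`sleb128()` are hypothetical names. It reproduces the outputs of both
examples above (in the second one, `meow` is 4).

```python
def uleb128(value: int) -> bytes:
    # Unsigned LEB128: 7 bits per byte, least significant group first,
    # bit 7 set on every byte except the last one.
    if value < 0:
        raise ValueError("unsigned LEB128 needs a non-negative value")

    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7

        if value == 0:
            out.append(byte)
            return bytes(out)

        out.append(byte | 0x80)


def sleb128(value: int) -> bytes:
    # Signed LEB128: same grouping, but stop only once the remaining
    # bits are all copies of the sign bit (bit 6 of the current byte).
    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: Python ints keep their sign

        if (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40):
            out.append(byte)
            return bytes(out)

        out.append(byte | 0x80)
```

For example, `uleb128(624485)` returns the bytes `e5 8e 26`.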

=== Current offset setting

This special item sets the <<cur-offset,_current offset_>>.

A current offset setting is:

. The `<` prefix.

. A positive integer (hexadecimal starting with `0x` or `0X` accepted)
which is the new current offset.

. The `>` suffix.

====
Input:

----
{ICITTE : 8} * 8
<0x61> {ICITTE : 8} * 8
----

Output:

----
00 01 02 03 04 05 06 07 61 62 63 64 65 66 67 68 ┆ ••••••••abcdefgh
----
====

====
Input:

----
aa bb cc dd <meow> ee ff
<12> 11 22 33 <mix> 44 55
{meow : 8} {mix : 8}
----

Output:

----
aa bb cc dd ee ff 11 22 33 44 55 04 0f ┆ •••••••"3DU••
----
====

=== Label

A _label_ associates a name with the <<cur-offset,current offset>>.

All the labels of a whole Normand input must have unique names.

A label must not share the name of a <<variable-assignment,variable>>.

A label is:

. The `<` prefix.

. A valid {py3} name which is not `ICITTE` (see
<<fixed-length-number>>, <<leb128-integer>>, and
<<variable-assignment>> to learn more).

. The `>` suffix.

=== Variable assignment

A _variable assignment_ associates a name with the integral result of an
evaluated {py3} expression.

A variable assignment is:

. The ``pass:[{]`` prefix.

. A valid {py3} name which is not `ICITTE` (see
<<fixed-length-number>>, <<leb128-integer>>, and
<<variable-assignment>> to learn more).

. The `=` character.

. A valid {py3} expression.
+
For a variable assignment at some source location{nbsp}__**L**__, this
expression may contain the name of any accessible <<label,label>> (not
within a nested group), including the name of a label defined
after{nbsp}__**L**__, as well as the name of any
<<variable-assignment,variable>> known at{nbsp}__**L**__.
+
The value of the special name `ICITTE` (`int` type) in this expression
is the <<cur-offset,current offset>>.

. The `}` suffix.

====
Input:

----
{mix = 101} {le}
{meow = 42} 11 22 {meow:8} 33 {meow = ICITTE + 17}
"yooo" {meow + mix : 16}
----

Output:

----
11 22 2a 33 79 6f 6f 6f 7a 00 ┆ •"*3yoooz•
----
====

=== Group

A _group_ is a scoped sequence of items.

The <<label,labels>> within a group aren't visible outside of it.

The main purpose of a group is to <<repetition,repeat>> more than a
single item.

A group is:

. The `(` prefix.

. Zero or more items.

. The `)` suffix.

====
Input:

----
((aa bb cc) dd () ee) "leclerc"
----

Output:

----
aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc
----
====

====
Input:

----
((aa bb cc) * 3 dd ee) * 5
----

Output:

----
aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa bb
cc aa bb cc dd ee aa bb cc aa bb cc aa bb cc dd
ee aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa
bb cc aa bb cc dd ee
----
====

====
Input:

----
{be}
(
<str_beg> u16le"sébastien diaz" <str_end>
{ICITTE - str_beg : 8}
{(end - str_beg) * 5 : 24}
) * 3
<end>
----

Output:

----
73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 e0 ┆ n• •d•i•a•z•••••
73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 40 ┆ n• •d•i•a•z••••@
73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 00 a0 ┆ n• •d•i•a•z•••••
----
====

=== Repetition

A _repetition_ represents the bytes of an item repeated a given number
of times.

A repetition is:

. Any item except a <<current-offset-setting,current offset setting>>
or a <<label,label>>.

. The ``pass:[*]`` character.

. One of:

** A positive integer (hexadecimal starting with `0x` or `0X` accepted)
which is the number of times to repeat the previous item.

** The ``pass:[{]`` prefix, a valid {py3} expression, and the
``pass:[}]`` suffix.
+
For a repetition at some source location{nbsp}__**L**__, this expression
may contain:
+
--
* The name of any <<label,label>> defined before{nbsp}__**L**__ and
which isn't part of its repeated item.
* The name of any <<variable-assignment,variable>> known
at{nbsp}__**L**__, which isn't part of its repeated item, and which
doesn't, directly or indirectly, refer to a label defined
after{nbsp}__**L**__.
--
+
This expression must not contain the special name `ICITTE`.

====
Input:

----
{end - ICITTE - 1 : 8} * 0x100 <end>
----

Output:

----
ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"!
1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
----
====

====
Input:

----
{times = 1}
aa bb cc dd
(
<here>
(ee ff) * {here + 1}
11 22 33 * {times}
{times = times + 1}
) * 3
"coucou!"
----

Output:

----
aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
33 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
----
====
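
To make the behavior of the previous example explicit, the following
{py3} sketch recomputes its output by hand: `here` is a label, so it
takes the current offset again on each iteration of the outer group,
while `times` is a variable which keeps its value from one iteration to
the next. This is a plain simulation, not Normand code, and it assumes
an initial offset of 0 so that the length of the generated data is the
current offset.

```python
# Plain simulation of the previous example (not Normand code).
data = bytearray()
times = 1  # {times = 1}
data += bytes.fromhex("aa bb cc dd")

for _ in range(3):  # ( ... ) * 3
    # <here>: with an initial offset of 0, the current offset is the
    # number of bytes generated so far.
    here = len(data)

    # (ee ff) * {here + 1}
    data += bytes.fromhex("ee ff") * (here + 1)

    # 11 22 33 * {times}: the repetition only applies to `33`.
    data += bytes.fromhex("11 22") + bytes.fromhex("33") * times

    # {times = times + 1}
    times = times + 1

data += "coucou!".encode()
```

The resulting `data` holds the same 185 bytes as the output above.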

====
This example shows how to use a repetition as a conditional section
depending on some predefined variable.

Input:

----
aa bb cc dd
(ee ff "meow mix" 00) * {cond}
{be} {-1993:16}
----

Output (`cond` is 0):

----
aa bb cc dd f8 37
----

Output (`cond` is 1):

----
aa bb cc dd ee ff 6d 65 6f 77 20 6d 69 78 00 f8 ┆ ••••••meow mix••
37 ┆ 7
----
====

== Command-line tool

If you <<install-normand,installed>> the `normand` package, then you
can use the `normand` command-line tool:

----
$ normand <<< '"ma gang de malades"' | hexdump -C
----

----
00000000  6d 61 20 67 61 6e 67 20  64 65 20 6d 61 6c 61 64  |ma gang de malad|
00000010  65 73                                             |es|
----

If you copy the `normand.py` module to your own project, then you can
run the module itself:

----
$ python3 -m normand <<< '"ma gang de malades"' | hexdump -C
----

----
00000000  6d 61 20 67 61 6e 67 20  64 65 20 6d 61 6c 61 64  |ma gang de malad|
00000010  65 73                                             |es|
----

Without a path argument, the `normand` tool reads from the standard
input.

The `normand` tool prints the generated binary data to the standard
output.

Various options control the initial <<state,state>> of the processor:
use the `--help` option to learn more.

== {py3} API

The whole `normand` package/module API is:

[source,python]
----
class ByteOrder(enum.Enum):
    # Big endian.
    BE = ...

    # Little endian.
    LE = ...


class TextLoc:
    # Line number.
    @property
    def line_no(self) -> int:
        ...

    # Column number.
    @property
    def col_no(self) -> int:
        ...


class ParseError(RuntimeError):
    # Source text location.
    @property
    def text_loc(self) -> TextLoc:
        ...


SymbolsT = typing.Dict[str, int]


class ParseResult:
    # Generated data.
    @property
    def data(self) -> bytearray:
        ...

    # Updated variable values.
    @property
    def variables(self) -> SymbolsT:
        ...

    # Updated main group label values.
    @property
    def labels(self) -> SymbolsT:
        ...

    # Final offset.
    @property
    def offset(self) -> int:
        ...

    # Final byte order.
    @property
    def byte_order(self) -> typing.Optional[ByteOrder]:
        ...


def parse(normand: str,
          init_variables: typing.Optional[SymbolsT] = None,
          init_labels: typing.Optional[SymbolsT] = None,
          init_offset: int = 0,
          init_byte_order: typing.Optional[ByteOrder] = None) -> ParseResult:
    ...
----

The `normand` parameter is the actual <<learn-normand,Normand input>>
while the other parameters control the initial <<state,state>>.

The `parse()` function raises a `ParseError` instance should it fail to
parse the `normand` string for any reason.

== Development

Normand is a https://python-poetry.org/[Poetry] project.

To develop it, install it through Poetry and enter the virtual
environment:

----
$ poetry install
$ poetry shell
$ normand <<< '"lol" * 10 0a'
----

`normand.py` is processed by:

* https://microsoft.github.io/pyright/[Pyright]
* https://github.com/psf/black[Black]
* https://pycqa.github.io/isort/[isort]

=== Testing

Use https://docs.pytest.org/[pytest] to test Normand once the package is
part of your virtual environment, for example:

----
$ poetry install
$ poetry run pip3 install pytest
$ poetry run pytest
----

The `pytest` project is currently not a development dependency in
`pyproject.toml` due to backward compatibility issues with
Python{nbsp}3.4.

In the `tests` directory, each `*.nt` file is a test. The file name
prefix indicates what it's meant to test:

`pass-`::
Everything above the `---` line is the valid Normand input
to test.
+
Everything below the `---` line is the expected data
(whitespace-separated hexadecimal bytes).

`fail-`::
Everything above the `---` line is the invalid Normand input
to test.
+
Everything below the `---` line is the expected error message having
this form:
+
----
LINE:COL - MESSAGE
----
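
As a sketch of the `*.nt` format above, the following hypothetical
{py3} helpers split a test file at its `---` line and decode the
expected part. They aren't the actual test harness of the `tests`
directory, and they assume that the separator line is exactly `---`.

```python
import re


def parse_nt(text: str):
    # Hypothetical helper: split a `*.nt` file at its `---` line
    # (raises `ValueError` if there's no such line).
    lines = text.splitlines()
    sep = lines.index("---")
    return "\n".join(lines[:sep]), "\n".join(lines[sep + 1:])


def expected_data(expected: str) -> bytes:
    # `pass-` test: whitespace-separated hexadecimal bytes.
    return bytes(int(tok, 16) for tok in expected.split())


def expected_error(expected: str):
    # `fail-` test: `LINE:COL - MESSAGE`.
    m = re.match(r"(\d+):(\d+) - (.*)", expected.strip(), re.DOTALL)

    if m is None:
        raise ValueError("malformed expected error: {!r}".format(expected))

    return int(m.group(1)), int(m.group(2)), m.group(3)
```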

=== Contributing

Normand uses https://review.lttng.org/admin/repos/normand,general[Gerrit]
for code review.

To report a bug, https://github.com/efficios/normand/issues/new[create a
GitHub issue].