1 // Show ToC at a specific location for a GitHub rendering
2 ifdef::env-github[]
3 :toc: macro
4 endif::env-github[]
5
6 ifndef::env-github[]
7 :toc: left
8 endif::env-github[]
9
10 // This is to mimic what GitHub does so that anchors work in an offline
11 // rendering too.
12 :idprefix:
13 :idseparator: -
14
15 // Other attributes
16 :py3: Python{nbsp}3
17
18 = Normand
19 Philippe Proulx
20
21 image::normand-logo.png[]
22
23 [.normal]
24 image:https://img.shields.io/pypi/v/normand.svg?label=Latest%20version[link="https://pypi.python.org/pypi/normand"]
25
26 [.lead]
27 _**Normand**_ is a text-to-binary processor with its own language.
28
29 This package offers both a portable {py3} module and a command-line
30 tool.
31
32 WARNING: This version of Normand is 0.5, meaning both the Normand
33 language and the module/CLI interface aren't stable.
34
35 ifdef::env-github[]
36 // ToC location for a GitHub rendering
37 toc::[]
38 endif::env-github[]
39
40 == Introduction
41
42 The purpose of Normand is to consume human-readable text representing
43 bytes and to produce the corresponding binary data.
44
45 .Simple bytes input.
46 ====
47 Consider the following Normand input:
48
49 ----
50 4f 55 32 bb $167 fe %10100111 a9 $-32
51 ----
52
53 The generated nine bytes are:
54
55 ----
56 4f 55 32 bb a7 fe a7 a9 e0
57 ----
58 ====
59
As the example above shows, the fundamental unit of the Normand
language is the _byte_. The order in which you list bytes is the
order of the generated data.
63
64 The Normand language is more than simple lists of bytes, though. Its
65 main features are:
66
67 Comments, including a bunch of insignificant symbols which may improve readability::
68 +
69 Input:
70 +
71 ----
72 ff bb %1101:0010 # This is a comment
73 78 29 af $192 # This too # 99 $-80
74 fe80::6257:18ff:fea3:4229
75 60:57:18:a3:42:29
76 10839636-5d65-4a68-8e6a-21608ddf7258
77 ----
78 +
79 Output:
80 +
81 ----
82 ff bb d2 78 29 af c0 99 b0 fe 80 62 57 18 ff fe
83 a3 42 29 60 57 18 a3 42 29 10 83 96 36 5d 65 4a
84 68 8e 6a 21 60 8d df 72 58
85 ----
86
87 Hexadecimal, decimal, and binary byte constants::
88 +
89 Input:
90 +
91 ----
92 aa bb $247 $-89 %0011_0010 %11.01= 10/10
93 ----
94 +
95 Output:
96 +
97 ----
98 aa bb f7 a7 32 da
99 ----
100
101 UTF-8, UTF-16, and UTF-32 literal strings::
102 +
103 Input:
104 +
105 ----
106 "hello world!" 00
107 u16le"stress\nverdict 🤣"
108 ----
109 +
110 Output:
111 +
112 ----
113 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 73 00 74 ┆ hello world!•s•t
114 00 72 00 65 00 73 00 73 00 0a 00 76 00 65 00 72 ┆ •r•e•s•s•••v•e•r
115 00 64 00 69 00 63 00 74 00 20 00 3e d8 23 dd ┆ •d•i•c•t• •>•#•
116 ----
117
118 Labels: special variables holding the offset where they're defined::
119 +
120 ----
121 <beg> b2 52 e3 bc 91 05
122 $100 $50 <chair> 33 9f fe
123 25 e9 89 8a <end>
124 ----
125
126 Variables::
127 +
128 ----
129 5e 65 {tower = 47} c6 7f f2 c4
130 44 {hurl = tower - 14} b5 {tower = hurl} 26 2d
131 ----
132 +
133 The value of a variable assignment is the evaluation of a valid {py3}
134 expression which may include label and variable names.
135
136 Fixed-length integer with a given length (8{nbsp}bits to 64{nbsp}bits) and byte order::
137 +
138 Input:
139 +
140 ----
141 {strength = 4}
142 {be} 67 <lbl> 44 $178 {(end - lbl) * 8 + strength : 16} $99 <end>
143 {le} {-1993 : 32}
144 ----
145 +
146 Output:
147 +
148 ----
149 67 44 b2 00 2c 63 37 f8 ff ff
150 ----
151 +
152 The encoded integer is the evaluation of a valid {py3} expression which
153 may include label and variable names.
154
155 https://en.wikipedia.org/wiki/LEB128[LEB128] integer::
156 +
157 Input:
158 +
159 ----
160 aa bb cc {-1993 : sleb128} <meow> dd ee ff
161 {meow * 199 : uleb128}
162 ----
163 +
164 Output:
165 +
166 ----
167 aa bb cc b7 70 dd ee ff e3 07
168 ----
169 +
170 The encoded integer is the evaluation of a valid {py3} expression which
171 may include label and variable names.
172
173 Repetition::
174 +
175 Input:
176 +
177 ----
178 aa bb * 5 cc <zoom> "yeah\0" * {zoom * 3}
179 ----
180 +
181 Output:
182 +
183 ----
184 aa bb bb bb bb bb cc 79 65 61 68 00 79 65 61 68 ┆ •••••••yeah•yeah
185 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
186 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 ┆ yeah•yeah•yeah•y
187 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 ┆ eah•yeah•yeah•ye
188 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 ┆ ah•yeah•yeah•yea
189 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 ┆ h•yeah•yeah•yeah
190 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
191 ----
192
193
194 Multilevel grouping::
195 +
196 Input:
197 +
198 ----
199 ff ((aa bb "zoom" cc) * 5) * 3 $-34 * 4
200 ----
201 +
202 Output:
203 +
204 ----
205 ff aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa ┆ •••zoom•••zoom••
206 bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a ┆ •zoom•••zoom•••z
207 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f ┆ oom•••zoom•••zoo
208 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc ┆ m•••zoom•••zoom•
209 aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb ┆ ••zoom•••zoom•••
210 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f ┆ zoom•••zoom•••zo
211 6f 6d cc aa bb 7a 6f 6f 6d cc de de de de ┆ om•••zoom•••••
212 ----
213
214 Precise error reporting::
215 +
216 ----
217 /tmp/meow.normand:10:24 - Expecting a bit (`0` or `1`).
218 ----
219 +
220 ----
221 /tmp/meow.normand:32:6 - Unexpected character `k`.
222 ----
223 +
224 ----
225 /tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`mix`, `zoom`}.
226 ----
227 +
228 ----
229 /tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE` at byte offset 45.
230 ----
231
You can use Normand to track data source files in your favorite VCS
instead of raw binary files. The binary files that Normand generates are
useful to test file format decoders (including with malformed data),
for example, as well as for education.
236
237 See <<learn-normand>> to explore all the Normand features.
238
239 == Install Normand
240
241 Normand requires Python ≥ 3.4.
242
243 To install Normand:
244
245 ----
246 $ python3 -m pip install --user normand
247 ----
248
249 See
250 https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-to-the-user-site[Installing to the User Site]
251 to learn more about a user site installation.
252
253 [NOTE]
254 ====
255 Normand has a single module file, `normand.py`, which you can copy as is
256 to your project to use it (both the <<python-3-api,`normand.parse()`>>
257 function and the <<command-line-tool,command-line tool>>).
258
259 `normand.py` has _no external dependencies_, but if you're using
260 Python{nbsp}3.4, you'll need a local copy of the standard `typing`
261 module.
262 ====
263
264 == Learn Normand
265
266 A Normand text input is a sequence of items which represent a sequence
267 of raw bytes.
268
269 [[state]] During the processing of items to data, Normand relies on a
270 current state:
271
272 [%header%autowidth]
273 |===
274 |State variable |Description |Initial value: <<python-3-api,{py3} API>> |Initial value: <<command-line-tool,CLI>>
275
276 |[[cur-offset]] Current offset
277 |
278 The current offset has an effect on the value of <<label,labels>> and of
279 the special `ICITTE` name in <<fixed-length-integer,fixed-length
integer>>, <<leb128-integer,LEB128 integer>>, and
281 <<variable-assignment,variable assignment>> expression evaluation.
282
283 Each generated byte increments the current offset.
284
285 A <<current-offset-setting,current offset setting>> may change the
286 current offset.
287 |`init_offset` parameter of the `parse()` function.
288 |`--offset` option.
289
290 |[[cur-bo]] Current byte order
291 |
292 The current byte order has an effect on the encoding of
293 <<fixed-length-integer,fixed-length integers>>.
294
295 A <<current-byte-order-setting,current byte order setting>> may change
296 the current byte order.
297 |`init_byte_order` parameter of the `parse()` function.
298 |`--byte-order` option.
299
300 |<<label,Labels>>
301 |Mapping of label names to integral values.
302 |`init_labels` parameter of the `parse()` function.
303 |One or more `--label` options.
304
305 |<<variable-assignment,Variables>>
306 |Mapping of variable names to integral values.
307 |`init_variables` parameter of the `parse()` function.
308 |One or more `--var` options.
309 |===
310
311 The available items are:
312
313 * A <<byte-constant,constant integer>> representing a single byte.
314
315 * A <<literal-string,literal string>> representing a sequence of bytes
316 encoding UTF-8, UTF-16, or UTF-32 data.
317
318 * A <<current-byte-order-setting,current byte order setting>> (big or
319 little endian).
320
321 * A <<fixed-length-integer,fixed-length integer>> using the
322 <<cur-bo,current byte order>> and of which the value is the result of
323 a {py3} expression.
324
325 * An <<leb128-integer,LEB128 integer>> of which the value is the result
326 of a {py3} expression.
327
328 * A <<current-offset-setting,current offset setting>>.
329
330 * A <<label,label>>, that is, a named constant holding the current
331 offset.
332 +
333 This is similar to an assembly label.
334
335 * A <<variable-assignment,variable assignment>> associating a name to
336 the integral result of an evaluated {py3} expression.
337
338 * A <<group,group>>, that is, a scoped sequence of items.
339
Moreover, you can <<repetition,repeat>> any item above, except a
<<current-offset-setting,current offset setting>> or a <<label,label>>,
a given fixed or variable number of times. This is called a repetition.
343
344 A Normand comment may exist:
345
346 * Between items, possibly within a group.
347 * Between the nibbles of a constant hexadecimal byte.
348 * Between the bits of a constant binary byte.
349 * Between the last item and the ``pass:[*]`` character of a repetition,
350 and between that ``pass:[*]`` character and the following number
351 or expression.
352
353 A comment is anything between two ``pass:[#]`` characters on the same
line, or from ``pass:[#]`` until the end of the line. Whitespace and
the following symbol characters are also considered comments where a
comment may exist:
357
358 ----
359 ! @ / \ ? & : ; . , + [ ] _ = | -
360 ----
361
362 The latter serve to improve readability so that you may write, for
363 example, a MAC address or a UUID as is.
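
To illustrate why those symbols are convenient, the following {py3}
sketch mimics their effect for hexadecimal byte constants only: it drops
whitespace and the symbol characters listed above, then decodes the
remaining hexits. This isn't Normand's actual parser; `strip_symbols()`
is a hypothetical helper which handles neither ``pass:[#]`` comments nor
any other kind of item.

```python
import re

# Whitespace and the symbol characters which Normand treats as comments.
_INSIGNIFICANT = re.compile(r"[\s!@/\\?&:;.,+\[\]_=|-]")


def strip_symbols(text: str) -> bytes:
    """Decode hexadecimal byte constants, ignoring insignificant symbols.

    Hypothetical helper: hexadecimal byte constants only.
    """
    return bytes.fromhex(_INSIGNIFICANT.sub("", text))


# A MAC address and an IPv6 address written as is.
print(strip_symbols("60:57:18:a3:42:29").hex())
print(strip_symbols("fe80::6257:18ff:fea3:4229").hex())
```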
364
365 You can test the examples of this section with the `normand`
366 <<command-line-tool,command-line tool>> as such:
367
368 ----
369 $ normand file | hexdump -C
370 ----
371
372 where `file` is the name of a file containing the Normand input.
373
374 === Byte constant
375
376 A _byte constant_ represents a single byte.
377
378 A byte constant is:
379
380 Hexadecimal form::
381 Two consecutive hexits.
382
383 Decimal form::
384 A decimal number after the `$` prefix.
385
386 Binary form::
387 Eight bits after the `%` prefix.
388
389 ====
390 Input:
391
392 ----
393 ab cd [3d 8F] CC
394 ----
395
396 Output:
397
398 ----
399 ab cd 3d 8f cc
400 ----
401 ====
402
403 ====
404 Input:
405
406 ----
407 $192 %1100/0011 $ -77
408 ----
409
410 Output:
411
412 ----
413 c0 c3 b3
414 ----
415 ====
416
417 ====
418 Input:
419
420 ----
421 58f64689-6316-4d55-8a1a-04cada366172
422 fe80::6257:18ff:fea3:4229
423 ----
424
425 Output:
426
427 ----
428 58 f6 46 89 63 16 4d 55 8a 1a 04 ca da 36 61 72 ┆ X•F•c•MU•••••6ar
429 fe 80 62 57 18 ff fe a3 42 29 ┆ ••bW••••B)
430 ----
431 ====
432
433 ====
434 Input:
435
436 ----
437 %01110011 %01100001 %01101100 %01110101 %01110100
438 ----
439
440 Output:
441
442 ----
443 73 61 6c 75 74 ┆ salut
444 ----
445 ====
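
The three forms map to plain integer conversions. The {py3} sketch below
shows one way to compute the byte value of each form; the function names
are hypothetical, and the accepted decimal range (-128 to 255) is an
assumption based on what fits in a single byte, not something this
document specifies.

```python
def byte_from_hex(hexits: str) -> int:
    """Hexadecimal form: two consecutive hexits (case-insensitive)."""
    if len(hexits) != 2:
        raise ValueError("expecting exactly two hexits")
    return int(hexits, 16)


def byte_from_decimal(text: str) -> int:
    """Decimal form: the number following the `$` prefix.

    Assumption: values from -128 to 255 are accepted; a negative value
    becomes its two's complement byte.
    """
    value = int(text)
    if not -128 <= value <= 255:
        raise ValueError("value {} doesn't fit in one byte".format(value))
    return value & 0xFF


def byte_from_binary(bits: str) -> int:
    """Binary form: the eight bits following the `%` prefix."""
    if len(bits) != 8:
        raise ValueError("expecting exactly eight bits")
    return int(bits, 2)


# Same bytes as the `$192 %1100/0011 $ -77` example above.
data = bytes([
    byte_from_decimal("192"),
    byte_from_binary("11000011"),
    byte_from_decimal("-77"),
])
print(data.hex())  # c0c3b3
```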
446
447 === Literal string
448
449 A _literal string_ represents the UTF-8-, UTF-16-, or UTF-32-encoded
450 bytes of a string.
451
452 The string to encode isn't implicitly null-terminated: use `\0` at the
453 end of the string to add a null character.
454
455 A literal string is:
456
457 . **Optional**: one of the following encodings instead of UTF-8:
458 +
459 --
460 [horizontal]
461 `u16be`:: UTF-16BE.
462 `u16le`:: UTF-16LE.
463 `u32be`:: UTF-32BE.
464 `u32le`:: UTF-32LE.
465 --
466
467 . The ``pass:["]`` prefix.
468
469 . A sequence of zero or more characters, possibly containing escape
470 sequences.
471 +
472 An escape sequence is the ``\`` character followed by one of:
473 +
474 --
475 [horizontal]
476 `0`:: Null (U+0000)
477 `a`:: Alert (U+0007)
478 `b`:: Backspace (U+0008)
479 `e`:: Escape (U+001B)
480 `f`:: Form feed (U+000C)
481 `n`:: End of line (U+000A)
482 `r`:: Carriage return (U+000D)
483 `t`:: Character tabulation (U+0009)
484 `v`:: Line tabulation (U+000B)
485 ``\``:: Reverse solidus (U+005C)
486 ``pass:["]``:: Quotation mark (U+0022)
487 --
488
489 . The ``pass:["]`` suffix.
490
491 ====
492 Input:
493
494 ----
495 "coucou tout le monde!"
496 ----
497
498 Output:
499
500 ----
501 63 6f 75 63 6f 75 20 74 6f 75 74 20 6c 65 20 6d ┆ coucou tout le m
502 6f 6e 64 65 21 ┆ onde!
503 ----
504 ====
505
506 ====
507 Input:
508
509 ----
510 u16le"I am not young enough to know everything."
511 ----
512
513 Output:
514
515 ----
516 49 00 20 00 61 00 6d 00 20 00 6e 00 6f 00 74 00 ┆ I• •a•m• •n•o•t•
517 20 00 79 00 6f 00 75 00 6e 00 67 00 20 00 65 00 ┆ •y•o•u•n•g• •e•
518 6e 00 6f 00 75 00 67 00 68 00 20 00 74 00 6f 00 ┆ n•o•u•g•h• •t•o•
519 20 00 6b 00 6e 00 6f 00 77 00 20 00 65 00 76 00 ┆ •k•n•o•w• •e•v•
520 65 00 72 00 79 00 74 00 68 00 69 00 6e 00 67 00 ┆ e•r•y•t•h•i•n•g•
521 2e 00 ┆ .•
522 ----
523 ====
524
525 ====
526 Input:
527
528 ----
529 u32be "\"illusion is the first\nof all pleasures\" 🦉"
530 ----
531
532 Output:
533
534 ----
535 00 00 00 22 00 00 00 69 00 00 00 6c 00 00 00 6c ┆ •••"•••i•••l•••l
536 00 00 00 75 00 00 00 73 00 00 00 69 00 00 00 6f ┆ •••u•••s•••i•••o
537 00 00 00 6e 00 00 00 20 00 00 00 69 00 00 00 73 ┆ •••n••• •••i•••s
538 00 00 00 20 00 00 00 74 00 00 00 68 00 00 00 65 ┆ ••• •••t•••h•••e
539 00 00 00 20 00 00 00 66 00 00 00 69 00 00 00 72 ┆ ••• •••f•••i•••r
540 00 00 00 73 00 00 00 74 00 00 00 0a 00 00 00 6f ┆ •••s•••t•••••••o
541 00 00 00 66 00 00 00 20 00 00 00 61 00 00 00 6c ┆ •••f••• •••a•••l
542 00 00 00 6c 00 00 00 20 00 00 00 70 00 00 00 6c ┆ •••l••• •••p•••l
543 00 00 00 65 00 00 00 61 00 00 00 73 00 00 00 75 ┆ •••e•••a•••s•••u
544 00 00 00 72 00 00 00 65 00 00 00 73 00 00 00 22 ┆ •••r•••e•••s•••"
545 00 00 00 20 00 01 f9 89 ┆ ••• ••••
546 ----
547 ====
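
The encoding prefixes correspond to standard {py3} codecs, so you can
cross-check an expected output with `str.encode()`. The sketch below
assumes that mapping; `ENCODINGS` and `encode_literal()` are hypothetical
names, and the text is passed with its escape sequences already
resolved.

```python
# Normand encoding prefix -> Python codec name (the BE/LE codecs add no BOM).
ENCODINGS = {
    "": "utf-8",
    "u16be": "utf-16-be",
    "u16le": "utf-16-le",
    "u32be": "utf-32-be",
    "u32le": "utf-32-le",
}


def encode_literal(text: str, prefix: str = "") -> bytes:
    """Encode `text` like a literal string having the given prefix."""
    return text.encode(ENCODINGS[prefix])


print(encode_literal("coucou tout le monde!").hex())

# U+1F989 (the owl of the UTF-32BE example above).
print(encode_literal("\U0001F989", "u32be").hex())  # 0001f989
```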
548
549 === Current byte order setting
550
551 This special item sets the <<cur-bo,_current byte order_>>.
552
553 The two accepted forms are:
554
555 [horizontal]
556 ``pass:[{be}]``:: Set the current byte order to big endian.
557 ``pass:[{le}]``:: Set the current byte order to little endian.
558
559 === Fixed-length integer
560
561 A _fixed-length integer_ represents a fixed number of bytes encoding an
562 unsigned or signed integer which is the result of evaluating a {py3}
563 expression using the <<cur-bo,current byte order>>.
564
565 A fixed-length integer is:
566
567 . The ``pass:[{]`` prefix.
568
569 . A valid {py3} expression.
570 +
571 For a fixed-length integer at some source location{nbsp}__**L**__, this
572 expression may contain the name of any accessible <<label,label>> (not
573 within a nested group), including the name of a label defined
574 after{nbsp}__**L**__, as well as the name of any
575 <<variable-assignment,variable>> known at{nbsp}__**L**__.
576 +
577 The value of the special name `ICITTE` in this expression is the
578 <<cur-offset,current offset>> (before encoding the integer).
579
580 . The `:` character.
581
582 . An encoding length in bits amongst `8`, `16`, `24`, `32`, `40`,
583 `48`, `56`, and `64`.
584
585 . The `}` suffix.
586
587 ====
588 Input:
589
590 ----
591 {le} {345:16}
592 {be} {-0xabcd:32}
593 ----
594
595 Output:
596
597 ----
598 59 01 ff ff 54 33
599 ----
600 ====
601
602 ====
603 Input:
604
605 ----
606 {be}
607
608 # String length in bits
609 {8 * (str_end - str_beg) : 16}
610
611 # String
612 <str_beg>
613 "hello world!"
614 <str_end>
615 ----
616
617 Output:
618
619 ----
620 00 60 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 ┆ •`hello world!
621 ----
622 ====
623
624 ====
625 Input:
626
627 ----
628 {20 - ICITTE : 8} * 10
629 ----
630
631 Output:
632
633 ----
634 14 13 12 11 10 0f 0e 0d 0c 0b
635 ----
636 ====
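
The resulting bytes match what the standard `int.to_bytes()` method
produces. The following {py3} sketch reproduces the first example above;
`encode_fixed()` is a hypothetical helper, and choosing the signed
representation only when the value is negative is an assumption of this
sketch.

```python
def encode_fixed(value: int, bits: int, byte_order: str) -> bytes:
    """Encode `value` on `bits` bits; `byte_order` is "big" or "little".

    Raises `OverflowError` if `value` doesn't fit.
    """
    if bits not in range(8, 65, 8):
        raise ValueError("length must be a multiple of 8 from 8 to 64")

    # Assumption: use two's complement only for negative values.
    return value.to_bytes(bits // 8, byte_order, signed=value < 0)


print(encode_fixed(345, 16, "little").hex())   # 5901
print(encode_fixed(-0xabcd, 32, "big").hex())  # ffff5433
```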
637
638 === LEB128 integer
639
640 An _LEB128 integer_ represents a variable number of bytes encoding an
641 unsigned or signed integer which is the result of evaluating a {py3}
642 expression following the https://en.wikipedia.org/wiki/LEB128[LEB128]
643 format.
644
645 An LEB128 integer is:
646
647 . The ``pass:[{]`` prefix.
648
649 . A valid {py3} expression.
650 +
651 For an LEB128 integer at some source location{nbsp}__**L**__, this
652 expression may contain:
653 +
654 --
655 * The name of any <<label,label>> defined before{nbsp}__**L**__.
656 * The name of any <<variable-assignment,variable>> known at{nbsp}__**L**__
657 which doesn't, directly or indirectly, refer to a label
658 defined after{nbsp}__**L**__.
659 --
660 +
661 The value of the special name `ICITTE` in this expression is the
662 <<cur-offset,current offset>> (before encoding the integer).
663
664 . The `:` character.
665
666 . One of:
667 +
668 --
669 [horizontal]
670 `uleb128`:: Use the unsigned LEB128 format.
671 `sleb128`:: Use the signed LEB128 format.
672 --
673
674 . The `}` suffix.
675
676 ====
677 Input:
678
679 ----
680 {624485 : uleb128}
681 ----
682
683 Output:
684
685 ----
686 e5 8e 26
687 ----
688 ====
689
690 ====
691 Input:
692
693 ----
694 aa bb cc dd
695 <meow>
696 ee ff
697 {-981238311 + (meow * -23) : sleb128}
698 "hello"
----

Output:

----
702 aa bb cc dd ee ff fd fa 8d ac 7c 68 65 6c 6c 6f ┆ ••••••••••|hello
703 ----
704 ====
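
For reference, here's a minimal {py3} sketch of the LEB128 format
itself, following its public description: seven payload bits per byte,
least significant group first, with the high bit set on every byte
except the last. The `uleb128()` and `sleb128()` functions are
illustrative and not taken from `normand.py`.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")

    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7

        if value:
            # More groups follow: set the continuation bit.
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def sleb128(value: int) -> bytes:
    """Encode an integer as signed LEB128."""
    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7  # Arithmetic shift: keeps the sign.

        # Done when the remaining value is pure sign extension of bit 6.
        done = (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)

        if done:
            return bytes(out)


print(uleb128(624485).hex())  # e58e26
print(sleb128(-1993).hex())   # b770
```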
705
706 === Current offset setting
707
708 This special item sets the <<cur-offset,_current offset_>>.
709
710 A current offset setting is:
711
712 . The `<` prefix.
713
714 . A positive integer (hexadecimal starting with `0x` or `0X` accepted)
715 which is the new current offset.
716
717 . The `>` suffix.
718
719 ====
720 Input:
721
722 ----
723 {ICITTE : 8} * 8
724 <0x61> {ICITTE : 8} * 8
725 ----
726
727 Output:
728
729 ----
730 00 01 02 03 04 05 06 07 61 62 63 64 65 66 67 68 ┆ ••••••••abcdefgh
731 ----
732 ====
733
734 ====
735 Input:
736
737 ----
738 aa bb cc dd <meow> ee ff
739 <12> 11 22 33 <mix> 44 55
740 {meow : 8} {mix : 8}
741 ----
742
743 Output:
744
745 ----
746 aa bb cc dd ee ff 11 22 33 44 55 04 0f ┆ •••••••"3DU••
747 ----
748 ====
749
750 === Label
751
752 A _label_ associates a name to the <<cur-offset,current offset>>.
753
754 All the labels of a whole Normand input must have unique names.
755
A label must not have the same name as a
<<variable-assignment,variable>>.
758
759 A label is:
760
761 . The `<` prefix.
762
763 . A valid {py3} name which is not `ICITTE` (see
764 <<fixed-length-integer>>, <<leb128-integer>>, and
765 <<variable-assignment>> to learn more).
766
767 . The `>` suffix.
768
769 === Variable assignment
770
771 A _variable assignment_ associates a name to the integral result of an
772 evaluated {py3} expression.
773
774 A variable assignment is:
775
776 . The ``pass:[{]`` prefix.
777
778 . A valid {py3} name which is not `ICITTE` (see
779 <<fixed-length-integer>>, <<leb128-integer>>, and
780 <<variable-assignment>> to learn more).
781
782 . The `=` character.
783
784 . A valid {py3} expression.
785 +
786 For a variable assignment at some source location{nbsp}__**L**__, this
787 expression may contain the name of any accessible <<label,label>> (not
788 within a nested group), including the name of a label defined
789 after{nbsp}__**L**__, as well as the name of any
790 <<variable-assignment,variable>> known at{nbsp}__**L**__.
791 +
792 The value of the special name `ICITTE` in this expression is the
793 <<cur-offset,current offset>>.
794
795 . The `}` suffix.
796
797 ====
798 Input:
799
800 ----
801 {mix = 101} {le}
802 {meow = 42} 11 22 {meow:8} 33 {meow = ICITTE + 17}
803 "yooo" {meow + mix : 16}
804 ----
805
806 Output:
807
808 ----
809 11 22 2a 33 79 6f 6f 6f 7a 00 ┆ •"*3yoooz•
810 ----
811 ====
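
One plausible way to picture the evaluation step is the built-in
`eval()` function with a namespace holding only the known names and
`ICITTE`. The {py3} sketch below follows the example above; it's an
assumption about the general approach, not a description of how
`normand.py` evaluates expressions, and `evaluate()` is a hypothetical
name.

```python
def evaluate(expr: str, symbols: dict, offset: int) -> int:
    """Evaluate `expr` with the known names and `ICITTE` set to `offset`."""
    names = dict(symbols)
    names["ICITTE"] = offset

    # Empty builtins: only the provided names are reachable (this keeps
    # the sketch simple; it isn't a security sandbox).
    value = eval(expr, {"__builtins__": {}}, names)

    if not isinstance(value, int):
        raise TypeError("expression must evaluate to an integer")

    return value


variables = {"mix": 101, "meow": 42}

# `{meow = ICITTE + 17}` at offset 4 (after `11 22 {meow:8} 33`).
variables["meow"] = evaluate("ICITTE + 17", variables, offset=4)

# `{meow + mix : 16}` at offset 8 (after "yooo").
print(evaluate("meow + mix", variables, offset=8))  # 122
```

An unknown name makes `eval()` raise `NameError`, which loosely
corresponds to the "illegal variable/label name" error shown in the
introduction.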
812
813 === Group
814
815 A _group_ is a scoped sequence of items.
816
817 The <<label,labels>> within a group aren't visible outside of it.
818
819 The main purpose of a group is to <<repetition,repeat>> more than a
820 single item.
821
822 A group is:
823
824 . The `(` prefix.
825
826 . Zero or more items.
827
828 . The `)` suffix.
829
830 ====
831 Input:
832
833 ----
834 ((aa bb cc) dd () ee) "leclerc"
835 ----
836
837 Output:
838
839 ----
840 aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc
841 ----
842 ====
843
844 ====
845 Input:
846
847 ----
848 ((aa bb cc) * 3 dd ee) * 5
849 ----
850
851 Output:
852
853 ----
854 aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa bb
855 cc aa bb cc dd ee aa bb cc aa bb cc aa bb cc dd
856 ee aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa
857 bb cc aa bb cc dd ee
858 ----
859 ====
860
861 ====
862 Input:
863
864 ----
865 {be}
866 (
867 <str_beg> u16le"sébastien diaz" <str_end>
868 {ICITTE - str_beg : 8}
869 {(end - str_beg) * 5 : 24}
870 ) * 3
871 <end>
872 ----
873
874 Output:
875
876 ----
877 73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
878 6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 e0 ┆ n• •d•i•a•z•••••
879 73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
880 6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 40 ┆ n• •d•i•a•z••••@
881 73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
882 6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 00 a0 ┆ n• •d•i•a•z•••••
883 ----
884 ====
885
886 === Repetition
887
888 A _repetition_ represents the bytes of an item repeated a given number
889 of times.
890
891 A repetition is:
892
893 . Any item.
894
895 . The ``pass:[*]`` character.
896
897 . One of:
898
899 ** A positive integer (hexadecimal starting with `0x` or `0X` accepted)
900 which is the number of times to repeat the previous item.
901
902 ** The ``pass:[{]`` prefix, a valid {py3} expression, and the
903 ``pass:[}]`` suffix.
904 +
905 For a repetition at some source location{nbsp}__**L**__, this expression
906 may contain:
907 +
908 --
909 * The name of any <<label,label>> defined before{nbsp}__**L**__ and
910 which isn't part of its repeated item.
911 * The name of any <<variable-assignment,variable>> known
912 at{nbsp}__**L**__, which isn't part of its repeated item, and which
913 doesn't, directly or indirectly, refer to a label defined
914 after{nbsp}__**L**__.
915 --
916 +
917 This expression must not contain the special name `ICITTE`.
918
919 ====
920 Input:
921
922 ----
923 {end - ICITTE - 1 : 8} * 0x100 <end>
924 ----
925
926 Output:
927
928 ----
929 ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
930 ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
931 df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
932 cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
933 bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
934 af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
935 9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
936 8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
937 7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
938 6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
939 5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
940 4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
941 3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
942 2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"!
943 1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
944 0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
945 ----
946 ====
947
948 ====
949 Input:
950
951 ----
952 {times = 1}
953 aa bb cc dd
954 (
955 <here>
956 (ee ff) * {here + 1}
957 11 22 33 * {times}
958 {times = times + 1}
959 ) * 3
960 "coucou!"
961 ----
962
963 Output:
964
965 ----
966 aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
967 33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
968 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
969 ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
970 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
971 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
972 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
973 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
974 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
975 ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
976 ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
977 33 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
978 ----
979 ====
980
981 ====
982 This example shows how to use a repetition as a conditional section
983 depending on some predefined variable.
984
985 Input:
986
987 ----
988 aa bb cc dd
989 (ee ff "meow mix" 00) * {cond}
990 {be} {-1993:16}
991 ----
992
993 Output (`cond` is 0):
994
995 ----
996 aa bb cc dd f8 37
997 ----
998
999 Output (`cond` is 1):
1000
1001 ----
1002 aa bb cc dd ee ff 6d 65 6f 77 20 6d 69 78 00 f8 ┆ ••••••meow mix••
1003 37 ┆ 7
1004 ----
1005 ====
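
In {py3} terms, a repetition behaves like multiplying a `bytes` object
by an integer, which is why a count of 0 removes the section. The sketch
below rebuilds the conditional example above by hand; `build()` is a
hypothetical helper, and it only models items of which the bytes don't
depend on the current offset.

```python
# `(ee ff "meow mix" 00)`
GROUP = bytes.fromhex("ee ff") + b"meow mix" + b"\x00"


def build(cond: int) -> bytes:
    """Return the bytes of the conditional example for a given `cond`."""
    header = bytes.fromhex("aa bb cc dd")

    # `{be} {-1993:16}`
    trailer = (-1993).to_bytes(2, "big", signed=True)

    # `bytes * 0` is empty, `bytes * 1` is the group once.
    return header + GROUP * cond + trailer


print(build(0).hex())  # aabbccddf837
print(build(1).hex())  # aabbccddeeff6d656f77206d697800f837
```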
1006
1007 == Command-line tool
1008
1009 If you <<install-normand,installed>> the `normand` package, then you
1010 can use the `normand` command-line tool:
1011
1012 ----
1013 $ normand <<< '"ma gang de malades"' | hexdump -C
1014 ----
1015
1016 ----
1017 00000000 6d 61 20 67 61 6e 67 20 64 65 20 6d 61 6c 61 64 |ma gang de malad|
1018 00000010 65 73 |es|
1019 ----
1020
1021 If you copy the `normand.py` module to your own project, then you can
1022 run the module itself:
1023
1024 ----
1025 $ python3 -m normand <<< '"ma gang de malades"' | hexdump -C
1026 ----
1027
1028 ----
1029 00000000 6d 61 20 67 61 6e 67 20 64 65 20 6d 61 6c 61 64 |ma gang de malad|
1030 00000010 65 73 |es|
1031 ----
1032
1033 Without a path argument, the `normand` tool reads from the standard
1034 input.
1035
1036 The `normand` tool prints the generated binary data to the standard
1037 output.
1038
1039 Various options control the initial <<state,state>> of the processor:
1040 use the `--help` option to learn more.
1041
1042 == {py3} API
1043
1044 The whole `normand` package/module API is:
1045
1046 [source,python]
1047 ----
import enum
import typing


class ByteOrder(enum.Enum):
    # Big endian.
    BE = ...

    # Little endian.
    LE = ...


class TextLoc:
    # Line number.
    @property
    def line_no(self) -> int:
        ...

    # Column number.
    @property
    def col_no(self) -> int:
        ...


class ParseError(RuntimeError):
    # Source text location.
    @property
    def text_loc(self) -> TextLoc:
        ...


SymbolsT = typing.Dict[str, int]


class ParseResult:
    # Generated data.
    @property
    def data(self) -> bytearray:
        ...

    # Updated variable values.
    @property
    def variables(self) -> SymbolsT:
        ...

    # Updated main group label values.
    @property
    def labels(self) -> SymbolsT:
        ...

    # Final offset.
    @property
    def offset(self) -> int:
        ...

    # Final byte order.
    @property
    def byte_order(self) -> typing.Optional[ByteOrder]:
        ...


def parse(normand: str,
          init_variables: typing.Optional[SymbolsT] = None,
          init_labels: typing.Optional[SymbolsT] = None,
          init_offset: int = 0,
          init_byte_order: typing.Optional[ByteOrder] = None) -> ParseResult:
    ...
1111 ----
1112
1113 The `normand` parameter is the actual <<learn-normand,Normand input>>
1114 while the other parameters control the initial <<state,state>>.
1115
1116 The `parse()` function raises a `ParseError` instance should it fail to
1117 parse the `normand` string for any reason.
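
The following sketch shows how a caller might combine those pieces. It
only relies on the names listed above (`parse()`, `init_variables`,
`ParseResult.data`, `ParseError.text_loc`); the `generate()` and
`to_hex()` helpers are hypothetical, and the `normand` import is
deferred so that the snippet loads even where the module isn't
available.

```python
def to_hex(data) -> str:
    """Format bytes as space-separated hexadecimal pairs."""
    return " ".join("{:02x}".format(byte) for byte in data)


def generate(text: str, cond: int) -> bytes:
    """Parse Normand input with a predefined `cond` variable."""
    # Deferred import: requires the `normand` module at call time only.
    import normand

    try:
        result = normand.parse(text, init_variables={"cond": cond})
    except normand.ParseError as exc:
        loc = exc.text_loc
        raise ValueError(
            "{}:{} - cannot parse input".format(loc.line_no, loc.col_no)
        ) from exc

    return bytes(result.data)
```

With the conditional-section input of the <<repetition,repetition>>
examples and `cond` set to 0, the documented output is
`aa bb cc dd f8 37`.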
1118
1119 == Development
1120
1121 Normand is a https://python-poetry.org/[Poetry] project.
1122
1123 To develop it, install it through Poetry and enter the virtual
1124 environment:
1125
1126 ----
1127 $ poetry install
1128 $ poetry shell
1129 $ normand <<< '"lol" * 10 0a'
1130 ----
1131
1132 `normand.py` is processed by:
1133
1134 * https://microsoft.github.io/pyright/[Pyright]
1135 * https://github.com/psf/black[Black]
1136 * https://pycqa.github.io/isort/[isort]
1137
1138 === Testing
1139
1140 Use https://docs.pytest.org/[pytest] to test Normand once the package is
1141 part of your virtual environment, for example:
1142
1143 ----
1144 $ poetry install
1145 $ poetry run pip3 install pytest
1146 $ poetry run pytest
1147 ----
1148
1149 The `pytest` project is currently not a development dependency in
`pyproject.toml` due to backward compatibility issues with
1151 Python{nbsp}3.4.
1152
1153 In the `tests` directory, each `*.nt` file is a test. The file name
1154 prefix indicates what it's meant to test:
1155
1156 `pass-`::
1157 Everything above the `---` line is the valid Normand input
1158 to test.
1159 +
1160 Everything below the `---` line is the expected data
1161 (whitespace-separated hexadecimal bytes).
1162
1163 `fail-`::
1164 Everything above the `---` line is the invalid Normand input
1165 to test.
1166 +
1167 Everything below the `---` line is the expected error message having
1168 this form:
1169 +
1170 ----
1171 LINE:COL - MESSAGE
1172 ----
1173
1174 === Contributing
1175
1176 Normand uses https://review.lttng.org/admin/repos/normand,general[Gerrit]
1177 for code review.
1178
1179 To report a bug, https://github.com/efficios/normand/issues/new[create a
1180 GitHub issue].