Add the directive form of a repetition (`!repeat`)
// Show ToC at a specific location for a GitHub rendering
ifdef::env-github[]
:toc: macro
endif::env-github[]

ifndef::env-github[]
:toc: left
endif::env-github[]

// This is to mimic what GitHub does so that anchors work in an offline
// rendering too.
:idprefix:
:idseparator: -

// Other attributes
:py3: Python{nbsp}3

= Normand
Philippe Proulx

image::normand-logo.png[]

[.normal]
image:https://img.shields.io/pypi/v/normand.svg?label=Latest%20version[link="https://pypi.python.org/pypi/normand"]

[.lead]
_**Normand**_ is a text-to-binary processor with its own language.

This package offers both a portable {py3} module and a command-line
tool.

WARNING: This version of Normand is 0.8, meaning both the Normand
language and the module/CLI interface aren't stable.

ifdef::env-github[]
// ToC location for a GitHub rendering
toc::[]
endif::env-github[]

== Introduction
41
42The purpose of Normand is to consume human-readable text representing
43bytes and to produce the corresponding binary data.
44
45.Simple bytes input.
46====
47Consider the following Normand input:
48
49----
504f 55 32 bb $167 fe %10100111 a9 $-32
51----
52
The nine generated bytes are:
54
55----
564f 55 32 bb a7 fe a7 a9 e0
57----
58====
59
As you can see in the previous example, the fundamental unit of the
Normand language is the _byte_. The order in which you list bytes is the
order of the generated data.
63
64The Normand language is more than simple lists of bytes, though. Its
65main features are:
66
67Comments, including a bunch of insignificant symbols which may improve readability::
68+
69Input:
70+
71----
72ff bb %1101:0010 # This is a comment
7378 29 af $192 # This too # 99 $-80
74fe80::6257:18ff:fea3:4229
7560:57:18:a3:42:29
7610839636-5d65-4a68-8e6a-21608ddf7258
77----
78+
79Output:
80+
81----
82ff bb d2 78 29 af c0 99 b0 fe 80 62 57 18 ff fe
83a3 42 29 60 57 18 a3 42 29 10 83 96 36 5d 65 4a
8468 8e 6a 21 60 8d df 72 58
85----
86
87Hexadecimal, decimal, and binary byte constants::
88+
89Input:
90+
91----
92aa bb $247 $-89 %0011_0010 %11.01= 10/10
93----
94+
95Output:
96+
97----
98aa bb f7 a7 32 da
99----
100
101UTF-8, UTF-16, and UTF-32 literal strings::
102+
103Input:
104+
105----
106"hello world!" 00
107u16le"stress\nverdict 🤣"
108----
109+
110Output:
111+
112----
11368 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 73 00 74 ┆ hello world!•s•t
11400 72 00 65 00 73 00 73 00 0a 00 76 00 65 00 72 ┆ •r•e•s•s•••v•e•r
11500 64 00 69 00 63 00 74 00 20 00 3e d8 23 dd ┆ •d•i•c•t• •>•#•
116----
117
118Labels: special variables holding the offset where they're defined::
119+
120----
121<beg> b2 52 e3 bc 91 05
122$100 $50 <chair> 33 9f fe
12325 e9 89 8a <end>
124----
125
126Variables::
127+
128----
1295e 65 {tower = 47} c6 7f f2 c4
13044 {hurl = tower - 14} b5 {tower = hurl} 26 2d
131----
132+
133The value of a variable assignment is the evaluation of a valid {py3}
134expression which may include label and variable names.
135
Fixed-length number with a given length (8{nbsp}bits to 64{nbsp}bits) and byte order::
137+
138Input:
139+
140----
141{strength = 4}
142{be} 67 <lbl> 44 $178 {(end - lbl) * 8 + strength : 16} $99 <end>
143{le} {-1993 : 32}
{-3.141593 : 64}
145----
146+
147Output:
148+
149----
15067 44 b2 00 2c 63 37 f8 ff ff 7f bd c2 82 fb 21
15109 c0
152----
153+
The encoded number is the evaluation of a valid {py3} expression which
may include label and variable names.
156
157https://en.wikipedia.org/wiki/LEB128[LEB128] integer::
158+
159Input:
160+
161----
162aa bb cc {-1993 : sleb128} <meow> dd ee ff
163{meow * 199 : uleb128}
164----
165+
166Output:
167+
168----
169aa bb cc b7 70 dd ee ff e3 07
170----
171+
The encoded integer is the evaluation of a valid {py3} expression which
may include label and variable names.
174
175Repetition::
176+
177Input:
178+
179----
aa bb * 5 cc <zoom> "yeah\0" * {zoom * 3}
181
182!repeat 3
183 ff ee "juice"
184!end
185----
186+
187Output:
188+
189----
190aa bb bb bb bb bb cc 79 65 61 68 00 79 65 61 68 ┆ •••••••yeah•yeah
19100 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
19279 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 ┆ yeah•yeah•yeah•y
19365 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 ┆ eah•yeah•yeah•ye
19461 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 ┆ ah•yeah•yeah•yea
19568 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 ┆ h•yeah•yeah•yeah
00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
197ff ee 6a 75 69 63 65 ff ee 6a 75 69 63 65 ff ee ┆ ••juice••juice••
1986a 75 69 63 65 ┆ juice
199----
200
201Alignment::
202+
203Input:
204+
205----
206{be}
207
208 {199:32}
209@64 {43:64}
210@16 {-123:16}
211@32~255 {5584:32}
212----
213+
214Output:
215+
216----
21700 00 00 c7 00 00 00 00 00 00 00 00 00 00 00 2b
218ff 85 ff ff 00 00 15 d0
219----
220
221Multilevel grouping::
222+
223Input:
224+
225----
226ff ((aa bb "zoom" cc) * 5) * 3 $-34 * 4
227----
228+
229Output:
230+
231----
232ff aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa ┆ •••zoom•••zoom••
233bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a ┆ •zoom•••zoom•••z
2346f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f ┆ oom•••zoom•••zoo
2356d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc ┆ m•••zoom•••zoom•
236aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb ┆ ••zoom•••zoom•••
2377a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f ┆ zoom•••zoom•••zo
2386f 6d cc aa bb 7a 6f 6f 6d cc de de de de ┆ om•••zoom•••••
239----
240
241Precise error reporting::
242+
243----
244/tmp/meow.normand:10:24 - Expecting a bit (`0` or `1`).
245----
246+
247----
248/tmp/meow.normand:32:6 - Unexpected character `k`.
249----
250+
251----
/tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`mix`, `zoom`}.
253----
254+
255----
256/tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE` at byte offset 45.
257----
258
You can use Normand to track data source files in your favorite VCS
instead of raw binary files. You can use the binary files that Normand
generates to test file format decoders, including with malformed data,
as well as for education.
263
264See <<learn-normand>> to explore all the Normand features.
265
== Install Normand
267
268Normand requires Python ≥ 3.4.
269
270To install Normand:
271
272----
273$ python3 -m pip install --user normand
274----
275
276See
277https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-to-the-user-site[Installing to the User Site]
278to learn more about a user site installation.
279
280[NOTE]
281====
Normand has a single module file, `normand.py`, which you can copy as is
to your project to use it (both the <<python3-api,`normand.parse()`>>
function and the <<command-line-tool,command-line tool>>).
285
286`normand.py` has _no external dependencies_, but if you're using
287Python{nbsp}3.4, you'll need a local copy of the standard `typing`
288module.
289====
290
== Learn Normand
292
293A Normand text input is a sequence of items which represent a sequence
294of raw bytes.
295
296[[state]] During the processing of items to data, Normand relies on a
297current state:
298
299[%header%autowidth]
300|===
|State variable |Description |Initial value: <<python3-api,{py3} API>> |Initial value: <<command-line-tool,CLI>>
302
303|[[cur-offset]] Current offset
304|
The current offset has an effect on the value of <<label,labels>> and of
the special `ICITTE` name in <<fixed-length-number,fixed-length
number>>, <<leb128-integer,LEB128 integer>>, and
<<variable-assignment,variable assignment>> expression evaluation.

Each generated byte increments the current offset.

A <<current-offset-setting,current offset setting>> may change the
current offset without generating data.

A <<current-offset-alignment,current offset alignment>> generates
padding bytes to make the current offset satisfy a given alignment.
317|`init_offset` parameter of the `parse()` function.
318|`--offset` option.
319
320|[[cur-bo]] Current byte order
321|
The current byte order has an effect on the encoding of
<<fixed-length-number,fixed-length numbers>>.
324
325A <<current-byte-order-setting,current byte order setting>> may change
326the current byte order.
327|`init_byte_order` parameter of the `parse()` function.
328|`--byte-order` option.
329
330|<<label,Labels>>
331|Mapping of label names to integral values.
332|`init_labels` parameter of the `parse()` function.
333|One or more `--label` options.
334
335|<<variable-assignment,Variables>>
336|Mapping of variable names to integral values.
337|`init_variables` parameter of the `parse()` function.
338|One or more `--var` options.
339|===
340
341The available items are:
342
343* A <<byte-constant,constant integer>> representing a single byte.
344
345* A <<literal-string,literal string>> representing a sequence of bytes
346 encoding UTF-8, UTF-16, or UTF-32 data.
347
348* A <<current-byte-order-setting,current byte order setting>> (big or
349 little endian).
350
* A <<fixed-length-number,fixed-length number>> (integer or
  floating point) using the <<cur-bo,current byte order>> and of which
  the value is the result of a {py3} expression.

* An <<leb128-integer,LEB128 integer>> of which the value is the result
  of a {py3} expression.
357
358* A <<current-offset-setting,current offset setting>>.
359
* A <<current-offset-alignment,current offset alignment>>.

362* A <<label,label>>, that is, a named constant holding the current
363 offset.
364+
365This is similar to an assembly label.
366
367* A <<variable-assignment,variable assignment>> associating a name to
368 the integral result of an evaluated {py3} expression.
369
370* A <<group,group>>, that is, a scoped sequence of items.
371
* A <<repetition-block,repetition block>>.

Moreover, you can repeat many items above a constant or variable number
of times with the ``pass:[*]`` operator _after_ the item to repeat. This
is called a <<post-item-repetition,post-item repetition>>.
377
378A Normand comment may exist:
379
380* Between items, possibly within a group.
381* Between the nibbles of a constant hexadecimal byte.
382* Between the bits of a constant binary byte.
* Between the last item and the ``pass:[*]`` character of a post-item
  repetition, and between that ``pass:[*]`` character and the following
  number or expression.
* Between the ``!repeat``/``!r`` prefix and the following constant
  integer, name, or expression of a repetition block.
388
389A comment is anything between two ``pass:[#]`` characters on the same
390line, or from ``pass:[#]`` until the end of the line. Whitespaces and
391the following symbol characters are also considered comments where a
392comment may exist:
393
394----
/ \ ? & : ; . , + [ ] _ = | -
396----
397
398The latter serve to improve readability so that you may write, for
399example, a MAC address or a UUID as is.
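
To see why a MAC address or a UUID may be written as is, here's a
minimal {py3} sketch which removes the whitespace and symbol characters
listed above and reads what remains as pairs of hexits. It only covers
hexadecimal byte constants, the `strip_insignificant()` helper is a
hypothetical name, and this isn't how Normand itself parses its input.

```python
import re

# Whitespace plus the symbol characters which the text above lists as
# comments. Hypothetical helper: not part of the Normand API.
INSIGNIFICANT = re.compile(r"[\s/\\?&:;.,+\[\]_=|-]")


def strip_insignificant(text: str) -> bytes:
    """Drop the insignificant characters, then decode pairs of hexits."""
    return bytes.fromhex(INSIGNIFICANT.sub("", text))


print(strip_insignificant("60:57:18:a3:42:29").hex(" "))
# 60 57 18 a3 42 29
print(strip_insignificant("fe80::6257:18ff:fea3:4229").hex(" "))
# fe 80 62 57 18 ff fe a3 42 29
```

Both results match what the corresponding lines of the comment example
of the introduction generate.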
400
401You can test the examples of this section with the `normand`
402<<command-line-tool,command-line tool>> as such:
403
404----
405$ normand file | hexdump -C
406----
407
408where `file` is the name of a file containing the Normand input.
409
=== Byte constant
411
412A _byte constant_ represents a single byte.
413
414A byte constant is:
415
416Hexadecimal form::
417 Two consecutive hexits.
418
419Decimal form::
420 A decimal number after the `$` prefix.
421
422Binary form::
423 Eight bits after the `%` prefix.
424
425====
426Input:
427
428----
429ab cd [3d 8F] CC
430----
431
432Output:
433
434----
435ab cd 3d 8f cc
436----
437====
438
439====
440Input:
441
442----
443$192 %1100/0011 $ -77
444----
445
446Output:
447
448----
449c0 c3 b3
450----
451====
452
453====
454Input:
455
456----
45758f64689-6316-4d55-8a1a-04cada366172
458fe80::6257:18ff:fea3:4229
459----
460
461Output:
462
463----
46458 f6 46 89 63 16 4d 55 8a 1a 04 ca da 36 61 72 ┆ X•F•c•MU•••••6ar
465fe 80 62 57 18 ff fe a3 42 29 ┆ ••bW••••B)
466----
467====
468
469====
470Input:
471
472----
473%01110011 %01100001 %01101100 %01110101 %01110100
474----
475
476Output:
477
478----
47973 61 6c 75 74 ┆ salut
480----
481====
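
The following {py3} sketch shows one way to map each form to a byte
value. The function names are hypothetical, the inputs are the text
after the prefix (without any comment symbol), and the accepted decimal
range (-128 to 255) is an assumption rather than something this document
states.

```python
def byte_from_hex(hexits: str) -> int:
    """Hexadecimal form: two hexits, for example `8F`."""
    if len(hexits) != 2:
        raise ValueError("expecting exactly two hexits")
    return int(hexits, 16)


def byte_from_decimal(number: str) -> int:
    """Decimal form, without the `$` prefix."""
    value = int(number)
    # Assumed range: anything which fits in a signed or unsigned byte.
    if not -128 <= value <= 255:
        raise ValueError("value {} is outside the 8-bit range".format(value))
    # Keeping the low 8 bits gives the two's complement of a negative value.
    return value & 0xFF


def byte_from_binary(bits: str) -> int:
    """Binary form, without the `%` prefix: exactly eight bits."""
    if len(bits) != 8:
        raise ValueError("expecting exactly eight bits")
    return int(bits, 2)


data = bytes([
    byte_from_decimal("192"),
    byte_from_binary("11000011"),
    byte_from_decimal("-77"),
])
print(data.hex(" "))  # c0 c3 b3
```

The printed bytes are the ones of the `$192 %1100/0011 $ -77` example.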
482
=== Literal string
484
485A _literal string_ represents the UTF-8-, UTF-16-, or UTF-32-encoded
486bytes of a string.
487
488The string to encode isn't implicitly null-terminated: use `\0` at the
489end of the string to add a null character.
490
491A literal string is:
492
493. **Optional**: one of the following encodings instead of UTF-8:
494+
495--
496[horizontal]
497`u16be`:: UTF-16BE.
498`u16le`:: UTF-16LE.
499`u32be`:: UTF-32BE.
500`u32le`:: UTF-32LE.
501--
502
503. The ``pass:["]`` prefix.
504
505. A sequence of zero or more characters, possibly containing escape
506 sequences.
507+
508An escape sequence is the ``\`` character followed by one of:
509+
510--
511[horizontal]
512`0`:: Null (U+0000)
513`a`:: Alert (U+0007)
514`b`:: Backspace (U+0008)
515`e`:: Escape (U+001B)
516`f`:: Form feed (U+000C)
517`n`:: End of line (U+000A)
518`r`:: Carriage return (U+000D)
519`t`:: Character tabulation (U+0009)
520`v`:: Line tabulation (U+000B)
521``\``:: Reverse solidus (U+005C)
522``pass:["]``:: Quotation mark (U+0022)
523--
524
525. The ``pass:["]`` suffix.
526
527====
528Input:
529
530----
531"coucou tout le monde!"
532----
533
534Output:
535
536----
53763 6f 75 63 6f 75 20 74 6f 75 74 20 6c 65 20 6d ┆ coucou tout le m
5386f 6e 64 65 21 ┆ onde!
539----
540====
541
542====
543Input:
544
545----
546u16le"I am not young enough to know everything."
547----
548
549Output:
550
551----
55249 00 20 00 61 00 6d 00 20 00 6e 00 6f 00 74 00 ┆ I• •a•m• •n•o•t•
55320 00 79 00 6f 00 75 00 6e 00 67 00 20 00 65 00 ┆ •y•o•u•n•g• •e•
5546e 00 6f 00 75 00 67 00 68 00 20 00 74 00 6f 00 ┆ n•o•u•g•h• •t•o•
55520 00 6b 00 6e 00 6f 00 77 00 20 00 65 00 76 00 ┆ •k•n•o•w• •e•v•
55665 00 72 00 79 00 74 00 68 00 69 00 6e 00 67 00 ┆ e•r•y•t•h•i•n•g•
5572e 00 ┆ .•
558----
559====
560
561====
562Input:
563
564----
565u32be "\"illusion is the first\nof all pleasures\" 🦉"
566----
567
568Output:
569
570----
57100 00 00 22 00 00 00 69 00 00 00 6c 00 00 00 6c ┆ •••"•••i•••l•••l
57200 00 00 75 00 00 00 73 00 00 00 69 00 00 00 6f ┆ •••u•••s•••i•••o
57300 00 00 6e 00 00 00 20 00 00 00 69 00 00 00 73 ┆ •••n••• •••i•••s
57400 00 00 20 00 00 00 74 00 00 00 68 00 00 00 65 ┆ ••• •••t•••h•••e
57500 00 00 20 00 00 00 66 00 00 00 69 00 00 00 72 ┆ ••• •••f•••i•••r
57600 00 00 73 00 00 00 74 00 00 00 0a 00 00 00 6f ┆ •••s•••t•••••••o
57700 00 00 66 00 00 00 20 00 00 00 61 00 00 00 6c ┆ •••f••• •••a•••l
57800 00 00 6c 00 00 00 20 00 00 00 70 00 00 00 6c ┆ •••l••• •••p•••l
57900 00 00 65 00 00 00 61 00 00 00 73 00 00 00 75 ┆ •••e•••a•••s•••u
58000 00 00 72 00 00 00 65 00 00 00 73 00 00 00 22 ┆ •••r•••e•••s•••"
58100 00 00 20 00 01 f9 89 ┆ ••• ••••
582----
583====
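
You can reproduce the bytes of a literal string with the standard {py3}
codecs. In this sketch, the `CODECS` mapping and the `encode_literal()`
helper are hypothetical names; the codecs with an explicit byte order
don't add a byte order mark, which matches the outputs shown in this
section.

```python
# Normand encoding prefix -> Python codec name (no prefix means UTF-8).
# Hypothetical mapping for illustration only.
CODECS = {
    None: "utf-8",
    "u16be": "utf-16-be",
    "u16le": "utf-16-le",
    "u32be": "utf-32-be",
    "u32le": "utf-32-le",
}


def encode_literal(text, prefix=None):
    """Encode `text` without a byte order mark and without a terminator."""
    return text.encode(CODECS[prefix])


print(encode_literal("salut").hex(" "))                # 73 61 6c 75 74
print(encode_literal("\U0001F989", "u32be").hex(" "))  # 00 01 f9 89
print(encode_literal("\U0001F923", "u16le").hex(" "))  # 3e d8 23 dd
```

Because nothing terminates the string implicitly, a null-terminated
string needs an explicit `\0`, which is what `encode_literal("yeah\0")`
would encode here.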
584
=== Current byte order setting
586
587This special item sets the <<cur-bo,_current byte order_>>.
588
589The two accepted forms are:
590
591[horizontal]
592``pass:[{be}]``:: Set the current byte order to big endian.
593``pass:[{le}]``:: Set the current byte order to little endian.
594
=== Fixed-length number

597A _fixed-length number_ represents a fixed number of bytes encoding
598either:
599
600* An unsigned or signed integer (two's complement).
601+
602The available lengths are 8, 16, 24, 32, 40, 48, 56, and 64.
603
* A floating point number
  (https://standards.ieee.org/standard/754-2008.html[IEEE{nbsp}754-2008]).
+
The available lengths are 32 (_binary32_) and 64 (_binary64_).

609The value is the result of evaluating a {py3} expression using the
610<<cur-bo,current byte order>>.
611
612A fixed-length number is:
613
614. The ``pass:[{]`` prefix.
615
616. A valid {py3} expression.
+
For a fixed-length number at some source location{nbsp}__**L**__, this
619expression may contain the name of any accessible <<label,label>> (not
620within a nested group), including the name of a label defined
621after{nbsp}__**L**__, as well as the name of any
622<<variable-assignment,variable>> known at{nbsp}__**L**__.
623+
624The value of the special name `ICITTE` (`int` type) in this expression
625is the <<cur-offset,current offset>> (before encoding the number).
626
627. The `:` character.
628
629. An encoding length in bits amongst:
630+
631--
632The expression evaluates to an `int` value::
633 `8`, `16`, `24`, `32`, `40`, `48`, `56`, and `64`.
634
635The expression evaluates to a `float` value::
636 `32` and `64`.
637--
638
639. The `}` suffix.
640
641====
642Input:
643
644----
645{le} {345:16}
646{be} {-0xabcd:32}
647----
648
649Output:
650
651----
65259 01 ff ff 54 33
653----
654====
655
656====
657Input:
658
659----
660{be}
661
662# String length in bits
663{8 * (str_end - str_beg) : 16}
664
665# String
666<str_beg>
667 "hello world!"
668<str_end>
669----
670
671Output:
672
673----
67400 60 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 ┆ •`hello world!
675----
676====
677
678====
679Input:
680
681----
682{20 - ICITTE : 8} * 10
683----
684
685Output:
686
687----
68814 13 12 11 10 0f 0e 0d 0c 0b
689----
690====
691
692====
693Input:
694
695----
696{le}
697{2 * 0.0529 : 32}
698----
699
700Output:
701
702----
703ac ad d8 3d
704----
705====
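
The encodings of this section correspond to what the standard {py3}
`int.to_bytes()` method and `struct` module produce. The following
sketch uses two hypothetical helpers, `encode_int()` and
`encode_float()`; unlike Normand, which reports an error for a value
outside the range of the requested length, `encode_int()` simply wraps
around.

```python
import struct


def encode_int(value, bits, byte_order):
    """Two's complement encoding; `byte_order` is "big" or "little"."""
    # Reducing modulo 2**bits maps a negative value to its two's
    # complement (assumption: no range check, unlike Normand).
    return (value % (1 << bits)).to_bytes(bits // 8, byte_order)


def encode_float(value, bits, byte_order):
    """IEEE 754 binary32 (32 bits) or binary64 (64 bits) encoding."""
    fmt = {32: "f", 64: "d"}[bits]
    prefix = ">" if byte_order == "big" else "<"
    return struct.pack(prefix + fmt, value)


print(encode_int(345, 16, "little").hex(" "))           # 59 01
print(encode_int(-0xabcd, 32, "big").hex(" "))          # ff ff 54 33
print(encode_float(2 * 0.0529, 32, "little").hex(" "))  # ac ad d8 3d
```

The three printed results match the `{345:16}`, `{-0xabcd:32}`, and
`{2 * 0.0529 : 32}` examples of this section.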
706
=== LEB128 integer
708
709An _LEB128 integer_ represents a variable number of bytes encoding an
710unsigned or signed integer which is the result of evaluating a {py3}
711expression following the https://en.wikipedia.org/wiki/LEB128[LEB128]
712format.
713
714An LEB128 integer is:
715
716. The ``pass:[{]`` prefix.
717
718. A valid {py3} expression.
719+
720For an LEB128 integer at some source location{nbsp}__**L**__, this
721expression may contain:
722+
723--
724* The name of any <<label,label>> defined before{nbsp}__**L**__.
725* The name of any <<variable-assignment,variable>> known at{nbsp}__**L**__
726 which doesn't, directly or indirectly, refer to a label
727 defined after{nbsp}__**L**__.
728--
729+
730The value of the special name `ICITTE` (`int` type) in this expression
731is the <<cur-offset,current offset>> (before encoding the integer).
732
733. The `:` character.
734
735. One of:
736+
737--
738[horizontal]
739`uleb128`:: Use the unsigned LEB128 format.
740`sleb128`:: Use the signed LEB128 format.
741--
742
743. The `}` suffix.
744
745====
746Input:
747
748----
749{624485 : uleb128}
750----
751
752Output:
753
754----
755e5 8e 26
756----
757====
758
759====
760Input:
761
762----
763aa bb cc dd
764<meow>
765ee ff
766{-981238311 + (meow * -23) : sleb128}
767"hello"
768----
769
770Output:
771
772----
773aa bb cc dd ee ff fd fa 8d ac 7c 68 65 6c 6c 6f ┆ ••••••••••|hello
774----
775====
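
If you're not familiar with LEB128, the following {py3} sketch
implements both variants as described by the format itself: seven bits
of payload per byte, with the most significant bit set on every byte
except the last one. The function names are hypothetical and this isn't
necessarily how Normand implements the encoding.

```python
def encode_uleb128(value):
    """Unsigned LEB128: low-order 7-bit groups first."""
    if value < 0:
        raise ValueError("unsigned LEB128 can't encode a negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def encode_sleb128(value):
    """Signed LEB128: stop once the rest is pure sign extension."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: keeps the sign
        sign_bit_set = bool(byte & 0x40)
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)  # more groups follow


print(encode_uleb128(624485).hex(" "))  # e5 8e 26
print(encode_sleb128(-1993).hex(" "))   # b7 70
```

In the second example of this section, `meow` is 4, so the expression
evaluates to -981238403, which `encode_sleb128()` turns into the five
bytes `fd fa 8d ac 7c`.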
776
=== Current offset setting
778
779This special item sets the <<cur-offset,_current offset_>>.
780
781A current offset setting is:
782
783. The `<` prefix.
784
785. A positive integer (hexadecimal starting with `0x` or `0X` accepted)
786 which is the new current offset.
787
788. The `>` suffix.
789
790====
791Input:
792
793----
794 {ICITTE : 8} * 8
795<0x61> {ICITTE : 8} * 8
796----
797
798Output:
799
800----
80100 01 02 03 04 05 06 07 61 62 63 64 65 66 67 68 ┆ ••••••••abcdefgh
802----
803====
804
805====
806Input:
807
808----
809aa bb cc dd <meow> ee ff
810<12> 11 22 33 <mix> 44 55
811{meow : 8} {mix : 8}
812----
813
814Output:
815
816----
817aa bb cc dd ee ff 11 22 33 44 55 04 0f ┆ •••••••"3DU••
818----
819====
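
The first example can be emulated with a few lines of {py3} to show that
a current offset setting changes what `ICITTE` evaluates to without
generating any byte. The `icitte_bytes()` helper is a hypothetical name
used for this sketch only.

```python
def icitte_bytes(offset, count):
    """Emulate `{ICITTE : 8} * count` starting at `offset`."""
    out = bytearray()
    for _ in range(count):
        out.append(offset)  # encode the current offset as one byte
        offset += 1         # each generated byte increments the offset
    return bytes(out)


data = icitte_bytes(0, 8)

# `<0x61>`: the current offset jumps to 0x61, but the data doesn't grow.
data += icitte_bytes(0x61, 8)

print(len(data))  # 16
print(data)       # b'\x00\x01\x02\x03\x04\x05\x06\x07abcdefgh'
```

The generated data is 16 bytes long even though the last current offset
is 0x69: the current offset and the size of the data are two separate
things.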
820
=== Current offset alignment

A _current offset alignment_ represents zero or more padding bytes to
824make the <<cur-offset,current offset>> meet a given
825https://en.wikipedia.org/wiki/Data_structure_alignment[alignment] value.
826
827More specifically, for an alignment value of{nbsp}__**N**__{nbsp}bits,
828a current offset alignment represents the required padding bytes until
829the current offset is a multiple of __**N**__{nbsp}/{nbsp}8.
830
831A current offset alignment is:
832
833. The `@` prefix.
834
835. A positive integer (hexadecimal starting with `0x` or `0X` accepted)
836 which is the alignment value in _bits_.
837+
838This value must be greater than zero and a multiple of{nbsp}8.
839
840. **Optional**:
841+
842--
843. The ``pass:[~]`` prefix.
844. A positive integer (hexadecimal starting with `0x` or `0X` accepted)
845 which is the value of the byte to use as padding to align the
846 <<cur-offset,current offset>>.
847--
848+
849Without this section, the padding byte value is zero.
850
851====
852Input:
853
854----
85511 22 (@32 aa bb cc) * 3
856----
857
858Output:
859
860----
86111 22 00 00 aa bb cc 00 aa bb cc 00 aa bb cc
862----
863====
864
865====
866Input:
867
868----
869{le}
87077 88
871@32~0xcc {-893.5:32}
872@128~0x55 "meow"
873----
874
875Output:
876
877----
87877 88 cc cc 00 60 5f c4 55 55 55 55 55 55 55 55 ┆ w••••`_•UUUUUUUU
8796d 65 6f 77 ┆ meow
880----
881====
882
883====
884Input:
885
886----
887aa bb cc <29> @64~255 "zoom"
888----
889
890Output:
891
892----
893aa bb cc ff ff ff 7a 6f 6f 6d ┆ ••••••zoom
894----
895====
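
The number of padding bytes follows directly from the rule above: it's
what's missing for the current offset to reach the next multiple of
__**N**__{nbsp}/{nbsp}8. Here's a minimal {py3} sketch; `padding()` is a
hypothetical helper, not a function of the `normand` module.

```python
def padding(offset, align_bits, pad_byte=0):
    """Return the padding bytes needed to align `offset`."""
    if align_bits <= 0 or align_bits % 8 != 0:
        raise ValueError("alignment must be a positive multiple of 8 bits")
    align_bytes = align_bits // 8
    # `-offset % align_bytes` is the distance to the next multiple of
    # `align_bytes`, or zero if `offset` is already aligned.
    return bytes([pad_byte]) * (-offset % align_bytes)


print(padding(2, 32).hex(" "))        # 00 00
print(padding(29, 64, 255).hex(" "))  # ff ff ff
print(len(padding(8, 128, 0x55)))     # 8
```

The three calls correspond to the first `@32` of the first example, to
`@64~255` after `<29>`, and to `@128~0x55` at offset{nbsp}8 in the
second example.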
896
=== Label
898
899A _label_ associates a name to the <<cur-offset,current offset>>.
900
901All the labels of a whole Normand input must have unique names.
902
A label must not share the name of a <<variable-assignment,variable>>.

906A label is:
907
908. The `<` prefix.
909
. A valid {py3} name which is not `ICITTE` (see
  <<fixed-length-number>>, <<leb128-integer>>, and
  <<variable-assignment>> to learn more).
913
914. The `>` suffix.
915
=== Variable assignment
917
918A _variable assignment_ associates a name to the integral result of an
919evaluated {py3} expression.
920
A variable assignment is:
922
923. The ``pass:[{]`` prefix.
924
. A valid {py3} name which is not `ICITTE` (see
  <<fixed-length-number>>, <<leb128-integer>>, and
  <<variable-assignment>> to learn more).
928
929. The `=` character.
930
931. A valid {py3} expression.
932+
933For a variable assignment at some source location{nbsp}__**L**__, this
934expression may contain the name of any accessible <<label,label>> (not
935within a nested group), including the name of a label defined
936after{nbsp}__**L**__, as well as the name of any
937<<variable-assignment,variable>> known at{nbsp}__**L**__.
938+
939The value of the special name `ICITTE` (`int` type) in this expression
940is the <<cur-offset,current offset>>.
941
942. The `}` suffix.
943
944====
945Input:
946
947----
948{mix = 101} {le}
949{meow = 42} 11 22 {meow:8} 33 {meow = ICITTE + 17}
950"yooo" {meow + mix : 16}
951----
952
953Output:
954
955----
95611 22 2a 33 79 6f 6f 6f 7a 00 ┆ •"*3yoooz•
957----
958====
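
To follow the example step by step, this {py3} sketch evaluates each
expression with `ICITTE` bound to the current offset. The `evaluate()`
helper is hypothetical, using `eval()` is an assumption made for
illustration (only do this with trusted expressions), and the current
offset is taken as the length of the data, which holds here because the
initial offset is{nbsp}0 and nothing sets the offset.

```python
def evaluate(expr, offset, names):
    """Evaluate `expr` with the known names and the special `ICITTE`."""
    scope = dict(names, ICITTE=offset)
    # Assumption for this sketch: plain `eval()` without builtins.
    return eval(expr, {"__builtins__": {}}, scope)


names = {}
data = bytearray()

names["mix"] = evaluate("101", len(data), names)    # {mix = 101}
names["meow"] = evaluate("42", len(data), names)    # {meow = 42}
data += bytes([0x11, 0x22])                         # 11 22
data += evaluate("meow", len(data), names).to_bytes(1, "little")  # {meow:8}
data += bytes([0x33])                               # 33

# {meow = ICITTE + 17}: four bytes exist at this point, so ICITTE is 4.
names["meow"] = evaluate("ICITTE + 17", len(data), names)

data += "yooo".encode("utf-8")                      # "yooo"
data += evaluate("meow + mix", len(data), names).to_bytes(2, "little")

print(names["meow"])  # 21
print(data.hex(" "))  # 11 22 2a 33 79 6f 6f 6f 7a 00
```

Note that the two assignments to `meow` generate no data: only the two
fixed-length numbers and the constants do.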
959
=== Group
961
962A _group_ is a scoped sequence of items.
963
964The <<label,labels>> within a group aren't visible outside of it.
965
The main purpose of a group is to <<post-item-repetition,repeat>> more
than a single item and to isolate labels.
968
969A group is:
970
971. The `(` prefix.
972
973. Zero or more items.
974
975. The `)` suffix.
976
977====
978Input:
979
980----
981((aa bb cc) dd () ee) "leclerc"
982----
983
984Output:
985
986----
987aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc
988----
989====
990
991====
992Input:
993
994----
995((aa bb cc) * 3 dd ee) * 5
996----
997
998Output:
999
1000----
1001aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa bb
1002cc aa bb cc dd ee aa bb cc aa bb cc aa bb cc dd
1003ee aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa
1004bb cc aa bb cc dd ee
1005----
1006====
1007
1008====
1009Input:
1010
1011----
1012{be}
1013(
1014 <str_beg> u16le"sébastien diaz" <str_end>
1015 {ICITTE - str_beg : 8}
1016 {(end - str_beg) * 5 : 24}
1017) * 3
1018<end>
1019----
1020
1021Output:
1022
1023----
102473 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
10256e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 e0 ┆ n• •d•i•a•z•••••
102673 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
10276e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 40 ┆ n• •d•i•a•z••••@
102873 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
10296e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 00 a0 ┆ n• •d•i•a•z•••••
1030----
1031====
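
The last example relies on two things: `str_beg` is scoped to the group,
so each repetition gets its own value, while `end` is defined after the
group and has a single value. The following {py3} sketch recomputes the
numbers of that example under the assumption that the initial offset
is{nbsp}0; it computes `end` up front, which is a simplification and not
a description of how Normand resolves labels.

```python
# "sébastien diaz" with a precomposed U+00E9: 14 characters, 28 bytes.
text = "s\u00e9bastien diaz".encode("utf-16-le")

# One repetition: the string, an 8-bit number, and a 24-bit number.
group_len = len(text) + 1 + 3

# `<end>` follows the three repetitions (assuming an initial offset of 0).
end = 3 * group_len

data = bytearray()

for _ in range(3):
    str_beg = len(data)  # `<str_beg>`: a fresh label for each repetition
    data += text
    data += (len(data) - str_beg).to_bytes(1, "big")  # {ICITTE - str_beg : 8}
    data += ((end - str_beg) * 5).to_bytes(3, "big")  # {(end - str_beg) * 5 : 24}

for i in range(3):
    start = i * group_len + len(text)
    print(data[start:start + 4].hex(" "))
# 1c 00 01 e0
# 1c 00 01 40
# 1c 00 00 a0
```

Each printed line is the end of one repetition in the output above: the
string length (0x1c) followed by a 24-bit value which decreases as
`str_beg` gets closer to `end`.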
1032
=== Repetition block

A _repetition block_ represents the bytes of one or more items repeated
a given number of times.

A repetition block is:

. The `!repeat` or `!r` prefix.

1042. One of:
1043
** A positive integer (hexadecimal starting with `0x` or `0X` accepted)
   which is the number of times to repeat the items of the block.
1046
1047** The ``pass:[{]`` prefix, a valid {py3} expression, and the
1048 ``pass:[}]`` suffix.
1049+
1050For a repetition at some source location{nbsp}__**L**__, this expression
1051may contain:
1052+
1053--
* The name of any <<label,label>> defined before{nbsp}__**L**__.
* The name of any <<variable-assignment,variable>> known
1056 at{nbsp}__**L**__ which doesn't, directly or indirectly, refer to a
1057 label defined after{nbsp}__**L**__.
1058--
1059+
1060The value of the special name `ICITTE` (`int` type) in this expression
1061is the <<cur-offset,current offset>> (before handling the items to
1062repeat).
1063
1064** A valid {py3} name.
1065+
1066For the name `__NAME__`, this is equivalent to the
1067`pass:[{]__NAME__pass:[}]` form above.
1068
1069. Zero or more items.
1070
1071. The `!end` suffix.
1072
1073You may also use a <<post-item-repetition,post-item repetition>> after
1074some items. The form ``!repeat{nbsp}__X__{nbsp}__ITEMS__{nbsp}!end``
1075is equivalent to ``(__ITEMS__){nbsp}pass:[*]{nbsp}__X__``.
1076
1077====
1078Input:
1079
1080----
1081!repeat 0x100
1082 {end - ICITTE - 1 : 8}
1083!end
1084
1085<end>
1086----
1087
1088Output:
1089
1090----
1091ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
1092ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
1093df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
1094cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
1095bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
1096af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
10979f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
10988f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
10997f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
11006f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
11015f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
11024f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
11033f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
11042f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"!
11051f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
11060f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
1107----
1108====
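
The descending sequence comes from `ICITTE` growing by one at each
repetition while `end` stays the same. As a rough {py3} equivalent,
assuming an initial offset of{nbsp}0 so that the `end` label is 0x100:

```python
count = 0x100
end = count  # offset of `<end>`: each repetition generates one byte

data = bytearray()

for _ in range(count):
    icitte = len(data)  # current offset before encoding the number
    data.append(end - icitte - 1)

print(data[:4].hex(" "), "...", data[-4:].hex(" "))
# ff fe fd fc ... 03 02 01 00
```

The first repetition encodes 0x100 - 0 - 1 = 0xff and the last one
encodes 0x100 - 0xff - 1 = 0, so every value fits in 8{nbsp}bits.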
1109
1110====
1111Input:
1112
1113----
{times = 1}

aa bb cc dd
1117
1118!repeat 3
  <here>
1120
1121 !repeat {here + 1}
1122 ee ff
1123 !end
1124
1125 11 22 !repeat times 33 !end
1126
  {times = times + 1}
1128!end
1129
1130"coucou!"
1131----
1132
1133Output:
1134
1135----
1136aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
113733 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
1138ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1139ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
1140ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1141ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1142ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1143ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1144ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1145ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
1146ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
114733 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
1148----
1149====
1150
1151====
1152This example shows how to use a repetition block as a conditional
1153section depending on some predefined variable.
1154
1155Input:
1156
1157----
1158aa bb cc dd
1159
1160!repeat cond
1161 ee ff "meow mix" 00
1162!end
1163
1164{be} {-1993:16}
1165----
1166
1167Output (`cond` is 0):
1168
1169----
1170aa bb cc dd f8 37
1171----
1172
1173Output (`cond` is 1):
1174
1175----
1176aa bb cc dd ee ff 6d 65 6f 77 20 6d 69 78 00 f8 ┆ ••••••meow mix••
117737 ┆ 7
1178----
1179====
1180
=== Post-item repetition
1182
1183A _post-item repetition_ represents the bytes of an item repeated a
1184given number of times.
1185
1186A post-item repetition is:
1187
1188. Any item except:
1189
1190** A <<current-byte-order-setting,current byte order setting>>.
1191** A <<current-offset-setting,current offset setting>>.
1192** A <<label,label>>.
** A <<current-offset-alignment,current offset alignment>>.
1194** A <<variable-assignment,variable assignment>>.
1195** A <<repetition-block,repetition block>>.
1196
1197. The ``pass:[*]`` character.
1198
1199. One of:
1200
1201** A positive integer (hexadecimal starting with `0x` or `0X` accepted)
1202 which is the number of times to repeat the previous item.
1203
1204** The ``pass:[{]`` prefix, a valid {py3} expression, and the
1205 ``pass:[}]`` suffix.
1206+
1207For a repetition at some source location{nbsp}__**L**__, this expression
1208may contain:
1209+
1210--
1211* The name of any <<label,label>> defined before{nbsp}__**L**__ and
1212 which isn't part of its repeated item.
1213* The name of any <<variable-assignment,variable>> known
1214 at{nbsp}__**L**__, which isn't part of its repeated item, and which
1215 doesn't, directly or indirectly, refer to a label defined
1216 after{nbsp}__**L**__.
1217--
1218+
1219The value of the special name `ICITTE` (`int` type) in this expression
1220is the <<cur-offset,current offset>> (before handling the items to
1221repeat).
1222
1223** A valid {py3} name.
1224+
1225For the name `__NAME__`, this is equivalent to the
1226`pass:[{]__NAME__pass:[}]` form above.
1227
1228You may also use a <<repetition-block,repetition block>>. The form
1229``__ITEM__{nbsp}pass:[*]{nbsp}__X__`` is equivalent to
1230``!repeat{nbsp}__X__{nbsp}__ITEM__{nbsp}!end``.
1231
1232====
1233Input:
1234
1235----
1236{end - ICITTE - 1 : 8} * 0x100 <end>
1237----
1238
1239Output:
1240
1241----
1242ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
1243ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
1244df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
1245cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
1246bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
1247af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
12489f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
12498f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
12507f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
12516f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
12525f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
12534f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
12543f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
12552f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"!
12561f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
12570f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
1258----
1259====
1260
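To cross-check the output above, here's a minimal plain {py3} sketch
which doesn't use Normand at all: it only models how the example seems
to behave, assuming an initial offset of{nbsp}0 so that the `end` label
resolves to `0x100`. The names `END` and `render()` are illustrative
only. Each repetition evaluates the expression again with `ICITTE` set
to the offset of the byte being generated:

```python
# Value of the `end` label: the offset just after the 256 generated
# bytes (assuming the initial offset is 0).
END = 0x100


def render() -> bytes:
    out = bytearray()

    for _ in range(0x100):
        # `ICITTE` is the current offset, that is, the number of bytes
        # generated so far in this model.
        icitte = len(out)

        # `{end - ICITTE - 1 : 8}` encodes the result as a single byte.
        out.append(END - icitte - 1)

    return bytes(out)


data = render()
```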
====
Input:

----
{times = 1}
aa bb cc dd
(
  <here>
  (ee ff) * {here + 1}
  11 22 33 * {times}
  {times = times + 1}
) * 3
"coucou!"
----

Output:

----
aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
33 33 63 6f 75 63 6f 75 21                      ┆ 33coucou!
----
====

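The output above can be reproduced with a short plain {py3} model. This
is only a sketch, not how Normand works internally: it assumes an
initial offset of{nbsp}0, and `render_example()` is a hypothetical name.
Note how the `here` label takes a new value at each iteration of the
group, and how `pass:[*] {times}` applies only to the `33` byte right
before it:

```python
def render_example() -> bytes:
    data = bytearray.fromhex("aa bb cc dd")
    times = 1

    # `( ... ) * 3`: the whole group is handled three times.
    for _ in range(3):
        # `<here>`: the label gets the current offset.
        here = len(data)

        # `(ee ff) * {here + 1}`
        data += bytes.fromhex("ee ff") * (here + 1)

        # `11 22 33 * {times}`: only `33` is repeated.
        data += bytes.fromhex("11 22")
        data += bytes.fromhex("33") * times

        # `{times = times + 1}`
        times += 1

    # `"coucou!"`
    data += "coucou!".encode("utf-8")
    return bytes(data)


data = render_example()
```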
== Command-line tool

If you <<install-normand,installed>> the `normand` package, then you
can use the `normand` command-line tool:

----
$ normand <<< '"ma gang de malades"' | hexdump -C
----

----
00000000  6d 61 20 67 61 6e 67 20  64 65 20 6d 61 6c 61 64  |ma gang de malad|
00000010  65 73                                             |es|
----

If you copy the `normand.py` module to your own project, then you can
run the module itself:

----
$ python3 -m normand <<< '"ma gang de malades"' | hexdump -C
----

----
00000000  6d 61 20 67 61 6e 67 20  64 65 20 6d 61 6c 61 64  |ma gang de malad|
00000010  65 73                                             |es|
----

Without a path argument, the `normand` tool reads from the standard
input.

The `normand` tool prints the generated binary data to the standard
output.

Various options control the initial <<state,state>> of the processor:
use the `--help` option to learn more.

== {py3} API

The whole `normand` package/module public API is:

[source,python]
----
# Byte order.
class ByteOrder(enum.Enum):
    # Big endian.
    BE = ...

    # Little endian.
    LE = ...


# Text location.
class TextLocation:
    # Line number.
    @property
    def line_no(self) -> int:
        ...

    # Column number.
    @property
    def col_no(self) -> int:
        ...


# Parsing error.
class ParseError(RuntimeError):
    # Source text location.
    @property
    def text_loc(self) -> TextLocation:
        ...


# Variables dictionary type (for type hints).
VariablesT = typing.Dict[str, typing.Union[int, float]]


# Labels dictionary type (for type hints).
LabelsT = typing.Dict[str, int]


# Parsing result.
class ParseResult:
    # Generated data.
    @property
    def data(self) -> bytearray:
        ...

    # Updated variable values.
    @property
    def variables(self) -> VariablesT:
        ...

    # Updated main group label values.
    @property
    def labels(self) -> LabelsT:
        ...

    # Final offset.
    @property
    def offset(self) -> int:
        ...

    # Final byte order.
    @property
    def byte_order(self) -> typing.Optional[ByteOrder]:
        ...


# Parses the `normand` input using the initial state defined by
# `init_variables`, `init_labels`, `init_offset`, and `init_byte_order`,
# and returns the corresponding parsing result.
def parse(normand: str,
          init_variables: typing.Optional[VariablesT] = None,
          init_labels: typing.Optional[LabelsT] = None,
          init_offset: int = 0,
          init_byte_order: typing.Optional[ByteOrder] = None) -> ParseResult:
    ...
----

The `normand` parameter is the actual <<learn-normand,Normand input>>
while the other parameters control the initial <<state,state>>.

The `parse()` function raises a `ParseError` instance should it fail to
parse the `normand` string for any reason.

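As a rough illustration of the API above, the following sketch calls
`parse()` and reports where parsing failed, if it did. It assumes that
the `normand` package is importable and otherwise does nothing; the
`render()` helper and its messages are hypothetical and not part of
Normand:

```python
from typing import Optional

try:
    # Third-party module: assumed to be installed or copied locally.
    import normand
except ImportError:
    normand = None


def render(text: str, init_offset: int = 0) -> Optional[bytes]:
    """Return the bytes generated from `text`, or `None` when the
    `normand` module isn't available."""
    if normand is None:
        return None

    try:
        res = normand.parse(text, init_offset=init_offset)
    except normand.ParseError as exc:
        # `text_loc` tells where parsing failed in `text`.
        loc = exc.text_loc
        print("Error at {}:{}: {}".format(loc.line_no, loc.col_no, exc))
        raise

    # `data` is a `bytearray` and `offset` is the final offset.
    print("{} bytes, final offset {}".format(len(res.data), res.offset))
    return bytes(res.data)


result = render('"lol" * 3 0a')
```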
== Development

Normand is a https://python-poetry.org/[Poetry] project.

To develop it, install it through Poetry and enter the virtual
environment:

----
$ poetry install
$ poetry shell
$ normand <<< '"lol" * 10 0a'
----

`normand.py` is processed by:

* https://microsoft.github.io/pyright/[Pyright]
* https://github.com/psf/black[Black]
* https://pycqa.github.io/isort/[isort]

=== Testing

Use https://docs.pytest.org/[pytest] to test Normand once the package is
part of your virtual environment, for example:

----
$ poetry install
$ poetry run pip3 install pytest
$ poetry run pytest
----

The `pytest` project is currently not a development dependency in
`pyproject.toml` due to backward compatibility issues with
Python{nbsp}3.4.

In the `tests` directory, each `*.nt` file is a test. The file name
prefix indicates what it's meant to test:

`pass-`::
    Everything above the `---` line is the valid Normand input
    to test.
+
Everything below the `---` line is the expected data
(whitespace-separated hexadecimal bytes).

`fail-`::
    Everything above the `---` line is the invalid Normand input
    to test.
+
Everything below the `---` line is the expected error message having
this form:
+
----
LINE:COL - MESSAGE
----

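To make the `*.nt` layout above concrete, here's a minimal sketch of how
such a file could be decoded. This isn't the project's actual test
runner: the helper names are hypothetical, the sample contents and the
`Some message` text are made up, and the sketch assumes that the
separator is the first line which is exactly `---`:

```python
from typing import Tuple


def split_test(text: str) -> Tuple[str, str]:
    """Split the contents of a `.nt` file at its `---` line."""
    lines = text.splitlines()

    # Raises `ValueError` if there's no separator line.
    sep = lines.index("---")
    return "\n".join(lines[:sep]), "\n".join(lines[sep + 1:])


def expected_bytes(expected: str) -> bytes:
    """Decode the whitespace-separated hexadecimal bytes of a `pass-`
    test."""
    return bytes(int(tok, 16) for tok in expected.split())


def expected_error(expected: str) -> Tuple[int, int, str]:
    """Decode the `LINE:COL - MESSAGE` line of a `fail-` test."""
    loc, _, msg = expected.strip().partition(" - ")
    line, col = loc.split(":")
    return int(line), int(col), msg


# Made-up contents of a `pass-` test.
normand_input, expected = split_test(
    '"lol" * 2 0a\n---\n6c 6f 6c 6c 6f 6c\n0a\n'
)
data = expected_bytes(expected)

# Made-up expectation of a `fail-` test.
error = expected_error("1:5 - Some message")
```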
=== Contributing

Normand uses https://review.lttng.org/admin/repos/normand,general[Gerrit]
for code review.

To report a bug, https://github.com/efficios/normand/issues/new[create a
GitHub issue].