// Show ToC at a specific location for a GitHub rendering
ifdef::env-github[]
:toc: macro
endif::env-github[]

ifndef::env-github[]
:toc: left
endif::env-github[]

// This is to mimic what GitHub does so that anchors work in an offline
// rendering too.
:idprefix:
:idseparator: -

// Other attributes
:py3: Python{nbsp}3

= Normand
Philippe Proulx

image::normand-logo.png[]

[.normal]
image:https://img.shields.io/pypi/v/normand.svg?label=Latest%20version[link="https://pypi.python.org/pypi/normand"]

[.lead]
_**Normand**_ is a text-to-binary processor with its own language.

This package offers both a portable {py3} module and a command-line
tool.

WARNING: This version of Normand is 0.7, meaning both the Normand
language and the module/CLI interface aren't stable.

ifdef::env-github[]
// ToC location for a GitHub rendering
toc::[]
endif::env-github[]

== Introduction

The purpose of Normand is to consume human-readable text representing
bytes and to produce the corresponding binary data.

.Simple bytes input.
====
Consider the following Normand input:

----
4f 55 32 bb $167 fe %10100111 a9 $-32
----

The generated nine bytes are:

----
4f 55 32 bb a7 fe a7 a9 e0
----
====

As the previous example shows, the fundamental unit of the Normand
language is the _byte_. The order in which you list bytes is the
order of the generated data.

The Normand language is more than simple lists of bytes, though. Its
main features are:

Comments, including a bunch of insignificant symbols which may improve readability::
+
Input:
+
----
ff bb %1101:0010 # This is a comment
78 29 af $192 # This too # 99 $-80
fe80::6257:18ff:fea3:4229
60:57:18:a3:42:29
10839636-5d65-4a68-8e6a-21608ddf7258
----
+
Output:
+
----
ff bb d2 78 29 af c0 99 b0 fe 80 62 57 18 ff fe
a3 42 29 60 57 18 a3 42 29 10 83 96 36 5d 65 4a
68 8e 6a 21 60 8d df 72 58
----

Hexadecimal, decimal, and binary byte constants::
+
Input:
+
----
aa bb $247 $-89 %0011_0010 %11.01= 10/10
----
+
Output:
+
----
aa bb f7 a7 32 da
----

UTF-8, UTF-16, and UTF-32 literal strings::
+
Input:
+
----
"hello world!" 00
u16le"stress\nverdict 🤣"
----
+
Output:
+
----
68 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 73 00 74 ┆ hello world!•s•t
00 72 00 65 00 73 00 73 00 0a 00 76 00 65 00 72 ┆ •r•e•s•s•••v•e•r
00 64 00 69 00 63 00 74 00 20 00 3e d8 23 dd ┆ •d•i•c•t• •>•#•
----

Labels: special variables holding the offset where they're defined::
+
----
<beg> b2 52 e3 bc 91 05
$100 $50 <chair> 33 9f fe
25 e9 89 8a <end>
----

Variables::
+
----
5e 65 {tower = 47} c6 7f f2 c4
44 {hurl = tower - 14} b5 {tower = hurl} 26 2d
----
+
The value of a variable assignment is the evaluation of a valid {py3}
expression which may include label and variable names.

Fixed-length number with a given length (8{nbsp}bits to 64{nbsp}bits) and byte order::
+
Input:
+
----
{strength = 4}
{be} 67 <lbl> 44 $178 {(end - lbl) * 8 + strength : 16} $99 <end>
{le} {-1993 : 32}
{-3.141593 : 64}
----
+
Output:
+
----
67 44 b2 00 2c 63 37 f8 ff ff 7f bd c2 82 fb 21
09 c0
----
+
The encoded number is the evaluation of a valid {py3} expression which
may include label and variable names.

https://en.wikipedia.org/wiki/LEB128[LEB128] integer::
+
Input:
+
----
aa bb cc {-1993 : sleb128} <meow> dd ee ff
{meow * 199 : uleb128}
----
+
Output:
+
----
aa bb cc b7 70 dd ee ff e3 07
----
+
The encoded integer is the evaluation of a valid {py3} expression which
may include label and variable names.

Repetition::
+
Input:
+
----
aa bb * 5 cc <zoom> "yeah\0" * {zoom * 3}
----
+
Output:
+
----
aa bb bb bb bb bb cc 79 65 61 68 00 79 65 61 68 ┆ •••••••yeah•yeah
00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 ┆ yeah•yeah•yeah•y
65 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 ┆ eah•yeah•yeah•ye
61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 ┆ ah•yeah•yeah•yea
68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 ┆ h•yeah•yeah•yeah
00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah•
----

Alignment::
+
Input:
+
----
{be}

    {199:32}
@64 {43:64}
@16 {-123:16}
@32~255 {5584:32}
----
+
Output:
+
----
00 00 00 c7 00 00 00 00 00 00 00 00 00 00 00 2b
ff 85 ff ff 00 00 15 d0
----

Multilevel grouping::
+
Input:
+
----
ff ((aa bb "zoom" cc) * 5) * 3 $-34 * 4
----
+
Output:
+
----
ff aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa ┆ •••zoom•••zoom••
bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a ┆ •zoom•••zoom•••z
6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f ┆ oom•••zoom•••zoo
6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc ┆ m•••zoom•••zoom•
aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb ┆ ••zoom•••zoom•••
7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb 7a 6f ┆ zoom•••zoom•••zo
6f 6d cc aa bb 7a 6f 6f 6d cc de de de de ┆ om•••zoom•••••
----

Precise error reporting::
+
----
/tmp/meow.normand:10:24 - Expecting a bit (`0` or `1`).
----
+
----
/tmp/meow.normand:32:6 - Unexpected character `k`.
----
+
----
/tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`mix`, `zoom`}.
----
+
----
/tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE` at byte offset 45.
----

You can use Normand to track data source files in your favorite VCS
instead of raw binary files. The binary files that Normand generates are
useful, for example, to test file format decoders (including with
malformed data) and for education.

See <<learn-normand>> to explore all the Normand features.

== Install Normand

Normand requires Python ≥ 3.4.

To install Normand:

----
$ python3 -m pip install --user normand
----

See
https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-to-the-user-site[Installing to the User Site]
to learn more about a user site installation.

[NOTE]
====
Normand has a single module file, `normand.py`, which you can copy as is
to your project to use it (both the <<python3-api,`normand.parse()`>>
function and the <<command-line-tool,command-line tool>>).

`normand.py` has _no external dependencies_, but if you're using
Python{nbsp}3.4, you'll need a local copy of the standard `typing`
module.
====

== Learn Normand

A Normand text input is a sequence of items which represent a sequence
of raw bytes.

[[state]] During the processing of items to data, Normand relies on a
current state:

[%header%autowidth]
|===
|State variable |Description |Initial value: <<python3-api,{py3} API>> |Initial value: <<command-line-tool,CLI>>

|[[cur-offset]] Current offset
|
The current offset has an effect on the value of <<label,labels>> and of
the special `ICITTE` name in <<fixed-length-number,fixed-length
number>>, <<leb128-integer,LEB128 integer>>, and
<<variable-assignment,variable assignment>> expression evaluation.

Each generated byte increments the current offset.

A <<current-offset-setting,current offset setting>> may change the
current offset without generating data.

A <<current-offset-alignment,current offset alignment>> generates
padding bytes to make the current offset satisfy a given alignment.
|`init_offset` parameter of the `parse()` function.
|`--offset` option.

|[[cur-bo]] Current byte order
|
The current byte order has an effect on the encoding of
<<fixed-length-number,fixed-length numbers>>.

A <<current-byte-order-setting,current byte order setting>> may change
the current byte order.
|`init_byte_order` parameter of the `parse()` function.
|`--byte-order` option.

|<<label,Labels>>
|Mapping of label names to integral values.
|`init_labels` parameter of the `parse()` function.
|One or more `--label` options.

|<<variable-assignment,Variables>>
|Mapping of variable names to integral values.
|`init_variables` parameter of the `parse()` function.
|One or more `--var` options.
|===

The available items are:

* A <<byte-constant,constant integer>> representing a single byte.

* A <<literal-string,literal string>> representing a sequence of bytes
  encoding UTF-8, UTF-16, or UTF-32 data.

* A <<current-byte-order-setting,current byte order setting>> (big or
  little endian).

* A <<fixed-length-number,fixed-length number>> (integer or
  floating point) using the <<cur-bo,current byte order>> and of which
  the value is the result of a {py3} expression.

* An <<leb128-integer,LEB128 integer>> of which the value is the result
  of a {py3} expression.

* A <<current-offset-setting,current offset setting>>.

* A <<current-offset-alignment,current offset alignment>>.

* A <<label,label>>, that is, a named constant holding the current
  offset.
+
This is similar to an assembly label.

* A <<variable-assignment,variable assignment>> associating a name to
  the integral result of an evaluated {py3} expression.

* A <<group,group>>, that is, a scoped sequence of items.

Moreover, you can <<repetition,repeat>> any item above, except a current
byte order setting, a current offset setting, a current offset
alignment, a label, or a variable assignment, a given fixed or variable
number of times. This is called a repetition.

A Normand comment may exist:

* Between items, possibly within a group.
* Between the nibbles of a constant hexadecimal byte.
* Between the bits of a constant binary byte.
* Between the last item and the ``pass:[*]`` character of a repetition,
  and between that ``pass:[*]`` character and the following number
  or expression.

A comment is anything between two ``pass:[#]`` characters on the same
line, or from ``pass:[#]`` until the end of the line. Whitespace and
the following symbol characters are also considered comments where a
comment may exist:

----
! / \ ? & : ; . , + [ ] _ = | -
----

The latter serve to improve readability so that you may write, for
example, a MAC address or a UUID as is.
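
As an illustration only, the following {py3} sketch mimics this behavior
for purely hexadecimal input: it drops whitespace and the symbol
characters listed above, then decodes the remaining hexits. The
`hex_bytes()` name is hypothetical and isn't part of Normand; the sketch
ignores ``pass:[#]`` comments and every other kind of item.

```python
# Symbol characters which Normand treats as comments (from the list above).
INSIGNIFICANT = set("!/\\?&:;.,+[]_=|-")


def hex_bytes(text):
    """Decode hex-only text, skipping whitespace and comment symbols."""
    hexits = "".join(
        ch for ch in text if not ch.isspace() and ch not in INSIGNIFICANT
    )
    # bytes.fromhex() raises ValueError on an odd number of hexits.
    return bytes.fromhex(hexits)


mac = hex_bytes("60:57:18:a3:42:29")
uuid = hex_bytes("10839636-5d65-4a68-8e6a-21608ddf7258")
print(mac.hex(), uuid.hex())
```

With this sketch, the MAC address yields 6{nbsp}bytes and the UUID
yields 16{nbsp}bytes, which matches the lengths in the introduction
example.
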

You can test the examples of this section with the `normand`
<<command-line-tool,command-line tool>> as such:

----
$ normand file | hexdump -C
----

where `file` is the name of a file containing the Normand input.

=== Byte constant

A _byte constant_ represents a single byte.

A byte constant is:

Hexadecimal form::
  Two consecutive hexits.

Decimal form::
  A decimal number after the `$` prefix.

Binary form::
  Eight bits after the `%` prefix.

====
Input:

----
ab cd [3d 8F] CC
----

Output:

----
ab cd 3d 8f cc
----
====

====
Input:

----
$192 %1100/0011 $ -77
----

Output:

----
c0 c3 b3
----
====

====
Input:

----
58f64689-6316-4d55-8a1a-04cada366172
fe80::6257:18ff:fea3:4229
----

Output:

----
58 f6 46 89 63 16 4d 55 8a 1a 04 ca da 36 61 72 ┆ X•F•c•MU•••••6ar
fe 80 62 57 18 ff fe a3 42 29 ┆ ••bW••••B)
----
====

====
Input:

----
%01110011 %01100001 %01101100 %01110101 %01110100
----

Output:

----
73 61 6c 75 74 ┆ salut
----
====
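
To make the decimal and binary forms concrete, here's a minimal {py3}
sketch of how such constants map to a single byte. The `decimal_byte()`
and `binary_byte()` helpers are hypothetical, and the accepted decimal
range (-128 to 255) is an assumption based on the examples above, not a
statement about the Normand implementation.

```python
def decimal_byte(value):
    """Encode a `$` decimal constant as one byte (two's complement)."""
    # Assumed range: signed or unsigned 8-bit values.
    if not -128 <= value <= 255:
        raise ValueError("value is outside the 8-bit range")
    return bytes([value & 0xFF])


def binary_byte(bits):
    """Encode the eight bits of a `%` binary constant as one byte."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expecting exactly eight bits")
    return bytes([int(bits, 2)])


# Same values as the second example above: $192 %1100/0011 $ -77
data = decimal_byte(192) + binary_byte("11000011") + decimal_byte(-77)
print(data.hex())
```
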

=== Literal string

A _literal string_ represents the UTF-8-, UTF-16-, or UTF-32-encoded
bytes of a string.

The string to encode isn't implicitly null-terminated: use `\0` at the
end of the string to add a null character.

A literal string is:

. **Optional**: one of the following encodings instead of UTF-8:
+
--
[horizontal]
`u16be`:: UTF-16BE.
`u16le`:: UTF-16LE.
`u32be`:: UTF-32BE.
`u32le`:: UTF-32LE.
--

. The ``pass:["]`` prefix.

. A sequence of zero or more characters, possibly containing escape
  sequences.
+
An escape sequence is the ``\`` character followed by one of:
+
--
[horizontal]
`0`:: Null (U+0000)
`a`:: Alert (U+0007)
`b`:: Backspace (U+0008)
`e`:: Escape (U+001B)
`f`:: Form feed (U+000C)
`n`:: End of line (U+000A)
`r`:: Carriage return (U+000D)
`t`:: Character tabulation (U+0009)
`v`:: Line tabulation (U+000B)
``\``:: Reverse solidus (U+005C)
``pass:["]``:: Quotation mark (U+0022)
--

. The ``pass:["]`` suffix.

====
Input:

----
"coucou tout le monde!"
----

Output:

----
63 6f 75 63 6f 75 20 74 6f 75 74 20 6c 65 20 6d ┆ coucou tout le m
6f 6e 64 65 21 ┆ onde!
----
====

====
Input:

----
u16le"I am not young enough to know everything."
----

Output:

----
49 00 20 00 61 00 6d 00 20 00 6e 00 6f 00 74 00 ┆ I• •a•m• •n•o•t•
20 00 79 00 6f 00 75 00 6e 00 67 00 20 00 65 00 ┆  •y•o•u•n•g• •e•
6e 00 6f 00 75 00 67 00 68 00 20 00 74 00 6f 00 ┆ n•o•u•g•h• •t•o•
20 00 6b 00 6e 00 6f 00 77 00 20 00 65 00 76 00 ┆  •k•n•o•w• •e•v•
65 00 72 00 79 00 74 00 68 00 69 00 6e 00 67 00 ┆ e•r•y•t•h•i•n•g•
2e 00 ┆ .•
----
====

====
Input:

----
u32be "\"illusion is the first\nof all pleasures\" 🦉"
----

Output:

----
00 00 00 22 00 00 00 69 00 00 00 6c 00 00 00 6c ┆ •••"•••i•••l•••l
00 00 00 75 00 00 00 73 00 00 00 69 00 00 00 6f ┆ •••u•••s•••i•••o
00 00 00 6e 00 00 00 20 00 00 00 69 00 00 00 73 ┆ •••n••• •••i•••s
00 00 00 20 00 00 00 74 00 00 00 68 00 00 00 65 ┆ ••• •••t•••h•••e
00 00 00 20 00 00 00 66 00 00 00 69 00 00 00 72 ┆ ••• •••f•••i•••r
00 00 00 73 00 00 00 74 00 00 00 0a 00 00 00 6f ┆ •••s•••t•••••••o
00 00 00 66 00 00 00 20 00 00 00 61 00 00 00 6c ┆ •••f••• •••a•••l
00 00 00 6c 00 00 00 20 00 00 00 70 00 00 00 6c ┆ •••l••• •••p•••l
00 00 00 65 00 00 00 61 00 00 00 73 00 00 00 75 ┆ •••e•••a•••s•••u
00 00 00 72 00 00 00 65 00 00 00 73 00 00 00 22 ┆ •••r•••e•••s•••"
00 00 00 20 00 01 f9 89 ┆ ••• ••••
----
====
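
The encodings above correspond to standard {py3} codecs, so you can
cross-check an expected output with `str.encode()`. In the sketch below,
the `CODECS` table and the `encode_literal()` helper are hypothetical
names introduced for illustration; they aren't part of the Normand API.

```python
# Normand encoding prefix -> Python codec name (UTF-8 when absent).
CODECS = {
    None: "utf-8",
    "u16be": "utf-16-be",
    "u16le": "utf-16-le",
    "u32be": "utf-32-be",
    "u32le": "utf-32-le",
}


def encode_literal(text, prefix=None):
    """Encode `text` like a literal string having the given prefix."""
    return text.encode(CODECS[prefix])


# No implicit null terminator: add "\0" explicitly when needed.
utf8 = encode_literal("hello world!\0")
# U+1F923 becomes a UTF-16 surrogate pair (four bytes).
utf16 = encode_literal("stress\nverdict \U0001F923", "u16le")
print(utf8.hex(), utf16[-4:].hex())
```

The explicit `-be` and `-le` codec names matter here: the plain
`utf-16` and `utf-32` codecs of {py3} prepend a byte order mark, which
the outputs above don't contain.
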

=== Current byte order setting

This special item sets the <<cur-bo,_current byte order_>>.

The two accepted forms are:

[horizontal]
``pass:[{be}]``:: Set the current byte order to big endian.
``pass:[{le}]``:: Set the current byte order to little endian.

=== Fixed-length number

A _fixed-length number_ represents a fixed number of bytes encoding
either:

* An unsigned or signed integer (two's complement).
+
The available lengths are 8, 16, 24, 32, 40, 48, 56, and 64.

* A floating point number
  (https://standards.ieee.org/standard/754-2008.html[IEEE{nbsp}754-2008]).
+
The available lengths are 32 (_binary32_) and 64 (_binary64_).

The value is the result of evaluating a {py3} expression; Normand
encodes it using the <<cur-bo,current byte order>>.

A fixed-length number is:

. The ``pass:[{]`` prefix.

. A valid {py3} expression.
+
For a fixed-length number at some source location{nbsp}__**L**__, this
expression may contain the name of any accessible <<label,label>> (not
within a nested group), including the name of a label defined
after{nbsp}__**L**__, as well as the name of any
<<variable-assignment,variable>> known at{nbsp}__**L**__.
+
The value of the special name `ICITTE` (`int` type) in this expression
is the <<cur-offset,current offset>> (before encoding the number).

. The `:` character.

. An encoding length in bits amongst:
+
--
The expression evaluates to an `int` value::
  `8`, `16`, `24`, `32`, `40`, `48`, `56`, and `64`.

The expression evaluates to a `float` value::
  `32` and `64`.
--

. The `}` suffix.

====
Input:

----
{le} {345:16}
{be} {-0xabcd:32}
----

Output:

----
59 01 ff ff 54 33
----
====

====
Input:

----
{be}

# String length in bits
{8 * (str_end - str_beg) : 16}

# String
<str_beg>
  "hello world!"
<str_end>
----

Output:

----
00 60 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 ┆ •`hello world!
----
====

====
Input:

----
{20 - ICITTE : 8} * 10
----

Output:

----
14 13 12 11 10 0f 0e 0d 0c 0b
----
====

====
Input:

----
{le}
{2 * 0.0529 : 32}
----

Output:

----
ac ad d8 3d
----
====
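
The following {py3} sketch reproduces the encodings of the examples
above with `int.to_bytes()` and `struct.pack()`. It's only an
approximation of what Normand does: the `fixed_length()` helper is a
hypothetical name, and treating negative integers as signed and all
other integers as unsigned is an assumption of this sketch.

```python
import struct


def fixed_length(value, length_bits, big_endian=True):
    """Encode an `int` or `float` value on `length_bits` bits."""
    if isinstance(value, float):
        # Only binary32 and binary64 are available for `float` values.
        fmt = {32: "f", 64: "d"}[length_bits]
        return struct.pack((">" if big_endian else "<") + fmt, value)

    order = "big" if big_endian else "little"
    # Assumption: negative values use two's complement, others are unsigned.
    return value.to_bytes(length_bits // 8, order, signed=value < 0)


print(fixed_length(345, 16, big_endian=False).hex())
print(fixed_length(-0xABCD, 32).hex())
print(fixed_length(2 * 0.0529, 32, big_endian=False).hex())
```

`int.to_bytes()` raises `OverflowError` when the value doesn't fit the
requested length, which plays a role similar to the "`outside the
8-bit range`" error shown in the introduction.
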

=== LEB128 integer

An _LEB128 integer_ represents a variable number of bytes encoding an
unsigned or signed integer which is the result of evaluating a {py3}
expression following the https://en.wikipedia.org/wiki/LEB128[LEB128]
format.

An LEB128 integer is:

. The ``pass:[{]`` prefix.

. A valid {py3} expression.
+
For an LEB128 integer at some source location{nbsp}__**L**__, this
expression may contain:
+
--
* The name of any <<label,label>> defined before{nbsp}__**L**__.
* The name of any <<variable-assignment,variable>> known at{nbsp}__**L**__
  which doesn't, directly or indirectly, refer to a label
  defined after{nbsp}__**L**__.
--
+
The value of the special name `ICITTE` (`int` type) in this expression
is the <<cur-offset,current offset>> (before encoding the integer).

. The `:` character.

. One of:
+
--
[horizontal]
`uleb128`:: Use the unsigned LEB128 format.
`sleb128`:: Use the signed LEB128 format.
--

. The `}` suffix.

====
Input:

----
{624485 : uleb128}
----

Output:

----
e5 8e 26
----
====

====
Input:

----
aa bb cc dd
<meow>
ee ff
{-981238311 + (meow * -23) : sleb128}
"hello"
----

Output:

----
aa bb cc dd ee ff fd fa 8d ac 7c 68 65 6c 6c 6f ┆ ••••••••••|hello
----
====
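
For readers who don't know LEB128, here's a minimal {py3} sketch of both
variants, written from the general description of the format rather
than from the Normand source. The `uleb128()` and `sleb128()` function
names are hypothetical helpers which only mirror the Normand keywords.

```python
def uleb128(value):
    """Encode a non-negative integer using unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 needs a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def sleb128(value):
    """Encode an integer using signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: keeps the sign
        sign_bit = bool(byte & 0x40)
        # Done once the remaining value is pure sign extension.
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


print(uleb128(624485).hex(), sleb128(-1993).hex())
```

Each output byte carries seven bits of the value, least significant
group first; the high bit of a byte tells the decoder whether or not
another byte follows.
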

=== Current offset setting

This special item sets the <<cur-offset,_current offset_>>.

A current offset setting is:

. The `<` prefix.

. A positive integer (hexadecimal starting with `0x` or `0X` accepted)
  which is the new current offset.

. The `>` suffix.

====
Input:

----
       {ICITTE : 8} * 8
<0x61> {ICITTE : 8} * 8
----

Output:

----
00 01 02 03 04 05 06 07 61 62 63 64 65 66 67 68 ┆ ••••••••abcdefgh
----
====

====
Input:

----
aa bb cc dd <meow> ee ff
<12> 11 22 33 <mix> 44 55
{meow : 8} {mix : 8}
----

Output:

----
aa bb cc dd ee ff 11 22 33 44 55 04 0f ┆ •••••••"3DU••
----
====
810
676f6189
PP
811=== Current offset alignment
812
00deb9fa 813A _current offset alignment_ represents zero or more padding bytes to
676f6189
PP
814make the <<cur-offset,current offset>> meet a given
815https://en.wikipedia.org/wiki/Data_structure_alignment[alignment] value.
816
817More specifically, for an alignment value of{nbsp}__**N**__{nbsp}bits,
818a current offset alignment represents the required padding bytes until
819the current offset is a multiple of __**N**__{nbsp}/{nbsp}8.
820
821A current offset alignment is:
822
823. The `@` prefix.
824
825. A positive integer (hexadecimal starting with `0x` or `0X` accepted)
826 which is the alignment value in _bits_.
827+
828This value must be greater than zero and a multiple of{nbsp}8.
829
830. **Optional**:
831+
832--
833. The ``pass:[~]`` prefix.
834. A positive integer (hexadecimal starting with `0x` or `0X` accepted)
835 which is the value of the byte to use as padding to align the
836 <<cur-offset,current offset>>.
837--
838+
839Without this section, the padding byte value is zero.
840
841====
842Input:
843
844----
84511 22 (@32 aa bb cc) * 3
846----
847
848Output:
849
850----
85111 22 00 00 aa bb cc 00 aa bb cc 00 aa bb cc
852----
853====
854
855====
856Input:
857
858----
859{le}
86077 88
861@32~0xcc {-893.5:32}
862@128~0x55 "meow"
863----
864
865Output:
866
867----
86877 88 cc cc 00 60 5f c4 55 55 55 55 55 55 55 55 ┆ w••••`_•UUUUUUUU
8696d 65 6f 77 ┆ meow
870----
871====
872
873====
874Input:
875
876----
877aa bb cc <29> @64~255 "zoom"
878----
879
880Output:
881
882----
883aa bb cc ff ff ff 7a 6f 6f 6d ┆ ••••••zoom
884----
885====
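
The padding computation described above can be sketched in {py3} as
follows. The `padding()` helper is a hypothetical name; it only
illustrates the arithmetic and says nothing about how Normand
implements it.

```python
def padding(offset, align_bits, pad_byte=0):
    """Return the padding bytes which align `offset` to `align_bits` bits."""
    if align_bits <= 0 or align_bits % 8:
        raise ValueError("alignment must be a positive multiple of 8")
    align_bytes = align_bits // 8
    # Number of bytes missing to reach the next multiple of `align_bytes`.
    count = -offset % align_bytes
    return bytes([pad_byte]) * count


# Last example above: offset 29 aligned to 64 bits with 0xff padding.
print(padding(29, 64, 255).hex())
```

An offset which already satisfies the alignment needs no padding, so
the helper returns an empty `bytes` object in that case.
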

=== Label

A _label_ associates a name to the <<cur-offset,current offset>>.

All the labels of a whole Normand input must have unique names.

A label must not share the name of a <<variable-assignment,variable>>.

A label is:

. The `<` prefix.

. A valid {py3} name which is not `ICITTE` (see
  <<fixed-length-number>>, <<leb128-integer>>, and
  <<variable-assignment>> to learn more).

. The `>` suffix.

=== Variable assignment

A _variable assignment_ associates a name to the integral result of an
evaluated {py3} expression.

A variable assignment is:

. The ``pass:[{]`` prefix.

. A valid {py3} name which is not `ICITTE` (see
  <<fixed-length-number>> and <<leb128-integer>> to learn more).

. The `=` character.

. A valid {py3} expression.
+
For a variable assignment at some source location{nbsp}__**L**__, this
expression may contain the name of any accessible <<label,label>> (not
within a nested group), including the name of a label defined
after{nbsp}__**L**__, as well as the name of any
<<variable-assignment,variable>> known at{nbsp}__**L**__.
+
The value of the special name `ICITTE` (`int` type) in this expression
is the <<cur-offset,current offset>>.

. The `}` suffix.

====
Input:

----
{mix = 101} {le}
{meow = 42} 11 22 {meow:8} 33 {meow = ICITTE + 17}
"yooo" {meow + mix : 16}
----

Output:

----
11 22 2a 33 79 6f 6f 6f 7a 00 ┆ •"*3yoooz•
----
====

=== Group

A _group_ is a scoped sequence of items.

The <<label,labels>> within a group aren't visible outside of it.

The main purpose of a group is to <<repetition,repeat>> more than a
single item.

A group is:

. The `(` prefix.

. Zero or more items.

. The `)` suffix.

====
Input:

----
((aa bb cc) dd () ee) "leclerc"
----

Output:

----
aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc
----
====

====
Input:

----
((aa bb cc) * 3 dd ee) * 5
----

Output:

----
aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa bb
cc aa bb cc dd ee aa bb cc aa bb cc aa bb cc dd
ee aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa
bb cc aa bb cc dd ee
----
====

====
Input:

----
{be}
(
  <str_beg> u16le"sébastien diaz" <str_end>
  {ICITTE - str_beg : 8}
  {(end - str_beg) * 5 : 24}
) * 3
<end>
----

Output:

----
73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 e0 ┆ n• •d•i•a•z•••••
73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 40 ┆ n• •d•i•a•z••••@
73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 00 a0 ┆ n• •d•i•a•z•••••
----
====

=== Repetition

A _repetition_ represents the bytes of an item repeated a given number
of times.

A repetition is:

. Any item except:

** A <<current-byte-order-setting,current byte order setting>>.
** A <<current-offset-setting,current offset setting>>.
** A <<label,label>>.
** A <<current-offset-alignment,current offset alignment>>.
** A <<variable-assignment,variable assignment>>.

. The ``pass:[*]`` character.

. One of:

** A positive integer (hexadecimal starting with `0x` or `0X` accepted)
   which is the number of times to repeat the previous item.

** The ``pass:[{]`` prefix, a valid {py3} expression, and the
   ``pass:[}]`` suffix.
+
For a repetition at some source location{nbsp}__**L**__, this expression
may contain:
+
--
* The name of any <<label,label>> defined before{nbsp}__**L**__ and
  which isn't part of its repeated item.
* The name of any <<variable-assignment,variable>> known
  at{nbsp}__**L**__, which isn't part of its repeated item, and which
  doesn't, directly or indirectly, refer to a label defined
  after{nbsp}__**L**__.
--
+
This expression must not contain the special name `ICITTE`.

====
Input:

----
{end - ICITTE - 1 : 8} * 0x100 <end>
----

Output:

----
ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ ••••••••••••••••
ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ ••••••••••••••••
df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ ••••••••••••••••
cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ ••••••••••••••••
bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ ••••••••••••••••
af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ ••••••••••••••••
9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ ••••••••••••••••
8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ ••••••••••••••••
7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp
6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba`
5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP
4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@
3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210
2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"!
1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ ••••••••••••••••
0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ ••••••••••••••••
----
====

====
Input:

----
{times = 1}
aa bb cc dd
(
  <here>
  (ee ff) * {here + 1}
  11 22 33 * {times}
  {times = times + 1}
) * 3
"coucou!"
----

Output:

----
aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••"
33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3•••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33•••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3
33 33 63 6f 75 63 6f 75 21 ┆ 33coucou!
----
====

====
This example shows how to use a repetition as a conditional section
depending on some predefined variable.

Input:

----
aa bb cc dd
(ee ff "meow mix" 00) * {cond}
{be} {-1993:16}
----

Output (`cond` is 0):

----
aa bb cc dd f8 37
----

Output (`cond` is 1):

----
aa bb cc dd ee ff 6d 65 6f 77 20 6d 69 78 00 f8 ┆ ••••••meow mix••
37 ┆ 7
----
====

== Command-line tool

If you <<install-normand,installed>> the `normand` package, then you
can use the `normand` command-line tool:

----
$ normand <<< '"ma gang de malades"' | hexdump -C
----

----
00000000  6d 61 20 67 61 6e 67 20  64 65 20 6d 61 6c 61 64  |ma gang de malad|
00000010  65 73                                             |es|
----

If you copy the `normand.py` module to your own project, then you can
run the module itself:

----
$ python3 -m normand <<< '"ma gang de malades"' | hexdump -C
----

----
00000000  6d 61 20 67 61 6e 67 20  64 65 20 6d 61 6c 61 64  |ma gang de malad|
00000010  65 73                                             |es|
----

Without a path argument, the `normand` tool reads from the standard
input.

The `normand` tool prints the generated binary data to the standard
output.

Various options control the initial <<state,state>> of the processor:
use the `--help` option to learn more.

== {py3} API

The whole `normand` package/module API is:

[source,python]
----
class ByteOrder(enum.Enum):
    # Big endian.
    BE = ...

    # Little endian.
    LE = ...


class TextLoc:
    # Line number.
    @property
    def line_no(self) -> int:
        ...

    # Column number.
    @property
    def col_no(self) -> int:
        ...


class ParseError(RuntimeError):
    # Source text location.
    @property
    def text_loc(self) -> TextLoc:
        ...


SymbolsT = typing.Dict[str, int]


class ParseResult:
    # Generated data.
    @property
    def data(self) -> bytearray:
        ...

    # Updated variable values.
    @property
    def variables(self) -> SymbolsT:
        ...

    # Updated main group label values.
    @property
    def labels(self) -> SymbolsT:
        ...

    # Final offset.
    @property
    def offset(self) -> int:
        ...

    # Final byte order.
    @property
    def byte_order(self) -> typing.Optional[ByteOrder]:
        ...


def parse(normand: str,
          init_variables: typing.Optional[SymbolsT] = None,
          init_labels: typing.Optional[SymbolsT] = None,
          init_offset: int = 0,
          init_byte_order: typing.Optional[ByteOrder] = None) -> ParseResult:
    ...
----

The `normand` parameter is the actual <<learn-normand,Normand input>>
while the other parameters control the initial <<state,state>>.

The `parse()` function raises a `ParseError` instance should it fail to
parse the `normand` string for any reason.
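
As a usage illustration, the sketch below wraps `normand.parse()` in two
small helpers. It relies only on the parameters and properties listed
above; the `generate()` and `hex_dump()` names are hypothetical, and the
code assumes that the `normand` module is importable in your
environment.

```python
def generate(text, variables=None, labels=None, offset=0):
    """Return the bytes which Normand generates from `text`."""
    # Imported lazily: assumes that `normand.py` is importable.
    import normand

    result = normand.parse(
        text,
        init_variables=variables,
        init_labels=labels,
        init_offset=offset,
    )
    return bytes(result.data)


def hex_dump(data):
    """Format bytes as space-separated lowercase hexadecimal pairs."""
    return " ".join(format(byte, "02x") for byte in data)
```

Calling `generate()` with the conditional input of <<repetition>> and a
`variables` dictionary mapping `cond` to 0 or 1 should reproduce the two
outputs shown there. A caller would typically catch
`normand.ParseError` around the call and read its `text_loc` property to
report the failing line and column.
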

== Development

Normand is a https://python-poetry.org/[Poetry] project.

To develop it, install it through Poetry and enter the virtual
environment:

----
$ poetry install
$ poetry shell
$ normand <<< '"lol" * 10 0a'
----

`normand.py` is processed by:

* https://microsoft.github.io/pyright/[Pyright]
* https://github.com/psf/black[Black]
* https://pycqa.github.io/isort/[isort]

=== Testing

Use https://docs.pytest.org/[pytest] to test Normand once the package is
part of your virtual environment, for example:

----
$ poetry install
$ poetry run pip3 install pytest
$ poetry run pytest
----

The `pytest` project is currently not a development dependency in
`pyproject.toml` due to backward compatibility issues with
Python{nbsp}3.4.

In the `tests` directory, each `*.nt` file is a test. The file name
prefix indicates what it's meant to test:

`pass-`::
  Everything above the `---` line is the valid Normand input
  to test.
+
Everything below the `---` line is the expected data
(whitespace-separated hexadecimal bytes).

`fail-`::
  Everything above the `---` line is the invalid Normand input
  to test.
+
Everything below the `---` line is the expected error message having
this form:
+
----
LINE:COL - MESSAGE
----
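
To illustrate the test file layout, here's a minimal {py3} sketch which
splits the text of an `*.nt` file at its `---` line and decodes the
expected bytes of a `pass-` test. It isn't the project's actual test
code: the `split_test()` and `expected_bytes()` names are hypothetical.

```python
def split_test(text):
    """Split the text of a test file at its `---` line."""
    lines = text.splitlines()
    index = lines.index("---")  # raises ValueError without a `---` line
    return "\n".join(lines[:index]), "\n".join(lines[index + 1:])


def expected_bytes(expectation):
    """Decode the whitespace-separated hex bytes of a `pass-` test."""
    return bytes(int(token, 16) for token in expectation.split())


source, expectation = split_test('aa bb "hi"\n---\naa bb 68 69\n')
print(source, expected_bytes(expectation).hex())
```

For a `fail-` test, the second element which `split_test()` returns
would be the expected `LINE:COL - MESSAGE` text instead of hexadecimal
bytes.
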

=== Contributing

Normand uses https://review.lttng.org/admin/repos/normand,general[Gerrit]
for code review.

To report a bug, https://github.com/efficios/normand/issues/new[create a
GitHub issue].