From fc21bb27ebb58b9b1dcd2184685d928ffabcc252 Mon Sep 17 00:00:00 2001 From: Philippe Proulx Date: Thu, 5 Oct 2023 00:31:07 -0400 Subject: [PATCH] Accept many more prefixes and suffixes for a constant integer This patch makes Normand accept the `0o`/`0O` and `0b`/`0B` prefixes for a constant integer, as well as the `h`/`H`/`q`/`Q`/`o`/`O`/`b`/`B` suffixes. Those suffixes are common in tools such as MASM [1]: > You can also specify hexadecimal numbers by adding an h after the > number. You can use uppercase or lowercase letters within numbers. For > example, "0x4AB3", "0X4aB3", "4AB3h", "4ab3h", and "4aB3H" have the > same meaning. as well as NASM [2]: > NASM allows you to specify numbers in a variety of number bases, in a > variety of ways: you can suffix `H`, `Q` or `O`, and `B` for hex, > octal, and binary, or you can prefix `0x` for hex in the style of C, > [...] This constant integer form is available anywhere outside a Python expression string, for example: 55 * 28Fh +0o755~11010001b m:my_macro(-126q, 0b101) Internally, the _norm_const_int() function transforms any suffix form into a Python prefix form (for int()), keeping the negative `-` if present. 
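
As a rough illustration of that normalization (a minimal sketch, not the code of this patch): it assumes the string already matched the constant integer pattern, and `norm_const_int` and `_SUFFIX_TO_PREFIX` are hypothetical names used only here.

```python
# Illustrative sketch only: hypothetical standalone names, not Normand's API.
# Assumes `s` was already validated as a constant integer.
_SUFFIX_TO_PREFIX = {"h": "0x", "q": "0o", "o": "0o", "b": "0b"}


def norm_const_int(s: str) -> str:
    # Split off the optional negative sign
    neg, pos = ("-", s[1:]) if s.startswith("-") else ("", s)

    # A Python prefix form passes through unchanged
    if pos[:2].lower() in ("0x", "0o", "0b"):
        return s

    # Suffix form: drop the suffix and prepend the matching prefix
    prefix = _SUFFIX_TO_PREFIX.get(pos[-1:].lower())

    if prefix is not None:
        return neg + prefix + pos[:-1]

    # Plain decimal: nothing to do
    return s


# The result is meant to be passed to int() with base 0
for text in ("28Fh", "-126q", "11010001b", "0o755", "55"):
    print(text, "->", int(norm_const_int(text), 0))
```

The sketch checks the prefix form first, so a string such as `0b101` is never mistaken for a suffix form.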
[1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/masm-numbers-and-operators [2]: https://www.tortall.net/projects/yasm/manual/html/nasm-const.html Change-Id: I708494e84080b9f4292397c6a81e67e335d330cd Signed-off-by: Philippe Proulx --- README.adoc | 71 +++++++++++++++++++--------- normand/normand.py | 47 ++++++++++++++++-- pyproject.toml | 2 +- tests/pass-const-int-bin.nt | 15 ++++++ tests/pass-const-int-dec.nt | 8 ++++ tests/pass-const-int-hex.nt | 58 +++++++++++++++++++++++ tests/pass-const-int-oct.nt | 42 ++++++++++++++++ tests/pass-readme-intro-fill.nt | 2 +- tests/pass-readme-learn-rep-blk-1.nt | 2 +- 9 files changed, 217 insertions(+), 30 deletions(-) create mode 100644 tests/pass-const-int-bin.nt create mode 100644 tests/pass-const-int-dec.nt create mode 100644 tests/pass-const-int-hex.nt create mode 100644 tests/pass-const-int-oct.nt diff --git a/README.adoc b/README.adoc index ce86733..3b8ec75 100644 --- a/README.adoc +++ b/README.adoc @@ -29,7 +29,7 @@ _**Normand**_ is a text-to-binary processor with its own language. This package offers both a portable {py3} module and a command-line tool. -WARNING: This version of Normand is 0.12, meaning both the Normand +WARNING: This version of Normand is 0.13, meaning both the Normand language and the module/CLI interface aren't stable. ifdef::env-github[] @@ -253,7 +253,7 @@ Input: +0x40 {ICITTE:8} "meow mix" -+200~0xff ++200~FFh {ICITTE:8} ---- + @@ -500,6 +500,34 @@ comment may exist: The latter serve to improve readability so that you may write, for example, a MAC address or a UUID as is. +[[const-int]] Many items require a _constant integer_, possibly +negative, in which case it may start with `-` for a negative integer. A +positive constant integer is any of: + +Decimal:: + One or more digits (`0` to `9`). + +Hexadecimal:: + One of: ++ +* The `0x` or `0X` prefix followed with one or more hexadecimal digits + (`0` to `9`, `a` to `f`, or `A` to `F`). 
+* One or more hexadecimal digits followed with the `h` or `H` suffix. + +Octal:: + One of: ++ +* The `0o` or `0O` prefix followed with one or more octal digits + (`0` to `7`). +* One or more octal digits followed with the `o`, `O`, `q`, or `Q` + suffix. + +Binary:: + One of: ++ +* The `0b` or `0B` prefix followed with one or more bits (`0` or `1`). +* One or more bits followed with the `b` or `B` suffix. + You can test the examples of this section with the `normand` <> as such: @@ -516,10 +544,10 @@ A _byte constant_ represents a single byte. A byte constant is: Hexadecimal form:: - Two consecutive hexits. + Two consecutive hexadecimal digits. Decimal form:: - A decimal number after the `$` prefix. + One or more digits after the `$` prefix. Binary form:: Eight bits after the `%` prefix. @@ -704,7 +732,7 @@ either: The available lengths are 8, 16, 24, 32, 40, 48, 56, and 64. * A floating point number - ([IEEE{nbsp}754-2008[https://standards.ieee.org/standard/754-2008.html]). + (IEEE{nbsp}754-2008[https://standards.ieee.org/standard/754-2008.html]). + The available length are 32 (_binary32_) and 64 (_binary64_). @@ -826,7 +854,8 @@ For an LEB128 integer at some source location{nbsp}__**L**__, this expression may contain: + -- -* The name of any <> defined before{nbsp}__**L**__. +* The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group. * The name of any <> known at{nbsp}__**L**__. -- @@ -886,8 +915,8 @@ A current offset setting is: . The `<` prefix. -. A positive integer (hexadecimal starting with `0x` or `0X` accepted) - which is the new current offset. +. A <> which is the new current + offset. . The `>` suffix. @@ -936,8 +965,8 @@ A current offset alignment is: . The `@` prefix. -. A positive integer (hexadecimal starting with `0x` or `0X` accepted) - which is the alignment value in _bits_. +. A <> which is the alignment value + in _bits_. + This value must be greater than zero and a multiple of{nbsp}8. 
@@ -945,9 +974,8 @@ This value must be greater than zero and a multiple of{nbsp}8. + -- . The ``pass:[~]`` prefix. -. A positive integer (hexadecimal starting with `0x` or `0X` accepted) - which is the value of the byte to use as padding to align the - <>. +. A <> which is the value of the + byte to use as padding to align the <>. -- + Without this section, the padding byte value is zero. @@ -1009,8 +1037,8 @@ A filling is: . One of: -** A positive integer (hexadecimal starting with `0x` or `0X` accepted) - which is the current offset target. +** A <> which is the current offset + target. ** The ``pass:[{]`` prefix, a valid {py3} expression of which the evaluation result type is `int` or `bool` (automatically converted to @@ -1043,9 +1071,8 @@ it's used. + -- . The ``pass:[~]`` prefix. -. A positive integer (hexadecimal starting with `0x` or `0X` accepted) - which is the value of the byte to use as padding to reach the - current offset target. +. A <> which is the value of the + byte to use as padding to reach the current offset target. -- + Without this section, the padding byte value is zero. @@ -1354,8 +1381,8 @@ A repetition block is: . One of: -** A positive integer (hexadecimal starting with `0x` or `0X` accepted) - which is the number of times to repeat the previous item. +** A <> which is the number of + times to repeat the previous item. ** The ``pass:[{]`` prefix, a valid {py3} expression of which the evaluation result type is `int` or `bool` (automatically converted to @@ -1392,7 +1419,7 @@ is equivalent to ``(__ITEMS__){nbsp}pass:[*]{nbsp}__X__``. Input: ---- -!repeat 0x100 +!repeat 0o400 {end - ICITTE - 1 : 8} !end @@ -1577,7 +1604,7 @@ names of the definition of the chosen macro. A parameter value is one of: + -- -* A positive integer (hexadecimal starting with `0x` or `0X` accepted). +* A <>, possibly negative. 
* The ``pass:[{]`` prefix, a valid {py3} expression of which the evaluation result type is `int` or `bool` (automatically converted to diff --git a/normand/normand.py b/normand/normand.py index c44373d..90ca0d8 100644 --- a/normand/normand.py +++ b/normand/normand.py @@ -30,7 +30,7 @@ # Upstream repository: . __author__ = "Philippe Proulx" -__version__ = "0.12.0" +__version__ = "0.13.0" __all__ = [ "__author__", "__version__", @@ -993,8 +993,45 @@ class _Parser: self._expect_pat(self._val_var_assign_set_bo_suffix_pat, "Expecting `}`") return item + # Returns a normalized version (so as to be parseable by int()) of + # the constant integer string `s`, possibly negative, dealing with + # any radix suffix. + @staticmethod + def _norm_const_int(s: str): + neg = "" + pos = s + + if s.startswith("-"): + neg = "-" + pos = s[1:] + + for r in "xXoObB": + if pos.startswith("0" + r): + # Already correct + return s + + # Try suffix + asm_suf_base = { + "h": "x", + "H": "x", + "q": "o", + "Q": "o", + "o": "o", + "O": "o", + "b": "b", + "B": "B", + } + + for suf in asm_suf_base: + if pos[-1] == suf: + s = "{}0{}{}".format(neg, asm_suf_base[suf], pos.rstrip(suf)) + + return s + # Common constant integer patterns - _pos_const_int_pat = re.compile(r"0[Xx][A-Fa-f0-9]+|\d+") + _pos_const_int_pat = re.compile( + r"0[Xx][A-Fa-f0-9]+|0[Oo][0-7]+|0[Bb][01]+|[A-Fa-f0-9]+[hH]|[0-7]+[qQoO]|[01]+[bB]|\d+" + ) _const_int_pat = re.compile(r"(?P<neg>-)?(?:{})".format(_pos_const_int_pat.pattern)) # Tries to parse an offset setting value (after the initial `<`), @@ -1010,7 +1047,7 @@ class _Parser: return # Return item - return _SetOffset(int(m.group(0), 0), begin_text_loc) + return _SetOffset(int(self._norm_const_int(m.group(0)), 0), begin_text_loc) # Tries to parse a label name (after the initial `<`), returning a # label item on success. 
@@ -1092,7 +1129,7 @@ class _Parser: ) # Validate - pad_val = int(m.group(0), 0) + pad_val = int(self._norm_const_int(m.group(0)), 0) if pad_val > 255: _raise_error( @@ -1217,7 +1254,7 @@ class _Parser: if m.group("neg") == "-" and not allow_neg: _raise_error("Expecting a positive constant integer", expr_text_loc) - expr_str = m.group(0) + expr_str = self._norm_const_int(m.group(0)) return self._ast_expr_from_str(expr_str, expr_text_loc) diff --git a/pyproject.toml b/pyproject.toml index bcf5751..9567508 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -23,7 +23,7 @@ [tool.poetry] name = 'normand' -version = '0.12.0' +version = '0.13.0' description = 'Text-to-binary processor with its own language' license = 'MIT' authors = ['Philippe Proulx '] diff --git a/tests/pass-const-int-bin.nt b/tests/pass-const-int-bin.nt new file mode 100644 index 0000000..97f6de8 --- /dev/null +++ b/tests/pass-const-int-bin.nt @@ -0,0 +1,15 @@ +!m ci(val) + {val:8} +!end + +m:ci(0b11001010) +m:ci(0B11001010) +m:ci(11001010b) +m:ci(11001010B) +m:ci(-0b1010011) +m:ci(-0B1010011) +m:ci(-1010011b) +m:ci(-1010011B) +--- +ca ca ca ca +ad ad ad ad diff --git a/tests/pass-const-int-dec.nt b/tests/pass-const-int-dec.nt new file mode 100644 index 0000000..10b04cb --- /dev/null +++ b/tests/pass-const-int-dec.nt @@ -0,0 +1,8 @@ +!m ci(val) + {val:8} +!end + +m:ci(42) +m:ci(-17) +--- +2a ef diff --git a/tests/pass-const-int-hex.nt b/tests/pass-const-int-hex.nt new file mode 100644 index 0000000..868c62a --- /dev/null +++ b/tests/pass-const-int-hex.nt @@ -0,0 +1,58 @@ +!m ci(val) + {val:8} +!end + +m:ci(0x01) +m:ci(0x23) +m:ci(0x45) +m:ci(0x67) +m:ci(0x89) +m:ci(0xab) +m:ci(0xcd) +m:ci(0xef) +m:ci(0xAB) +m:ci(0xCD) +m:ci(0xEF) +m:ci(0X01) +m:ci(0X23) +m:ci(0X45) +m:ci(0X67) +m:ci(0X89) +m:ci(0Xab) +m:ci(0Xcd) +m:ci(0Xef) +m:ci(0XAB) +m:ci(0XCD) +m:ci(0XEF) +m:ci(01h) +m:ci(23h) +m:ci(45h) +m:ci(67h) +m:ci(89h) +m:ci(abh) +m:ci(cdh) +m:ci(efh) +m:ci(ABh) +m:ci(CDh) +m:ci(EFh) +m:ci(01H) +m:ci(23H) 
+m:ci(45H) +m:ci(67H) +m:ci(89H) +m:ci(abH) +m:ci(cdH) +m:ci(efH) +m:ci(ABH) +m:ci(CDH) +m:ci(EFH) +m:ci(-0x4a) +m:ci(-0X4a) +m:ci(-4ah) +m:ci(-4aH) +--- +01 23 45 67 89 ab cd ef ab cd ef +01 23 45 67 89 ab cd ef ab cd ef +01 23 45 67 89 ab cd ef ab cd ef +01 23 45 67 89 ab cd ef ab cd ef +b6 b6 b6 b6 diff --git a/tests/pass-const-int-oct.nt b/tests/pass-const-int-oct.nt new file mode 100644 index 0000000..760585d --- /dev/null +++ b/tests/pass-const-int-oct.nt @@ -0,0 +1,42 @@ +!m ci(val) + {val:8} +!end + +m:ci(0o01) +m:ci(0o23) +m:ci(0o45) +m:ci(0o67) +m:ci(0O01) +m:ci(0O23) +m:ci(0O45) +m:ci(0O67) +m:ci(01o) +m:ci(23o) +m:ci(45o) +m:ci(67o) +m:ci(01O) +m:ci(23O) +m:ci(45O) +m:ci(67O) +m:ci(01q) +m:ci(23q) +m:ci(45q) +m:ci(67q) +m:ci(01Q) +m:ci(23Q) +m:ci(45Q) +m:ci(67Q) +m:ci(-0o75) +m:ci(-0O75) +m:ci(-75o) +m:ci(-75O) +m:ci(-75q) +m:ci(-75Q) +--- +01 13 25 37 +01 13 25 37 +01 13 25 37 +01 13 25 37 +01 13 25 37 +01 13 25 37 +c3 c3 c3 c3 c3 c3 diff --git a/tests/pass-readme-intro-fill.nt b/tests/pass-readme-intro-fill.nt index 1ef55a2..c36f731 100644 --- a/tests/pass-readme-intro-fill.nt +++ b/tests/pass-readme-intro-fill.nt @@ -5,7 +5,7 @@ +0x40 {ICITTE:8} "meow mix" -+200~0xff ++200~FFh {ICITTE:8} --- ef be ad de 37 f8 09 00 00 00 00 00 00 00 00 00 diff --git a/tests/pass-readme-learn-rep-blk-1.nt b/tests/pass-readme-learn-rep-blk-1.nt index 3483a59..29f754f 100644 --- a/tests/pass-readme-learn-rep-blk-1.nt +++ b/tests/pass-readme-learn-rep-blk-1.nt @@ -1,4 +1,4 @@ -!repeat 0x100 +!repeat 0o400 {end - ICITTE - 1 : 8} !end -- 2.34.1