Use JSON schemas to validate the layout and types of the YAML config.
This patch changes `config_parse.py` to use JSON schemas
(<https://json-schema.org/>) to validate many aspects of a barectf YAML
configuration file instead of redundant, manual checks.
An instance of the new `_SchemaValidator` class loads the JSON schemas
from the `barectf` package's `schemas` directory: such an object finds
all the YAML files in that directory and, for each of them:
1. Loads it (YAML to Python dictionary).
2. Gets its ID (`$id` property).
3. Adds the schema (1.) to a local schema store using its ID (2.).
Then `_SchemaValidator.validate()` uses this schema store to build a
JSON schema reference resolver and validator from the `jsonschema`
package (<https://pypi.org/project/jsonschema/>).
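The store/resolve mechanism can be sketched as follows (the class name
and sample schema are invented for illustration; the real validator
delegates to the `jsonschema` package instead of this toy subset):

```python
# Toy sketch of a schema store keyed by `$id`, mimicking how a
# `_SchemaValidator`-like object could resolve references between
# schemas. Hypothetical names; not barectf's actual code.

class SchemaStore:
    def __init__(self):
        self._store = {}

    def add(self, schema):
        # Index the schema under its `$id` property (steps 2. and 3.).
        self._store[schema['$id']] = schema

    def validate(self, instance, schema):
        # Follow `$ref` chains through the store.
        while '$ref' in schema:
            schema = self._store[schema['$ref']]

        # Tiny subset of JSON Schema: `type: object` and `required`.
        if schema.get('type') == 'object' and not isinstance(instance, dict):
            raise ValueError('expected an object')

        for prop in schema.get('required', []):
            if prop not in instance:
                raise ValueError(f'missing `{prop}` property')

store = SchemaStore()
store.add({'$id': 'config/config-min', 'type': 'object',
           'required': ['version']})
store.validate({'version': '2.2'}, {'$ref': 'config/config-min'})
```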
The `jsonschema` dependency is added to `pyproject.toml` and
`poetry.lock`. We need a version which supports JSON Schema draft 7
because we need the conditional keywords (`if`/`then`).
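As an illustration of why the conditional keywords matter (this is not
one of the actual schemas), a draft 7 fragment can require a property
only under a condition, for example requiring `size` only when the
`class` property is `int`:

```yaml
# Hypothetical draft 7 fragment, not an actual barectf schema.
if:
  properties:
    class:
      const: int
then:
  required:
    - size
```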
Because the barectf YAML configuration file format supports features
which do not exist natively in YAML (inclusions, field type aliases,
field type inheritance, and log level aliases), we can't have a single
JSON schema for the "raw" configuration file. There are actually five
validation steps, each with its own JSON schema:
1. Make sure the configuration object is minimally valid, that is, it's
an object and has a `version` property with a supported value.
Schema: `config/config-min.yaml`.
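For illustration, such a minimal-validity schema could look like the
sketch below (this is not the actual contents of
`config/config-min.yaml`, and the accepted `version` values are
illustrative):

```yaml
# Hypothetical sketch of a minimal-validity schema.
type: object
properties:
  version:
    type: string
    enum:
      - '2.1'
      - '2.2'
required:
  - version
```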
2. Make sure the configuration object is valid for the inclusion phase.
Those schemas only validate that the metadata/stream/event objects
are objects and that any `$include` property is valid.
Each time the YAML configuration parser loads a partial YAML file for
inclusion, it validates the resulting object using the corresponding
schema depending on what's being included.
Knowing those objects are valid as such is enough to process
inclusions without caring about the resulting configuration object's
validity.
Schemas:
* `2/config/clock-pre-include.yaml`
* `2/config/config-pre-include.yaml`
* `2/config/event-pre-include.yaml`
* `2/config/include-prop.yaml`
* `2/config/metadata-pre-include.yaml`
* `2/config/stream-pre-include.yaml`
* `2/config/trace-pre-include.yaml`
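The inclusion phase above could be sketched like this (file loading is
simulated with a dictionary, the merge is shallow for brevity, and all
names are invented):

```python
# Hypothetical sketch of the inclusion phase: validate only that each
# partial object is an object and that any `$include` property is a
# list of strings, then merge included partials.

_partials = {
    # name -> partial object, standing in for partial YAML files
    'base-stream.yaml': {'packet-context-type': {'class': 'struct'}},
}

def _validate_pre_include(obj):
    # Stand-in for validating against a `*-pre-include.yaml` schema.
    if not isinstance(obj, dict):
        raise ValueError('expected an object')

    incl = obj.get('$include', [])

    if not (isinstance(incl, list) and all(isinstance(n, str) for n in incl)):
        raise ValueError('`$include` property must be a list of strings')

def process_includes(obj):
    _validate_pre_include(obj)
    result = {}

    for name in obj.get('$include', []):
        # Included partials are themselves validated and processed.
        result.update(process_includes(_partials[name]))

    # The including object's own properties win over included ones.
    result.update({k: v for k, v in obj.items() if k != '$include'})
    return result
```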
3. Make sure the configuration object is valid for the field type
expansion phase.
This schema digs into compound field types recursively to make sure
any field type is either a string (a field type alias) or an object
with either a `class` property or an `$inherit` or `inherit` property
(the latter two being mutually exclusive).
Knowing the configuration object is valid as such is enough to expand
field types without caring about the resulting configuration object's
validity. So, for example, a resulting, expanded field type could be:
    class: string
    size: 16
    value-type:
      class: array
    meow: mix
We just don't care at this point, as long as field types are
"complete".
Schema: `2/config/config-pre-field-type-expansion.yaml`.
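The expansion itself could be sketched like this, assuming a simple
alias table and a shallow inheritance merge (all names are invented):

```python
# Hypothetical sketch of the field type expansion phase: replace
# string aliases with their aliased objects and fold `$inherit`
# chains into complete field type objects.

_type_aliases = {
    'u16': {'class': 'int', 'size': 16},
}

def expand_field_type(ft):
    if isinstance(ft, str):
        # A string field type is an alias (guaranteed by the schema).
        ft = _type_aliases[ft]

    inherit = ft.get('$inherit') or ft.get('inherit')

    if inherit is not None:
        # Start from the expanded base, then override with the field
        # type's own properties (shallow merge for brevity).
        base = dict(expand_field_type(inherit))
        base.update({k: v for k, v in ft.items()
                     if k not in ('$inherit', 'inherit')})
        return base

    return dict(ft)
```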
4. Make sure the configuration object is valid for the log level
expansion phase.
This schema validates the `$log-levels` property of the metadata
object as well as the `log-level` property of any event object.
Knowing the configuration object is valid as such is enough to expand
log levels, that is, to replace log level strings with their numeric
value, without caring about the resulting configuration object's
validity.
Schema: `2/config/config-pre-log-level-expansion.yaml`.
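A sketch of this expansion, assuming a layout where the metadata
object holds stream objects which hold event objects (the property
names here are illustrative):

```python
# Hypothetical sketch of the log level expansion phase: replace log
# level names (from the metadata object's `$log-levels` property)
# with their numeric values in each event object's `log-level`
# property.

def expand_log_levels(metadata):
    log_levels = metadata.get('$log-levels', {})

    for stream in metadata.get('streams', {}).values():
        for event in stream.get('events', {}).values():
            ll = event.get('log-level')

            if isinstance(ll, str):
                event['log-level'] = log_levels[ll]

    # The final configuration object has no `$log-levels` property.
    metadata.pop('$log-levels', None)
    return metadata
```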
5. Make sure the configuration object is valid.
This validates the final, effective configuration object which, at
this point, does not contain any:
* `$include` properties.
* Strings as field types.
* `$inherit`/`inherit` properties (field type objects).
* `$log-levels` property (metadata object).
* Strings for the `log-level` properties (event objects).
This is also the configuration object which the `--dump-config`
option now prints.
Schemas:
* `2/config/byte-order-prop.yaml`
* `2/config/config.yaml`
* `2/config/field-type.yaml`
* `2/config/uuid-prop.yaml`
After step 5, there's not much left to validate in `config_parse.py` itself:
* Make sure referred clock type objects (in `property-mappings` properties
of integer field type objects) exist.
* Make sure identifiers are valid (for technical reasons, the schemas
do not always validate that they exclude CTF keywords).
* Make sure alignment values are valid (powers of two).
* Make sure there's only one default stream type name and that it
exists.
* Make sure the values of enumeration field type members fit within
the range determined by the value (integer) field type's size.
* Everything that remains in `_BarectfMetadataValidator`.
* Everything in `_MetadataSpecialFieldsValidator`, which is untouched
because that's not something we validate with JSON schemas. We
possibly could, but it might stretch the use case.
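Two of those remaining manual checks can be sketched in a few lines
(function names are hypothetical; the real checks live in
`config_parse.py`):

```python
# Hypothetical sketches of two checks that are awkward to express as
# JSON schemas, so they stay in Python.

def is_valid_alignment(align):
    # A valid alignment is a positive power of two.
    return align > 0 and (align & (align - 1)) == 0

def enum_member_value_in_range(value, size, is_signed):
    # An enumeration member's value must fit within the range of the
    # enumeration's value (integer) field type, given its size in
    # bits and signedness.
    if is_signed:
        return -(1 << (size - 1)) <= value <= (1 << (size - 1)) - 1

    return 0 <= value <= (1 << size) - 1
```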
`_SchemaValidator.validate()` catches a `jsonschema` exception and
converts it to a barectf `ConfigParseError` exception to avoid leaking
`jsonschema` objects (an implementation detail) from barectf calls. I made
an effort to make the error object as readable as possible, for example
converting the instance path to context object names, but there's room
for improvement here.
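A toy version of this conversion could look like the sketch below (the
class and helper names are invented; the real code works with the
error objects which the `jsonschema` package raises):

```python
# Hypothetical sketch: turn a failing instance path into readable
# context object names instead of leaking validator internals.

class ConfigParseError(Exception):
    def __init__(self, ctx, msg):
        super().__init__(f'{ctx}: {msg}')

def _ctx_from_path(abs_path):
    # ('streams', 'my-stream', 'events', 'my-event') ->
    # 'Stream `my-stream`: Event `my-event`'
    parts = []
    it = iter(abs_path)

    for key in it:
        if key in ('streams', 'events'):
            name = next(it, None)
            label = key[:-1].capitalize()
            parts.append(f'{label} `{name}`' if name is not None else label)
        else:
            parts.append(f'`{key}` property')

    return ': '.join(parts) if parts else 'Configuration object'

def to_config_parse_error(abs_path, message):
    return ConfigParseError(_ctx_from_path(abs_path), message)
```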
No functional change intended, except for the modified raised
`ConfigParseError` objects. Tests are not changed and pass.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>