From: Philippe Proulx
Date: Fri, 8 Dec 2023 18:30:18 +0000 (+0000)
Subject: src/cpp-common: add bt2c::parseJson() functions (listener mode)
X-Git-Url: http://drtracing.org/?a=commitdiff_plain;h=3310affacd07f200476641a96beecdb44ae6c44e;p=babeltrace.git

This patch adds the bt2c::parseJson() functions in `parse-json.hpp`.

Those functions wrap the file-internal `bt2c::internal::JsonParser`
class of which an instance can parse a single JSON value, calling
specific methods of a JSON event listener as it processes.

Internally, `bt2c::internal::JsonParser` uses a string scanner
(`bt2c::StrScanner`).

In searching for a simple JSON parsing solution, I could not find, as of
this date, any project which satisfies the following requirements out of
the box:

• Is well known, well documented, and well tested.
• Has an MIT-compatible license.
• Parses both unsigned and signed 64-bit integers (range
  -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615).
• Provides an exact text location (offset, line number, column number)
  on parsing error (through logging and the message of an error cause).
• Provides an exact text location (offset, line number, column number)
  for each parsed value.

I believe the text locations are essential as this JSON parser will be
used to decode CTF2‑SPEC‑2.0 [1] metadata streams: because Babeltrace 2
will be a reference implementation of CTF 2, it makes sense to make an
effort to pinpoint the exact location of syntactic and semantic errors.

More specifically:

• JSON for Modern C++ (by Niels Lohmann) [2] doesn't support text
  location access, although there's a pending pull request (draft as of
  this date) to add such support [3].
• The exceptions of JsonCpp [4] don't contain a text location, only a
  message.
• SimpleJSON [5] doesn't offer text location access and seems to be an
  archived project.
• RapidJSON [6] doesn't offer text location access.
• yajl [7] could offer some form of text location access (offset, at
  least) with yajl_get_bytes_consumed(), remembering the last offset on
  our side, although I don't know how nicely it would play with
  whitespace.

  That being said, regarding integers, the `yajl_callbacks` structure
  [8] only contains a `yajl_integer` function pointer which receives a
  `long long` value (no direct 64-bit unsigned integer support). It's
  possible to set the `yajl_number` callback for any number, but the
  `yajl_double` callback gets disabled in that case, and the callback
  receives a string which needs further parsing on our side: this is
  pretty much what's implemented in `bt2c::StrScanner` anyway.

At this point I stopped searching as I already had a working and tested
string scanner and, as you can see, without comments, `parse-json.hpp`
is only 231 lines of effective code and satisfies all the requirements
above.

You can test bt2c::parseJson() with a simple program like this:

    #include <iostream>
    #include <string>

    #include "parse-json.hpp"

    struct Printer
    {
        void onNull(const bt2c::TextLoc&)
        {
            std::cout << "null\n";
        }

        template <typename ValT>
        void onScalarVal(const ValT& val, const bt2c::TextLoc&)
        {
            std::cout << val << '\n';
        }

        void onArrayBegin(const bt2c::TextLoc&)
        {
            std::cout << "[\n";
        }

        void onArrayEnd(const bt2c::TextLoc&)
        {
            std::cout << "]\n";
        }

        void onObjBegin(const bt2c::TextLoc&)
        {
            std::cout << "{\n";
        }

        void onObjKey(const std::string& key, const bt2c::TextLoc&)
        {
            std::cout << key << ": ";
        }

        void onObjEnd(const bt2c::TextLoc&)
        {
            std::cout << "}\n";
        }
    };

    int main(const int, const char * const * const argv)
    {
        Printer printer;

        bt2c::parseJson(argv[1], printer);
    }

Then:

    $ ./test-parse-json 23
    $ ./test-parse-json '"\u03c9 represents angular velocity"'
    $ ./test-parse-json '{"salut": [23, true, 42.4e-9, {"meow": null}]}'
    $ ./test-parse-json 18446744073709551615
    $ ./test-parse-json -9223372036854775808

Also try some parsing errors:

    $ ./test-parse-json '{"salut": [false, 42.4e-9, "meow": null}]}'
    $ ./test-parse-json 18446744073709551616
    $ ./test-parse-json -9223372036854775809
    $ ./test-parse-json '"invalid \u8dkf codepoint"'

[1]: https://diamon.org/ctf/CTF2-SPEC-2.0.html
[2]: https://github.com/nlohmann/json
[3]: https://github.com/nlohmann/json/pull/3165
[4]: https://github.com/open-source-parsers/jsoncpp
[5]: https://github.com/nbsdx/SimpleJSON
[6]: https://rapidjson.org/
[7]: https://github.com/lloyd/yajl
[8]: https://lloyd.github.io/yajl/yajl-2.1.0/structyajl__callbacks.html

Signed-off-by: Philippe Proulx
Change-Id: Id32c2b64723ca50b044369c424fe046c0a183cce
Reviewed-on: https://review.lttng.org/c/babeltrace/+/7411
Reviewed-on: https://review.lttng.org/c/babeltrace/+/12682
---

diff --git a/src/Makefile.am b/src/Makefile.am
index 920139db..8197f4c5 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -171,6 +171,7 @@ cpp_common_libcpp_common_la_SOURCES = \
 	cpp-common/bt2c/libc-up.hpp \
 	cpp-common/bt2c/logging.hpp \
 	cpp-common/bt2c/make-span.hpp \
+	cpp-common/bt2c/parse-json.hpp \
 	cpp-common/bt2c/prio-heap.hpp \
 	cpp-common/bt2c/read-fixed-len-int.hpp \
 	cpp-common/bt2c/regex.hpp \
diff --git a/src/cpp-common/bt2c/parse-json.hpp b/src/cpp-common/bt2c/parse-json.hpp
new file mode 100644
index 00000000..f8db3755
--- /dev/null
+++ b/src/cpp-common/bt2c/parse-json.hpp
@@ -0,0 +1,471 @@
+/*
+ * Copyright (c) 2022-2024 Philippe Proulx
+ *
+ * SPDX-License-Identifier: MIT
+ */
+
+#ifndef BABELTRACE_CPP_COMMON_BT2C_PARSE_JSON_HPP
+#define BABELTRACE_CPP_COMMON_BT2C_PARSE_JSON_HPP
+
+#include <string>
+#include <unordered_set>
+#include <vector>
+
+#include "common/assert.h"
+#include "cpp-common/bt2s/string-view.hpp"
+
+#include "exc.hpp"
+#include "str-scanner.hpp"
+#include "text-loc.hpp"
+
+namespace bt2c {
+namespace internal {
+
+/*
+ * JSON text parser.
+ *
+ * This parser parses a single JSON value, calling the methods of a JSON
+ * event listener of type `ListenerT` for each JSON event.
+ *
+ * The requirements of `ListenerT` are the following public methods:
+ *
+ *     void onNull(const TextLoc&);
+ *     void onScalarVal(bool, const TextLoc&);
+ *     void onScalarVal(unsigned long long, const TextLoc&);
+ *     void onScalarVal(long long, const TextLoc&);
+ *     void onScalarVal(double, const TextLoc&);
+ *     void onScalarVal(bt2s::string_view, const TextLoc&);
+ *     void onArrayBegin(const TextLoc&);
+ *     void onArrayEnd(const TextLoc&);
+ *     void onObjBegin(const TextLoc&);
+ *     void onObjKey(bt2s::string_view, const TextLoc&);
+ *     void onObjEnd(const TextLoc&);
+ *
+ * The received text location always indicates the location of the
+ * _beginning_ of the text representing the corresponding JSON value.
+ *
+ * This parser honours the grammar of <https://www.json.org/>, not
+ * parsing special floating-point number tokens (`nan`, `inf`, and the
+ * rest) or C-style comments.
+ */
+template <typename ListenerT>
+class JsonParser final
+{
+public:
+    /*
+     * Builds a JSON text parser, wrapping the string `str`, and parses
+     * it, calling the methods of the JSON event listener `listener`.
+     *
+     * Adds `baseOffset` to the text location offset for all error
+     * messages.
+     *
+     * When the JSON parser logs or appends a cause to the error of the
+     * current thread, it uses `baseOffset` to format the text location
+     * part of the message.
+     */
+    explicit JsonParser(bt2s::string_view str, ListenerT& listener, std::size_t baseOffset,
+                        const Logger& parentLogger);
+
+private:
+    /*
+     * Parses the whole JSON string.
+     */
+    void _parse();
+
+    /*
+     * Expects a JSON value, appending a cause to the error of the
+     * current thread and throwing `Error` if not found.
+     */
+    void _expectVal();
+
+    /*
+     * Tries to parse `null`, calling the event listener on success.
+     */
+    bool _tryParseNull();
+
+    /*
+     * Tries to parse `true` or `false`, calling the event listener on
+     * success.
+     */
+    bool _tryParseBool();
+
+    /*
+     * Tries to parse a JSON number, calling the event listener on
+     * success.
+     */
+    bool _tryParseNumber();
+
+    /*
+     * Tries to parse a JSON object key, calling the event listener on
+     * success.
+     */
+    bool _tryParseObjKey();
+
+    /*
+     * Tries to parse a JSON string, calling the event listener on
+     * success.
+     */
+    bool _tryParseStr();
+
+    /*
+     * Tries to parse a JSON array, calling the event listener on
+     * success.
+     */
+    bool _tryParseArray();
+
+    /*
+     * Tries to parse a JSON object, calling the event listener on
+     * success.
+     */
+    bool _tryParseObj();
+
+    /*
+     * Expects the specific token `token`, appending a cause to the
+     * error of the current thread and throwing `Error` if not found.
+     */
+    void _expectToken(const bt2s::string_view token)
+    {
+        if (!_mSs.tryScanToken(token)) {
+            BT_CPPLOGE_TEXT_LOC_APPEND_CAUSE_AND_THROW(Error, _mSs.loc(), "Expecting `{}`.",
+                                                       token.to_string());
+        }
+    }
+
+    /*
+     * Calls StrScanner::tryScanLitStr() with the JSON-specific escape
+     * sequence starting characters.
+     */
+    bt2s::string_view _tryScanLitStr()
+    {
+        return _mSs.tryScanLitStr("/bfnrtu");
+    }
+
+    /*
+     * Returns whether or not the current character of the underlying
+     * string scanner looks like the beginning of the fractional or
+     * exponent part of a constant real number.
+     */
+    bool _ssCurCharLikeConstRealFracOrExp() const noexcept
+    {
+        return *_mSs.at() == '.' || *_mSs.at() == 'E' || *_mSs.at() == 'e';
+    }
+
+private:
+    /* Logging configuration */
+    Logger _mLogger;
+
+    /* Underlying string scanner */
+    StrScanner _mSs;
+
+    /* JSON event listener */
+    ListenerT *_mListener;
+
+    /* Object key sets, one for each JSON object level, to detect duplicates */
+    std::vector<std::unordered_set<std::string>> _mKeys;
+};
+
+template <typename ListenerT>
+JsonParser<ListenerT>::JsonParser(const bt2s::string_view str, ListenerT& listener,
+                                  const std::size_t baseOffset, const Logger& parentLogger) :
+    _mLogger {parentLogger, "PARSE-JSON"},
+    _mSs {str, baseOffset, parentLogger}, _mListener {&listener}
+{
+    this->_parse();
+}
+
+template <typename ListenerT>
+void JsonParser<ListenerT>::_expectVal()
+{
+    if (this->_tryParseNull()) {
+        return;
+    }
+
+    if (this->_tryParseBool()) {
+        return;
+    }
+
+    if (this->_tryParseStr()) {
+        return;
+    }
+
+    if (this->_tryParseArray()) {
+        return;
+    }
+
+    if (this->_tryParseObj()) {
+        return;
+    }
+
+    if (this->_tryParseNumber()) {
+        return;
+    }
+
+    BT_CPPLOGE_TEXT_LOC_APPEND_CAUSE_AND_THROW(
+        Error, _mSs.loc(),
+        "Expecting a JSON value: `null`, `true`, `false`, a supported number "
+        "(for an integer: -9,223,372,036,854,775,808 to 18,446,744,073,709,551,615), "
+        "`\"` (a string), `[` (an array), or `{{` (an object).");
+}
+
+template <typename ListenerT>
+void JsonParser<ListenerT>::_parse()
+{
+    /* Expect a single JSON value */
+    this->_expectVal();
+
+    /* Skip trailing whitespaces */
+    _mSs.skipWhitespaces();
+
+    /* Make sure all the text is consumed */
+    if (!_mSs.isDone()) {
+        BT_CPPLOGE_TEXT_LOC_APPEND_CAUSE_AND_THROW(Error, _mSs.loc(),
+                                                   "Extra data after parsed JSON value.");
+    }
+}
+
+template <typename ListenerT>
+bool JsonParser<ListenerT>::_tryParseNull()
+{
+    _mSs.skipWhitespaces();
+
+    const auto loc = _mSs.loc();
+
+    if (_mSs.tryScanToken("null")) {
+        _mListener->onNull(loc);
+        return true;
+    }
+
+    return false;
+}
+
+template <typename ListenerT>
+bool JsonParser<ListenerT>::_tryParseBool()
+{
+    _mSs.skipWhitespaces();
+
+    const auto loc = _mSs.loc();
+
+    if (_mSs.tryScanToken("true")) {
+        _mListener->onScalarVal(true, loc);
+        return true;
+    } else if (_mSs.tryScanToken("false")) {
+        _mListener->onScalarVal(false, loc);
+        return true;
+    }
+
+    return false;
+}
+
+template <typename ListenerT>
+bool JsonParser<ListenerT>::_tryParseNumber()
+{
+    _mSs.skipWhitespaces();
+
+    const auto loc = _mSs.loc();
+
+    /*
+     * The `_mSs.tryScanConstReal()` call below is somewhat expensive
+     * currently because it involves executing a regex to confirm the
+     * constant real number form.
+     *
+     * The strategy below is to:
+     *
+     * 1. Keep the current position P of the string scanner.
+     *
+     * 2. Call `_mSs.tryScanConstUInt()` and
+     *    `_mSs.tryScanConstSInt()` first.
+     *
+     *    If either one succeeds, make sure the scanned JSON number
+     *    can't be in fact a real number. If it can, then reset the
+     *    position of the string scanner to P. It's safe to reset the
+     *    string scanner position at this point because
+     *    `_mSs.skipWhitespaces()` was called above and the constant
+     *    number scanning methods won't scan a newline character.
+     *
+     * 3. Call `_mSs.tryScanConstReal()` last.
+     */
+    const auto at = _mSs.at();
+
+    if (const auto uIntVal = _mSs.tryScanConstUInt()) {
+        if (!this->_ssCurCharLikeConstRealFracOrExp()) {
+            /* Confirmed unsigned integer form */
+            _mListener->onScalarVal(*uIntVal, loc);
+            return true;
+        }
+
+        /* Looks like a constant real number: backtrack */
+        _mSs.at(at);
+    } else if (const auto sIntVal = _mSs.tryScanConstSInt()) {
+        if (!this->_ssCurCharLikeConstRealFracOrExp()) {
+            /* Confirmed signed integer form */
+            _mListener->onScalarVal(*sIntVal, loc);
+            return true;
+        }
+
+        /* Looks like a constant real number: backtrack */
+        _mSs.at(at);
+    }
+
+    if (const auto realVal = _mSs.tryScanConstReal()) {
+        _mListener->onScalarVal(*realVal, loc);
+        return true;
+    }
+
+    return false;
+}
+
+template <typename ListenerT>
+bool JsonParser<ListenerT>::_tryParseStr()
+{
+    _mSs.skipWhitespaces();
+
+    const auto loc = _mSs.loc();
+    const auto str = this->_tryScanLitStr();
+
+    if (str.data()) {
+        _mListener->onScalarVal(str, loc);
+        return true;
+    }
+
+    return false;
+}
+
+template <typename ListenerT>
+bool JsonParser<ListenerT>::_tryParseObjKey()
+{
+    _mSs.skipWhitespaces();
+
+    const auto loc = _mSs.loc();
+    const auto str = this->_tryScanLitStr();
+
+    /* Like _tryParseStr(): check data() so that an empty key (`""`) is accepted */
+    if (str.data()) {
+        /* _tryParseObj() pushes */
+        BT_ASSERT(!_mKeys.empty());
+
+        /* Insert, checking for duplicate key */
+        if (!_mKeys.back().insert(str.to_string()).second) {
+            BT_CPPLOGE_TEXT_LOC_APPEND_CAUSE_AND_THROW(
+                Error, _mSs.loc(), "Duplicate JSON object key `{}`.", str.to_string());
+        }
+
+        _mListener->onObjKey(str, loc);
+        return true;
+    }
+
+    return false;
+}
+
+template <typename ListenerT>
+bool JsonParser<ListenerT>::_tryParseArray()
+{
+    _mSs.skipWhitespaces();
+
+    const auto loc = _mSs.loc();
+
+    if (!_mSs.tryScanToken("[")) {
+        return false;
+    }
+
+    /* Beginning of array */
+    _mListener->onArrayBegin(loc);
+
+    if (_mSs.tryScanToken("]")) {
+        /* Empty array */
+        _mListener->onArrayEnd(loc);
+        return true;
+    }
+
+    while (true) {
+        /* Expect array element */
+        this->_expectVal();
+
+        if (!_mSs.tryScanToken(",")) {
+            /* No more array elements */
+            break;
+        }
+    }
+
+    /* End of array */
+    this->_expectToken("]");
+    _mListener->onArrayEnd(loc);
+    return true;
+}
+
+template <typename ListenerT>
+bool JsonParser<ListenerT>::_tryParseObj()
+{
+    _mSs.skipWhitespaces();
+
+    const auto loc = _mSs.loc();
+
+    if (!_mSs.tryScanToken("{")) {
+        return false;
+    }
+
+    /* Beginning of object */
+    _mListener->onObjBegin(loc);
+
+    if (_mSs.tryScanToken("}")) {
+        /* Empty object */
+        _mListener->onObjEnd(loc);
+        return true;
+    }
+
+    /* New level of object keys */
+    _mKeys.push_back({});
+
+    while (true) {
+        /* Expect object key */
+        _mSs.skipWhitespaces();
+
+        if (!this->_tryParseObjKey()) {
+            BT_CPPLOGE_TEXT_LOC_APPEND_CAUSE_AND_THROW(
+                Error, _mSs.loc(), "Expecting a JSON object key (double-quoted string).");
+        }
+
+        /* Expect colon */
+        this->_expectToken(":");
+
+        /* Expect entry value */
+        this->_expectVal();
+
+        if (!_mSs.tryScanToken(",")) {
+            /* No more entries */
+            break;
+        }
+    }
+
+    /* End of object */
+    BT_ASSERT(!_mKeys.empty());
+    _mKeys.pop_back();
+    this->_expectToken("}");
+    _mListener->onObjEnd(loc);
+    return true;
+}
+
+} /* namespace internal */
+
+/*
+ * Parses the JSON text `str`, calling the methods of `listener` for
+ * each JSON event (see `internal::JsonParser` for the requirements
+ * of `ListenerT`).
+ *
+ * When the function logs or appends a cause to the error of the current
+ * thread, it uses `baseOffset` to format the text location part of the
+ * message.
+ */
+template <typename ListenerT>
+void parseJson(const bt2s::string_view str, ListenerT& listener, const std::size_t baseOffset,
+               const Logger& parentLogger)
+{
+    internal::JsonParser<ListenerT> {str, listener, baseOffset, parentLogger};
+}
+
+template <typename ListenerT>
+void parseJson(const bt2s::string_view str, ListenerT& listener, const Logger& parentLogger)
+{
+    parseJson(str, listener, 0, parentLogger);
+}
+
+} /* namespace bt2c */
+
+#endif /* BABELTRACE_CPP_COMMON_BT2C_PARSE_JSON_HPP */
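
Illustration (not part of the patch): the integer-first strategy described in the `_tryParseNumber()` comment can be sketched without any bt2c dependency. This is only a rough approximation: `NumKind`, `Num`, `isDigit()`, `likeFracOrExp()`, and `classifyNumber()` are hypothetical names, and the C library functions std::strtoull(), std::strtoll(), and std::strtod() stand in for the `bt2c::StrScanner` methods, whose exact behaviour is assumed here. It doesn't validate the full JSON number grammar (for example, leading zeros or a dangling `e`).

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical result type for this sketch (not part of bt2c).
enum class NumKind { None, UInt, SInt, Real };

struct Num
{
    NumKind kind = NumKind::None;
    unsigned long long uIntVal = 0;
    long long sIntVal = 0;
    double realVal = 0;
};

static bool isDigit(const char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Same character test as _ssCurCharLikeConstRealFracOrExp() in the patch.
static bool likeFracOrExp(const char c)
{
    return c == '.' || c == 'E' || c == 'e';
}

// Classifies the number starting at the beginning of `text`
// (assumes leading whitespace was already skipped).
Num classifyNumber(const std::string& text)
{
    Num num;
    const char * const begin = text.c_str(); // Step 1: position P
    char *end = nullptr;

    // Step 2: try the cheap integer forms first.
    if (isDigit(*begin)) {
        errno = 0;
        const auto val = std::strtoull(begin, &end, 10);

        if (errno == 0 && !likeFracOrExp(*end)) {
            // Confirmed unsigned integer form
            num.kind = NumKind::UInt;
            num.uIntVal = val;
            return num;
        }
    } else if (*begin == '-' && isDigit(begin[1])) {
        errno = 0;
        const auto val = std::strtoll(begin, &end, 10);

        if (errno == 0 && !likeFracOrExp(*end)) {
            // Confirmed signed integer form
            num.kind = NumKind::SInt;
            num.sIntVal = val;
            return num;
        }
    }

    // Step 3: backtrack to P. Only accept a real number if the digits
    // are followed by a fraction or exponent, so that an out-of-range
    // integer isn't silently converted to a `double`.
    const char *p = begin;

    if (*p == '-') {
        ++p;
    }

    if (!isDigit(*p)) {
        return num;
    }

    while (isDigit(*p)) {
        ++p;
    }

    if (!likeFracOrExp(*p)) {
        return num;
    }

    const double val = std::strtod(begin, &end);

    if (end != begin) {
        num.kind = NumKind::Real;
        num.realVal = val;
    }

    return num;
}
```

With the inputs of the test commands in the commit message, this sketch classifies `23` and `18446744073709551615` as unsigned integers, `-9223372036854775808` as a signed integer, `42.4e-9` as a real number, and rejects both `18446744073709551616` and `-9223372036854775809`.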