Michael Jeanson [Wed, 16 Mar 2022 15:40:52 +0000 (11:40 -0400)]
Fix: use the correct endian compat macros
Document which variant of the endian macros our compat header guarantees
across all platforms and fix incorrect uses.
This was discovered with -Wundef on macOS.
Change-Id: Iaf442fe5887063661273ac2a00c9fa4015e83d5c
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 16 Mar 2022 21:55:45 +0000 (17:55 -0400)]
Clean-up: tests: silence bogus warning
1486757 Buffer not null terminated
If the buffer is treated as a null terminated string in later operations, a buffer overflow or over-read may occur.
In test_create_ust_event_exclusion(): The string buffer may not have a null terminator if the source string's length is equal to the buffer size (CWE-170)
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0c3fd6c9d591e1c67d8b80bf12825bb0d1520a65
Jérémie Galarneau [Wed, 16 Mar 2022 21:39:07 +0000 (17:39 -0400)]
Fix: tests: uninitialized lttng_payload
1474980 Uninitialized pointer read
Incorrect values could be read from, or even written to, an arbitrary memory location, causing incorrect computations.
In test_event_rule_userspace_probe(): Reads an uninitialized pointer or its target (CWE-457)
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7645278f18e4a4678fb5ede9523d0cfa8d3aa106
Jérémie Galarneau [Wed, 16 Mar 2022 21:35:43 +0000 (17:35 -0400)]
Fix: sessiond: ust-app: uninitialized name logged on stream copy failure
1466302 Uninitialized scalar variable
The variable will contain an arbitrary value left from earlier computations.
In send_channel_uid_to_ust(buffer_reg_channel *, ust_app *, ust_app_session *, ust_app_channel *): Use of an uninitialized variable (CWE-457)
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Icd38de3b67dab783fa26a721c68c48ebfbb59785
Jérémie Galarneau [Wed, 16 Mar 2022 21:29:11 +0000 (17:29 -0400)]
Fix: lttng-elf: untrusted entry size divisor
1405557 Untrusted divisor
The divisor could be controlled by an attacker, who could cause a division by zero.
In lttng_elf_get_symbol_offset: An unscrutinized value from an untrusted source used as a divisor (CWE-369)
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I029708a0df4f62fe0031e374d50839c26f4f3f4b
Jonathan Rajotte [Fri, 11 Mar 2022 18:39:58 +0000 (13:39 -0500)]
Fix: tests: test definitions arrays contain invalid data
Observed issue
==============
The long_regression Ci job fails on test_thread_stall.
11:17:16 # export LTTNG_SESSION_CONFIG_XSD_PATH=/home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/common/
11:17:16 # env /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/bin/lttng-sessiond/lttng-sessiond --background --consumerd64-path=/home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/bin/lttng-consumerd/lttng-consumerd 1
11:17:16 ok 16 - Start session daemon
11:17:16 # Check after running for 30 seconds
11:17:16 not ok 17 - Validation failure
11:17:16 # Failed test 'Validation failure'
11:17:16 # in /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../utils/tap/tap.sh:fail() at line 159.
11:17:16 # Health returned:
11:17:16 # stdout:
11:17:16 # stderr:
11:17:16 # Killing (signal SIGKILL) lttng-sessiond and lt-lttng-sessiond pids:
1840601 1840602
11:17:16 ok 18 - Wait after kill session daemon
...
17:57:01 # Test health problem detection with LTTNG_RELAYD_THREAD_DISPATCHER
17:57:01 # Start session daemon
17:57:01 # export LTTNG_SESSION_CONFIG_XSD_PATH=/home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/common/
17:57:01 # env /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/bin/lttng-sessiond/lttng-sessiond --background --consumerd64-path=/home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/bin/lttng-consumerd/lttng-consumerd 1
17:57:01 ok 38 - Start session daemon
17:57:01 # /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../../src/bin/lttng/lttng create health_thread_stall --no-output
17:57:01 ok 39 - Create session health_thread_stall in no-output mode
17:57:01 # With UST consumer daemons
17:57:01 # /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../../src/bin/lttng/lttng enable-event tp:tptest -c testchan -s health_thread_stall -u
17:57:01 ok 40 - Enable ust event tp:tptest for session health_thread_stall
17:57:01 ok 41 # skip: Root access is needed. Skipping kernel consumer health check test.
17:57:01 # /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../../src/bin/lttng/lttng start health_thread_stall
17:57:01 ok 42 - Start tracing for session health_thread_stall
17:57:01 # Check after running for 30 seconds
17:57:01 not ok 43 - Validation failure
17:57:01 # Failed test 'Validation failure'
17:57:01 # in /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../utils/tap/tap.sh:fail() at line 159.
17:57:01 # Health returned:
17:57:01 # stdout:
17:57:01 # stderr:
17:57:01 # Killing (signal SIGTERM) lttng-consumerd pids: 690297 690299
17:57:01 Error: consumer closed the command socket
17:57:01 Error: Health error occurred in thread_consumer_management
17:57:01 ok 44 - Wait after kill consumer daemon
Cause
=====
After investigation, commit
3c3390532736cfb5198f863d0d2b218e21fcf76d [1]
introduces the test regression.
Albeit [1] removes `LTTNG_SESSIOND_THREAD_HT_CLEANUP` from the `THREAD`
array and the corresponding error message in `ERROR_STRING`, it does not
modify the `NEEDS_ROOT`, `TEST_CONSUMERD` and `TEST_RELAYD` arrays.
Also the test count is not adjusted to reflect the removal of the
`THREAD` element.
Solution
========
Remove the unused data from `NEEDS_ROOT`, `TEST_CONSUMERD` and
`TEST_RELAYD` and adjust the test count.
Known drawbacks
=========
None.
References
==========
[1] https://github.com/lttng/lttng-tools/commit/
3c3390532736cfb5198f863d0d2b218e21fcf76d
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9c16fa8d76b41f1a28fd342d9f076969f4ff1b13
Jérémie Galarneau [Wed, 16 Mar 2022 20:35:30 +0000 (16:35 -0400)]
Clean-up: exclusions: use LTTNG_EVENT_EXCLUSION_NAME_AT util
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I59da48be1a2d905d3a7ff485d353fdacddda784d
Jérémie Galarneau [Tue, 15 Mar 2022 21:19:27 +0000 (17:19 -0400)]
Clean-up: lttng-ctl: strnlen out of bounds access
gcc 11.2 produces the following warning. The lttng_strncpy helper
assumes that 'src' is a null terminated string. As such, the use of a
string literal (of size 37) in this specific example is correct as
strnlen will not read beyond the null terminator.
Replacing strnlen by strlen eliminates this warning. strnlen was used to
short-circuit the source length check when it was larger than the
destination. This optimization is unlikely to matter. Pascal-style
strings should be used when string length computations are expected to
be prohibitively expensive.
In file included from ../../../src/common/macros.h:15,
from ../../../include/lttng/health-internal.h:18,
from lttng-ctl-health.cpp:19:
In function 'size_t lttng_strnlen(const char*, size_t)',
inlined from 'int lttng_strncpy(char*, const char*, size_t)' at ../../../src/common/macros.h:123:19,
inlined from 'int set_health_socket_path(lttng_health*, int)' at lttng-ctl-health.cpp:198:22,
inlined from 'int lttng_health_query(lttng_health*)' at lttng-ctl-health.cpp:319:30:
../../../src/common/compat/string.h:19:23: warning: 'size_t strnlen(const char*, size_t)' specified bound 4096 may exceed source size 37 [-Wstringop-overread]
19 | return strnlen(str, max);
| ~~~~~~~^~~~~~~~~~
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I290109433fcae7073321f1b48ecfbb2ec6e4ad26
Jérémie Galarneau [Tue, 15 Mar 2022 21:13:03 +0000 (17:13 -0400)]
Clean-up: sessiond-comm: out of bounds access warning
gcc 11.2 produces the two following warnings. In both case, setting an
array's dimension to zero is used to express a variable length array of
names that are LTTNG_SYMBOL_NAME_LEN bytes long. gcc doesn't know about
this and correctly points out that an access is taking place outside of
the array's bounds.
Omit the '0' dimension to work around this warning.
event.cpp: In function 'ssize_t lttng_event_create_from_payload(lttng_payload_view*, lttng_event**, lttng_event_exclusion**, char**, lttng_bytecode**)':
event.cpp:320:62: warning: array subscript i is outside array bounds of 'char [0][256]' [-Warray-bounds]
320 | ret = lttng_strncpy(local_exclusions->names[i],
| ~~~~~~~~~~~~~~~~~~~~~~~~~^
In file included from event.cpp:16:
../../src/common/sessiond-comm/sessiond-comm.h:569:14: note: while referencing 'lttng_event_exclusion::names'
569 | char names[0][LTTNG_SYMBOL_NAME_LEN];
| ^~~~~
event-rule/user-tracepoint.cpp: In function 'lttng_event_rule_generate_exclusions_status lttng_event_rule_user_tracepoint_generate_exclusions(const lttng_event_rule*, lttng_event_exclusion**)':
event-rule/user-tracepoint.cpp:383:61: warning: array subscript i is outside array bounds of 'char [0][256]' [-Warray-bounds]
383 | copy_ret = lttng_strncpy(exclusions->names[i], exclusion_str,
| ~~~~~~~~~~~~~~~~~~~^
In file included from ../../src/common/runas.h:17,
from event-rule/user-tracepoint.cpp:17:
../../src/common/sessiond-comm/sessiond-comm.h:569:14: note: while referencing 'lttng_event_exclusion::names'
569 | char names[0][LTTNG_SYMBOL_NAME_LEN];
| ^~~~~
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I260185f2baf085ca4486ce3b13696ee5fa55938a
Jérémie Galarneau [Wed, 16 Mar 2022 15:56:21 +0000 (11:56 -0400)]
Fix: event: erroneous bound check on perf counter name size
The wrong size if used when initializing a perf counter name from a
payload. The destination size must be used to prevent out of bound
writes.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8ea41d30815bd2c02bb2ad8b01e8cecd2d6549a8
Jérémie Galarneau [Wed, 16 Mar 2022 15:55:08 +0000 (11:55 -0400)]
Fix: sessiond: event name length check is too strict
A truncation check when initializing an event from an event rule limits
the name to one less character than is supposed to be allowed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I172c5098597923d14508c150c7b3577f759bae72
Jérémie Galarneau [Wed, 16 Mar 2022 15:52:42 +0000 (11:52 -0400)]
Clean-up: use sizeof instead of repeating string length constant
Looking into a number of coverity reports (false positives), I
identified a number of sites which use the maximal symbol length
constant when the actual size of an array can be used. This will prevent
mismatches in the future should the array sizes change.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia74f43d3871fdce60affbde068401b58c84b09ad
Jérémie Galarneau [Wed, 16 Mar 2022 14:48:59 +0000 (10:48 -0400)]
Fix: relayd: missing session unlock on error path
1475890 Missing unlock May result in deadlock if there is another
attempt to acquire the lock.
In viewer_get_new_streams(relay_connection *): Missing a release of a
lock on a path (CWE-667)
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I66de344f5f39ac85bf8db93cf39a07d0c6cf7694
Michael Jeanson [Mon, 7 Mar 2022 18:59:46 +0000 (13:59 -0500)]
Bring compiler warning flags in line with other projects
Change-Id: I0281a357afbace553368cd01357bb2f21de3352d
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 8 Mar 2022 23:26:06 +0000 (18:26 -0500)]
Prepare for '-Wsign-compare'
In preparation for '-Wextra'
Change-Id: I9a3b91009b2b44c0aeacfb37fa2b8b901be79992
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 8 Mar 2022 23:16:06 +0000 (18:16 -0500)]
Prepare for '-Wimplicit-fallthrough'
In preparation for '-Wextra'
Change-Id: Ice4c5aa7f6ce9107c88f38ec4024a4631589ad73
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 8 Mar 2022 23:05:26 +0000 (18:05 -0500)]
Prepare for '-Wmissing-field-initializers'
In preparation for '-Wextra'
Change-Id: Ic593491ad44c1254f158b19659c3b9567d180ad1
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 8 Mar 2022 16:33:30 +0000 (11:33 -0500)]
Prepare for '-Wignored-qualifiers'
In preparation for '-Wextra'
Change-Id: I6734a105170da2d57480fb5e15cae839adc38e62
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 8 Mar 2022 15:52:55 +0000 (10:52 -0500)]
Prepare for '-Wunused-parameter'
In preparation for '-Wextra'
Change-Id: I30e6abb9502fc97daa565fde450d1e4235cf1ec7
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Wed, 9 Mar 2022 16:32:09 +0000 (11:32 -0500)]
configure: add '-Wredundant-decls' to warning flags
Change-Id: I5329ebe83aab40e6796b506c28e853b4af3c5e99
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Wed, 9 Mar 2022 15:19:17 +0000 (10:19 -0500)]
configure: add '-Wmissing-noreturn' to warning flags
Change-Id: I95a981348109d4614afcfe9c85f971e65afc2765
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Mon, 7 Mar 2022 20:59:30 +0000 (15:59 -0500)]
configure: add '-Wlogical-op' to warning flags
Change-Id: I0516add62151b22352f96d1e62871a013b8fa6f3
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Mon, 7 Mar 2022 19:21:21 +0000 (14:21 -0500)]
configure: add '-Wundef' to warning flags
Change-Id: If47c16121b1679862e7a5f75fce70c7d9973e92e
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Mon, 7 Mar 2022 19:02:29 +0000 (14:02 -0500)]
configure: add '-Wnull-dereference' to warning flags
Change-Id: Ife5ad6963262c5c2715954fcd34c94015fb30aa6
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Mon, 7 Mar 2022 16:37:49 +0000 (11:37 -0500)]
fix: warning '-Wstringop-truncation' with GCC 11.2
Building with GCC 11.2 results in the following warning :
In file included from ../../src/common/tracker.h:18,
from ../../src/bin/lttng-sessiond/trace-ust.h:17,
from test_ust_data.cpp:19:
../../src/common/sessiond-comm/sessiond-comm.h:569:14: note: while referencing ‘lttng_event_exclusion::names’
569 | char names[0][LTTNG_SYMBOL_NAME_LEN];
| ^~~~~
test_ust_data.cpp:209:16: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 256 equals destination size [-Wstringop-truncation]
209 | strncpy(LTTNG_EVENT_EXCLUSION_NAME_AT(exclusion, 0),
| ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
210 | get_random_string(), LTTNG_SYMBOL_NAME_LEN);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_ust_data.cpp:211:16: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 256 equals destination size [-Wstringop-truncation]
211 | strncpy(LTTNG_EVENT_EXCLUSION_NAME_AT(exclusion, 1),
| ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
212 | get_random_string(), LTTNG_SYMBOL_NAME_LEN);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change-Id: I78eea760b4684227ee457c3368c6397d0a767af5
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Mon, 7 Mar 2022 16:28:19 +0000 (11:28 -0500)]
fix: warning '-Wformat-overflow' with GCC 11.2
Building with GCC 11.2 results in the following warning :
In file included from rotation-thread.cpp:11:
In function 'int handle_job_queue(rotation_thread_handle*, rotation_thread*, rotation_thread_timer_queue*)',
inlined from 'void* thread_rotation(void*)' at rotation-thread.cpp:844:27:
../../../src/common/error.h:139:32: warning: '%s' directive argument is null [-Wformat-overflow=]
139 | fprintf((type) == PRINT_MSG ? stdout : stderr, fmt, ## args); \
| ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../src/common/error.h:155:25: note: in expansion of macro '__lttng_print'
155 | __lttng_print(type, \
| ^~~~~~~~~~~~~
../../../src/common/error.h:195:27: note: in expansion of macro '_ERRMSG'
195 | #define DBG(fmt, args...) _ERRMSG("DBG1", PRINT_DBG, fmt, ## args)
| ^~~~~~~
rotation-thread.cpp:587:25: note: in expansion of macro 'DBG'
587 | DBG("Session \"%s\" not found",
| ^~~
Use an empty string for the format string if 'session->name' is NULL.
Change-Id: Ibe29b43c0e8afd13b1c28770e8f7451340cc1e81
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Mon, 22 Nov 2021 19:43:09 +0000 (14:43 -0500)]
Cleanup: DIST_SUBDIRS is redundant when using AM conditionals
From automake's documentation[1] :
If SUBDIRS is defined conditionally using Automake conditionals,
Automake will define DIST_SUBDIRS automatically from the possible
values of SUBDIRS in all conditions.
[1] https://www.gnu.org/software/automake/manual/html_node/SUBDIRS-vs-DIST_005fSUBDIRS.html
Change-Id: I8495f1f4452ccde4920ecd63bfd37de4eb10c281
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 10 Mar 2022 22:46:31 +0000 (17:46 -0500)]
Fix: relayd: session id is ignored by 2.11+ create session command
The id of the session used by the sessiond is not returned by
cmd_create_session_2_11 and its caller sets the value in the
relay_session to an uninitialized value.
Up until recently this didn't have much effect as this uninitialized
value was stored and used to perform look-ups in the trace chunk
registry, which would work.
However, the recent multi-consumer rotation fixes make this problem more
significant as this 'id' is used as a key to join relay sessions
originating from the same session daemon.
This was discovered by enabling the '-Wunused-parameter' warning.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7e33f5f93dc46bb630c431408c9472be3a75c030
Jérémie Galarneau [Thu, 3 Mar 2022 00:27:31 +0000 (19:27 -0500)]
Fix: consumerd: use-after-free of metadata bucket
Observed issue
==============
When consumer_stream_destroy() is called from, for example, the error
path in setup_metadata(), consumer_stream_free() can end up being called
twice on the same stream. Since the stream->metadata_bucket is not set
to NULL after being destroyed, it leads to a use-after-free:
ERROR: AddressSanitizer: heap-use-after-free on address 0x604000000318
READ of size 8 at 0x604000000318 thread T7
#0 in metadata_bucket_destroy
#1 in consumer_stream_free
#2 in consumer_stream_destroy
#3 in setup_metadata
#4 in lttng_ustconsumer_recv_cmd
#5 in lttng_consumer_recv_cmd
#6 in consumer_thread_sessiond_poll
#7 in start_thread nptl/pthread_create.c:481
#8 in clone (/lib/x86_64-linux-gnu/libc.so.6+0xfcbde)
0x604000000318 is located 8 bytes inside of 48-byte region [0x604000000310,0x604000000340)
freed by thread T7 here:
#0 in __interceptor_free
#1 in metadata_bucket_destroy
#2 in consumer_stream_free
#3 in consumer_stream_destroy
#4 in clean_channel_stream_list
#5 in consumer_del_channel
#6 in consumer_stream_destroy
#7 in setup_metadata
#8 in lttng_ustconsumer_recv_cmd
#9 in lttng_consumer_recv_cmd
#10 in consumer_thread_sessiond_poll
#11 in start_thread nptl/pthread_create.c:481
previously allocated by thread T7 here:
#0 in __interceptor_calloc
#1 in zmalloc
#2 in metadata_bucket_create
#3 in consumer_stream_enable_metadata_bucketization
#4 in lttng_ustconsumer_set_stream_ops
#5 in lttng_ustconsumer_on_recv_stream
#6 in lttng_consumer_on_recv_stream
#7 in create_ust_streams
#8 in ask_channel
#9 in lttng_ustconsumer_recv_cmd
#10 in lttng_consumer_recv_cmd
#11 in consumer_thread_sessiond_poll
#12 in start_thread nptl/pthread_create.c:481
Thread T7 created by T0 here:
#0 in __interceptor_pthread_create
#1 in main
#2 in __libc_start_main ../csu/libc-start.c:332
SUMMARY: AddressSanitizer: heap-use-after-free in metadata_bucket_destroy
This can be easily reproduced by forcing a failure during the setup
of the metadata reproducible using the following change:
diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c
index
fa1c71299..
97ed59632 100644
--- a/src/common/ust-consumer/ust-consumer.c
+++ b/src/common/ust-consumer/ust-consumer.c
@@ -908,8 +908,7 @@ static int setup_metadata(struct lttng_consumer_local_data *ctx, uint64_t key)
/* Send metadata stream to relayd if needed. */
if (metadata->metadata_stream->net_seq_idx != (uint64_t) -1ULL) {
- ret = consumer_send_relayd_stream(metadata->metadata_stream,
- metadata->pathname);
+ ret = -1;
if (ret < 0) {
ret = LTTCOMM_CONSUMERD_ERROR_METADATA;
goto error;
Cause
=====
Channels have a list of streams that are being "setup" and are not
yet monitored for consumption. During this setup phase, the streams are
owned by the channel. On destruction of the channel, any stream in that
list will thus be cleaned-up.
When destroying a consumer stream, a reference to its channel is 'put'.
This can result in the destruction of the channel.
In the situation described above, the release of the channel's reference
is done before the stream is removed from the channel's stream list.
This causes the channel's clean-up to invoke (again) the current
stream's clean-up, resulting in the double-free of the metadata bucket.
This problem is present in a number of error paths.
Solution
========
Some error paths already manually removed the consumer stream from it's
channel's stream list before invoking consumer_stream_destroy(). The
various error paths that have to deal with this possible situation are
changed to simply invoke consumer_stream_destroy().
consumer_stream_destroy() is modified to always remove the stream from
its channel's list before performing the rest of the clean-up. This
ensures that those double clean-ups can't occur.
Drawbacks
=========
None.
Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Tested-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ibeca9b675b86fc46be3f57826f7158de4da43df8
Jérémie Galarneau [Thu, 3 Mar 2022 22:52:33 +0000 (17:52 -0500)]
Fix: ust-consumerd: leak of stream control structure
The following leak is reported by LeakSanitizer when
setup_metadata() fails to send the metadata stream to the relay
daemon:
==
3050181==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 240 byte(s) in 5 object(s) allocated from:
#0 0x7f5fce02cfb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7f5fcdd95a7a in zmalloc ../../../src/common/macros.h:23
#2 0x7f5fcdd95a7a in lttng_ust_ctl_create_stream /home/jgalar/EfficiOS/src/lttng-ust/src/lib/lttng-ust-ctl/ustctl.c:1649
A consumer stream can have an allocated
`struct lttng_ust_ctl_consumer_stream *` (ustream) even if it is
not globally visible at the time of its teardown.
In the case of the user space consumer, the only site that creates
consumer stream instances ensures that the allocation of the
lttng_ust_ctl_consumer_stream succeeded, ensuring that the
consumer stream's 'ustream' is always set.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia4be7969e85bd8812ae13b042e1e100812a63c1d
Jérémie Galarneau [Fri, 4 Mar 2022 20:29:12 +0000 (15:29 -0500)]
Fix: liblttng-ctl: erroneous flat size computation
compute_flattened_size() erroneously computes (over-estimates) the size
of the allocation required to hold the flat array of struct lttng_event
returned to the user by lttng_list_{events, syscalls, tracepoints}.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0a80ef0fa66428b7df34303804a024e80b635c69
Michael Jeanson [Thu, 5 Aug 2021 20:49:26 +0000 (16:49 -0400)]
fix: msgpack requires limits.h for UINT_MAX
Building with '-Wundef' reveals this issue :
unpack.c: In function ‘template_callback_array’:
unpack.c:197:17: warning: "UINT_MAX" is not defined, evaluates to 0 [-Wundef]
197 | #if SIZE_MAX == UINT_MAX
| ^~~~~~~~
unpack.c: In function ‘template_callback_map’:
unpack.c:241:17: warning: "UINT_MAX" is not defined, evaluates to 0 [-Wundef]
241 | #if SIZE_MAX == UINT_MAX
| ^~~~~~~~
Change-Id: I7dadd9f7013d613509f66e67ff1beb8ae593d2bf
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Thu, 3 Mar 2022 23:28:26 +0000 (18:28 -0500)]
Split warning flags list for C and C++
When using Ccache [1], some flags specific to C are accepted by the C++
compiler but result in warning messages on each invocation of the
compiler. To remediate this, split the warning flags detection list in
three, a common base and a specific list for C and C++.
[1] https://github.com/ccache/ccache/issues/738
Change-Id: I9ef360efbfae445845ca1016e5f5eebdd3bdb0ac
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 13 Jan 2022 20:38:06 +0000 (15:38 -0500)]
Tests: add a multi-domain ust+kernel rotation test
Validate that multi-domain rotations work as intended for both local and
remote outputs. This validates the fix introduced by c5c793.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I91ddd313fcfbc5421db281baea34ba2d0aae2c82
Jérémie Galarneau [Wed, 2 Mar 2022 17:59:17 +0000 (12:59 -0500)]
Fix: relayd: connection abruptly closed on viewer stream creation failure
Commit
fe88e5175 explains (and fixes) an issue that could cause the
creation of viewer streams to fail. Currently, the error path causes the
relay daemon to abruptly close the connection to its live viewer peer.
This behaviour makes it impossible for the viewer to determine if an
error occurred or if the network connection simply failed.
Returning an `LTTNG_VIEWER_NEW_STREAMS_ERR` status code allows the
viewer to report a precise error. The viewer connection is closed since
the internal error is unlikely to be recoverable.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I557a8b66c1bd1c0bf361cfbabe962d8a6808f4f4
Jérémie Galarneau [Wed, 2 Mar 2022 17:37:39 +0000 (12:37 -0500)]
Fix: relayd: live client fails on clear of multi-domain session
Observed issue
==============
Two test cases of the clear/test_ust test suite occasionally fail in the
integration jobs testing cross-bitness (32/64) LTTng deployments.
Babeltrace fails with the following error when a clear occurs while a
live client consumes a trace:
02-28 16:55:03.262 32362 32362 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_recv@viewer-connection.c:198 [lttng-live] Remote side has closed connection
02-28 16:55:03.262 32362 32362 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_session_get_new_streams@viewer-connection.c:1706 [lttng-live] Error receiving get new streams reply
02-28 16:55:03.262 32362 32362 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1665 [lttng-live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
02-28 16:55:03.262 32362 32362 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:864 Component input port message iterator's "next" method failed: iter-addr=0x55eab7eb1170, iter-upstream-comp-name="lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
02-28 16:55:03.262 32362 32362 E PLUGIN/FLT.UTILS.MUXER muxer_upstream_msg_iter_next@muxer.c:454 [muxer] Upstream iterator's next method returned an error: status=ERROR
02-28 16:55:03.262 32362 32362 E PLUGIN/FLT.UTILS.MUXER validate_muxer_upstream_msg_iters@muxer.c:991 [muxer] Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x55eab7eb1120, muxer-upstream-msg-iter-wrap-addr=0x55eab7eb3a70
02-28 16:55:03.262 32362 32362 E PLUGIN/FLT.UTILS.MUXER muxer_msg_iter_next@muxer.c:1415 [muxer] Cannot get next message: comp-addr=0x55eab7eb0470, muxer-comp-addr=0x55eab7eb0510, muxer-msg-iter-addr=0x55eab7eb1120, msg-iter-addr=0x55eab7eb0fb0, status=ERROR
02-28 16:55:03.262 32362 32362 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:864 Component input port message iterator's "next" method failed: iter-addr=0x55eab7eb0fb0, iter-upstream-comp-name="muxer", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
02-28 16:55:03.262 32362 32362 W LIB/GRAPH consume_graph_sink@graph.c:473 Component's "consume" method failed: status=ERROR, comp-addr=0x55eab7eb0760, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=1, comp-class-so-handle-addr=0x55eab7ebd910, comp-class-so-handle-path="/root/workspace/joraj_integration_base_job/deps-64/build/lib/babeltrace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0
02-28 16:55:03.262 32362 32362 E CLI cmd_run@babeltrace2.c:2548 Graph failed to complete successfully
ERROR: [Babeltrace CLI] (babeltrace2.c:2548)
Graph failed to complete successfully
CAUSED BY [libbabeltrace2] (graph.c:473)
Component's "consume" method failed: status=ERROR, comp-addr=0x55eab7eb0760,
comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK,
comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages
(`text` fo", comp-class-is-frozen=1, comp-class-so-handle-addr=0x55eab7ebd910,
comp-class-so-handle-path="/root/workspace/joraj_integration_base_job/deps-64/build/lib/babeltrace2/plugins/babeltrace-plugin-text.so",
comp-input-port-count=1, comp-output-port-count=0
CAUSED BY [libbabeltrace2] (iterator.c:864)
Component input port message iterator's "next" method failed:
iter-addr=0x55eab7eb0fb0, iter-upstream-comp-name="muxer",
iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER,
iter-upstream-comp-class-name="muxer",
iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu",
iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:991)
Cannot validate muxer's upstream message iterator wrapper:
muxer-msg-iter-addr=0x55eab7eb1120,
muxer-upstream-msg-iter-wrap-addr=0x55eab7eb3a70
CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:454)
Upstream iterator's next method returned an error: status=ERROR
CAUSED BY [libbabeltrace2] (iterator.c:864)
Component input port message iterator's "next" method failed:
iter-addr=0x55eab7eb1170, iter-upstream-comp-name="lttng-live",
iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE,
iter-upstream-comp-class-name="lttng-live",
iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon",
iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:1665)
Error preparing the next batch of messages:
live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (viewer-connection.c:1706)
Error receiving get new streams reply
CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (viewer-connection.c:198)
Remote side has closed connection
Looking at the relay daemon logs, we see the following error:
DBG1 - 16:55:03.
262106718 [32139/32146]: Adding new file "ust/pid/gen-ust-events-32373-
20220228-165503/chan_0" to trace chunk "(unnamed)" (in lttng_trace_chunk_add_file() at trace-chunk.cpp:1310)
PERROR - 16:55:03.
262133333 [32139/32146]: Failed to open fs handle to ust/pid/gen-ust-events-32373-
20220228-165503/chan_0, open() returned: No such file or directory (in fd_tracker_open_fs_handle() at fd-tracker/fd-tracker.cpp:548)
Cause
=====
Adding more debugging logging allows us to see the following situation
takes place:
- relay thread: Create trace chunk on session 1.
- live thread: get new streams against session 1, returns NO_NEW_STREAMS
since the session has an 'ongoing_rotation'.
- live thread: get new streams against session 2, sees no rotation
ongoing and attempts to open `chan_0` when creating a viewer stream
The "ongoing rotation" check was introduced in
a7ceb342d and, in a
nutshell, prevents live viewers from creating new viewer streams during
a rotation.
The "ongoing rotation" state is entered when a CREATE_NEW_TRACE_CHUNK
command is issued against a session.
However, this presumes that a relay_session maps 1:1 to a session on the
session daemon's end. This isn't the case as, in multi-domain
scenarios (tracing 32-bit, 64-bit, and kernel events), a single session
daemon session can map to multiple relay_session objects. This is
because the consumer daemons maintain independant connections to the
relay daemon.
To synchronize rotations accross related relay_session instances, the
relay daemon uses the same trace chunk instances accross relay_session
instances. This means that while a trace chunk is created against a
specific relay session, it can be used by other relay_session instances.
To manage shared trace chunks between relay_sessions, the relay daemon
makes use of the trace_chunk_registry. This registry allows
relay_sessions to share trace chunk instances using a unique key tuple:
- session daemon instance uuid,
- session daemon session id,
- trace chunk id.
There is no equivalent mechanism to track the "ongoing_rotation" state
accross relay_sessions originating from the same sessiond session.
In the current scenario, this causes the live client to correctly see
that no new streams are available for session 1 (say, the 32-bit user
space session). Unfortunately, this state is not entered for other
sessions (64-bit and kernel relay sessions). Hence, the viewer succeds
in acquiring new streams from session 2, exposing the race the 'ongoing
rotation' state aims to protect against.
Solution
========
Like the trace chunk instances, the "ongoing rotation" state must be
shared accross relay sessions that originate from the same session
daemon session.
To "emulate" this shared state, session_has_ongoing_rotation() checks
if any relay session originating from the same sessiond session
have an ongoing rotation. If it is the case, we temporarily prevent
live viewers from acquiring new streams.
Known drawbacks
===============
session_has_ongoing_rotation() iterates over all sessions, acquiring
their lock in the process, which is certainly undesirable from a
performance standpoint.
Optimizing this is not a great challenge, but is beyond the scope
of this immediate fix.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I457a32fa497d151ca933c25433c80665268a7c1c
Jonathan Rajotte [Mon, 14 Feb 2022 16:23:28 +0000 (11:23 -0500)]
Fix: rotation: hang on destroy when using scheduled rotation based on timer
Observed issue
==============
The following scenario results in a hang for `lttng destroy`:
lttng create test
lttng enable-event -u -a
lttng enable-rotation --timer 100000
lttng start
lttng stop
lttng start
lttng destroy
Cause
=====
There is an imbalance in how many times we start the rotation timer.
The rotation timer is only removed on `lttng destroy` or when disabling
a time-based-rotation. On the other hand, the timer is "started"
on `lttng start` and when enabling a time based rotation.
The imbalance emerging from a start/stop/start sequence would prevent the
teardown of the session object since each time the timer is started a
reference to the session is held.
Solution
========
Do not start the rotation schedule timer if it was already launched.
Known drawbacks
=========
None.
Change-Id: Ic5b8938166358fe7629187bebdf02a09e90846c0
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 9 Feb 2022 19:23:18 +0000 (14:23 -0500)]
Fix: consumerd: fd still open after `lttng snapshot record` returns
Observed issue
=====
Using a snapshot output located on a pramfs mount:
lttng snapshot record
rm -rf /my_mount/my_trace_output
`rm` fails with ENOTEMPTY on rmdir for /my_mount/my_trace_output.
At that point, the lttng-consumerd daemon have an open fd on:
/my_mount/my_trace_output/ust
Note that a sleep between both command "fixes" the issue.
Cause
=====
The reclaim for the in-registry trace chunks can happen after the LTTng
CLI returns since we use `call_rcu`.
```
static
void lttng_trace_chunk_release(struct urcu_ref *ref)
....
if (chunk->in_registry_element) {
struct lttng_trace_chunk_registry_element *element;
element = container_of(chunk, typeof(*element), chunk);
if (element->registry) {
rcu_read_lock();
cds_lfht_del(element->registry->ht, &element->trace_chunk_registry_ht_node);
rcu_read_unlock();
-> call_rcu(&element->rcu_node, free_lttng_trace_chunk_registry_element);
} else {
```
The delayed reclaim of the `lttng_trace_chunk_registry_element` can
result in lttng-consumerd holding an open fd for the "chunk directory"
of the chunk since the close() is only done during the "*fini" phase of
the chunk (`lttng_trace_chunk_fini`).
Solution
========
Considering that the rcu lookup+refcount access scheme is used for the
trace chunk object and that at that point the refcount for the trace
chunk object is effectively zero, we can move the
`lttng_trace_chunk_fini` safely outside of the
`free_lttng_trace_chunk_registry_element` call_rcu call.
Known drawbacks
=========
Even if this solves the current situation, it is important to note that
the actual object holding the reference is itself refcounted and only
close the fd on release. This means that we are still exposed to this
problem if at some point the directory handle is shared and outlives the
trace chunk for some reason in the future.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6da3948824bf8b092fc8248b1bb0263fdd5887be
Jonathan Rajotte [Wed, 2 Mar 2022 15:50:16 +0000 (10:50 -0500)]
Fix: event: unchecked return value for lttng_strncpy
Reported by Coverity:
CID
1475806: Unchecked return value (CHECKED_RETURN)
Calling lttng_strncpy without checking return value.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ib2b16bef61bcc24fea381c18fd8f914e9a84a6e7
Jonathan Rajotte [Wed, 2 Mar 2022 15:36:11 +0000 (10:36 -0500)]
Fix: event: format specifier for ssize_t
Observed issue
==============
In file included from event.cpp:15:
event.cpp: In function ‘ssize_t lttng_event_create_from_payload(lttng_payload_view*, lttng_event**, lttng_event_exclusion**, char**, lttng_bytecode**)’:
../../src/common/error.h:191:28: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘ssize_t’ {aka ‘int’} [-Wformat=]
191 | __lttng_print(PRINT_WARN, "Warning: " fmt "\n", ## args)
| ^~~~~~~~~~~~~~~~~~~~
../../src/common/error.h:139:51: note: in definition of macro ‘__lttng_print’
139 | fprintf((type) == PRINT_MSG ? stdout : stderr, fmt, ## args); \
| ^~~
event.cpp:624:4: note: in expansion of macro ‘WARN’
624 | WARN("Userspace probe location from the received buffer is not the advertised length: header length = %" PRIu32 ", payload length = %lu", event_comm->userspace_probe_location_len, ret);
| ^~~~
Solution
========
Albeit there is no "canonical" way of printing ssize_t, use '%zd' since
we already make use of it elsewhere.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Id41e6ccf07bd580813f169b65d281a4fa305fb48
Jonathan Rajotte [Wed, 2 Mar 2022 15:24:29 +0000 (10:24 -0500)]
Fix: sessiond: use after free
Introduced by
88b02a8e7cbdbc728cc681cd07bbe0c738964717
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9b1c3eeed7e51976af1a4982fc522066b3b29461
Jonathan Rajotte [Tue, 1 Mar 2022 15:35:44 +0000 (10:35 -0500)]
Fix: sessiond: `break` used instead of `goto`
On error path, use `goto`.
Reported by Coverity:
CID
1475805: Unused value (UNUSED_VALUE)
Assigning value LTTNG_ERR_UNK to ret_code here, but that stored value is overwritten before it can be used.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ife3b5486b8463242dfecb61a3a9718e8eb7149b4
Jonathan Rajotte [Tue, 1 Mar 2022 15:31:33 +0000 (10:31 -0500)]
Fix: sessiond: lttng_channel object is not reclaimed
The reclaim is also missing on the error path.
Reported by Coverity:
CID
1475807: Resource leak (RESOURCE_LEAK)
Variable channel going out of scope leaks the storage it points to.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9ae59900f6bc33282b8b8a65b834a718428a7f50
Jonathan Rajotte [Tue, 1 Mar 2022 15:27:37 +0000 (10:27 -0500)]
Fix: common: local_attr might leak
Reported by Coverity:
CID
1475808: Resource leak (RESOURCE_LEAK)
Variable local_attr going out of scope leaks the storage it points to.
CID
1475810: Resource leak (RESOURCE_LEAK)
Variable local_attr going out of scope leaks the storage it points to.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ibe2ee6516618827b6f52676d9a9f7d510cadf1bc
Jonathan Rajotte [Tue, 1 Mar 2022 15:23:24 +0000 (10:23 -0500)]
Fix: session: end goto label used for error path
Reported by Coverity:
CID
1475801: Unused value (UNUSED_VALUE)
Assigning value LTTNG_ERR_FATAL to ret_code here, but that stored value is overwritten before it can be used
CID
1475802: Unused value (UNUSED_VALUE)
Assigning value LTTNG_ERR_NOMEM to ret_code here, but that stored value is overwritten before it can be used.
CID
1475814: Unused value (UNUSED_VALUE)
Assigning value LTTNG_ERR_UST_CHAN_NOT_FOUND to ret_code here, but that stored value is overwritten before it can be used.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I79fb5d65ea0f4f3dfbbef2a42ec3dcf0542043c5
Jonathan Rajotte [Tue, 1 Mar 2022 15:18:46 +0000 (10:18 -0500)]
Fix: sessiond: event, filter expression, bytecode, exclusions might leak
Reported by Coverity:
CID
1475803: Resource leak (RESOURCE_LEAK)
Variable filter_expression going out of scope leaks the storage it points to.
CID
1475816: Resource leak (RESOURCE_LEAK)
Variable exclusions going out of scope leaks the storage it points to.
CID
1475817: Resource leak (RESOURCE_LEAK)
Variable bytecode going out of scope leaks the storage it points to.
CID
1475820: Resource leak (RESOURCE_LEAK)
Variable event going out of scope leaks the storage it points to.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I705e3cc1a1894af52d6ddeee94798a887dc82bba
Jonathan Rajotte [Tue, 1 Mar 2022 14:52:21 +0000 (09:52 -0500)]
Fix: lttng-sessiond: event_context might leak
If the parsing completed but is not the expected size, the
*out_event_context object leaks.
Reported by Coverity:
CID
1475821 (#1 of 1): Resource leak (RESOURCE_LEAK)
Variable event_context going out of scope leaks the storage it points to.
Change-Id: I8198f7f3f1e1af3dcb377d89674402ec49cb788b
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 1 Mar 2022 14:09:04 +0000 (09:09 -0500)]
Fix: lttng-ctl: uninitialized pointer read
Reported by Coverity:
CID
1475822: Uninitialized pointer read (UNINIT)
Using uninitialized value reply._fd_handles.array.size when calling lttng_payload_reset.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ied7f5de61d736e8d8f9c869a36322262a4d79cd5
Jonathan Rajotte [Tue, 1 Mar 2022 14:03:45 +0000 (09:03 -0500)]
Fix: lttng-sessiond: nb_events is uninitialized
Reported by Coverity:
CID
1475823: Uninitialized scalar variable (UNINIT)
Using uninitialized value nb_events.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I623b1ed0c1ef9b4acddb1e46e7311dfcdc974629
Jonathan Rajotte [Tue, 1 Mar 2022 14:50:06 +0000 (09:50 -0500)]
Fix: ser/des: missing null terminator on payload append
Reported by the lttng-ust-java-tests_master_build CI job.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I47b048c65a9855347c9b02fe44d9fdbae7e06dbf
Jonathan Rajotte [Tue, 1 Mar 2022 17:33:53 +0000 (12:33 -0500)]
Fix: test: test plan/skip must come after TAP initialization
TAP is initialized during the sourcing of `utils.sh`. Calling it before
yield nothing.
Move the platform validation after the `PLAN_TEST` call.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ib7b4434440ed7590b39b0580ea294045ace40495
Jonathan Rajotte [Tue, 1 Mar 2022 02:30:44 +0000 (21:30 -0500)]
Fix: test: missing test plan for non-x86/arm platforms (i.e powerpc)
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9a34a0b45fd011943436ce6e45c8f0d1666df26f
Jonathan Rajotte [Tue, 1 Mar 2022 02:23:53 +0000 (21:23 -0500)]
Fix: remove debugging output
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7bcdb7c47f811cf99dee1e4e483cc22847f481c5
Jérémie Galarneau [Wed, 23 Feb 2022 22:40:06 +0000 (17:40 -0500)]
Fix: lttng: truncated addresses and offsets on 32-bit builds
The lttng client parses hexadecimal addresses using, at some point,
strtoul(). Using this function effectively caps addresses and
offsets to MAX_UINT32 resulting in failures to enable kprobes
against a 64-bit kernel using a 32-bit client.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: If619e9e84413de5cd32d8c06f363152caaf5ac46
Jonathan Rajotte [Wed, 12 Jan 2022 23:18:08 +0000 (18:18 -0500)]
Fix: liblttng-ctl comm: lttng_event_field is not packed
Observed issue
==============
For MI testing where the lttng-sessiond is 64 bit and the lttng CLI is
32 bit, the tracepoint field listing fails with partial garbage output.
The size of the struct differs between bitness for x86-64 and x86
leading to serialization/deserialization problem across client
(liblttng-ctl) and lttng-sessiond.
sizeof(struct lttng_event_field):
x86: 1136
x86-64: 1144
The struct cannot be marked as LTTNG_PACKED since it is part of the API.
Solution
========
Adopt a similar pattern to the new APIs with a "serialize" &
"create_from_buffer" approach. The only particularity is that we need to
flatten the event_field on listing.
Most of the complexity is moved to `src/common/event.c`
Known drawbacks
=========
None.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I280d9809d110237574e2606ee93a7aeba41e704e
Jonathan Rajotte [Wed, 12 Jan 2022 23:18:08 +0000 (18:18 -0500)]
Fix: liblttng-ctl comm: lttng_event_context is not packed
Observed issue
==============
The size of the struct differs between bitness for x86-64 and x86
leading to serialization/deserialization problem across client
(liblttng-ctl) and lttng-sessiond.
sizeof(struct lttng_event_context):
x86: 308
x86-64: 312
The struct cannot be marked as LTTNG_PACKED since it is part of the API.
Solution
========
Adopt a similar pattern to the new API with a "serialize" &
"create_from_buffer" approach.
Most of the complexity is moved to `src/common/event.c`
Known drawbacks
=========
None.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ieb400eab2a2df4070ff51cb2b44929d3ea945ce4
Jonathan Rajotte [Wed, 12 Jan 2022 23:18:08 +0000 (18:18 -0500)]
Fix: liblttng-ctl comm: lttng_event is not packed
Observed issue
==============
In `lttcomm_session_msg` the lttng_event struct is marked as
LTTNG_PACKED. This statement have no effect as explained in commit [2].
Solution
========
Adopt a similar pattern to the new API with a "serialize" &
"create_from_buffer" approach.
Most of the complexity is moved to `src/common/event.c`
Known drawbacks
=========
None.
Note
====
Jérémie Galarneau: This patch was extensively modified from the original
patch applying against stable-2.12 to accomodate for the use of the
lttng_payload utils throughout the liblttng-ctl <-> lttng-sessiond
communication code.
Some changes were also made to build as C++.
Reference
========
[1] https://review.lttng.org/gitweb?p=lttng-tools.git;a=commit;h=
7bd95aee4660c6419a4a65429fc27754481e7e90
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I35d848519dacb2b119324e88f262aa95951e4ac6
Jonathan Rajotte [Sat, 29 Jan 2022 00:29:58 +0000 (19:29 -0500)]
libcommon: move event.c to libcommon-lgpl
The `event.c` license is already LGPL. There is no technical reason why
it was not part of the lgpl side of libcommon, simply that nothing that
is LGPL needed it. This will change in upcoming commits with the
addition of ser/des functions of `struct lttng_event` and other structs
related to `lttng_event` for liblttng-ctl.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1e15a547e526198f971a287a726d0e6229a733b0
Jonathan Rajotte [Wed, 12 Jan 2022 23:18:08 +0000 (18:18 -0500)]
Fix: liblttng-ctl comm: lttng_channel is not packed
Observed issue
==============
The size of the struct differs between bitness for x86-64 and x86
leading to serialization/deserialization problem across client
(liblttng-ctl) and lttng-sessiond.
sizeof(struct lttng_channel):
x86: 608
x86-64: 624
The struct cannot be marked as LTTNG_PACKED since it is part of the API.
Solution
========
Adopt a similar pattern to the new API with a "serialize" &
"create_from_buffer" approach. The only particularity is that we need to
flatten the channels on listing.
Most of the complexity is moved to `src/common/channel.c`
Known drawbacks
=========
None.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Id5c9aaf3cf8b3d739b71263c02cae8d4d2fedfe3
Jonathan Rajotte [Thu, 27 Jan 2022 19:22:22 +0000 (14:22 -0500)]
Fix: conversion from KB to bytes overflow on arm32
Observed issue
==============
On enable channel the memory available check fails on arm32 when
available memory, in bytes, is larger than 2^32.
Cause
=====
`read_proc_meminfo_field` converts the read value (in KB) to bytes and
stores it into a size_t variable.
On the system running the reproducer the value of the `value_kb` variable
is
4839692, yielding an overflow when multiplied with 1024 since
`size_t` is 32 bit long. `size_t` can be larger in certain situation
(i.e LARGEFILE) but this is irrelevant to the problem at hand.
Solution
========
Convert all the checks to use uint64_t.
Known drawbacks
=========
None.
References
==========
The multiplication overflow check scheme is borrowed from
`src/common/time.c`
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I067da25659ab4115e5494e48aab45a1c35f56652
Jonathan Rajotte [Tue, 11 Jan 2022 17:28:15 +0000 (12:28 -0500)]
Fix: Unexpected payload size in cmd_recv_stream_2_11
Observed issue
==============
For the following scenario:
lttng-relayd: 64 bit
lttng-sessiond: 64 bit
lttng-consumerd: 32 bit
application: 32 bit
Commands
lttng create --set-url=net://127.0.0.1
lttng enable-event -u -a
lttng start
./application
On application start the lttng-relayd reports this error:
DEBUG1 - 14:16:38.
216442600 [
2004731/
2004735]: Done receiving control command payload: fd = 19, payload size = 4376 bytes (in relay_process_control_receive_payload() at main.c:3456)
DEBUG3 - 14:16:38.
216469462 [
2004731/
2004735]: Processing "RELAYD_ADD_STREAM" command for socket 19 (in relay_process_control_command() at main.c:3327)
Error: Unexpected payload size in "cmd_recv_stream_2_11": expected >=
3519925694 bytes, got 4376 bytes
Cause
=====
In `relayd_add_stream`, instead of taking the > 2.11 protocol path, the
`relayd_add_stream_2_2` function is called.
The value of the rsock version number are:
major: 21845
minor: 2
Which is simply invalid since we know that the version should be 2.12.
The relayd sock version numbers are set during the
LTTNG_CONSUMER_ADD_RELAYD_SOCKET command between the lttng-sessiond and
the lttng-consumerd process. It is important to note here that both
processes do NOT have the same bitness.
The serialization and deserialization of `struct lttcomm_relayd_sock` is
the culprit.
`struct lttcomm_relayd_sock` contains a `struct lttcomm_sock`:
struct lttcomm_sock {
int32_t fd;
enum lttcomm_sock_proto proto;
struct lttcomm_sockaddr sockaddr;
const struct lttcomm_proto_ops *ops;
} LTTNG_PACKED;
Note that `ops` is a pointer and its size varies based on the bitness of
the application. Hence the size of the `struct lttcomm_sock` differs
across bitness. Since it is the first member of `struct
lttcomm_relayd_sock`, the memory layout is simply invalid across
bitness (amd64/x86).
This results in invalid parsing for the overall "struct
lttcomm_relayd_sock" when dealing with a lttng-consumerd with a
different bitness than the lttng-sessiond. As far as I know local
tracing scenarios are not affected since this is only relevant when
dealing with a lttng-relayd.
Solution
========
Pass the socket protocol type, relayd major, relayd minor in
`lttcomm_consumer_msg`. On the receiver side, query the network stack to
get the peer information to populate a basic `lttcomm_sock`. Leaving
this work to the OS saves us from having to serialize the `sockaddr_in*`
structs.
Known drawbacks
=========
We rely on `getpeername` for the first time. Compatibility might be a
problem.
This code path assumes a lot of thing that cannot be asserted against
such as the fact that the socket from which we fetch the info must be
`connected`. Still at this point, the socket is completely setup and the
rest of the code depends on it already.
From GETPEERNAME(2):
```
For stream sockets, once a connect(2) has been performed, either
socket can call getpeername() to obtain the address of the peer
socket. On the other hand, datagram sockets are connectionless.
Calling connect(2) on a datagram socket merely sets the peer
address for outgoing datagrams sent with write(2) or recv(2).
The caller of connect(2) can use getpeername() to obtain the
peer address that it earlier set for the socket. However, the
peer socket is unaware of this information, and calling
getpeername() on the peer socket will return no useful
information (unless a connect(2) call was also executed on the
peer). Note also that the receiver of a datagram can obtain the
address of the sender when using recvfrom(2).
```
But here we are always "the caller of connect".
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic157c4137b2f20e394c907136687fcbd126f90a0
Jonathan Rajotte [Mon, 31 Jan 2022 19:47:35 +0000 (14:47 -0500)]
Tests: add kernel test into the `make check` test suite.
The Jenkins CI mostly run the `make check` suite. Only the Lava base CI
run the root_regression test suite. Most of those test can be run on
`make check` without incurring any major extra time.
Only `regression/tools/streaming/test_high_throughput_limits` is left in
root_regression since it is currently "unreasonable" in term of the time
it takes to run. This could be tackled another time.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3d59705e84549c2a1840cf96a7723396e2e0c402
Jérémie Galarneau [Wed, 9 Feb 2022 00:41:10 +0000 (19:41 -0500)]
Build dist fix: some kernel tests are not distributed
A missing '\' causes some kernel test scripts to not be added
the EXTRA_DIST variable. This causes those tests to not be
shipped as part of the tarball.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3c824b4c7ea913cb4cdc7ac6f98df05267ca5764
Jérémie Galarneau [Thu, 10 Feb 2022 22:53:56 +0000 (17:53 -0500)]
Tests: fix: test_kernel_function: erroneous test count
The test_kernel_function test runs one extra test which causes it to be
marked as failed. Adjust the count to fit the number of tests.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia861cf7b0123fdf19405bb84271ddda76f9fd6e7
Jérémie Galarneau [Wed, 9 Feb 2022 22:00:40 +0000 (17:00 -0500)]
Tests: fix: test_kernel_function: event name mismatch
The test_kernel_function test expects the name of the function
exit event to be suffixed with "_return". This was changed to
"_exit" during the development of the 2.13 release.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6e7f85af33699b2bf0439f74343d521f896594e1
Jérémie Galarneau [Thu, 10 Feb 2022 16:59:31 +0000 (11:59 -0500)]
Tests: fix: select_poll_epoll: test assumes epoll fd value
The test currently assumes that epoll fds are always == 3, which
is not always the case depending on the execution environment.
This change causes `select_poll_epoll` to produce a JSON file
containing the application's pid and epoll fd values that is
then used by the validation script.
Note that the test is converted to C++ to allow the use of
internal utils (common/error.h/cpp) without changing their linkage.
However, the code is still regular C to ease the backport of this
fix.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie373c63f6e6b9267ae2d785c9f0a532a5de37905
Jérémie Galarneau [Thu, 24 Feb 2022 19:06:28 +0000 (14:06 -0500)]
Fix: tests: test_kernel: break should only be used in loops
Using `break` in a function's scope makes no sense in bash. I am
guessing the original author meant to exit early from the various tests.
Regardless, the rest of the test can be ran without issues. I am not
sure traces of failed tests should be kept, but that's a separate issue.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I286ccb796afbbca4e3866e6fd0b35a3746045346
Jérémie Galarneau [Thu, 24 Feb 2022 19:05:16 +0000 (14:05 -0500)]
Tests: test_kernel: add comments regarding the filling of buffers
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3a8652d7dd95c015d6978349d166e8f291c9a5f3
Mathieu Desnoyers [Thu, 24 Feb 2022 17:24:45 +0000 (12:24 -0500)]
Fix: tools/snapshots/test_kernel flaky test
When tracing all system calls, nothing guarantees that the first system
call won't come from some _other_ program on the system, on a CPU != 0,
and stay invariant between the two snapshots (when it should not be).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iba45664f10eea0a757f8ac4eb9a7c7c75c305eef
Francis Deslauriers [Thu, 25 Mar 2021 17:13:45 +0000 (13:13 -0400)]
Tests: Fix: test_list_triggers_cli: support in-kernel builtin lttng-modules
This commit changes the grep call to remove the [lttng_tracer] string
from the pattern. When building the lttng modules directly in the kernel
there is not mention of [lttng_tracer].
Furthermore, the symbol type "t" is "T" in my test VM. The difference
may be due to the builtin nature of lttng on this VM.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I046905dc51524c23ad3671b66614b86a084ef8e2
Depends-on: lttng-ust: If1f29dd64538bc5979d3e89aa5cca3be4e41046f
Michael Jeanson [Fri, 11 Feb 2022 15:26:02 +0000 (15:26 +0000)]
Add Log4j 2.x agent tests for the 'log4j' domain
Add integration tests for the new Log4j 2.x agent in Log4j 1.x compat
mode using the current 'log4j' domain, use the new configure switch
'--enable-test-java-agent-log4j2' to enable it or
'--enable-test-java-agent-all' to enable all Java agents tests.
To run only this new test, use this command :
cd tests/regression && make check TESTS="ust/java-log4j2/test_agent_log4j2_domain_log4j"
Change-Id: Id780c9ee13913c91c18548f58b14cc600e77e9fa
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Fri, 19 Nov 2021 20:40:09 +0000 (15:40 -0500)]
src/common: use single Makefile for parallel builds
Use a single Makefile in 'src/common' as it contains multiple
subdirectories with a small number of objects to compile. This allows
faster parallel builds since parallelism in automake is applied per
Makefile.
There is anectodal evidence of a 25 seconds improvement to the build
process on a 36 core machine.
Change-Id: If2ce266050e345d58b00bf65b574ccf5168f28f1
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Mon, 27 Sep 2021 18:01:04 +0000 (14:01 -0400)]
Fix: missing RCU read side critical sections
Based on the comments of the called functions.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ica81b648ce88290c4ca7507fb00a78480457cf01
Francis Deslauriers [Mon, 27 Sep 2021 14:56:28 +0000 (10:56 -0400)]
Enforce documented RCU preconditions with assertions
Mindlessly add `rcu_read_ongoing()` assertions to functions that are
documented as "must be called" within a RCU critical section.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I25f9903938123394e6960ab2a338be6abaf2fe72
Francis Deslauriers [Tue, 18 Jan 2022 22:25:06 +0000 (17:25 -0500)]
Remove ht-cleanup thread
The hashtable cleanup thread was introduced to prevent deadlocks
happening when the `cds_lfht_destroy()` function was called concurrently
with userspace-rcu hashtable resizes.
This was fixed in the userspace-rcu project in commit:
commit
d0ec0ed2fcb5d67a28587dcb778606e64f5b7b83
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Tue May 30 15:51:45 2017 -0400
Use workqueue in rculfhash
That commit makes it so that the `cds_lfht_destroy()` function can
safely be called within RCU read-side critical sections.
This commit is included in the 0.10 release of urcu. The LTTng-Tools
project now has a minimum version dependency on urcu 0.11.
Because it's now safe to call `cds_lfht_destroy()` within RCU critical
sections, the need for the hash table cleanup thread disappears. This
commit replaces all uses of `ht_cleanup_push()` by `lttng_ht_destroy()`
and remove all uses and mentions of the ht_cleanup thread.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I163a281b64a6b3eed62c58515932f71f3b52fea6
Francis Deslauriers [Wed, 20 Jan 2021 21:18:09 +0000 (16:18 -0500)]
Cleanup: remove outdated comments on RCU requirements
The ht-cleanup thread was introduced to work-around the limitation that
the hash table destroy could not be called from a RCU read-side critical
section because of requirements from the rculfhash API, some comments
stating that destruction code should not be called from RCU read-side
critical sections were left in place when they should in fact have been
removed.
This ht-cleanup thread work around was introduced by the following
commit:
commit
0b2dc8df2a6d7b3341a72a04767dd6328907c97c
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Fri Jun 14 07:44:52 2013 -0400
Fix: hash table cleanup call_rcu deadlock
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ifdaa63e6146b51fdb251e75795e382b5f390f7be
Francis Deslauriers [Thu, 18 Nov 2021 20:11:58 +0000 (15:11 -0500)]
relayd: add logging statements in `viewer_get_packet()`
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Idf3c7a5e87636abe63ff59d22bb3104f1d3519e5
Francis Deslauriers [Thu, 18 Nov 2021 17:57:22 +0000 (12:57 -0500)]
relayd: add logging statements in `viewer_attach_session()`
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2cdae4bd809442d5de7dcfb838f346e1e4370628
Francis Deslauriers [Tue, 16 Nov 2021 20:14:57 +0000 (15:14 -0500)]
relayd: add logging statements in `viewer_get_next_index()`
Change the byte order of the `status` field right before we send the
index to the viewer to allow for useful logging along the way.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I50bb246810486e13fde3085e21fea4bb5ef3776b
Mathieu Desnoyers [Wed, 26 Jan 2022 15:53:55 +0000 (10:53 -0500)]
Fix: _lttng_variant_statedump should expect lttng_ust_ctl_atype_variant_nestable
The precondition check in _lttng_variant_statedump is too strict: it
should also expect lttng_ust_ctl_atype_variant_nestable. Remove this
check entirely, which is redundant with the switch/case in the only
caller sites in the same compile unit.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I744cc5cef69ff29f8d7ecc1cf01c9a629c6e222e
Mathieu Desnoyers [Mon, 24 Jan 2022 22:05:39 +0000 (17:05 -0500)]
Fix context mismatch across UST version due to legacy array context field
Observed issue
==============
Tracing applications linked against both LTTng-UST 2.12.x and 2.13.1 in
the same session fails with:
"Error: Registering application channel due to context field mismatch"
for some applications with the "procname" context enabled.
Cause
=====
The procname context uses the legacy array field type prior to LTTng-UST
2.13, and uses the array_nestable field type starting from LTTng-UST
2.13. The field comparison in lttng-sessiond is unaware of the fact that
they map to the exact same binary trace layout and are therefore
compatible.
Solution
========
Introduce a new fixup function "ust_app_fixup_legacy_context_fields" in
lttng-sessiond for channel context fields, which detects the presence of
the legacy array, struct and variant types, and rewrites them as
array_nestable, struct_nestable, and variant_nestable types.
Reject the legacy sequence type in channel context fields because it is
not used by LTTng-UST 2.12 and older.
Rewriting those legacy context types as the new "nestable" types ensures
that field comparison functions will correctly handle a mix of 2.12 and
2.13 LTTng-UST tracers using a procname context to a given session's
channel.
Known Drawbacks
===============
None.
Fixes: fb2772932 ("Validate channel context mismatch across UST applications")
Change-Id: Ic51b92130ae8e1f7b510924f78ff46459dc2f578
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 12 Jan 2022 21:46:21 +0000 (16:46 -0500)]
Relicence all source and header files included in LGPL code
All code included in libcommon-lgpl.a should be LGPL. Some were licensed
as GPLv2 by mistake. We need to relicense those.
EfficiOS owns the copyright to all of the affected source files and
agrees to this relicensing from GPLv2 to LGPLv2.1.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ib4aa1a7cc8a6f8e2a9891f1bd05c7ea2f8530e9b
Mathieu Desnoyers [Wed, 19 Jan 2022 16:35:20 +0000 (11:35 -0500)]
Move utils_expand_path and utils_expand_path_keep_symlink to libpath.la
Move the GPLv2 helper functions utils_expand_path and
utils_expand_path_keep_symlink to libpath.la. This will allow utils.cpp
to be relicensed to LGPLv2.1 by making sure EfficiOS owns the copyright
for the entire source file.
Statically include libpath.la into libcommon-gpl.la.
The "lttng" executable is GPLv2 and only depends on libcommon-lgpl.la,
so it needs to explicitly list libpath.la as its dependency.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2108e0bd35ce75797d4a20e65891c820a1dd79f8
Mathieu Desnoyers [Wed, 12 Jan 2022 20:23:09 +0000 (15:23 -0500)]
Link lttng executable on libcommon-lgpl.a
Allows testing whether liblttng-ctl.so has any dependency on
libcommon-gpl.a. Will trigger a link-time failure if it is the case.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie6f864dccdd61f896a7e238e202dc6188ec4f9c5
Mathieu Desnoyers [Tue, 11 Jan 2022 20:57:37 +0000 (15:57 -0500)]
Introduce libcommon-lgpl for liblttng-ctl
liblttng-ctl is a LGPLv2.1 library should should not use GPLv2 code.
Introduce libcommon-lgpl as a static archive containing only LGPLv2.1
compatible code.
This also removes the dependency from liblttng-ctl to liburcu.
Include some source files in libcommon-lgpl.a which are indirectly needed
by source files required in libcommon-lgpl.a:
- endpoint.cpp,
- lttng-elf.cpp,
- lttng-elf.h.
Include some source files in libcommon-lgpl.a which are only needed to
link the lttng executable:
- domain.cpp,
- spawn-viewer.cpp, spawn-viewer.h.
Introduce the new source file hashtable/seed.cpp to move the
lttng_ht_seed symbol in a source file which does not require
liburcu-cds, so it can be present in libcommon-lgpl. This allows
building compile units which are needed in the lgpl common library which
also contain functions which directly refer to lttng_ht_seed.
Programs and libraries which use libhashtable.la are changed to use
libcommon-gpl.la instead. libhashtable becomes internal to libcommon.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I27d2acb823a6d951692a5da88ce32bbe6bafb072
Mathieu Desnoyers [Tue, 11 Jan 2022 19:46:37 +0000 (14:46 -0500)]
Rename libcommon.so to libcommon-gpl.so
libcommon is a static library is currently used by both liblttng-ctl
(LGPLv2.1) and all lttng-tools executables (GPLv2).
Given that some code in libcommon depends on liburcu, this introduces an
indirect dependency from liblttng-ctl to liburcu, which is unwanted.
This first step renames libcommon.so to libcommon-gpl.so. Following
steps will introduce a more lightweight libcommon-lgpl.so which only
contains LGPLv2.1 code, and removes the dependency on liburcu.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia8f37ea229f68550200cbb1528216a505bbbd45f
Mathieu Desnoyers [Thu, 20 Jan 2022 15:30:01 +0000 (10:30 -0500)]
Copyright ownership transfer
Apply copyright ownership transfer from Jonathan Rajotte-Julien and
Olivier Cotte to EfficiOS Inc.
Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030111.html
Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030113.html
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Moussavou Cotte <olivier.moussavoucotte@mail.mcgill.ca>
Signed-off-by: Jonathan Rajotte-Julien <jonathan.r.julien@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0c2800ec17f5d435c8a18f2dae096809b8037b5d
Jérémie Galarneau [Wed, 19 Jan 2022 19:35:25 +0000 (14:35 -0500)]
Docs: relayd: live: clarify ownership of vstream after viewer release
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2b0a838bb72bec1ec693ee0c722b748105d7280c
Jérémie Galarneau [Fri, 14 Jan 2022 20:42:03 +0000 (15:42 -0500)]
Copyright ownership transfer
Apply copyright ownership transfer from David Goulet, Julien Desfossez,
and Simon Marchi to EfficiOS Inc.
Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030087.html
Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030092.html
Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030091.html
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Signed-off-by: David Goulet <dgoulet@ev0ke.net>
Signed-off-by: Julien Desfossez <ju@klipix.org>
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Change-Id: Id13575afd4a2a09bb91a8d2b7a12dc3db8dc329f
Jérémie Galarneau [Fri, 23 Jul 2021 20:56:53 +0000 (16:56 -0400)]
Fix: relayd: erroneous rundir permission logging message
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2eb0901896feab684d397aa43f8c9036018d9c95
Jérémie Galarneau [Thu, 9 Dec 2021 21:27:29 +0000 (16:27 -0500)]
Fix: sessiond: rotation thread: fatal error when not finding a session
The rotation thread implements scheduled rotations (by size) by
registering a trigger that monitors the session's consumed size and
notifies when the next rotation's size threshold is exceeded.
The notification is delivered asynchronously which doesn't prevent
the session from being destroyed before the rotation thread has
had the time to process the notification (and perform a rotation).
Since it is possible for a session to be destroyed by the time the
notification is processed, the rotation thread shouldn't handle
this eventuality as a fatal error (shutting down the thread).
Note that nobody reported this issue nor did I attempt to reproduce it.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I588054bad3542854851f28d34f2c758bdf420a34
Jérémie Galarneau [Wed, 12 Jan 2022 20:48:00 +0000 (15:48 -0500)]
Fix: relayd: rotation failure for multi-domain session
Observed issue
==============
Rotating a multi-domain streaming session results in the following
error:
$ lttng rotate
Waiting for rotation to complete...
Error: Failed to retrieve rotation state.
Meanwhile, the relay daemon logs indicate the following:
DBG1 - 14:56:04.
213163667 [265774/265778]: lttng_trace_chunk_rename_path from .tmp_new_chunk to (null) (in lttng_trace_chunk_rename_path_no_lock() at trace-chunk.cpp:759)
PERROR - 14:56:04.
213242941 [265774/265778]: Failed to move trace chunk directory ".tmp_new_chunk" to "20220112T145604-0500-1": No such file or directory (in lttng_trace_chunk_rename_path_no_lock() at trace-chunk.cpp:799)
DBG1 - 14:56:04.
213396931 [265774/265778]: aborting session 2 (in session_abort() at session.cpp:588)
DBG1 - 14:56:04.
213512198 [265774/265778]: Control connection closed with 22 (in relay_thread_close_connection() at main.cpp:3874)
The 'abort' of session 2 here causes the kernel consumer to fail to
consume subbuffers:
Error: Relayd send index failed. Cleaning up relayd 3.
Error: Error consuming subbuffer: (0)
[...]
Cause
=====
Following the flow of execution in the relay daemon shows that different
trace chunks are used by the two relay sessions that result from the
streaming of a single multi-domain session. Both trace chunks "own" the
same output directory.
When a rotation is performed, the first trace chunk to be closed will
move the directory. Then, the second trace chunk to be closed will
attempt to do the same, failing to do so as seen in the relay daemon
log.
Solution
========
Using different trace chunk instances for relay sessions belonging to a
single sessiond session goes against the intended use of the sessiond
trace chunk registry.
A sessiond trace chunk registry allows the relay daemon to share trace
chunks used by different "relay sessions" when they were created for the
same user-visible session daemon session. Tracing multiple domains (e.g.
ust and kernel) results in per-domain relay sessions being created.
Sharing trace chunks, and their output directory more specifically, is
essential to properly implement session rotations. The sharing of output
directory handles allows directory renames to be performed once and
without races that would stem from from multiple renames.
The reason why sessiond trace chunk registry returns different trace
chunk instances for two relay sessions is that the wrong session `id` is
used to publish trace chunks. The `id` that must be used to share trace
chunks accross the relay sessions that belong to the same sessiond
session is `id_sessiond`.
`id_sessiond` is optional as it is only provided by consumers v2.11+.
Otherwise, it is fine to use the relay session `id`: it is a unique id
for a given session daemon instance and those consumers will not issue a
session rotation (or clear) as the feature didn't exist.
A reference counting bug revealed by this change is also fixed in the
implementation of the sessiond trace chunk registry.
When the trace chunk is first published, two references to the published
chunks exist. One is taken by the registry while the other is being
returned to the caller. In the use case of the relay daemon, the
reference held by the registry itself is undesirable.
We want the trace chunk to be removed from the registry as soon as it is
not being used by the relay daemon (through a session or a stream). This
differs from the behaviour of the consumer daemon which relies on an
explicit command from the session daemon to release the registry's
reference.
In cases where the trace chunk had already been published, the reference
belonging to the sessiond trace chunk registry instance has already been
'put' by the firt publication. We must simply return the published trace
chunk with a reference taken on behalf of the caller.
Known drawbacks
===============
None.
Change-Id: Ic33443b114a87574a1b26ac5ccb022e47f886ddd
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 12 Jan 2022 17:43:07 +0000 (12:43 -0500)]
Docs: relayd: document the role of session_trace_chunk_registry
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6158dadfe92d23c0081aac888c3e54b3c9c711a3
Mathieu Desnoyers [Tue, 11 Jan 2022 18:59:15 +0000 (13:59 -0500)]
Fix: lttng-ctl: lttng_list_sessions: initialize out_sessions to NULL when returning 0
Observed issue
==============
Users of lttng-ctl API's lttng_list_sessions observe application crash
when freeing the *out_sessions output value when lttng_list_sessions
returns 0.
Cause
=====
The implementation does not set *out_sessions to NULL when
lttng_ctl_ask_sessiond() sets the sessions variable to NULL.
This causes the user application to attempt to free(3) an uninitialized
pointer.
Solution
========
Initialize out_sessions to NULL before invoking
lttng_ctl_ask_sessiond(), so it is initialized when lttng_list_sessions
returns 0, thus allowing *out_sessions to be subsequently freed.
A free(3) on a NULL pointer is a no-op.
Known drawbacks
===============
None.
History
=======
This was introduced by those two commits:
b178f53e90 ("Generate session name and default output on sessiond's end")
27ea4ba825 ("Fix: error when listing sessions with no session")
This is a regression present in the stable-2.11, stable-2.12,
stable-2.13, and master branches.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I34125d708a32674d79b831e5004c48321ebd711e
Francis Deslauriers [Thu, 18 Nov 2021 17:38:11 +0000 (12:38 -0500)]
relayd: add `lttng_viewer_command_str()`
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Icb7be4901727654c6339ed47344320140fdaf9fb
Francis Deslauriers [Thu, 11 Nov 2021 15:48:31 +0000 (10:48 -0500)]
relayd: add `lttcomm_relayd_command_str()`
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Idfd708843d19ca1b2cfe458a3b2b37e55f9c3274
Simon Marchi [Tue, 30 Nov 2021 03:15:53 +0000 (22:15 -0500)]
Fix: lttng: initialize variable in run_command_string
I got some crashes when using `lttng track` and hitting some error
paths. The tracker_handle variable is run_command_string is passed to
lttng_process_attr_tracker_handle_destroy uninitialized if
lttng_session_get_tracker_handle fails.
$ valgrind lttng track --kernel --pid 569878
==634572== Memcheck, a memory error detector
==634572== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==634572== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==634572== Command: lttng track --kernel --pid 569878
==634572==
Error: Unknown error
==634572== Conditional jump or move depends on uninitialised value(s)
==634572== at 0x4875007: lttng_process_attr_tracker_handle_destroy (tracker.cpp:25)
==634572== by 0x13AC55: run_command_string(cmd_type, char const*, lttng_domain_type, lttng_process_attr, char const*, mi_writer*) (track-untrack.cpp:485)
==634572== by 0x13ADA5: run_command(cmd_type, char const*, process_attr_command_args const*, mi_writer*) (track-untrack.cpp:535)
==634572== by 0x13B472: cmd_track_untrack(cmd_type, int, char const**, char const*) (track-untrack.cpp:740)
==634572== by 0x13B5D9: cmd_track(int, char const**) (track-untrack.cpp:805)
==634572== by 0x14C598: handle_command(int, char**) (lttng.cpp:237)
==634572== by 0x14CCE9: parse_args(int, char**) (lttng.cpp:426)
==634572== by 0x14CE65: main (lttng.cpp:475)
Fix it by initializing it to NULL.
Change-Id: Id2693e75c3c5c83cef58db3109973d7ab679b859
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 1 Dec 2021 13:46:11 +0000 (08:46 -0500)]
Fix: enable-rotation: wrong type of rotation specified in message
48a4000 changed the initialization of the schedule_type_str to allow the
code to compile as C++. The C-version of the code explicitly specified
the enum values which was "safer" while the C++ version assumes that the
array is initialized in the order of the enum's values.
This results in an incorrect message when enabling a rotation schedule.
For instance, enabling a periodic rotation will print that a
"size-based" rotation was successfully enabled.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3cb6fcb6a67d7b0fa4c7913694923b5f03255389
Simon Marchi [Wed, 8 Sep 2021 17:12:12 +0000 (13:12 -0400)]
common: replace uses of calloc(..., 1) with zmalloc
I found these uses of calloc, which could be replaced with zmalloc. I
presumed that the reason for using calloc was for the fact that it
initializes the memory to zeroes.
Change-Id: Icd8b84d707978e949c55d9694e6dbc10319c0169
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Mon, 22 Nov 2021 19:38:35 +0000 (14:38 -0500)]
Fix: consumer-stream: live viewers observe timestamps going backwards
Observed issue
==============
When stress-testing over an entire weekend, we caught the following
occurrences of timestamps going backwards with babeltrace in live viewer
mode:
One occurrence:
# Test ust streaming live clear with viewer with new metadata after clear
# Parameters: tracing_active=0, clear_twice=0, buffer_type=pid
6-7 occurrences:
# Test ust basic streaming live with viewer
# Parameters: tracing_active=1, clear_twice=0, buffer_type=uid
Relevant log of the relayd focused on the context of the stream
triggering the issue (stream 994):
506:DBG1 - 17:26:54.
966288486 [
3866648/
3866655]: Relay viewer stream 994 not found (in viewer_stream_get_by_id() at viewer-stream.cpp:268)
603:DBG1 - 17:26:54.
970265349 [
3866648/
3866655]: Sending stream 994 to viewer (in send_viewer_streams() at live.cpp:241)
843:DBG1 - 17:26:55.
959835404 [
3866648/
3866652]: stream_add_index for stream 994 (in stream_add_index() at stream.cpp:1233)
844:DBG3 - 17:26:55.
959866303 [
3866648/
3866652]: Finding index for stream id 994 and seq_num 0 (in relay_index_get_by_id_or_create() at index.cpp:112)
845:DBG2 - 17:26:55.
959896611 [
3866648/
3866652]: Creating relay index for stream id 994 and seqnum 0 (in relay_index_create() at index.cpp:34)
846:DBG2 - 17:26:55.
959928097 [
3866648/
3866652]: Adding relay index with stream id 994 and seqnum 0 (in relay_index_add_unique() at index.cpp:70)
847:DBG2 - 17:26:55.
959968163 [
3866648/
3866652]: Index found or created in HT for stream ID 994 and seqnum 0 (in relay_index_get_by_id_or_create() at index.cpp:144)
884:DBG3 - 17:26:55.
961676906 [
3866648/
3866652]: Receiving data for stream id 994 seqnum 0, 0 bytes received, 84 bytes left to receive (in relay_process_data_receive_payload() at main.cpp:3583)
885:DBG1 - 17:26:55.
961783540 [
3866648/
3866652]: Wrote to stream 994: data_length = 84, padding_length = 0 (in stream_write() at stream.cpp:1113)
886:DBG1 - 17:26:55.
961862441 [
3866648/
3866652]: Wrote to stream 994: data_length = 0, padding_length = 4012 (in stream_write() at stream.cpp:1113)
887:DBG1 - 17:26:55.
961895593 [
3866648/
3866652]: handle_index_data: stream 994 net_seq_num 0 data offset 0 (in stream_update_index() at stream.cpp:1140)
888:DBG3 - 17:26:55.
961945901 [
3866648/
3866652]: Finding index for stream id 994 and seq_num 0 (in relay_index_get_by_id_or_create() at index.cpp:112)
889:DBG2 - 17:26:55.
961983056 [
3866648/
3866652]: Index found or created in HT for stream ID 994 and seqnum 0 (in relay_index_get_by_id_or_create() at index.cpp:144)
894:DBG2 - 17:26:55.
962334285 [
3866648/
3866652]: Writing index for stream ID 994 and seq num 0 (in relay_index_try_flush() at index.cpp:275)
895:DBG2 - 17:26:55.
962390756 [
3866648/
3866652]: index put for stream id 994 and seqnum 0 refcount 1 (in relay_index_put() at index.cpp:237)
1743:DBG1 - 17:26:56.
083287172 [
3866648/
3866655]: Check index status: index_received_seqcount 1 index_sent_seqcount 0 for stream 994 (in check_index_status() at live.cpp:1454)
1746:DBG1 - 17:26:56.
083446379 [
3866648/
3866655]: Sending viewer index for stream 994 offset 0 (in viewer_get_next_index() at live.cpp:1801)
1748:DBG1 - 17:26:56.
083544877 [
3866648/
3866655]: Index 1 for stream 994 sent (in viewer_get_next_index() at live.cpp:1842)
1751:DBG1 - 17:26:56.
083778149 [
3866648/
3866655]: Sent 4108 bytes for stream 994 (in viewer_get_packet() at live.cpp:1968)
2858:DBG1 - 17:26:56.
907744916 [
3866648/
3866652]: stream_add_index for stream 994 (in stream_add_index() at stream.cpp:1233)
2859:DBG1 - 17:26:56.
907762798 [
3866648/
3866652]: Received live beacon for stream 994 (in stream_add_index() at stream.cpp:1237)
2862:DBG3 - 17:26:56.
907855597 [
3866648/
3866652]: Receiving data for stream id 994 seqnum 1, 0 bytes received, 84 bytes left to receive (in relay_process_data_receive_payload() at main.cpp:3583)
2863:DBG1 - 17:26:56.
907950195 [
3866648/
3866652]: Wrote to stream 994: data_length = 84, padding_length = 0 (in stream_write() at stream.cpp:1113)
2864:DBG1 - 17:26:56.
908002911 [
3866648/
3866652]: Wrote to stream 994: data_length = 0, padding_length = 4012 (in stream_write() at stream.cpp:1113)
2865:DBG1 - 17:26:56.
908024312 [
3866648/
3866652]: handle_index_data: stream 994 net_seq_num 1 data offset 4096 (in stream_update_index() at stream.cpp:1140)
2866:DBG3 - 17:26:56.
908043082 [
3866648/
3866652]: Finding index for stream id 994 and seq_num 1 (in relay_index_get_by_id_or_create() at index.cpp:112)
2867:DBG2 - 17:26:56.
908061879 [
3866648/
3866652]: Creating relay index for stream id 994 and seqnum 1 (in relay_index_create() at index.cpp:34)
2868:DBG2 - 17:26:56.
908082115 [
3866648/
3866652]: Adding relay index with stream id 994 and seqnum 1 (in relay_index_add_unique() at index.cpp:70)
2869:DBG2 - 17:26:56.
908101275 [
3866648/
3866652]: Index found or created in HT for stream ID 994 and seqnum 1 (in relay_index_get_by_id_or_create() at index.cpp:144)
3011:DBG1 - 17:26:56.
913436908 [
3866648/
3866655]: Check index status: index_received_seqcount 1 index_sent_seqcount 1 for stream 994 (in check_index_status() at live.cpp:1454)
3012:DBG1 - 17:26:56.
913457688 [
3866648/
3866655]: Check index status: inactive with beacon, for stream 994 (in check_index_status() at live.cpp:1492)
3014:DBG1 - 17:26:56.
913507164 [
3866648/
3866655]: Index 1 for stream 994 sent (in viewer_get_next_index() at live.cpp:1842)
3043:DBG1 - 17:26:56.
914167206 [
3866648/
3866652]: stream_add_index for stream 994 (in stream_add_index() at stream.cpp:1233)
3044:DBG3 - 17:26:56.
914186324 [
3866648/
3866652]: Finding index for stream id 994 and seq_num 1 (in relay_index_get_by_id_or_create() at index.cpp:112)
3045:DBG2 - 17:26:56.
914205597 [
3866648/
3866652]: Index found or created in HT for stream ID 994 and seqnum 1 (in relay_index_get_by_id_or_create() at index.cpp:144)
3046:DBG2 - 17:26:56.
914227502 [
3866648/
3866652]: Writing index for stream ID 994 and seq num 1 (in relay_index_try_flush() at index.cpp:275)
3047:DBG2 - 17:26:56.
914299536 [
3866648/
3866652]: index put for stream id 994 and seqnum 1 refcount 1 (in relay_index_put() at index.cpp:237)
3587:DBG1 - 17:26:57.
067977800 [
3866648/
3866652]: Set begin data pending flag to stream 994 (in relay_begin_data_pending() at main.cpp:2240)
3644:DBG1 - 17:26:57.
070915787 [
3866648/
3866652]: Data pending for stream id 994: prev_data_seq 1, prev_index_seq 1, and last_seq 1 (in relay_data_pending() at main.cpp:2091)
3913:DBG1 - 17:26:57.
093492259 [
3866648/
3866652]: try_rotate_stream_index: Stream 994 (rotate_at_packet_seq_num = 2, received_packet_seq_num = (value = 1, is_set = 1)) (in try_rotate_stream_index() at stream.cpp:482)
3914:DBG1 - 17:26:57.
093525950 [
3866648/
3866652]: Rotating stream 994 index file (in try_rotate_stream_index() at stream.cpp:511)
3915:DBG1 - 17:26:57.
093561064 [
3866648/
3866652]: try_rotate_stream_data: Stream 994 (rotate_at_index_packet_seq_num = 2, rotate_at_prev_data_net_seq = 1, prev_data_seq = 1) (in try_rotate_stream_data() at stream.cpp:357)
3916:DBG1 - 17:26:57.
093591085 [
3866648/
3866652]: Rotating stream 994 data file with size 8192 (in stream_rotate_data_file() at stream.cpp:138)
3917:DBG1 - 17:26:57.
093626697 [
3866648/
3866652]: stream_rotate_data_file: reset tracefile_size_current for stream 994 was 8192 (in stream_rotate_data_file() at stream.cpp:169)
3918:DBG1 - 17:26:57.
093656578 [
3866648/
3866652]: Rotation completed for stream 994 (in stream_complete_rotation() at stream.cpp:66)
4238:DBG1 - 17:26:57.
635064782 [
3866648/
3866652]: Trying to close stream 994 (in try_stream_close() at stream.cpp:883)
4239:DBG1 - 17:26:57.
635098224 [
3866648/
3866652]: Succeeded in closing stream 994 (in try_stream_close() at stream.cpp:983)
4744:DBG1 - 17:26:57.
741972785 [
3866648/
3866655]: Check index status: index_received_seqcount 2 index_sent_seqcount 1 for stream 994 (in check_index_status() at live.cpp:1454)
4745:DBG1 - 17:26:57.
742030216 [
3866648/
3866655]: Sending viewer index for stream 994 offset 4096 (in viewer_get_next_index() at live.cpp:1801)
4747:DBG1 - 17:26:57.
742088421 [
3866648/
3866655]: Index 2 for stream 994 sent (in viewer_get_next_index() at live.cpp:1842)
4750:DBG1 - 17:26:57.
742197990 [
3866648/
3866655]: Sent 4108 bytes for stream 994 (in viewer_get_packet() at live.cpp:1968)
4755:DBG1 - 17:26:57.
932525633 [
3866648/
3866655]: Releasing stream id 994 (in stream_release() at stream.cpp:778)
4756:DBG1 - 17:26:57.
932555313 [
3866648/
3866655]: Rotation completed for stream 994 (in stream_complete_rotation() at stream.cpp:66)
Cause
=====
This log shows the following sequence:
consumer-data consumer-live-timer relayd-main relayd-live babeltrace
1) lttng_consumer_read_subbuffer
2) get next subbuf
3) write data packet to data socket
(starting here the data packet is
somewhere on the network)
4) put next subbuf
5) post_consume()
6) consumer_stream_sync_metadata_index()
7) wait for metadata
8) consumer_stream_sync_metadata()
9) check_stream()
10) set missed_metadata_flush
11) call send_live_beacon(()
12) sample empty ring buffer
13) read current timestamp
14) send inactivity beacon (empty packet)
15) receives a live beacon (@ 17:26:56.
907762798)
16) call consumer_stream_send_index()
17) send packet index to relayd
18) receives a data packet (@ 17:26:56.
907855597)
(at this point the data
packet is received from the
network)
19) ask for next index
20) informs the live viewer of the live beacon (@ 17:26:56.
913457688)
21) receives an index packet (@ 17:26:56.
914227502)
22) ask for next index
23) sends the packet to the viewer (@ 17:26:57.
742197990)
24) observes time going
backwards between the
previous live beacon
and the data packet.
The issue is caused by consumer_stream_sync_metadata_index which is
called after sending a data packet (therefore after having consumed a
data packet from the ring buffer). It invokes the send_live_beacon
callback before sending the index associated with the data packet that
was just sent.
However, this introduces a discrepancy between the live beacon
inactivity guarantees and the yet-to-be-sent packet index: the data
packet sent at [3] can be anywhere on the network, not even received by
the relay daemon, when the live beacon is sampling a now empty ring
buffer at [12], and thus sends a live inactivity beacon to the relay
daemon. Then, when the index is sent by consumer_stream_send_index
[16], its timestamp is in the past compared to the inactivity beacon
sent by send_live_beacon [11].
The purpose of the field "stream->indexes_in_flight" is to prevent
setting the inactivity timestamp in the relay stream when data is
missing for indexes that were received. This works because the indexes
are sent over the control socket, which is where the inactivity beacons
are also sent. It does not however prevent issues the other way around:
data sent prior to the inactivity beacon may or may not have reached
the relay daemon. It is therefore important to make sure that consuming
ring buffer data and sending that data's index vs sampling for an empty
ring buffer and sending an inactivity beacon are correctly ordered.
Solution
========
Send inactivity beacon after packet index.
Also document the purpose of sending an inactivity beacon in this
scenario.
Note
====
This issue is present since lttng-tools 2.7.0 (backported to 2.6.1),
where lttng_ustconsumer_read_subbuffer() invokes
consumer_flush_ust_index() prior to call consumer_stream_write_index().
It was introduced by commit
288bdb302a1 ("Fix: sessiond vs consumerd
push/get metadata deadlock").
Known drawbacks
===============
None.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ied43cc8229413b25e69b40ac218adad420f18797
This page took 0.061203 seconds and 5 git commands to generate.