Jérémie Galarneau [Wed, 12 Jan 2022 17:43:07 +0000 (12:43 -0500)]
Docs: relayd: document the role of session_trace_chunk_registry
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6158dadfe92d23c0081aac888c3e54b3c9c711a3
Mathieu Desnoyers [Tue, 11 Jan 2022 18:59:15 +0000 (13:59 -0500)]
Fix: lttng-ctl: lttng_list_sessions: initialize out_sessions to NULL when returning 0
Observed issue
==============
Users of lttng-ctl API's lttng_list_sessions observe application crash
when freeing the *out_sessions output value when lttng_list_sessions
returns 0.
Cause
=====
The implementation does not set *out_sessions to NULL when
lttng_ctl_ask_sessiond() sets the sessions variable to NULL.
This causes the user application to attempt to free(3) an uninitialized
pointer.
Solution
========
Initialize out_sessions to NULL before invoking
lttng_ctl_ask_sessiond(), so it is initialized when lttng_list_sessions
returns 0, thus allowing *out_sessions to be subsequently freed.
A free(3) on a NULL pointer is a no-op.
Known drawbacks
===============
None.
History
=======
This was introduced by those two commits:
b178f53e90 ("Generate session name and default output on sessiond's end")
27ea4ba825 ("Fix: error when listing sessions with no session")
This is a regression present in the stable-2.11, stable-2.12,
stable-2.13, and master branches.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I34125d708a32674d79b831e5004c48321ebd711e
Francis Deslauriers [Thu, 18 Nov 2021 17:38:11 +0000 (12:38 -0500)]
relayd: add `lttng_viewer_command_str()`
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Icb7be4901727654c6339ed47344320140fdaf9fb
Francis Deslauriers [Thu, 11 Nov 2021 15:48:31 +0000 (10:48 -0500)]
relayd: add `lttcomm_relayd_command_str()`
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Idfd708843d19ca1b2cfe458a3b2b37e55f9c3274
Simon Marchi [Tue, 30 Nov 2021 03:15:53 +0000 (22:15 -0500)]
Fix: lttng: initialize variable in run_command_string
I got some crashes when using `lttng track` and hitting some error
paths. The tracker_handle variable is run_command_string is passed to
lttng_process_attr_tracker_handle_destroy uninitialized if
lttng_session_get_tracker_handle fails.
$ valgrind lttng track --kernel --pid 569878
==634572== Memcheck, a memory error detector
==634572== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==634572== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==634572== Command: lttng track --kernel --pid 569878
==634572==
Error: Unknown error
==634572== Conditional jump or move depends on uninitialised value(s)
==634572== at 0x4875007: lttng_process_attr_tracker_handle_destroy (tracker.cpp:25)
==634572== by 0x13AC55: run_command_string(cmd_type, char const*, lttng_domain_type, lttng_process_attr, char const*, mi_writer*) (track-untrack.cpp:485)
==634572== by 0x13ADA5: run_command(cmd_type, char const*, process_attr_command_args const*, mi_writer*) (track-untrack.cpp:535)
==634572== by 0x13B472: cmd_track_untrack(cmd_type, int, char const**, char const*) (track-untrack.cpp:740)
==634572== by 0x13B5D9: cmd_track(int, char const**) (track-untrack.cpp:805)
==634572== by 0x14C598: handle_command(int, char**) (lttng.cpp:237)
==634572== by 0x14CCE9: parse_args(int, char**) (lttng.cpp:426)
==634572== by 0x14CE65: main (lttng.cpp:475)
Fix it by initializing it to NULL.
Change-Id: Id2693e75c3c5c83cef58db3109973d7ab679b859
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 1 Dec 2021 13:46:11 +0000 (08:46 -0500)]
Fix: enable-rotation: wrong type of rotation specified in message
48a4000 changed the initialization of the schedule_type_str to allow the
code to compile as C++. The C-version of the code explicitly specified
the enum values which was "safer" while the C++ version assumes that the
array is initialized in the order of the enum's values.
This results in an incorrect message when enabling a rotation schedule.
For instance, enabling a periodic rotation will print that a
"size-based" rotation was successfully enabled.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3cb6fcb6a67d7b0fa4c7913694923b5f03255389
Simon Marchi [Wed, 8 Sep 2021 17:12:12 +0000 (13:12 -0400)]
common: replace uses of calloc(..., 1) with zmalloc
I found these uses of calloc, which could be replaced with zmalloc. I
presumed that the reason for using calloc was for the fact that it
initializes the memory to zeroes.
Change-Id: Icd8b84d707978e949c55d9694e6dbc10319c0169
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Mon, 22 Nov 2021 19:38:35 +0000 (14:38 -0500)]
Fix: consumer-stream: live viewers observe timestamps going backwards
Observed issue
==============
When stress-testing over an entire weekend, we caught the following
occurrences of timestamps going backwards with babeltrace in live viewer
mode:
One occurrence:
# Test ust streaming live clear with viewer with new metadata after clear
# Parameters: tracing_active=0, clear_twice=0, buffer_type=pid
6-7 occurrences:
# Test ust basic streaming live with viewer
# Parameters: tracing_active=1, clear_twice=0, buffer_type=uid
Relevant log of the relayd focused on the context of the stream
triggering the issue (stream 994):
506:DBG1 - 17:26:54.
966288486 [
3866648/
3866655]: Relay viewer stream 994 not found (in viewer_stream_get_by_id() at viewer-stream.cpp:268)
603:DBG1 - 17:26:54.
970265349 [
3866648/
3866655]: Sending stream 994 to viewer (in send_viewer_streams() at live.cpp:241)
843:DBG1 - 17:26:55.
959835404 [
3866648/
3866652]: stream_add_index for stream 994 (in stream_add_index() at stream.cpp:1233)
844:DBG3 - 17:26:55.
959866303 [
3866648/
3866652]: Finding index for stream id 994 and seq_num 0 (in relay_index_get_by_id_or_create() at index.cpp:112)
845:DBG2 - 17:26:55.
959896611 [
3866648/
3866652]: Creating relay index for stream id 994 and seqnum 0 (in relay_index_create() at index.cpp:34)
846:DBG2 - 17:26:55.
959928097 [
3866648/
3866652]: Adding relay index with stream id 994 and seqnum 0 (in relay_index_add_unique() at index.cpp:70)
847:DBG2 - 17:26:55.
959968163 [
3866648/
3866652]: Index found or created in HT for stream ID 994 and seqnum 0 (in relay_index_get_by_id_or_create() at index.cpp:144)
884:DBG3 - 17:26:55.
961676906 [
3866648/
3866652]: Receiving data for stream id 994 seqnum 0, 0 bytes received, 84 bytes left to receive (in relay_process_data_receive_payload() at main.cpp:3583)
885:DBG1 - 17:26:55.
961783540 [
3866648/
3866652]: Wrote to stream 994: data_length = 84, padding_length = 0 (in stream_write() at stream.cpp:1113)
886:DBG1 - 17:26:55.
961862441 [
3866648/
3866652]: Wrote to stream 994: data_length = 0, padding_length = 4012 (in stream_write() at stream.cpp:1113)
887:DBG1 - 17:26:55.
961895593 [
3866648/
3866652]: handle_index_data: stream 994 net_seq_num 0 data offset 0 (in stream_update_index() at stream.cpp:1140)
888:DBG3 - 17:26:55.
961945901 [
3866648/
3866652]: Finding index for stream id 994 and seq_num 0 (in relay_index_get_by_id_or_create() at index.cpp:112)
889:DBG2 - 17:26:55.
961983056 [
3866648/
3866652]: Index found or created in HT for stream ID 994 and seqnum 0 (in relay_index_get_by_id_or_create() at index.cpp:144)
894:DBG2 - 17:26:55.
962334285 [
3866648/
3866652]: Writing index for stream ID 994 and seq num 0 (in relay_index_try_flush() at index.cpp:275)
895:DBG2 - 17:26:55.
962390756 [
3866648/
3866652]: index put for stream id 994 and seqnum 0 refcount 1 (in relay_index_put() at index.cpp:237)
1743:DBG1 - 17:26:56.
083287172 [
3866648/
3866655]: Check index status: index_received_seqcount 1 index_sent_seqcount 0 for stream 994 (in check_index_status() at live.cpp:1454)
1746:DBG1 - 17:26:56.
083446379 [
3866648/
3866655]: Sending viewer index for stream 994 offset 0 (in viewer_get_next_index() at live.cpp:1801)
1748:DBG1 - 17:26:56.
083544877 [
3866648/
3866655]: Index 1 for stream 994 sent (in viewer_get_next_index() at live.cpp:1842)
1751:DBG1 - 17:26:56.
083778149 [
3866648/
3866655]: Sent 4108 bytes for stream 994 (in viewer_get_packet() at live.cpp:1968)
2858:DBG1 - 17:26:56.
907744916 [
3866648/
3866652]: stream_add_index for stream 994 (in stream_add_index() at stream.cpp:1233)
2859:DBG1 - 17:26:56.
907762798 [
3866648/
3866652]: Received live beacon for stream 994 (in stream_add_index() at stream.cpp:1237)
2862:DBG3 - 17:26:56.
907855597 [
3866648/
3866652]: Receiving data for stream id 994 seqnum 1, 0 bytes received, 84 bytes left to receive (in relay_process_data_receive_payload() at main.cpp:3583)
2863:DBG1 - 17:26:56.
907950195 [
3866648/
3866652]: Wrote to stream 994: data_length = 84, padding_length = 0 (in stream_write() at stream.cpp:1113)
2864:DBG1 - 17:26:56.
908002911 [
3866648/
3866652]: Wrote to stream 994: data_length = 0, padding_length = 4012 (in stream_write() at stream.cpp:1113)
2865:DBG1 - 17:26:56.
908024312 [
3866648/
3866652]: handle_index_data: stream 994 net_seq_num 1 data offset 4096 (in stream_update_index() at stream.cpp:1140)
2866:DBG3 - 17:26:56.
908043082 [
3866648/
3866652]: Finding index for stream id 994 and seq_num 1 (in relay_index_get_by_id_or_create() at index.cpp:112)
2867:DBG2 - 17:26:56.
908061879 [
3866648/
3866652]: Creating relay index for stream id 994 and seqnum 1 (in relay_index_create() at index.cpp:34)
2868:DBG2 - 17:26:56.
908082115 [
3866648/
3866652]: Adding relay index with stream id 994 and seqnum 1 (in relay_index_add_unique() at index.cpp:70)
2869:DBG2 - 17:26:56.
908101275 [
3866648/
3866652]: Index found or created in HT for stream ID 994 and seqnum 1 (in relay_index_get_by_id_or_create() at index.cpp:144)
3011:DBG1 - 17:26:56.
913436908 [
3866648/
3866655]: Check index status: index_received_seqcount 1 index_sent_seqcount 1 for stream 994 (in check_index_status() at live.cpp:1454)
3012:DBG1 - 17:26:56.
913457688 [
3866648/
3866655]: Check index status: inactive with beacon, for stream 994 (in check_index_status() at live.cpp:1492)
3014:DBG1 - 17:26:56.
913507164 [
3866648/
3866655]: Index 1 for stream 994 sent (in viewer_get_next_index() at live.cpp:1842)
3043:DBG1 - 17:26:56.
914167206 [
3866648/
3866652]: stream_add_index for stream 994 (in stream_add_index() at stream.cpp:1233)
3044:DBG3 - 17:26:56.
914186324 [
3866648/
3866652]: Finding index for stream id 994 and seq_num 1 (in relay_index_get_by_id_or_create() at index.cpp:112)
3045:DBG2 - 17:26:56.
914205597 [
3866648/
3866652]: Index found or created in HT for stream ID 994 and seqnum 1 (in relay_index_get_by_id_or_create() at index.cpp:144)
3046:DBG2 - 17:26:56.
914227502 [
3866648/
3866652]: Writing index for stream ID 994 and seq num 1 (in relay_index_try_flush() at index.cpp:275)
3047:DBG2 - 17:26:56.
914299536 [
3866648/
3866652]: index put for stream id 994 and seqnum 1 refcount 1 (in relay_index_put() at index.cpp:237)
3587:DBG1 - 17:26:57.
067977800 [
3866648/
3866652]: Set begin data pending flag to stream 994 (in relay_begin_data_pending() at main.cpp:2240)
3644:DBG1 - 17:26:57.
070915787 [
3866648/
3866652]: Data pending for stream id 994: prev_data_seq 1, prev_index_seq 1, and last_seq 1 (in relay_data_pending() at main.cpp:2091)
3913:DBG1 - 17:26:57.
093492259 [
3866648/
3866652]: try_rotate_stream_index: Stream 994 (rotate_at_packet_seq_num = 2, received_packet_seq_num = (value = 1, is_set = 1)) (in try_rotate_stream_index() at stream.cpp:482)
3914:DBG1 - 17:26:57.
093525950 [
3866648/
3866652]: Rotating stream 994 index file (in try_rotate_stream_index() at stream.cpp:511)
3915:DBG1 - 17:26:57.
093561064 [
3866648/
3866652]: try_rotate_stream_data: Stream 994 (rotate_at_index_packet_seq_num = 2, rotate_at_prev_data_net_seq = 1, prev_data_seq = 1) (in try_rotate_stream_data() at stream.cpp:357)
3916:DBG1 - 17:26:57.
093591085 [
3866648/
3866652]: Rotating stream 994 data file with size 8192 (in stream_rotate_data_file() at stream.cpp:138)
3917:DBG1 - 17:26:57.
093626697 [
3866648/
3866652]: stream_rotate_data_file: reset tracefile_size_current for stream 994 was 8192 (in stream_rotate_data_file() at stream.cpp:169)
3918:DBG1 - 17:26:57.
093656578 [
3866648/
3866652]: Rotation completed for stream 994 (in stream_complete_rotation() at stream.cpp:66)
4238:DBG1 - 17:26:57.
635064782 [
3866648/
3866652]: Trying to close stream 994 (in try_stream_close() at stream.cpp:883)
4239:DBG1 - 17:26:57.
635098224 [
3866648/
3866652]: Succeeded in closing stream 994 (in try_stream_close() at stream.cpp:983)
4744:DBG1 - 17:26:57.
741972785 [
3866648/
3866655]: Check index status: index_received_seqcount 2 index_sent_seqcount 1 for stream 994 (in check_index_status() at live.cpp:1454)
4745:DBG1 - 17:26:57.
742030216 [
3866648/
3866655]: Sending viewer index for stream 994 offset 4096 (in viewer_get_next_index() at live.cpp:1801)
4747:DBG1 - 17:26:57.
742088421 [
3866648/
3866655]: Index 2 for stream 994 sent (in viewer_get_next_index() at live.cpp:1842)
4750:DBG1 - 17:26:57.
742197990 [
3866648/
3866655]: Sent 4108 bytes for stream 994 (in viewer_get_packet() at live.cpp:1968)
4755:DBG1 - 17:26:57.
932525633 [
3866648/
3866655]: Releasing stream id 994 (in stream_release() at stream.cpp:778)
4756:DBG1 - 17:26:57.
932555313 [
3866648/
3866655]: Rotation completed for stream 994 (in stream_complete_rotation() at stream.cpp:66)
Cause
=====
This log shows the following sequence:
consumer-data consumer-live-timer relayd-main relayd-live babeltrace
1) lttng_consumer_read_subbuffer
2) get next subbuf
3) write data packet to data socket
(starting here the data packet is
somewhere on the network)
4) put next subbuf
5) post_consume()
6) consumer_stream_sync_metadata_index()
7) wait for metadata
8) consumer_stream_sync_metadata()
9) check_stream()
10) set missed_metadata_flush
11) call send_live_beacon(()
12) sample empty ring buffer
13) read current timestamp
14) send inactivity beacon (empty packet)
15) receives a live beacon (@ 17:26:56.
907762798)
16) call consumer_stream_send_index()
17) send packet index to relayd
18) receives a data packet (@ 17:26:56.
907855597)
(at this point the data
packet is received from the
network)
19) ask for next index
20) informs the live viewer of the live beacon (@ 17:26:56.
913457688)
21) receives an index packet (@ 17:26:56.
914227502)
22) ask for next index
23) sends the packet to the viewer (@ 17:26:57.
742197990)
24) observes time going
backwards between the
previous live beacon
and the data packet.
The issue is caused by consumer_stream_sync_metadata_index which is
called after sending a data packet (therefore after having consumed a
data packet from the ring buffer). It invokes the send_live_beacon
callback before sending the index associated with the data packet that
was just sent.
However, this introduces a discrepancy between the live beacon
inactivity guarantees and the yet-to-be-sent packet index: the data
packet sent at [3] can be anywhere on the network, not even received by
the relay daemon, when the live beacon is sampling a now empty ring
buffer at [12], and thus sends a live inactivity beacon to the relay
daemon. Then, when the index is sent by consumer_stream_send_index
[16], its timestamp is in the past compared to the inactivity beacon
sent by send_live_beacon [11].
The purpose of the field "stream->indexes_in_flight" is to prevent
setting the inactivity timestamp in the relay stream when data is
missing for indexes that were received. This works because the indexes
are sent over the control socket, which is where the inactivity beacons
are also sent. It does not however prevent issues the other way around:
data sent prior to the inactivity beacon may or may not have reached
the relay daemon. It is therefore important to make sure that consuming
ring buffer data and sending that data's index vs sampling for an empty
ring buffer and sending an inactivity beacon are correctly ordered.
Solution
========
Send inactivity beacon after packet index.
Also document the purpose of sending an inactivity beacon in this
scenario.
Note
====
This issue is present since lttng-tools 2.7.0 (backported to 2.6.1),
where lttng_ustconsumer_read_subbuffer() invokes
consumer_flush_ust_index() prior to call consumer_stream_write_index().
It was introduced by commit
288bdb302a1 ("Fix: sessiond vs consumerd
push/get metadata deadlock").
Known drawbacks
===============
None.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ied43cc8229413b25e69b40ac218adad420f18797
Mathieu Desnoyers [Fri, 19 Nov 2021 20:11:37 +0000 (15:11 -0500)]
Fix: relayd: ressource leaks on viewer_stream_create error
Observed issue
==============
When facing failure to open viewer stream chunks in the context of "Fix:
relayd: failure to open chunk files concurrently with session clear",
we observe that the relay daemon triggers an assertion due to a
non-empty session hash table on cleanup.
Cause
=====
viewer_stream_create() does a stream_get(), but without any matching
stream_put() on error. This in turn holds a reference on the ctf_trace,
which holds a reference on the session.
By inspecting the code, we notice that the following ressources can be
leaked on error:
- vstream->stream_file.handle,
- vstream->index_file,
- vstream->stream.
In non-error scenarios, viewer_stream_release() is responsible for
releasing references on the composite objects.
The vstream->stream_file.trace_chunk is not an issue because it is put
in the destroy handler (as well as within the release, before having its
vstream->stream_file.trace_chunk pointer set to NULL).
Solution
========
Properly put references on all objects which are contained by the viewer
stream on error by introducing
viewer_stream_release_composite_objects(), which is used both in the
error path of viewer_stream_create() and in viewer_stream_release().
Note
====
Why not move those "put" operations in viewer_stream_destroy ?
This is done in the release to ensure we put references on composite
objects immediately when our own reference reaches 0, rather than
waiting for a grace period through call_rcu, which could then cause
chained call_rcu callbacks and require multiple invocation of
rcu_barrier on relayd exit to guarantee that all callbacks have been
executed and all ressources properly freed.
Known drawbacks
===============
None.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I864b58eda94ebda5e6faea45c922d4e814a15daa
Mathieu Desnoyers [Fri, 19 Nov 2021 16:12:25 +0000 (11:12 -0500)]
Fix: relayd: live: erroneous message timestamp observed from live viewer
Observed issue
==============
Another situation where erroneous message timestamp is observed by the
live viewer. Happens rarely (only two occurrences while running
ust_clear in stress-tests overnight on a 16-core machine).
Triggered with the following test:
# Test ust streaming live clear with viewer with new metadata after clear
# Parameters: tracing_active=0, clear_twice=1, buffer_type=pid
Babeltrace 2 error:
11-19 06:21:02.571 488497 488497 E PLUGIN/SRC.CTF.LTTNG-LIVE handle_late_message@lttng-live.c:1206 [lttng-live] Invalid live stream state: have a late message that is not a packet discarded or event discarded message: late-msg-type=PACKET_BEGINNING
11-19 06:21:02.578 488497 488497 E PLUGIN/SRC.CTF.LTTNG-LIVE next_stream_iterator_for_trace@lttng-live.c:1360 [lttng-live] Late message could not be handled correctly: lttng-live-msg-iter-addr=0x55b4cdee1590, stream-name="stream-1024", curr-msg-ts=
1637302861380968303, last-msg-ts=
1637302861472798701
11-19 06:21:02.578 488497 488497 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1802 [lttng-live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
11-19 06:21:02.579 488497 488497 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:866 Component input port message iterator's "next" method failed: iter-addr=0x55b4cdecfe30, iter-upstream-comp-name="lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
Cause
=====
viewer_get_next_index() does not protect its use of the session
ongoing_rotation state by any synchronization mechanism. It only takes
the stream lock, which is not used to protect changes to the ongoing
rotation state with respect to the chunk rename operation performed by
chunk creation at the beginning of the clear command.
Solution
========
Protect the use of the session ongoing rotation state and file open
operations by the session lock in viewer_get_next_index().
Known drawbacks
===============
I don't expect this to cause any real scalability concern considering
the fact that the relay daemon has only two threads, one to handle
session daemon commands, and the other to handle viewer commands.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I242942ac6bc7a9199a17cd9800fc461a397b8955
Mathieu Desnoyers [Fri, 19 Nov 2021 16:08:15 +0000 (11:08 -0500)]
relayd: clarify viewer_get_next_index no rotation condition
The condition identifying cases where a viewer_get_next_index must not
perform a viewer stream rotation is relatively tricky to comprehend.
Split it in two intermediate variables, and use those variables to
print debugging statements when conditions for not performing a
rotation match.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ieca2fbf879c8ed45c0b3aa6035ac32679b12dcdc
Mathieu Desnoyers [Thu, 18 Nov 2021 20:22:39 +0000 (15:22 -0500)]
Fix: relayd: failure to open chunk files concurrently with session clear
Observed issue
==============
When stress-testing ust clear with an active live viewer, we observe a
situation where the live viewer thread fails to open chunk files in
make_viewer_streams() when executed after the creation of the new trace
chunk at the beginning of the clear command:
DBG3 - 16:19:50.
923577790 [40834/40838]: Processing "RELAYD_CREATE_TRACE_CHUNK" command for socket 19 (in relay_process_control_command() at main.cpp:3262)
DBG1 - 16:19:50.
923577730 [40834/40841]: Relay viewer stream 225 not found (in viewer_stream_get_by_id() at viewer-stream.cpp:265)
DBG1 - 16:19:50.
923600762 [40834/40838]: lttng_trace_chunk_rename_path from to .tmp_old_chunk (in lttng_trace_chunk_rename_path_no_lock() at trace-chunk.cpp:759)
DBG1 - 16:19:50.
923627202 [40834/40841]: Opening trace chunk file "ust/uid/0/64-bit/my_chan-0_26" (in _lttng_trace_chunk_open_fs_handle_locked() at trace-chunk.cpp:1359)
DBG1 - 16:19:50.
923664685 [40834/40841]: Adding new file "ust/uid/0/64-bit/my_chan-0_26" to trace chunk "(unnamed)" (in lttng_trace_chunk_add_file() at trace-chunk.cpp:1309)
DBG1 - 16:19:50.
923706441 [40834/40841]: Relay viewer stream 226 not found (in viewer_stream_get_by_id() at viewer-stream.cpp:265)
DBG1 - 16:19:50.
923727770 [40834/40841]: Opening trace chunk file "ust/uid/0/64-bit/my_chan-0_27" (in _lttng_trace_chunk_open_fs_handle_locked() at trace-chunk.cpp:1359)
DBG1 - 16:19:50.
923744686 [40834/40841]: Adding new file "ust/uid/0/64-bit/my_chan-0_27" to trace chunk "(unnamed)" (in lttng_trace_chunk_add_file() at trace-chunk.cpp:1309)
DBG1 - 16:19:50.
923773427 [40834/40841]: Relay viewer stream 227 not found (in viewer_stream_get_by_id() at viewer-stream.cpp:265)
DBG1 - 16:19:50.
923803791 [40834/40841]: Opening trace chunk file "ust/uid/0/64-bit/my_chan-0_28" (in _lttng_trace_chunk_open_fs_handle_locked() at trace-chunk.cpp:1359)
DBG1 - 16:19:50.
923831589 [40834/40841]: Adding new file "ust/uid/0/64-bit/my_chan-0_28" to trace chunk "(unnamed)" (in lttng_trace_chunk_add_file() at trace-chunk.cpp:1309)
DBG1 - 16:19:50.
923865981 [40834/40841]: Relay viewer stream 228 not found (in viewer_stream_get_by_id() at viewer-stream.cpp:265)
DBG1 - 16:19:50.
923889329 [40834/40841]: Opening trace chunk file "ust/uid/0/64-bit/index/my_chan-0_29.idx" (in _lttng_trace_chunk_open_fs_handle_locked() at trace-chunk.cpp:1359)
DBG1 - 16:19:50.
923905142 [40834/40838]: Creating trace chunk: chunk_id = 1, creation time =
20211118-161950 (in lttng_trace_chunk_create() at trace-chunk.cpp:440)
DBG1 - 16:19:50.
923907984 [40834/40841]: Adding new file "ust/uid/0/64-bit/index/my_chan-0_29.idx" to trace chunk "(unnamed)" (in lttng_trace_chunk_add_file() at trace-chunk.cpp:1309)
DBG1 - 16:19:50.
923937804 [40834/40838]: Chunk name set to "20211118T161950+0000-1" (in lttng_trace_chunk_create() at trace-chunk.cpp:471)
PERROR - 16:19:50.
923984288 [40834/40841]: Failed to open fs handle to ust/uid/0/64-bit/index/my_chan-0_29.idx, open() returned: No such file or directory (in fd_tracker_open_fs_handle() at fd-tracker.cpp:548)
DBG1 - 16:19:50.
924050763 [40834/40841]: Opening trace chunk file "ust/uid/0/64-bit/my_chan-0_29" (in _lttng_trace_chunk_open_fs_handle_locked() at trace-chunk.cpp:1359)
DBG1 - 16:19:50.
924074480 [40834/40841]: Adding new file "ust/uid/0/64-bit/my_chan-0_29" to trace chunk "(unnamed)" (in lttng_trace_chunk_add_file() at trace-chunk.cpp:1309)
PERROR - 16:19:50.
924094720 [40834/40841]: Failed to open fs handle to ust/uid/0/64-bit/my_chan-0_29, open() returned: No such file or directory (in fd_tracker_open_fs_handle() at fd-tracker.cpp:548)
DBG1 - 16:19:50.
924193679 [40834/40841]: Viewer connection closed with 23 (in thread_worker() at live.cpp:2542)
DBG1 - 16:19:50.
924227482 [40834/40838]: Attempting to publish trace chunk: sessiond {
34038782-6f74-4b2d-801e-
752cf3d8885e}, session_id = 7, chunk_id = 1 (in sessiond_trace_chunk_registry_publish_chunk() at sessiond-trace-chunks.cpp:385)
DBG1 - 16:19:50.
924312916 [40834/40838]: Reset communication state of relay connection (fd = 19) (in connection_reset_protocol_state() at connection.cpp:82)
DBG3 - 16:19:50.
924350200 [40834/40838]: Relayd worker thread polling... (in relay_thread_worker() at main.cpp:3833)
DBG3 - 16:19:50.
924365222 [40834/40841]: Relayd live viewer worker thread polling... (in thread_worker() at live.cpp:2456)
DBG1 - 16:19:50.
926718319 [40834/40838]: Done receiving control command header: fd = 19, cmd = 18, cmd_version = 0, payload size = 532 bytes (in relay_process_control_receive_header() at main.cpp:3422)
DBG3 - 16:19:50.
926755574 [40834/40838]: Relayd worker thread polling... (in relay_thread_worker() at main.cpp:3833)
DBG1 - 16:19:50.
926787638 [40834/40838]: Done receiving control command payload: fd = 19, payload size = 532 bytes (in relay_process_control_receive_payload() at main.cpp:3339)
DBG3 - 16:19:50.
926811247 [40834/40838]: Processing "RELAYD_ROTATE_STREAMS" command for socket 19 (in relay_process_control_command() at main.cpp:3258)
Cause
=====
This is caused by relay_create_trace_chunk() using
lttng_trace_chunk_rename_path() to move away each trace subdirectory
into the subdirectory .tmp_old_chunk, and making this the new top-level
chunk directory (temporarily). This is a temporary state which will be
resorbed on relay_close_trace_chunk(), moving back the top-level chunk
directory to its original place.
Attempts to open chunk files from the prior chunk may result in failures,
because the chunk lock protecting the chunk rename operation only
protects the chunk owned by the relay thread, not its copy(ies) owned by
the live viewer thread.
This intermediate state should _not_ be observed by the live viewer
thread. The session ongoing rotation state should prevent the live
viewer threads from observing this.
Solution
========
Set the ongoing rotation state in relay_create_trace_chunk() earlier:
before invoking lttng_trace_chunk_rename_path(). Also ensure that the
session ongoing rotation state is protected by the session lock.
On the live thread side, introduce use of the session ongoing rotation
state in viewer_get_new_streams() and viewer_attach_session() to
effectively skip creation of the viewer streams if a session has a
rotation ongoing.
Viewers are expected to deal with the LTTNG_VIEWER_NEW_STREAMS_NO_NEW
reply (or handle the fact that no streams are currently available) and
try again later.
Both Babeltrace 2.0 and Babeltrace 1.5 appear to handle those replies
correctly.
Known drawbacks
===============
None.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3ad8d59dff2994bfb62efa02a620f3c0ffbb5e3b
Simon Marchi [Thu, 16 Dec 2021 04:13:45 +0000 (23:13 -0500)]
common: compile argpar-utils as C++
argpar-utils was in review while everything was converted to be compiled
as C++, so it missed the boat. Do it now.
Change-Id: Ica005b3b27cc0bd9ef6e886d7078649096c7c7a2
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 17 Nov 2021 16:55:36 +0000 (11:55 -0500)]
Fix: relayd: live: metadata stream reference count < 0 assert
Observed issue
==============
While running tests/regression/tools/clear/test_ust test in a loop we
eventually witness the following error:
The symptom on the Babeltrace side is a Connection reset by peer. This
is caused by a relayd abort after an assertion failure due to a
reference count being lower than 0.
Test case:
# Test ust streaming live clear with viewer
# Parameters: tracing_active=0, clear_twice=0, buffer_type=pid
Tools commit: https://review.lttng.org/c/lttng-tools/+/6754/1
Relay daemon assertion failure stack trace:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffa2e850859 in __GI_abort () at abort.c:79
#2 0x00007ffa2e850729 in __assert_fail_base (fmt=0x7ffa2e9e6588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56221bdb96f4 "res >= 0",
file=0x56221bdb97a0 "/root/virtenv/usr/include/urcu/ref.h", line=66, function=<optimized out>) at assert.c:92
#3 0x00007ffa2e861f36 in __GI___assert_fail (assertion=assertion@entry=0x56221bdb96f4 "res >= 0", file=file@entry=0x56221bdb97a0 "/root/virtenv/usr/include/urcu/ref.h", line=line@entry=66,
function=function@entry=0x56221bdbd638 <__PRETTY_FUNCTION__.7554> "urcu_ref_put") at assert.c:101
#4 0x000056221bd6a1cc in urcu_ref_put (release=<optimized out>, ref=0x7ffa24008cb0) at /root/virtenv/usr/include/urcu/ref.h:66
#5 viewer_stream_put (vstream=vstream@entry=0x7ffa24008cb0) at viewer-stream.c:279
#6 0x000056221bd5e4c5 in viewer_get_metadata (conn=conn@entry=0x7ffa0c000fc0) at live.c:2211
#7 0x000056221bd63778 in process_control (conn=0x7ffa0c000fc0, recv_hdr=0x7ffa297c5af0) at live.c:2376
#8 thread_worker (data=<optimized out>) at live.c:2541
#9 0x00007ffa2ea28609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#10 0x00007ffa2e94d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Babeltrace crash:
11-17 05:13:05.197 47007 47007 D PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_get_next_index@viewer-connection.c:1494 [live] Requesting next index for stream: cmd=GET_NEXT_INDEX, viewer-stream-id=3352
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_recv@viewer-connection.c:245 [live] Error receiving from Relay: Connection reset by peer.
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_get_next_index@viewer-connection.c:1522 [live] Error receiving get next index reply
11-17 05:13:05.215 47007 47007 D PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_iterator_next_msg_on_stream@lttng-live.c:1069 [live] Returning from advancing live stream iterator: status=LTTNG_LIVE_ITERATOR_STATUS_ERROR, stream-name="stream-3352", viewer-stream-id=3352
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1802 [live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
11-17 05:13:05.216 47007 47007 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:866 Component input port message iterator's "next" method failed: iter-addr=0x563b7dd07700, iter-upstream-comp-name="live", iter-upstream-comp-log-level=TRACE, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUT
PUT, iter-upstream-port-name="out", status=ERROR
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER muxer_upstream_msg_iter_next@muxer.c:446 [mux] Upstream iterator's next method returned an error: status=ERROR
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER validate_muxer_upstream_msg_iters@muxer.c:989 [mux] Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x563b7dd04740, muxer-upstream-msg-iter-wrap-addr=0x563b7dd0db30
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER muxer_msg_iter_next@muxer.c:1417 [mux] Cannot get next message: comp-addr=0x563b7dd1a760, muxer-comp-addr=0x563b7dd1b930, muxer-msg-iter-addr=0x563b7dd04740, msg-iter-addr=0x563b7dd07590, status=ERROR
11-17 05:13:05.216 47007 47007 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:866 Component input port message iterator's "next" method failed: iter-addr=0x563b7dd07590, iter-upstream-comp-name="mux", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT,
iter-upstream-port-name="out", status=ERROR
11-17 05:13:05.216 47007 47007 W LIB/GRAPH consume_graph_sink@graph.c:462 Component's "consume" method failed: status=ERROR, comp-addr=0x563b7dd06d20, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=1, comp-class-so-handle-addr=0x563b7dd051b0, comp-class-so-handle-path="/root/virtenv/usr/lib/babeltr
ace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0
11-17 05:13:05.217 47007 47007 E CLI cmd_run@babeltrace2.c:2537 Graph failed to complete successfully
Cause
=====
Both relay and live threads can put the ownership reference on the
metadata viewer stream concurrently without synchronization, thus
leading to a reference count going lower than 0.
The viewer stream ownership design initially planned for being owned by
the live thread, thus allowing the live thread to put the ownership
reference as soon as the associated relay stream is observed as closed,
and the viewer stream is considered as hung up.
However, in the specific case of the metadata viewer stream, the
responsibility of closing the metadata viewer stream is shared between
the relay and live threads, because the viewers expect to observe a
LTTNG_VIEWER_NO_NEW_METADATA message before the metadata stream
hangs up (see comment in viewer_get_metadata()). Therefore, if
viewer_get_metadata() is done before the metadata stream is closed, the
viewer will receive the LTTNG_VIEWER_NO_NEW_METADATA message, and set
the no_new_metadata_notified state to true. It's then the relay thread's
relay_close_stream() which will invoke the ownership put. However,
the live thread may concurrently try to put the viewer stream ownership
as well from a subsequent viewer_get_metadata(), thus leading to a
reference count < 0.
Solution
========
Fix this by putting the ownership reference from the live viewer thread
rather than the relay thread. This can be done by tracking the state of
no_new_metadata_notified within the live viewer thread.
Known drawbacks
===============
This will postpone reclaim of the metadata viewer stream from the
relay stream close to the following viewer_get_metadata (after a
LTTNG_VIEWER_NO_NEW_METADATA message has been replied), which I don't
think is an issue.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9783d3a8402dddaa364a7a69c0764eaf30c16fde
Mathieu Desnoyers [Thu, 18 Nov 2021 15:37:29 +0000 (10:37 -0500)]
Fix: relayd: compare viewer chunks by ID rather than address
Observed issue
==============
In "Fix: relayd: live: erroneous message timestamp observed from live
viewer", we observe that sometimes viewer streams unexpectedly end up
performing a viewer stream rotation at session destroy.
Cause
=====
This may happen in the following scenario:
1) Relay stream A is rotated to NULL.
2) viewer_get_next_index for viewer stream A:
2.1) observes a NULL rstream->trace_chunk, updates the viewer session
current trace chunk to NULL (viewer_session_set_trace_chunk_copy).
2.2) "Transition the viewer stream into the latest trace chunk
available." does not issue viewer_stream_rotate_to_trace_chunk, because
the condition (rstream->completed_rotation_count ==
vstream->last_seen_rotation_count + 1 && !rstream->trace_chunk)
evaluates to "true", and thus the entire if () evaluates to false.
3) check_index_status detects rstream->closed and
index_received_seqcount == index_sent_seqcount, thus replying HUP to
viewer, effectively releasing ownership of the viewer stream.
4) viewer_get_next_index for viewer stream B (not rotated to NULL yet):
4.1) observes a non-NULL rstream->trace_chunk, updates the viewer
session current trace chunk to *a new copy* of the non-NULL
rstream->trace_chunk (viewer_session_set_trace_chunk_copy).
4.2) the comparison (conn->viewer_session->current_trace_chunk !=
vstream->stream_file.trace_chunk) done by pointer don't match, because
the viewer session current trace chunk is a new copy.
Therefore, due to those stream close scenarios where the viewer session
can go back and forth between NULL and _different copies_ of the relay
chunk, we cannot use a comparison of chunks by address on the viewer
chunks.
Solution
========
Compare the viewer stream chunks by ID rather than address.
Known drawbacks
===============
The comparison is probably slightly slower, but I don't expect this to
be significant.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I05e4f97f26b2659c007726cc29d3edafa17bdb98
Mathieu Desnoyers [Tue, 16 Nov 2021 20:59:01 +0000 (15:59 -0500)]
Fix: relayd: live: erroneous message timestamp observed from live viewer
Observed issue
==============
While running tests/regression/tools/clear/test_ust test in a loop we
eventually witness the following error:
# Test ust basic streaming live with viewer
# Parameters: tracing_active=0, clear_twice=0, buffer_type=uid
ok 425 - Create session 6jbcTSKUG7s2RIp5 with uri:net://localhost and opts: --live
ok 426 - Enable channel chan for session 6jbcTSKUG7s2RIp5
ok 427 - Enable ust event tp:tptest for session 6jbcTSKUG7s2RIp5
ok 428 - Start tracing for session 6jbcTSKUG7s2RIp5
# Waiting for live trace at url: net://localhost
ok 429 - Waiting for live trace at url: net://localhost
# Waiting for live viewers on url: net://localhost
ok 430 - Waiting for live viewers on url: net://localhost
# Wait until viewer sees all 10 expected events
ok 431 - Live viewer read 10 events, expect 10
ok 432 - Destroy session 6jbcTSKUG7s2RIp5
# Waiting for application to exit
ok 433 - Wait for application to exit
# Wait for viewer to exit
10-28 22:07:37.935 764967 764967 E PLUGIN/SRC.CTF.LTTNG-LIVE next_stream_iterator_for_trace@lttng-live.c:1222 [lttng-live] Message's timestamp is less than lttng-live's message iterator's last returned timestamp: lttng-live-msg-iter-addr=0x55fe45d4977
0, ts=
1635458857116911882, last-msg-ts=
1635458857123908033
10-28 22:07:37.937 764967 764967 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1662 [lttng-live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
10-28 22:07:37.937 764967 764967 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:861 Component input port message iterator's "next" method failed: iter-addr=0x55fe45d38c50, iter-upstream-comp-name="lttng-live", iter-upstream-comp-log-level=WARNING,
iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
10-28 22:07:37.937 764967 764967 E PLUGIN/FLT.UTILS.MUXER muxer_upstream_msg_iter_next@muxer.c:452 [muxer] Upstream iterator's next method returned an error: status=ERROR
10-28 22:07:37.938 764967 764967 E PLUGIN/FLT.UTILS.MUXER validate_muxer_upstream_msg_iters@muxer.c:986 [muxer] Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x55fe45d33640, muxer-upstream-msg-iter-wrap-addr=0x55fe45d3
af20
10-28 22:07:37.938 764967 764967 E PLUGIN/FLT.UTILS.MUXER muxer_msg_iter_next@muxer.c:1411 [muxer] Cannot get next message: comp-addr=0x55fe45d49cb0, muxer-comp-addr=0x55fe45d49d30, muxer-msg-iter-addr=0x55fe45d33640, msg-iter-addr=0x55fe45d38ae0, sta
tus=ERROR
10-28 22:07:37.938 764967 764967 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:861 Component input port message iterator's "next" method failed: iter-addr=0x55fe45d38ae0, iter-upstream-comp-name="muxer", iter-upstream-comp-log-level=WARNING, iter
-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
10-28 22:07:37.938 764967 764967 W LIB/GRAPH consume_graph_sink@graph.c:469 Component's "consume" method failed: status=ERROR, comp-addr=0x55fe45d49550, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-c
lass-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=0, comp-class-so-handle-addr=0x55fe45d38460, comp-class-so-handle-path="/root/virtenv/usr/lib/babeltrace2/plugins/babeltrace-plugin-text.la", comp-input-port-count=1, comp-out
put-port-count=0
10-28 22:07:37.938 764967 764967 E CLI cmd_run@babeltrace2.c:2547 Graph failed to complete successfully
Cause
=====
When doing the rotation associated with a clear,
viewer_stream_sync_tracefile_array_tail aims at pushing forward the
index_sent_seqcount for sessions in tracefile rotation mode to the
oldest available seqcount (tail) for this stream. It does not take into
consideration that the current index_sent_seqcount may already be passed
that oldest available seqcount position, thus eventually re-emitting
trace data.
For sessions *not* in tracefile rotation mode (this is known because
`tracefile_array_get_seq_tail()` returns -1ULL), this function
erroneously resets the index_sent_seqcount to 0, thus also causing trace
data to be re-emitted.
Solution
========
Solve this by using the maximum between the current index_sent_seqcount
and the tracefile array tail position as new position in
viewer_stream_sync_tracefile_array_tail.
Notes
=====
This symptom is also observed without using the clear command, simply on
destroy with a live viewer attached. This is caused by another issue
(not addressed by this patch) which causes
viewer_stream_rotate_to_trace_chunk to be sometimes invoked when streams
are closed on destroy.
Known drawbacks
===============
None.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ifb877d6e2bd94831ddf93077eac3cc968945d6f7
Mathieu Desnoyers [Wed, 24 Nov 2021 20:56:16 +0000 (15:56 -0500)]
Validate channel context mismatch across UST applications
Observed issue
==============
Applications traced with LTTng-UST are expected to all provide the exact
same layout for their channel's context fields, else it leads to
corrupted traces. This is only enforced within LTTng-UST. There is
nothing in the session daemon that prevents this scenario, and it is
only observable when reading the corrupted trace.
This makes the entire trace unreadable from the point where it is
corrupted.
Cause
=====
Even though LTTng-UST sends the entire description of its context fields
along with the channel registration notification, there is no validation
of the context fields' content against the context fields present in the
ust registry.
Solution
========
Validate each registered UST channel context fields against the fields
present in the registry. Reject the application if there is a mismatch.
Known drawbacks
===============
None.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8b49032bf4f766e549dfccfafdce8cddcbb2873f
Jérémie Galarneau [Tue, 14 Dec 2021 19:52:24 +0000 (14:52 -0500)]
Fix: relayd comm: improperly packed rotate streams command header
Observed issue
==============
The implicit rotation of a session performed during its destruction
fails on LTTng 2.12 (and upwards) with the following errors:
Error: Relayd rotate streams replied error 152
Error: Relayd rotate stream failed. Cleaning up relayd 2
Error: Rotate channel failed
Failed to find relay daemon socket: relayd_id = 2
Error: Failed to perform a quiet rotation as part of the destruction of session "my_session": Rotation failure on consumer
Cause
=====
Error 152 matches the LTTNG_ERR_INVALID_PROTOCOL error, which implies
that the consumer daemon sent an unexpected command to the relay daemon.
It was determined that the RELAYD_ROTATE_STREAMS command header is not
properly packed since the LTTNG_PACKED annotation was omitted from its
`new_chunk_id` optional field. The documentation of LTTNG_OPTIONAL_COMM
duly indicates that this is required.
Without the use of LTTNG_PACKED, various lengths of padding (3 or 7
bytes) are inserted between new_chunk_id's `is_set` and `value` field to
align `value`, which results in an incorrect interpretation of the
command's arguments.
The relay daemon catches the protocol error when it is built in a
configuration that inserts 7 bytes of padding, while the consumer only
inserts three.
Solution
========
The solution proposed here is not perfect, see "Known drawbacks".
First, if we were to annotate the field, patched consumer daemons would
send unintelligible command headers to unpatched relay daemons. Leaving
it as is is the least of all evils, see "Known drawbacks" for more
details.
From the relay daemon end, we can easily anticipate the padding problem
by reading the `stream_count` field and use it to determine the expected
size of the payload.
The difference between the actual size of the payload and the expected
size allows us to determine the padding size and use the appropriate
declaration of the structure to interpret the command's arguments.
Known drawbacks
===============
While this fix causes the relay daemon to handle all improperly packed
command headers received from an unpatched consumer daemon, the reverse
is not completely true.
The following tables show which cases are known to work and which are
known to be broken when patched and unpatched versions of the relay
and consumer daemons are mixed, with the various alignment constraints.
Note that here:
- 4 byte alignement implies "daemon running on an architecture on
which uint64_t is aligned on an 4-byte boundary" (e.g. x86),
- 8 byte alignement implies "daemon running on an architecture on
which uint64_t is aligned on an 8-byte boundary" (e.g. x86-64).
Scenario 1 - Unpatched relay daemon and unpatched consumer daemon
-----------------------------------------------------------------------------------
| Architecture alignment | 4 byte alignement consumerd | 8 byte alignment consumerd |
|------------------------|-----------------------------|----------------------------|
| 4 byte alignment relay | Works | Fails |
| 8 byte alignment relay | Fails | Works |
-----------------------------------------------------------------------------------
Scenario 2 - Patched relay daemon and unpatched consumer daemon
-----------------------------------------------------------------------------------
| Architecture alignment | 4 byte alignement consumerd | 8 byte alignment consumerd |
|------------------------|-----------------------------|----------------------------|
| 4 byte alignment relay | Works | Works |
| 8 byte alignment relay | Works | Works |
-----------------------------------------------------------------------------------
Scenario 3 - Unpatched relay daemon and patched consumer daemon
-----------------------------------------------------------------------------------
| Architecture alignment | 4 byte alignement consumerd | 8 byte alignment consumerd |
|------------------------|-----------------------------|----------------------------|
| 4 byte alignment relay | Works | Fails |
| 8 byte alignment relay | Fails | Works |
-----------------------------------------------------------------------------------
Scenario 4 - Patched relay daemon and patched consumer daemon
-----------------------------------------------------------------------------------
| Architecture alignment | 4 byte alignement consumerd | 8 byte alignment consumerd |
|------------------------|-----------------------------|----------------------------|
| 4 byte alignment relay | Works | Works |
| 8 byte alignment relay | Works | Works |
-----------------------------------------------------------------------------------
Note that Scenarios 1 and 3 are the same since this fix doesn't
change the behaviour of the consumer daemon.
Also note that packing the `new_chunk_id` field would break the two
working cases of scenario 3 which are, in all likelyhood, the more
common cases.
A new command using a properly packed version of the command's header
could be implemented in future versions, but this presents no benefit as
part of this fix.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3b822a9766ea5be51996a771a0d4a90efe29ce0b
Jonathan Rajotte [Thu, 18 Nov 2021 15:47:16 +0000 (10:47 -0500)]
Test: snapshot after regenerate metadata
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I48ae0e9d3d4e67172c4a686d42ceadf6bcb81ead
Jonathan Rajotte [Wed, 17 Nov 2021 21:18:59 +0000 (16:18 -0500)]
Fix: ust-consumer: segfault on snapshot after regenerate metadata
Observed issue
==============
lttng-consumerd segfaults for the following scenario:
$ lttng create test --snapshot
$ lttng enable-event -u -an
$ lttng start
# Run an app just to have some events
$ lttng regenerate metadata
$ lttng snapshot record
The following backtrace is obtained:
(gdb) bt
#0 __GI___pthread_mutex_lock (mutex=0x130) at ../nptl/pthread_mutex_lock.c:67
#1 0x000055b383cfaed3 in lttng_ustconsumer_recv_metadata (sock=29, key=4, offset=0, len=12391, version=1, channel=0x7fe90000a760, timer=0, wait=1) at ust-consumer.c:1347
#2 0x000055b383d00197 in lttng_ustconsumer_request_metadata (ctx=0x55b3855a1e50, channel=0x7fe90000a760, timer=0, wait=1) at ust-consumer.c:3336
#3 0x000055b383cf9e76 in snapshot_metadata (metadata_channel=0x7fe90000a760, key=4, path=0x7fe911a09944 "uid/1000/64-bit", relayd_id=
18446744073709551615, ctx=0x55b3855a1e50) at ust-consum
#4 0x000055b383cfbe73 in lttng_ustconsumer_recv_cmd (ctx=0x55b3855a1e50, sock=28, consumer_sockpoll=0x7fe911a0dbb0) at ust-consumer.c:1853
#5 0x000055b383ccf9b7 in lttng_consumer_recv_cmd (ctx=0x55b3855a1e50, sock=28, consumer_sockpoll=0x7fe911a0dbb0) at consumer.c:2097
#6 0x000055b383cd3bfd in consumer_thread_sessiond_poll (data=0x55b3855a1e50) at consumer.c:3284
#7 0x00007fe914c22609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007fe914b47293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) up
#1 0x000055b383cfaed3 in lttng_ustconsumer_recv_metadata (sock=29, key=4, offset=0, len=12391, version=1, channel=0x7fe90000a760, timer=0, wait=1) at ust-consumer.c:1347
1347 pthread_mutex_lock(&channel->metadata_stream->lock);
(gdb) print channel->metadata_stream
$1 = (struct lttng_consumer_stream *) 0x0
Note that the following scenario also leads to a similar backtrace:
$ lttng create test --snapshot
$ lttng enable-event -u -an
$ lttng start
# Run an app just to have some events with a short duration
$ lttng regenerate metadata
# Run a second app just to have some events and a metadata flush while
# the metadata cache status is set as `invalidated`.
^ lttng-consumerd segfault on app termination.
The backtrace:
(gdb) bt
#0 __GI___pthread_mutex_lock (mutex=0x130) at ../nptl/pthread_mutex_lock.c:67
#1 0x00005612a5a13ed3 in lttng_ustconsumer_recv_metadata (sock=28, key=2, offset=0, len=12391, version=1, channel=0x7f3978005400, timer=0, wait=1) at ust-consumer.c:1347
#2 0x00005612a5a14d0a in lttng_ustconsumer_recv_cmd (ctx=0x5612a6feee50, sock=28, consumer_sockpoll=0x7f3989494bb0) at ust-consumer.c:1818
#3 0x00005612a59e89b7 in lttng_consumer_recv_cmd (ctx=0x5612a6feee50, sock=28, consumer_sockpoll=0x7f3989494bb0) at consumer.c:2097
#4 0x00005612a59ecbfd in consumer_thread_sessiond_poll (data=0x5612a6feee50) at consumer.c:3284
#5 0x00007f398c6a9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f398c5ce293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) up
#1 0x00005612a5a13ed3 in lttng_ustconsumer_recv_metadata (sock=28, key=2, offset=0, len=12391, version=1, channel=0x7f3978005400, timer=0, wait=1) at ust-consumer.c:1347
1347 pthread_mutex_lock(&channel->metadata_stream->lock);
(gdb) print channel->metadata_stream
$1 = (struct lttng_consumer_stream *) 0x0
(gdb)
Cause
=====
For session configured in snapshot mode the metadata channel
metadata_stream field is NULL except for a "short" window during the
actual snapshot record action (snapshot_metadata).
The `regenerate metadata` effectively flag the metadata cache as invalid
leading to handling the cache invalidation state
(`CONSUMER_METADATA_CACHE_WRITE_STATUS_INVALIDATED`) in
`lttng_ustconsumer_recv_metadata`. This was introduced by
b1316da1ffbd276fc8271e7a9438e683ad352781 [1].
At that point the function expects the `channel->metadata_stream` to be
non null. This is simply not true for snapshot session metadata channels.
Note that we cannot simply swap `lttng_ustconsumer_request_metadata` and
`create_ust_streams` in `snapshot_metadata` to ensure that the
`channel->metadata_stream` is non null since
`lttng_ustconsumer_recv_metadata` ends up being called on metadata flush when
an app quit. This sequence of events corresponds to the second scenario
put forward in the `Observed Issue` section.
Solution
========
Null check `channel->metadata_stream` and perform only the operation
when it is non null. This partly mirror what is done in `consumer_metadata_wakeup_pipe`.
I am not sure if the check on `channel->monitor` is required but it
seems irrelevant to the notion of resetting the stream consumed position
when the stream exists.
With this taken care off, we find ourself with another
backtrace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 [93/122]
#1 0x00007f75cf9b3859 in __GI_abort () at abort.c:79
#2 0x00007f75cf9b3729 in __assert_fail_base (fmt=0x7f75cfb49588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55ab004e9c68 "pthread_mutex_trylock(&stream->lock)", file=0x55ab004
#3 0x00007f75cf9c4f36 in __GI___assert_fail (assertion=0x55ab004e9c68 "pthread_mutex_trylock(&stream->lock)", file=0x55ab004e8d7a "ust-consumer.c", line=1285, function=0x55ab004eb8a0 <__PR
#4 0x000055ab004b1b9c in metadata_stream_reset_cache_consumed_position (stream=0x7f75b400a850) at ust-consumer.c:1285
#5 0x000055ab004b4fef in commit_one_metadata_packet (stream=0x7f75b400a850) at ust-consumer.c:2551
#6 0x000055ab004b5f46 in get_next_subbuffer_metadata (stream=0x7f75b400a850, subbuffer=0x7f75cc972630) at ust-consumer.c:2927
#7 0x000055ab0048b6a9 in lttng_consumer_read_subbuffer (stream=0x7f75b400a850, ctx=0x55ab01d4ee50, locked_by_caller=true) at consumer.c:3522
#8 0x000055ab004b0f66 in snapshot_metadata (metadata_channel=0x7f75b4005400, key=2, path=0x7f75cc972944 "uid/1000/64-bit", relayd_id=
18446744073709551615, ctx=0x55ab01d4ee50) at ust-consum
#9 0x000055ab004b2e86 in lttng_ustconsumer_recv_cmd (ctx=0x55ab01d4ee50, sock=28, consumer_sockpoll=0x7f75cc976bb0) at ust-consumer.c:1861
#10 0x000055ab004869b7 in lttng_consumer_recv_cmd (ctx=0x55ab01d4ee50, sock=28, consumer_sockpoll=0x7f75cc976bb0) at consumer.c:2097
#11 0x000055ab0048abfd in consumer_thread_sessiond_poll (data=0x55ab01d4ee50) at consumer.c:3284
#12 0x00007f75cfb8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f75cfab0293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Which is also caused in part to the invalidation of the cache.
`metadata_stream_reset_cache_consumed_position` expect that the stream
at that point be locked. Which is not the case despite what the last argument
to `lttng_consumer_read_subbuffer` indicates. To alleviate that we hold
the lock during the call to `lttng_consumer_read_subbuffer`.
Known drawbacks
=========
None.
References
=========
[1] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=
b1316da1ffbd276fc8271e7a9438e683ad352781
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ib09fdc989b1baae26d0f496afeabc3e7cb27228c
Simon Marchi [Wed, 25 Aug 2021 18:39:11 +0000 (14:39 -0400)]
lttng: list valid condition / action names if missing or unknown
I think it would be helpful to the user to list the condition and action
names, when the condition or action name is missing or unrecognized.
This patch implements that, here are some examples of the result:
$ lttng add-trigger --action notify --condition
Error: While parsing argument #4: Missing required argument for option `--condition`
Error: Valid condition names are:
Error: event-rule-matches
$ lttng add-trigger --action notify --condition pouet
Error: While parsing argument #5: Unknown condition name 'pouet'
Error: Valid condition names are:
Error: event-rule-matches
$ lttng add-trigger --condition event-rule-matches --action
Error: While parsing argument #4: Missing required argument for option `--action`
Error: Valid action names are:
Error: notify
$ lttng add-trigger --condition event-rule-matches --action pouet
Error: While parsing argument #5: Unknown action name: pouet
Error: Valid action names are:
Error: notify
To achieve this, add a new optional out argument to parse_next_item, to
allow the caller to get the argpar_error object if a parsing error
happened. Because of this, the callers must now be able to
differentiate parsing error from memory errors: in the latter case, no
argpar_error object is returned. So, add a new
PARSE_NEXT_ITEM_STATUS_ERROR_MEMORY status, and make users of
parse_next_item handle it.
In the add-trigger command implementation, handle the "missing opt arg"
case of OPT_ACTION and OPT_CONDITION specially to print the valid names.
Handle unknown names in parse_action and parse_condition.
Add a test for an unknown action name, it seems to be missing. Change
the error message format slightly to make it match the messages for
unknown condition names.
Change-Id: I4c13cecacb3a2ff4367e391c4aba0d05f1f28f22
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Tue, 24 Aug 2021 21:50:32 +0000 (17:50 -0400)]
lttng: mention argument number on unknown action / condition name
When a unrecognized condition or action name is given, the error message
does not contain the part that mentions the argument index, like
argument parsing error messages have:
$ lttng add-trigger --action notify --condition pouet
Error: Unknown condition name 'pouet'
I think it would be useful to have it, it can help when you have really
long command lines (something possible with add-trigger). The result
is:
$ lttng add-trigger --action notify --condition pouet
Error: While parsing argument #4 (`--condition`): Unknown condition name 'pouet'
$ lttng add-trigger --action notify --condition=pouet
Error: While parsing argument #4 (`--condition=pouet`): Unknown condition name 'pouet'
$ lttng add-trigger --condition event-rule-matches --action pouet
Error: While parsing argument #4 (`--action`): Unknown action name: pouet
$ lttng add-trigger --condition event-rule-matches --action=pouet
Error: While parsing argument #4 (`--action=pouet`): Unknown action name: pouet
Change-Id: Ic1a00cffa87ea7b3569b24c7417a00d7a52555f2
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 25 Aug 2021 18:34:25 +0000 (14:34 -0400)]
lttng: fix argument numbers in add-trigger error messages
Argument numbers in add-trigger argument parsing error messages are
wrong. For example:
$ ./src/bin/lttng/lttng add-trigger --condition event-rule-matches --foo
Error: While parsing argument #1 (`--foo`): Unknown option `--foo`
This is due to the fact that multiple separate argpar iterators are
created to parse an add-trigger command line. Iterators are created at
the top-level, to parse the top-level arguments. Iterators are also
created when parsing a condition or an action, to parse the arguments
specific to that condition or action. As a result, iterators are passed
a subset of the full command line, and the argument indices in the error
messages are off.
Fix that by passing around an "argc offset", which represents by how
much what's being parsed is offset from the full command-line. Use that
to adjust the error messages to give indices that make sense to the
user:
$ ./src/bin/lttng/lttng add-trigger --condition event-rule-matches --foo
Error: While parsing argument #4 (`--foo`): Unknown option `--foo`
Change-Id: I1d312593d338641d0ec10abe515b640771a1f160
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Tue, 24 Aug 2021 01:42:42 +0000 (21:42 -0400)]
argpar-utils: tweak unknown option error message
Add the "While parsing argument #X" context, that is used when other
errors are encountered. Before:
Error: Unknown option `--hello`
After:
Error: While parsing argument #1 (`--hello`): Unknown option `--hello`
Change-Id: Ifd84a1e8b1fe0456e09ff154cc515eaa2850a7e1
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 20 Aug 2021 18:39:20 +0000 (14:39 -0400)]
argpar: sync with upstream - adjust to iterator API
Sync with commit
143cec42e14e ("Force usage of ARGPAR_ASSERT() condition
when NDEBUG is defined").
The main change in this sync is the API that changed from
parse-all-at-once (the `argpar_parse` function) to something based on an
iterator, where we need to call `argpar_iter_next` to obtain the next
item. This was prototyped here (in lttng-tools), so this patch converts
the code to the API that was actually implemented in upstream argpar.
A difference between what we had and the current argpar API is that
argpar does not provide a formatted error string anymore. It provides
an `argpar_error` object contaning all the raw information needed to
create such string. The new `format_arg_error_v` function formats the
errors using the exact same syntax as argpar did, such that no changes
in the tests are necessary.
The new `parse_next_item` function factors out the code around calling
argpar_iter_next that would otherwise be duplicated at a few places.
These two new functions are placed into a new `argpar-utils` convenience
library. I originally put them in the `libcommon.la` convenience
library, but that caused some parts of the code that don't do any
argument parsing (e.g. liblttng-ctl) to have to be linked against
argpar. As a separate library, we can limit that to just the `lttng`
binary.
Change-Id: I94aa90ffcd93f52b6073c4cd7caca78cfd0f2e05
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Mon, 23 Aug 2021 18:32:51 +0000 (14:32 -0400)]
configure: enable -Wformat=2
The -Wformat=2 diagnostic flag on GCC enables the -Wformat-nonliteral
-Wformat-security diagnostics, which are useful to catch some format
string mistakes. -Wformat-security is also enabled by default with
Clang, meaning that there were some warnings only appearing with
Clang.
Try to enabled the -Wformat=2 flag to make things more consistent across
compilers and catch more mistakes.
The only issues are these, in tests/regression/ust/linking:
CC demo_builtin-demo.o
In file included from /usr/include/stdio.h:866,
from /home/simark/src/lttng-tools/tests/regression/ust/linking/demo.c:9:
/usr/include/bits/stdio2.h: In function ‘sprintf’:
/usr/include/bits/stdio2.h:40:35: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
40 | __va_arg_pack ());
| ^~~~~~~~~~~~~
The reason this appears is that this directory uses -Wsystem-headers,
making the compiler show diagnostics in headers considered "system
headers". Manually silence those warnings by disabling
-Wformat-nonliteral in that specific directory.
Change-Id: I4c7991e76b2f5405f3b3397348adb9134de37d41
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 20 Aug 2021 20:10:37 +0000 (16:10 -0400)]
common: move append_str to string-utils
Move the append_str function from filter-visitor-generate-bytecode.c to
the string-utils lib, so that it can be re-used elsewhere.
Change-Id: Ica09cb750ac7f3f2d6f3fb5fc786b683ceb6f79a
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Tue, 7 Dec 2021 20:47:21 +0000 (15:47 -0500)]
lttng-create(1): specify that `--shm-path` only applies to UST channels
This is a current limitation, but could change in the future.
Fixes: https://bugs.lttng.org/issues/1158
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie2b8114f35d7fd095790aae6add4653c5da169bf
Jérémie Galarneau [Fri, 10 Dec 2021 21:13:27 +0000 (16:13 -0500)]
Fix: sessiond: action-executor: misquoted strings in logging
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5b37472c0b9e49116ec542c57346a78359d732d7
Jonathan Rajotte [Fri, 19 Nov 2021 20:03:37 +0000 (15:03 -0500)]
Refactor: action executor: fetch session based on execution context
There is little reason to fetch the session by name if at that point in
the action execution code path the id of the session is available.
Session that are in destroyed state are ignored. This simplify a bit the
overall execution flow.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4a104e3b509c36b0ceea6e01f9dbd043ceb812b1
Jérémie Galarneau [Fri, 10 Dec 2021 19:10:13 +0000 (14:10 -0500)]
Tests: live kernel: no plan printed when non-root
The live kernel test does not produce a valid TAP output when
skipping due to not running the test as root.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4a10de494260084216ddb1b8f4ee27a546c4d8ed
Jonathan Rajotte [Thu, 9 Dec 2021 20:14:26 +0000 (15:14 -0500)]
Fix: sessiond: assert on lttng_ht_add_unique_str on ltt_sessions_ht_by_name
Observed issue
==============
The lttng-sessiond asserts with the following backtrace on lttng create:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7ab5859 in __GI_abort () at abort.c:79
#2 0x00007ffff7ab5729 in __assert_fail_base (fmt=0x7ffff7c4b588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5555556ab0a6 "node_ptr == &node->node", file=0x5555556ab085 "hashtable.c", line=298, function=<optimized out>) at a
#3 0x00007ffff7ac6f36 in __GI___assert_fail (assertion=assertion@entry=0x5555556ab0a6 "node_ptr == &node->node", file=file@entry=0x5555556ab085 "hashtable.c", line=line@entry=298, function=function@entry=0x5555556ab380 <__PRETTY_FUNCTIO
#4 0x000055555560be44 in lttng_ht_add_unique_str (ht=<optimized out>, node=0x7fffe0026c58) at hashtable.c:298
#5 0x000055555558fb6a in add_session_ht (ls=0x7fffe0024970) at session.c:372
#6 session_create (name=<optimized out>, uid=1000, gid=1000, out_session=out_session@entry=0x7fffedfddbd8) at session.c:1308
#7 0x000055555559b219 in cmd_create_session_from_descriptor (creds=<optimized out>, creds=<optimized out>, home_path=<optimized out>, descriptor=<optimized out>) at cmd.c:3040
#8 cmd_create_session (cmd_ctx=cmd_ctx@entry=0x7fffedfe5fa0, sock=<optimized out>, return_descriptor=return_descriptor@entry=0x7fffedfddd68) at cmd.c:3176
#9 0x00005555555cc341 in process_client_msg (sock_error=0x7fffedfddd10, sock=0x7fffedfddd0c, cmd_ctx=0x7fffedfe5fa0) at client.c:2177
#10 thread_manage_clients (data=<optimized out>) at client.c:2742
#11 0x00005555555c5fff in launch_thread (data=0x55555571b780) at thread.c:66
#12 0x00007ffff7c8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007ffff7bb2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The issue can be reproduced with modifications to the rotation thread
code and the following scenario:
$ lttng create test
$ lttng enable-event -u -a
$ lttng start
run any app just so that we have a complete valid session. (might not be necessary)
$ lttng destroy --no-wait
$ lttng create test
^ Should assert here.
The diff to be applied:
diff --git a/src/bin/lttng-sessiond/rotation-thread.cpp b/src/bin/lttng-sessiond/rotation-thread.cpp
index
ac149c845..
c11f068ed 100644
--- a/src/bin/lttng-sessiond/rotation-thread.cpp
+++ b/src/bin/lttng-sessiond/rotation-thread.cpp
@@ -565,6 +565,8 @@ int handle_job_queue(struct rotation_thread_handle *handle,
{
int ret = 0;
+ sleep(5);
+
for (;;) {
struct ltt_session *session;
struct rotation_thread_job *job;
Note that the initial report for this issue was on a system under load
for which the `lttng destroy` completion check failed and a `lttng
create` was performed. As of today the exact reason why the completion
check failed is not known. Still we can "fix" the race leading to the
lttng-sessiond assertion considering a user might use the `--no-wait`
variant of `lttng destroy` and could easily end up in this
situation.
Cause
=====
Note: all `lttng create` commands have the same session name passed as
argument.
On `lttng destroy` the ltt_session object is flagged as destroyed
(ltt_session::destroyed). The removal of the object from the hash
table (ltt_sessions_ht_by_name) will be performed during the
`session_release` which is driven by the session refcount.
A reference on the `ltt_session` object is held for the
rotation initiated by the `lttng destroy` command. The rotation is
enqueued by the rotation thread.
At this point the system is busy and the rotation thread does not run.
We simulate this with a `sleep(5)` during the `handle_job_queue`.
The `lttng destroy --no-wait` returns. If the `--no-wait` option is not
passed the `lttng destroy` command will work as expected and wait for
completion. We can SIGINT the `lttng destroy` command and perform a
`lttng create` yielding the same backtrace.
On `lttng create`, `session_create` validates that the name does not
conflict with an existing session using `session_find_by_name`. It is
important to note that `session_find_by_name` discriminates also on the
`session->destroyed` flag (introduced by [1]).
The `ltt_sessions_ht_by_name` hash table was introduced by [2] to remove
the need to lock the session list to sample a session id during the
queueing of actions to be executed related to a trigger. The assumption
was made that, during the creation phase, the session would
always be unique in that hash table based on its name. This is simply
not true since multiple sessions with the same name can coexist as long
as only a single one is marked as "not destroyed". This is an important
concept due to the refcounting of the object and the feature relying on
the lifetime of the object (i.e rotation). This is mostly valid when
talking about the global session list.
Solution
========
Move the hash table removal earlier during the release of the session
object.
Move the removal from `del_session_ht`, which is done during the
`session_release` function, to the `lttng_session_destroy` function.
It is safe to do so since currently the only user of that hash table
(the action executor) does not care much about destroyed session at that
point.
This ensures that we maintain the uniqueness property of the key (name)
for that hash table on insertion.
The alternative was to expose an hash table that could contain
duplicates and force the handling of a set on all lookups.
Known drawbacks
=========
None.
References
==========
[1] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=
e32d7f274604b77bcd83c24994e88df3761ed658
[2] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=
e1bbf98908a6399f39a9a8bc95bd8b59cecaa816
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2f1d0d6c04ee7210166e9847a850afbe6eaa7609
Jérémie Galarneau [Wed, 3 Nov 2021 02:31:15 +0000 (22:31 -0400)]
Fix: sessiond: snapshot: leak of trace chunk
Valgrind reports a leak after every snapshot record command:
==827791== 430 (280 direct, 150 indirect) bytes in 1 blocks are definitely lost in loss record 34 of 37
==827791== at 0x48435FF: calloc (vg_replace_malloc.c:1117)
==827791== by 0x223D01: zmalloc (macros.h:45)
==827791== by 0x224B79: lttng_trace_chunk_allocate (trace-chunk.c:387)
==827791== by 0x224E41: lttng_trace_chunk_create (trace-chunk.c:427)
==827791== by 0x150B55: session_create_new_trace_chunk (session.c:656)
==827791== by 0x164A11: snapshot_record (cmd.c:5113)
==827791== by 0x1651EE: cmd_snapshot_record (cmd.c:5302)
==827791== by 0x196E74: process_client_msg (client.c:2166)
==827791== by 0x198AF1: thread_manage_clients (client.c:2742)
==827791== by 0x18E245: launch_thread (thread.c:66)
==827791== by 0x4B9E258: start_thread (in /usr/lib/libpthread-2.33.so)
==827791== by 0x4CB45E2: clone (in /usr/lib/libc-2.33.so)
session_set_trace_chunk() on line 5162 returns a reference to the
current trace chunk which is never released.
This also causes tests/regression/tools/snapshots/test_ust_long to fail
due to a file descriptor exhaustion (presumably from using too many
directory file descriptors) when it is executed by an unprivileged user.
The CI doesn't catch this since the long regression test suite is
executed as root.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4b6e45df48c3daafa2294c80ccd8a2b4d91401e1
Francis Deslauriers [Thu, 28 Oct 2021 15:01:22 +0000 (11:01 -0400)]
Tests: add missing kernel test cases to make check target
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4ae8b8b1c130dd370322c2629f851d6aa1f557e7
Simon Marchi [Fri, 20 Aug 2021 19:19:42 +0000 (15:19 -0400)]
configure: enable -Wsuggest-attribute=format
Enable this warning, which suggests adding format attributes to some
functions, to help with format string validation. Fix the warnings it
generates by using the new ATTR_FORMAT_PRINTF macro.
There is only one spot we can't fix, that is in modprobe.c. The
compiler suggests we add an attribute to the kmod_set_log_fn
declaration, which we can't do, since it's not our code:
/home/simark/src/lttng-tools/src/bin/lttng-sessiond/modprobe.c: In function ‘setup_kmod_ctx’:
/home/simark/src/lttng-tools/src/bin/lttng-sessiond/modprobe.c:286:31: error: argument 2 of ‘kmod_set_log_fn’ might be a candidate for a format attribute [-Werror=suggest-attribute=format]
286 | kmod_set_log_fn(*ctx, log_kmod, NULL);
| ^~~~~~~~
I don't see any other choice but to explicitly ignore that spot.
Introduce some macros to abstract how to ignore specific diagnostics.
This is useful since not all compilers support the same diagnostic
flags. For example, telling clang to ignore -Wsuggest-attribute=format
would give an error.
Change-Id: I71278d7e2cdc66d4bbc59bd966469d0b427e963d
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 20 Aug 2021 19:22:16 +0000 (15:22 -0400)]
tests: sync tests/utils/tap with Babeltrace repository
There are a few fixes / improvements in the Babeltrace version of the
tap library, bring them here. The Babeltrace commit used is
0022a87819b0 ("Fix: clear_string_field(): set first character to 0") In
particular, I'm looking for the TAP_PRINTF_FORMAT fixes, to be able to
enable -Wsuggest-attribute=format. But I think it's easier to keep them
in sync, so bring in all changes, instead of just the TAP_PRINTF_FORMAT
ones.
The only diff not brought are the semicolons in the pass / fail
definitions, which were removed in the previous patch. Those changes
should be brought to the Babeltrace repository.
Bringing in the TAP_PRINTF_FORMAT changes finds a few format string
mistakes in the tests, fix them.
Change-Id: I0d125b313265e72303be8d704ba40554bca87cd1
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Tue, 13 Jul 2021 14:48:19 +0000 (10:48 -0400)]
lttng_strerror(): accept positive and negative codes
Some functions return positive `lttng_error_code` enumerator values
while other return negative values.
Make lttng_strerror() accept positive and negative codes. This improves
the API UX and will make the documentation less complex.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I359cef98ae32991d1a01b7f2cd484ece7bab2df9
Jonathan Rajotte [Tue, 23 Nov 2021 17:15:42 +0000 (12:15 -0500)]
Fix: test: use BABELTRACE_BIN instead of babeltrace
Observed issue
==============
The System tests jobs fails on multi-session test since the move to bt2.
Cause
=====
The tests uses `babeltrace` instead of `BABELTRACE_BIN`.
Solution
========
Use `BABELTRACE_BIN`.
Add a babelrace bail out.
While there fix easy shellcheck warning.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I441d736e85c447c5765bffd520ec2f267c86048f
Michael Jeanson [Wed, 11 Nov 2020 16:18:55 +0000 (11:18 -0500)]
Cleanup: consumerd: update stale comment
Change-Id: If65648f32e5b78cfd4e16d86f4cf9eac6c1b3f43
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 11 Nov 2021 20:02:54 +0000 (15:02 -0500)]
Fix: action executor: ref count imbalance for session object
Observed issue
==============
The following scenario leads to a hang on `lttng destroy`.
# Start lttng-sessiond under gdb
$ gdb lttng-sessiond
set pagination off
set non-stop
start
break action_executor_snapshot_session_handler
$ lttng add-trigger --name my_trigger --condition=event-rule-matches --type=user:tracepoint --name=sample_component:message --action=snapshot-session my_snapshot
$ lttng create --snapshot my_snapshot
$ lttng enable-event -u -a
$ lttng start
$ start an app producing a single sample_component:message
# gdb should break on thread 6
# inside gdb
thread 6
$ lttng destroy my_snapshot
$ lttng create --snapshot my_snapshot
$ lttng enable-event -u -a
$ lttng start
# inside gdb use `continue`
$ lttng destroy my_snapshot
The destroy command hang:
Destroying session my_snapshot.... ....
Cause
=====
The scenario forces the usage of the following code path:
if (session->id != LTTNG_OPTIONAL_GET(item->context.session_id)) {
624├───────────────> DBG("Session id for session `%s` (id: %" PRIu64
625│ " is not the same that was sampled (id: %" PRIu64
626│ " at the moment the work item was enqueued for %s` action of trigger `%s`",
627│ session_name, session->id,
628│ LTTNG_OPTIONAL_GET(item->context.session_id),
629│ get_action_name(action),
630│ get_trigger_name(work_item->trigger));
631│ ret = 0;
632│ goto error_unlock_list;
633│ }
At that point a reference on the session object was taken on line:
610│ session = session_find_by_name(session_name);
But the reference is never put on `error_unlock_list` resulting in a ref
count problem.
Solution
========
Use `session_put` for the code path.
Note that most of the handler also have the same problem that was
introduced by commit
72365501d3148ca977a09bad8de0ec51b427bdd8 [1]
Known drawbacks
=========
None.
Refs
=========
[1] https://github.com/lttng/lttng-tools/commit/
72365501d3148ca977a09bad8de0ec51b427bdd8
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I23c3c089866df74854bbfe64320310c4b28ee41d
Mathieu Desnoyers [Thu, 2 Dec 2021 22:33:55 +0000 (17:33 -0500)]
Fix: relayd: `!vsession->current_trace_chunk` assertion failed
Observed issue
==============
When performing:
#!/bin/bash
lttng create py_syscalls --live
lttng enable-event -u -a
lttng enable-event -k -a
lttng start
babeltrace2 -i lttng-live net://localhost/host/raton/py_syscalls
The relay daemon hits this assertion:
Thread 8 (Thread 0x7fffeeffd700 (LWP 167040) "lttng-relayd"):
#0 0x00007ffff7b1618b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff7af5859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff7af5729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ffff7b06f36 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00005555555889bb in viewer_session_attach (vsession=0x7fffdc001400, session=session@entry=0x7fffe8001180) at viewer-session.c:80
#5 0x000055555557bcff in viewer_attach_session (conn=0x7fffd0001140) at live.c:1275
#6 process_control (conn=0x7fffd0001140, recv_hdr=0x7fffeeffcaf0) at live.c:2341
#7 thread_worker (data=<optimized out>) at live.c:2515
#8 0x00007ffff7ccd609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007ffff7bf2293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
Cause
=====
This assert appears to be entirely wrong.
It checks that the "viewer session" has a NULL current trace chunk when
attaching a session to a viewer session, but in the case where a viewer
session has multiple sessions (e.g. with kernel and ust tracing
combined), we are attaching each session individually to the viewer
session, and we set the current trace chunk of the viewer session when
we attach the first session to it.
So it is expected to be non-NULL when attaching the second session.
Solution
========
Remove the assertion.
Known limitations
=================
None.
Fixes: #1335
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4d8f5d5347b4588144ddf449976cae5a94b81b3a
Jérémie Galarneau [Wed, 8 Dec 2021 21:07:29 +0000 (16:07 -0500)]
Docs: relayd: document the semantics of a viewer session trace chunk
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia0af16114a5adc54b90f60f26ffffe682eecb1a6
Michael Jeanson [Fri, 15 Oct 2021 16:08:13 +0000 (12:08 -0400)]
relayd: do not link lttng-relayd on liblttng-ctl
Building on Cygwin, we get:
libtool: link: gcc -shared .libs/cyglttng-ctl-0.dll.def .libs/lttng-ctl.o .libs/snapshot.o .libs/lttng-ctl-health.o .libs/save.o .libs/load.o .libs/deprecated-symbols.o .libs/channel.o .libs/rotate.o .libs/event.o .libs/destruction-handle.o .libs/clear.o .libs/tracker.o -Wl,--whole-archive ../../../src/common/sessiond-comm/.libs/libsessiond-comm.a ../../../src/common/.libs/libcommon.a -Wl,--no-whole-archive -L/cygdrive/c/Users/jenkins/workspace/lttng-tools_master_winbuild/arch/cygwin64/babeltrace_version/stable-2.0/build/std/conf/relayd-only/liburcu_version/master/test_type/base/deps/build/lib -lxml2 -L/build/lib -lurcu -lurcu-common -lurcu-cds -lrt -pthread -g -O2 -pthread -o .libs/cyglttng-ctl-0.dll -Wl,--enable-auto-image-base -Xlinker --out-implib -Xlinker .libs/liblttng-ctl.dll.a
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: cannot export error_log_time: symbol not defined
This is caused by commit
ca806b0b247f ("liblttng-ctl: use export list to
define exported symbols"). liblttng-ctl is not needed on Cygwin, so
rather than fight it to resolve the issue and have liblttng-ctl build on
Cygwin, skip building liblttng-ctl on Cygwin.
liblttng-ctl is currently built on Cygwin because relayd depends on it.
This should not be the case, as relayd does not use liblttng-ctl. Remove
the dependency on liblttng-ctl from lttng-relayd/Makefile.am. Remove the
dependency on libconfig.la (used for session config save and load) as
well. It's included in libcommon.la, so it's redundant, but relayd
doesn't need it anyway.
Change-Id: If7e1944c4a30b8adcdd6e6d3083a94f27988697e
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 15 Oct 2021 19:15:51 +0000 (15:15 -0400)]
common: split ini-config in its own convenience library
The src/common/config contains code related to two kinds of unrelated
"config": the ini config, used for configuration files, and the XML
session configuration (used for loading/saving sessions, and
incidentally MI).
Split the ini config in its own convenience library, in
src/common/ini-config and keep the rest under src/common/config.
Move ini-related things out of config/session-config.{cpp,h} and into
ini-config/ini-config.{cpp,h}.
Change-Id: Ia0b2b6cdcc15198e20444aa30f1fc86c053176d9
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Fri, 19 Nov 2021 16:40:08 +0000 (11:40 -0500)]
Fix: C++ syntax of macOS compat code
Minors fixes to the macOS compat code to build with a C++ compiler
following the conversion of the binaries.
Change-Id: Ic879d56d0025c838d5a34b0b45d02bae02a12053
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 19 Nov 2021 19:35:51 +0000 (14:35 -0500)]
Docs: optional.h: incorrect declaration example
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1c9f6e8c3200bc72a5eb29e6459c3449bf0771cd
Michael Jeanson [Thu, 18 Nov 2021 22:43:55 +0000 (17:43 -0500)]
Fix: g++ 4.8 doesn't support non-trivial designated initializers
GCC 4.8 doesn't support initializing a compound literal or an anonymous
structure with only empty brackets '{}'. Add members until it stops
complaining.
Change-Id: I22f05d3f8791e34da2618b0cae282da994c670f3
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Thu, 18 Nov 2021 16:15:29 +0000 (11:15 -0500)]
Fix: pass explicit type to std::min for 32-bit platforms
Provide an explicit type to templates when comparing a 'uint64_t' with a
type that is less than 64-bits on 32-bit platforms.
main.cpp: In function ‘relay_connection_status relay_process_data_receive_payload(relay_connection*)’:
main.cpp:3632:44: error: no matching function for call to ‘min(uint64_t&, const size_t&)’
3632 | size_t recv_size = std::min(left_to_receive, chunk_size);
| ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change-Id: I8fe8725c5c888cce9c54d564bd668e9723c0f947
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Wed, 17 Nov 2021 20:53:15 +0000 (15:53 -0500)]
Cleanup: ust-app.c: remove extra parenthesis in log statements
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie3e59dbf9444e48edae9ae97c4a38770f739b698
Michael Jeanson [Mon, 1 Nov 2021 21:06:35 +0000 (17:06 -0400)]
Cleanup: typo: commiting -> committing
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: If2c6ad336e783efb0a93f29f4c5090ad4c360b0a
Michael Jeanson [Mon, 1 Nov 2021 21:05:51 +0000 (17:05 -0400)]
Cleanup: typo: existance -> existence
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1573b1858896ee631756578afa9639533e75ca47
Francis Deslauriers [Tue, 16 Nov 2021 19:49:17 +0000 (14:49 -0500)]
Cleanup: typo: everythong -> everything
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I21604bf997ba8afd1824849f865cc3adf6c1e528
Francis Deslauriers [Fri, 1 Oct 2021 21:02:27 +0000 (17:02 -0400)]
Cleanup: typo: mutliple -> multiple
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia9b1b4adcaed7045a60418bae93a4cf73a588f25
Simon Marchi [Fri, 15 Oct 2021 19:55:28 +0000 (15:55 -0400)]
Remove extern "C" from internal headers
All internal code is now compiled as C++, we can now remove all 'extern
"C"' declarations from internal headers. This means files will see each
other's declarations as C++, and we can now use things in headers.
Remove the min/min_t/max/max_t macros from macros.h as well.
Change-Id: I5a6b7ef60be5f46160c6d5ca39f082d2137d5a07
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Mon, 6 Sep 2021 13:45:35 +0000 (09:45 -0400)]
tests: compile some tools/tests as C++
These tests use things from the common libs, or at least include header
files from src/common. These files are going to contain C++-specific
things in a following commit, so it's easier if we compile them
tools/tests as C++.
Change-Id: Ib99f2373beb414c50eaa10b35e0d895bc37e4e64
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Sat, 4 Sep 2021 23:32:47 +0000 (19:32 -0400)]
tests: compile all unit tests as C++
Change-Id: I9938237cee13a534d5c52288983238c0871dc504
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 12:21:14 +0000 (08:21 -0400)]
common: compile libcommon as C++
Change-Id: I5160ef36932d71a4d925018fe0763bbc78b88009
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libconsumer, libust-consumer, libkernel-consumer as C++
Change-Id: I6d51a069b360121152286a674d551fd5e80bfe2f
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libindex as C++
Change-Id: Ia9ade43bad7d53a652e83529d0e4f3d29b7839c8
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libtestpoint as C++
Change-Id: I2dae562c4632ea3ecfa1365f573baa6d6ba232bf
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile librelayd as C++
Change-Id: I59e983fce3bc3a9850a9df7c1b76966d2c9f175c
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 12:16:19 +0000 (08:16 -0400)]
common: compile libsessiond-common as C++
Convert lttcomm_readable_code to a switch in a function, and rework
lttcomm_get_readable_code a bit. That changes the behavior a bit, but
probably not in a meaningful way.
Change-Id: I92da95f5d27de9df176835e820dd81ab93fb7b89
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libkernel-ctl as C++
Change-Id: If0c92341d8efea2d093d404d25399bd8d75fc059
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libhealth as C++
Change-Id: Ie5362170a7b58850bbb00b30e130dfb0986a2dd3
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 12:13:04 +0000 (08:13 -0400)]
common: compile libhashtable as C++
Change-Id: Ia91d3207ffffb0cd45ee987107058dc9e4690c94
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 6 Oct 2021 16:16:33 +0000 (12:16 -0400)]
common: compile libfilter as C++
This patch renames filter-lexer.l to filter-lexer.lpp and
filter-parser.y to filter-parser.ypp. That makes automake pass the
right options to flex/bison to generate C++ code.
In filter-lexer.lpp, Instead of having declarations with the `unused`
attribute for yyunput and yyinput, use the noinput and nounput options.
The rest of the changes are standard C to C++ conversion stuff.
Change-Id: Ie4bf1981b970145f97e8db1d88edaa2d9b95aef4
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 01:06:30 +0000 (21:06 -0400)]
common: compile libfd-tracker as C++
Change-Id: I6eed13d24dd8a306f4aa474f7d739931f0750635
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 01:00:28 +0000 (21:00 -0400)]
common: compile libconfig as C++
As seen previously, make some global variables that need to be exported
non-const.
config_element_tracker_pid_target_legacy seems unused and clang gives an
unused warning. It was not exposed publicly, so I think it's safe to
remove it.
Change-Id: I6f9e7d77a7a04b02ae6585383c11389869b4a79a
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libcompat as C++
I got errors like these for programs that use libcompat (usually through
libcommon), but are still linked with gcc, rather than g++:
CCLD filter-grammar-test
/usr/bin/ld: ./.libs/libcommon.a(directory-handle.o):(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `_
_gxx_personality_v0'
Automake still links them with gcc, because they don't contain any C++
source directly. Fix that by changing them to be C++ source.
Change-Id: I3eeca3d9af8940795b69f48d306f282ae0b08589
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 17 Nov 2021 23:55:35 +0000 (18:55 -0500)]
.gitignore: ignore generated filter parser sources
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I19fd4aa8cb4ef0c48fbacf61aa869497777e12fa
Simon Marchi [Wed, 22 Sep 2021 00:57:54 +0000 (20:57 -0400)]
common: compile libbytecode as C++
Change-Id: I33642d43db58b9631772ad99f6fb9d4babf9d888
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libstring-utils as C++
The code of string-utils.cpp is compiled as C++, but the functions are
still exported as C symbols for the moment (until all users are
converted to C++).
The only thing of interest here is this error:
CXX string-utils.lo
/home/simark/src/lttng-tools/src/common/string-utils/string-utils.cpp: In function ‘star_glob_pattern_type_flags strutils_test_glob_pattern(const char*)’:
/home/simark/src/lttng-tools/src/common/string-utils/string-utils.cpp:89:37: error: invalid conversion from ‘int’ to ‘star_glob_pattern_type_flags’ [-fpermissive]
89 | ret |= STAR_GLOB_PATTERN_TYPE_FLAG_END_ONLY;
| ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| int
In C++, you can't freely use bitwise operator on enumerators. I added
an operator|= free function to handle it, which converts to the
underlying type and back. If we have many of these enums used as flags,
we could think of adding a class for that, like enum_flags in GDB:
https://gitlab.com/gnutools/binutils-gdb/-/blob/
7a6cb96b710257a4f5bc7e85cc103b6bf8dfc25c/gdbsupport/enum-flags.h
Change-Id: I64b458a6f6c1e5a131525826a116607eef824aaa
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 6 Oct 2021 16:15:33 +0000 (12:15 -0400)]
lib: compile liblttng-ctl as C++
Same as the previous commits, but compile the liblttng-ctl library as
C++ code (while still offering a C interface).
Some exported global variables (for example in deprecated-symbols.cpp)
have to be made non-const, otherwise we get:
CXX deprecated-symbols.lo
/home/simark/src/lttng-tools/src/lib/lttng-ctl/deprecated-symbols.cpp:21:33: error: ‘visibility’ attribute ignored [-Werror=attributes]
21 | LTTNG_EXPORT const char * const config_element_pid_tracker;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
I think this is related to the fact that const global variables
automatically have internal linkage in C++.
Despite using -export-symbols in src/lib/lttng-ctl/Makefile.am, some new
ELF symbols become exposed. It could be related to this, in the ld man
page:
--retain-symbols-file does not discard undefined symbols, or
symbols needed for relocations.
One new symbol I see, for example, is `_Z16connect_sessiondv`. Looking
at liblttng-ctl.so, I indeed see a relocation for this symbol:
000000000010b778 0000053e00000007 R_X86_64_JUMP_SLOT
00000000000314a2 _Z16connect_sessiondv + 0
And that would explain why the linker keeps that symbol visible. This
is related to the entry for this function in the procedure linkage
table. I'm not entirely sure why these functions didn't generate a PLT
entry in C, but do in C++.
To avoid these new symbols, build everything with -fvisibility=hidden
and -fvisibility-inlines-hidden, and tag functions we really want to
export with LTTNG_EXPORT, a new macro defined to
__attribute__((visibility("default"))). This macro is publicly visible,
because it has to be used in distributed header files (although it's of
no use for users of liblttng-ctl).
Change-Id: Ie51bf0a2edfb87e5f46f9c39eed5309d9f8c41d6
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
bin: compile lttng-consumerd as a C++
Same as the previous commits, but change lttng-consumerd to be a C++
program.
Function lttng_consumer_get_type is dlsym-ed by
tests/regression/tools/notification/consumer_testpoints.c, so we have
two choices: make the dlsym use the mangled name or make the function
have a C linkage name. The latter is better, because the mangled name
is implementation-defined, and therefore more fragile.
Change-Id: I1986e38c193a935721e52c89426c89fd434ccd6b
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
bin: compile lttng-crash as a C++
Same as the previous commits, but change lttng-crash to be a C++
program.
Change-Id: Ia487e12d2db07e2cd091637d38669289bcaff5d4
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
bin: compile lttng-relayd as a C++
Same as the previous commit, but change lttng-relayd to be a C++
program. In addition to what was mentioned in previous commits:
- Test test_relayd_backward_compat_group_by_session uses an object file
from the relayd binary, so it's simpler to change this test to be C++
as well.
- There are some declarations related to fs_handle in fd-tracker.h.
Remove them, as they are already in fs-handle.h.
Change-Id: I72c5fdecd2e82c30633563a47bd641af1dcfa29c
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 6 Oct 2021 16:14:41 +0000 (12:14 -0400)]
bin: compile lttng-sessiond as C++
Same as commit
48a400056134 ("bin: compile lttng as C++"), but change
lttng-sessiond to be a C++ program. In addition to the categories of
changes already mentioned in that commit's message, here are some
interesting changes:
- Add an include in trigger.h, an exported header, to fix:
CXX notification-thread.lo
In file included from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/notification-thread.cpp:9:
/home/simark/src/lttng-tools/include/lttng/trigger/trigger.h:142:13: error: use of enum ‘lttng_error_code’ without previous declaration
142 | extern enum lttng_error_code lttng_register_trigger_with_name(
| ^~~~~~~~~~~~~~~~
- We get this with clang:
CXX lttng-conf.o
In file included from /home/simark/src/lttng-tools/src/bin/lttng/conf.cpp:18:
In file included from /home/simark/src/lttng-tools/src/common/common.h:14:
In file included from /home/simark/src/lttng-tools/src/common/runas.h:17:
In file included from /home/simark/src/lttng-tools/src/common/sessiond-comm/sessiond-comm.h:38:
In file included from /home/simark/src/lttng-tools/src/common/unix.h:17:
/home/simark/src/lttng-tools/src/common/payload-view.h:82:27: error: 'lttng_payload_view_from_payload' has C-linkage specified, but returns user-defined type 'struct lttng_payload_view' which is incompatible with C [-Werror,-Wreturn-type-c-linkage]
struct lttng_payload_view lttng_payload_view_from_payload(
^
Turns out that because of the "const" field in lttng_payload_view,
clang doesn't consider that type incompatible with C. I don't
really want to remove the "const" for C code using that API, so
conditionally remove it if we are compiling with clang in C++.
- clang gives:
CXX event.lo
In file included from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/event.cpp:19:
/home/simark/src/lttng-tools/src/common/bytecode/bytecode.h:50:1: error: struct has size 0 in C, size 1 in C++ [-Werror,-Wextern-c-compat]
struct literal_string {
^
It looks like that type isn't even used? Remove it.
- it's not possible to initialize some union members, for example with
lttcomm_consumer_msg, in consumer.cpp. Initialize it in a separate
statement.
- It's not possible to use the transparent union trick when calling
urcu function, for example in thread_application_registration, in
register.cpp. We need to instantiate a cds_wfcq_head_ptr_t object,
assign the appropriate field, and pass that object to the function.
- the ALIGNED_CONST_PTR trick does not work in C++:
CXX consumer.lo
In file included from /home/simark/src/lttng-tools/src/common/error.h:19,
from /home/simark/src/lttng-tools/src/common/common.h:12,
from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp:19:
/home/simark/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp: In function ‘int consumer_send_relayd_socket(consumer_socket*, lttcomm_relayd_sock*, consumer_output*, lttng_stream_type, uint64_t, const char*, const char*, const char*, int, const uint64_t*, time_t, bool)’:
/home/simark/src/lttng-tools/src/common/macros.h:116:58: error: expected primary-expression before ‘]’ token
116 | #define ALIGNED_CONST_PTR(value) (((const typeof(value) []) { value }))
| ^
/home/simark/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp:1192:48: note: in expansion of macro ‘ALIGNED_CONST_PTR’
1192 | ret = consumer_send_fds(consumer_sock, ALIGNED_CONST_PTR(rsock->sock.fd), 1);
| ^~~~~~~~~~~~~~~~~
Replace uses with copying the data in a local variable (which is
properly aligned), and pass the address to that variable to the
function.
- In consumer.h, an array field in a structure is defined using
the max macro. It can't be replaced with std::max, since std::max
isn't constexpr in C++11. Define a max_constexpr function locally
and use it.
- g++ 7 doesn't support non-trivial designated initializers, leading to
errors like:
CXX globals.lo
/home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/globals.cpp:44:1: sorry, unimplemented: non-trivial designated initializers not supported
};
^
Change consumer_data to have a constructor instead. Change
initializations of some structures, such as lttcomm_lttng_msg, to
initialize the fields separate from the variable declaration. This
requires making these variable non-const which is not ideal. But
once everything is C++, these types could get a fancy constructor,
and then they can be made const again.
- When compiling without UST support the stub versions of functions
ust_app_rotate_session & co, in ust-app.h, are used. Some of them
have the return type "enum lttng_error_code", but return 0, an invalid
value, causing:
CXX main.o
In file included from /home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/lttng-sessiond.h:22:0,
from /home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/main.cpp:45:
/home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/ust-app.h: In function ‘lttng_error_code ust_app_snapshot_record(ltt_ust_session*, const consumer_output*, int, uint64_t)’:
/home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/ust-app.h:575:9: error: invalid conversion from ‘int’ to ‘lttng_error_code’ [-fpermissive]
return 0;
^
Change these functions to return LTTNG_ERR_UNK. These functions are
not supposed to be called if UST support is not included. But even
if they were: all their callers check that the return value is not
LTTNG_OK. The value 0 would be considered an error, so will be
LTTNG_ERR_UNK.
Change-Id: I2cdd34459a54b1943087b43843ef20b35b7bf7d8
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 10 Nov 2021 13:42:25 +0000 (08:42 -0500)]
Fix: tests: fix unused-but-set warning in test_fd_tracker.c
When building with clang-14 on Ubuntu 20.04, I get:
CC test_fd_tracker.o
/home/smarchi/src/lttng-tools/tests/unit/test_fd_tracker.c:169:15: error: variable 'fds_set_to_minus_1' set but not used [-Werror,-Wunused-but-set-variable]
unsigned int fds_set_to_minus_1 = 0;
^
The compiler seems right, so remove fds_set_to_minus_1. It might be
that the intention was to assert something using this variable, but I
couldn't figure it out.
Change-Id: I12bfd07bca7829de8d5b85d375d9b52bd84d677a
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 10 Nov 2021 13:39:22 +0000 (08:39 -0500)]
Fix: sessiond: fix possible buffer overflow warning
When compiling with clang-14 on Ubuntu 20.04, I get:
CC lttng-syscall.lo
/home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/lttng-syscall.c:70:13: error: 'fscanf' may overflow; destination buffer in argument 4 has size 255, but the corresponding specifier may require size 256 [-Werror,-Wfortify-source]
&index, name, &bitness) == 3) {
^
I think the compiler is right, we read a string when length up to 255 in
a buffer of size 255. We need one more byte for the NULL terminator,
fix that.
Change-Id: I6b2eec401af3ef6230dd4b6c8559032de9b54584
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 17 Nov 2021 02:29:16 +0000 (21:29 -0500)]
common: fix compilation when strnlen, strnup and memrchr are not available
Add a few casts to fix build failure when these functions are not
available. For example:
CXX lttng-conf.o
In file included from /home/smarchi/src/lttng-tools/src/common/macros.h:15:0,
from /home/smarchi/src/lttng-tools/src/common/error.h:19,
from /home/smarchi/src/lttng-tools/src/common/common.h:12,
from /home/smarchi/src/lttng-tools/src/bin/lttng/conf.cpp:18:
/home/smarchi/src/lttng-tools/src/common/compat/string.h: In function ‘size_t lttng_strnlen(const char*, size_t)’:
/home/smarchi/src/lttng-tools/src/common/compat/string.h:28:14: error: invalid conversion from ‘const void*’ to ‘const char*’ [-fpermissive]
end = memchr(str, 0, max);
~~~~~~^~~~~~~~~~~~~
This can be tested by modifying config.cache and setting the relevant
config variables to "no".
Change-Id: Ieb6debf32c82927767ad32d2f4aa62fb829a9f9f
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 23 Aug 2021 21:12:28 +0000 (17:12 -0400)]
Fix: tests: app unregistering is not guaranteed by app lifetime
Observed issue
==============
The per-pid timer based rotation tests fail on a minimal ptest
yocto image.
The test suite report that the second archive is not empty as it
expects.
Note that the yocto/OE image is running under QEMU without
KVM.
Cause
=====
Since the image is running under QEMU without KVM, the overall
processing capability of the VM is quite limited.
The test seems to assume that between the first and the second rotation
the app will be unregistered by the time the second rotation is issued.
Note that the observable lifetime of an app is not equal to the
lttng-sessiond/consumerd app visibility since we deal with app
unregistration via a polling mechanism.
Note, that as far as I understand, this is a testing issue only.
It is still relevant in the context of rotation to validate that the second
rotation archive does NOT contain info for a "dead" app under per-pid
configuration.
Solution
========
Move the rotation timer operation after the app is registered and
considered unregistered from the point of view of
lttng-sessiond/lttng-consumerd. This should give us a more robust
approach.
Known drawbacks
=========
None.
References
==========
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie8c542d29ef8bdb325efc05de14e80b179c68754
Jérémie Galarneau [Tue, 2 Nov 2021 19:35:27 +0000 (15:35 -0400)]
Clean-up: lttngctl: copy string using strdup instead of asprintf
Use strdup to copy a string instead of using the formatting utilities.
The code of the function is also adapted to follow the coding standards.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I13328737e42a0287a7402761f75c67eef0dbdfdd
Jonathan Rajotte [Tue, 19 Oct 2021 19:22:39 +0000 (15:22 -0400)]
Fix: lttng-ctl: tracing_group memory leaks
Observed issue
==============
liblttng-ctl leaks memory if `lttng_set_tracing_group` is called at least
1 time by an API client.
joraj@~/lttng/master/lttng-tools-dev [master][]$ valgrind --leak-check=full lttng --group=joraj list
==24823== Memcheck, a memory error detector
==24823== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==24823== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==24823== Command: lttng --group=joraj list
==24823==
Error: No session daemon is available
==24823==
==24823== HEAP SUMMARY:
==24823== in use at exit: 8 bytes in 1 blocks
==24823== total heap usage: 55 allocs, 54 frees, 87,023 bytes allocated
==24823==
==24823== 8 bytes in 1 blocks are definitely lost in loss record 1 of 1
==24823== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==24823== by 0x4BA7DC7: __vasprintf_internal (vasprintf.c:71)
==24823== by 0x4C4B742: __asprintf_chk (asprintf_chk.c:34)
==24823== by 0x48687D9: asprintf (stdio2.h:181)
==24823== by 0x48687D9: lttng_set_tracing_group (lttng-ctl.c:2620)
==24823== by 0x4011B89: call_init.part.0 (dl-init.c:72)
==24823== by 0x4011C90: call_init (dl-init.c:30)
==24823== by 0x4011C90: _dl_init (dl-init.c:119)
==24823== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
==24823== by 0x2: ???
==24823== by 0x1FFEFFFCFE: ???
==24823== by 0x1FFEFFFD04: ???
==24823== by 0x1FFEFFFD12: ???
==24823==
==24823== LEAK SUMMARY:
==24823== definitely lost: 8 bytes in 1 blocks
==24823== indirectly lost: 0 bytes in 0 blocks
==24823== possibly lost: 0 bytes in 0 blocks
==24823== still reachable: 0 bytes in 0 blocks
==24823== suppressed: 0 bytes in 0 blocks
==24823==
==24823== For lists of detected and suppressed errors, rerun with: -s
==24823== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Cause
=====
The allocated pointer in the library constructor is not freed on
subsequent assignation.
Solution
========
Free the pointer.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie1d4c45df2764a88c74d56de691783df9215633c
Jonathan Rajotte [Fri, 15 Oct 2021 19:50:19 +0000 (15:50 -0400)]
Cleanup: missing ')'
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3198306b7f7cb081155dd53631db26e282f23526
Francis Deslauriers [Tue, 2 Nov 2021 13:33:02 +0000 (09:33 -0400)]
Fix: use <unistd.h> instead of <sys/unistd.h>
Fixes: #1330
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I07cabde5a0295de06f7c6f42dd12de803b57c907
Francis Deslauriers [Mon, 1 Nov 2021 19:31:26 +0000 (15:31 -0400)]
Fix: Tests: unchecked `close()` return value
CID
1465101 (#1 of 1): Unchecked return value (CHECKED_RETURN)
9. check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times).
CID
1465100 (#1 of 1): Unchecked return value (CHECKED_RETURN)
4. check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times)
CID
1465099 (#1 of 1): Unchecked return value (CHECKED_RETURN) 4.
check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times).
CID
1465098 (#1 of 1): Unchecked return value (CHECKED_RETURN) 4.
check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times).
CID
1465097 (#1 of 1): Unchecked return value (CHECKED_RETURN) 4.
check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times).
Reported-by: Coverity Scan
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8e2552c75ab7cec5aa3707e2c1c4d9f2484b501a
Jérémie Galarneau [Mon, 1 Nov 2021 19:43:55 +0000 (15:43 -0400)]
Fix: relayd: live: mishandled initial null trace chunk
Observed issue
==============
As reported in #1323 (https://bugs.lttng.org/issues/1323), crashes of
the relay daemon are observed when running the user space clear tests.
The crash occurs with the following stack trace:
#0 0x000055fbb861d6ae in urcu_ref_get_unless_zero (ref=0x28) at /usr/local/include/urcu/ref.h:85
#1 lttng_trace_chunk_get (chunk=0x0) at trace-chunk.c:1836
#2 0x000055fbb86051e2 in make_viewer_streams (relay_session=relay_session@entry=0x7f6ea002d540, viewer_session=<optimized out>, seek_t=seek_t@entry=LTTNG_VIEWER_SEEK_BEGINNING, nb_total=nb_total@entry=0x7f6ea9607b00, nb_unsent=nb_unsent@entry=0x7f6ea9607aec, nb_created=nb_created@entry=0x7f6ea9607ae8, closed=<optimized out>) at live.c:405
#3 0x000055fbb86061d9 in viewer_get_new_streams (conn=0x7f6e94000fc0) at live.c:1155
#4 process_control (conn=0x7f6e94000fc0, recv_hdr=0x7f6ea9607af0) at live.c:2353
#5 thread_worker (data=<optimized out>) at live.c:2515
#6 0x00007f6eae86a609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007f6eae78f293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
The race window during which this occurs seems very small as it can take
hours to reproduce this crash. However, a minimal reproducer could be
identified, as stated in the bug report.
Essentially, the same crash can be reproduced by attaching a live viewer
to a session that has seen events being produced, been stopped and been
cleared.
Cause
=====
The crash occurs as an attempt is made to take a reference to a viewer
session’s trace chunk as viewer streams are created. The crux of the
problem is that the code doesn’t expect a viewer session’s trace chunk
to be NULL.
The viewer session’s current trace chunk is initially set, when a viewer
attaches to the viewer session, to a copy the corresponding
relay_session’s current trace chunk.
A live session always attempts to "catch-up" to the newest available
trace chunk. This means that when a viewer reaches the end of a trace
chunk, the viewer session may not transition to the "next" one: it jumps
to the most recent trace chunk available (the one being produced by the
relay_session). Hence, if the producer performs multiple rotations
before a viewer completes the consumption of a trace chunk, it will skip
over those "intermediary" trace chunks.
A viewer session updates its current trace chunk when:
1) new viewer streams are created,
2) a new index is requested,
3) metadata is requested.
Hence, as a general principle, the viewer session will reference the
most recent trace chunk available _even if its streams do not point to
it_. It indicates which trace chunk viewer streams should transition to
when the end of their current trace chunk is reached.
The live code properly handles transitions to a null chunk. This can be
verified by attaching a viewer to a live session, stopping the session,
clearing it (thus entering a null trace chunk), and resuming tracing.
The only issue is that the case where the first trace chunk of a viewer
session is "null" (no active trace chunk) is mishandled in two places:
1) in make_viewer_streams(), where the crash is observed,
2) in viewer_get_metadata().
Solution
========
In make_viewer_streams(), it is assumed that a viewer session will have
a non-null trace chunk whenever a rotation is not ongoing. This is
reflected by the fact that a reference is always acquired on the viewer
session’s trace chunk.
That code is one of the three places that can cause a viewer session’s
trace chunk to be updated. We still want to update the viewer session to
the most recently seen trace chunk (null, in this case). However, there
is no reference to acquire and the trace chunk to use for the creation
of the viewer stream is NULL. This is properly handled by
viewer_stream_create().
The second site to change is viewer_get_metadata() which doesn’t handle
a viewer metadata stream not having an active trace chunk at all.
Thankfully, the protocol allows us to express this condition by
returning the LTTNG_VIEWER_NO_NEW_METADATA status code when a viewer
metadata stream doesn’t have an open file and doesn’t have a current
trace chunk.
Surprisingly, this bug didn’t trigger in the case where a transition to
a null chunk occurred _after_ attaching to a viewer session.
This is because viewers will typically ask for metadata as a result of an
LTTNG_VIEWER_FLAG_NEW_METADATA reply to the GET_NEXT_INDEX command. When
a session is stopped and all data was consumed, this command returns
that no new data is available, causing the viewers to wait and ask again
later.
However, when attaching, babeltrace2 (at least, and probably babeltrace 1.x)
always asks for an initial segment of metadata before asking for an
index.
Known drawbacks
===============
None.
Fixes: #1323
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I516fca60755e6897f6b7170c12d706ef57ad61a5
Jérémie Galarneau [Mon, 1 Nov 2021 19:44:04 +0000 (15:44 -0400)]
Docs: relayd: document the lifetime of viewer session trace chunks
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I167ae4e651099c824bb24d5497199d1414b6ca1a
Francis Deslauriers [Fri, 8 Oct 2021 13:41:04 +0000 (09:41 -0400)]
Tests: use babeltrace2 for all tests
- Change value of `BABELTRACE_BIN` to `babeltrace2`,
- Replace all direct calls to babeltrace with calls to `BABELTRACE_BIN`
variable,
- Add `bail_out_if_no_babeltrace` bash function to fail the test if
Babeltrace 2 is needed but not found.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ieeffe4977660f47561732223e94a1b8f5a00ef0e
Francis Deslauriers [Thu, 7 Oct 2021 19:25:28 +0000 (15:25 -0400)]
Tests: port validate_select_poll_epoll.py to bt2 python bindings
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6ed3177e5bd31a56f4a1e84b53bf1d9025ac3161
Francis Deslauriers [Thu, 30 Sep 2021 18:43:11 +0000 (14:43 -0400)]
Fix: configure.ac: reporting SDT uprobe as a UST feature
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I86638d6a148b04e7131e4af7ec830c5e56817fdc
Francis Deslauriers [Thu, 28 Oct 2021 19:36:49 +0000 (15:36 -0400)]
Cleanup: Tests: remove trace after test
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I28024856d22f395922996fbb69c490fb1783b2e9
Francis Deslauriers [Wed, 20 Oct 2021 14:46:39 +0000 (10:46 -0400)]
Cleanup: Remove unused `live_find_viewer_stream_by_id()` func declaration
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I78f977a94ea3609fbee1176608ff97399e314e7f
Francis Deslauriers [Thu, 7 Oct 2021 18:52:27 +0000 (14:52 -0400)]
Fix: Tests: leaking epoll fd
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5ec4fcdb87159f35932c20e7314cda764d14967c
Francis Deslauriers [Mon, 25 Oct 2021 15:32:24 +0000 (11:32 -0400)]
Typo: occurences -> occurrences
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I719e26febd639f3b047b6aa6361fc6734088e871
Jérémie Galarneau [Mon, 18 Oct 2021 18:43:18 +0000 (14:43 -0400)]
Fix: lttng: add-context: silence coverity warning
Coverity reports that:
1464653 Uninitialized scalar field
The field will contain an arbitrary value left over from earlier
computations.
In ctx_opts::ctx_opts(): A scalar field is not initialized by the
constructor.
uninit_member: Non-static class member hide_help is not initialized
in this constructor nor in any functions that it calls.
In our case it doesn't matter since a nullptr symbol indicates the end
the ctx_opts array. An "unknown" value is added to the context array
and used to initialize the end-of-list item to an invalid value.
Calling the ctx_opts(const char *symbol_, context_type ctx_type_, bool
hide_help_ = false) constructor causes `hide_help` to be initialized.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie685fef5712705a1938d9e6b8a63d23ab5bc6b39
Jérémie Galarneau [Fri, 15 Oct 2021 21:03:38 +0000 (17:03 -0400)]
Build fix: Missing message in LTTNG_DEPRECATED invocation
Coverity scan build jobs fail since LTTNG_DEPRECATED expects a string
and none is provided at the lttng_metadata_regenerate use site.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7e6701abd24c679f578b0adead771ac93b6566cd
Simon Marchi [Fri, 3 Sep 2021 21:31:28 +0000 (17:31 -0400)]
bin: compile lttng as C++
Compile the code of the lttng binary as C++ source.
- start by renaming all files under src/bin/lttng to have the .cpp
extension, adjust Makefile.am accordingly
- apply the sophisticated algorithm:
while does_not_build():
fix_error()
until completion
Fixes fall in these categories:
- add extern "C" to headers of functions implemented in C. This is
likely temporary: at some point some of these things will be
implemented in C++, at which point we'll remove the extern "C".
- rename mi_lttng_version to mi_lttng_version_data, to avoid a -Wshadow
warning about the mi_lttng_version function hiding the
mi_lttng_version's struct constructor
- we have the same warning about lttng_calibrate, but we can't rename
it, it's exposed in a public header. Add some pragmas to disable the
warning around there. We will need more macro smartness in case we
need to support a compiler that doesn't understand these pragmas.
- in filter-ast.h, add a dummy field to the empty struct, to avoid a
-Wextern-c-compat warning with clang++ (it warns us that the struct
has size 0 in C but size 1 in C++).
- in add_context.cpp, we can't initialize ctx_opts' union field like we
did in C. Fix that by adding a ctx_opts constructor for each kind of
context and implement the PERF_* macros to use them.
- need to explicitly cast void pointer to type of the destination, for
example the eturn value of allocation functions, or parameter of
"destroy" functions
- need to explicitly cast when passing an int to an enum parameter, for
example an lttng_error_code parameter
- remove use of designated array initializers, for example for
schedule_type_str in disable_rotation.cpp
- fix order of struct initializers to match order of field
declarations, for example in list_triggers.cpp, function
cmd_list_triggers
- rename some things to avoid clashing with keywords, for example in
runas.h
Change-Id: Id743b141552a412b4104af4dda8969eef5032388
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Wed, 6 Oct 2021 16:30:25 +0000 (12:30 -0400)]
Cleanup: always use sysconf to get the page size
Use 'sysconf(_SC_PAGE_SIZE)' across the code base which is works on all
our supported platforms.
Change-Id: I4231d45e0b03301de1274c0a5a4903cd17b4a80a
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Fri, 16 Jul 2021 17:58:10 +0000 (13:58 -0400)]
Cleanup: namespace 'align' macros
Remove the duplicate ALIGN() and ALIGN_TO() macro and replace them with
namespaced variants 'lttng_align_ceil()' and 'lttng_align_floor()'.
Change-Id: I683baccb4e97874e647cf557bad9653a336f4a6d
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
This page took 0.066533 seconds and 5 git commands to generate.