Fix: relayd: live: erroneous message timestamp observed from live viewer
authorMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fri, 19 Nov 2021 16:12:25 +0000 (11:12 -0500)
committerJérémie Galarneau <jeremie.galarneau@efficios.com>
Fri, 21 Jan 2022 22:57:27 +0000 (17:57 -0500)
Observed issue
==============

Another situation where erroneous message timestamp is observed by the
live viewer. Happens rarely (only two occurrences while running
ust_clear in stress-tests overnight on a 16-core machine).

Triggered with the following test:

  # Test ust streaming live clear with viewer with new metadata after clear
  # Parameters: tracing_active=0, clear_twice=1, buffer_type=pid

Babeltrace 2 error:

  11-19 06:21:02.571 488497 488497 E PLUGIN/SRC.CTF.LTTNG-LIVE handle_late_message@lttng-live.c:1206 [lttng-live] Invalid live stream state: have a late message that is not a packet discarded or event discarded message: late-msg-type=PACKET_BEGINNING
  11-19 06:21:02.578 488497 488497 E PLUGIN/SRC.CTF.LTTNG-LIVE next_stream_iterator_for_trace@lttng-live.c:1360 [lttng-live] Late message could not be handled correctly: lttng-live-msg-iter-addr=0x55b4cdee1590, stream-name="stream-1024", curr-msg-ts=1637302861380968303, last-msg-ts=1637302861472798701
  11-19 06:21:02.578 488497 488497 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1802 [lttng-live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
  11-19 06:21:02.579 488497 488497 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:866 Component input port message iterator's "next" method failed: iter-addr=0x55b4cdecfe30, iter-upstream-comp-name="lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR

Cause
=====

viewer_get_next_index() does not protect its use of the session
ongoing_rotation state by any synchronization mechanism. It only takes
the stream lock, which is not used to protect changes to the ongoing
rotation state with respect to the chunk rename operation performed by
chunk creation at the beginning of the clear command.

Solution
========

Protect the use of the session ongoing rotation state and file open
operations by the session lock in viewer_get_next_index().

Known drawbacks
===============

I don't expect this to cause any real scalability concern considering
the fact that the relay daemon has only two threads, one to handle
session daemon commands, and the other to handle viewer commands.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I242942ac6bc7a9199a17cd9800fc461a397b8955

src/bin/lttng-relayd/live.c

index 5365e17a78b6549641abfc5a123cd292e1f4d1fb..604d24aa6ac00f98cd6ee8e4b9bff4088b9b95f3 100644 (file)
@@ -1630,6 +1630,13 @@ int viewer_get_next_index(struct relay_connection *conn)
        metadata_viewer_stream =
                        ctf_trace_get_viewer_metadata_stream(ctf_trace);
 
+       /*
+        * Hold the session lock to protect against concurrent changes
+        * to the chunk files (e.g. rename done by clear), which are
+        * protected by the session ongoing rotation state. Those are
+        * synchronized with the session lock.
+        */
+       pthread_mutex_lock(&rstream->trace->session->lock);
        pthread_mutex_lock(&rstream->lock);
 
        /*
@@ -1796,6 +1803,7 @@ int viewer_get_next_index(struct relay_connection *conn)
 send_reply:
        if (rstream) {
                pthread_mutex_unlock(&rstream->lock);
+               pthread_mutex_unlock(&rstream->trace->session->lock);
        }
 
        if (metadata_viewer_stream) {
@@ -1837,6 +1845,7 @@ end:
 
 error_put:
        pthread_mutex_unlock(&rstream->lock);
+       pthread_mutex_unlock(&rstream->trace->session->lock);
        if (metadata_viewer_stream) {
                viewer_stream_put(metadata_viewer_stream);
        }
This page took 0.028612 seconds and 5 git commands to generate.