lttng-tools.git
2 years agoFix: notification: kernel: consumption of event notification stalls
Jonathan Rajotte [Fri, 1 Apr 2022 12:41:17 +0000 (08:41 -0400)] 
Fix: notification: kernel: consumption of event notification stalls

Observed issue
==============

Using:

 lttng add-trigger --condition event-rule-matches --type kernel:tracepoint --name "sched_waking" --capture comm --action notify

The sessiond receives multiple event notifications from the kernel event
source, then stops receiving them even though the kernel event source
buffer is full.

Cause
=====

It turns out that the kernel event source, when reaching near the end of
its buffer capacity, raises the POLLPRI [1] flag and not the POLLIN
flag.

Solution
========

lttng-modules stretches the usage of POLLPRI a bit beyond what the man
page defines (man 2 poll):

 There is some exceptional condition on the  file  descriptor. Possibilities
 include:

 *  There is out-of-band data on a TCP socket (see tcp(7)).

 *  A  pseudoterminal  master  in  packet  mode has seen a state change on the
    slave (see ioctl_tty(2)).

 *  A cgroup.events file has been modified (see cgroups(7)).

Still, even if lttng-modules changes how it does things, lttng-tools
needs to support other lttng-modules versions.

Thus, add LPOLLPRI (EPOLLPRI/POLLPRI) to the event mask when dealing
with notification event sources.

Note
====

In the future, during the poll loop we could also prioritize
event sources in POLLPRI 'state'.

Known drawbacks
===============

None.

References
==========

[1] https://github.com/lttng/lttng-modules/blob/c312bda00d2dc10ce5f6c1189acbefee5c6c8c6c/src/lttng-abi.c#L1169

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ieb428ef1037c8eb197b489a38a1ae5216ac63d4b

2 years agoFix: notification: assert on len > 0 for dropped notification message
Jonathan Rajotte [Thu, 31 Mar 2022 13:46:15 +0000 (09:46 -0400)] 
Fix: notification: assert on len > 0 for dropped notification message

Observed issue
==============

Using the notification client from
doc/examples/trigger-condition-event-matches/notification-client.cpp, an
assert is hit when the notification subsystem is under load.

 #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007f69eab58859 in __GI_abort () at abort.c:79
 #2  0x00007f69eab58729 in __assert_fail_base (fmt=0x7f69eacee588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f69eae1d5dd "len > 0", file=0x7f69eae1d5cb "unix.cpp", line=179, function=<optimized out>) at assert.c:92
 #3  0x00007f69eab6a006 in __GI___assert_fail (assertion=0x7f69eae1d5dd "len > 0", file=0x7f69eae1d5cb "unix.cpp", line=179, function=0x7f69eae1d598 "ssize_t lttcomm_recv_unix_sock(int, void*, size_t)") at assert.c:101
 #4  0x00007f69eadd5fe6 in lttcomm_recv_unix_sock (sock=3, buf=0x55da9ecd5f89, len=0) at unix.cpp:179
 #5  0x00007f69ead7df3f in receive_message (channel=0x55da9ecd6ee0) at channel.cpp:64
 #6  0x00007f69ead7e478 in lttng_notification_channel_get_next_notification (channel=0x55da9ecd6ee0, _notification=0x7ffdefed2570) at channel.cpp:279
 #7  0x000055da9e0e742f in main (argc=2, argv=0x7ffdefed2698) at notification-client.cpp:506

 (gdb) frame
 #5  0x00007f69ead7df3f in receive_message (channel=0x55da9ecd6ee0) at channel.cpp:64
 64              ret = lttcomm_recv_unix_sock(channel->socket,

 (gdb) print msg
 $2 = {type = 5 '\005', size = 0, fds = 0, payload = 0x7ffdefed24a8 ""}

The msg type 5 is
`LTTNG_NOTIFICATION_CHANNEL_MESSAGE_TYPE_NOTIFICATION_DROPPED`

Cause
=====

The payload size of a
`LTTNG_NOTIFICATION_CHANNEL_MESSAGE_TYPE_NOTIFICATION_DROPPED` message is
indeed zero. There is no extra payload.

Solution
========

When the msg size is zero, skip the 'payload' reception phase.
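
A minimal sketch of that logic, assuming a simplified wire header and a
stream file descriptor. `msg_header`, `read_exact` and
`receive_message_sketch` are hypothetical names; the real lttng-tools
structures and functions differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wire header; the real lttng-tools layout may differ. */
struct msg_header {
	int8_t type;
	uint32_t size;
};

/* Read exactly `len` bytes; returns false on error or premature EOF. */
bool read_exact(int fd, void *buf, size_t len)
{
	char *cursor = static_cast<char *>(buf);

	while (len > 0) {
		const ssize_t ret = read(fd, cursor, len);
		if (ret <= 0) {
			return false;
		}
		cursor += ret;
		len -= static_cast<size_t>(ret);
	}

	return true;
}

/* Receive the header, then the payload only if the header announces one. */
bool receive_message_sketch(int fd, msg_header &header, std::vector<char> &payload)
{
	if (!read_exact(fd, &header, sizeof(header))) {
		return false;
	}

	payload.resize(header.size);
	if (header.size == 0) {
		/* e.g. a "notification dropped" message: nothing more to read. */
		return true;
	}

	return read_exact(fd, payload.data(), payload.size());
}
```

The early return keeps a zero-length receive from ever being issued, so
a `len > 0` assertion in the lower layer remains valid.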

Known drawbacks
===============

None.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ibabb922d0e410c9902414a5eabbe04738861d772

2 years agoFix: example: print_notification is called on all returned statuses
Jonathan Rajotte [Thu, 31 Mar 2022 13:44:24 +0000 (09:44 -0400)] 
Fix: example: print_notification is called on all returned statuses

The notification should only be printed for
`LTTNG_NOTIFICATION_CHANNEL_STATUS_OK`.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5534406d8fbd5c0fff7013fda6335d54bef071a2

2 years agoFix: sessiond: assertion hit in ltt_sessions_ht_empty
Jonathan Rajotte [Mon, 28 Mar 2022 20:49:17 +0000 (16:49 -0400)] 
Fix: sessiond: assertion hit in ltt_sessions_ht_empty

Observed issue
==============

Scenario:

gdb lttng-sessiond
  set non-stop
  break rotation-thread.cpp:584
  ^ simulates a slow rotation thread or not scheduled thread.

lttng create test1
lttng enable-event -u -a
lttng start test1
lttng create test2
lttng enable-event -u -a
lttng start test2
lttng destroy test1
   This will hang on rotation pending checks on the CLI side.

In another shell:

lttng destroy test2
   This will hang on rotation pending checks on the CLI side.

Back to gdb
   thread 7
   continue

Results in:

 #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007ffff786c859 in __GI_abort () at abort.c:79
 #2  0x00007ffff786c729 in __assert_fail_base (fmt=0x7ffff7a02588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5555556bb148 "count == lttng_ht_get_count(ltt_sessions_ht_by_name)", file=0x5555556bae9f "session.cpp", line=395, function=<optimized out>) at assert.c:92
 #3  0x00007ffff787e006 in __GI___assert_fail (assertion=0x5555556bb148 "count == lttng_ht_get_count(ltt_sessions_ht_by_name)", file=0x5555556bae9f "session.cpp", line=395, function=0x5555556bb129 "int ltt_sessions_ht_empty()") at assert.c:101
 #4  0x0000555555586d59 in ltt_sessions_ht_empty () at session.cpp:395
 #5  0x0000555555586e53 in del_session_ht (ls=0x7fffdc000c30) at session.cpp:418
 #6  0x0000555555588a95 in session_release (ref=0x7fffdc001e50) at session.cpp:999
 #7  0x000055555558620f in urcu_ref_put (ref=0x7fffdc001e50, release=0x5555555886eb <session_release(urcu_ref*)>) at /home/joraj/lttng/master/install/include/urcu/ref.h:68
 #8  0x0000555555588c8f in session_put (session=0x7fffdc000c30) at session.cpp:1048
 #9  0x00005555555bf995 in handle_job_queue (handle=0x55555575d260, state=0x7fffeeffc240, queue=0x555555758960) at rotation-thread.cpp:612
 #10 0x00005555555c05da in thread_rotation (data=0x55555575d260) at rotation-thread.cpp:847
 #11 0x00005555555c3b1c in launch_thread (data=0x55555575d2f0) at thread.cpp:66
 #12 0x00007ffff7a46609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 #13 0x00007ffff7969163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Other scenarios can lead to a similar backtrace when using the
`--no-wait` lttng destroy option.

Cause
=====

Since ed41e5709047ef545aa28082416e641e003b45e0 [1], hash table removal
of the session object from `ltt_sessions_ht_by_name` and
`ltt_sessions_ht_by_id` is "decoupled". Removal from
`ltt_sessions_ht_by_name` is done early in `session_destroy()` while
removal from `ltt_sessions_ht_by_id` is done during `session_release` when
the last reference of a session object is released.

This can lead to imbalances between the sizes of the two hash tables
when multiple sessions are at play.

Solution
========

Rework `ltt_sessions_ht_empty()` to exit early when
`ltt_sessions_ht_by_id` is not empty. Perform a sanity check on
`ltt_sessions_ht_by_name` only when `ltt_sessions_ht_by_id` is empty.
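
A minimal sketch of the reworked check, using standard containers as
stand-ins for the two hash tables (the type aliases and the
`sessions_ht_empty` name are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

/* Stand-ins for the two session hash tables (hypothetical types). */
using sessions_by_id = std::unordered_map<uint64_t, std::string>;
using sessions_by_name = std::unordered_map<std::string, uint64_t>;

bool sessions_ht_empty(const sessions_by_id &by_id, const sessions_by_name &by_name)
{
	if (!by_id.empty()) {
		/* Sessions still referenced: the tables may differ in size. */
		return false;
	}

	/* by_name entries are removed first, so it must be empty by now. */
	assert(by_name.empty());
	return true;
}
```

With one destroyed-but-still-referenced session, `by_id` holds one
entry while `by_name` holds none; the function returns false without
comparing the two counts.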

Note
====

Ideally both hash tables' lifetime would be managed separately but it
seems easier in terms of initialization to bundle them together for now
considering the limited scope of the `ltt_sessions_ht_by_name` hash
table.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I66c459f80298f929add703ac977cccd1da6dd556

2 years agoFix: tests: missing _GNU_SOURCE for F_GETPIPE_SZ
Jonathan Rajotte [Thu, 31 Mar 2022 15:20:01 +0000 (11:20 -0400)] 
Fix: tests: missing _GNU_SOURCE for F_GETPIPE_SZ

Per man 2 fcntl:

  F_GETOWN_EX,  F_SETOWN_EX,  F_SETPIPE_SZ,  F_GETPIPE_SZ,  F_GETSIG,  F_SETSIG,
  F_NOTIFY,   F_GETLEASE,   and  F_SETLEASE  are  Linux-specific.   (Define  the
  _GNU_SOURCE macro to obtain these definitions.)

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2b61dfb79ffa384dc2bab56cd3510ddc6ae21e85

2 years agoFix: compat: 'LTTNG_UST_ABI_PROCNAME_LEN' is undeclared
Jonathan Rajotte [Tue, 29 Mar 2022 20:31:44 +0000 (16:31 -0400)] 
Fix: compat: 'LTTNG_UST_ABI_PROCNAME_LEN' is undeclared

Observed issue
==============

On old systems, the `lttng_pthread_setname_np` function falls back to
using the compat prctl version. In that context,
`LTTNG_UST_ABI_PROCNAME_LEN` is indeed not declared.

Solution
========

Use `LTTNG_PTHREAD_NAMELEN`. This mimics what is done in other versions
of `lttng_pthread_setname_np`.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I73956cacd7b1e9400881b17b1cd89db2530d3a00

2 years agocommon: prevent using memset on non-POD types
Simon Marchi [Wed, 8 Sep 2021 22:00:42 +0000 (18:00 -0400)] 
common: prevent using memset on non-POD types

While converting some code to use C++ constructs, it can be easy to
forget to change some spot that uses memset to initialize or move the
object. Add a templated deleted declaration to prevent using memset on
types that aren't POD.

For example, if I make lttng_ust_event non-POD, in
src/bin/lttng-sessiond/trace-ust.h, I get this error:

      CXX      save.lo
    /home/simark/src/lttng-tools/src/bin/lttng-sessiond/save.cpp: In function ‘int save_agent_events(config_writer*, agent*)’:
    /home/simark/src/lttng-tools/src/bin/lttng-sessiond/save.cpp:1246:23: error: use of deleted function ‘void* memset(T*, int, size_t) [with T = ltt_ust_event; <template-parameter-1-2> = void; size_t = long unsigned int]’
     1246 |                 memset(&fake_event, 0, sizeof(fake_event));
          |                 ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In file included from /home/simark/src/lttng-tools/src/common/defaults.h:14,
                     from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/save.cpp:15:
    /home/simark/src/lttng-tools/src/common/macros.h:128:7: note: declared here
      128 | void *memset(T *s, int c, size_t n) = delete;
          |       ^~~~~~
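
The deleted declaration named in the diagnostic above can be sketched as
follows. This is an approximation, not the exact contents of macros.h:
it spells "POD" as trivial plus standard-layout instead of std::is_pod,
and `pod_event`, `non_pod_event` and `reset` are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

/*
 * Selected only for non-POD types. Being a better match than
 * memset(void *, int, size_t), it turns such calls into compile errors.
 */
template <typename T,
	  typename = typename std::enable_if<!(std::is_trivial<T>::value &&
					       std::is_standard_layout<T>::value)>::type>
void *memset(T *s, int c, std::size_t n) = delete;

struct pod_event {
	int id;
	char name[16];
};

struct non_pod_event {
	std::string name;
};

void reset(pod_event &event)
{
	/* Fine: pod_event is POD, so the regular memset is used. */
	memset(&event, 0, sizeof(event));
}

/* memset(&some_non_pod_event, 0, sizeof(non_pod_event)) would not compile. */
```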

Note: I tried applying this to memcpy as well, but Clang gave me some
troubles with its -Waddress-of-packed-member diagnostic, so I gave up.

Change-Id: Id55735db15901c6fc5d58e9b6b6b689733302398
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoTests: validate_xml: leak of xml document instance
Jérémie Galarneau [Thu, 31 Mar 2022 02:59:44 +0000 (22:59 -0400)] 
Tests: validate_xml: leak of xml document instance

`doc` is never free'd when validating an XML document.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia0b541963350aeb6610382fd3226ffd37ab4847e

2 years agoAdd type-checked versions of allocation and deallocations functions
Simon Marchi [Wed, 17 Nov 2021 02:36:17 +0000 (21:36 -0500)] 
Add type-checked versions of allocation and deallocations functions

A common mistake when porting things from C to C++ is to use malloc for
types that have a non-trivial constructor (and to not call the
constructor explicitly either). For example:

struct foo {
    std::vector<int> field;
};

foo *f = (foo *) zmalloc(sizeof(*f));

This allocates a `foo` without calling the constructor, leaving it in an
invalid state. The same applies when using free on something that has a
non-trivial destructor.

To avoid this, I suggest adding templated allocation functions that we
will use throughout, that verify if the given type is safe to malloc
(and generate a compilation failure if not). The existing code barely
needs changes. For example:

- (foo *) zmalloc(sizeof(*f))
+ zmalloc<foo>()

For simplicity I propose that as soon as a type is non-POD
(std::is_pod<T>::value is false), we prevent using malloc/free on it. It
would be ok in theory to allocate such a type with malloc and free with
free, but call the constructor (using placement-new) and destructor
explicitly, but I don't see why we would want to do that. It might also
be technically more correct to use a combination of
std::is_trivially_constructible and std::is_trivially_destructible
(std::is_pod being not fine-grained enough), but using std::is_pod just
keeps things simpler.

This patch introduces the following templated allocation functions:

1. zmalloc<T>()
2. zmalloc<T>(size)
3. malloc<T>()
4. malloc<T>(size)
5. calloc<T>(nmemb)

1. Allocate one T, zero-initialized
2. Allocate a buffer of size `size`, zero-initialized, this is used when
   the caller calculates the size to allocate, like when using flexible
   array members
3. Same as 1, but without the zero-initialization
4. Same as 2, but without the zero-initialization
5. Allocate an array of `nmemb` elements of type T, zero-initialized

For the de-allocation side, add templated `free` function declaration
that uses std::enable_if (SFINAE) to declare a deleted prototype if the
type T isn't safe to free (causing a compilation error).
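
A minimal sketch of such helpers, under a few assumptions: they live in
a hypothetical `sketch` namespace, they call std::calloc directly, and
the typed `free` uses a static_assert instead of the deleted
declaration described above. The `can_malloc` trait spells "POD" as
trivial plus standard-layout rather than std::is_pod.

```cpp
#include <cstddef>
#include <cstdlib>
#include <type_traits>

namespace sketch {

/* Stand-in for std::is_pod<T>, which is deprecated since C++20. */
template <typename T>
struct can_malloc {
	static constexpr bool value =
		std::is_trivial<T>::value && std::is_standard_layout<T>::value;
};

/* Allocate one zero-initialized T. */
template <typename T>
T *zmalloc()
{
	static_assert(can_malloc<T>::value, "type can be malloc'ed");
	return static_cast<T *>(std::calloc(1, sizeof(T)));
}

/* Allocate `size` zero-initialized bytes, e.g. for flexible array members. */
template <typename T>
T *zmalloc(std::size_t size)
{
	static_assert(can_malloc<T>::value, "type can be malloc'ed");
	return static_cast<T *>(std::calloc(1, size));
}

/* Allocate an array of `nmemb` zero-initialized T. */
template <typename T>
T *calloc(std::size_t nmemb)
{
	static_assert(can_malloc<T>::value, "type can be malloc'ed");
	return static_cast<T *>(std::calloc(nmemb, sizeof(T)));
}

/* Typed free: rejected at compile time for types unsafe to free. */
template <typename T>
void free(T *p)
{
	static_assert(can_malloc<T>::value, "type can be free'd");
	std::free(p);
}

/* Pointers to void cannot be checked; accept them as-is. */
inline void free(void *p)
{
	std::free(p);
}

} /* namespace sketch */
```

Calling `sketch::zmalloc<T>()` with a type that has a std::vector
member fails the static_assert at compile time, which is the intent.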

There are a lot of places where we pass pointers to void to free. These
can't be checked, as we don't know what type of object the pointer
really points to. We could forbid that and fix all callers to pass a
typed pointer, but that seems a bit too much to chew for the moment. So
for now, simply accept that freeing pointers to void won't be checked.
It's a best effort.

As an example, if I add an explicit constructor to type ctf_trace (in
src/bin/lttng-relayd/ctf-trace.h), I get the following errors with
clang. For the allocation:

/home/simark/src/lttng-tools/src/common/macros.h:57:2: error: static_assert failed due to requirement 'std::is_pod<ctf_trace>::value' "type is POD"
static_assert (std::is_pod<T>::value, "type is POD");
^              ~~~~~~~~~~~~~~~~~~~~~
/home/simark/src/lttng-tools/src/bin/lttng-relayd/ctf-trace.cpp:84:10: note: in instantiation of function template specialization 'zmalloc<ctf_trace>' requested here
trace = zmalloc<ctf_trace>();
^

For the de-allocation:

/home/simark/src/lttng-tools/src/bin/lttng-relayd/ctf-trace.cpp:29:2: error: call to deleted function 'free'
free(trace);
^~~~
/home/simark/src/lttng-tools/src/common/macros.h:125:6: note: candidate function [with T = ctf_trace, $1 = void] has been explicitly deleted
void free(T *p) = delete;
     ^
/usr/include/stdlib.h:565:13: note: candidate function
extern void free (void *__ptr) __THROW;
    ^

Change-Id: I246a9113d08fa36b81a49137f4e80a5e808de913
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agosessiond: document RCU locking assumption during channel metadata statedump
Jérémie Galarneau [Wed, 30 Mar 2022 13:23:46 +0000 (09:23 -0400)] 
sessiond: document RCU locking assumption during channel metadata statedump

The rcu read lock must be held by the caller during a call to
ust_metadata_channel_statedump. An assertion and a comment are added.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia57a140d51470cc43cf62d36c9b4b552e1c17191

2 years agoFix: lttng-sessiond: output stream metadata before events
Simon Marchi [Thu, 6 Jan 2022 18:24:59 +0000 (13:24 -0500)] 
Fix: lttng-sessiond: output stream metadata before events

When trying the `doc/examples/demo` example from the lttng-ust
repository, the resulting trace's metadata lists some events before the
corresponding stream declaration. Here's an excerpt:

    event {
            name = "lttng_ust_statedump:end";
        id = 5;
        stream_id = 0;
        loglevel = 13;
        fields := struct {
            };
    };

    stream {
            id = 0;
            event.header := struct event_header_large;
            packet.context := struct packet_context;
    };

I don't know if this is allowed in CTF 1, but it won't be in CTF 2 (an
event record class fragment must come after its parent data stream class
fragment). In any case, I think it makes more sense to have the stream
first.

What I can see is that the ust_metadata_event_statedump function (which
emits the `event` declarations) is called for the statedump events
before the ust_metadata_channel_statedump function (which emits the
`stream` declaration) is called. A simple fix, as implemented in this
patch, is to delay emitting the event declarations until the stream
declaration has been emitted. To do so, return early in
ust_metadata_event_statedump if the `chan->metadata_dumped` flag is not
set. Then, when emitting the stream declaration, in
ust_metadata_channel_statedump, emit any existing events, which have
presumably been skipped beforehand.
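
The ordering logic described above can be sketched with a much
simplified model. The `channel` struct and the three functions below
are hypothetical stand-ins, not the sessiond registry code:

```cpp
#include <string>
#include <vector>

/* Hypothetical, much-simplified model of a channel registry. */
struct channel {
	bool metadata_dumped = false;
	std::vector<std::string> events;
	std::string metadata;
};

/* Emit one event declaration, unless the stream is not declared yet. */
void event_statedump(channel &chan, const std::string &name)
{
	if (!chan.metadata_dumped) {
		/* Too early: channel_statedump() will emit it later. */
		return;
	}

	chan.metadata += "event " + name + "\n";
}

/* Register an event, then try to emit its declaration. */
void add_event(channel &chan, const std::string &name)
{
	chan.events.push_back(name);
	event_statedump(chan, name);
}

/* Emit the stream declaration, followed by any event skipped so far. */
void channel_statedump(channel &chan)
{
	chan.metadata += "stream\n";
	chan.metadata_dumped = true;

	for (const auto &name : chan.events) {
		event_statedump(chan, name);
	}
}
```

Whatever order the two statedump functions are first called in, the
stream declaration ends up ahead of every event declaration.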

It's possible that ust_metadata_event_statedump getting called before
ust_metadata_channel_statedump is a symptom of some more fundamental
problem that this patch only papers over, but I don't know enough
about this to be able to tell.

I couldn't think of an appropriate test to write for this. However, once
we generate CTF2, such a bug would likely be caught by trace readers
rejecting the invalid metadata. So if we were to re-introduce this bug,
we would notice.

Change-Id: I6e3158c801fcc01b318618890704d19b3230e7a5
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agolttng-sessiond: remove goto in ust_metadata_channel_statedump
Simon Marchi [Thu, 6 Jan 2022 18:27:55 +0000 (13:27 -0500)] 
lttng-sessiond: remove goto in ust_metadata_channel_statedump

A follow-up patch uses an std::vector declared in the middle of
ust_metadata_channel_statedump. This isn't compatible with the
goto-based error handling, since gotos should not jump over object
initialization (otherwise, the object gets destroyed without having been
constructed).

Moving the std::vector declaration to the beginning of the function
would work, but it would be a pessimization: we would construct an
object that we may not need, depending on the code path taken. We
therefore want to declare (and construct) the std::vector just before
we need it.

Fix this by replacing gotos with return statements.
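
A minimal sketch of the resulting shape, assuming hypothetical `emit`
and `dump` functions in place of the real metadata code. The vector is
declared mid-function, which is only possible because no goto needs to
jump across its initialization:

```cpp
#include <string>
#include <vector>

/* Hypothetical helper standing in for a fallible metadata write. */
int emit(std::string &out, const std::string &text)
{
	if (text.empty()) {
		return -1;
	}

	out += text;
	return 0;
}

int dump(std::string &out, const std::string &header)
{
	int ret = emit(out, header);
	if (ret) {
		/* Early return: no label to jump to, nothing constructed yet. */
		return ret;
	}

	/* Constructed only once we know it is needed. */
	std::vector<std::string> events = { "a;", "b;" };
	for (const auto &event : events) {
		ret = emit(out, event);
		if (ret) {
			return ret;
		}
	}

	return 0;
}
```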

Also, add a `ret` check after the last lttng_metadata_printf call. If
this call failed, for some reason, we would return an error, but still
set chan->metadata_dumped. That makes this case different than the other
error paths in the function, where chan->metadata_dumped doesn't get
set. Adding the check makes this case like the other ones.

Change-Id: Iba81422a7c3bac96a8d209bba6b4d53ad26b3e4e
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agolttng-sessiond: rename ust_registry_channel::ht to events
Simon Marchi [Thu, 6 Jan 2022 17:49:09 +0000 (12:49 -0500)] 
lttng-sessiond: rename ust_registry_channel::ht to events

For clarity, rename the field to "events" to indicate that it contains
the channel's events.

Change-Id: I0bd90c13d7c8e313fff72eb18d0d7ebfc23762d4
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoClean-up: Remove remaining max_t macros
Simon Marchi [Wed, 15 Dec 2021 19:23:05 +0000 (14:23 -0500)] 
Clean-up: Remove remaining max_t macros

I found two remaining max_t macros. Remove them, and adjust one call
site that was still using that.

Change-Id: Icaedcaea1a88e87262bfa544691db398a1bfd203
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoRename C++ header files to .hpp
Simon Marchi [Wed, 15 Dec 2021 20:13:05 +0000 (15:13 -0500)] 
Rename C++ header files to .hpp

Rename all C++ header files (include/**/*-internal.h, src/**/*.h except
argpar and msgpack, some headers in tests) to have the .hpp extension.

Doing so highlights that we include some C++ header files in some test
files still compiled as C. This is ok for now, as the files they include
don't actually contain C++ code incompatible with C yet, but they could
eventually. This is something we can fix later.

Change-Id: I8bf326b6b2946a3e26704f3ef3ac5831bbe9bc26
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoClean-up: sessiond: cmd_enable_channel_internal
Jérémie Galarneau [Fri, 25 Mar 2022 19:43:56 +0000 (15:43 -0400)] 
Clean-up: sessiond: cmd_enable_channel_internal

After catching an error code mixup in cmd_enable_channel_internal, this
change explicitly sets the return type of cmd_enable_channel_internal
and of its callees to `enum lttng_error_code` since those already return
these values as integers.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic3c042c2ee3d259cc694e6aaf3a1a2f3ca843042

2 years agoFix: sessiond: cmd_enable_channel: negative error code used
Jérémie Galarneau [Fri, 25 Mar 2022 19:34:47 +0000 (15:34 -0400)] 
Fix: sessiond: cmd_enable_channel: negative error code used

A negative `lttng_error_code` value is returned (as an integer)
when a channel copy fails. Return a positive error code.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I340f739cc33858a06832bb75a7a6d5e18459551f

2 years agoClean-up: remove unused makefile statements
Michael Jeanson [Mon, 22 Nov 2021 20:58:38 +0000 (15:58 -0500)] 
Clean-up: remove unused makefile statements

Change-Id: Ic8b6a68d64b866f177e6aa02a15c9930b468238a
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoUpdate vendored msgpack-c to 4.0.0
Michael Jeanson [Thu, 24 Mar 2022 18:29:36 +0000 (14:29 -0400)] 
Update vendored msgpack-c to 4.0.0

The upstream changes from 3.3.0 to 4.0.0:

  * Fix and improve alignment logic (#962)
  * Fix iovec name conflict (#953)
  * Fix empty string print (#942)
  * Fix buffer ptr size (#899)
  * Fix UB. Check null pointer before using memcpy() (#890)

Change-Id: Ifc4d7de43d0f11d6331d98d7cfa93227f8b756bc
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoFix: doc: action: wrong function documented for action_list destroy
Jonathan Rajotte [Thu, 24 Mar 2022 20:05:20 +0000 (16:05 -0400)] 
Fix: doc: action: wrong function documented for action_list destroy

The lttng_action_list_destroy function is internal.

API users must use `lttng_action_destroy()` to destroy the returned
object of `lttng_action_list_create()`.

Change-Id: Ic910efd07dd071f7e38e48d34a5e000b3f805729
Reported-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoFix: lttng-elf: wrong error label used by error path
Jérémie Galarneau [Fri, 18 Mar 2022 14:58:03 +0000 (10:58 -0400)] 
Fix: lttng-elf: wrong error label used by error path

1486805 Resource leak
The system resource will not be reclaimed and reused, reducing the future availability of the resource.

In lttng_elf_get_symbol_offset: Leak of memory or pointers to system resources (CWE-404)

Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I78b868773b389c2eaf3f1d45151fe9416b1fe447

2 years agoClean-up: fix '-Wformat' warnings on various platforms
Michael Jeanson [Wed, 16 Mar 2022 18:18:36 +0000 (14:18 -0400)] 
Clean-up: fix '-Wformat' warnings on various platforms

Change-Id: I39a2dd8bb4f1f6654a65f9fab8d5ac74439a4410
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoClean-up: fix '-Wimplicit-fallthrough' warnings on various platforms
Michael Jeanson [Wed, 16 Mar 2022 16:06:16 +0000 (12:06 -0400)] 
Clean-up: fix '-Wimplicit-fallthrough' warnings on various platforms

Change-Id: I80ad6bebcd2eed9c3e83ef3b750254d1fd98e95d
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoClean-up: fix '-Wunused-parameter' warnings on various platforms
Michael Jeanson [Wed, 16 Mar 2022 15:57:24 +0000 (11:57 -0400)] 
Clean-up: fix '-Wunused-parameter' warnings on various platforms

Change-Id: I35bd06414fd8407b2f281789ac2e419f40a08fa2
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoClean-up: fix '-Wundef' warnings on various platforms
Michael Jeanson [Wed, 16 Mar 2022 15:46:49 +0000 (11:46 -0400)] 
Clean-up: fix '-Wundef' warnings on various platforms

Change-Id: I8dfffd2ad5eb55a0b8fe74d29a82a224da19f30a
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoClean-up: silence warnings for generated code on RHEL8
Michael Jeanson [Wed, 16 Mar 2022 14:46:33 +0000 (10:46 -0400)] 
Clean-up: silence warnings for generated code on RHEL8

  lttng_wrap.lo -MD -MP -MF .deps/lttng_wrap.Tpo -c lttng_wrap.c  -fPIC -DPIC -o .libs/lttng_wrap.o
  lttng_wrap.c:1824:23: warning: cast between incompatible function types from ‘PyObject * (*)(PyObject *)’ {aka ‘struct _object * (*)(struct _object *)’} to ‘PyObject * (*)(PyObject *, PyObject *)’ {aka ‘struct _object * (*)(struct _object *, struct _object *)’} [-Wcast-function-type]
     {(char *)"disown",  (PyCFunction)SwigPyObject_disown,  METH_NOARGS,  (char *)"releases ownership of the pointer"},
                         ^

Change-Id: I9258a58317814fdc94c8ce3c76e615b73aaf4199
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoFix: use the correct endian compat macros
Michael Jeanson [Wed, 16 Mar 2022 15:40:52 +0000 (11:40 -0400)] 
Fix: use the correct endian compat macros

Document which variant of the endian macros our compat header guarantees
across all platforms and fix incorrect uses.

This was discovered with -Wundef on macOS.

Change-Id: Iaf442fe5887063661273ac2a00c9fa4015e83d5c
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoClean-up: tests: silence bogus warning
Jérémie Galarneau [Wed, 16 Mar 2022 21:55:45 +0000 (17:55 -0400)] 
Clean-up: tests: silence bogus warning

1486757 Buffer not null terminated
If the buffer is treated as a null terminated string in later operations, a buffer overflow or over-read may occur.

In test_create_ust_event_exclusion(): The string buffer may not have a null terminator if the source string's length is equal to the buffer size (CWE-170)

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0c3fd6c9d591e1c67d8b80bf12825bb0d1520a65

2 years agoFix: tests: uninitialized lttng_payload
Jérémie Galarneau [Wed, 16 Mar 2022 21:39:07 +0000 (17:39 -0400)] 
Fix: tests: uninitialized lttng_payload

1474980 Uninitialized pointer read
Incorrect values could be read from, or even written to, an arbitrary memory location, causing incorrect computations.

In test_event_rule_userspace_probe(): Reads an uninitialized pointer or its target (CWE-457)

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7645278f18e4a4678fb5ede9523d0cfa8d3aa106

2 years agoFix: sessiond: ust-app: uninitialized name logged on stream copy failure
Jérémie Galarneau [Wed, 16 Mar 2022 21:35:43 +0000 (17:35 -0400)] 
Fix: sessiond: ust-app: uninitialized name logged on stream copy failure

1466302 Uninitialized scalar variable
The variable will contain an arbitrary value left from earlier computations.

In send_channel_uid_to_ust(buffer_reg_channel *, ust_app *, ust_app_session *, ust_app_channel *): Use of an uninitialized variable (CWE-457)

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Icd38de3b67dab783fa26a721c68c48ebfbb59785

2 years agoFix: lttng-elf: untrusted entry size divisor
Jérémie Galarneau [Wed, 16 Mar 2022 21:29:11 +0000 (17:29 -0400)] 
Fix: lttng-elf: untrusted entry size divisor

1405557 Untrusted divisor
The divisor could be controlled by an attacker, who could cause a division by zero.

In lttng_elf_get_symbol_offset: An unscrutinized value from an untrusted source used as a divisor (CWE-369)
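
The general shape of such a fix, as a minimal sketch (`count_entries`
is a hypothetical helper, not the lttng-elf code): validate the entry
size read from the file before using it as a divisor.

```cpp
#include <cstdint>

/*
 * Compute the number of entries in a table read from an untrusted file.
 * Returns false when the declared entry size is unusable.
 */
bool count_entries(uint64_t table_size, uint64_t entry_size, uint64_t &count)
{
	if (entry_size == 0) {
		/* Would divide by zero: reject the file instead. */
		return false;
	}

	count = table_size / entry_size;
	return true;
}
```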

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I029708a0df4f62fe0031e374d50839c26f4f3f4b

2 years agoFix: tests: test definitions arrays contain invalid data
Jonathan Rajotte [Fri, 11 Mar 2022 18:39:58 +0000 (13:39 -0500)] 
Fix: tests: test definitions arrays contain invalid data

Observed issue
==============

The long_regression CI job fails on test_thread_stall.

 11:17:16 # export LTTNG_SESSION_CONFIG_XSD_PATH=/home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/common/
 11:17:16 # env /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/bin/lttng-sessiond/lttng-sessiond --background --consumerd64-path=/home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/bin/lttng-consumerd/lttng-consumerd 1
 11:17:16 ok 16 - Start session daemon
 11:17:16 # Check after running for 30 seconds
 11:17:16 not ok 17 - Validation failure
 11:17:16 #   Failed test 'Validation failure'
 11:17:16 #   in /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../utils/tap/tap.sh:fail() at line 159.
 11:17:16 # Health returned:
 11:17:16 # stdout:
 11:17:16 # stderr:
 11:17:16 # Killing (signal SIGKILL) lttng-sessiond and lt-lttng-sessiond pids: 1840601 1840602
 11:17:16 ok 18 - Wait after kill session daemon

 ...

 17:57:01 # Test health problem detection with LTTNG_RELAYD_THREAD_DISPATCHER
 17:57:01 # Start session daemon
 17:57:01 # export LTTNG_SESSION_CONFIG_XSD_PATH=/home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/common/
 17:57:01 # env /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/bin/lttng-sessiond/lttng-sessiond --background --consumerd64-path=/home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/../src/bin/lttng-consumerd/lttng-consumerd 1
 17:57:01 ok 38 - Start session daemon
 17:57:01 # /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../../src/bin/lttng/lttng create health_thread_stall --no-output
 17:57:01 ok 39 - Create session health_thread_stall in no-output mode
 17:57:01 # With UST consumer daemons
 17:57:01 # /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../../src/bin/lttng/lttng enable-event tp:tptest -c testchan -s health_thread_stall -u
 17:57:01 ok 40 - Enable ust event tp:tptest for session health_thread_stall
 17:57:01 ok 41 # skip: Root access is needed. Skipping kernel consumer health check test.
 17:57:01 # /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../../src/bin/lttng/lttng start health_thread_stall
 17:57:01 ok 42 - Start tracing for session health_thread_stall
 17:57:01 # Check after running for 30 seconds
 17:57:01 not ok 43 - Validation failure
 17:57:01 #   Failed test 'Validation failure'
 17:57:01 #   in /home/jenkins/workspace/lttng-tools_master_long_regression/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/master/test_type/full/src/lttng-tools/tests/regression/tools/health/../../../utils/tap/tap.sh:fail() at line 159.
 17:57:01 # Health returned:
 17:57:01 # stdout:
 17:57:01 # stderr:
 17:57:01 # Killing (signal SIGTERM) lttng-consumerd pids: 690297 690299
 17:57:01 Error: consumer closed the command socket
 17:57:01 Error: Health error occurred in thread_consumer_management
 17:57:01 ok 44 - Wait after kill consumer daemon

Cause
=====

After investigation, commit 3c3390532736cfb5198f863d0d2b218e21fcf76d [1]
introduces the test regression.

Albeit [1] removes `LTTNG_SESSIOND_THREAD_HT_CLEANUP` from the `THREAD`
array and the corresponding error message in `ERROR_STRING`, it does not
modify the `NEEDS_ROOT`, `TEST_CONSUMERD` and `TEST_RELAYD` arrays.

Also the test count is not adjusted to reflect the removal of the
`THREAD` element.

Solution
========

Remove the unused data from `NEEDS_ROOT`, `TEST_CONSUMERD` and
`TEST_RELAYD` and adjust the test count.

Known drawbacks
===============

None.

References
==========

[1] https://github.com/lttng/lttng-tools/commit/3c3390532736cfb5198f863d0d2b218e21fcf76d

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9c16fa8d76b41f1a28fd342d9f076969f4ff1b13

2 years agoClean-up: exclusions: use LTTNG_EVENT_EXCLUSION_NAME_AT util
Jérémie Galarneau [Wed, 16 Mar 2022 20:35:30 +0000 (16:35 -0400)] 
Clean-up: exclusions: use LTTNG_EVENT_EXCLUSION_NAME_AT util

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I59da48be1a2d905d3a7ff485d353fdacddda784d

2 years agoClean-up: lttng-ctl: strnlen out of bounds access
Jérémie Galarneau [Tue, 15 Mar 2022 21:19:27 +0000 (17:19 -0400)] 
Clean-up: lttng-ctl: strnlen out of bounds access

gcc 11.2 produces the following warning. The lttng_strncpy helper
assumes that 'src' is a null terminated string. As such, the use of a
string literal (of size 37) in this specific example is correct as
strnlen will not read beyond the null terminator.

Replacing strnlen by strlen eliminates this warning. strnlen was used to
short-circuit the source length check when it was larger than the
destination. This optimization is unlikely to matter. Pascal-style
strings should be used when string length computations are expected to
be prohibitively expensive.

In file included from ../../../src/common/macros.h:15,
                 from ../../../include/lttng/health-internal.h:18,
                 from lttng-ctl-health.cpp:19:
In function 'size_t lttng_strnlen(const char*, size_t)',
    inlined from 'int lttng_strncpy(char*, const char*, size_t)' at ../../../src/common/macros.h:123:19,
    inlined from 'int set_health_socket_path(lttng_health*, int)' at lttng-ctl-health.cpp:198:22,
    inlined from 'int lttng_health_query(lttng_health*)' at lttng-ctl-health.cpp:319:30:
../../../src/common/compat/string.h:19:23: warning: 'size_t strnlen(const char*, size_t)' specified bound 4096 may exceed source size 37 [-Wstringop-overread]
   19 |         return strnlen(str, max);
      |                ~~~~~~~^~~~~~~~~~
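
As an illustration only, here is a minimal sketch of a bounded copy that
relies on strlen(). The helper name bounded_copy() is hypothetical and this
is not the actual lttng_strncpy implementation; it only assumes, as stated
above, that 'src' is null-terminated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the actual lttng_strncpy): copy the
 * null-terminated string 'src' into 'dst', which holds 'dst_len' bytes.
 *
 * Returns 0 on success, -1 if 'src' and its terminator do not fit.
 */
static int bounded_copy(char *dst, const char *src, size_t dst_len)
{
	/* 'src' is null-terminated, so strlen() stops at the terminator. */
	const size_t src_len = strlen(src);

	if (src_len >= dst_len) {
		/* No room for the string and its null terminator. */
		return -1;
	}

	/* Copy the characters and the terminator in one pass. */
	memcpy(dst, src, src_len + 1);
	return 0;
}
```

The cost is one full scan of 'src' even when it is longer than the
destination, which is the optimization the strnlen() call used to provide.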

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I290109433fcae7073321f1b48ecfbb2ec6e4ad26

2 years agoClean-up: sessiond-comm: out of bounds access warning
Jérémie Galarneau [Tue, 15 Mar 2022 21:13:03 +0000 (17:13 -0400)] 
Clean-up: sessiond-comm: out of bounds access warning

gcc 11.2 produces the two following warnings. In both cases, setting an
array's dimension to zero is used to express a variable length array of
names that are LTTNG_SYMBOL_NAME_LEN bytes long. gcc doesn't know about
this and correctly points out that an access is taking place outside of
the array's bounds.

Omit the '0' dimension to work around this warning.

event.cpp: In function 'ssize_t lttng_event_create_from_payload(lttng_payload_view*, lttng_event**, lttng_event_exclusion**, char**, lttng_bytecode**)':
event.cpp:320:62: warning: array subscript i is outside array bounds of 'char [0][256]' [-Warray-bounds]
  320 |                 ret = lttng_strncpy(local_exclusions->names[i],
      |                                     ~~~~~~~~~~~~~~~~~~~~~~~~~^
In file included from event.cpp:16:
../../src/common/sessiond-comm/sessiond-comm.h:569:14: note: while referencing 'lttng_event_exclusion::names'
  569 |         char names[0][LTTNG_SYMBOL_NAME_LEN];
      |              ^~~~~

event-rule/user-tracepoint.cpp: In function 'lttng_event_rule_generate_exclusions_status lttng_event_rule_user_tracepoint_generate_exclusions(const lttng_event_rule*, lttng_event_exclusion**)':
event-rule/user-tracepoint.cpp:383:61: warning: array subscript i is outside array bounds of 'char [0][256]' [-Warray-bounds]
  383 |                 copy_ret = lttng_strncpy(exclusions->names[i], exclusion_str,
      |                                          ~~~~~~~~~~~~~~~~~~~^
In file included from ../../src/common/runas.h:17,
                 from event-rule/user-tracepoint.cpp:17:
../../src/common/sessiond-comm/sessiond-comm.h:569:14: note: while referencing 'lttng_event_exclusion::names'
  569 |         char names[0][LTTNG_SYMBOL_NAME_LEN];
      |              ^~~~~
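
The following sketch shows the declaration style described above on a
stand-in structure. The names exclusion_list, exclusion_list_create and
NAME_LEN are assumptions made for the example; only the shape of the
'names' member mirrors the one referenced by the warnings.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NAME_LEN 256 /* stands in for LTTNG_SYMBOL_NAME_LEN */

/* Hypothetical structure modelled on the one named in the warnings. */
struct exclusion_list {
	uint32_t count;
	/* Flexible array member: the '0' dimension is omitted. */
	char names[][NAME_LEN];
};

/* Allocate a zeroed list with room for 'count' names. */
static struct exclusion_list *exclusion_list_create(uint32_t count)
{
	struct exclusion_list *list =
			calloc(1, sizeof(*list) + (size_t) count * NAME_LEN);

	if (list) {
		list->count = count;
	}

	return list;
}
```

With this declaration the compiler treats 'names' as an array of unknown
length, so indexing it no longer looks like an out-of-bounds access.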

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I260185f2baf085ca4486ce3b13696ee5fa55938a

2 years agoFix: event: erroneous bound check on perf counter name size
Jérémie Galarneau [Wed, 16 Mar 2022 15:56:21 +0000 (11:56 -0400)] 
Fix: event: erroneous bound check on perf counter name size

The wrong size is used when initializing a perf counter name from a
payload. The destination size must be used to prevent out-of-bounds
writes.
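
A minimal sketch of bounding the copy by the destination, assuming a
hypothetical perf_counter structure and perf_counter_set_name() helper
(neither is the actual lttng-tools code):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical destination structure with a fixed-size name. */
struct perf_counter {
	char name[16];
};

/*
 * Copy a name found in a 'payload' of 'payload_len' bytes.
 *
 * Returns 0 on success, -1 if the name is not terminated within the
 * payload or does not fit in the destination.
 */
static int perf_counter_set_name(struct perf_counter *counter,
		const char *payload, size_t payload_len)
{
	/* Never read past the payload while looking for the terminator. */
	const char *end = memchr(payload, '\0', payload_len);
	size_t name_len;

	if (!end) {
		return -1;
	}

	name_len = (size_t) (end - payload);

	/* The bound that matters for the write is the destination size. */
	if (name_len >= sizeof(counter->name)) {
		return -1;
	}

	memcpy(counter->name, payload, name_len + 1);
	return 0;
}
```

Using sizeof on the destination array also keeps the check correct if the
array size changes later.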

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8ea41d30815bd2c02bb2ad8b01e8cecd2d6549a8

2 years agoFix: sessiond: event name length check is too strict
Jérémie Galarneau [Wed, 16 Mar 2022 15:55:08 +0000 (11:55 -0400)] 
Fix: sessiond: event name length check is too strict

A truncation check when initializing an event from an event rule limits
the name to one less character than is supposed to be allowed.
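
To illustrate this kind of off-by-one, here is a sketch that compares an
overly strict check with the intended one. EVENT_NAME_LEN and both function
names are hypothetical; the exact expression used in the session daemon is
not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EVENT_NAME_LEN 8 /* small hypothetical buffer size */

/* Too strict: rejects names of exactly EVENT_NAME_LEN - 1 characters. */
static bool name_fits_strict(const char *name)
{
	return strlen(name) < EVENT_NAME_LEN - 1;
}

/* Intended: the longest valid name leaves room for the terminator only. */
static bool name_fits(const char *name)
{
	return strlen(name) < EVENT_NAME_LEN;
}
```

A 7-character name fits in an 8-byte buffer along with its terminator, yet
only the second check accepts it.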

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I172c5098597923d14508c150c7b3577f759bae72

2 years agoClean-up: use sizeof instead of repeating string length constant
Jérémie Galarneau [Wed, 16 Mar 2022 15:52:42 +0000 (11:52 -0400)] 
Clean-up: use sizeof instead of repeating string length constant

Looking into a number of coverity reports (false positives), I
identified a number of sites which use the maximal symbol length
constant when the actual size of an array can be used. This will prevent
mismatches in the future should the array sizes change.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia74f43d3871fdce60affbde068401b58c84b09ad

2 years agoFix: relayd: missing session unlock on error path
Jérémie Galarneau [Wed, 16 Mar 2022 14:48:59 +0000 (10:48 -0400)] 
Fix: relayd: missing session unlock on error path

1475890 Missing unlock May result in deadlock if there is another
attempt to acquire the lock.

In viewer_get_new_streams(relay_connection *): Missing a release of a
lock on a path (CWE-667)
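
The usual remedy for this class of defect is to route every exit through a
single unlock label. The sketch below assumes a hypothetical session
structure and session_add_streams() function; it is not the code of
viewer_get_new_streams().

```c
#include <pthread.h>

/* Hypothetical, minimal session object protected by a mutex. */
struct session {
	pthread_mutex_t lock;
	int stream_count;
};

/* Returns 0 on success, -1 on error. The lock is released on all paths. */
static int session_add_streams(struct session *session, int count)
{
	int ret = 0;

	pthread_mutex_lock(&session->lock);

	if (count < 0) {
		ret = -1;
		/* The error path shares the unlock with the success path. */
		goto end_unlock;
	}

	session->stream_count += count;

end_unlock:
	pthread_mutex_unlock(&session->lock);
	return ret;
}
```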

Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I66de344f5f39ac85bf8db93cf39a07d0c6cf7694

2 years agoBring compiler warning flags in line with other projects
Michael Jeanson [Mon, 7 Mar 2022 18:59:46 +0000 (13:59 -0500)] 
Bring compiler warning flags in line with other projects

Change-Id: I0281a357afbace553368cd01357bb2f21de3352d
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoPrepare for '-Wsign-compare'
Michael Jeanson [Tue, 8 Mar 2022 23:26:06 +0000 (18:26 -0500)] 
Prepare for '-Wsign-compare'

In preparation for '-Wextra'

Change-Id: I9a3b91009b2b44c0aeacfb37fa2b8b901be79992
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoPrepare for '-Wimplicit-fallthrough'
Michael Jeanson [Tue, 8 Mar 2022 23:16:06 +0000 (18:16 -0500)] 
Prepare for '-Wimplicit-fallthrough'

In preparation for '-Wextra'

Change-Id: Ice4c5aa7f6ce9107c88f38ec4024a4631589ad73
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoPrepare for '-Wmissing-field-initializers'
Michael Jeanson [Tue, 8 Mar 2022 23:05:26 +0000 (18:05 -0500)] 
Prepare for '-Wmissing-field-initializers'

In preparation for '-Wextra'

Change-Id: Ic593491ad44c1254f158b19659c3b9567d180ad1
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoPrepare for '-Wignored-qualifiers'
Michael Jeanson [Tue, 8 Mar 2022 16:33:30 +0000 (11:33 -0500)] 
Prepare for '-Wignored-qualifiers'

In preparation for '-Wextra'

Change-Id: I6734a105170da2d57480fb5e15cae839adc38e62
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoPrepare for '-Wunused-parameter'
Michael Jeanson [Tue, 8 Mar 2022 15:52:55 +0000 (10:52 -0500)] 
Prepare for '-Wunused-parameter'

In preparation for '-Wextra'

Change-Id: I30e6abb9502fc97daa565fde450d1e4235cf1ec7
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoconfigure: add '-Wredundant-decls' to warning flags
Michael Jeanson [Wed, 9 Mar 2022 16:32:09 +0000 (11:32 -0500)] 
configure: add '-Wredundant-decls' to warning flags

Change-Id: I5329ebe83aab40e6796b506c28e853b4af3c5e99
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoconfigure: add '-Wmissing-noreturn' to warning flags
Michael Jeanson [Wed, 9 Mar 2022 15:19:17 +0000 (10:19 -0500)] 
configure: add '-Wmissing-noreturn' to warning flags

Change-Id: I95a981348109d4614afcfe9c85f971e65afc2765
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoconfigure: add '-Wlogical-op' to warning flags
Michael Jeanson [Mon, 7 Mar 2022 20:59:30 +0000 (15:59 -0500)] 
configure: add '-Wlogical-op' to warning flags

Change-Id: I0516add62151b22352f96d1e62871a013b8fa6f3
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoconfigure: add '-Wundef' to warning flags
Michael Jeanson [Mon, 7 Mar 2022 19:21:21 +0000 (14:21 -0500)] 
configure: add '-Wundef' to warning flags

Change-Id: If47c16121b1679862e7a5f75fce70c7d9973e92e
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoconfigure: add '-Wnull-dereference' to warning flags
Michael Jeanson [Mon, 7 Mar 2022 19:02:29 +0000 (14:02 -0500)] 
configure: add '-Wnull-dereference' to warning flags

Change-Id: Ife5ad6963262c5c2715954fcd34c94015fb30aa6
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agofix: warning '-Wstringop-truncation' with GCC 11.2
Michael Jeanson [Mon, 7 Mar 2022 16:37:49 +0000 (11:37 -0500)] 
fix: warning '-Wstringop-truncation' with GCC 11.2

Building with GCC 11.2 results in the following warning :

  In file included from ../../src/common/tracker.h:18,
                 from ../../src/bin/lttng-sessiond/trace-ust.h:17,
                 from test_ust_data.cpp:19:
../../src/common/sessiond-comm/sessiond-comm.h:569:14: note: while referencing ‘lttng_event_exclusion::names’
  569 |         char names[0][LTTNG_SYMBOL_NAME_LEN];
      |              ^~~~~
test_ust_data.cpp:209:16: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 256 equals destination size [-Wstringop-truncation]
  209 |         strncpy(LTTNG_EVENT_EXCLUSION_NAME_AT(exclusion, 0),
      |         ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  210 |                 get_random_string(), LTTNG_SYMBOL_NAME_LEN);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_ust_data.cpp:211:16: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 256 equals destination size [-Wstringop-truncation]
  211 |         strncpy(LTTNG_EVENT_EXCLUSION_NAME_AT(exclusion, 1),
      |         ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  212 |                 get_random_string(), LTTNG_SYMBOL_NAME_LEN);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Change-Id: I78eea760b4684227ee457c3368c6397d0a767af5
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agofix: warning '-Wformat-overflow' with GCC 11.2
Michael Jeanson [Mon, 7 Mar 2022 16:28:19 +0000 (11:28 -0500)] 
fix: warning '-Wformat-overflow' with GCC 11.2

Building with GCC 11.2 results in the following warning :

  In file included from rotation-thread.cpp:11:
  In function 'int handle_job_queue(rotation_thread_handle*, rotation_thread*, rotation_thread_timer_queue*)',
      inlined from 'void* thread_rotation(void*)' at rotation-thread.cpp:844:27:
  ../../../src/common/error.h:139:32: warning: '%s' directive argument is null [-Wformat-overflow=]
    139 |                         fprintf((type) == PRINT_MSG ? stdout : stderr, fmt, ## args);   \
        |                         ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ../../../src/common/error.h:155:25: note: in expansion of macro '__lttng_print'
    155 |                         __lttng_print(type,                             \
        |                         ^~~~~~~~~~~~~
  ../../../src/common/error.h:195:27: note: in expansion of macro '_ERRMSG'
    195 | #define DBG(fmt, args...) _ERRMSG("DBG1", PRINT_DBG, fmt, ## args)
        |                           ^~~~~~~
  rotation-thread.cpp:587:25: note: in expansion of macro 'DBG'
    587 |                         DBG("Session \"%s\" not found",
        |                         ^~~

Use an empty string as the '%s' argument if 'session->name' is NULL.
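
A small sketch of the pattern, using a hypothetical format_not_found()
helper instead of the DBG() macro:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the message into 'buf', substituting an
 * empty string when 'name' is NULL so that '%s' never receives NULL.
 */
static int format_not_found(char *buf, size_t len, const char *name)
{
	return snprintf(buf, len, "Session \"%s\" not found",
			name ? name : "");
}
```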

Change-Id: Ibe29b43c0e8afd13b1c28770e8f7451340cc1e81
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoCleanup: DIST_SUBDIRS is redundant when using AM conditionals
Michael Jeanson [Mon, 22 Nov 2021 19:43:09 +0000 (14:43 -0500)] 
Cleanup: DIST_SUBDIRS is redundant when using AM conditionals

From automake's documentation[1] :

  If SUBDIRS is defined conditionally using Automake conditionals,
  Automake will define DIST_SUBDIRS automatically from the possible
  values of SUBDIRS in all conditions.

[1] https://www.gnu.org/software/automake/manual/html_node/SUBDIRS-vs-DIST_005fSUBDIRS.html

Change-Id: I8495f1f4452ccde4920ecd63bfd37de4eb10c281
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoFix: relayd: session id is ignored by 2.11+ create session command
Jérémie Galarneau [Thu, 10 Mar 2022 22:46:31 +0000 (17:46 -0500)] 
Fix: relayd: session id is ignored by 2.11+ create session command

The id of the session used by the sessiond is not returned by
cmd_create_session_2_11 and its caller sets the value in the
relay_session to an uninitialized value.

Up until recently this didn't have much effect as this uninitialized
value was stored and used to perform look-ups in the trace chunk
registry, which would work.

However, the recent multi-consumer rotation fixes make this problem more
significant as this 'id' is used as a key to join relay sessions
originating from the same session daemon.

This was discovered by enabling the '-Wunused-parameter' warning.
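
The general shape of the defect is an out-parameter that a handler never
writes. The sketch below uses a hypothetical request structure and
parse_create_session() function to show the corrected form; it does not
reproduce cmd_create_session_2_11.

```c
#include <stdint.h>

/* Hypothetical wire payload carrying the sessiond's session id. */
struct create_session_request {
	uint64_t sessiond_session_id;
};

/*
 * Hypothetical command handler: on success, every out-parameter is
 * written so the caller never stores an uninitialized value.
 */
static int parse_create_session(const struct create_session_request *req,
		uint64_t *id_out)
{
	if (!req || !id_out) {
		return -1;
	}

	*id_out = req->sessiond_session_id;
	return 0;
}
```

If the assignment to '*id_out' is omitted, 'id_out' becomes an unused
parameter, which is what '-Wunused-parameter' reports.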

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7e33f5f93dc46bb630c431408c9472be3a75c030

2 years agoFix: consumerd: use-after-free of metadata bucket
Jérémie Galarneau [Thu, 3 Mar 2022 00:27:31 +0000 (19:27 -0500)] 
Fix: consumerd: use-after-free of metadata bucket

Observed issue
==============

When consumer_stream_destroy() is called from, for example, the error
path in setup_metadata(), consumer_stream_free() can end up being called
twice on the same stream.  Since the stream->metadata_bucket is not set
to NULL after being destroyed, it leads to a use-after-free:

 ERROR: AddressSanitizer: heap-use-after-free on address 0x604000000318
 READ of size 8 at 0x604000000318 thread T7
     #0 in metadata_bucket_destroy
     #1 in consumer_stream_free
     #2 in consumer_stream_destroy
     #3 in setup_metadata
     #4 in lttng_ustconsumer_recv_cmd
     #5 in lttng_consumer_recv_cmd
     #6 in consumer_thread_sessiond_poll
     #7 in start_thread nptl/pthread_create.c:481
     #8 in clone (/lib/x86_64-linux-gnu/libc.so.6+0xfcbde)

 0x604000000318 is located 8 bytes inside of 48-byte region [0x604000000310,0x604000000340)
 freed by thread T7 here:
     #0 in __interceptor_free
     #1 in metadata_bucket_destroy
     #2 in consumer_stream_free
     #3 in consumer_stream_destroy
     #4 in clean_channel_stream_list
     #5 in consumer_del_channel
     #6 in consumer_stream_destroy
     #7 in setup_metadata
     #8 in lttng_ustconsumer_recv_cmd
     #9 in lttng_consumer_recv_cmd
     #10 in consumer_thread_sessiond_poll
     #11 in start_thread nptl/pthread_create.c:481

 previously allocated by thread T7 here:
     #0 in __interceptor_calloc
     #1 in zmalloc
     #2 in metadata_bucket_create
     #3 in consumer_stream_enable_metadata_bucketization
     #4 in lttng_ustconsumer_set_stream_ops
     #5 in lttng_ustconsumer_on_recv_stream
     #6 in lttng_consumer_on_recv_stream
     #7 in create_ust_streams
     #8 in ask_channel
     #9 in lttng_ustconsumer_recv_cmd
     #10 in lttng_consumer_recv_cmd
     #11 in consumer_thread_sessiond_poll
     #12 in start_thread nptl/pthread_create.c:481

 Thread T7 created by T0 here:
     #0 in __interceptor_pthread_create
     #1 in main
     #2 in __libc_start_main ../csu/libc-start.c:332

 SUMMARY: AddressSanitizer: heap-use-after-free in metadata_bucket_destroy

This can be easily reproduced by forcing a failure during the setup
of the metadata using the following change:

  diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c
  index fa1c71299..97ed59632 100644

  --- a/src/common/ust-consumer/ust-consumer.c
  +++ b/src/common/ust-consumer/ust-consumer.c
  @@ -908,8 +908,7 @@ static int setup_metadata(struct lttng_consumer_local_data *ctx, uint64_t key)

           /* Send metadata stream to relayd if needed. */
           if (metadata->metadata_stream->net_seq_idx != (uint64_t) -1ULL) {
  -                ret = consumer_send_relayd_stream(metadata->metadata_stream,
  -                                metadata->pathname);
  +                ret = -1;
                   if (ret < 0) {
                           ret = LTTCOMM_CONSUMERD_ERROR_METADATA;
                           goto error;

Cause
=====

Channels have a list of streams that are being "setup" and are not
yet monitored for consumption. During this setup phase, the streams are
owned by the channel. On destruction of the channel, any stream in that
list will thus be cleaned-up.

When destroying a consumer stream, a reference to its channel is 'put'.
This can result in the destruction of the channel.

In the situation described above, the release of the channel's reference
is done before the stream is removed from the channel's stream list.
This causes the channel's clean-up to invoke (again) the current
stream's clean-up, resulting in the double-free of the metadata bucket.

This problem is present in a number of error paths.

Solution
========

Some error paths already manually removed the consumer stream from its
channel's stream list before invoking consumer_stream_destroy(). The
various error paths that have to deal with this possible situation are
changed to simply invoke consumer_stream_destroy().

consumer_stream_destroy() is modified to always remove the stream from
its channel's list before performing the rest of the clean-up. This
ensures that those double clean-ups can't occur.
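
The ordering can be sketched with simplified stand-in types. All names
below (stream, channel, stream_destroy, channel_put, demo_setup_failure)
are hypothetical and the reference counting is reduced to the bare minimum;
the point is only that the stream leaves the channel's list before the
channel reference is released.

```c
#include <stdlib.h>

struct channel;

struct stream {
	struct channel *chan; /* owning channel, NULL once detached */
	struct stream *next;  /* link in the channel's setup list */
	void *bucket;         /* stands in for the metadata bucket */
};

struct channel {
	int refcount;
	struct stream *setup_list; /* streams still being set up */
};

static int bucket_free_count;

static void stream_destroy(struct stream *stream);

/* Drop a reference; the last one cleans up the streams still listed. */
static void channel_put(struct channel *chan)
{
	if (--chan->refcount > 0) {
		return;
	}

	while (chan->setup_list) {
		struct stream *stream = chan->setup_list;

		chan->setup_list = stream->next;
		stream->next = NULL;
		stream->chan = NULL; /* channel is going away: nothing to put */
		stream_destroy(stream);
	}

	free(chan);
}

static void stream_destroy(struct stream *stream)
{
	struct channel *chan = stream->chan;

	if (chan) {
		struct stream **link = &chan->setup_list;

		/* 1. Unlink from the channel's list before anything else. */
		while (*link && *link != stream) {
			link = &(*link)->next;
		}
		if (*link) {
			*link = stream->next;
		}
		stream->next = NULL;
		stream->chan = NULL;

		/* 2. Only then release the reference on the channel. */
		channel_put(chan);
	}

	free(stream->bucket);
	bucket_free_count++;
	free(stream);
}

/* Simulate the error path: returns how many times the bucket was freed. */
static int demo_setup_failure(void)
{
	struct channel *chan = calloc(1, sizeof(*chan));
	struct stream *stream = calloc(1, sizeof(*stream));

	if (!chan || !stream) {
		free(chan);
		free(stream);
		return -1;
	}

	stream->bucket = malloc(48);
	stream->chan = chan;
	chan->refcount = 1; /* the stream holds the only reference */
	chan->setup_list = stream;

	bucket_free_count = 0;
	stream_destroy(stream);
	return bucket_free_count;
}
```

If the two numbered steps were swapped, channel_put() would still find the
stream in the list and destroy it a second time.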

Drawbacks
=========

None.

Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Tested-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ibeca9b675b86fc46be3f57826f7158de4da43df8

2 years agoFix: ust-consumerd: leak of stream control structure
Jérémie Galarneau [Thu, 3 Mar 2022 22:52:33 +0000 (17:52 -0500)] 
Fix: ust-consumerd: leak of stream control structure

The following leak is reported by LeakSanitizer when
setup_metadata() fails to send the metadata stream to the relay
daemon:

  ==3050181==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 240 byte(s) in 5 object(s) allocated from:
      #0 0x7f5fce02cfb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
      #1 0x7f5fcdd95a7a in zmalloc ../../../src/common/macros.h:23
      #2 0x7f5fcdd95a7a in lttng_ust_ctl_create_stream /home/jgalar/EfficiOS/src/lttng-ust/src/lib/lttng-ust-ctl/ustctl.c:1649

A consumer stream can have an allocated
`struct lttng_ust_ctl_consumer_stream *` (ustream) even if it is
not globally visible at the time of its teardown.

In the case of the user space consumer, the only site that creates
consumer stream instances ensures that the allocation of the
lttng_ust_ctl_consumer_stream succeeded, ensuring that the
consumer stream's 'ustream' is always set.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia4be7969e85bd8812ae13b042e1e100812a63c1d

2 years agoFix: liblttng-ctl: erroneous flat size computation
Jérémie Galarneau [Fri, 4 Mar 2022 20:29:12 +0000 (15:29 -0500)] 
Fix: liblttng-ctl: erroneous flat size computation

compute_flattened_size() erroneously computes (over-estimates) the size
of the allocation required to hold the flat array of struct lttng_event
returned to the user by lttng_list_{events, syscalls, tracepoints}.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0a80ef0fa66428b7df34303804a024e80b635c69

2 years agofix: msgpack requires limits.h for UINT_MAX
Michael Jeanson [Thu, 5 Aug 2021 20:49:26 +0000 (16:49 -0400)] 
fix: msgpack requires limits.h for UINT_MAX

Building with '-Wundef' reveals this issue :

  unpack.c: In function ‘template_callback_array’:
  unpack.c:197:17: warning: "UINT_MAX" is not defined, evaluates to 0 [-Wundef]
    197 | #if SIZE_MAX == UINT_MAX
        |                 ^~~~~~~~
  unpack.c: In function ‘template_callback_map’:
  unpack.c:241:17: warning: "UINT_MAX" is not defined, evaluates to 0 [-Wundef]
    241 | #if SIZE_MAX == UINT_MAX
        |                 ^~~~~~~~
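
For reference, a minimal sketch of the same preprocessor comparison with
both macros defined. The function name is hypothetical; UINT_MAX comes from
<limits.h> and SIZE_MAX from <stdint.h>.

```c
#include <limits.h>  /* UINT_MAX */
#include <stdbool.h>
#include <stdint.h>  /* SIZE_MAX */

/* Hypothetical helper: true if size_t and unsigned int share a range. */
static bool size_t_matches_unsigned_int(void)
{
#if SIZE_MAX == UINT_MAX
	return true;
#else
	return false;
#endif
}
```

Without <limits.h>, UINT_MAX silently evaluates to 0 in the '#if', so the
comparison is always false regardless of the platform.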

Change-Id: I7dadd9f7013d613509f66e67ff1beb8ae593d2bf
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoSplit warning flags list for C and C++
Michael Jeanson [Thu, 3 Mar 2022 23:28:26 +0000 (18:28 -0500)] 
Split warning flags list for C and C++

When using Ccache [1], some flags specific to C are accepted by the C++
compiler but result in warning messages on each invocation of the
compiler. To remediate this, split the warning flags detection list in
three, a common base and a specific list for C and C++.

[1] https://github.com/ccache/ccache/issues/738

Change-Id: I9ef360efbfae445845ca1016e5f5eebdd3bdb0ac
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoTests: add a multi-domain ust+kernel rotation test
Jérémie Galarneau [Thu, 13 Jan 2022 20:38:06 +0000 (15:38 -0500)] 
Tests: add a multi-domain ust+kernel rotation test

Validate that multi-domain rotations work as intended for both local and
remote outputs. This validates the fix introduced by c5c793.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I91ddd313fcfbc5421db281baea34ba2d0aae2c82

2 years agoFix: relayd: connection abruptly closed on viewer stream creation failure
Jérémie Galarneau [Wed, 2 Mar 2022 17:59:17 +0000 (12:59 -0500)] 
Fix: relayd: connection abruptly closed on viewer stream creation failure

Commit fe88e5175 explains (and fixes) an issue that could cause the
creation of viewer streams to fail. Currently, the error path causes the
relay daemon to abruptly close the connection to its live viewer peer.
This behaviour makes it impossible for the viewer to determine if an
error occurred or if the network connection simply failed.

Returning an `LTTNG_VIEWER_NEW_STREAMS_ERR` status code allows the
viewer to report a precise error. The viewer connection is closed since
the internal error is unlikely to be recoverable.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I557a8b66c1bd1c0bf361cfbabe962d8a6808f4f4

2 years agoFix: relayd: live client fails on clear of multi-domain session
Jérémie Galarneau [Wed, 2 Mar 2022 17:37:39 +0000 (12:37 -0500)] 
Fix: relayd: live client fails on clear of multi-domain session

Observed issue
==============

Two test cases of the clear/test_ust test suite occasionally fail in the
integration jobs testing cross-bitness (32/64) LTTng deployments.

Babeltrace fails with the following error when a clear occurs while a
live client consumes a trace:

  02-28 16:55:03.262 32362 32362 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_recv@viewer-connection.c:198 [lttng-live] Remote side has closed connection
  02-28 16:55:03.262 32362 32362 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_session_get_new_streams@viewer-connection.c:1706 [lttng-live] Error receiving get new streams reply
  02-28 16:55:03.262 32362 32362 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1665 [lttng-live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
  02-28 16:55:03.262 32362 32362 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:864 Component input port message iterator's "next" method failed: iter-addr=0x55eab7eb1170, iter-upstream-comp-name="lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
  02-28 16:55:03.262 32362 32362 E PLUGIN/FLT.UTILS.MUXER muxer_upstream_msg_iter_next@muxer.c:454 [muxer] Upstream iterator's next method returned an error: status=ERROR
  02-28 16:55:03.262 32362 32362 E PLUGIN/FLT.UTILS.MUXER validate_muxer_upstream_msg_iters@muxer.c:991 [muxer] Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x55eab7eb1120, muxer-upstream-msg-iter-wrap-addr=0x55eab7eb3a70
  02-28 16:55:03.262 32362 32362 E PLUGIN/FLT.UTILS.MUXER muxer_msg_iter_next@muxer.c:1415 [muxer] Cannot get next message: comp-addr=0x55eab7eb0470, muxer-comp-addr=0x55eab7eb0510, muxer-msg-iter-addr=0x55eab7eb1120, msg-iter-addr=0x55eab7eb0fb0, status=ERROR
  02-28 16:55:03.262 32362 32362 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:864 Component input port message iterator's "next" method failed: iter-addr=0x55eab7eb0fb0, iter-upstream-comp-name="muxer", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
  02-28 16:55:03.262 32362 32362 W LIB/GRAPH consume_graph_sink@graph.c:473 Component's "consume" method failed: status=ERROR, comp-addr=0x55eab7eb0760, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=1, comp-class-so-handle-addr=0x55eab7ebd910, comp-class-so-handle-path="/root/workspace/joraj_integration_base_job/deps-64/build/lib/babeltrace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0
  02-28 16:55:03.262 32362 32362 E CLI cmd_run@babeltrace2.c:2548 Graph failed to complete successfully

  ERROR:    [Babeltrace CLI] (babeltrace2.c:2548)
    Graph failed to complete successfully
  CAUSED BY [libbabeltrace2] (graph.c:473)
    Component's "consume" method failed: status=ERROR, comp-addr=0x55eab7eb0760,
    comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK,
    comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages
    (`text` fo", comp-class-is-frozen=1, comp-class-so-handle-addr=0x55eab7ebd910,
    comp-class-so-handle-path="/root/workspace/joraj_integration_base_job/deps-64/build/lib/babeltrace2/plugins/babeltrace-plugin-text.so",
    comp-input-port-count=1, comp-output-port-count=0
  CAUSED BY [libbabeltrace2] (iterator.c:864)
    Component input port message iterator's "next" method failed:
    iter-addr=0x55eab7eb0fb0, iter-upstream-comp-name="muxer",
    iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER,
    iter-upstream-comp-class-name="muxer",
    iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu",
    iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
  CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:991)
    Cannot validate muxer's upstream message iterator wrapper:
    muxer-msg-iter-addr=0x55eab7eb1120,
    muxer-upstream-msg-iter-wrap-addr=0x55eab7eb3a70
  CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:454)
    Upstream iterator's next method returned an error: status=ERROR
  CAUSED BY [libbabeltrace2] (iterator.c:864)
    Component input port message iterator's "next" method failed:
    iter-addr=0x55eab7eb1170, iter-upstream-comp-name="lttng-live",
    iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE,
    iter-upstream-comp-class-name="lttng-live",
    iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon",
    iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
  CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:1665)
    Error preparing the next batch of messages:
    live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
  CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (viewer-connection.c:1706)
    Error receiving get new streams reply
  CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (viewer-connection.c:198)
    Remote side has closed connection

Looking at the relay daemon logs, we see the following error:
  DBG1 - 16:55:03.262106718 [32139/32146]: Adding new file "ust/pid/gen-ust-events-32373-20220228-165503/chan_0" to trace chunk "(unnamed)" (in lttng_trace_chunk_add_file() at trace-chunk.cpp:1310)
  PERROR - 16:55:03.262133333 [32139/32146]: Failed to open fs handle to ust/pid/gen-ust-events-32373-20220228-165503/chan_0, open() returned: No such file or directory (in fd_tracker_open_fs_handle() at fd-tracker/fd-tracker.cpp:548)

Cause
=====

Adding more debugging logging allows us to see the following situation
takes place:

- relay thread: Create trace chunk on session 1.
- live thread: get new streams against session 1, returns NO_NEW_STREAMS
  since the session has an 'ongoing_rotation'.
- live thread: get new streams against session 2, sees no rotation
  ongoing and attempts to open `chan_0` when creating a viewer stream

The "ongoing rotation" check was introduced in a7ceb342d and, in a
nutshell, prevents live viewers from creating new viewer streams during
a rotation.

The "ongoing rotation" state is entered when a CREATE_NEW_TRACE_CHUNK
command is issued against a session.

However, this presumes that a relay_session maps 1:1 to a session on the
session daemon's end. This isn't the case as, in multi-domain
scenarios (tracing 32-bit, 64-bit, and kernel events), a single session
daemon session can map to multiple relay_session objects. This is
because the consumer daemons maintain independent connections to the
relay daemon.

To synchronize rotations across related relay_session instances, the
relay daemon uses the same trace chunk instances across relay_session
instances. This means that while a trace chunk is created against a
specific relay session, it can be used by other relay_session instances.

To manage shared trace chunks between relay_sessions, the relay daemon
makes use of the trace_chunk_registry. This registry allows
relay_sessions to share trace chunk instances using a unique key tuple:
  - session daemon instance uuid,
  - session daemon session id,
  - trace chunk id.

There is no equivalent mechanism to track the "ongoing_rotation" state
across relay_sessions originating from the same sessiond session.

In the current scenario, this causes the live client to correctly see
that no new streams are available for session 1 (say, the 32-bit user
space session). Unfortunately, this state is not entered for other
sessions (64-bit and kernel relay sessions). Hence, the viewer succeeds
in acquiring new streams from session 2, exposing the race the 'ongoing
rotation' state aims to protect against.

Solution
========

Like the trace chunk instances, the "ongoing rotation" state must be
shared across relay sessions that originate from the same session
daemon session.

To "emulate" this shared state, session_has_ongoing_rotation() checks
if any relay session originating from the same sessiond session
has an ongoing rotation. If it is the case, we temporarily prevent
live viewers from acquiring new streams.
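
As an illustration only, the check described above could be sketched as
follows. The struct and function names are hypothetical stand-ins, not
the relay daemon's actual code, and all locking is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the relay daemon's session object. */
struct relay_session_sketch {
	uint8_t sessiond_uuid[16];    /* session daemon instance uuid */
	uint64_t sessiond_session_id; /* session daemon session id */
	bool ongoing_rotation;        /* set while a new trace chunk is created */
};

/* Two relay sessions are related if they come from the same sessiond session. */
bool same_sessiond_session_sketch(const struct relay_session_sketch *a,
		const struct relay_session_sketch *b)
{
	return a->sessiond_session_id == b->sessiond_session_id &&
			memcmp(a->sessiond_uuid, b->sessiond_uuid,
					sizeof(a->sessiond_uuid)) == 0;
}

/*
 * Returns true if any session in `all` that originates from the same
 * sessiond session as `session` has an ongoing rotation. `all` is assumed
 * to contain `session` itself.
 */
bool has_ongoing_rotation_sketch(const struct relay_session_sketch *session,
		const struct relay_session_sketch *all, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (all[i].ongoing_rotation &&
				same_sessiond_session_sketch(session, &all[i])) {
			return true;
		}
	}
	return false;
}
```

With this shape, a viewer asking for new streams on the 64-bit relay
session is refused as long as the related 32-bit relay session is
rotating.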

Known drawbacks
===============

session_has_ongoing_rotation() iterates over all sessions, acquiring
their lock in the process, which is certainly undesirable from a
performance standpoint.

Optimizing this is not a great challenge, but is beyond the scope
of this immediate fix.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I457a32fa497d151ca933c25433c80665268a7c1c

2 years agoFix: rotation: hang on destroy when using scheduled rotation based on timer
Jonathan Rajotte [Mon, 14 Feb 2022 16:23:28 +0000 (11:23 -0500)] 
Fix: rotation: hang on destroy when using scheduled rotation based on timer

Observed issue
==============

The following scenario results in a hang for `lttng destroy`:

lttng create test
lttng enable-event -u -a
lttng enable-rotation --timer 100000
lttng start
lttng stop
lttng start
lttng destroy

Cause
=====

There is an imbalance in how many times we start the rotation timer.

The rotation timer is only removed on `lttng destroy` or when disabling
a time-based rotation. On the other hand, the timer is "started"
on `lttng start` and when enabling a time-based rotation.

The imbalance emerging from a start/stop/start sequence would prevent the
teardown of the session object since each time the timer is started a
reference to the session is held.

Solution
========

Do not start the rotation schedule timer if it was already launched.
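
A minimal sketch of the idea, assuming the timer owns exactly one
session reference while it is running. The struct, its fields and the
function names are hypothetical and only mirror the description above.

```c
#include <stdbool.h>

/* Hypothetical, simplified session: only what the sketch needs. */
struct session_sketch {
	unsigned int refcount;
	bool rotation_schedule_timer_enabled;
};

/* Start the timer at most once; a running timer holds one reference. */
void rotation_timer_start_sketch(struct session_sketch *session)
{
	if (session->rotation_schedule_timer_enabled) {
		/* Already launched: do not take a second reference. */
		return;
	}
	session->refcount++;
	session->rotation_schedule_timer_enabled = true;
}

/* Stop the timer and drop the reference it was holding, if any. */
void rotation_timer_stop_sketch(struct session_sketch *session)
{
	if (!session->rotation_schedule_timer_enabled) {
		return;
	}
	session->rotation_schedule_timer_enabled = false;
	session->refcount--;
}
```

Under these assumptions, a start/stop/start sequence followed by a
single timer removal leaves the reference count balanced.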

Known drawbacks
===============

None.

Change-Id: Ic5b8938166358fe7629187bebdf02a09e90846c0
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoFix: consumerd: fd still open after `lttng snapshot record` returns
Jonathan Rajotte [Wed, 9 Feb 2022 19:23:18 +0000 (14:23 -0500)] 
Fix: consumerd: fd still open after `lttng snapshot record` returns

Observed issue
==============

Using a snapshot output located on a pramfs mount:

  lttng snapshot record
  rm -rf /my_mount/my_trace_output

`rm` fails with ENOTEMPTY on rmdir for /my_mount/my_trace_output.

At that point, the lttng-consumerd daemon has an open fd on:
  /my_mount/my_trace_output/ust

Note that a sleep between both commands "fixes" the issue.

Cause
=====

The reclaim for the in-registry trace chunks can happen after the LTTng
CLI returns since we use `call_rcu`.

```
static
void lttng_trace_chunk_release(struct urcu_ref *ref)

....

  if (chunk->in_registry_element) {
    struct lttng_trace_chunk_registry_element *element;

    element = container_of(chunk, typeof(*element), chunk);
    if (element->registry) {
      rcu_read_lock();
      cds_lfht_del(element->registry->ht, &element->trace_chunk_registry_ht_node);
      rcu_read_unlock();
->    call_rcu(&element->rcu_node, free_lttng_trace_chunk_registry_element);
   } else {

```

The delayed reclaim of the `lttng_trace_chunk_registry_element` can
result in lttng-consumerd holding an open fd for the "chunk directory"
of the chunk since the close() is only done during the "*fini" phase of
the chunk (`lttng_trace_chunk_fini`).

Solution
========

Considering that the rcu lookup+refcount access scheme is used for the
trace chunk object and that at that point the refcount for the trace
chunk object is effectively zero, we can move the
`lttng_trace_chunk_fini` safely outside of the
`free_lttng_trace_chunk_registry_element` call_rcu call.
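
The ordering can be illustrated without liburcu. In the sketch below,
`defer_sketch()` is a hypothetical stand-in for `call_rcu()` that runs
its callback at some later point; every name is made up for the
example. Only the memory reclaim is deferred, the fd is closed as soon
as the last reference is dropped.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for a trace chunk holding a directory fd. */
struct chunk_sketch {
	int dirfd;
};

/* Stand-in for call_rcu(): remembers one callback to run "later". */
static void (*pending_cb)(void *);
static void *pending_arg;

void defer_sketch(void (*cb)(void *), void *arg)
{
	pending_cb = cb;
	pending_arg = arg;
}

/* Stand-in for the end of a grace period: run the deferred callback. */
void run_deferred_sketch(void)
{
	if (pending_cb) {
		pending_cb(pending_arg);
		pending_cb = NULL;
		pending_arg = NULL;
	}
}

/* "fini" phase: release OS resources such as the directory fd. */
void chunk_fini_sketch(struct chunk_sketch *chunk)
{
	if (chunk->dirfd >= 0) {
		close(chunk->dirfd);
		chunk->dirfd = -1;
	}
}

static void chunk_free_cb_sketch(void *arg)
{
	free(arg);
}

/* Called when the reference count of the chunk reaches zero. */
void chunk_release_sketch(struct chunk_sketch *chunk)
{
	chunk_fini_sketch(chunk);                  /* fd closed right away */
	defer_sketch(chunk_free_cb_sketch, chunk); /* only free() is deferred */
}
```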

Known drawbacks
===============

Even if this solves the current situation, it is important to note that
the actual object holding the reference is itself refcounted and only
closes the fd on release. This means that we are still exposed to this
problem if at some point the directory handle is shared and outlives the
trace chunk for some reason in the future.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6da3948824bf8b092fc8248b1bb0263fdd5887be

2 years agoFix: event: unchecked return value for lttng_strncpy
Jonathan Rajotte [Wed, 2 Mar 2022 15:50:16 +0000 (10:50 -0500)] 
Fix: event: unchecked return value for lttng_strncpy

Reported by Coverity:

  CID 1475806: Unchecked return value (CHECKED_RETURN)
  Calling lttng_strncpy without checking return value.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ib2b16bef61bcc24fea381c18fd8f914e9a84a6e7

2 years agoFix: event: format specifier for ssize_t
Jonathan Rajotte [Wed, 2 Mar 2022 15:36:11 +0000 (10:36 -0500)] 
Fix: event: format specifier for ssize_t

Observed issue
==============

  In file included from event.cpp:15:
  event.cpp: In function ‘ssize_t lttng_event_create_from_payload(lttng_payload_view*, lttng_event**, lttng_event_exclusion**, char**, lttng_bytecode**)’:
  ../../src/common/error.h:191:28: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘ssize_t’ {aka ‘int’} [-Wformat=]
    191 |  __lttng_print(PRINT_WARN, "Warning: " fmt "\n", ## args)
        |                            ^~~~~~~~~~~~~~~~~~~~
  ../../src/common/error.h:139:51: note: in definition of macro ‘__lttng_print’
    139 |    fprintf((type) == PRINT_MSG ? stdout : stderr, fmt, ## args); \
        |                                                   ^~~
  event.cpp:624:4: note: in expansion of macro ‘WARN’
    624 |    WARN("Userspace probe location from the received buffer is not the advertised length: header length = %" PRIu32 ", payload length = %lu", event_comm->userspace_probe_location_len, ret);
        |    ^~~~

Solution
========

Although there is no "canonical" way of printing ssize_t, use '%zd' since
we already make use of it elsewhere.
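
For illustration, a standalone formatting helper using the same
specifiers as the warning message above. The helper name is
hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

/* Format a uint32_t header length and an ssize_t payload length. */
int format_lengths_sketch(char *buf, size_t buf_len, uint32_t header_len,
		ssize_t payload_len)
{
	/* PRIu32 matches uint32_t; %zd is used for ssize_t on any bitness. */
	return snprintf(buf, buf_len,
			"header length = %" PRIu32 ", payload length = %zd",
			header_len, payload_len);
}
```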

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Id41e6ccf07bd580813f169b65d281a4fa305fb48

2 years agoFix: sessiond: use after free
Jonathan Rajotte [Wed, 2 Mar 2022 15:24:29 +0000 (10:24 -0500)] 
Fix: sessiond: use after free

Introduced by 88b02a8e7cbdbc728cc681cd07bbe0c738964717

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9b1c3eeed7e51976af1a4982fc522066b3b29461

2 years agoFix: sessiond: `break` used instead of `goto`
Jonathan Rajotte [Tue, 1 Mar 2022 15:35:44 +0000 (10:35 -0500)] 
Fix: sessiond: `break` used instead of `goto`

On error path, use `goto`.

Reported by Coverity:

    CID 1475805: Unused value (UNUSED_VALUE)
    Assigning value LTTNG_ERR_UNK to ret_code here, but that stored value is overwritten before it can be used.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ife3b5486b8463242dfecb61a3a9718e8eb7149b4

2 years agoFix: sessiond: lttng_channel object is not reclaimed
Jonathan Rajotte [Tue, 1 Mar 2022 15:31:33 +0000 (10:31 -0500)] 
Fix: sessiond: lttng_channel object is not reclaimed

The reclaim is also missing on the error path.

Reported by Coverity:

    CID 1475807: Resource leak (RESOURCE_LEAK)
    Variable channel going out of scope leaks the storage it points to.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9ae59900f6bc33282b8b8a65b834a718428a7f50

2 years agoFix: common: local_attr might leak
Jonathan Rajotte [Tue, 1 Mar 2022 15:27:37 +0000 (10:27 -0500)] 
Fix: common: local_attr might leak

Reported by Coverity:

    CID 1475808: Resource leak (RESOURCE_LEAK)
    Variable local_attr going out of scope leaks the storage it points to.

    CID 1475810: Resource leak (RESOURCE_LEAK)
    Variable local_attr going out of scope leaks the storage it points to.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ibe2ee6516618827b6f52676d9a9f7d510cadf1bc

2 years agoFix: session: end goto label used for error path
Jonathan Rajotte [Tue, 1 Mar 2022 15:23:24 +0000 (10:23 -0500)] 
Fix: session: end goto label used for error path

Reported by Coverity:

    CID 1475801: Unused value (UNUSED_VALUE)
    Assigning value LTTNG_ERR_FATAL to ret_code here, but that stored value is overwritten before it can be used

    CID 1475802: Unused value (UNUSED_VALUE)
    Assigning value LTTNG_ERR_NOMEM to ret_code here, but that stored value is overwritten before it can be used.

    CID 1475814: Unused value (UNUSED_VALUE)
    Assigning value LTTNG_ERR_UST_CHAN_NOT_FOUND to ret_code here, but that stored value is overwritten before it can be used.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I79fb5d65ea0f4f3dfbbef2a42ec3dcf0542043c5

2 years agoFix: sessiond: event, filter expression, bytecode, exclusions might leak
Jonathan Rajotte [Tue, 1 Mar 2022 15:18:46 +0000 (10:18 -0500)] 
Fix: sessiond: event, filter expression, bytecode, exclusions might leak

Reported by Coverity:

    CID 1475803: Resource leak (RESOURCE_LEAK)
    Variable filter_expression going out of scope leaks the storage it points to.

    CID 1475816: Resource leak (RESOURCE_LEAK)
    Variable exclusions going out of scope leaks the storage it points to.

    CID 1475817: Resource leak (RESOURCE_LEAK)
    Variable bytecode going out of scope leaks the storage it points to.

    CID 1475820: Resource leak (RESOURCE_LEAK)
    Variable event going out of scope leaks the storage it points to.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I705e3cc1a1894af52d6ddeee94798a887dc82bba

2 years agoFix: lttng-sessiond: event_context might leak
Jonathan Rajotte [Tue, 1 Mar 2022 14:52:21 +0000 (09:52 -0500)] 
Fix: lttng-sessiond: event_context might leak

If the parsing completed but is not the expected size, the
*out_event_context object leaks.

Reported by Coverity:

    CID 1475821 (#1 of 1): Resource leak (RESOURCE_LEAK)
    Variable event_context going out of scope leaks the storage it points to.

Change-Id: I8198f7f3f1e1af3dcb377d89674402ec49cb788b
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 years agoFix: lttng-ctl: uninitialized pointer read
Jonathan Rajotte [Tue, 1 Mar 2022 14:09:04 +0000 (09:09 -0500)] 
Fix: lttng-ctl: uninitialized pointer read

Reported by Coverity:

    CID 1475822: Uninitialized pointer read (UNINIT)
    Using uninitialized value reply._fd_handles.array.size when calling lttng_payload_reset.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ied7f5de61d736e8d8f9c869a36322262a4d79cd5

2 years agoFix: lttng-sessiond: nb_events is uninitialized
Jonathan Rajotte [Tue, 1 Mar 2022 14:03:45 +0000 (09:03 -0500)] 
Fix: lttng-sessiond: nb_events is uninitialized

Reported by Coverity:

    CID 1475823: Uninitialized scalar variable (UNINIT)
    Using uninitialized value nb_events.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I623b1ed0c1ef9b4acddb1e46e7311dfcdc974629

2 years agoFix: ser/des: missing null terminator on payload append
Jonathan Rajotte [Tue, 1 Mar 2022 14:50:06 +0000 (09:50 -0500)] 
Fix: ser/des: missing null terminator on payload append

Reported by the lttng-ust-java-tests_master_build CI job.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I47b048c65a9855347c9b02fe44d9fdbae7e06dbf

2 years agoFix: test: test plan/skip must come after TAP initialization
Jonathan Rajotte [Tue, 1 Mar 2022 17:33:53 +0000 (12:33 -0500)] 
Fix: test: test plan/skip must come after TAP initialization

TAP is initialized during the sourcing of `utils.sh`. Calling the test
plan/skip functions before that point yields nothing.

Move the platform validation after the `PLAN_TEST` call.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ib7b4434440ed7590b39b0580ea294045ace40495

2 years agoFix: test: missing test plan for non-x86/arm platforms (i.e powerpc)
Jonathan Rajotte [Tue, 1 Mar 2022 02:30:44 +0000 (21:30 -0500)] 
Fix: test: missing test plan for non-x86/arm platforms (i.e powerpc)

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9a34a0b45fd011943436ce6e45c8f0d1666df26f

2 years agoFix: remove debugging output
Jonathan Rajotte [Tue, 1 Mar 2022 02:23:53 +0000 (21:23 -0500)] 
Fix: remove debugging output

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7bcdb7c47f811cf99dee1e4e483cc22847f481c5

2 years agoFix: lttng: truncated addresses and offsets on 32-bit builds
Jérémie Galarneau [Wed, 23 Feb 2022 22:40:06 +0000 (17:40 -0500)] 
Fix: lttng: truncated addresses and offsets on 32-bit builds

The lttng client parses hexadecimal addresses using, at some point,
strtoul(). Using this function effectively caps addresses and
offsets to UINT32_MAX, resulting in failures to enable kprobes
against a 64-bit kernel using a 32-bit client.
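
A minimal sketch of a width-independent parser, assuming `strtoull()`
is an acceptable replacement. The helper name is hypothetical and this
is not necessarily how the client was fixed.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a decimal, octal or hexadecimal string into 64 bits, regardless
 * of the width of `unsigned long`. Returns 0 on success, -1 on error.
 */
int parse_address_sketch(const char *str, uint64_t *address)
{
	char *end;
	unsigned long long value;

	errno = 0;
	value = strtoull(str, &end, 0);
	if (errno != 0 || end == str || *end != '\0') {
		/* Out of range, empty input, or trailing characters. */
		return -1;
	}

	*address = (uint64_t) value;
	return 0;
}
```

`unsigned long long` is at least 64 bits wide on every conforming
implementation, so a kernel address such as 0xffffffff81000000 survives
the conversion on a 32-bit build.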

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: If619e9e84413de5cd32d8c06f363152caaf5ac46

2 years agoFix: liblttng-ctl comm: lttng_event_field is not packed
Jonathan Rajotte [Wed, 12 Jan 2022 23:18:08 +0000 (18:18 -0500)] 
Fix: liblttng-ctl comm: lttng_event_field is not packed

Observed issue
==============

For MI testing where the lttng-sessiond is 64 bit and the lttng CLI is
32 bit, the tracepoint field listing fails with partial garbage output.

The size of the struct differs between bitness for x86-64 and x86
leading to serialization/deserialization problem across client
(liblttng-ctl) and lttng-sessiond.

sizeof(struct lttng_event_field):

  x86: 1136
  x86-64: 1144

The struct cannot be marked as LTTNG_PACKED since it is part of the API.

Solution
========

Adopt a similar pattern to the new APIs with a "serialize" &
"create_from_buffer" approach. The only particularity is that we need to
flatten the event_field on listing.

Most of the complexity is moved to `src/common/event.c`
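
The general shape of such a "serialize" / "create_from_buffer" pair can
be sketched as follows. The structs and functions are hypothetical and
much smaller than `struct lttng_event_field`; the sketch assumes both
ends share the same byte order and relies on the GCC/Clang `packed`
attribute.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical public struct: its layout follows the ABI's alignment rules. */
struct api_field_sketch {
	char name[16];
	int32_t type;
	int64_t value; /* typically 8-byte aligned on x86-64, 4-byte on x86 */
};

/* Wire format: fixed-width members, packed, identical on every bitness. */
struct wire_field_sketch {
	char name[16];
	int32_t type;
	int64_t value;
} __attribute__((packed));

/* Returns the number of bytes written to `buf`, or 0 if it is too small. */
size_t field_serialize_sketch(const struct api_field_sketch *field,
		uint8_t *buf, size_t buf_len)
{
	struct wire_field_sketch wire;

	if (buf_len < sizeof(wire)) {
		return 0;
	}
	memset(&wire, 0, sizeof(wire));
	memcpy(wire.name, field->name, sizeof(wire.name));
	wire.type = field->type;
	wire.value = field->value;
	memcpy(buf, &wire, sizeof(wire));
	return sizeof(wire);
}

/* Fills `field` from `buf`. Returns 0 on success, -1 if `buf` is too short. */
int field_create_from_buffer_sketch(const uint8_t *buf, size_t buf_len,
		struct api_field_sketch *field)
{
	struct wire_field_sketch wire;

	if (buf_len < sizeof(wire)) {
		return -1;
	}
	memcpy(&wire, buf, sizeof(wire));
	memset(field, 0, sizeof(*field));
	memcpy(field->name, wire.name, sizeof(field->name));
	field->type = wire.type;
	field->value = wire.value;
	return 0;
}
```

Only `struct wire_field_sketch` (28 bytes) crosses the socket; the
public struct keeps whatever layout the compiler gives it on each side.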

Known drawbacks
=========

None.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I280d9809d110237574e2606ee93a7aeba41e704e

2 years agoFix: liblttng-ctl comm: lttng_event_context is not packed
Jonathan Rajotte [Wed, 12 Jan 2022 23:18:08 +0000 (18:18 -0500)] 
Fix: liblttng-ctl comm: lttng_event_context is not packed

Observed issue
==============

The size of the struct differs between bitness for x86-64 and x86
leading to serialization/deserialization problem across client
(liblttng-ctl) and lttng-sessiond.

sizeof(struct lttng_event_context):

  x86: 308
  x86-64: 312

The struct cannot be marked as LTTNG_PACKED since it is part of the API.

Solution
========

Adopt a similar pattern to the new API with a "serialize" &
"create_from_buffer" approach.

Most of the complexity is moved to `src/common/event.c`

Known drawbacks
=========

None.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ieb400eab2a2df4070ff51cb2b44929d3ea945ce4

2 years agoFix: liblttng-ctl comm: lttng_event is not packed
Jonathan Rajotte [Wed, 12 Jan 2022 23:18:08 +0000 (18:18 -0500)] 
Fix: liblttng-ctl comm: lttng_event is not packed

Observed issue
==============

In `lttcomm_session_msg` the lttng_event struct is marked as
LTTNG_PACKED. This statement has no effect as explained in commit [1].

Solution
========

Adopt a similar pattern to the new API with a "serialize" &
"create_from_buffer" approach.

Most of the complexity is moved to `src/common/event.c`

Known drawbacks
=========

None.

Note
====

Jérémie Galarneau: This patch was extensively modified from the original
patch applying against stable-2.12 to accommodate the use of the
lttng_payload utils throughout the liblttng-ctl <-> lttng-sessiond
communication code.

Some changes were also made to build as C++.

Reference
=========
[1] https://review.lttng.org/gitweb?p=lttng-tools.git;a=commit;h=7bd95aee4660c6419a4a65429fc27754481e7e90

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I35d848519dacb2b119324e88f262aa95951e4ac6

2 years agolibcommon: move event.c to libcommon-lgpl
Jonathan Rajotte [Sat, 29 Jan 2022 00:29:58 +0000 (19:29 -0500)] 
libcommon: move event.c to libcommon-lgpl

The `event.c` license is already LGPL. There is no technical reason why
it was not part of the lgpl side of libcommon, simply that nothing that
is LGPL needed it. This will change in upcoming commits with the
addition of ser/des functions of `struct lttng_event` and other structs
related to `lttng_event` for liblttng-ctl.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1e15a547e526198f971a287a726d0e6229a733b0

2 years agoFix: liblttng-ctl comm: lttng_channel is not packed
Jonathan Rajotte [Wed, 12 Jan 2022 23:18:08 +0000 (18:18 -0500)] 
Fix: liblttng-ctl comm: lttng_channel is not packed

Observed issue
==============

The size of the struct differs between bitness for x86-64 and x86
leading to serialization/deserialization problem across client
(liblttng-ctl) and lttng-sessiond.

sizeof(struct lttng_channel):

  x86: 608
  x86-64: 624

The struct cannot be marked as LTTNG_PACKED since it is part of the API.

Solution
========

Adopt a similar pattern to the new API with a "serialize" &
"create_from_buffer" approach. The only particularity is that we need to
flatten the channels on listing.

Most of the complexity is moved to `src/common/channel.c`

Known drawbacks
=========

None.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Id5c9aaf3cf8b3d739b71263c02cae8d4d2fedfe3

2 years agoFix: conversion from KB to bytes overflow on arm32
Jonathan Rajotte [Thu, 27 Jan 2022 19:22:22 +0000 (14:22 -0500)] 
Fix: conversion from KB to bytes overflow on arm32

Observed issue
==============

On enable channel the memory available check fails on arm32 when
available memory, in bytes, is larger than 2^32.

Cause
=====

`read_proc_meminfo_field` converts the read value (in KB) to bytes and
stores it into a size_t variable.

On the system running the reproducer the value of the `value_kb` variable
is 4839692, yielding an overflow when multiplied with 1024 since
`size_t` is 32 bits long. `size_t` can be larger in certain situations
(e.g. LARGEFILE) but this is irrelevant to the problem at hand.

Solution
========

Convert all the checks to use uint64_t.
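
A minimal sketch of a 64-bit conversion with an explicit overflow
check. The helper name and return convention are hypothetical.

```c
#include <stdint.h>

#define BYTES_PER_KB_SKETCH UINT64_C(1024)

/*
 * Convert a value in KB to bytes using 64-bit arithmetic.
 * Returns 0 on success, -1 if the result would not fit in a uint64_t.
 */
int kb_to_bytes_sketch(uint64_t value_kb, uint64_t *value_bytes)
{
	if (value_kb > UINT64_MAX / BYTES_PER_KB_SKETCH) {
		return -1;
	}

	*value_bytes = value_kb * BYTES_PER_KB_SKETCH;
	return 0;
}
```

With the value from the reproducer, 4839692 KB is 4955844608 bytes,
which exceeds 2^32 (4294967296); the same multiplication carried out in
32 bits wraps around to 660877312.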

Known drawbacks
=========

None.

References
==========

The multiplication overflow check scheme is borrowed from
`src/common/time.c`

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I067da25659ab4115e5494e48aab45a1c35f56652

2 years agoFix: Unexpected payload size in cmd_recv_stream_2_11
Jonathan Rajotte [Tue, 11 Jan 2022 17:28:15 +0000 (12:28 -0500)] 
Fix: Unexpected payload size in cmd_recv_stream_2_11

Observed issue
==============

For the following scenario:

 lttng-relayd: 64 bit
 lttng-sessiond: 64 bit
 lttng-consumerd: 32 bit
 application: 32 bit

 Commands
   lttng create --set-url=net://127.0.0.1
   lttng enable-event -u -a
   lttng start
   ./application

On application start the lttng-relayd reports this error:

  DEBUG1 - 14:16:38.216442600 [2004731/2004735]: Done receiving control command payload: fd = 19, payload size = 4376 bytes (in relay_process_control_receive_payload() at main.c:3456)
  DEBUG3 - 14:16:38.216469462 [2004731/2004735]: Processing "RELAYD_ADD_STREAM" command for socket 19 (in relay_process_control_command() at main.c:3327)
  Error: Unexpected payload size in "cmd_recv_stream_2_11": expected >= 3519925694 bytes, got 4376 bytes

Cause
=====

In `relayd_add_stream`, instead of taking the > 2.11 protocol path, the
`relayd_add_stream_2_2` function is called.

The values of the rsock version numbers are:

  major: 21845
  minor: 2

Which is simply invalid since we know that the version should be 2.12.

The relayd sock version numbers are set during the
LTTNG_CONSUMER_ADD_RELAYD_SOCKET command between the lttng-sessiond and
the lttng-consumerd process. It is important to note here that both
processes do NOT have the same bitness.

The serialization and deserialization of `struct lttcomm_relayd_sock` is
the culprit.

`struct lttcomm_relayd_sock` contains a `struct lttcomm_sock`:

struct lttcomm_sock {
    int32_t fd;
    enum lttcomm_sock_proto proto;
    struct lttcomm_sockaddr sockaddr;
    const struct lttcomm_proto_ops *ops;
} LTTNG_PACKED;

Note that `ops` is a pointer and its size varies based on the bitness of
the application. Hence the size of the `struct lttcomm_sock` differs
across bitness. Since it is the first member of `struct
lttcomm_relayd_sock`, the memory layout is simply invalid across
bitness (amd64/x86).

This results in invalid parsing for the overall "struct
lttcomm_relayd_sock" when dealing with a lttng-consumerd with a
different bitness than the lttng-sessiond. As far as I know local
tracing scenarios are not affected since this is only relevant when
dealing with a lttng-relayd.

Solution
========

Pass the socket protocol type, relayd major, relayd minor in
`lttcomm_consumer_msg`. On the receiver side, query the network stack to
get the peer information to populate a basic `lttcomm_sock`. Leaving
this work to the OS saves us from having to serialize the `sockaddr_in*`
structs.
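
As a rough illustration of the receiver side, the peer address of an
already-connected socket can be queried as below. The enum and the
function are hypothetical and are not the consumer daemon's actual
code.

```c
#define _POSIX_C_SOURCE 200809L
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical classification of the peer's address family. */
enum peer_family_sketch {
	PEER_FAMILY_INET,
	PEER_FAMILY_INET6,
	PEER_FAMILY_OTHER,
};

/*
 * Ask the OS for the address of the remote end of a connected socket.
 * Returns 0 on success, -1 if getpeername() fails (e.g. not connected).
 */
int get_peer_sketch(int fd, struct sockaddr_storage *peer,
		enum peer_family_sketch *family)
{
	socklen_t len = sizeof(*peer);

	memset(peer, 0, sizeof(*peer));
	if (getpeername(fd, (struct sockaddr *) peer, &len) != 0) {
		return -1;
	}

	switch (peer->ss_family) {
	case AF_INET:
		*family = PEER_FAMILY_INET;
		break;
	case AF_INET6:
		*family = PEER_FAMILY_INET6;
		break;
	default:
		*family = PEER_FAMILY_OTHER;
		break;
	}
	return 0;
}
```

The address itself never has to be serialized by the sender: only the
protocol type and version numbers travel in the message, and the
receiver rebuilds the rest from the fd it was handed.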

Known drawbacks
=========

We rely on `getpeername` for the first time. Compatibility might be a
problem.

This code path assumes a lot of things that cannot be asserted against
such as the fact that the socket from which we fetch the info must be
`connected`. Still at this point, the socket is completely setup and the
rest of the code depends on it already.

From GETPEERNAME(2):

```
       For stream sockets, once a connect(2) has been performed, either
       socket can call getpeername() to obtain the address of the peer
       socket.  On the other hand, datagram sockets  are connectionless.
       Calling connect(2) on a datagram socket merely sets the peer
       address for outgoing datagrams sent with write(2) or recv(2).
       The caller of connect(2) can use getpeername() to obtain the
       peer address that it earlier set for the socket.  However, the
       peer socket is unaware of this information, and calling
       getpeername() on the  peer  socket will  return  no useful
       information (unless a connect(2) call was also executed on the
       peer).  Note also that the receiver of a datagram can obtain the
       address of the sender when using recvfrom(2).
```

But here we are always "the caller of connect".

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic157c4137b2f20e394c907136687fcbd126f90a0

2 years agoTests: add kernel test into the `make check` test suite.
Jonathan Rajotte [Mon, 31 Jan 2022 19:47:35 +0000 (14:47 -0500)] 
Tests: add kernel test into the `make check` test suite.

The Jenkins CI mostly runs the `make check` suite. Only the Lava-based CI
runs the root_regression test suite. Most of those tests can be run on
`make check` without incurring any major extra time.

Only `regression/tools/streaming/test_high_throughput_limits` is left in
root_regression since it is currently "unreasonable" in terms of the time
it takes to run. This could be tackled another time.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3d59705e84549c2a1840cf96a7723396e2e0c402

2 years agoBuild dist fix: some kernel tests are not distributed
Jérémie Galarneau [Wed, 9 Feb 2022 00:41:10 +0000 (19:41 -0500)] 
Build dist fix: some kernel tests are not distributed

A missing '\' causes some kernel test scripts to not be added to
the EXTRA_DIST variable. This causes those tests to not be
shipped as part of the tarball.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3c824b4c7ea913cb4cdc7ac6f98df05267ca5764

2 years agoTests: fix: test_kernel_function: erroneous test count
Jérémie Galarneau [Thu, 10 Feb 2022 22:53:56 +0000 (17:53 -0500)] 
Tests: fix: test_kernel_function: erroneous test count

The test_kernel_function test runs one extra test which causes it to be
marked as failed. Adjust the count to fit the number of tests.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia861cf7b0123fdf19405bb84271ddda76f9fd6e7

2 years agoTests: fix: test_kernel_function: event name mismatch
Jérémie Galarneau [Wed, 9 Feb 2022 22:00:40 +0000 (17:00 -0500)] 
Tests: fix: test_kernel_function: event name mismatch

The test_kernel_function test expects the name of the function
exit event to be suffixed with "_return". This was changed to
"_exit" during the development of the 2.13 release.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6e7f85af33699b2bf0439f74343d521f896594e1

2 years agoTests: fix: select_poll_epoll: test assumes epoll fd value
Jérémie Galarneau [Thu, 10 Feb 2022 16:59:31 +0000 (11:59 -0500)] 
Tests: fix: select_poll_epoll: test assumes epoll fd value

The test currently assumes that epoll fds are always == 3, which
is not always the case depending on the execution environment.

This change causes `select_poll_epoll` to produce a JSON file
containing the application's pid and epoll fd values that is
then used by the validation script.
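
A sketch of what producing such a file could look like. The JSON key
names and the helper are assumptions made for the example; the actual
file produced by the test may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>

/*
 * Format the values the validation script needs as a one-line JSON object.
 * The key names are hypothetical.
 * Returns 0 on success, -1 on error or if `buf` is too small.
 */
int format_state_json_sketch(char *buf, size_t buf_len, pid_t pid,
		int epoll_fd)
{
	int ret = snprintf(buf, buf_len, "{ \"pid\": %ld, \"epoll_fd\": %d }",
			(long) pid, epoll_fd);

	return (ret < 0 || (size_t) ret >= buf_len) ? -1 : 0;
}
```

The application would call this with `getpid()` and the descriptor it
actually received from `epoll_create1()`, then write the buffer to a
file whose path is known to the validation script.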

Note that the test is converted to C++ to allow the use of
internal utils (common/error.h/cpp) without changing their linkage.

However, the code is still regular C to ease the backport of this
fix.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie373c63f6e6b9267ae2d785c9f0a532a5de37905

2 years agoFix: tests: test_kernel: break should only be used in loops
Jérémie Galarneau [Thu, 24 Feb 2022 19:06:28 +0000 (14:06 -0500)] 
Fix: tests: test_kernel: break should only be used in loops

Using `break` in a function's scope makes no sense in bash. I am
guessing the original author meant to exit early from the various tests.

Regardless, the rest of the test can be run without issues. I am not
sure traces of failed tests should be kept, but that's a separate issue.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I286ccb796afbbca4e3866e6fd0b35a3746045346

2 years agoTests: test_kernel: add comments regarding the filling of buffers
Jérémie Galarneau [Thu, 24 Feb 2022 19:05:16 +0000 (14:05 -0500)] 
Tests: test_kernel: add comments regarding the filling of buffers

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3a8652d7dd95c015d6978349d166e8f291c9a5f3

2 years agoFix: tools/snapshots/test_kernel flaky test
Mathieu Desnoyers [Thu, 24 Feb 2022 17:24:45 +0000 (12:24 -0500)] 
Fix: tools/snapshots/test_kernel flaky test

When tracing all system calls, nothing guarantees that the first system
call won't come from some _other_ program on the system, on a CPU != 0,
and stay invariant between the two snapshots (when it should not be).

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iba45664f10eea0a757f8ac4eb9a7c7c75c305eef

2 years agoTests: Fix: test_list_triggers_cli: support in-kernel builtin lttng-modules
Francis Deslauriers [Thu, 25 Mar 2021 17:13:45 +0000 (13:13 -0400)] 
Tests: Fix: test_list_triggers_cli: support in-kernel builtin lttng-modules

This commit changes the grep call to remove the [lttng_tracer] string
from the pattern. When building the lttng modules directly in the kernel
there is no mention of [lttng_tracer].

Furthermore, the symbol type "t" is "T" in my test VM. The difference
may be due to the builtin nature of lttng on this VM.

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I046905dc51524c23ad3671b66614b86a084ef8e2
Depends-on: lttng-ust: If1f29dd64538bc5979d3e89aa5cca3be4e41046f

2 years agoAdd Log4j 2.x agent tests for the 'log4j' domain
Michael Jeanson [Fri, 11 Feb 2022 15:26:02 +0000 (15:26 +0000)] 
Add Log4j 2.x agent tests for the 'log4j' domain

Add integration tests for the new Log4j 2.x agent in Log4j 1.x compat
mode using the current 'log4j' domain. Use the new configure switch
'--enable-test-java-agent-log4j2' to enable them, or
'--enable-test-java-agent-all' to enable all Java agent tests.

To run only this new test, use this command:

  cd tests/regression && make check TESTS="ust/java-log4j2/test_agent_log4j2_domain_log4j"

Change-Id: Id780c9ee13913c91c18548f58b14cc600e77e9fa
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>

2 years agosrc/common: use single Makefile for parallel builds
Michael Jeanson [Fri, 19 Nov 2021 20:40:09 +0000 (15:40 -0500)] 
src/common: use single Makefile for parallel builds

Use a single Makefile in 'src/common' as it contains multiple
subdirectories with a small number of objects to compile. This allows
faster parallel builds since parallelism in automake is applied per
Makefile.

There is anecdotal evidence of a 25-second improvement to the build
process on a 36-core machine.

Change-Id: If2ce266050e345d58b00bf65b574ccf5168f28f1
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>

2 years agoFix: missing RCU read side critical sections
Francis Deslauriers [Mon, 27 Sep 2021 18:01:04 +0000 (14:01 -0400)] 
Fix: missing RCU read side critical sections

Based on the comments of the called functions.

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ica81b648ce88290c4ca7507fb00a78480457cf01

2 years agoEnforce documented RCU preconditions with assertions
Francis Deslauriers [Mon, 27 Sep 2021 14:56:28 +0000 (10:56 -0400)] 
Enforce documented RCU preconditions with assertions

Mindlessly add `rcu_read_ongoing()` assertions to functions that are
documented as "must be called" within an RCU critical section.

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I25f9903938123394e6960ab2a338be6abaf2fe72
