Michael Jeanson [Fri, 29 Jun 2018 21:28:31 +0000 (17:28 -0400)]
Cleanup: modinfo keys
Remove duplicate keys, add missing keys, add missing information and
fix the description of some modules.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Fri, 29 Jun 2018 21:28:30 +0000 (17:28 -0400)]
Add extra version information framework
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Wed, 27 Jun 2018 16:42:52 +0000 (12:42 -0400)]
Revert "Add btrfs file item tracepoints"
This reverts commit b42b2955e13153b7283f20613f15fe98e6427baf.
It introduces the following warnings:
>>> depmod: WARNING: /lib/modules/4.18.0-rc1+/extra/probes/lttng-probe-btrfs.ko
>>> needs unknown symbol btrfs_get_token_32
>>> depmod: WARNING: /lib/modules/4.18.0-rc1+/extra/probes/lttng-probe-btrfs.ko
>>> needs unknown symbol btrfs_get_token_8
>>> depmod: WARNING: /lib/modules/4.18.0-rc1+/extra/probes/lttng-probe-btrfs.ko
>>> needs unknown symbol btrfs_get_token_16
>>> depmod: WARNING: /lib/modules/4.18.0-rc1+/extra/probes/lttng-probe-btrfs.ko
>>> needs unknown symbol btrfs_get_token_64
>>> make[1]: Leaving directory `/home/efficios/git/linux-percpu-dev'
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 18 Jun 2018 18:53:19 +0000 (14:53 -0400)]
Fix: btrfs: Remove unnecessary fs_info parameter
See upstream commit:
commit 3dca5c942dac60164e6a6e89172f25b86af07ce7
Author: Qu Wenruo <wqu@suse.com>
Date: Thu Apr 26 14:24:25 2018 +0800
btrfs: trace: Remove unnecessary fs_info parameter for btrfs__reserve_extent event class
fs_info can be extracted from btrfs_block_group_cache, and all
btrfs_block_group_cache is created by btrfs_create_block_group_cache()
with fs_info initialized, no need to worry about NULL pointer
dereference.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 18 Jun 2018 18:53:18 +0000 (14:53 -0400)]
Fix: btrfs: use fs_info for btrfs_handle_em_exist tracepoint
See upstream commit:
commit f46b24c9457143a367c6707eac82d546e2bcf280
Author: David Sterba <dsterba@suse.com>
Date: Tue Apr 3 21:45:57 2018 +0200
btrfs: use fs_info for btrfs_handle_em_exist tracepoint
We really want to know to which filesystem the extent map events belong,
but as it cannot be reached from the extent_map pointers, we need to
pass it down the callchain.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 18 Jun 2018 18:53:17 +0000 (14:53 -0400)]
Fix: asoc: Remove snd_soc_cache_sync() implementation
See upstream commit:
commit 427d204c86e095bb91eb8af381bd90a48376a860
Author: Lars-Peter Clausen <lars@metafoo.de>
Date: Sat Nov 8 16:38:07 2014 +0100
ASoC: Remove snd_soc_cache_sync() implementation
This function has no more non regmap user, which means we can remove the
implementation of the function and associated functions and structure
fields.
For convenience we keep a static inline version of the function that
forwards calls to regcache_sync() unconditionally.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 18 Jun 2018 18:53:16 +0000 (14:53 -0400)]
Fix: asoc: fix printing jack name
See upstream commit:
commit f4833a519aec793cf8349bf479589d37473ef6a7
Author: Arnd Bergmann <arnd@arndb.de>
Date: Wed Feb 24 17:38:14 2016 +0100
ASoC: trace: fix printing jack name
After a change to the snd_jack structure, the 'name' member
is no longer available in all configurations, which results in a
build failure in the tracing code:
include/trace/events/asoc.h: In function 'trace_event_raw_event_snd_soc_jack_report':
include/trace/events/asoc.h:240:32: error: 'struct snd_jack' has no member named 'name'
The name field is normally initialized from the card shortname and
the jack "id" field:
snprintf(jack->name, sizeof(jack->name), "%s %s",
card->shortname, jack->id);
This changes the tracing output to just contain the 'id' by
itself, which slightly changes the output format but avoids the
link error and is hopefully still enough to see what is going on.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
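To illustrate the jack name change above, here is a minimal user-space sketch. The structures `demo_card` and `demo_jack` and both functions are hypothetical stand-ins, not the kernel's definitions; only the snprintf() call mirrors the one quoted in the upstream message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel's card and jack structures. */
struct demo_card {
	const char *shortname;
};

struct demo_jack {
	const char *id;
	char name[100];
};

/* Build the combined name the way the quoted snprintf() call does. */
int demo_jack_set_name(struct demo_jack *jack, const struct demo_card *card)
{
	assert(jack && card);
	return snprintf(jack->name, sizeof(jack->name), "%s %s",
			card->shortname, jack->id);
}

/* What the trace event records after the change: the id by itself. */
const char *demo_jack_trace_name(const struct demo_jack *jack)
{
	assert(jack);
	return jack->id;
}
```

Since the id is part of the combined name, the traced value stays recognizable even though the card shortname is no longer included.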
Michael Jeanson [Mon, 18 Jun 2018 18:53:15 +0000 (14:53 -0400)]
Fix: asoc: Consolidate path trace events
See upstream commit:
commit 6e588a0d839b51bae49852b68740a25cacc91978
Author: Lars-Peter Clausen <lars@metafoo.de>
Date: Tue Aug 11 21:38:01 2015 +0200
ASoC: dapm: Consolidate path trace events
The snd_soc_dapm_input_path and snd_soc_dapm_output_path trace events are
identical except for the direction. Instead of having two events have a
single one that has a field that contains the direction.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 18 Jun 2018 18:53:14 +0000 (14:53 -0400)]
Fix: ASoC level IO tracing removed upstream
Removed in v3.16.
See upstream commits:
Author: Lars-Peter Clausen <lars@metafoo.de>
Date: Tue Apr 22 13:23:17 2014 +0200
ASoC: Remove ASoC level IO tracing
The ASoC framework is in the process of migrating all IO operations to regmap.
regmap has its own more sophisticated tracing infrastructure for IO operations,
which means that the ASoC level IO tracing becomes redundant, hence this patch
removes them. There are still a handful of ASoC drivers left that do not use
regmap yet, but hopefully the removal of the ASoC IO tracing will be an
additional incentive to switch to regmap.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Francis Deslauriers [Thu, 7 Jun 2018 18:48:04 +0000 (14:48 -0400)]
Enable userspace callstack contexts only on x86
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Francis Deslauriers [Mon, 29 May 2017 19:32:04 +0000 (15:32 -0400)]
Prevent re-entrancy in callstack-user context
Userspace callstack context often triggers kernel page faults that can be
traced by the kernel tracer, which might then attempt to gather the
userspace callstack again. This recursion will be stopped by the
RING_BUFFER_MAX_NESTING check but will still pollute the traces with
redundant information.
To prevent this, check if the tracer is already gathering the userspace
callstack and if it's the case don't record it again.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
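The re-entrancy check described in the commit above can be sketched in plain user-space C. All names here are hypothetical, and a single global flag stands in for whatever per-CPU or per-task state the module actually keeps; this is an illustration of the idea, not the module's code.

```c
#include <assert.h>

int callstack_user_gathering;	/* hypothetical guard flag */
int callstack_user_records;	/* number of callstacks actually recorded */

int record_user_callstack(int triggers_page_fault);

/* Simulated page-fault event that also has the callstack context enabled. */
void on_page_fault_event(void)
{
	assert(callstack_user_gathering);	/* only reached mid-gathering here */
	record_user_callstack(0);		/* refused by the guard */
}

/* Returns 1 if a callstack was recorded, 0 if the guard refused it. */
int record_user_callstack(int triggers_page_fault)
{
	if (callstack_user_gathering)
		return 0;	/* already gathering: do not record again */
	callstack_user_gathering = 1;
	if (triggers_page_fault)
		on_page_fault_event();	/* walking the stack faults */
	callstack_user_records++;
	callstack_user_gathering = 0;
	return 1;
}
```

With the guard in place, a stack walk that faults produces one record instead of one per nesting level.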
Mathieu Desnoyers [Sun, 25 Oct 2015 16:02:24 +0000 (12:02 -0400)]
Callstack context: bump number of entries to 128
Use a limit that fits in a 4096-byte page on a 64-bit system. The only
reason for the prior 25-entry limitation was a bug in the header size
calculation (now fixed).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
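As a quick check of the sizing claim in the commit above, the arithmetic can be written out as follows. The macro and function names are hypothetical, and the sketch assumes each entry is one 64-bit return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ENTRIES	128	/* limit named in the commit */
#define DEMO_PAGE_SIZE		4096	/* page size named in the commit */

/* Bytes needed for nr_entries return addresses on a 64-bit system. */
size_t demo_callstack_bytes(size_t nr_entries)
{
	return nr_entries * sizeof(uint64_t);
}

/* Compile-time check that the full array fits in one page. */
static_assert(DEMO_MAX_ENTRIES * sizeof(uint64_t) <= DEMO_PAGE_SIZE,
	      "callstack entries must fit in one page");
```

Under that assumption, 128 entries occupy 1024 bytes, well within a single page.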
Mathieu Desnoyers [Sun, 25 Oct 2015 15:21:32 +0000 (11:21 -0400)]
Fix: callstack context alignment calculation
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sat, 24 Oct 2015 09:25:52 +0000 (05:25 -0400)]
Cleanup callstack context
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sat, 24 Oct 2015 08:57:44 +0000 (04:57 -0400)]
Fix callstack context: write empty sequence if no stack trace
The trace content needs to match the metadata, else the trace will be
corrupted.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sat, 24 Oct 2015 08:48:11 +0000 (04:48 -0400)]
Fix: callstack context: false-sharing, bad memory size allocation
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sat, 24 Oct 2015 08:17:44 +0000 (04:17 -0400)]
callstack context: use delimiter when stack is incomplete
Reverse the delimiter logic so we only consume trace space and pollute
the user output when the stack is incomplete.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
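The reversed delimiter logic from the commit above can be sketched as follows. The marker value `DEMO_CALLSTACK_DELIMITER` and the function name are assumptions made for illustration; the point is only that the extra slot is written when the stack was truncated, and not otherwise.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CALLSTACK_DELIMITER (~0UL)	/* hypothetical marker value */

/*
 * Copy at most max frames into out, which must have room for max + 1 slots.
 * Returns the number of slots written, including the delimiter if any.
 */
size_t demo_write_callstack(const unsigned long *frames, size_t nr_frames,
			    unsigned long *out, size_t max)
{
	size_t i, n = nr_frames < max ? nr_frames : max;

	assert(out);
	for (i = 0; i < n; i++)
		out[i] = frames[i];
	if (nr_frames > max)
		out[n++] = DEMO_CALLSTACK_DELIMITER;	/* stack is incomplete */
	return n;
}
```

A complete stack therefore costs no extra trace space, and an empty stack yields a zero-length sequence.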
Mathieu Desnoyers [Sat, 24 Oct 2015 07:42:10 +0000 (03:42 -0400)]
Cleanup callstack context
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Francis Giraldeau [Wed, 17 Jul 2013 21:05:20 +0000 (17:05 -0400)]
Add kernel and user callstack contexts
Signed-off-by: Francis Giraldeau <francis.giraldeau@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Francis Deslauriers [Tue, 30 May 2017 15:53:35 +0000 (11:53 -0400)]
Assign CPU id before saving the context size
The callstack contexts will use the CPU id to save per-CPU data so this
field needs to be set before calling the get_size function of this
context.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Francis Giraldeau [Wed, 3 Sep 2014 19:47:21 +0000 (15:47 -0400)]
Define max nesting count constant
Extract the constant within the code as #define. The define is added to
frontend.h in order to be included in other source files.
Signed-off-by: Francis Giraldeau <francis.giraldeau@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Francis Deslauriers [Tue, 30 May 2017 15:50:18 +0000 (11:50 -0400)]
Compute variable sized context length
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Francis Giraldeau [Wed, 27 Aug 2014 19:52:14 +0000 (15:52 -0400)]
Pass arguments for context size computation
Pass the same arguments to get_size_arg() as to record(). This new
operation has the same effect as get_size(), and the client code can
implement either one.
Signed-off-by: Francis Giraldeau <francis.giraldeau@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
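One way to picture the "implement either one" contract from the commit above is a pair of optional callbacks. The structure layout, the callback signatures and the dispatch order below are assumptions for illustration, not the module's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context field: a client fills in one of the two callbacks. */
struct demo_ctx_field {
	size_t (*get_size)(size_t offset);
	size_t (*get_size_arg)(size_t offset, void *record_arg);
};

/* Prefer the variant that sees the record() arguments, if provided. */
size_t demo_ctx_field_size(const struct demo_ctx_field *f, size_t offset,
			   void *record_arg)
{
	assert(f);
	if (f->get_size_arg)
		return f->get_size_arg(offset, record_arg);
	if (f->get_size)
		return f->get_size(offset);
	return 0;
}

/* Fixed-size field: does not need the record() arguments. */
size_t demo_fixed_size(size_t offset)
{
	(void)offset;
	return sizeof(int);
}

/* Variable-size field: the length depends on data known only at record time. */
size_t demo_variable_size(size_t offset, void *record_arg)
{
	(void)offset;
	return *(size_t *)record_arg * sizeof(unsigned long);
}
```

A variable-length context such as a callstack needs the second form, because its size is not known until the data to record is available.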
Michael Jeanson [Thu, 7 Jun 2018 19:49:11 +0000 (15:49 -0400)]
Add 9p probe
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Thu, 7 Jun 2018 19:48:32 +0000 (15:48 -0400)]
Update delayed ref tracepoints for v3.12
In v3.12 'btrfs_delayed_tree_ref' was split in 2 tracepoints and the
name was kept as an event class which did not trigger a build failure.
See upstream commit:
commit 599c75ec3f7f3b606e8a0a684c00f12190712de8
Author: Liu Bo <bo.li.liu@oracle.com>
Date: Tue Jul 16 19:03:36 2013 +0800
Btrfs/tracepoint: update delayed ref tracepoints
This shows exactly how btrfs processes the delayed refs onto disks,
which is very helpful on understanding delayed ref mechanism and
debugging related bugs.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Thu, 7 Jun 2018 19:48:31 +0000 (15:48 -0400)]
Add btrfs file item tracepoints
See upstream commit:
commit 09ed2f165cb3449237dec842b3564044e12d22cb
Author: Liu Bo <bo.li.liu@oracle.com>
Date: Fri Mar 10 11:09:48 2017 -0800
Btrfs: add file item tracepoints
While debugging truncate problems, I found that these tracepoints could
help us quickly know what went wrong.
Two sets of tracepoints are created to track regular/prealloc file item
and inline file item respectively, I put inline as a separate one since
what inline file items cares about are way less than the regular one.
This adds four tracepoints:
- btrfs_get_extent_show_fi_regular
- btrfs_get_extent_show_fi_inline
- btrfs_truncate_show_fi_regular
- btrfs_truncate_show_fi_inline
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Thu, 7 Jun 2018 19:48:30 +0000 (15:48 -0400)]
Add btrfs tracepoint for em's EEXIST case
See upstream commits:
commit 393da91819e35af538ef97c7c6a04899e2fbfe0e
Author: Liu Bo <bo.li.liu@oracle.com>
Date: Fri Jan 5 12:51:16 2018 -0700
Btrfs: add tracepoint for em's EEXIST case
This is adding a tracepoint 'btrfs_handle_em_exist' to help debug the
subtle bugs around merge_extent_mapping.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Thu, 7 Jun 2018 19:32:49 +0000 (15:32 -0400)]
Fix: dyntick field added to trace_rcu_dyntick in v4.16
See upstream commit:
commit dec98900eae1e22467182e58688abe5fae98bd5f
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Wed Oct 4 16:24:29 2017 -0700
rcu: Add ->dynticks field to rcu_dyntick trace event
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Thu, 7 Jun 2018 16:24:28 +0000 (12:24 -0400)]
Fix: BUILD_BUG_ON with compile time constant on < v2.6.38
See upstream commits :
commit 8c87df457cb58fe75b9b893007917cf8095660a0
Author: Jan Beulich <JBeulich@novell.com>
Date: Tue Sep 22 16:43:52 2009 -0700
BUILD_BUG_ON(): fix it and a couple of bogus uses of it
gcc permitting variable length arrays makes the current construct used for
BUILD_BUG_ON() useless, as that doesn't produce any diagnostic if the
controlling expression isn't really constant. Instead, this patch makes
it so that a bit field gets used here. Consequently, those uses where the
condition isn't really constant now also need fixing.
Note that in the gfp.h, kmemcheck.h, and virtio_config.h cases
MAYBE_BUILD_BUG_ON() really just serves documentation purposes - even if
the expression is compile time constant (__builtin_constant_p() yields
true), the array is still deemed of variable length by gcc, and hence the
whole expression doesn't have the intended effect.
commit 7ef88ad561457c0346355dfd1f53e503ddfde719
Author: Rusty Russell <rusty@rustcorp.com.au>
Date: Mon Jan 24 14:45:10 2011 -0600
BUILD_BUG_ON: make it handle more cases
BUILD_BUG_ON used to use the optimizer to do code elimination or fail
at link time; it was changed to first the size of a negative array (a
nicer compile time error), then (in
8c87df457cb58fe75b9b893007917cf8095660a0) to a bitfield.
This forced us to change some non-constant cases to MAYBE_BUILD_BUG_ON();
as Jan points out in that commit, it didn't work as intended anyway.
bitfields: needs a literal constant at parse time, and can't be put under
"if (__builtin_constant_p(x))" for example.
negative array: can handle anything, but if the compiler can't tell it's
a constant, silently has no effect.
link time: breaks link if the compiler can't determine the value, but the
linker output is not usually as informative as a compiler error.
If we use the negative-array-size method *and* the link time trick,
we get the ability to use BUILD_BUG_ON() under __builtin_constant_p()
branches, and maximal ability for the compiler to detect errors at
build time.
We also document it thoroughly.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
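The negative-array-size technique mentioned in the upstream messages above can be sketched in a few lines. `DEMO_BUILD_BUG_ON` is a hypothetical name and a simplified form, not the kernel's macro; the C11 static_assert is shown only for comparison.

```c
#include <assert.h>

/*
 * Hypothetical, simplified macro: a true constant condition yields char[-1],
 * which the compiler rejects; a false one yields char[1], which is harmless.
 */
#define DEMO_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

int demo_check_layout(void)
{
	/* Condition is false, so this compiles to nothing. */
	DEMO_BUILD_BUG_ON(sizeof(long long) < sizeof(int));

	/* DEMO_BUILD_BUG_ON(sizeof(char) == 1); would not compile: char[-1]. */

	/* C11 equivalent for conditions that are true constant expressions. */
	static_assert(sizeof(long long) >= sizeof(int), "unexpected type sizes");
	return 1;
}
```

As the upstream text notes, when the compiler cannot tell the condition is constant, the array form silently has no effect, which is why the kernel combines it with other tricks.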
Mathieu Desnoyers [Thu, 7 Jun 2018 16:10:00 +0000 (12:10 -0400)]
Fix: lttng filter validator ERANGE error handling
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Wed, 6 Jun 2018 21:32:26 +0000 (17:32 -0400)]
Fix: filter interpreter: use LTTNG_SIZE_MAX
Own macro required for older kernels.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Mon, 25 Sep 2017 15:37:14 +0000 (11:37 -0400)]
Filter: add FILTER_OP_RETURN_S64 instruction
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Fri, 22 Sep 2017 21:03:34 +0000 (17:03 -0400)]
Perform bitwise ops on unsigned types
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Fri, 22 Sep 2017 20:00:13 +0000 (16:00 -0400)]
Filter: catch shift undefined behavior
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
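As background for the shift fix above: in C, shifting a 64-bit value by a negative count or by 64 or more is undefined behavior, as is left-shifting a negative signed value. A minimal sketch of a checked shift follows; the function name and the error convention are assumptions, not the filter interpreter's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 0 on success, -1 if the shift count is out of range. */
int demo_checked_lshift(int64_t value, int64_t count, int64_t *result)
{
	assert(result);
	if (count < 0 || count > 63)
		return -1;	/* would be undefined behavior */
	/* Shift as unsigned so that negative values are well defined. */
	*result = (int64_t)((uint64_t)value << count);
	return 0;
}
```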
Mathieu Desnoyers [Fri, 22 Sep 2017 00:42:34 +0000 (20:42 -0400)]
Filter: add lshift, rshift, bit not ops
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 12 Sep 2017 22:36:34 +0000 (18:36 -0400)]
Filter: index array, sequences, implement bitwise binary operators
Implement indexing of array and sequence of integers, as well as bitwise
binary operators &, |, ^.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
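A rough sketch of what indexing a sequence of integers and applying a bitwise binary operator involves is given below. The enum, the function name and the bounds-check behavior are hypothetical; the operands are converted to unsigned before the operation, in line with the "Perform bitwise ops on unsigned types" commit listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_bit_op { DEMO_OP_AND, DEMO_OP_OR, DEMO_OP_XOR };

/* Evaluate seq[index] <op> operand. Returns 0 on success, -1 on error. */
int demo_eval_bitop(const int64_t *seq, size_t len, size_t index,
		    enum demo_bit_op op, int64_t operand, int64_t *result)
{
	uint64_t a, b, r;

	assert(seq && result);
	if (index >= len)
		return -1;	/* out-of-bounds index is rejected */
	a = (uint64_t)seq[index];
	b = (uint64_t)operand;
	switch (op) {
	case DEMO_OP_AND:
		r = a & b;
		break;
	case DEMO_OP_OR:
		r = a | b;
		break;
	case DEMO_OP_XOR:
		r = a ^ b;
		break;
	default:
		return -1;
	}
	*result = (int64_t)r;
	return 0;
}
```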
Mathieu Desnoyers [Tue, 15 May 2018 21:51:24 +0000 (17:51 -0400)]
Fix: pid tracker should track "pgid" for noargs probes
The "pid" notion exposed by LTTng translates to the "pgid" notion in the
Linux kernel. Therefore using "current->pid" as argument to the PID
tracker actually ends up behaving as a "tid" tracker, which does not
match the intent nor the user-space tracer behavior.
The probes taking arguments were fixed by a prior commit, but it missed
probes without arguments.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 8 May 2018 15:58:25 +0000 (11:58 -0400)]
lttng-tp-mempool: perform node-local allocation
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 1 May 2018 20:42:44 +0000 (16:42 -0400)]
Fix: update RCU instrumentation for 4.17
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 17 Apr 2018 15:07:47 +0000 (11:07 -0400)]
Fix: sunrpc instrumentation for 4.17
See upstream commit:
commit e671edb9428c8a61662aaf8c39f5edced7cc45c7
Author: Chuck Lever <chuck.lever@oracle.com>
Date: Fri Mar 16 10:33:44 2018 -0400
sunrpc: Simplify synopsis of some trace points
Clean up: struct rpc_task carries a pointer to a struct rpc_clnt,
and in fact task->tk_client is always what is passed into trace
points that are already passing @task.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 17 Apr 2018 15:07:46 +0000 (11:07 -0400)]
Fix: use struct reclaim_stat in mm_vmscan_lru_shrink_inactive for 4.17
See upstream commit:
commit d51d1e64500fcb48fc6a18c77c965b8f48a175f2
Author: Steven Rostedt <rostedt@goodmis.org>
Date: Tue Apr 10 16:28:07 2018 -0700
mm, vmscan, tracing: use pointer to reclaim_stat struct in trace event
The trace event trace_mm_vmscan_lru_shrink_inactive() currently has 12
parameters! Seven of them are from the reclaim_stat structure. This
structure is currently local to mm/vmscan.c. By moving it to the global
vmstat.h header, we can also reference it from the vmscan tracepoints.
In moving it, it brings down the overhead of passing so many arguments
to the trace event. In the future, we may limit the number of arguments
that a trace event may pass (ideally just 6, but more realistically it
may be 8).
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 17 Apr 2018 15:07:45 +0000 (11:07 -0400)]
Fix: Add gfp_flags arg to mm_vmscan_kswapd_wake for 4.17
See upstream commit:
commit 5ecd9d403ad081ed2de7b118c1e96124d4e0ba6c
Author: David Rientjes <rientjes@google.com>
Date: Thu Apr 5 16:25:16 2018 -0700
mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory
Kswapd will not wakeup if per-zone watermarks are not failing or if too
many previous attempts at background reclaim have failed.
This can be true if there is a lot of free memory available. For high-
order allocations, kswapd is responsible for waking up kcompactd for
background compaction. If the zone is not below its watermarks or
reclaim has recently failed (lots of free memory, nothing left to
reclaim), kcompactd does not get woken up.
When __GFP_DIRECT_RECLAIM is not allowed, allow kcompactd to still be
woken up even if kswapd will not reclaim. This allows high-order
allocations, such as thp, to still trigger background compaction even
when the zone has an abundance of free memory.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Khalid Elmously [Sun, 25 Mar 2018 15:06:03 +0000 (11:06 -0400)]
Update: kvm instrumentation for ubuntu 4.13.0-38
Starting from 4.13.0-38, the Ubuntu kernel backports a kvm instrumentation
change introduced in 4.15 that affects the prototype of the kvm_mmio
event.
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Fri, 23 Mar 2018 15:41:46 +0000 (11:41 -0400)]
Fix: update kvm instrumentation for Ubuntu 3.13.0-144
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 22 Mar 2018 21:33:32 +0000 (17:33 -0400)]
Fix: btrfs instrumentation namespacing
Trips this warning:
[ 122.301894] WARNING: CPU: 6 PID: 1654 at /home/efficios/git/lttng-modules/lttng-probes.c:99 fixup_lazy_probes+0x195/0x200 [lttng_tracer]
[ 122.304974] Modules linked in: lttng_probe_compaction(O+) lttng_probe_btrfs(O) lttng_probe_block(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_ftrace(O) lttng_kprobes(O) lttng_clock(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O)
[ 122.314772] CPU: 6 PID: 1654 Comm: modprobe Tainted: G O 4.16.0-rc6+ #54
[ 122.316738] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 122.320280] RIP: 0010:fixup_lazy_probes+0x195/0x200 [lttng_tracer]
[ 122.321825] RSP: 0018:ffffc90008467ca0 EFLAGS: 00010286
[ 122.323137] RAX: 00000000ffffffff RBX: ffffffffa01e7000 RCX: 0000000000000061
[ 122.324847] RDX: 0000000000000005 RSI: ffffffffa01e21ac RDI: ffffffffa01e233b
[ 122.326528] RBP: ffffffffa017f078 R08: 0000000000000062 R09: 0000000000000345
[ 122.328154] R10: 0000000000000000 R11: ffffc90008467a28 R12: 0000000000000005
[ 122.329791] R13: 0000000000000010 R14: 0000000000000010 R15: 0000000000000006
[ 122.331410] FS: 00007f6c8d9a7740(0000) GS:ffff880c0fb80000(0000) knlGS:0000000000000000
[ 122.333323] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 122.334673] CR2: 00007ffcc9698ff8 CR3: 0000000c0afae004 CR4: 00000000001606e0
[ 122.336300] Call Trace:
[ 122.337011] ? __event_probe__compaction_migratepages+0x250/0x250 [lttng_probe_compaction]
[ 122.338901] lttng_get_probe_list_head.part.2+0x19/0x20 [lttng_tracer]
[ 122.340349] lttng_probe_register+0xd5/0xe0 [lttng_tracer]
[ 122.341607] ? __event_probe__compaction_migratepages+0x250/0x250 [lttng_probe_compaction]
[ 122.343453] do_one_initcall+0x3d/0x16e
[ 122.344383] ? _cond_resched+0x15/0x30
[ 122.345323] ? kmem_cache_alloc_trace+0xe1/0x1b0
[ 122.346394] ? do_init_module+0x22/0x20c
[ 122.347329] do_init_module+0x5a/0x20c
[ 122.350037] load_module+0x244f/0x2980
[ 122.350958] ? m_show+0x190/0x190
[ 122.351774] ? security_capable+0x41/0x60
[ 122.352723] SYSC_finit_module+0x80/0xb0
[ 122.353716] do_syscall_64+0x76/0x1a0
[ 122.354565] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 122.355669] RIP: 0033:0x7f6c8d4c73c9
[ 122.356502] RSP: 002b:00007ffcc969c248 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[ 122.358209] RAX: ffffffffffffffda RBX: 000055763df4fee9 RCX: 00007f6c8d4c73c9
[ 122.359684] RDX: 0000000000000000 RSI: 000055763df4fee9 RDI: 0000000000000004
[ 122.361182] RBP: 0000000000000000 R08: 0000000000000000 R09: 000055763f39a450
[ 122.362663] R10: 0000000000000004 R11: 0000000000000206 R12: 000055763f392400
[ 122.364144] R13: 000055763f396cb0 R14: 000055763f3925a0 R15: 0000000000040000
[ 122.365690] Code: 25 14 a0 4a 8b 04 f0 48 8b 30 31 c0 e8 25 3b 10 e1 48 8b 43 08 48 8b 33 4c 89 e2 4a 8b 04 f0 48 8b 38 e8 9f b7 b1 e1 85 c0 74 07 <0f> 0b e9 b3 fe ff ff 48 c7 c7 16 26 14 a0 e8 f8 3a 10 e1 48 8b
[ 122.369348] ---[ end trace 15840f1166edf835 ]---
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 13 Mar 2018 16:14:43 +0000 (12:14 -0400)]
Cleanup: comment about CONFIG_HOTPLUG_CPU ifdef
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Lars Persson [Sun, 11 Mar 2018 14:02:43 +0000 (15:02 +0100)]
Fix: do not use CONFIG_HOTPLUG_CPU for the new hotplug API
Kernel configurations without CONFIG_HOTPLUG_CPU throw an unknown
symbol error when attempting to insert the lttng-trace module:
lttng_tracer: Unknown symbol lttng_hp_prepare (err 0)
lttng_tracer: Unknown symbol lttng_hp_online (err 0)
This was caused by lttng-events and lttng-context-perf-counter not
agreeing on which preprocessor condition should guard the use of
the hotplug API. In fact, the API is also available on kernels built
without CONFIG_HOTPLUG_CPU.
Signed-off-by: Lars Persson <larper@axis.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Thu, 8 Mar 2018 16:18:56 +0000 (11:18 -0500)]
Fix: update kvm instrumentation for 4.1.50+
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Julien Desfossez [Fri, 23 Feb 2018 16:37:11 +0000 (11:37 -0500)]
Use the memory pool instead of kmalloc
Replace the use of kmalloc/kfree in the tracepoint probes that need
dynamic allocation with the tracepoint memory pool alloc/free.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Julien Desfossez [Fri, 23 Feb 2018 16:37:10 +0000 (11:37 -0500)]
Create a memory pool for temporary tracepoint probes storage
This memory pool is created when the lttng-tracer module is loaded. It
allocates 4 buffers of 4k on each CPU. These buffers are designed to
allow tracepoint probes to temporarily store data that does not fit on
the stack (during the code_pre and code_post phases). The memory is
freed when the lttng-tracer module is unloaded.
This removes the need for dynamic allocation during the execution of
tracepoint probes, which does not behave well on PREEMPT_RT kernels, even
when invoked with the GFP_ATOMIC | GFP_NOWAIT flags.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
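The pool layout described in the commit above (4 buffers of 4k per CPU, set up once and reused) can be approximated in user space as follows. The names, the fixed CPU count and the linear search are assumptions for illustration; the real module allocates per-CPU memory at load time and must also deal with concurrency, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_POOL_NR_BUF	4	/* buffers per CPU, as in the commit */
#define DEMO_POOL_BUF_SIZE	4096	/* 4k each, as in the commit */
#define DEMO_POOL_NR_CPUS	2	/* illustrative only */

struct demo_cpu_pool {
	unsigned char buf[DEMO_POOL_NR_BUF][DEMO_POOL_BUF_SIZE];
	int in_use[DEMO_POOL_NR_BUF];
};

static struct demo_cpu_pool demo_pools[DEMO_POOL_NR_CPUS];

/* Hand out a free buffer of the given CPU, or NULL if none fits or is free. */
void *demo_pool_alloc(int cpu, size_t len)
{
	int i;

	if (cpu < 0 || cpu >= DEMO_POOL_NR_CPUS || len > DEMO_POOL_BUF_SIZE)
		return NULL;
	for (i = 0; i < DEMO_POOL_NR_BUF; i++) {
		if (!demo_pools[cpu].in_use[i]) {
			demo_pools[cpu].in_use[i] = 1;
			return demo_pools[cpu].buf[i];
		}
	}
	return NULL;
}

/* Return a buffer to its CPU's pool; unknown pointers are ignored. */
void demo_pool_free(int cpu, void *ptr)
{
	int i;

	assert(cpu >= 0 && cpu < DEMO_POOL_NR_CPUS);
	for (i = 0; i < DEMO_POOL_NR_BUF; i++) {
		if (ptr == (void *)demo_pools[cpu].buf[i])
			demo_pools[cpu].in_use[i] = 0;
	}
}
```

Because every buffer exists before any probe runs, taking one never calls into the allocator, which is the property the commit relies on for PREEMPT_RT.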
Michael Jeanson [Wed, 21 Feb 2018 21:36:17 +0000 (16:36 -0500)]
Fix: use proper pid_ns in the process statedump
The pid_ns we currently use from the nsproxy struct is not the task's
pid_ns but the one that children of this task will use.
As stated in include/linux/nsproxy.h :
The pid namespace is an exception -- it's accessed using
task_active_pid_ns. The pid namespace here is the
namespace that children will use.
While it will be the same most of the time, it will report incorrect
information in some situations. Plus it has the side effect of
simplifying the code and removing kernel version checks.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 20 Feb 2018 17:16:25 +0000 (12:16 -0500)]
Fix: add variable quoting to shell scripts
Prevent errors if a path contains spaces.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 20 Feb 2018 17:10:05 +0000 (12:10 -0500)]
Update: kvm instrumentation for fedora 4.14.13-300
Starting from 4.14.13-300, the Fedora kernel backports a kvm instrumentation
change introduced in 4.15 that affects the prototype of the kvm_mmio event.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Loïc Gelle [Tue, 20 Feb 2018 17:10:04 +0000 (12:10 -0500)]
Fix: Add Fedora version macros
Signed-off-by: Loïc Gelle <loic.gelle@polymtl.ca>
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 19 Dec 2017 21:10:23 +0000 (16:10 -0500)]
Add preemptirq instrumentation
The tracepoints were introduced in kernel 4.15 alongside the config
option PREEMPTIRQ_EVENTS.
This enables tracing of disable and enable events for preemption and
irqs. For tracing preempt disable/enable events, DEBUG_PREEMPT must be
enabled. For tracing irq disable/enable events, PROVE_LOCKING must
be disabled.
See upstream commit:
commit d59158162e032917a428704160a2063a02405ec6
Author: Joel Fernandes <joelaf@google.com>
Date: Tue Oct 10 15:51:37 2017 -0700
tracing: Add support for preempt and irq enable/disable events
Preempt and irq trace events can be used for tracing the start and
end of an atomic section which can be used by a trace viewer like
systrace to graphically view the start and end of an atomic section and
correlate them with latencies and scheduling issues.
This also serves as a prelude to using synthetic events or probes to
rewrite the preempt and irqsoff tracers, along with numerous benefits of
using trace events features for these events.
Link: http://lkml.kernel.org/r/20171006005432.14244-3-joelaf@google.com
Link: http://lkml.kernel.org/r/20171010225137.17370-1-joelaf@google.com
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Jérémie Galarneau [Fri, 4 Nov 2016 21:12:49 +0000 (17:12 -0400)]
Clean-up: fix stale #endif comments
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Julien Desfossez [Wed, 18 Oct 2017 15:14:08 +0000 (11:14 -0400)]
Command to dump the metadata cache again
This command allows the consumer to ask for the metadata cache to be
dumped entirely another time. This is used by the session rotation
feature to get a new copy of what was in the metadata cache without
regenerating it and re-sampling the offset from epoch.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Stéphane Graber [Tue, 19 Dec 2017 20:55:07 +0000 (15:55 -0500)]
Add a new /dev/lttng-logger interface
This is identical to /proc/lttng-logger but has the advantage of working
from within containers when the path is made accessible to them.
Fixes: #1145
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 13 Feb 2018 20:23:51 +0000 (15:23 -0500)]
Fix: update btrfs instrumentation for SuSE 4.4.114-92
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 13 Feb 2018 20:23:50 +0000 (15:23 -0500)]
Fix: update block instrumentation for SuSE 4.4.114-92
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 12 Feb 2018 17:32:25 +0000 (18:32 +0100)]
Fix: update rcu instrumentation for v4.16
See upstream commits :
commit dec98900eae1e22467182e58688abe5fae98bd5f
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Wed Oct 4 16:24:29 2017 -0700
rcu: Add ->dynticks field to rcu_dyntick trace event
commit 84585aa8b6ad24e5bdfba9db4a320a6aeed192ab
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Wed Oct 4 15:55:16 2017 -0700
rcu: Shrink ->dynticks_{nmi_,}nesting from long long to long
Because the ->dynticks_nesting field now only contains the process-based
nesting level instead of a value encoding both the process nesting level
and the irq "nesting" level, we no longer need a long long, even on
32-bit systems. This commit therefore changes both the ->dynticks_nesting
and ->dynticks_nmi_nesting fields to long.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 12 Feb 2018 17:32:12 +0000 (18:32 +0100)]
Fix: update vmscan instrumentation for v4.16
See upstream commit :
commit 9092c71bb724dba2ecba849eae69e5c9d39bd3d2
Author: Josef Bacik <jbacik@fb.com>
Date: Wed Jan 31 16:16:26 2018 -0800
mm: use sc->priority for slab shrink targets
Previously we were using the ratio of the number of lru pages scanned to
the number of eligible lru pages to determine the number of slab objects
to scan. The problem with this is that these two things have nothing to
do with each other, so in slab heavy work loads where there is little to
no page cache we can end up with the pages scanned being a very low
number. This means that we reclaim next to no slab pages and waste a
lot of time reclaiming small amounts of space.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Rasmus Villemoes [Mon, 12 Feb 2018 17:31:40 +0000 (18:31 +0100)]
Fix: update timer instrumentation on 4.16 and 4.14-rt
See upstream commit :
commit 63e2ed3659752a4850e0ef3a07f809988fcd74a4
Author: Anna-Maria Gleixner <anna-maria@linutronix.de>
Date: Thu Dec 21 11:41:38 2017 +0100
tracing/hrtimer: Print the hrtimer mode in the 'hrtimer_start' tracepoint
The 'hrtimer_start' tracepoint lacks the mode information. The mode is
important because consecutive starts can switch from ABS to REL or from
PINNED to non PINNED.
Append the mode field.
See linux-rt commit :
commit 6ee32a49b1ed61c08ac9f1c9fcbf83d3c749b71d
Author: Anna-Maria Gleixner <anna-maria@linutronix.de>
Date: Sun Oct 22 23:39:46 2017 +0200
tracing: hrtimer: Print hrtimer mode in hrtimer_start tracepoint
The hrtimer_start tracepoint lacks the mode information. The mode is
important because consecutive starts can switch from ABS to REL or from
PINNED to non PINNED.
Add the mode information.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 30 Jan 2018 21:48:36 +0000 (16:48 -0500)]
Update kvm instrumentation for debian kernel 4.14.0-3
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 25 Jan 2018 17:41:57 +0000 (12:41 -0500)]
Fix: network instrumentation protocol enum
The enumeration field within the header payload should keep the
enumeration describing the header field, and not use the variant
selector enumeration.
This issue has been introduced by commit "Fix: network instrumentation
handling of corrupted TCP headers".
It causes the following warning messages in babeltrace:
[warning] Unknown value 6 in enum.
[warning] Unknown value 17 in enum.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 23 Jan 2018 21:03:25 +0000 (16:03 -0500)]
Fix: update btrfs instrumentation for SuSE 4.4.103-6
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 23 Jan 2018 21:03:24 +0000 (16:03 -0500)]
Fix: update block instrumentation for SuSE 4.4.73-5
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 23 Jan 2018 21:00:07 +0000 (16:00 -0500)]
Fix: global_dirty_limit for kernel v4.2 and up
global_dirty_limit was moved into wb_domain
See upstream commit :
commit dcc25ae76eb7b8ff883eaaab57e30e8f2f085be3
Author: Tejun Heo <tj@kernel.org>
Date: Fri May 22 18:23:22 2015 -0400
writeback: move global_dirty_limit into wb_domain
This patch is a part of the series to define wb_domain which
represents a domain that wb's (bdi_writeback's) belong to and are
measured against each other in. This will enable IO backpressure
propagation for cgroup writeback.
global_dirty_limit exists to regulate the global dirty threshold which
is a property of the wb_domain. This patch moves hard_dirty_limit,
dirty_lock, and update_time into wb_domain.
This is pure reorganization and doesn't introduce any behavioral
changes.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 18 Jan 2018 19:18:14 +0000 (14:18 -0500)]
Fix: network instrumentation handling of corrupted TCP headers
A malformed packet may contain a valid IPv4/IPv6 header, but an
inconsistent TCP header. As a result, the trace contains a fully
formed IPv4/IPv6 header, including the "protocol" or "nexthdr"
fields indicating TCP, but no following TCP header.
This scenario leads to an unreadable CTF trace, because the
trace viewer expects a TCP header, but instead gets the next
event.
Therefore, using the IP header fields as the selector for the
transport layer variant is not the right approach: introduce
our own selector field, which allows us to deal properly with
this corner case.
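The idea above can be sketched in user-space C. This is only an
illustration, not the lttng-modules implementation: the enum, the
constants and pick_transport() are hypothetical names, and the check
is reduced to "is there room for a minimal TCP header".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector values, independent of the IP "protocol" field. */
enum transport_sel {
	TRANSPORT_UNKNOWN = 0,
	TRANSPORT_TCP = 1,
};

#define IPPROTO_TCP_NUM	6	/* IANA protocol number for TCP */
#define TCP_MIN_HDR_LEN	20	/* minimum TCP header size in bytes */

/*
 * Choose the variant selector from what was actually captured, not only
 * from what the IP header announces.
 */
static enum transport_sel pick_transport(uint8_t ip_protocol,
					 size_t bytes_after_ip_hdr)
{
	if (ip_protocol == IPPROTO_TCP_NUM &&
	    bytes_after_ip_hdr >= TCP_MIN_HDR_LEN)
		return TRANSPORT_TCP;
	return TRANSPORT_UNKNOWN;
}
```

With a dedicated selector, a packet that announces TCP but carries no
usable TCP header is recorded with the "unknown" variant, so a reader
never expects a header that is not in the trace.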
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Wed, 17 Jan 2018 18:37:26 +0000 (13:37 -0500)]
Fix: add missing uaccess.h include from kstrtox.h wrapper
Required to build lttng-modules against kernel < 3.0.0 on ARM.
Fixes #1148
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Wed, 17 Jan 2018 16:17:08 +0000 (11:17 -0500)]
Update: kvm instrumentation for 4.14.14+, 4.9.77+, 4.4.112+
Starting from 4.14.14, 4.9.77, and 4.4.112, the 4.14, 4.9, and 4.4
stable kernel branches backport a kvm instrumentation change introduced
in 4.15 which affects the prototype of the kvm_mmio event.
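A version check covering several stable branches might look like the
sketch below. It assumes the ranges named in the commit title (4.14.14+,
4.9.77+, 4.4.112+) plus 4.15 and later. VERSION_CODE(), in_range() and
has_new_kvm_mmio() are hypothetical names, not the macros used in the
tree, and backports to other branches are not covered.

```c
/* Hypothetical helper mirroring the classic KERNEL_VERSION() encoding. */
#define VERSION_CODE(a, b, c)	(((a) << 16) + ((b) << 8) + (c))

/* Half-open range check: low <= code < high. */
static int in_range(unsigned int code, unsigned int low, unsigned int high)
{
	return code >= low && code < high;
}

/* Returns 1 if the kernel is expected to carry the 4.15 kvm_mmio prototype. */
static int has_new_kvm_mmio(unsigned int code)
{
	return code >= VERSION_CODE(4, 15, 0) ||
	       in_range(code, VERSION_CODE(4, 14, 14), VERSION_CODE(4, 15, 0)) ||
	       in_range(code, VERSION_CODE(4, 9, 77), VERSION_CODE(4, 10, 0)) ||
	       in_range(code, VERSION_CODE(4, 4, 112), VERSION_CODE(4, 5, 0));
}
```

Each stable branch needs its own half-open range because a backport
lands at a different sublevel on every branch.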
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 9 Jan 2018 22:40:00 +0000 (17:40 -0500)]
Fix: btrfs_delayed_ref_head was unwired since v3.12
See upstream commit:
commit 599c75ec3f7f3b606e8a0a684c00f12190712de8
Author: Liu Bo <bo.li.liu@oracle.com>
Date: Tue Jul 16 19:03:36 2013 +0800
Btrfs/tracepoint: update delayed ref tracepoints
This shows exactly how btrfs processes the delayed refs onto disks,
which is very helpful on understanding delayed ref mechanism and
debugging related bugs.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 9 Jan 2018 20:43:20 +0000 (15:43 -0500)]
Update kvm instrumentation for debian kernel 4.9.65-3
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 9 Jan 2018 20:43:19 +0000 (15:43 -0500)]
Fix: debian kernel version parsing
The debian version script only worked for ckt kernels and that was fine
until now because we only had checks for those versions in the code.
ckt (Canonical Kernel Team) kernels were used for a while during the jessie
cycle; their versioning is a bit different. They track the upstream vanilla
stable updates but they don't update the minor version number and instead add
an additional -cktX. They were all 3.16.7-cktX and after a while the version
switched back to upstream style at 3.16.36.
Knowing that, we can compare regular debian and ckt kernel versions
using this scheme :
MAJOR.PATCHLEVEL.SUBLEVEL.CKT.DEBABI.DEBPATCH
And setting CKT to zero for non-ckt kernels.
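As a rough illustration of why this scheme orders ckt and regular
kernels correctly, the six fields can be compared lexicographically.
The struct and function names below are hypothetical, and the Debian
ABI and patch values in the usage note are made up for the example.

```c
#include <stddef.h>

/* Field order follows MAJOR.PATCHLEVEL.SUBLEVEL.CKT.DEBABI.DEBPATCH. */
struct deb_version {
	unsigned int field[6];
};

/* Lexicographic compare: negative, zero or positive, like strcmp(). */
static int deb_version_cmp(const struct deb_version *a,
			   const struct deb_version *b)
{
	size_t i;

	for (i = 0; i < 6; i++) {
		if (a->field[i] != b->field[i])
			return a->field[i] < b->field[i] ? -1 : 1;
	}
	return 0;
}
```

For instance, {3, 16, 7, 20, 0, 1} (a 3.16.7-ckt20 kernel) sorts before
{3, 16, 36, 0, 1, 1} (a regular 3.16.36 kernel), because SUBLEVEL is
compared before CKT.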
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 9 Jan 2018 16:04:36 +0000 (11:04 -0500)]
Fix: block instrumentation 4.14+ NULL pointer dereference
Support for block layer instrumentation on Linux kernels 4.14+
introduces the following NULL pointer dereference:
181.6723 [ 3819.390121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
181.6724 [ 3819.394856] IP: __event_probe__block_get_rq+0x127/0x4a0 [lttng_probe_block]
181.6725 [ 3819.394856] PGD 7b924067 P4D 7b924067 PUD 733a7067 PMD 0
181.6726 [ 3819.394856] Oops: 0000 [#1] SMP
181.6727 [ 3819.394856] Modules linked in: lttng_test(OE) lttng_probe_x86_exceptions(OE) lttng_probe_x86_irq_vectors(OE) lttng_probe_writeback(OE) lttng_probe_workqueue(OE) lttng_probe_vmscan(OE) lttng_probe_udp(OE) lttng_probe_timer(OE) lttng_probe_sunrpc(OE) lttng_probe_statedump(OE) lttng_probe_sock(OE) lttng_probe_skb(OE) lttng_probe_signal(OE) lttng_probe_scsi(OE) lttng_probe_sched(OE) lttng_probe_regulator(OE) lttng_probe_regmap(OE) lttng_probe_rcu(OE) lttng_probe_random(OE) lttng_probe_printk(OE) lttng_probe_power(OE) lttng_probe_net(OE) lttng_probe_napi(OE) lttng_probe_module(OE) lttng_probe_kvm_x86_mmu(OE) lttng_probe_kvm_x86(OE) lttng_probe_kvm(OE) lttng_probe_kmem(OE) lttng_probe_jbd2(OE) lttng_probe_irq(OE) lttng_probe_i2c(OE) lttng_probe_gpio(OE) lttng_probe_ext4(OE) lttng_probe_compaction(OE) lttng_probe_btrfs(OE)
181.6728 [ 3819.394856] lttng_probe_block(OE) lttng_ring_buffer_metadata_mmap_client(OE) lttng_ring_buffer_client_mmap_overwrite(OE) lttng_ring_buffer_client_mmap_discard(OE) lttng_ring_buffer_metadata_client(OE) lttng_ring_buffer_client_overwrite(OE) lttng_ring_buffer_client_discard(OE) lttng_tracer(OE) lttng_statedump(OE) lttng_ftrace(OE) lttng_kprobes(OE) lttng_clock(OE) lttng_lib_ring_buffer(OE) lttng_kretprobes(OE) [last unloaded: lttng_statedump]
181.6729 [ 3819.394856] CPU: 1 PID: 17541 Comm: kworker/u4:2 Tainted: G OE 4.14.0 #1
181.6730 [ 3819.394856] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
181.6731 [ 3819.394856] Workqueue: events_freezable_power_ disk_events_workfn
181.6732 [ 3819.394856] task: ffff9cd5b9bb1cc0 task.stack: ffffbf4100444000
181.6733 [ 3819.394856] RIP: 0010:__event_probe__block_get_rq+0x127/0x4a0 [lttng_probe_block]
181.6734 [ 3819.394856] RSP: 0018:ffffbf4100447b40 EFLAGS: 00010246
181.6735 [ 3819.394856] RAX: 0000000000000000 RBX: ffff9cd5b39757a8 RCX: ffff9cd5ae850000
181.6736 [ 3819.394856] RDX: 000000000000042a RSI: 0000000000000bd6 RDI: ffffdf40ffd04470
181.6737 [ 3819.394856] RBP: ffffbf4100447c50 R08: 0000000000800000 R09: 0000000000019bd6
181.6738 [ 3819.394856] R10: ffffdf40ffd04470 R11: 0000000000000000 R12: 0000000000000000
181.6739 [ 3819.394856] R13: 000000000001d060 R14: ffff9cd5bb9988a0 R15: ffff9cd5b992b480
181.6740 [ 3819.394856] FS: 0000000000000000(0000) GS:ffff9cd5bfd00000(0000) knlGS:0000000000000000
181.6741 [ 3819.394856] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
181.6742 [ 3819.394856] CR2: 0000000000000008 CR3: 00000000736ab000 CR4: 00000000000006e0
181.6743 [ 3819.394856] Call Trace:
181.6744 [ 3819.394856] ? scsi_old_init_rq+0x84/0x100
181.6745 [ 3819.394856] ? mempool_alloc+0x5f/0x150
181.6746 [ 3819.394856] ? kvm_clock_read+0x1e/0x20
181.6747 [ 3819.394856] get_request+0x4db/0x7e0
181.6748 [ 3819.394856] ? wait_woken+0x80/0x80
181.6749 [ 3819.394856] blk_get_request+0x9c/0x110
181.6750 [ 3819.394856] scsi_execute+0x40/0x260
181.6751 [ 3819.394856] sr_check_events+0x7d/0x290
181.6752 [ 3819.394856] cdrom_check_events+0x18/0x30
181.6753 [ 3819.394856] sr_block_check_events+0x2a/0x30
181.6754 [ 3819.394856] disk_check_events+0x51/0x130
181.6755 [ 3819.394856] disk_events_workfn+0x16/0x20
181.6756 [ 3819.394856] process_one_work+0x156/0x3f0
181.6757 [ 3819.394856] worker_thread+0x4b/0x460
181.6758 [ 3819.394856] kthread+0x109/0x140
181.6759 [ 3819.394856] ? process_one_work+0x3f0/0x3f0
181.6760 [ 3819.394856] ? kthread_create_on_node+0x40/0x40
181.6761 [ 3819.394856] ret_from_fork+0x25/0x30
181.6762 [ 3819.394856] Code: 00 00 00 00 48 89 85 20 ff ff ff 48 8d 85 10 ff ff ff 8b 73 04 48 89 85 28 ff ff ff 49 8b 47 48 ff 50 28 85 c0 0f 88 78 01 00 00 <49> 8b 44 24 08 ba 04 00 00 00 48 8d b5 08 ff ff ff 48 8d bd 20
181.6763 [ 3819.394856] RIP: __event_probe__block_get_rq+0x127/0x4a0 [lttng_probe_block] RSP: ffffbf4100447b40
181.6764 [ 3819.394856] CR2: 0000000000000008
181.6765 [ 3819.394856] ---[ end trace b08f087751369a25 ]---
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 2 Jan 2018 16:07:05 +0000 (11:07 -0500)]
Update: kvm instrumentation for 3.16.52 and 3.2.97
Starting from 3.16.52 and 3.2.97, the 3.16 and 3.2 stable kernel
branches backport a kvm instrumentation change introduced in 4.15 which
affects the prototype of the kvm_mmio event.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Wed, 27 Dec 2017 14:07:30 +0000 (09:07 -0500)]
Fix: kvm instrumentation for 4.15
Incorrect version range.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 26 Dec 2017 14:47:36 +0000 (09:47 -0500)]
Update sock instrumentation for 4.15
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 26 Dec 2017 14:47:22 +0000 (09:47 -0500)]
Update kvm instrumentation for 4.15
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 19 Dec 2017 20:06:42 +0000 (15:06 -0500)]
Fix: ACCESS_ONCE() removed in kernel 4.15
The ACCESS_ONCE() macro was removed in kernel 4.15 and should be
replaced by READ_ONCE and WRITE_ONCE which were introduced in kernel
3.19.
This commit replaces all calls to ACCESS_ONCE() with the appropriate
READ_ONCE or WRITE_ONCE and adds compatibility macros for kernels that
lack them.
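A compatibility shim for kernels older than 3.19 could follow the
pattern sketched below. This is a user-space illustration only: the
SKETCH_* macro names are hypothetical, the kernel's own definitions
differ, and __typeof__ is a GCC/Clang extension.

```c
/*
 * Illustrative user-space stand-ins only. The kernel's own definitions
 * differ, and the macro names below are hypothetical.
 */
#define SKETCH_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) \
	do { *(volatile __typeof__(x) *)&(x) = (val); } while (0)

static int shared_flag;

static int publish_and_read(int v)
{
	SKETCH_WRITE_ONCE(shared_flag, v);	/* was: ACCESS_ONCE(shared_flag) = v */
	return SKETCH_READ_ONCE(shared_flag);	/* was: ACCESS_ONCE(shared_flag) */
}
```

Splitting the single macro into a read form and a write form is what
lets tooling instrument loads and stores separately.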
See this upstream commit:
commit b03a0fe0c5e4b46dcd400d27395b124499554a71
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Mon Oct 23 14:07:25 2017 -0700
locking/atomics, mm: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.
However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.
It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the mm code and comments to use
{READ,WRITE}_ONCE() consistently.
----
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 18 Dec 2017 19:35:55 +0000 (14:35 -0500)]
Fix: sched instrumentation on stable RT kernels
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 29 Nov 2017 22:03:21 +0000 (17:03 -0500)]
timer API transition for kernel 4.15
The timer API changes starting from kernel 4.15.0.
There's an interesting LWN article on this subject:
https://lwn.net/Articles/735887/
Check these upstream commits for more details:
commit 686fef928bba6be13cabe639f154af7d72b63120
Author: Kees Cook <keescook@chromium.org>
Date: Thu Sep 28 06:38:17 2017 -0700
timer: Prepare to change timer callback argument type
Modern kernel callback systems pass the structure associated with a
given callback to the callback function. The timer callback remains one
of the legacy cases where an arbitrary unsigned long argument continues
to be passed as the callback argument. This has several problems:
- This bloats the timer_list structure with a normally redundant
.data field.
- No type checking is being performed, forcing callbacks to do
explicit type casts of the unsigned long argument into the object
that was passed, rather than using container_of(), as done in most
of the other callback infrastructure.
- Neighboring buffer overflows can overwrite both the .function and
the .data field, providing attackers with a way to elevate from a buffer
overflow into a simplistic ROP-like mechanism that allows calling
arbitrary functions with a controlled first argument.
- For future Control Flow Integrity work, this creates a unique function
prototype for timer callbacks, instead of allowing them to continue to
be clustered with other void functions that take a single unsigned long
argument.
This adds a new timer initialization API, which will ultimately replace
the existing setup_timer(), setup_{deferrable,pinned,etc}_timer() family,
named timer_setup() (to mirror hrtimer_setup(), making instances of its
use much easier to grep for).
In order to support the migration of existing timers into the new
callback arguments, timer_setup() casts its arguments to the existing
legacy types, and explicitly passes the timer pointer as the legacy
data argument. Once all setup_*timer() callers have been replaced with
timer_setup(), the casts can be removed, and the data argument can be
dropped with the timer expiration code changed to just pass the timer
to the callback directly.
Since the regular pattern of using container_of() during local variable
declaration repeats the need for the variable type declaration
to be included, this adds a helper modeled after other from_*()
helpers that wrap container_of(), named from_timer(). This helper uses
typeof(*variable), removing the type redundancy and minimizing the need
for line wraps in forthcoming conversions from "unsigned long data" to
"struct timer_list *" in the timer callbacks:
-void callback(unsigned long data)
+void callback(struct timer_list *t)
{
- struct some_data_structure *local = (struct some_data_structure *)data;
+ struct some_data_structure *local = from_timer(local, t, timer);
Finally, in order to support the handful of timer users that perform
open-coded assignments of the .function (and .data) fields, provide
cast macros (TIMER_FUNC_TYPE and TIMER_DATA_TYPE) that can be used
temporarily. Once conversion has been completed, these can be globally
trivially removed.
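The container_of() pattern behind the new callback style can be tried
in user space with the sketch below. The sketch_* names are
hypothetical stand-ins, not the kernel's struct timer_list or
from_timer().

```c
#include <stddef.h>

/* User-space stand-ins; names prefixed "sketch_" are hypothetical. */
struct sketch_timer {
	int expires;
};

struct some_data_structure {
	int payload;
	struct sketch_timer timer;
};

/* Same idea as the kernel's container_of(): member pointer -> owner. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* New-style callback: receives the timer and recovers its container. */
static int callback(struct sketch_timer *t)
{
	struct some_data_structure *local =
		sketch_container_of(t, struct some_data_structure, timer);

	return local->payload;
}
```

Because the callback receives a typed pointer to the embedded timer, no
cast from an untyped unsigned long is needed to reach the owning
structure.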
...
commit e99e88a9d2b067465adaa9c111ada99a041bef9a
Author: Kees Cook <keescook@chromium.org>
Date: Mon Oct 16 14:43:17 2017 -0700
treewide: setup_timer() -> timer_setup()
This converts all remaining cases of the old setup_timer() API into using
timer_setup(), where the callback argument is the structure already
holding the struct timer_list. These should have no behavioral changes,
since they just change which pointer is passed into the callback with
the same available pointers after conversion. It handles the following
examples, in addition to some other variations.
...
commit 185981d54a60ae90942c6ba9006b250f3348cef2
Author: Kees Cook <keescook@chromium.org>
Date: Wed Oct 4 16:26:58 2017 -0700
timer: Remove init_timer_pinned() in favor of timer_setup()
This refactors the only users of init_timer_pinned() to use
the new timer_setup() and from_timer(). Drops the definition of
init_timer_pinned().
...
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Wed, 13 Dec 2017 18:40:42 +0000 (13:40 -0500)]
Fix: Don't nest get online cpus
Since the cpu hotplug refactoring in the Linux kernel, CPU hotplug
"online cpus" read lock cannot be nested anymore.
Fix this by disabling preemption around the section instead.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Fri, 8 Dec 2017 19:17:21 +0000 (14:17 -0500)]
Fix: lttng_channel_syscall_mask() bool use in bitfield
gcc 7 warns about using ~ on a bool. Pass a char as input type instead.
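The reason the warning matters can be shown with a small sketch. The
function names are hypothetical, and the integer promotion is written
out explicitly so the example itself compiles without the warning.

```c
#include <stdbool.h>

/* ~ on a bool promotes to int first, so the result is never 0. */
static bool invert_with_tilde(bool b)
{
	int promoted = b;	/* what the compiler does implicitly */

	return ~promoted;	/* ~0 == -1, ~1 == -2: both convert to true */
}

/* Logical negation is what was usually intended. */
static bool invert_with_not(bool b)
{
	return !b;
}
```

Bitwise complement of a bool therefore always yields a true value when
converted back, which is why an integer type is the safer input for
mask arithmetic.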
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 28 Nov 2017 21:02:45 +0000 (16:02 -0500)]
Fix: update kmem instrumentation for kernel 4.15
See upstream commit:
commit 2d4894b5d2ae0fe1725ea7abd57b33bfbbe45492
Author: Mel Gorman <mgorman@techsingularity.net>
Date: Wed Nov 15 17:37:59 2017 -0800
mm: remove cold parameter from free_hot_cold_page*
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 7 Nov 2017 21:44:36 +0000 (16:44 -0500)]
Fix: lttng_kvmalloc helper NULL pointer OOPS
The static function __vmalloc_node is not visible by KALLSYMS_ALL on at
least some kernels, which leads to a call to a NULL function when trying
to perform allocation of lttng buffer memory under memory fragmentation
conditions (kmalloc_node failure).
Use __vmalloc_node_range instead, and check that the returned pointer
is non-NULL to ensure this type of failure does not happen in any
condition.
Fall back to __vmalloc(), even though it is not NUMA-aware, in case
we fail to find __vmalloc_node_range, and print an explicit warning
to the user console about the need to enable KALLSYMS_ALL.
This affects kernels < 4.12. Later kernels provide kvmalloc(), which
we use.
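The "check the looked-up function before calling it, otherwise fall
back and warn" pattern can be sketched in user space as follows. All
names here are hypothetical, and malloc() stands in for both the
preferred and the fallback allocator, so this shows the control flow
only and not the kernel code.

```c
#include <stdio.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t size);

/* Stand-in for a symbol lookup that may fail; returns NULL when missing. */
static alloc_fn lookup_preferred_allocator(int symbol_visible)
{
	return symbol_visible ? malloc : NULL;
}

/* Never call through an unchecked pointer; fall back and warn instead. */
static void *alloc_with_fallback(size_t size, int symbol_visible,
				 int *used_fallback)
{
	alloc_fn fn = lookup_preferred_allocator(symbol_visible);

	*used_fallback = (fn == NULL);
	if (!fn) {
		fprintf(stderr,
			"warning: preferred allocator not found, using fallback\n");
		fn = malloc;
	}
	return fn(size);
}
```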
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 1 Nov 2017 19:55:58 +0000 (15:55 -0400)]
Update version to 2.11.0-pre
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 31 Oct 2017 22:23:59 +0000 (18:23 -0400)]
Fix: lttng-logger get_user_pages_fast error handling
Comparing a signed return value against an unsigned nr_pages performs
the comparison as "unsigned", and therefore mistakenly considers
get_user_pages_fast() errors as success.
By passing an invalid pointer to write() to the /proc/lttng-logger
interface, unprivileged user-space processes can trigger a kernel OOPS.
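The comparison pitfall can be reproduced in user space. In the sketch
below the function names are hypothetical and -14 is just a sample
negative error value. The cast in the buggy variant spells out the
conversion that C's usual arithmetic conversions otherwise perform
implicitly.

```c
/*
 * Buggy check: a negative error code becomes a huge unsigned value,
 * so "ret < nr_pages" is false and the error looks like success.
 */
static int pinned_all_buggy(int ret, unsigned long nr_pages)
{
	return !((unsigned long)ret < nr_pages);
}

/* Fixed check: reject negative return values before comparing. */
static int pinned_all_fixed(int ret, unsigned long nr_pages)
{
	if (ret < 0)
		return 0;
	return (unsigned long)ret >= nr_pages;
}
```

Testing the sign first keeps error codes from ever reaching the
unsigned comparison.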
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 5 Oct 2017 18:52:15 +0000 (14:52 -0400)]
Fix: update block instrumentation for 4.14 kernel
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 5 Oct 2017 18:45:43 +0000 (14:45 -0400)]
Revert "Fix: update block instrumentation for kernel 4.14"
This reverts commit 49447902967115fe5a07ee7a1df3d17fbf4b1ab8.
It introduces a NULL pointer dereference:
[ 37.862398] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
181.3 [ 37.864108] IP: [<ffffffffa01c41b7>] __event_probe__block_get_rq+0x127/0x4b0 [lttng_probe_block]
181.4 [ 37.864108] PGD 7a402067 PUD 7a4c7067 PMD 0
181.5 [ 37.864108] Oops: 0000 [#1] SMP
181.6 [ 37.864108] Modules linked in: lttng_probe_x86_exceptions(OE) lttng_probe_x86_irq_vectors(OE) lttng_probe_writeback(OE) lttng_probe_workqueue(OE) lttng_probe_vmscan(OE) lttng_probe_udp(OE) lttng_probe_timer(OE) lttng_probe_sunrpc(OE) lttng_probe_statedump(OE) lttng_probe_sock(OE) lttng_probe_skb(OE) lttng_probe_signal(OE) lttng_probe_scsi(OE) lttng_probe_sched(OE) lttng_probe_regulator(OE) lttng_probe_regmap(OE) lttng_probe_rcu(OE) lttng_probe_random(OE) lttng_probe_printk(OE) lttng_probe_power(OE) lttng_probe_net(OE) lttng_probe_napi(OE) lttng_probe_module(OE) lttng_probe_kvm_x86_mmu(OE) lttng_probe_kvm_x86(OE) lttng_probe_kvm(OE) lttng_probe_kmem(OE) lttng_probe_jbd2(OE) lttng_probe_irq(OE) lttng_probe_i2c(OE) lttng_probe_gpio(OE) lttng_probe_ext4(OE) lttng_probe_compaction(OE) lttng_probe_btrfs(OE) lttng_probe_block(OE) lttng_ring_buffer_metadata_mmap_client(OE) lttng_ring_buffer_client_mmap_overwrite(OE) lttng_ring_buffer_client_mmap_discard(OE) lttng_ring_buffer_metadata_client(OE) lttng_ring_buffer_client_overwrite(OE) lttng_ring_buffer_client_discard(OE) lttng_tracer(OE) lttng_statedump(OE) lttng_ftrace(OE) lttng_kprobes(OE) lttng_clock(OE) lttng_lib_ring_buffer(OE) lttng_kretprobes(OE)
181.7 [ 37.864108] CPU: 1 PID: 6 Comm: kworker/u4:0 Tainted: G OE 4.4.90 #1
181.8 [ 37.864108] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
181.9 [ 37.864108] Workqueue: events_freezable_power_ disk_events_workfn
181.10 [ 37.864108] task: ffff88007c861bc0 ti: ffff88007c868000 task.ti: ffff88007c868000
181.11 [ 37.864108] RIP: 0010:[<ffffffffa01c41b7>] [<ffffffffa01c41b7>] __event_probe__block_get_rq+0x127/0x4b0 [lttng_probe_block]
181.12 [ 37.864108] RSP: 0018:ffff88007c86ba98 EFLAGS: 00010246
181.13 [ 37.864108] RAX: 0000000000000000 RBX: ffff880073683348 RCX: ffff8800747d0000
181.14 [ 37.864108] RDX: 00000008d0c5bde9 RSI: 00000000000009f2 RDI: 0000000000400000
181.15 [ 37.864108] RBP: ffff88007c86bba8 R08: 00000000001789ed R09: 0000000000100000
181.16 [ 37.864108] R10: ffffe8ffffd02460 R11: 0000000000000000 R12: 0000000000000000
181.17 [ 37.864108] R13: 0000000000017fe0 R14: ffff88007363c6e8 R15: ffff88007bef83c0
181.18 [ 37.864108] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
181.19 [ 37.864108] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
181.20 [ 37.864108] CR2: 0000000000000008 CR3: 000000007a4d0000 CR4: 00000000000006e0
181.21 [ 37.864108] Stack:
181.22 [ 37.864108] 0000000000000000 ffffffff8115a46b ffff88007c86bbe8 ffff88007bc67e30
181.23 [ 37.864108] ffff880073683348 00000000ffffff01 ffff88007a7a1000 ffff88007c86bab8
181.24 [ 37.864108] 0000000000000028 0000000100000001 ffffe8ffffd02460 0000000000000035
181.25 [ 37.864108] Call Trace:
181.26 [ 37.864108] [<ffffffff8115a46b>] ? ktime_get_mono_fast_ns+0x4b/0x90
181.27 [ 37.864108] [<ffffffff81532849>] ? alloc_request_struct+0x19/0x20
181.28 [ 37.864108] [<ffffffff811e8d8f>] ? mempool_alloc+0x5f/0x150
181.29 [ 37.864108] [<ffffffffa021815c>] ? __event_probe__kmem_alloc+0x1dc/0x2c0 [lttng_probe_kmem]
181.30 [ 37.864108] [<ffffffff810ad85e>] ? kvm_clock_read+0x1e/0x20
181.31 [ 37.864108] [<ffffffff81535f4f>] get_request+0x4af/0x760
181.32 [ 37.864108] [<ffffffff8112c270>] ? wake_atomic_t_function+0x60/0x60
181.33 [ 37.864108] [<ffffffff81536283>] blk_get_request+0x83/0xe0
181.34 [ 37.864108] [<ffffffff81773b5d>] scsi_execute+0x3d/0x1d0
181.35 [ 37.864108] [<ffffffff817758fe>] scsi_execute_req_flags+0x8e/0xf0
181.36 [ 37.864108] [<ffffffff81788f4d>] sr_check_events+0x8d/0x2a0
181.37 [ 37.864108] [<ffffffff81547590>] ? disk_check_events+0x130/0x130
181.38 [ 37.864108] [<ffffffff8181b618>] cdrom_check_events+0x18/0x30
181.39 [ 37.864108] [<ffffffff8178935a>] sr_block_check_events+0x2a/0x30
181.40 [ 37.864108] [<ffffffff815474b1>] disk_check_events+0x51/0x130
181.41 [ 37.864108] [<ffffffff815475a6>] disk_events_workfn+0x16/0x20
181.42 [ 37.864108] [<ffffffff81102b85>] process_one_work+0x165/0x480
181.43 [ 37.864108] [<ffffffff81102eeb>] worker_thread+0x4b/0x4c0
181.44 [ 37.864108] [<ffffffff81102ea0>] ? process_one_work+0x480/0x480
181.45 [ 37.864108] [<ffffffff81108d86>] kthread+0xd6/0xf0
181.46 [ 37.864108] [<ffffffff81108cb0>] ? kthread_create_on_node+0x180/0x180
181.47 [ 37.864108] [<ffffffff81aa690f>] ret_from_fork+0x3f/0x70
181.48 [ 37.864108] [<ffffffff81108cb0>] ? kthread_create_on_node+0x180/0x180
181.49 [ 37.864108] Code: 00 00 00 00 48 89 85 20 ff ff ff 48 8d 85 10 ff ff ff 8b 73 04 48 89 85 28 ff ff ff 49 8b 47 48 ff 50 28 85 c0 0f 88 5d 01 00 00 <49> 8b 44 24 08 48 85 c0 0f 84 3d 03 00 00 8b 00 89 85 08 ff ff
181.50 [ 37.864108] RIP [<ffffffffa01c41b7>] __event_probe__block_get_rq+0x127/0x4b0 [lttng_probe_block]
181.51 [ 37.864108] RSP <ffff88007c86ba98>
181.52 [ 37.864108] CR2: 0000000000000008
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Fri, 29 Sep 2017 20:40:36 +0000 (16:40 -0400)]
Fix: version check error in btrfs instrumentation
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 20 Sep 2017 16:12:41 +0000 (12:12 -0400)]
Fix: update btrfs instrumentation for kernel 4.14
See upstream commit:
Author: Jeff Mahoney <jeffm@suse.com>
Date: Wed Jun 28 21:56:54 2017 -0600
btrfs: constify tracepoint arguments
Tracepoint arguments are all read-only. If we mark the arguments
as const, we're able to keep or convert those arguments to const
where appropriate.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 20 Sep 2017 16:12:40 +0000 (12:12 -0400)]
Fix: update writeback instrumentation for kernel 4.14
See upstream commits:
commit 11fb998986a72aa7e997d96d63d52582a01228c5
Author: Mel Gorman <mgorman@techsingularity.net>
Date: Thu Jul 28 15:46:20 2016 -0700
mm: move most file-based accounting to the node
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone. This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounting. Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.
commit c4a25635b60d08853a3e4eaae3ab34419a36cfa2
Author: Mel Gorman <mgorman@techsingularity.net>
Date: Thu Jul 28 15:46:23 2016 -0700
mm: move vmscan writes and file write accounting to the node
As reclaim is now node-based, it follows that page write activity due to
page reclaim should also be accounted for on the node. For consistency,
also account page writes and page dirtying on a per-node basis.
After this patch, there are a few remaining zone counters that may appear
strange but are fine. NUMA stats are still per-zone as this is a
user-space interface that tools consume. NR_MLOCK, NR_SLAB_*,
NR_PAGETABLE, NR_KERNEL_STACK and NR_BOUNCE are all allocations that
potentially pin low memory and cannot trivially be reclaimed on demand.
This information is still useful for debugging a page allocation failure
warning.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 20 Sep 2017 16:12:39 +0000 (12:12 -0400)]
Fix: update block instrumentation for kernel 4.14
See upstream commit:
commit 74d46992e0d9dee7f1f376de0d56d31614c8a17a
Author: Christoph Hellwig <hch@lst.de>
Date: Wed Aug 23 19:10:32 2017 +0200
block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.
Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
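The partition remapping described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every sketch_* name, the fixed-size partition table and the return codes are hypothetical and are not taken from the kernel or from the LTTng instrumentation. It only shows the idea of an I/O descriptor that carries a disk pointer plus a partition index, with the index resolved to an absolute sector before submission.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PARTS 4	/* hypothetical table size */

/* Hypothetical partition descriptor: start and length in sectors. */
struct sketch_partition {
	uint64_t start_sect;
	uint64_t nr_sects;
};

/* Hypothetical disk: index 0 stands for the whole device. */
struct sketch_gendisk {
	struct sketch_partition part[SKETCH_MAX_PARTS];
};

/* Hypothetical I/O descriptor: a disk pointer plus a partition index. */
struct sketch_bio {
	struct sketch_gendisk *disk;
	uint8_t partno;
	uint64_t sector;	/* relative to the partition until remapped */
};

/*
 * Translate a partition-relative sector into a whole-disk sector.
 * Returns 0 on success, -1 if the index or the sector is out of range.
 */
static int sketch_partition_remap(struct sketch_bio *bio)
{
	const struct sketch_partition *p;

	assert(bio != NULL && bio->disk != NULL);
	if (bio->partno == 0)
		return 0;	/* already expressed against the whole disk */
	if (bio->partno >= SKETCH_MAX_PARTS)
		return -1;
	p = &bio->disk->part[bio->partno];
	if (bio->sector >= p->nr_sects)
		return -1;
	bio->sector += p->start_sect;
	bio->partno = 0;
	return 0;
}
```

In this model no separate per-open device object is needed to submit I/O: the disk pointer and the index are enough to compute the final sector.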
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 26 Sep 2017 18:16:47 +0000 (14:16 -0400)]
Fix: vmalloc wrapper on kernel < 2.6.38
Ensure that all probes end up including the vmalloc wrapper through the
lttng-tracer.h header so the trace_*() static inlines are generated
through inclusion of include/trace/events/kmem.h before we define
CREATE_TRACE_POINTS.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 26 Sep 2017 17:46:30 +0000 (13:46 -0400)]
Fix: vmalloc wrapper on kernel >= 4.12
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 25 Sep 2017 14:56:20 +0000 (10:56 -0400)]
Add kmalloc failover to vmalloc
This patch is based on the kvmalloc helpers introduced in kernel 4.12.
It will gracefully fail over memory allocations of more than one page to
vmalloc for systems under high memory pressure or fragmentation.
See Linux kernel commit:
commit a7c3e901a46ff54c016d040847eda598a9e3e653
Author: Michal Hocko <mhocko@suse.com>
Date: Mon May 8 15:57:09 2017 -0700
mm: introduce kv[mz]alloc helpers
Patch series "kvmalloc", v5.
There are many open coded kmalloc with vmalloc fallback instances in the
tree. Most of them are not careful enough or simply do not care about
the underlying semantics of the kmalloc/page allocator, which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke really disruptive steps like the OOM killer to move forward,
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.
As can be seen, implementing kvmalloc requires quite intimate
knowledge of the page allocator and the memory reclaim internals, which
strongly suggests that a helper should be implemented in the memory
subsystem proper.
Most callers I could find have been converted to use the helper
instead. This is patch 6. There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well, and Eric Dumazet
was not opposed [2] to converting them.
[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
This patch (of 9):
Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure not to create
heavy memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
not to warn about allocation failures. This also rules out the OOM
killer, as vmalloc is a more appropriate fallback than a disruptive
user-visible action.
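The fallback pattern described above can be modelled in user space as follows. This is a minimal sketch, not the kernel helper and not the LTTng wrapper: all sketch_* names are hypothetical, malloc() stands in for both allocators, and the sketch_fragmented flag is an assumed knob that simulates the failure of multi-page contiguous allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the model */

enum sketch_backend { SKETCH_NONE, SKETCH_CONTIG, SKETCH_VIRT };

/* Hypothetical handle recording which allocator served the request. */
struct sketch_buf {
	void *ptr;
	enum sketch_backend backend;
};

/* Model knob: pretend memory is too fragmented for multi-page runs. */
static bool sketch_fragmented;

/* Stand-in for a physically contiguous, kmalloc-like allocator. */
static void *sketch_contig_alloc(size_t size)
{
	if (sketch_fragmented && size > SKETCH_PAGE_SIZE)
		return NULL;	/* fail fast rather than retry or reclaim hard */
	return malloc(size);
}

/* Stand-in for a virtually contiguous, vmalloc-like allocator. */
static void *sketch_virt_alloc(size_t size)
{
	return malloc(size);
}

/* Try the contiguous allocator first; fall back only above one page. */
static struct sketch_buf sketch_kvmalloc(size_t size)
{
	struct sketch_buf buf = { NULL, SKETCH_NONE };

	buf.ptr = sketch_contig_alloc(size);
	if (buf.ptr) {
		buf.backend = SKETCH_CONTIG;
		return buf;
	}
	if (size <= SKETCH_PAGE_SIZE)
		return buf;	/* sub-page requests get no fallback */
	buf.ptr = sketch_virt_alloc(size);
	if (buf.ptr)
		buf.backend = SKETCH_VIRT;
	return buf;
}

/* Release a buffer regardless of which backend produced it. */
static void sketch_kvfree(struct sketch_buf *buf)
{
	assert(buf != NULL);
	free(buf->ptr);
	buf->ptr = NULL;
	buf->backend = SKETCH_NONE;
}
```

The point of the single free routine is that callers never need to remember which path served the allocation.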
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 19 Sep 2017 16:16:58 +0000 (12:16 -0400)]
Fix: mmap: caches aliased on virtual addresses
Some architectures (e.g. implementations of arm64) implement their
caches based on virtual addresses (rather than physical addresses).
It has the upside of making the cache access faster (no TLB lookup
required to access the cache line), but the downside of requiring
virtual mappings (e.g. kernel vs user-space) to be aligned on the number
of bits used for cache aliasing.
Perform dcache flushing for the entire sub-buffer in the get_subbuf
operation on those architectures, thus ensuring we don't end up with
cache aliasing issues.
An alternative approach we could eventually take would be to create a
kernel mapping for the ring buffer that is aligned with the user-space
mapping.
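The alignment requirement can be made concrete with a small check. This is a sketch under assumptions: the page shift and the number of aliasing bits below are hypothetical example values (real values depend on the cache geometry), and the sketch_* helpers are illustrative names, not LTTng or kernel functions. Two virtual mappings of the same physical page land on the same cache lines only when their addresses agree on the index bits above the page offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: 4 KiB pages, 16 KiB covered by the cache index. */
#define SKETCH_PAGE_SHIFT	12u
#define SKETCH_ALIAS_SHIFT	14u

static_assert(SKETCH_ALIAS_SHIFT >= SKETCH_PAGE_SHIFT,
	      "aliasing bits must cover at least the page offset");

/* Extract the index bits above the page offset (the "cache colour"). */
static inline uintptr_t sketch_cache_colour(uintptr_t vaddr)
{
	const uintptr_t alias_mask = ((uintptr_t)1 << SKETCH_ALIAS_SHIFT) - 1;

	return (vaddr & alias_mask) >> SKETCH_PAGE_SHIFT;
}

/*
 * True when a kernel mapping and a user-space mapping of the same page
 * share a colour, i.e. they cannot hold two diverging cached copies.
 */
static inline bool sketch_mappings_aligned(uintptr_t kaddr, uintptr_t uaddr)
{
	return sketch_cache_colour(kaddr) == sketch_cache_colour(uaddr);
}
```

When such a check would fail, the mappings may alias, which is the case where flushing the data cache for the sub-buffer is needed.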
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 21 Aug 2017 18:47:08 +0000 (14:47 -0400)]
Fix: update ext4 instrumentation for kernel 4.13
See upstream commit:
commit a627b0a7c15ee4d2c87a86d5be5c8167382e8d0d
Author: Eric Whitney <enwlinux@gmail.com>
Date: Sun Jul 30 22:30:11 2017 -0400
ext4: remove unused metadata accounting variables
Two variables in ext4_inode_info, i_reserved_meta_blocks and
i_allocated_meta_blocks, are unused. Removing them saves a little
memory per in-memory inode and cleans up clutter in several tracepoints.
Adjust tracepoint output from ext4_alloc_da_blocks() for consistency
and fix a typo and whitespace near these changes.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Fri, 21 Jul 2017 12:22:04 +0000 (08:22 -0400)]
Fix: Sleeping function called from invalid context
It affects system call instrumentation for accept, accept4 and connect,
only on the x86-64 architecture.
We need to use the LTTng accessor functions to touch user-space memory.
They take care of disabling the page fault handler, so we don't sleep
while in preempt-off context (tracepoints disable preemption).
Fixes #1111
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>