RFC - libside

[ This document is under heavy construction. Please beware of the
  potholes as you wander through it. ]

* Introduction

The purpose of the libside API/ABI is to allow a kernel tracer and many
user-space tracers to attach to static and dynamic instrumentation of
user-space applications.

The libside library expresses the instrumentation description as data
(no generated code). Instrumentation arguments are passed on the stack
as an array of typed items, along with a reference to the
instrumentation description.
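
As a rough illustration of "description as data" and of the typed
argument array, here is a minimal sketch. Every identifier in it
(demo_type, demo_field, demo_event_desc, demo_arg, demo_call) is
hypothetical and chosen for this example only; none of them is claimed
to be part of the libside API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: none of these are libside identifiers. */
enum demo_type { DEMO_TYPE_U32, DEMO_TYPE_S64, DEMO_TYPE_STRING };

/* Static description of one field: pure data, no generated code. */
struct demo_field {
	const char *name;
	enum demo_type type;
};

/* Static description of one event. */
struct demo_event_desc {
	const char *provider;
	const char *name;
	const struct demo_field *fields;
	size_t nr_fields;
};

/* One typed argument, built on the caller's stack at the call site. */
struct demo_arg {
	enum demo_type type;
	union {
		uint32_t u32;
		int64_t s64;
		const char *str;
	} u;
};

/* Stand-in for a tracer: reports how many arguments it was handed. */
static size_t demo_call(const struct demo_event_desc *desc,
			const struct demo_arg *args, size_t nr_args)
{
	(void) args;
	assert(desc != NULL);
	return nr_args;
}

static const struct demo_field demo_fields[] = {
	{ "request_id", DEMO_TYPE_U32 },
	{ "latency_ns", DEMO_TYPE_S64 },
};

static const struct demo_event_desc demo_request_done = {
	"demo_provider", "request_done", demo_fields, 2,
};

/* Instrumentation call site: the argument array lives on the stack. */
static size_t demo_trace_request_done(uint32_t id, int64_t latency_ns)
{
	const struct demo_arg args[] = {
		{ .type = DEMO_TYPE_U32, .u.u32 = id },
		{ .type = DEMO_TYPE_S64, .u.s64 = latency_ns },
	};

	return demo_call(&demo_request_done, args,
			 sizeof(args) / sizeof(args[0]));
}
```

The description is a constant object that can be handed to any tracer,
while the only per-call work is filling a small array on the stack.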

-- TODO API vs ABI
This library exposes a type system and a set of macros to help
applications declare their instrumentation and insert instrumentation
calls. It exposes APIs to kernel and user-space tracers so they can list
and connect to the instrumentation, and conditionally enables
instrumentation when at least one tracer is using it.
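
The conditional enabling described above can be pictured as a per-event
counter of attached tracers that the call site tests before doing any
work. This is a sketch under that assumption only: the names
(demo_event_state, demo_tracer_attach, ...) are hypothetical, and the
lifetime of tracer callbacks, which this document assigns to RCU, is
deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-event state; not a libside structure. */
struct demo_event_state {
	atomic_uint nr_tracers;	/* Number of tracers currently attached. */
	unsigned int nr_calls;	/* Demo only: how often the slow path ran. */
};

/* Called when a tracer starts using the event. */
static void demo_tracer_attach(struct demo_event_state *s)
{
	atomic_fetch_add(&s->nr_tracers, 1);
}

/* Called when a tracer stops using the event. */
static void demo_tracer_detach(struct demo_event_state *s)
{
	unsigned int prev = atomic_fetch_sub(&s->nr_tracers, 1);

	assert(prev > 0);
	(void) prev;
}

/* Cheap test performed at every instrumentation call site. */
static bool demo_event_enabled(struct demo_event_state *s)
{
	return atomic_load(&s->nr_tracers) != 0;
}

/* Call site: argument setup and tracer calls only happen when enabled. */
static void demo_event(struct demo_event_state *s)
{
	if (!demo_event_enabled(s))
		return;
	s->nr_calls++;	/* Stands in for building arguments and calling tracers. */
}
```

With no tracer attached, the call site costs one load and one branch;
everything else is skipped.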

The type system includes support for statically known types and dynamic
types. Nested structures, arrays, and variable-length arrays are
supported.
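
One way to picture nested structures, arrays, and variable-length
arrays expressed as data is a recursive type description. The sketch
below assumes such a layout; demo_type, demo_field and demo_type_depth
are hypothetical names for this example, not libside definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type kinds; not the libside type enumeration. */
enum demo_type_kind {
	DEMO_KIND_U32,
	DEMO_KIND_STRING,
	DEMO_KIND_STRUCT,	/* Nested structure: list of named fields. */
	DEMO_KIND_ARRAY,	/* Fixed-length array of one element type. */
	DEMO_KIND_VLA,		/* Variable-length array, length known at run time. */
};

struct demo_field;

struct demo_type {
	enum demo_type_kind kind;
	const struct demo_type *elem;		/* ARRAY and VLA only. */
	size_t length;				/* ARRAY only. */
	const struct demo_field *fields;	/* STRUCT only. */
	size_t nr_fields;			/* STRUCT only. */
};

struct demo_field {
	const char *name;
	const struct demo_type *type;
};

/* Walk a description recursively and return its nesting depth. */
static size_t demo_type_depth(const struct demo_type *t)
{
	size_t depth = 0, i;

	assert(t != NULL);
	switch (t->kind) {
	case DEMO_KIND_STRUCT:
		for (i = 0; i < t->nr_fields; i++) {
			size_t d = demo_type_depth(t->fields[i].type);

			if (d > depth)
				depth = d;
		}
		return depth + 1;
	case DEMO_KIND_ARRAY:
	case DEMO_KIND_VLA:
		return demo_type_depth(t->elem) + 1;
	default:
		return 0;	/* Basic types do not nest. */
	}
}

/* A VLA of structures, each holding a fixed array of two u32. */
static const struct demo_type demo_u32 = { .kind = DEMO_KIND_U32 };
static const struct demo_type demo_coords = {
	.kind = DEMO_KIND_ARRAY, .elem = &demo_u32, .length = 2,
};
static const struct demo_field demo_point_fields[] = {
	{ "coords", &demo_coords },
};
static const struct demo_type demo_point = {
	.kind = DEMO_KIND_STRUCT, .fields = demo_point_fields, .nr_fields = 1,
};
static const struct demo_type demo_points = {
	.kind = DEMO_KIND_VLA, .elem = &demo_point,
};
```

The whole description is made of constant objects, so a tracer can walk
it with a generic function such as demo_type_depth().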

This library builds on user feedback about LTTng-UST and Linux kernel
tracepoints. It therefore introduces significant changes (and vast
simplifications) to the way instrumentation is done compared to both.


* Genesis

- Linux kernel User Events ABI
  - Exposes a stable ABI allowing applications to register their event
    names/field types to the kernel,
  - Can be expected to have a large effect on application instrumentation,
  - My concerns:
    - Should be co-designed with a userspace instrumentation API/ABI
      rather than only focusing on the kernel ABI,
    - Should allow purely userspace tracers to use the same
      instrumentation as userspace tracers implemented within the
      Linux kernel,
    - Tracers can target their specific use-cases, but infrastructure
      should be shared,
    - Limit fragmentation of the instrumentation ecosystem.

- Improvements over tracepoints:
  - Improve compiler error reporting vs tracepoints
  - API uses standard header inclusion practices
  - Share ABI across runtimes (no need to reimplement tracepoints for
    each language, or to use string-only payloads)
- Improvements over SDT: allow expressing additional event semantics
  (e.g. user attributes, versioning, nested and compound data types)
  - libside has less impact on control flow when disabled (no stack setup)
  - SDT ABI is focused on architecture calling conventions, libside ABI
    is easier to use from runtime environments which have an ABI
    different from the native architecture (Go, Rust, Python, Java).
    libside instrumentation ABI calls a small fixed set of functions.
- Comparison with ETW
  - Similar to libside in terms of array of arguments,
  - Does not support pre-registration of events (static typing),
  - Type information received at runtime from the instrumentation
    callsite.

* Desiderata

- Common instrumentation for kernel and purely userspace tracers,
  - Instrumentation is self-described,
  - Support compound and nested types,
  - Support pre-registration of events,
  - Do not rely on compiled event-specific code,
  - Independent from ELF,
  - Simple ABI for instrumented code, kernel, and user-space tracers,
  - Support concurrent tracers,
  - Expose API to allow dynamic instrumentation libraries to register
    their events/payloads.

- Support statically typed instrumentation

- Support dynamically typed instrumentation
  - Natively cover dynamically-typed languages
  - The support for events with dynamic fields reduces the number of
    statically declared events in situations where an application
    has seldom-used events with a large variety of parameter types.
  - The support for mixed static and dynamic event fields allows
    implementation of post-processing string formatting along with a
    variadic payload, while keeping trace data in a structured format.
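
The last point can be illustrated with an event that has one static
format-string field plus a variadic list of dynamic fields. The sketch
assumes a "{}" placeholder syntax and uses hypothetical names
(demo_dyn_field, demo_format); it only shows how a viewer could build
the string after the fact, and is not how libside implements dynamic
types.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names; this is not the libside dynamic type API. */
enum demo_dyn_type { DEMO_DYN_S64, DEMO_DYN_STRING };

/* A dynamic field carries its own name and type at the call site. */
struct demo_dyn_field {
	const char *name;
	enum demo_dyn_type type;
	union {
		int64_t s64;
		const char *str;
	} u;
};

/*
 * Post-processing step: expand each "{}" of the statically declared
 * format string with the next dynamic field. The trace keeps the
 * structured fields; only the viewer builds this string.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int demo_format(char *out, size_t out_len, const char *fmt,
		       const struct demo_dyn_field *fields, size_t nr_fields)
{
	size_t pos = 0, next = 0;

	assert(out != NULL && fmt != NULL);
	if (out_len == 0)
		return -1;
	out[0] = '\0';
	while (*fmt != '\0') {
		int n;

		if (strncmp(fmt, "{}", 2) == 0 && next < nr_fields) {
			const struct demo_dyn_field *f = &fields[next++];

			if (f->type == DEMO_DYN_S64)
				n = snprintf(out + pos, out_len - pos,
					     "%" PRId64, f->u.s64);
			else
				n = snprintf(out + pos, out_len - pos,
					     "%s", f->u.str);
			fmt += 2;
		} else {
			/* Literal character, copied as-is. */
			n = snprintf(out + pos, out_len - pos, "%c", *fmt);
			fmt++;
		}
		if (n < 0 || (size_t) n >= out_len - pos)
			return -1;	/* Output truncated. */
		pos += (size_t) n;
	}
	return 0;
}
```

Because formatting happens at post-processing time, the recorded
payload stays typed and can still be filtered or aggregated per field.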

- Performance considerations for userspace tracers.
  - Maintain performance characteristics comparable to existing
    userspace tracers.
  - Low overhead, good scalability when used by userspace tracers.

- Allow tracing user-space through a kernel tracer. Even though this
  approach adds more overhead, it has the benefit of not requiring
  agent threads to be deployed into applications, which is useful to
  trace locked-down processes.

- Instrumentation registration APIs
  - Instrumentation can be generated at runtime
    - dynamic patching,
    - JIT
  - Instrumentation can be declared statically (static instrumentation)
  - Instrumentation can be enabled dynamically.
  - Very low overhead when not in use.

- libside must be extensible in the future.
  - Extension scheme should allow adding new types in the future without
    requiring complex logic to future-proof tracers.
  - Exposed types are invariant,
  - libside ABI and API can be extended by adding new types.

- The side ABI should allow multiple instances and versions within
  a process (e.g. libside for C/C++, Java side ABI, Python side ABI...).

- Both event description and payload are data (no generated code).
  - It allows tracers to directly interpret the event payload from their
    description, removing the need for code generation. This lessens the
    instruction cache pollution compared to code generation approaches.
  - Tracer interpreter for filtering and field capture can directly use
    the instrumentation data, without need for setting up a structured
    argument layout on the stack within the tracer.
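
To make the interpreter point concrete, here is a sketch of a
tracer-side filter evaluated directly against a description and an
argument array. All names (demo_filter, demo_filter_match, ...) and the
structure layouts are assumptions made for this example; they do not
describe the libside ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts; not the libside ABI. */
enum demo_type { DEMO_TYPE_U32, DEMO_TYPE_S64 };

struct demo_field {
	const char *name;
	enum demo_type type;
};

struct demo_event_desc {
	const char *name;
	const struct demo_field *fields;
	size_t nr_fields;
};

struct demo_arg {
	enum demo_type type;
	union {
		uint32_t u32;
		int64_t s64;
	} u;
};

/* Tracer-side filter: "field <field_name> is greater than <threshold>". */
struct demo_filter {
	const char *field_name;
	int64_t threshold;
};

/*
 * Evaluate the filter by walking the description and the argument
 * array in lockstep. No event-specific code is involved.
 */
static bool demo_filter_match(const struct demo_filter *filter,
			      const struct demo_event_desc *desc,
			      const struct demo_arg *args, size_t nr_args)
{
	size_t i;

	assert(nr_args == desc->nr_fields);
	for (i = 0; i < nr_args; i++) {
		int64_t v;

		if (strcmp(desc->fields[i].name, filter->field_name) != 0)
			continue;
		switch (args[i].type) {
		case DEMO_TYPE_U32:
			v = (int64_t) args[i].u.u32;
			break;
		case DEMO_TYPE_S64:
			v = args[i].u.s64;
			break;
		default:
			return false;	/* Type this filter cannot compare. */
		}
		return v > filter->threshold;
	}
	return false;	/* Field not present in this event. */
}
```

A real tracer would likely resolve the field index once, when it
attaches to the event, rather than comparing names on every call; the
name lookup is kept here to keep the sketch short.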

- Validation of argument vs event description coherence.

- Passing arguments to events should be:
  - Convenient: able to express the application data structures
    expected as instrumentation input,
  - Flexible,
  - Efficient,
  - If these cannot all be achieved together, specialize types for each
    purpose.

- Allow tracers to passively collect application state transitions.

- Allow tracers to actively sample the current state of an application.

- Error messages generated when misusing the API should be easy to
  comprehend and resolve.

- Allow expressing additional custom semantics augmenting events and
  types.


* Design / Architecture

- Compiler error messages are easy to understand because the API is a
  simple header file without any repeated inclusion tricks.

- Variadic events.

- Instrumentation API/ABI:
  - Type system,
    - Type visitor callbacks
    - (perfetto)
    - Stack-copy types
    - Data-gathering types
    - Dynamic types.
  - Helper macros for C/C++,
  - Express instrumentation description as data,
  - Instrumentation arguments are passed on the stack as a data array
    (similar to iovec) along with a reference to instrumentation
    description,
  - Instrumentation is conditionally enabled when at least one tracer is
    registered to it.

- Tracer-agnostic API/ABI:
  - Available events notifications,
  - Conditionally enabling instrumentation,
  - Synchronize registered user-space tracer callbacks with RCU,
  - Co-designed to interact with User Events.

- Application state dump
  - How are applications/libraries meant to provide state information?
  - How are tracers meant to interact with state dump?
  - statedump mode polling
  - statedump mode agent thread

- RCU to synchronize userspace tracers registration vs invocation

- How are tracers meant to interact with libside?

- How is C/C++ language instrumentation meant to be used?

- How are dynamic instrumentation facilities meant to interact with
  libside?

- How is a kernel tracer meant to interact with libside?

- How is gdb (ptrace) meant to interact with libside?

- Validation that instrumentation arguments match event description
  fields cannot be done by the compiler; it requires either:
  - a run-time check,
  - a static checker (only for static instrumentation).
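
The run-time check option could look like the following sketch, which
compares the argument array against the description field by field
before a tracer consumes it. The names (demo_args_match and the demo_*
types) are hypothetical and the layouts are simplified for the example;
this is not the libside validation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layouts; not the libside ABI. */
enum demo_type { DEMO_TYPE_U32, DEMO_TYPE_S64, DEMO_TYPE_STRING };

struct demo_field {
	const char *name;
	enum demo_type type;
};

struct demo_event_desc {
	const char *name;
	const struct demo_field *fields;
	size_t nr_fields;
};

/* Only the type tag matters for validation; the value is omitted. */
struct demo_arg {
	enum demo_type type;
};

/*
 * Run-time coherence check: same number of items, and each argument
 * type equal to the type declared for the field at the same position.
 */
static bool demo_args_match(const struct demo_event_desc *desc,
			    const struct demo_arg *args, size_t nr_args)
{
	size_t i;

	assert(desc != NULL);
	if (nr_args != desc->nr_fields)
		return false;
	for (i = 0; i < nr_args; i++) {
		if (args[i].type != desc->fields[i].type)
			return false;
	}
	return true;
}
```

Such a check costs one pass over the arguments per call, so it is
probably best suited to a debug or validation mode rather than to the
tracing fast path.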

- Event attributes.

- Type attributes.