'\" t
.\" Copyright 2015-2023 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.TH rseq 2 (date) "Linux man-pages (unreleased)"
.SH NAME
rseq \- restartable sequences system call
.SH SYNOPSIS
.nf
.PP
.BR "#include <linux/rseq.h>" \
" /* Definition of " RSEQ_* " constants and rseq types */"
.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.PP
.BI "int syscall(SYS_rseq, struct rseq *_Nullable " rseq ", uint32_t " rseq_len \
", int " flags ", uint32_t " sig );
.fi
.PP
.IR Note :
glibc provides no wrapper for
.BR rseq (),
necessitating the use of
.BR syscall (2).
.SH DESCRIPTION
.PP
The
.BR rseq ()
ABI accelerates specific user-space operations by registering a
per-thread data structure shared between kernel and user-space.
This data structure can be read from or written to by user-space to skip
otherwise expensive system calls.
.PP
A restartable sequence is a sequence of instructions guaranteed to be executed
atomically with respect to other threads and signal handlers on the current
CPU.
If its execution does not complete atomically, the kernel changes the
execution flow by jumping to an abort handler defined by user-space for
that restartable sequence.
.PP
Using restartable sequences requires registering an
rseq ABI per-thread data structure
.RB ( "struct rseq" )
through the
.BR rseq ()
system call.
Only one rseq ABI can be registered per thread, so user-space libraries
and applications must follow a user-space ABI defining how to share this
resource.
That sharing ABI is defined by the C library.
Allocation of the per-thread rseq ABI and its registration to the kernel
is handled by glibc since version 2.35.
.PP
The rseq ABI per-thread data structure contains a
.I rseq_cs
field which points to the currently executing critical section.
For each thread, a single rseq critical section can run at any given
point.
Each critical section needs to be implemented in assembly.
.PP
The
.BR rseq ()
ABI accelerates user-space operations on per-cpu data by defining a
shared data structure ABI between each user-space thread and the kernel.
.PP
It allows user-space to perform update operations on per-cpu data
without requiring heavy-weight atomic operations.
.PP
The term CPU used in this documentation refers to a hardware execution
context.
For instance, each CPU number returned by
.BR sched_getcpu (3)
is a CPU.
The current CPU refers to the CPU on which the registered thread is
running.
.PP
Restartable sequences are atomic with respect to preemption (making them
atomic with respect to other threads running on the same CPU),
as well as signal delivery (user-space execution contexts nested over
the same thread).
They either complete atomically with respect to preemption on the
current CPU and signal delivery, or they are aborted.
.PP
Restartable sequences are suited for update operations on per-cpu data.
.PP
Restartable sequences can be used on data structures shared between threads
within a process,
and on data structures shared between threads across different
processes.
.PP
Some examples of operations that can be accelerated or improved by this ABI:
.IP \(bu 3
Memory allocator per-CPU free-lists,
.IP \(bu 3
Querying the current CPU number,
.IP \(bu 3
Incrementing per-CPU counters,
.IP \(bu 3
Modifying data protected by per-CPU spinlocks,
.IP \(bu 3
Inserting/removing elements in per-CPU linked-lists,
.IP \(bu 3
Writing/reading per-CPU ring buffer content,
.IP \(bu 3
Accurately reading performance monitoring unit counters with respect to
thread migration.
.PP
Restartable sequences must not perform system calls.
Doing so may result in termination of the process by a segmentation
fault.
.PP
The
.I rseq
argument is a pointer to the thread-local rseq structure to be shared
between kernel and user-space.
.PP
The structure
.B struct rseq
is an extensible structure.
Additional feature fields can be added in future kernel versions.
Its layout is as follows:
.TP
.B Structure alignment
This structure is aligned on either a 32-byte boundary,
or on the alignment value returned by
.IR getauxval ( \fBAT_RSEQ_ALIGN\fP )
if the structure size differs from 32 bytes.
.TP
.B Structure size
This structure size needs to be at least 32 bytes.
It can be either 32 bytes,
or it needs to be large enough to hold the result of
.IR getauxval ( \fBAT_RSEQ_FEATURE_SIZE\fP ).
Its size is passed as a parameter to the rseq system call.
.RS
.PP
.EX
struct rseq {
    __u32 cpu_id_start;
    __u32 cpu_id;
    union {
        /* Edited out for conciseness. [...] */
    } rseq_cs;
    __u32 flags;
    __u32 node_id;
    __u32 mm_cid;
} __attribute__((aligned(32)));
.EE
.RE
.TP
.B Fields
.RS
.I cpu_id_start
.RS
Always-updated value of the CPU number on which the registered thread is
running.
Its value is guaranteed to always be a possible CPU number,
even when rseq is not registered.
Its value should always be confirmed by reading the cpu_id field before
user-space performs any side-effect
(e.g., storing to memory).
.PP
This field is always guaranteed to hold a valid CPU number in the range
[ 0 .. nr_possible_cpus - 1 ].
It can therefore be loaded by user-space and used as an offset in
per-cpu data structures without having to check whether its value is
within the valid bounds compared to the number of possible CPUs in the
system.
.PP
Initialized by user-space to a possible CPU number (e.g., 0),
updated by the kernel for threads registered with rseq.
.PP
For user-space applications executed on a kernel without rseq support,
the cpu_id_start field stays initialized at 0, which is indeed a valid
CPU number.
It is therefore valid to use it as an offset in per-cpu data structures,
and only validate whether it's actually the current CPU number by
comparing it with the cpu_id field within the rseq critical section.
If the kernel does not provide rseq support, that cpu_id field stays
initialized at \-1,
so the comparison always fails, as intended.
.PP
This field should only be read by the thread which registered this data
structure.
Aligned on 32-bit.
.PP
It is up to user-space to implement a fall-back mechanism for scenarios where
rseq is not available.
.RE
.PP
.I cpu_id
.RS
Always-updated value of the CPU number on which the registered thread is
running.
Initialized by user-space to \-1,
updated by the kernel for threads registered with rseq.
.PP
This field should only be read by the thread which registered this data
structure.
Aligned on 32-bit.
.RE
.PP
.I rseq_cs
.RS
The rseq_cs field is a pointer to a
.BR "struct rseq_cs" .
It is NULL when no rseq assembly block critical section is active for
the registered thread.
Setting it to point to a critical section descriptor
.RB ( "struct rseq_cs" )
marks the beginning of the critical section.
.PP
Initialized by user-space to NULL.
.PP
Updated by user-space, which sets the address of the currently
active rseq_cs at the beginning of the assembly instruction sequence
block,
and set to NULL by the kernel when it restarts an assembly instruction
sequence block,
as well as when the kernel detects that it is preempting or delivering a
signal outside of the range targeted by the rseq_cs.
Also needs to be set to NULL by user-space before reclaiming memory that
contains the targeted
.BR "struct rseq_cs" .
.PP
Read and set by the kernel.
.PP
This field should only be updated by the thread which registered this
data structure.
Aligned on 64-bit.
.RE
.PP
.I flags
.RS
Flags indicating the restart behavior for the registered thread.
This is mainly used for debugging purposes.
Can be a combination of:
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
Inhibit instruction sequence block restart on preemption for this
thread.
This flag is deprecated since kernel 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
Inhibit instruction sequence block restart on signal delivery for this
thread.
This flag is deprecated since kernel 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
Inhibit instruction sequence block restart on migration for this thread.
This flag is deprecated since kernel 6.1.
.PP
Initialized by user-space, used by the kernel.
.RE
.PP
.I node_id
.RS
Always-updated value of the current NUMA node ID.
.PP
Initialized by user-space to 0.
.PP
Updated by the kernel.
Read by user-space with single-copy atomicity semantics.
This field should only be read by the thread which registered
this data structure.
Aligned on 32-bit.
.RE
.PP
.I mm_cid
.RS
Contains the current thread's concurrency ID
(allocated uniquely within a memory map).
.PP
Updated by the kernel.
Read by user-space with single-copy atomicity semantics.
This field should only be read by the thread which registered this data
structure.
Aligned on 32-bit.
.PP
This concurrency ID is within the range of possible CPUs,
and is temporarily (and uniquely) assigned while threads are actively
running within a memory map.
If a memory map has fewer threads than cores,
or is limited to run on few cores concurrently through sched affinity or
cgroup cpusets,
the concurrency IDs will be values close to 0,
thus allowing efficient use of user-space memory for per-cpu data
structures.
.RE
.RE
.RE
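.PP
The size and alignment rules for
.B struct rseq
described above can be sketched in C as follows.
This is only an illustration under stated assumptions:
the helper names
.I rseq_area_size
and
.I rseq_area_align
are hypothetical,
and the fallback definitions of
.B AT_RSEQ_FEATURE_SIZE
and
.B AT_RSEQ_ALIGN
assume the values used by recent kernel headers.

```c
#include <stddef.h>
#include <sys/auxv.h>

/* Fallbacks for older headers; values assumed from <linux/auxvec.h>. */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27
#endif
#ifndef AT_RSEQ_ALIGN
#define AT_RSEQ_ALIGN 28
#endif

#define RSEQ_ORIG_SIZE 32	/* original size and alignment, in bytes */

/* Hypothetical helper: number of bytes to reserve for struct rseq. */
static size_t rseq_area_size(void)
{
	/* getauxval() returns 0 when the entry is not provided. */
	unsigned long feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);

	return feature_size > RSEQ_ORIG_SIZE ? feature_size : RSEQ_ORIG_SIZE;
}

/* Hypothetical helper: alignment required for an area of the given size. */
static size_t rseq_area_align(size_t size)
{
	unsigned long align = getauxval(AT_RSEQ_ALIGN);

	/* A 32-byte structure is aligned on a 32-byte boundary. */
	if (size == RSEQ_ORIG_SIZE || align == 0)
		return RSEQ_ORIG_SIZE;
	return align;
}
```

.PP
A caller would typically reserve
.I rseq_area_size()
bytes with the alignment returned by
.I rseq_area_align()
and pass that size as
.I rseq_len
at registration.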
.PP
The layout of
.B struct rseq_cs
version 0 is as follows:
.TP
.B Structure alignment
This structure is aligned on a 32-byte boundary.
.TP
.B Structure size
This structure has a fixed size of 32 bytes.
.RS
.EX
struct rseq_cs {
    __u32 version;
    __u32 flags;
    __u64 start_ip;
    __u64 post_commit_offset;
    __u64 abort_ip;
} __attribute__((aligned(32)));
.EE
.RE
.PP
.B Fields
.RS
.I version
.RS
Version of this structure.
Should be initialized to 0.
.RE
.PP
.I flags
.RS
Flags indicating the restart behavior of this structure.
Can be a combination of:
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
Inhibit instruction sequence block restart on preemption for this
critical section.
This flag is deprecated since kernel 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
Inhibit instruction sequence block restart on signal delivery for this
critical section.
This flag is deprecated since kernel 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
Inhibit instruction sequence block restart on migration for this
critical section.
This flag is deprecated since kernel 6.1.
.RE
.PP
.I start_ip
.RS
Instruction pointer address of the first instruction of the sequence of
consecutive assembly instructions.
.RE
.PP
.I post_commit_offset
.RS
Offset (from start_ip address) of the address after the last instruction
of the sequence of consecutive assembly instructions.
.RE
.PP
.I abort_ip
.RS
Instruction pointer address to which the execution flow is moved when
the sequence of consecutive assembly instructions is aborted.
.RE
.RE
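.PP
The relationship between
.IR start_ip ,
.IR post_commit_offset ,
and
.I abort_ip
can be sketched as follows.
The type
.I struct cs_desc
and both helpers are hypothetical and exist only for this illustration;
the range check is an assumption about how an instruction pointer is
compared against a descriptor, written here in plain C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of struct rseq_cs version 0, for illustration only. */
struct cs_desc {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
} __attribute__((aligned(32)));

/*
 * Build a descriptor from the address of the first instruction, the
 * address just after the last instruction, and the abort handler address.
 */
static struct cs_desc cs_desc_init(uint64_t start, uint64_t post_commit,
				   uint64_t abort_ip)
{
	struct cs_desc cs = {
		.version = 0,
		.flags = 0,
		.start_ip = start,
		.post_commit_offset = post_commit - start,
		.abort_ip = abort_ip,
	};

	return cs;
}

/* True if ip lies within [start_ip, start_ip + post_commit_offset). */
static bool cs_desc_contains(const struct cs_desc *cs, uint64_t ip)
{
	/* Unsigned wrap-around makes any ip below start_ip fail the test. */
	return ip - cs->start_ip < cs->post_commit_offset;
}
```

.PP
In real code the three addresses come from labels in the assembly block;
the numeric addresses used to exercise this sketch would be placeholders.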
.PP
The
.I rseq_len
argument is the size of the
.B struct rseq
to register.
.PP
The
.I flags
argument is 0 for registration, and
.B RSEQ_FLAG_UNREGISTER
for unregistration.
.PP
The
.I sig
argument is the 32-bit signature to be expected before the abort
handler code.
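.PP
As a rough illustration of what "before the abort handler code" means,
the sketch below compares the 32-bit word stored immediately before an
abort handler address with an expected signature.
The helper name
.I abort_sig_matches
is hypothetical,
and the assumption that the signature occupies exactly the four bytes
preceding the handler is stated here for the example only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: check that the 32-bit word located just before
 * abort_ip equals sig. memcpy() avoids unaligned-access issues.
 */
static bool abort_sig_matches(const unsigned char *abort_ip, uint32_t sig)
{
	uint32_t found;

	memcpy(&found, abort_ip - sizeof(found), sizeof(found));
	return found == sig;
}
```

.PP
The signature value itself is chosen by user-space at registration;
any value used to exercise this helper is a placeholder.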
.PP
A single library per process should keep the rseq structure in a
per-thread data structure.
The
.I cpu_id
field should be initialized to \-1, and the
.I cpu_id_start
field should be initialized to a possible CPU value (typically 0).
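.PP
The initialization values above make the following validation pattern
possible.
This is a plain C sketch of the logic only:
.I struct cpu_fields
and
.I pick_cpu_index
are hypothetical names,
and in real use the comparison has to happen inside the assembly
critical section rather than in C.

```c
#include <stdint.h>

/* Hypothetical mirror of the two CPU number fields of struct rseq. */
struct cpu_fields {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
};

/* Values user-space stores before registration. */
#define CPU_FIELDS_INITIALIZER { .cpu_id_start = 0, .cpu_id = (uint32_t)-1 }

/*
 * Return the per-CPU index to use, or -1 when cpu_id does not confirm
 * cpu_id_start and a fall-back path must be taken.
 */
static int pick_cpu_index(const struct cpu_fields *f)
{
	uint32_t start = f->cpu_id_start;	/* always a possible CPU number */
	uint32_t cur = f->cpu_id;		/* stays -1 while unregistered */

	return start == cur ? (int)start : -1;
}
```

.PP
With the initial values the comparison fails,
so an unregistered thread always ends up on the fall-back path.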
.PP
Each thread is responsible for registering and unregistering its rseq
structure.
No more than one rseq structure address can be registered per thread at
a given time.
.PP
The memory of an rseq object must only be reclaimed after an
explicit rseq unregistration or after the thread exits.
MD
400.PP
401In a typical usage scenario, the thread registering the rseq
81da251a
MD
402structure will be performing loads and stores from/to that structure.
403It is however also allowed to read that structure from other threads.
41478a42 404The rseq field updates performed by the kernel provide relaxed atomicity
81da251a
MD
405semantics (atomic store, without memory ordering),
406which guarantee that other threads performing relaxed atomic reads
407(atomic load, without memory ordering) of the cpu number fields will
408always observe a consistent value.
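.PP
A relaxed atomic read of a CPU number field from another thread might be
written as below.
The type
.I struct cpu_view
and the helper
.I peek_cpu_id
are hypothetical,
and the sketch assumes a compiler providing the GCC-style
.I __atomic_load_n
builtin.

```c
#include <stdint.h>

/* Hypothetical mirror of the CPU number fields of another thread's area. */
struct cpu_view {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
};

/* Atomic load without memory ordering: never observes a torn value. */
static uint32_t peek_cpu_id(struct cpu_view *other)
{
	return __atomic_load_n(&other->cpu_id, __ATOMIC_RELAXED);
}
```

.PP
The value returned may already be stale by the time it is used,
since no ordering is implied with respect to other memory accesses.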
.PP
.SH RETURN VALUE
A return value of 0 indicates success.
On error, \-1 is returned, and
.I errno
is set to indicate the error.
.PP
.SH ERRORS
.TP
.B EINVAL
Either
.I flags
contains an invalid value, or
.I rseq
contains an address which is not appropriately aligned, or
.I rseq_len
contains an incorrect size.
.TP
.B ENOSYS
The
.BR rseq ()
system call is not implemented by this kernel.
.TP
.B EFAULT
.I rseq
is an invalid address.
.TP
.B EBUSY
A restartable sequence is already registered for this thread.
.TP
.B EPERM
The
.I sig
argument on unregistration does not match the signature received
on registration.
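.PP
The error conditions listed above can be turned into diagnostics with a
small helper such as the following sketch.
The function name
.I rseq_error_cause
and the message strings are hypothetical;
they only restate the list above.

```c
#include <errno.h>

/* Hypothetical helper: map an rseq() errno value to its documented cause. */
static const char *rseq_error_cause(int err)
{
	switch (err) {
	case EINVAL:
		return "invalid flags, misaligned rseq address, or incorrect rseq_len";
	case ENOSYS:
		return "rseq() is not implemented by this kernel";
	case EFAULT:
		return "rseq is an invalid address";
	case EBUSY:
		return "a restartable sequence is already registered for this thread";
	case EPERM:
		return "sig does not match the signature given at registration";
	default:
		return "error not documented for rseq()";
	}
}
```

.PP
A caller would pass the saved value of
.I errno
after a failed
.BR rseq ()
call.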
.PP
.SH VERSIONS
The
.BR rseq ()
system call was added in Linux 4.18.
.PP
.SH STANDARDS
.BR rseq ()
is Linux-specific.
.PP
.SH SEE ALSO
.BR sched_getcpu (3),
.BR membarrier (2),
.BR getauxval (3)