.\" SPDX-FileCopyrightText: 2015-2023 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.TH rseq 2 (date) "Linux man-pages (unreleased)"
.SH NAME
rseq \- restartable sequences system call
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.PP
.BR "#include <linux/rseq.h>" " /* Definition of " RSEQ_* " constants */"
.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.PP
.BI "int syscall(SYS_rseq, struct rseq *" rseq ", uint32_t " rseq_len ,
.BI "            int " flags ", uint32_t " sig );
.fi
.PP
.IR Note :
glibc provides no wrapper for
.BR rseq (),
necessitating the use of
.BR syscall (2).
.SH DESCRIPTION
The
.BR rseq ()
ABI accelerates specific user-space operations by registering a
per-thread data structure shared between kernel and user-space.
This data structure can be read from or written to by user-space to skip
otherwise expensive system calls.
.PP
A restartable sequence is a sequence of instructions
guaranteed to be executed atomically with respect to
other threads and signal handlers on the current CPU.
If its execution does not complete atomically,
the kernel changes the execution flow by jumping to an abort handler
defined by user-space for that restartable sequence.
.PP
Using restartable sequences requires registering a
.BR rseq ()
ABI per-thread data structure
.RB ( "struct rseq" )
through the
.BR rseq ()
system call.
Only one
.BR rseq ()
ABI can be registered per thread, so user-space libraries and
applications must follow a user-space ABI defining how to share this
resource.
The ABI defining how to share this resource between applications and
libraries is defined by the C library.
Allocation of the per-thread
.BR rseq ()
ABI and its registration with the kernel are handled by glibc since version
2.35.
.PP
The
.BR rseq ()
ABI per-thread data structure contains a
.I rseq_cs
field which points to the currently executing critical section.
For each thread, a single rseq critical section can run at any given
point.
Each critical section needs to be implemented in assembly.
.PP
The
.BR rseq ()
ABI accelerates user-space operations on per-cpu data by defining a
shared data structure ABI between each user-space thread and the kernel.
.PP
It allows user-space to perform update operations on per-cpu data
without requiring heavy-weight atomic operations.
.PP
The term CPU used in this documentation refers to a hardware execution
context.
For instance, each CPU number returned by
.BR sched_getcpu (3)
is a CPU.
The current CPU refers to the CPU on which the registered thread is
running.
.PP
Restartable sequences are atomic with respect to preemption (making them
atomic with respect to other threads running on the same CPU),
as well as signal delivery (user-space execution contexts nested over
the same thread).
They either complete atomically with respect to preemption on the
current CPU and signal delivery, or they are aborted.
.PP
Restartable sequences are suited for update operations on per-cpu data.
.PP
Restartable sequences can be used on data structures shared between threads
within a process,
and on data structures shared between threads across different
processes.
.PP
Some examples of operations that can be accelerated or improved by this ABI:
.IP \(bu 3
Memory allocator per-cpu free-lists,
.IP \(bu 3
Querying the current CPU number,
.IP \(bu 3
Incrementing per-CPU counters,
.IP \(bu 3
Modifying data protected by per-CPU spinlocks,
.IP \(bu 3
Inserting/removing elements in per-CPU linked-lists,
.IP \(bu 3
Writing/reading per-CPU ring buffers content,
.IP \(bu 3
Accurately reading performance monitoring unit counters with respect to
thread migration.
.PP
Restartable sequences must not perform system calls.
Doing so may result in termination of the process by a segmentation
fault.
.PP
The
.I rseq
argument is a pointer to the thread-local
.B struct rseq
to be shared between kernel and user-space.
.PP
The structure
.B struct rseq
is an extensible structure.
Additional feature fields can be added in future kernel versions.
Its layout is as follows:
.TP
.B Structure alignment
This structure is aligned on either a 32-byte boundary,
or on the alignment value returned by
.BR getauxval (3)
invoked with
.B AT_RSEQ_ALIGN
if the structure size differs from 32 bytes.
.TP
.B Structure size
This structure size needs to be at least 32 bytes.
It can be either 32 bytes,
or it needs to be at least as large as the value returned by
.BR getauxval (3)
invoked with
.BR AT_RSEQ_FEATURE_SIZE .
Its size is passed as parameter to the
.BR rseq ()
system call.
.in +4n
.IP
.EX
#include <linux/rseq.h>

struct rseq {
    __u32 cpu_id_start;
    __u32 cpu_id;
    union {
        /* ... */
    } rseq_cs;
    __u32 flags;
    __u32 node_id;
    __u32 mm_cid;
} __attribute__((aligned(32)));
.EE
.in
.TP
.B Fields
.RS
.TP
.I cpu_id_start
Always-updated value of the CPU number on which the registered thread is
running.
Its value is guaranteed to always be a possible CPU number,
even when
.BR rseq ()
is not registered.
Its value should always be confirmed by reading the cpu_id field before
user-space performs any side-effect
(e.g. storing to memory).
.IP
This field is always guaranteed to hold a valid CPU number in the range
[ 0 .. nr_possible_cpus \- 1 ].
It can therefore be loaded by user-space
and used as an offset in per-cpu data structures
without having to check whether its value is within the valid bounds
compared to the number of possible CPUs in the system.
.IP
Initialized by user-space to a possible CPU number (e.g., 0),
updated by the kernel for threads registered with
.BR rseq ().
.IP
For user-space applications executed on a kernel without
.BR rseq ()
support,
the cpu_id_start field stays initialized at 0,
which is indeed a valid CPU number.
It is therefore valid to use it as an offset in per-cpu data structures,
and only validate whether it's actually the current CPU number by
comparing it with the cpu_id field within the rseq critical section.
If the kernel does not provide
.BR rseq ()
support, that cpu_id field stays initialized at \-1,
so the comparison always fails, as intended.
.IP
This field should only be read by the thread which registered this data
structure.
Aligned on 32-bit.
.IP
It is up to user space to implement a fall-back mechanism for scenarios where
.BR rseq ()
is not available.
.TP
.I cpu_id
Always-updated value of the CPU number on which the registered thread is
running.
Initialized by user-space to \-1,
updated by the kernel for threads registered with
.BR rseq ().
.IP
This field should only be read by the thread which registered this data
structure.
Aligned on 32-bit.
.TP
.I rseq_cs
The rseq_cs field is a pointer to a
.BR "struct rseq_cs" .
It is NULL when no rseq assembly block critical section is active for
the registered thread.
Setting it to point to a critical section descriptor
.RB ( "struct rseq_cs" )
marks the beginning of the critical section.
.IP
Initialized by user-space to NULL.
.IP
Updated by user-space, which sets the address of the currently
active rseq_cs at the beginning of an assembly instruction sequence
block,
and set to NULL by the kernel when it restarts an assembly instruction
sequence block,
as well as when the kernel detects that it is preempting or delivering a
signal outside of the range targeted by the rseq_cs.
Also needs to be set to NULL by user-space before reclaiming memory that
contains the targeted
.BR "struct rseq_cs" .
.IP
Read and set by the kernel.
.IP
This field should only be updated by the thread which registered this
data structure.
Aligned on 64-bit.
.TP
.I flags
Flags indicating the restart behavior for the registered thread.
This is mainly used for debugging purposes.
Can be a combination of:
.RS
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
Inhibit instruction sequence block restart on preemption for this
thread.
This flag is deprecated since Linux 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
Inhibit instruction sequence block restart on signal delivery for this
thread.
This flag is deprecated since Linux 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
Inhibit instruction sequence block restart on migration for this thread.
This flag is deprecated since Linux 6.1.
.RE
.IP
Initialized by user-space, used by the kernel.
.TP
.I node_id
Always-updated value of the current NUMA node ID.
.IP
Initialized by user-space to 0.
.IP
Updated by the kernel.
Read by user-space with single-copy atomicity semantics.
This field should only be read by the thread which registered
this data structure.
Aligned on 32-bit.
.TP
.I mm_cid
Contains the current thread's concurrency ID
(allocated uniquely within a memory map).
.IP
Updated by the kernel.
Read by user-space with single-copy atomicity semantics.
This field should only be read by the thread which registered this data
structure.
Aligned on 32-bit.
.IP
This concurrency ID is within the possible CPUs range,
and is temporarily (and uniquely) assigned while threads are actively
running within a memory map.
If a memory map has fewer threads than cores,
or is limited to running on a few cores concurrently through sched affinity or
cgroup cpusets,
the concurrency IDs will be values close to 0,
thus allowing efficient use of user-space memory for per-cpu data
structures.
.RE
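.PP
The relationship between the cpu_id_start and cpu_id fields described above
can be sketched in C as follows.
This is only an illustration of the comparison:
the helper name is hypothetical,
and outside of an rseq critical section the result is a snapshot that a
migration can invalidate immediately afterwards.

```c
#include <linux/rseq.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: load cpu_id_start as a candidate per-CPU index,
 * then report whether cpu_id confirms it.
 */
static bool rseq_index_confirmed(const struct rseq *rs, uint32_t *index)
{
	uint32_t start = __atomic_load_n(&rs->cpu_id_start, __ATOMIC_RELAXED);
	uint32_t cpu = __atomic_load_n(&rs->cpu_id, __ATOMIC_RELAXED);

	*index = start;		/* always within the possible CPU range */
	return start == cpu;	/* false while cpu_id still holds -1 */
}
```

With an unregistered structure (cpu_id_start at 0, cpu_id at \-1) the helper
reports a usable index but no confirmation,
which is the cue to take a fall-back path.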
.PP
The layout of
.B struct rseq_cs
version 0 is as follows:
.TP
.B Structure alignment
This structure is aligned on a 32-byte boundary.
.TP
.B Structure size
This structure has a fixed size of 32 bytes.
.in +4n
.IP
.EX
#include <linux/rseq.h>

struct rseq_cs {
    __u32 version;
    __u32 flags;
    __u64 start_ip;
    __u64 post_commit_offset;
    __u64 abort_ip;
} __attribute__((aligned(32)));
.EE
.in
.TP
.B Fields
.RS
.TP
.I version
Version of this structure.
Should be initialized to 0.
.TP
.I flags
Flags indicating the restart behavior of this structure.
Can be a combination of:
.RS
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
Inhibit instruction sequence block restart on preemption for this
critical section.
This flag is deprecated since Linux 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
Inhibit instruction sequence block restart on signal delivery for this
critical section.
This flag is deprecated since Linux 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
Inhibit instruction sequence block restart on migration for this
critical section.
This flag is deprecated since Linux 6.1.
.RE
.TP
.I start_ip
Instruction pointer address of the first instruction of the sequence of
consecutive assembly instructions.
.TP
.I post_commit_offset
Offset (from start_ip address) of the address after the last instruction
of the sequence of consecutive assembly instructions.
.TP
.I abort_ip
Instruction pointer address to which the execution flow is moved when
the sequence of consecutive assembly instructions is aborted.
.RE
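.PP
As an illustration of how the fields above relate to each other,
the following sketch fills a version 0 descriptor from three code addresses.
The helper name and its parameters are hypothetical;
in real use the addresses come from labels inside the assembly block,
and the descriptor is usually emitted statically rather than built at run time.

```c
#include <linux/rseq.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a version 0 descriptor.
 * post_commit is the address following the last instruction of the
 * sequence, so it is stored as an offset from start.
 */
static struct rseq_cs rseq_cs_describe(uintptr_t start, uintptr_t post_commit,
				       uintptr_t abort_target)
{
	struct rseq_cs cs = {
		.version = 0,
		.flags = 0,
		.start_ip = start,
		.post_commit_offset = post_commit - start,
		.abort_ip = abort_target,
	};
	return cs;
}
```

Storing an offset rather than an absolute end address is what the
.I post_commit_offset
field expects;
the abort target lies outside of the range starting at start_ip.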
.PP
The
.I rseq_len
argument is the size of the
.B struct rseq
to register.
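.PP
The size and alignment rules given for
.B struct rseq
can be sketched as follows.
The helper and type names are hypothetical,
the fall-back definitions of the auxiliary vector keys are only used when the
installed headers predate them,
and rounding the size up to a multiple of the alignment is a choice made here
so that the result can be passed to an aligned allocator.

```c
#include <stddef.h>
#include <sys/auxv.h>

/* Auxiliary vector keys, in case the installed headers lack them. */
#ifndef AT_RSEQ_FEATURE_SIZE
# define AT_RSEQ_FEATURE_SIZE 27
#endif
#ifndef AT_RSEQ_ALIGN
# define AT_RSEQ_ALIGN 28
#endif

#define RSEQ_ORIG_SIZE 32	/* hypothetical name for the 32-byte layout */

struct rseq_area_layout {	/* hypothetical helper type */
	size_t size;
	size_t align;
};

static struct rseq_area_layout rseq_query_layout(void)
{
	struct rseq_area_layout l = { RSEQ_ORIG_SIZE, RSEQ_ORIG_SIZE };
	unsigned long feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);
	unsigned long align = getauxval(AT_RSEQ_ALIGN);

	/* getauxval() returns 0 when the kernel provides no such entry. */
	if (feature_size > RSEQ_ORIG_SIZE && align != 0) {
		l.align = align;
		/* Round the feature size up to a multiple of the alignment. */
		l.size = (feature_size + align - 1) / align * align;
	}
	return l;
}
```

On a kernel that does not export these entries the sketch falls back to the
original 32-byte size and alignment.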
.PP
The
.I flags
argument is 0 for registration, and
.B RSEQ_FLAG_UNREGISTER
for unregistration.
.PP
The
.I sig
argument is the 32-bit signature to be expected before the abort
handler code.
.PP
A single library per process should keep the
.B struct rseq
in a per-thread data structure.
The
.I cpu_id
field should be initialized to \-1, and the
.I cpu_id_start
field should be initialized to a possible CPU value (typically 0).
.PP
Each thread is responsible for registering and unregistering its
.BR "struct rseq" .
No more than one
.B struct rseq
address can be registered per thread at a given time.
.PP
The memory of a
.B struct rseq
object must only be reclaimed after either an explicit rseq
unregistration is performed or the thread exits.
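.PP
A minimal sketch of registration and unregistration through
.BR syscall (2)
is shown below.
The helper names and the signature value are hypothetical placeholders;
real users pick an architecture-appropriate signature.
When glibc 2.35 or later has already registered its own area for the thread,
registering a second one is expected to fail,
so callers need to handle a negative result.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/rseq.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define EXAMPLE_RSEQ_SIG 0x53053053	/* hypothetical placeholder signature */

/* Both helpers return 0 on success or a negative errno value. */
static int rseq_register_thread(struct rseq *rs, uint32_t len)
{
	/* Initial values required before registration. */
	rs->cpu_id_start = 0;
	rs->cpu_id = (uint32_t)-1;
	if (syscall(SYS_rseq, rs, len, 0, EXAMPLE_RSEQ_SIG) == -1)
		return -errno;
	return 0;
}

static int rseq_unregister_thread(struct rseq *rs, uint32_t len)
{
	/* Same address, length and signature as at registration. */
	if (syscall(SYS_rseq, rs, len, RSEQ_FLAG_UNREGISTER,
		    EXAMPLE_RSEQ_SIG) == -1)
		return -errno;
	return 0;
}
```

The structure passed to these helpers has to stay valid until the matching
unregistration or until the thread exits.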
.PP
In a typical usage scenario, the thread registering the
.B struct rseq
will be performing loads and stores from/to that structure.
It is however also allowed to read that structure from other threads.
The
.B struct rseq
field updates performed by the kernel provide relaxed atomicity
semantics (atomic store, without memory ordering),
which guarantee that other threads performing relaxed atomic reads
(atomic load, without memory ordering) of the CPU number fields will
always observe a consistent value.
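.PP
A relaxed atomic read of another thread's CPU number field could look like
the sketch below.
The helper name is hypothetical and the compiler builtin is one possible way
to express a relaxed load;
the value returned may already be stale by the time the caller uses it.

```c
#include <linux/rseq.h>
#include <stdint.h>

/*
 * Hypothetical helper: read the CPU number published in another
 * thread's struct rseq, without any memory ordering.
 */
static int32_t rseq_peek_cpu(const struct rseq *other)
{
	return (int32_t)__atomic_load_n(&other->cpu_id, __ATOMIC_RELAXED);
}
```

A negative result means the observed thread has no registered area.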
.SH RETURN VALUE
A return value of 0 indicates success.
On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
.TP
.B EINVAL
Either
.I flags
contains an invalid value, or
.I rseq
contains an address which is not appropriately aligned, or
.I rseq_len
contains an incorrect size.
.TP
.B ENOSYS
The
.BR rseq ()
system call is not implemented by this kernel.
.TP
.B EFAULT
.I rseq
is an invalid address.
.TP
.B EBUSY
A restartable sequence is already registered for this thread.
.TP
.B EPERM
The
.I sig
argument on unregistration does not match the signature received
on registration.
MD
455.SH VERSIONS
456The
457.BR rseq ()
458system call was added in Linux 4.18.
81da251a 459.SH STANDARDS
41478a42
MD
460.BR rseq ()
461is Linux-specific.
41478a42
MD
462.SH SEE ALSO
463.BR sched_getcpu (3) ,
841a0f9b
MD
464.BR membarrier (2) ,
465.BR getauxval (3)