-'\" t
.\" Copyright 2015-2023 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.TH rseq 2 (date) "Linux man-pages (unreleased)"
.SH NAME
rseq \- restartable sequences system call
+.SH LIBRARY
+Standard C library
+.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.PP
-.BR "#include <linux/rseq.h>" \
-" /* Definition of " RSEQ_* " constants and rseq types */"
-.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
+.BR "#include <linux/rseq.h>" " /* Definition of " RSEQ_* " constants */"
+.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.PP
-.BI "int syscall(SYS_rseq, struct rseq *_Nullable " rseq ", uint32_t " rseq_len \
-", int " flags ", uint32_t " sig ");
+.BI "int syscall(SYS_rseq, struct rseq *_Nullable " rseq ", uint32_t " rseq_len ,
+.BI " int " flags ", uint32_t " sig );
.fi
.PP
.IR Note :
necessitating the use of
.BR syscall (2).
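Since the call has to go through
.BR syscall (2),
a thin wrapper can keep call sites readable.
The following is a minimal sketch, not part of any library:
the helper name
.I sys_rseq
is hypothetical, and the argument types simply mirror the prototype shown above.

```c
#define _GNU_SOURCE
#include <linux/rseq.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: forwards its arguments unchanged to the raw
 * system call.  Returns 0 on success, or -1 with errno set.
 */
static int sys_rseq(struct rseq *rseq, uint32_t rseq_len,
                    int flags, uint32_t sig)
{
    return (int) syscall(SYS_rseq, rseq, rseq_len, flags, sig);
}
```

Callers would check the return value and
.I errno
exactly as for any other raw system call.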
.SH DESCRIPTION
-.PP
The
.BR rseq ()
ABI accelerates specific user-space operations by registering a
This data structure can be read from or written to by user-space to skip
otherwise expensive system calls.
.PP
-A restartable sequence is a sequence of instructions guaranteed to be executed
-atomically with respect to other threads and signal handlers on the current
-CPU.
-If its execution does not complete atomically, the kernel changes the
-execution flow by jumping to an abort handler defined by user-space for
-that restartable sequence.
+A restartable sequence is a sequence of instructions
+guaranteed to be executed atomically with respect to
+other threads and signal handlers on the current CPU.
+If its execution does not complete atomically,
+the kernel changes the execution flow by jumping to an abort handler
+defined by user-space for that restartable sequence.
.PP
Using restartable sequences requires registering a
-rseq ABI per-thread data structure (
-.B struct rseq
-) through the
+.BR rseq ()
+ABI per-thread data structure
+.RB ( "struct rseq" )
+through the
.BR rseq ()
system call.
-Only one rseq ABI can be registered per thread, so user-space libraries
-and applications must follow a user-space ABI defining how to share this
+Only one
+.BR rseq ()
+ABI can be registered per thread, so user-space libraries and
+applications must follow a user-space ABI defining how to share this
resource.
The ABI defining how to share this resource between applications and
libraries is defined by the C library.
-Allocation of the per-thread rseq ABI and its registration to the kernel
-is handled by glibc since version 2.35.
+Allocation of the per-thread
+.BR rseq ()
+ABI and its registration to the kernel is handled by glibc since version
+2.35.
.PP
-The rseq ABI per-thread data structure contains a
+The
+.BR rseq ()
+ABI per-thread data structure contains a
.I rseq_cs
field which points to the currently executing critical section.
For each thread, a single rseq critical section can run at any given
.PP
The
.I rseq
-argument is a pointer to the thread-local rseq structure to be shared
-between kernel and user-space.
+argument is a pointer to the thread-local
+.B struct rseq
+to be shared between kernel and user-space.
.PP
The structure
.B struct rseq
.B Structure alignment
This structure is aligned on either a 32-byte boundary,
or on the alignment value returned by
-.I getauxval(
+.BR getauxval (3)
+invoked with
.B AT_RSEQ_ALIGN
-)
if the structure size differs from 32 bytes.
.TP
.B Structure size
This structure size needs to be at least 32 bytes.
It can be either 32 bytes,
or it needs to be large enough to hold the result of
-.I getauxval(
-.B AT_RSEQ_FEATURE_SIZE
-) .
-Its size is passed as parameter to the rseq system call.
-.RS
-.PP
+.BR getauxval (3)
+invoked with
+.BR AT_RSEQ_FEATURE_SIZE .
+Its size is passed as a parameter to the
+.BR rseq ()
+system call.
+.in +4n
+.IP
.EX
+#include <linux/rseq.h>
+
struct rseq {
__u32 cpu_id_start;
__u32 cpu_id;
union {
- /* Edited out for conciseness. [...] */
+ /* ... */
} rseq_cs;
__u32 flags;
__u32 node_id;
__u32 mm_cid;
} __attribute__((aligned(32)));
.EE
-.RE
+.in
.TP
.B Fields
.RS
+.TP
.I cpu_id_start
-.RS
Always-updated value of the CPU number on which the registered thread is
running.
Its value is guaranteed to always be a possible CPU number,
-even when rseq is not registered.
+even when
+.BR rseq ()
+is not registered.
Its value should always be confirmed by reading the cpu_id field before
user-space performs any side-effect
(e.g. storing to memory).
-.PP
+.IP
This field is always guaranteed to hold a valid CPU number in the range
[ 0 .. nr_possible_cpus - 1 ].
-It can therefore be loaded by user-space and used as an offset in
-per-cpu data structures without having to check whether its value is
-within the valid bounds compared to the number of possible CPUs in the
-system.
-.PP
+It can therefore be loaded by user-space
+and used as an offset in per-cpu data structures
+without having to check whether its value is within the valid bounds
+compared to the number of possible CPUs in the system.
+.IP
Initialized by user-space to a possible CPU number (e.g., 0),
-updated by the kernel for threads registered with rseq.
-.PP
-For user-space applications executed on a kernel without rseq support,
-the cpu_id_start field stays initialized at 0, which is indeed a valid
-CPU number.
+updated by the kernel for threads registered with
+.BR rseq ().
+.IP
+For user-space applications executed on a kernel without
+.BR rseq ()
+support,
+the cpu_id_start field stays initialized at 0,
+which is indeed a valid CPU number.
It is therefore valid to use it as an offset in per-cpu data structures,
and only validate whether it's actually the current CPU number by
comparing it with the cpu_id field within the rseq critical section.
-If the kernel does not provide rseq support, that cpu_id field stays
-initialized at -1,
+If the kernel does not provide
+.BR rseq ()
+support, that cpu_id field stays initialized at -1,
so the comparison always fails, as intended.
-.PP
+.IP
This field should only be read by the thread which registered this data
structure.
Aligned on 32-bit.
-.PP
-It is up to user-space to implement a fall-back mechanism for scenarios where
-rseq is not available.
-.RE
-.PP
+.IP
+It is up to user space to implement a fall-back mechanism for scenarios where
+.BR rseq ()
+is not available.
+.TP
.I cpu_id
-.RS
Always-updated value of the CPU number on which the registered thread is
running.
Initialized by user-space to -1,
-updated by the kernel for threads registered with rseq.
-.PP
+updated by the kernel for threads registered with
+.BR rseq ().
+.IP
This field should only be read by the thread which registered this data
structure.
Aligned on 32-bit.
-.RE
-.PP
+.TP
.I rseq_cs
-.RS
The rseq_cs field is a pointer to a
-.B struct rseq_cs .
+.BR "struct rseq_cs" .
It is NULL when no rseq assembly block critical section is active for
the registered thread.
-Setting it to point to a critical section descriptor (
-.B struct rseq_cs
-) marks the beginning of the critical section.
-.PP
+Setting it to point to a critical section descriptor
+.RB ( "struct rseq_cs" )
+marks the beginning of the critical section.
+.IP
Initialized by user-space to NULL.
-.PP
+.IP
Updated by user-space, which sets the address of the currently
active rseq_cs at the beginning of assembly instruction sequence
block,
signal outside of the range targeted by the rseq_cs.
Also needs to be set to NULL by user-space before reclaiming memory that
contains the targeted
-.B struct rseq_cs .
-.PP
+.BR "struct rseq_cs" .
+.IP
Read and set by the kernel.
-.PP
+.IP
This field should only be updated by the thread which registered this
data structure.
Aligned on 64-bit.
-.RE
-.PP
+.TP
.I flags
-.RS
Flags indicating the restart behavior for the registered thread.
This is mainly used for debugging purposes.
Can be a combination of:
+.RS
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
Inhibit instruction sequence block restart on preemption for this
thread.
-This flag is deprecated since kernel 6.1.
+This flag is deprecated since Linux 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
Inhibit instruction sequence block restart on signal delivery for this
thread.
-This flag is deprecated since kernel 6.1.
+This flag is deprecated since Linux 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
Inhibit instruction sequence block restart on migration for this thread.
-This flag is deprecated since kernel 6.1.
-.PP
-Initialized by user-space, used by the kernel.
+This flag is deprecated since Linux 6.1.
.RE
-.PP
+.IP
+Initialized by user-space, used by the kernel.
+.TP
.I node_id
-.RS
Always-updated value of the current NUMA node ID.
-.PP
+.IP
Initialized by user-space to 0.
-.PP
+.IP
Updated by the kernel.
Read by user-space with single-copy atomicity semantics.
This field should only be read by the thread which registered
this data structure.
Aligned on 32-bit.
-.RE
-.PP
+.TP
.I mm_cid
-.RS
Contains the current thread's concurrency ID
(allocated uniquely within a memory map).
-.PP
+.IP
Updated by the kernel.
Read by user-space with single-copy atomicity semantics.
This field should only be read by the thread which registered this data
structure.
Aligned on 32-bit.
-.PP
+.IP
This concurrency ID is within the possible cpus range,
and is temporarily (and uniquely) assigned while threads are actively
running within a memory map.
thus allowing efficient use of user-space memory for per-cpu data
structures.
.RE
-.RE
-.RE
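The size and alignment rules for
.B struct rseq
described above can be sketched as follows.
This is an illustration under stated assumptions, not a reference implementation:
the type and function name
.I rseq_area_layout
are hypothetical,
and the fallback definitions of
.B AT_RSEQ_FEATURE_SIZE
and
.B AT_RSEQ_ALIGN
are only there for builds against older headers that may not define them.

```c
#include <linux/rseq.h>
#include <stddef.h>
#include <sys/auxv.h>

/* Fallbacks for older headers (values as in <linux/auxvec.h>). */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27
#endif
#ifndef AT_RSEQ_ALIGN
#define AT_RSEQ_ALIGN 28
#endif

/* Hypothetical helper type: size and alignment to use for the area. */
struct rseq_area_layout {
    size_t size;
    size_t align;
};

static struct rseq_area_layout rseq_area_layout(void)
{
    /* Original ABI: 32 bytes, aligned on a 32-byte boundary. */
    struct rseq_area_layout layout = { 32, 32 };

    /* getauxval() returns 0 when the kernel does not provide the entry. */
    unsigned long feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);
    unsigned long align = getauxval(AT_RSEQ_ALIGN);

    if (feature_size > layout.size) {
        /* Larger area: it must hold the feature size ... */
        layout.size = feature_size;
        /* ... and follow the alignment reported by the kernel. */
        if (align != 0)
            layout.align = align;
    }
    return layout;
}
```

On a kernel that does not report either auxiliary vector entry,
the sketch falls back to the original 32-byte size and alignment.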
.PP
The layout of
.B struct rseq_cs
.TP
.B Structure size
This structure has a fixed size of 32 bytes.
-.RS
+.in +4n
+.IP
.EX
+#include <linux/rseq.h>
+
struct rseq_cs {
__u32 version;
__u32 flags;
__u64 abort_ip;
} __attribute__((aligned(32)));
.EE
-.RE
-.PP
+.in
+.TP
.B Fields
.RS
+.TP
.I version
-.RS
Version of this structure.
Should be initialized to 0.
-.RE
-.PP
+.TP
.I flags
.RS
Flags indicating the restart behavior of this structure.
.B RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
Inhibit instruction sequence block restart on preemption for this
critical section.
-This flag is deprecated since kernel 6.1.
+This flag is deprecated since Linux 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
Inhibit instruction sequence block restart on signal delivery for this
critical section.
-This flag is deprecated since kernel 6.1.
+This flag is deprecated since Linux 6.1.
.TP
.B RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
Inhibit instruction sequence block restart on migration for this
critical section.
-This flag is deprecated since kernel 6.1.
+This flag is deprecated since Linux 6.1.
.RE
-.PP
+.TP
.I start_ip
-.RS
Instruction pointer address of the first instruction of the sequence of
consecutive assembly instructions.
-.RE
-.PP
+.TP
.I post_commit_offset
-.RS
Offset (from start_ip address) of the address after the last instruction
of the sequence of consecutive assembly instructions.
-.RE
-.PP
+.TP
.I abort_ip
-.RS
Instruction pointer address where to move the execution flow in case of
abort of the sequence of consecutive assembly instructions.
.RE
-.RE
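As an illustration of how the
.B struct rseq_cs
fields relate to each other,
the sketch below fills a descriptor from three code addresses.
The helper name
.I make_rseq_cs
and its parameters are hypothetical;
real critical sections are normally written in assembly,
with the descriptor emitted from labels placed around the instruction sequence,
so this only shows the field arithmetic.

```c
#include <assert.h>
#include <linux/rseq.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a critical section descriptor from the
 * address of the first instruction, the address just after the last
 * instruction, and the address of the abort handler.
 */
static struct rseq_cs make_rseq_cs(uintptr_t start, uintptr_t post_commit,
                                   uintptr_t abort_target)
{
    /* The sequence must not end before it starts. */
    assert(post_commit >= start);

    struct rseq_cs cs = {
        .version = 0,   /* should be initialized to 0 */
        .flags = 0,     /* no restart-inhibiting (deprecated) flags */
        .start_ip = start,
        /* Stored as an offset from start_ip, not as an address. */
        .post_commit_offset = post_commit - start,
        .abort_ip = abort_target,
    };
    return cs;
}
```

For example, a sequence starting at 0x1000 whose last instruction ends at 0x1010
yields a
.I post_commit_offset
of 0x10.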
.PP
The
.I rseq_len
argument is the 32-bit signature to be expected before the abort
handler code.
.PP
-A single library per process should keep the rseq structure in a
-per-thread data structure.
+A single library per process should keep the
+.B struct rseq
+in a per-thread data structure.
The
.I cpu_id
field should be initialized to -1, and the
.I cpu_id_start
field should be initialized to a possible CPU value (typically 0).
.PP
-Each thread is responsible for registering and unregistering its rseq
-structure.
-No more than one rseq structure address can be registered per thread at
-a given time.
+Each thread is responsible for registering and unregistering its
+.BR "struct rseq" .
+No more than one
+.B struct rseq
+address can be registered per thread at a given time.
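The initialization and registration rules above can be sketched as follows.
This is a minimal illustration under assumptions:
the names prefixed with
.I example_
or
.I EXAMPLE_
are hypothetical,
the signature value is an arbitrary placeholder
(real users pick an architecture-specific value and place it before each abort handler),
and on a system where the C library has already registered its own area for the thread,
the registration attempt is expected to fail.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/rseq.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define EXAMPLE_RSEQ_SIG 0x53053053u   /* placeholder signature (assumption) */
#define EXAMPLE_RSEQ_LEN 32u           /* original ABI size of struct rseq */

/* One area per thread; cpu_id starts at -1, cpu_id_start at 0. */
static __thread struct rseq example_rseq = {
    .cpu_id_start = 0,
    .cpu_id = (__u32) RSEQ_CPU_ID_UNINITIALIZED,
};

/* Register the calling thread's area.  Returns 0, or an errno value. */
static int example_rseq_register(void)
{
    if (syscall(SYS_rseq, &example_rseq, EXAMPLE_RSEQ_LEN,
                0, EXAMPLE_RSEQ_SIG) == 0)
        return 0;
    return errno;
}

/* Unregister it again; address, length and signature must match. */
static int example_rseq_unregister(void)
{
    if (syscall(SYS_rseq, &example_rseq, EXAMPLE_RSEQ_LEN,
                RSEQ_FLAG_UNREGISTER, EXAMPLE_RSEQ_SIG) == 0)
        return 0;
    return errno;
}
```

A caller would treat any nonzero result from
.I example_rseq_register
as "rseq not available through this area" and fall back to another mechanism,
such as the area managed by the C library.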
.PP
-Reclaim of rseq object's memory must only be done after either an
-explicit rseq unregistration is performed or after the thread exits.
+Reclaim of
+.B struct rseq
+object's memory must only be done after either an explicit rseq
+unregistration is performed or after the thread exits.
.PP
-In a typical usage scenario, the thread registering the rseq
-structure will be performing loads and stores from/to that structure.
+In a typical usage scenario, the thread registering the
+.B struct rseq
+will be performing loads and stores from/to that structure.
It is however also allowed to read that structure from other threads.
-The rseq field updates performed by the kernel provide relaxed atomicity
+The
+.B struct rseq
+field updates performed by the kernel provide relaxed atomicity
semantics (atomic store, without memory ordering),
which guarantee that other threads performing relaxed atomic reads
(atomic load, without memory ordering) of the cpu number fields will
always observe a consistent value.
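A relaxed atomic read of the
.I cpu_id
field, as described above, might look like the following sketch.
The helper name
.I rseq_read_cpu_id_relaxed
is hypothetical,
and the code assumes a compiler that provides the GCC/Clang
.B __atomic
builtins.

```c
#include <linux/rseq.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomic load without memory ordering.
 * The result is negative while the field has not yet been
 * updated by the kernel (it is initialized to -1).
 */
static inline int32_t rseq_read_cpu_id_relaxed(const struct rseq *rs)
{
    return (int32_t) __atomic_load_n(&rs->cpu_id, __ATOMIC_RELAXED);
}
```

The load is consistent, but another thread's CPU number may already be stale by the time it is used.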
-.PP
.SH RETURN VALUE
A return value of 0 indicates success.
On error, \-1 is returned, and
.I errno
is set appropriately.
-.PP
.SH ERRORS
.TP
.B EINVAL
.I sig
argument on unregistration does not match the signature received
on registration.
-.PP
.SH VERSIONS
The
.BR rseq ()
system call was added in Linux 4.18.
-.PP
.SH STANDARDS
.BR rseq ()
is Linux-specific.
-.PP
.SH SEE ALSO
.BR sched_getcpu (3) ,
.BR membarrier (2) ,