lglock - local/global locks for mostly local access patterns
------------------------------------------------------------

Origin: Nick Piggin's VFS scalability series introduced during
	2.6.35++ [1] [2]
Location: kernel/locking/lglock.c
	include/linux/lglock.h
Users: currently only the VFS and stop_machine related code

Design Goal:
------------

Improve the scalability of globally used large data sets that are
distributed over all CPUs as per_cpu elements.

lglock is intended for global data structures that are partitioned
across all CPUs as per_cpu elements but can mostly be handled by
CPU-local actions: the majority of accesses are CPU-local reads and
occasional CPU-local writes, with very infrequent global write
access.


 * deal with things locally whenever possible
   - very fast access to the local per_cpu data
   - reasonably fast access to specific per_cpu data on a different
     CPU
 * while making global action possible when needed
   - by expensive acquisition of all CPUs' locks - effectively
     resulting in a globally visible critical section.

Design:
-------

Basically it is an array of per_cpu spinlocks, with
lg_local_lock/unlock accessing the local CPU's lock object and
lg_local_lock_cpu/unlock_cpu accessing a remote CPU's lock object.
lg_local_lock has to disable preemption as migration protection so
that the reference to the local CPU's lock does not go out of scope.
Because lg_local_lock/unlock only touch CPU-local resources, they are
fast. Taking the local lock of a different CPU is more expensive but
still relatively cheap.

One can relax the migration constraints by acquiring the current
CPU's lock with lg_local_lock_cpu, remembering the CPU, and releasing
that lock at the end of the critical section even if migrated. This
should give most of the performance benefits without inhibiting
migration, though it needs careful consideration of lglock nesting
and of deadlocks with lg_global_lock.

lg_global_lock/unlock locks the underlying spinlocks of all possible
CPUs (including those that are off-line). The preemption
disable/enable are needed on non-RT kernels to prevent deadlocks
like:

                             on cpu 1

       task A          task B
  lg_global_lock
    got cpu 0 lock
         <<<< preempt <<<<
                       lg_local_lock_cpu for cpu 0
                         spin on cpu 0 lock

On -RT this deadlock scenario is resolved by replacing the
arch_spin_locks in the lglocks with rt_mutexes, which avoid the above
deadlock by boosting the lock-holder.


Implementation:
---------------

The initial lglock implementation from Nick Piggin used some complex
macros to generate the lglock/brlock in lglock.h - they were later
turned into a set of functions by Andi Kleen [7]. The change to
functions was motivated by the presence of multiple lock users and by
functions being easier to maintain than the generating macros. This
change to functions is also the basis for eliminating the restriction
that lglocks cannot be initialized in kernel modules (the remaining
problem is that the locks are not explicitly initialized - see
lockdep-design.txt).

Declaration and initialization:
-------------------------------

  #include <linux/lglock.h>

  DEFINE_LGLOCK(name)
  or:
  DEFINE_STATIC_LGLOCK(name);

  lg_lock_init(&name, "lockdep_name_string");

  On UP this is mapped to DEFINE_SPINLOCK(name) in both cases. Note
  also that as of 3.18-rc6 all declarations in use are of the _STATIC_
  variant (and it seems that the non-static variant was never used).
  lg_lock_init initializes the lockdep map only.

Usage:
------

From the locking semantics it is a spinlock. It could be called a
locality-aware spinlock. lg_local_* behaves like a per_cpu spinlock
and lg_global_* like a global spinlock.
No surprises in the API.

  lg_local_lock(*lglock);
     access to protected per_cpu object on this CPU
  lg_local_unlock(*lglock);

  lg_local_lock_cpu(*lglock, cpu);
     access to protected per_cpu object on other CPU cpu
  lg_local_unlock_cpu(*lglock, cpu);

  lg_global_lock(*lglock);
     access all protected per_cpu objects on all CPUs
  lg_global_unlock(*lglock);

  There are no _trylock variants of the lglocks.

Note that lg_global_lock/unlock has to iterate over all possible
CPUs rather than the actually present CPUs, or a CPU could go off-line
with a held lock [4], and that makes it very expensive. A discussion of
these issues can be found at [5].

Constraints:
------------

  * currently the declaration of lglocks in kernel modules is not
    possible, though this should be doable with little change
  * lglocks are not recursive
  * suitable for code that can do most operations on the CPU-local
    data and will very rarely need the global lock
  * lg_global_lock/unlock is *very* expensive and does not scale
  * on UP systems all lg_* primitives are simply spinlocks
  * with PREEMPT_RT the spinlock becomes an rt_mutex and can sleep but
    does not change the task's state while sleeping [6]
  * with PREEMPT_RT the preempt_disable/enable in lg_local_lock/unlock
    is downgraded to a migrate_disable/enable; the other
    preempt_disable/enable pairs are downgraded to barriers [6].
    The deadlock noted for non-RT above is resolved because rt_mutexes
    boost the lock-holder in this case, which arch_spin_locks do not.

lglocks were designed for very specific problems in the VFS and are
probably only the right answer in these corner cases. Any new user
that looks at lglocks probably wants to look at the seqlock and RCU
alternatives as the first choice. There are also efforts to resolve
the RCU issues that currently prevent using RCU in place of the few
remaining lglocks.

Note on brlock history:
-----------------------

The 'Big Reader' read-write spinlocks were originally introduced by
Ingo Molnar in 2000 (2.4/2.5 kernel series) and removed in 2003. The
VFS scalability patch set in the 2.6 series later reintroduced them
as the "big reader lock" brlock [2] variant of lglock, which has
since been replaced by seqlock or RCU based primitives, as was
suggested in [3] in 2003. The brlock was entirely removed in the
3.13 kernel series.

Link: 1 http://lkml.org/lkml/2010/8/2/81
Link: 2 http://lwn.net/Articles/401738/
Link: 3 http://lkml.org/lkml/2003/3/9/205
Link: 4 https://lkml.org/lkml/2011/8/24/185
Link: 5 http://lkml.org/lkml/2011/12/18/189
Link: 6 https://www.kernel.org/pub/linux/kernel/projects/rt/
	patch series - lglocks-rt.patch.patch
Link: 7 http://lkml.org/lkml/2012/3/5/26