Using RCU to Protect Dynamic NMI Handlers

Although RCU is usually used to protect read-mostly data structures,
it is possible to use RCU to provide dynamic non-maskable interrupt
handlers, as well as dynamic irq handlers. This document describes
how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
work in "arch/x86/oprofile/nmi_timer_int.c" and in
"arch/x86/kernel/traps.c".

The relevant pieces of code are listed below, each followed by a
brief explanation.

	static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
	{
		return 0;
	}

The dummy_nmi_callback() function is a "dummy" NMI handler that does
nothing except return zero. The zero return value says that the NMI
was not handled, which allows the caller to take the default
machine-specific action.

	static nmi_callback_t nmi_callback = dummy_nmi_callback;

This nmi_callback variable is a global function pointer to the current
NMI handler.

	void do_nmi(struct pt_regs *regs, long error_code)
	{
		int cpu;

		nmi_enter();

		cpu = smp_processor_id();
		++nmi_count(cpu);

		if (!rcu_dereference_sched(nmi_callback)(regs, cpu))
			default_do_nmi(regs);

		nmi_exit();
	}

The do_nmi() function processes each NMI. It first calls nmi_enter(),
which disables preemption in the same way that a hardware irq would,
then increments the per-CPU count of NMIs. It then invokes the NMI
handler stored in the nmi_callback function pointer. If this handler
returns zero, do_nmi() invokes the default_do_nmi() function to handle
a machine-specific NMI. Finally, nmi_exit() restores preemption.

Strictly speaking, rcu_dereference_sched() is not needed here, since
this code runs only on i386, which does not require it. In practice,
however, it is a good documentation aid, particularly for anyone
attempting to do something similar on Alpha or on systems with
aggressively optimizing compilers.

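The dispatch pattern in do_nmi() can be approximated in userspace with
C11 atomics. The following is only a sketch under stated assumptions:
struct pt_regs is an empty stand-in, handled_nmi_callback() and
do_nmi_sketch() are hypothetical names, and an acquire load is used as
a rough analog of rcu_dereference_sched(). It is not the kernel's
implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types used above. */
struct pt_regs { int unused; };
typedef int (*nmi_callback_t)(struct pt_regs *regs, int cpu);

static int default_calls;	/* times the default action ran */
static int handled_calls;	/* times a real handler claimed the NMI */

static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
{
	(void)regs;
	(void)cpu;
	return 0;		/* "not handled": take the default action */
}

/* Hypothetical handler that claims every NMI. */
static int handled_nmi_callback(struct pt_regs *regs, int cpu)
{
	(void)regs;
	(void)cpu;
	handled_calls++;
	return 1;
}

/* Analog of the global nmi_callback function pointer. */
static _Atomic(nmi_callback_t) nmi_callback = dummy_nmi_callback;

static void default_do_nmi(struct pt_regs *regs)
{
	(void)regs;
	default_calls++;
}

/* Analog of do_nmi(): fetch the pointer exactly once, then call it. */
static void do_nmi_sketch(struct pt_regs *regs, int cpu)
{
	/* Acquire load stands in for rcu_dereference_sched(). */
	nmi_callback_t cb = atomic_load_explicit(&nmi_callback,
						 memory_order_acquire);

	assert(cb != NULL);	/* the pointer always names some handler */
	if (!cb(regs, cpu))
		default_do_nmi(regs);
}
```

Because the pointer is never NULL (the dummy handler fills the idle
slot), the dispatch path needs no NULL check beyond the sanity assert.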
Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha,
	    given that the code referenced by the pointer is read-only?


Back to the discussion of NMI and RCU...

	void set_nmi_callback(nmi_callback_t callback)
	{
		rcu_assign_pointer(nmi_callback, callback);
	}

The set_nmi_callback() function registers an NMI handler. Note that any
data that is to be used by the callback must be initialized -before-
the call to set_nmi_callback(). On architectures that do not order
writes, the rcu_assign_pointer() ensures that the NMI handler sees the
initialized values.

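The initialize-then-publish ordering described above can be sketched in
userspace with C11 atomics. Everything here is an assumption made for
illustration: struct my_nmi_data, my_handler(), register_my_handler(),
and dispatch() are hypothetical names, and a release store paired with
an acquire load is used as a rough analog of rcu_assign_pointer() and
rcu_dereference_sched().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical handler state; not part of the original code. */
struct my_nmi_data {
	int threshold;
};

typedef int (*handler_t)(int value);

static struct my_nmi_data my_data;

static int dummy_handler(int value)
{
	(void)value;
	return 0;
}

static _Atomic(handler_t) handler = dummy_handler;

static int my_handler(int value)
{
	/* Safe only because my_data was filled in before publication. */
	return value > my_data.threshold;
}

static void register_my_handler(int threshold)
{
	/* Step 1: initialize everything the handler will read. */
	my_data.threshold = threshold;

	/* Step 2: publish. The release store keeps step 1 ordered first. */
	atomic_store_explicit(&handler, my_handler, memory_order_release);
}

static int dispatch(int value)
{
	/* The acquire load pairs with the release store above. */
	handler_t h = atomic_load_explicit(&handler, memory_order_acquire);

	assert(h != NULL);
	return h(value);
}
```

Swapping the two steps in register_my_handler() would let a concurrent
dispatch() call my_handler() while my_data.threshold still holds its
old value.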
	void unset_nmi_callback(void)
	{
		rcu_assign_pointer(nmi_callback, dummy_nmi_callback);
	}

This function unregisters an NMI handler, restoring the original
dummy_nmi_callback(). However, there may well be an NMI handler
currently executing on some other CPU. We therefore cannot free
up any data structures used by the old NMI handler until execution
of it completes on all other CPUs.

One way to accomplish this is via synchronize_sched(), perhaps as
follows:

	unset_nmi_callback();
	synchronize_sched();
	kfree(my_nmi_data);

This works because synchronize_sched() blocks until all CPUs complete
any preemption-disabled segments of code that they were executing.
Since NMI handlers disable preemption, synchronize_sched() is guaranteed
not to return until all ongoing NMI handlers exit. It is therefore safe
to free up the handler's data as soon as synchronize_sched() returns.

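The unregister, wait, then free sequence can be illustrated with a
deliberately simplified userspace analogy. This is not how the kernel
implements synchronize_sched(). It is a sketch that assumes a fixed
number of "CPUs", non-nested handlers, and sequentially consistent C11
atomics, and every name in it (in_handler, run_handler(),
wait_for_handlers(), and so on) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count */

typedef int (*cb_t)(int cpu);

static int *my_nmi_data;	/* heap data used only by my_cb() */

static int dummy_cb(int cpu) { (void)cpu; return 0; }
static int my_cb(int cpu) { return my_nmi_data[0] + cpu; }

static _Atomic(cb_t) callback = dummy_cb;
static atomic_int in_handler[NR_CPUS_SKETCH];	/* 1 while "in NMI" */

static void register_cb(int value)
{
	my_nmi_data = malloc(sizeof(*my_nmi_data));
	assert(my_nmi_data != NULL);
	my_nmi_data[0] = value;		/* initialize before publishing */
	atomic_store(&callback, my_cb);
}

static int run_handler(int cpu)
{
	int ret;

	atomic_store(&in_handler[cpu], 1);	/* like nmi_enter() */
	ret = atomic_load(&callback)(cpu);
	atomic_store(&in_handler[cpu], 0);	/* like nmi_exit() */
	return ret;
}

/* Spin until every CPU that was inside a handler has left it. */
static void wait_for_handlers(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		while (atomic_load(&in_handler[cpu]))
			;
}

static void unregister_and_free(void)
{
	atomic_store(&callback, dummy_cb);	/* unset_nmi_callback() */
	wait_for_handlers();			/* synchronize_sched() analog */
	free(my_nmi_data);			/* no handler can see it now */
	my_nmi_data = NULL;
}
```

Any handler that fetched the old pointer set its in_handler flag first,
so the wait cannot finish until that handler clears the flag. Handlers
that start later can only fetch the dummy callback, which never touches
the freed data.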
Important note: for this to work, the architecture in question must
invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively.


Answer to Quick Quiz
	Why might the rcu_dereference_sched() be necessary on Alpha, given
	that the code referenced by the pointer is read-only?

	Answer: The caller of set_nmi_callback() might well have
		initialized some data that is to be used by the new NMI
		handler. In this case, the rcu_dereference_sched() would
		be needed, because otherwise a CPU that received an NMI
		just after the new handler was set might see the pointer
		to the new NMI handler, but the old, not-yet-initialized
		version of the handler's data.

		This same sad story can happen on other CPUs when using
		a compiler with aggressive pointer-value speculation
		optimizations.

		More important, the rcu_dereference_sched() makes it
		clear to someone reading the code that the pointer is
		being protected by RCU-sched.