Commit | Line | Data |
---|---|---|
5e928f77 RW |
1 | Run-time Power Management Framework for I/O Devices |
2 | ||
3 | (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. | |
7490e442 | 4 | (C) 2010 Alan Stern <stern@rowland.harvard.edu> |
5e928f77 RW |
5 | |
6 | 1. Introduction | |
7 | ||
8 | Support for run-time power management (run-time PM) of I/O devices is provided | |
9 | at the power management core (PM core) level by means of: | |
10 | ||
11 | * The power management workqueue pm_wq in which bus types and device drivers can | |
12 | put their PM-related work items. It is strongly recommended that pm_wq be | |
13 | used for queuing all work items related to run-time PM, because this allows | |
14 | them to be synchronized with system-wide power transitions (suspend to RAM, | |
15 | hibernation and resume from system sleep states). pm_wq is declared in | |
16 | include/linux/pm_runtime.h and defined in kernel/power/main.c. | |
17 | ||
18 | * A number of run-time PM fields in the 'power' member of 'struct device' (which | |
19 | is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can | |
20 | be used for synchronizing run-time PM operations with one another. | |
21 | ||
22 | * Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in | |
23 | include/linux/pm.h). | |
24 | ||
25 | * A set of helper functions defined in drivers/base/power/runtime.c that can be | |
26 | used for carrying out run-time PM operations in such a way that the | |
27 | synchronization between them is taken care of by the PM core. Bus types and | |
28 | device drivers are encouraged to use these functions. | |
29 | ||
30 | The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM | |
31 | fields of 'struct dev_pm_info' and the core helper functions provided for | |
32 | run-time PM are described below. | |
33 | ||
34 | 2. Device Run-time PM Callbacks | |
35 | ||
36 | There are three device run-time PM callbacks defined in 'struct dev_pm_ops': | |
37 | ||
38 | struct dev_pm_ops { | |
39 | ... | |
40 | int (*runtime_suspend)(struct device *dev); | |
41 | int (*runtime_resume)(struct device *dev); | |
e1b1903e | 42 | int (*runtime_idle)(struct device *dev); |
5e928f77 RW |
43 | ... |
44 | }; | |
45 | ||
a6ab7aa9 RW |
46 | The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks are |
47 | executed by the PM core for either the bus type, or device type (if the bus | |
48 | type's callback is not defined), or device class (if the bus type's and device | |
49 | type's callbacks are not defined) of the given device. The bus type, device type | |
50 | and device class callbacks are referred to as subsystem-level callbacks in what | |
51 | follows. | |
52 | ||
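The lookup order described above (bus type first, then device type, then
device class) can be illustrated with a small userspace model. This is only a
sketch: 'struct model_device', 'struct model_pm_ops', pick_runtime_suspend()
and the example callbacks are hypothetical names invented for illustration and
are not the PM core's actual code.

```c
#include <assert.h>
#include <stddef.h>

struct model_device;

/* Simplified stand-in for 'struct dev_pm_ops' (run-time PM part only). */
struct model_pm_ops {
	int (*runtime_suspend)(struct model_device *dev);
	int (*runtime_resume)(struct model_device *dev);
	int (*runtime_idle)(struct model_device *dev);
};

/* Simplified stand-in for 'struct device'; any pointer may be NULL. */
struct model_device {
	const struct model_pm_ops *bus_pm;	/* bus type callbacks */
	const struct model_pm_ops *type_pm;	/* device type callbacks */
	const struct model_pm_ops *class_pm;	/* device class callbacks */
};

typedef int (*model_cb_t)(struct model_device *dev);

/* Return the subsystem-level suspend callback in the documented order. */
model_cb_t pick_runtime_suspend(const struct model_device *dev)
{
	if (dev->bus_pm && dev->bus_pm->runtime_suspend)
		return dev->bus_pm->runtime_suspend;
	if (dev->type_pm && dev->type_pm->runtime_suspend)
		return dev->type_pm->runtime_suspend;
	if (dev->class_pm && dev->class_pm->runtime_suspend)
		return dev->class_pm->runtime_suspend;
	return NULL;	/* no subsystem-level callback is defined */
}

/* Example callbacks used only to demonstrate the lookup. */
int example_bus_suspend(struct model_device *dev)   { (void)dev; return 0; }
int example_class_suspend(struct model_device *dev) { (void)dev; return 0; }
```

In this model a device whose bus type provides no ->runtime_suspend() falls
through to the device type and then to the device class.
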
c7b61de5 AS |
53 | By default, the callbacks are always invoked in process context with interrupts |
54 | enabled. However, subsystems can use the pm_runtime_irq_safe() helper function | |
55 | to tell the PM core that a device's ->runtime_suspend() and ->runtime_resume() | |
56 | callbacks should be invoked in atomic context with interrupts disabled | |
57 | (->runtime_idle() is still invoked the default way). This implies that these | |
58 | callback routines must not block or sleep, but it also means that the | |
59 | synchronous helper functions listed at the end of Section 4 can be used within | |
60 | an interrupt handler or in an atomic context. | |
61 | ||
a6ab7aa9 RW |
62 | The subsystem-level suspend callback is _entirely_ _responsible_ for handling |
63 | the suspend of the device as appropriate, which may, but need not include | |
64 | executing the device driver's own ->runtime_suspend() callback (from the | |
5e928f77 | 65 | PM core's point of view it is not necessary to implement a ->runtime_suspend() |
a6ab7aa9 RW |
66 | callback in a device driver as long as the subsystem-level suspend callback |
67 | knows what to do to handle the device). | |
5e928f77 | 68 | |
a6ab7aa9 | 69 | * Once the subsystem-level suspend callback has completed successfully |
5e928f77 RW |
70 | for the given device, the PM core regards the device as suspended, which need |
71 | not mean that the device has been put into a low power state. It is | |
72 | supposed to mean, however, that the device will not process data and will | |
a6ab7aa9 RW |
73 | not communicate with the CPU(s) and RAM until the subsystem-level resume |
74 | callback is executed for it. The run-time PM status of a device after | |
75 | successful execution of the subsystem-level suspend callback is 'suspended'. | |
76 | ||
77 | * If the subsystem-level suspend callback returns -EBUSY or -EAGAIN, | |
78 | the device's run-time PM status is 'active', which means that the device | |
79 | _must_ be fully operational afterwards. | |
80 | ||
81 | * If the subsystem-level suspend callback returns an error code different | |
82 | from -EBUSY or -EAGAIN, the PM core regards this as a fatal error and will | |
83 | refuse to run the helper functions described in Section 4 for the device, | |
84 | until the status of it is directly set either to 'active', or to 'suspended' | |
85 | (the PM core provides special helper functions for this purpose). | |
86 | ||
87 | In particular, if the driver requires remote wake-up capability (i.e. a hardware | |
88 | mechanism allowing the device to request a change of its power state, such as | |
89 | PCI PME) for proper functioning and device_run_wake() returns 'false' for the | |
90 | device, then ->runtime_suspend() should return -EBUSY. On the other hand, if | |
91 | device_run_wake() returns 'true' for the device and the device is put into a low | |
92 | power state during the execution of the subsystem-level suspend callback, it is | |
93 | expected that remote wake-up will be enabled for the device. Generally, remote | |
94 | wake-up should be enabled for all input devices put into a low power state at | |
95 | run time. | |
96 | ||
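The three possible outcomes of the subsystem-level suspend callback listed
above can be summarized with a small userspace model. It is a sketch under
stated assumptions: the 'model_' names are hypothetical, only two status
values are modelled, and leaving the status untouched in the fatal-error case
is a simplification chosen here, not something the text above specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

struct model_rpm {
	enum model_status status;
	int runtime_error;	/* non-zero after a fatal callback error */
};

/* Record the result of the subsystem-level suspend callback. */
void apply_suspend_result(struct model_rpm *st, int retval)
{
	if (retval == 0) {
		st->status = MODEL_SUSPENDED;
	} else if (retval == -EBUSY || retval == -EAGAIN) {
		/* Not fatal: the device must remain fully operational. */
		st->status = MODEL_ACTIVE;
	} else {
		/* Fatal: remember the code; status is left as it was. */
		st->runtime_error = retval;
	}
}

/* The helper functions refuse to run while a fatal error is recorded. */
bool helpers_allowed(const struct model_rpm *st)
{
	return st->runtime_error == 0;
}

/* Models setting the status directly, which also clears the error. */
void set_status_directly(struct model_rpm *st, enum model_status status)
{
	st->runtime_error = 0;
	st->status = status;
}
```

set_status_directly() stands in for the special helper functions mentioned
above that set the status to 'active' or 'suspended'.
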
97 | The subsystem-level resume callback is _entirely_ _responsible_ for handling the | |
98 | resume of the device as appropriate, which may, but need not include executing | |
99 | the device driver's own ->runtime_resume() callback (from the PM core's point of | |
100 | view it is not necessary to implement a ->runtime_resume() callback in a device | |
101 | driver as long as the subsystem-level resume callback knows what to do to handle | |
102 | the device). | |
103 | ||
104 | * Once the subsystem-level resume callback has completed successfully, the PM | |
105 | core regards the device as fully operational, which means that the device | |
106 | _must_ be able to complete I/O operations as needed. The run-time PM status | |
107 | of the device is then 'active'. | |
108 | ||
109 | * If the subsystem-level resume callback returns an error code, the PM core | |
110 | regards this as a fatal error and will refuse to run the helper functions | |
111 | described in Section 4 for the device, until its status is directly set | |
112 | either to 'active' or to 'suspended' (the PM core provides special helper | |
113 | functions for this purpose). | |
114 | ||
115 | The subsystem-level idle callback is executed by the PM core whenever the device | |
116 | appears to be idle, which is indicated to the PM core by two counters, the | |
117 | device's usage counter and the counter of 'active' children of the device. | |
5e928f77 RW |
118 | |
119 | * If any of these counters is decreased using a helper function provided by | |
120 | the PM core and it turns out to be equal to zero, the other counter is | |
121 | checked. If that counter also is equal to zero, the PM core executes the | |
a6ab7aa9 | 122 | subsystem-level idle callback with the device as an argument. |
5e928f77 | 123 | |
a6ab7aa9 RW |
124 | The action performed by a subsystem-level idle callback is totally dependent on |
125 | the subsystem in question, but the expected and recommended action is to check | |
126 | if the device can be suspended (i.e. if all of the conditions necessary for | |
127 | suspending the device are satisfied) and to queue up a suspend request for the | |
128 | device in that case. The value returned by this callback is ignored by the PM | |
129 | core. | |
5e928f77 RW |
130 | |
131 | The helper functions provided by the PM core, described in Section 4, guarantee | |
132 | that the following constraints are met with respect to the bus type's run-time | |
133 | PM callbacks: | |
134 | ||
135 | (1) The callbacks are mutually exclusive (e.g. it is forbidden to execute | |
136 | ->runtime_suspend() in parallel with ->runtime_resume() or with another | |
137 | instance of ->runtime_suspend() for the same device) with the exception that | |
138 | ->runtime_suspend() or ->runtime_resume() can be executed in parallel with | |
139 | ->runtime_idle() (although ->runtime_idle() will not be started while any | |
140 | of the other callbacks is being executed for the same device). | |
141 | ||
142 | (2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active' | |
143 | devices (i.e. the PM core will only execute ->runtime_idle() or | |
144 | ->runtime_suspend() for the devices the run-time PM status of which is | |
145 | 'active'). | |
146 | ||
147 | (3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device | |
148 | the usage counter of which is equal to zero _and_ either the counter of | |
149 | 'active' children of which is equal to zero, or the 'power.ignore_children' | |
150 | flag of which is set. | |
151 | ||
152 | (4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the | |
153 | PM core will only execute ->runtime_resume() for the devices the run-time | |
154 | PM status of which is 'suspended'). | |
155 | ||
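Constraints (2), (3) and (4) amount to two simple predicates on the device's
state. The following userspace sketch expresses them directly; the structure
and function names are hypothetical, and the model ignores locking and the
transitional states a real implementation has to deal with.

```c
#include <assert.h>
#include <stdbool.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

struct model_dev {
	enum model_status status;
	int usage_count;	/* the device's usage counter */
	int child_count;	/* number of 'active' children */
	bool ignore_children;	/* models power.ignore_children */
};

/* Constraints (2) and (3): conditions for ->runtime_idle()/->runtime_suspend(). */
bool may_idle_or_suspend(const struct model_dev *d)
{
	return d->status == MODEL_ACTIVE &&
	       d->usage_count == 0 &&
	       (d->child_count == 0 || d->ignore_children);
}

/* Constraint (4): ->runtime_resume() is only run for 'suspended' devices. */
bool may_resume(const struct model_dev *d)
{
	return d->status == MODEL_SUSPENDED;
}
```

Note that an active child blocks idle and suspend only while
'ignore_children' is unset.
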
156 | Additionally, the helper functions provided by the PM core obey the following | |
157 | rules: | |
158 | ||
159 | * If ->runtime_suspend() is about to be executed or there's a pending request | |
160 | to execute it, ->runtime_idle() will not be executed for the same device. | |
161 | ||
162 | * A request to execute or to schedule the execution of ->runtime_suspend() | |
163 | will cancel any pending requests to execute ->runtime_idle() for the same | |
164 | device. | |
165 | ||
166 | * If ->runtime_resume() is about to be executed or there's a pending request | |
167 | to execute it, the other callbacks will not be executed for the same device. | |
168 | ||
169 | * A request to execute ->runtime_resume() will cancel any pending or | |
15bcb91d AS |
170 | scheduled requests to execute the other callbacks for the same device, |
171 | except for scheduled autosuspends. | |
5e928f77 RW |
172 | |
173 | 3. Run-time PM Device Fields | |
174 | ||
175 | The following device run-time PM fields are present in 'struct dev_pm_info', as | |
176 | defined in include/linux/pm.h: | |
177 | ||
178 | struct timer_list suspend_timer; | |
15bcb91d | 179 | - timer used for scheduling (delayed) suspend and autosuspend requests |
5e928f77 RW |
180 | |
181 | unsigned long timer_expires; | |
182 | - timer expiration time, in jiffies (if this is different from zero, the | |
183 | timer is running and will expire at that time, otherwise the timer is not | |
184 | running) | |
185 | ||
186 | struct work_struct work; | |
187 | - work structure used for queuing up requests (i.e. work items in pm_wq) | |
188 | ||
189 | wait_queue_head_t wait_queue; | |
190 | - wait queue used if any of the helper functions needs to wait for another | |
191 | one to complete | |
192 | ||
193 | spinlock_t lock; | |
194 | - lock used for synchronization | |
195 | ||
196 | atomic_t usage_count; | |
197 | - the usage counter of the device | |
198 | ||
199 | atomic_t child_count; | |
200 | - the count of 'active' children of the device | |
201 | ||
202 | unsigned int ignore_children; | |
203 | - if set, the value of child_count is ignored (but still updated) | |
204 | ||
205 | unsigned int disable_depth; | |
206 | - used for disabling the helper functions (they work normally if this is | |
207 | equal to zero); the initial value of it is 1 (i.e. run-time PM is | |
208 | initially disabled for all devices) | |
209 | ||
210 | unsigned int runtime_error; | |
211 | - if set, there was a fatal error (one of the callbacks returned an error code | |
212 | as described in Section 2), so the helper functions will not work until | |
213 | this flag is cleared; this is the error code returned by the failing | |
214 | callback | |
215 | ||
216 | unsigned int idle_notification; | |
217 | - if set, ->runtime_idle() is being executed | |
218 | ||
219 | unsigned int request_pending; | |
220 | - if set, there's a pending request (i.e. a work item queued up into pm_wq) | |
221 | ||
222 | enum rpm_request request; | |
223 | - type of request that's pending (valid if request_pending is set) | |
224 | ||
225 | unsigned int deferred_resume; | |
226 | - set if ->runtime_resume() is about to be run while ->runtime_suspend() is | |
227 | being executed for that device and it is not practical to wait for the | |
228 | suspend to complete; means "start a resume as soon as you've suspended" | |
229 | ||
7a1a8eb5 RW |
230 | unsigned int run_wake; |
231 | - set if the device is capable of generating run-time wake-up events | |
232 | ||
5e928f77 RW |
233 | enum rpm_status runtime_status; |
234 | - the run-time PM status of the device; this field's initial value is | |
235 | RPM_SUSPENDED, which means that each device is initially regarded by the | |
236 | PM core as 'suspended', regardless of its real hardware status | |
237 | ||
87d1b3e6 RW |
238 | unsigned int runtime_auto; |
239 | - if set, indicates that the user space has allowed the device driver to | |
240 | power manage the device at run time via the /sys/devices/.../power/control | |
241 | interface; it may only be modified with the help of the pm_runtime_allow() | |
242 | and pm_runtime_forbid() helper functions | |
243 | ||
7490e442 AS |
244 | unsigned int no_callbacks; |
245 | - indicates that the device does not use the run-time PM callbacks (see | |
246 | Section 8); it may be modified only by the pm_runtime_no_callbacks() | |
247 | helper function | |
248 | ||
c7b61de5 AS |
249 | unsigned int irq_safe; |
250 | - indicates that the ->runtime_suspend() and ->runtime_resume() callbacks | |
251 | will be invoked with the spinlock held and interrupts disabled | |
252 | ||
15bcb91d AS |
253 | unsigned int use_autosuspend; |
254 | - indicates that the device's driver supports delayed autosuspend (see | |
255 | Section 9); it may be modified only by the | |
256 | pm_runtime{_dont}_use_autosuspend() helper functions | |
257 | ||
258 | unsigned int timer_autosuspends; | |
259 | - indicates that the PM core should attempt to carry out an autosuspend | |
260 | when the timer expires rather than a normal suspend | |
261 | ||
262 | int autosuspend_delay; | |
263 | - the delay time (in milliseconds) to be used for autosuspend | |
264 | ||
265 | unsigned long last_busy; | |
266 | - the time (in jiffies) when the pm_runtime_mark_last_busy() helper | |
267 | function was last called for this device; used in calculating inactivity | |
268 | periods for autosuspend | |
269 | ||
5e928f77 RW |
270 | All of the above fields are members of the 'power' member of 'struct device'. |
271 | ||
272 | 4. Run-time PM Device Helper Functions | |
273 | ||
274 | The following run-time PM helper functions are defined in | |
275 | drivers/base/power/runtime.c and include/linux/pm_runtime.h: | |
276 | ||
277 | void pm_runtime_init(struct device *dev); | |
278 | - initialize the device run-time PM fields in 'struct dev_pm_info' | |
279 | ||
280 | void pm_runtime_remove(struct device *dev); | |
281 | - make sure that the run-time PM of the device will be disabled after | |
282 | removing the device from device hierarchy | |
283 | ||
284 | int pm_runtime_idle(struct device *dev); | |
a6ab7aa9 RW |
285 | - execute the subsystem-level idle callback for the device; returns 0 on |
286 | success or error code on failure, where -EINPROGRESS means that | |
287 | ->runtime_idle() is already being executed | |
5e928f77 RW |
288 | |
289 | int pm_runtime_suspend(struct device *dev); | |
a6ab7aa9 | 290 | - execute the subsystem-level suspend callback for the device; returns 0 on |
5e928f77 RW |
291 | success, 1 if the device's run-time PM status was already 'suspended', or |
292 | error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt | |
293 | to suspend the device again in future | |
294 | ||
15bcb91d AS |
295 | int pm_runtime_autosuspend(struct device *dev); |
296 | - same as pm_runtime_suspend() except that the autosuspend delay is taken | |
297 | into account; if pm_runtime_autosuspend_expiration() says the delay has | |
298 | not yet expired then an autosuspend is scheduled for the appropriate time | |
299 | and 0 is returned | |
300 | ||
5e928f77 | 301 | int pm_runtime_resume(struct device *dev); |
de8164fb | 302 | - execute the subsystem-level resume callback for the device; returns 0 on |
5e928f77 RW |
303 | success, 1 if the device's run-time PM status was already 'active' or |
304 | error code on failure, where -EAGAIN means it may be safe to attempt to | |
305 | resume the device again in future, but 'power.runtime_error' should be | |
306 | checked additionally | |
307 | ||
308 | int pm_request_idle(struct device *dev); | |
a6ab7aa9 RW |
309 | - submit a request to execute the subsystem-level idle callback for the |
310 | device (the request is represented by a work item in pm_wq); returns 0 on | |
311 | success or error code if the request has not been queued up | |
5e928f77 | 312 | |
15bcb91d AS |
313 | int pm_request_autosuspend(struct device *dev); |
314 | - schedule the execution of the subsystem-level suspend callback for the | |
315 | device when the autosuspend delay has expired; if the delay has already | |
316 | expired then the work item is queued up immediately | |
317 | ||
5e928f77 | 318 | int pm_schedule_suspend(struct device *dev, unsigned int delay); |
a6ab7aa9 RW |
319 | - schedule the execution of the subsystem-level suspend callback for the |
320 | device in future, where 'delay' is the time to wait before queuing up a | |
321 | suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work | |
322 | item is queued up immediately); returns 0 on success, 1 if the device's PM | |
5e928f77 RW |
323 | run-time status was already 'suspended', or error code if the request |
324 | hasn't been scheduled (or queued up if 'delay' is 0); if the execution of | |
325 | ->runtime_suspend() is already scheduled and not yet expired, the new | |
326 | value of 'delay' will be used as the time to wait | |
327 | ||
328 | int pm_request_resume(struct device *dev); | |
a6ab7aa9 RW |
329 | - submit a request to execute the subsystem-level resume callback for the |
330 | device (the request is represented by a work item in pm_wq); returns 0 on | |
5e928f77 RW |
331 | success, 1 if the device's run-time PM status was already 'active', or |
332 | error code if the request hasn't been queued up | |
333 | ||
334 | void pm_runtime_get_noresume(struct device *dev); | |
335 | - increment the device's usage counter | |
336 | ||
337 | int pm_runtime_get(struct device *dev); | |
338 | - increment the device's usage counter, run pm_request_resume(dev) and | |
339 | return its result | |
340 | ||
341 | int pm_runtime_get_sync(struct device *dev); | |
342 | - increment the device's usage counter, run pm_runtime_resume(dev) and | |
343 | return its result | |
344 | ||
345 | void pm_runtime_put_noidle(struct device *dev); | |
346 | - decrement the device's usage counter | |
347 | ||
348 | int pm_runtime_put(struct device *dev); | |
15bcb91d AS |
349 | - decrement the device's usage counter; if the result is 0 then run |
350 | pm_request_idle(dev) and return its result | |
351 | ||
352 | int pm_runtime_put_autosuspend(struct device *dev); | |
353 | - decrement the device's usage counter; if the result is 0 then run | |
354 | pm_request_autosuspend(dev) and return its result | |
5e928f77 RW |
355 | |
356 | int pm_runtime_put_sync(struct device *dev); | |
15bcb91d AS |
357 | - decrement the device's usage counter; if the result is 0 then run |
358 | pm_runtime_idle(dev) and return its result | |
359 | ||
c7b61de5 AS |
360 | int pm_runtime_put_sync_suspend(struct device *dev); |
361 | - decrement the device's usage counter; if the result is 0 then run | |
362 | pm_runtime_suspend(dev) and return its result | |
363 | ||
15bcb91d AS |
364 | int pm_runtime_put_sync_autosuspend(struct device *dev); |
365 | - decrement the device's usage counter; if the result is 0 then run | |
366 | pm_runtime_autosuspend(dev) and return its result | |
5e928f77 RW |
367 | |
368 | void pm_runtime_enable(struct device *dev); | |
369 | - enable the run-time PM helper functions to run the device bus type's | |
370 | run-time PM callbacks described in Section 2 | |
371 | ||
372 | int pm_runtime_disable(struct device *dev); | |
a6ab7aa9 RW |
373 | - prevent the run-time PM helper functions from running subsystem-level |
374 | run-time PM callbacks for the device, make sure that all of the pending | |
375 | run-time PM operations on the device are either completed or canceled; | |
376 | returns 1 if there was a resume request pending and it was necessary to | |
377 | execute the subsystem-level resume callback for the device to satisfy that | |
378 | request, otherwise 0 is returned | |
5e928f77 RW |
379 | |
380 | void pm_suspend_ignore_children(struct device *dev, bool enable); | |
381 | - set/unset the power.ignore_children flag of the device | |
382 | ||
383 | int pm_runtime_set_active(struct device *dev); | |
384 | - clear the device's 'power.runtime_error' flag, set the device's run-time | |
385 | PM status to 'active' and update its parent's counter of 'active' | |
386 | children as appropriate (it is only valid to use this function if | |
387 | 'power.runtime_error' is set or 'power.disable_depth' is greater than | |
388 | zero); it will fail and return error code if the device has a parent | |
389 | which is not active and the 'power.ignore_children' flag of which is unset | |
390 | ||
391 | void pm_runtime_set_suspended(struct device *dev); | |
392 | - clear the device's 'power.runtime_error' flag, set the device's run-time | |
393 | PM status to 'suspended' and update its parent's counter of 'active' | |
394 | children as appropriate (it is only valid to use this function if | |
395 | 'power.runtime_error' is set or 'power.disable_depth' is greater than | |
396 | zero) | |
397 | ||
d690b2cd | 398 | bool pm_runtime_suspended(struct device *dev); |
f08f5a0a RW |
399 | - return true if the device's runtime PM status is 'suspended' and its |
400 | 'power.disable_depth' field is equal to zero, or false otherwise | |
d690b2cd | 401 | |
87d1b3e6 RW |
402 | void pm_runtime_allow(struct device *dev); |
403 | - set the power.runtime_auto flag for the device and decrease its usage | |
404 | counter (used by the /sys/devices/.../power/control interface to | |
405 | effectively allow the device to be power managed at run time) | |
406 | ||
407 | void pm_runtime_forbid(struct device *dev); | |
408 | - unset the power.runtime_auto flag for the device and increase its usage | |
409 | counter (used by the /sys/devices/.../power/control interface to | |
410 | effectively prevent the device from being power managed at run time) | |
411 | ||
7490e442 AS |
412 | void pm_runtime_no_callbacks(struct device *dev); |
413 | - set the power.no_callbacks flag for the device and remove the run-time | |
414 | PM attributes from /sys/devices/.../power (or prevent them from being | |
415 | added when the device is registered) | |
416 | ||
c7b61de5 AS |
417 | void pm_runtime_irq_safe(struct device *dev); |
418 | - set the power.irq_safe flag for the device, causing the runtime-PM | |
419 | suspend and resume callbacks (but not the idle callback) to be invoked | |
420 | with interrupts disabled | |
421 | ||
15bcb91d AS |
422 | void pm_runtime_mark_last_busy(struct device *dev); |
423 | - set the power.last_busy field to the current time | |
424 | ||
425 | void pm_runtime_use_autosuspend(struct device *dev); | |
426 | - set the power.use_autosuspend flag, enabling autosuspend delays | |
427 | ||
428 | void pm_runtime_dont_use_autosuspend(struct device *dev); | |
429 | - clear the power.use_autosuspend flag, disabling autosuspend delays | |
430 | ||
431 | void pm_runtime_set_autosuspend_delay(struct device *dev, int delay); | |
432 | - set the power.autosuspend_delay value to 'delay' (expressed in | |
433 | milliseconds); if 'delay' is negative then run-time suspends are | |
434 | prevented | |
435 | ||
436 | unsigned long pm_runtime_autosuspend_expiration(struct device *dev); | |
437 | - calculate the time when the current autosuspend delay period will expire, | |
438 | based on power.last_busy and power.autosuspend_delay; if the delay time | |
439 | is 1000 ms or larger then the expiration time is rounded up to the | |
440 | nearest second; returns 0 if the delay period has already expired or | |
441 | power.use_autosuspend isn't set, otherwise returns the expiration time | |
442 | in jiffies | |
443 | ||
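The calculation described for pm_runtime_autosuspend_expiration() can be
sketched in userspace as follows. This is an illustration only: MODEL_HZ (the
number of jiffies per second), the explicit 'now' parameter, returning 0 for a
negative delay, and the simple rounding are all assumptions made for the
example, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HZ 100UL	/* assumed tick rate: 100 jiffies per second */

/*
 * Return the time (in model jiffies) at which the autosuspend delay ends,
 * or 0 if autosuspend is not in use or the delay has already expired.
 */
unsigned long model_autosuspend_expiration(unsigned long now,
					   unsigned long last_busy,
					   int delay_ms,
					   bool use_autosuspend)
{
	unsigned long expires;

	if (!use_autosuspend || delay_ms < 0)
		return 0;

	/* Convert milliseconds to jiffies, rounding up. */
	expires = last_busy +
		  ((unsigned long)delay_ms * MODEL_HZ + 999) / 1000;

	/* Delays of 1000 ms or more are rounded up to a whole second. */
	if (delay_ms >= 1000)
		expires = (expires + MODEL_HZ - 1) / MODEL_HZ * MODEL_HZ;

	if (expires <= now)
		return 0;	/* the delay period has already expired */

	return expires;
}
```

For example, with 'last_busy' at 1000 and a delay of 1500 ms, the unrounded
expiration of 1150 jiffies is pushed out to 1200 in this model.
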
5e928f77 RW |
444 | It is safe to execute the following helper functions from interrupt context: |
445 | ||
446 | pm_request_idle() | |
15bcb91d | 447 | pm_request_autosuspend() |
5e928f77 RW |
448 | pm_schedule_suspend() |
449 | pm_request_resume() | |
450 | pm_runtime_get_noresume() | |
451 | pm_runtime_get() | |
452 | pm_runtime_put_noidle() | |
453 | pm_runtime_put() | |
15bcb91d AS |
454 | pm_runtime_put_autosuspend() |
455 | pm_runtime_enable() | |
5e928f77 RW |
456 | pm_suspend_ignore_children() |
457 | pm_runtime_set_active() | |
458 | pm_runtime_set_suspended() | |
15bcb91d AS |
459 | pm_runtime_suspended() |
460 | pm_runtime_mark_last_busy() | |
461 | pm_runtime_autosuspend_expiration() | |
5e928f77 | 462 | |
c7b61de5 AS |
463 | If pm_runtime_irq_safe() has been called for a device then the following helper |
464 | functions may also be used in interrupt context: | |
465 | ||
466 | pm_runtime_suspend() | |
467 | pm_runtime_autosuspend() | |
468 | pm_runtime_resume() | |
469 | pm_runtime_get_sync() | |
470 | pm_runtime_put_sync_suspend() | |
471 | ||
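The get/put helpers described in this section are built around the device's
usage counter. The sketch below models that counter in userspace for the
pm_runtime_get_noresume() and pm_runtime_put_sync() pair. The 'model_' names
are hypothetical, the idle callback is reduced to a counter, and returning 0
when the usage counter stays above zero is an assumption made here.

```c
#include <assert.h>

struct model_dev {
	int usage_count;	/* the device's usage counter */
	int idle_calls;		/* how many times idle handling was run */
};

/* Like pm_runtime_get_noresume(): only increment the usage counter. */
void model_get_noresume(struct model_dev *d)
{
	d->usage_count++;
}

/* Stand-in for pm_runtime_idle(): just count the invocation. */
int model_idle(struct model_dev *d)
{
	d->idle_calls++;
	return 0;
}

/* Like pm_runtime_put_sync(): decrement; if the result is 0, run idle. */
int model_put_sync(struct model_dev *d)
{
	if (--d->usage_count == 0)
		return model_idle(d);
	return 0;	/* assumption: nothing to do while still in use */
}
```

Every 'get' has to be balanced by a 'put'; idle handling is triggered only by
the 'put' that brings the counter back to zero.
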
5e928f77 RW |
472 | 5. Run-time PM Initialization, Device Probing and Removal |
473 | ||
474 | Initially, the run-time PM is disabled for all devices, which means that the | |
475 | majority of the run-time PM helper functions described in Section 4 will return | |
476 | -EAGAIN until pm_runtime_enable() is called for the device. | |
477 | ||
478 | In addition to that, the initial run-time PM status of all devices is | |
479 | 'suspended', but it need not reflect the actual physical state of the device. | |
480 | Thus, if the device is initially active (i.e. it is able to process I/O), its | |
481 | run-time PM status must be changed to 'active', with the help of | |
482 | pm_runtime_set_active(), before pm_runtime_enable() is called for the device. | |
483 | ||
484 | However, if the device has a parent and the parent's run-time PM is enabled, | |
485 | calling pm_runtime_set_active() for the device will affect the parent, unless | |
486 | the parent's 'power.ignore_children' flag is set. Namely, in that case the | |
487 | parent won't be able to suspend at run time, using the PM core's helper | |
488 | functions, as long as the child's status is 'active', even if the child's | |
489 | run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for | |
490 | the child yet or pm_runtime_disable() has been called for it). For this reason, | |
491 | once pm_runtime_set_active() has been called for the device, pm_runtime_enable() | |
492 | should be called for it too as soon as reasonably possible or its run-time PM | |
493 | status should be changed back to 'suspended' with the help of | |
494 | pm_runtime_set_suspended(). | |
495 | ||
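The initialization order described above (set the status to 'active' first,
then enable run-time PM) can be modelled in userspace as follows. This is a
sketch, not the PM core's code: the 'model_' names are hypothetical, parent
handling is left out, and returning -EAGAIN from model_set_active() when
run-time PM is already enabled is an assumption.

```c
#include <assert.h>
#include <errno.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

struct model_dev {
	enum model_status status;
	unsigned int disable_depth;	/* helpers work only when this is 0 */
};

/* Initial state: 'suspended' and run-time PM disabled. */
void model_init(struct model_dev *d)
{
	d->status = MODEL_SUSPENDED;
	d->disable_depth = 1;
}

/* Only valid while run-time PM is disabled (error handling omitted). */
int model_set_active(struct model_dev *d)
{
	if (d->disable_depth == 0)
		return -EAGAIN;	/* assumption: refused once enabled */
	d->status = MODEL_ACTIVE;
	return 0;
}

void model_enable(struct model_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
}

/* Returns 0 on success, 1 if already suspended, -EAGAIN if disabled. */
int model_suspend(struct model_dev *d)
{
	if (d->disable_depth > 0)
		return -EAGAIN;
	if (d->status == MODEL_SUSPENDED)
		return 1;
	d->status = MODEL_SUSPENDED;
	return 0;
}
```

In this model a suspend attempted before model_enable() fails with -EAGAIN,
matching the behaviour stated at the start of this section.
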
496 | If the default initial run-time PM status of the device (i.e. 'suspended') | |
497 | reflects the actual state of the device, its bus type's or its driver's | |
498 | ->probe() callback will likely need to wake it up using one of the PM core's | |
499 | helper functions described in Section 4. In that case, pm_runtime_resume() | |
500 | should be used. Of course, for this purpose the device's run-time PM has to be | |
501 | enabled earlier by calling pm_runtime_enable(). | |
502 | ||
503 | If the device bus type's or driver's ->probe() or ->remove() callback runs | |
504 | pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, | |
505 | they will fail returning -EAGAIN, because the device's usage counter is | |
506 | incremented by the core before executing ->probe() and ->remove(). Still, it | |
507 | may be desirable to suspend the device as soon as ->probe() or ->remove() has | |
a6ab7aa9 RW |
508 | finished, so the PM core uses pm_runtime_put_sync() to invoke the |
509 | subsystem-level idle callback for the device at that time. | |
f1212ae1 | 510 | |
87d1b3e6 RW |
511 | The user space can effectively prevent the driver of the device from power managing |
512 | it at run time by changing the value of its /sys/devices/.../power/control | |
513 | attribute to "on", which causes pm_runtime_forbid() to be called. In principle, | |
514 | this mechanism may also be used by the driver to effectively turn off the | |
515 | run-time power management of the device until the user space turns it on. | |
516 | Namely, during the initialization the driver can make sure that the run-time PM | |
517 | status of the device is 'active' and call pm_runtime_forbid(). It should be | |
518 | noted, however, that if the user space has already intentionally changed the | |
519 | value of /sys/devices/.../power/control to "auto" to allow the driver to power | |
520 | manage the device at run time, the driver may confuse it by using | |
521 | pm_runtime_forbid() this way. | |
522 | ||
f1212ae1 AS |
523 | 6. Run-time PM and System Sleep |
524 | ||
525 | Run-time PM and system sleep (i.e., system suspend and hibernation, also known | |
526 | as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of | |
527 | ways. If a device is active when a system sleep starts, everything is | |
528 | straightforward. But what should happen if the device is already suspended? | |
529 | ||
530 | The device may have different wake-up settings for run-time PM and system sleep. | |
531 | For example, remote wake-up may be enabled for run-time suspend but disallowed | |
532 | for system sleep (device_may_wakeup(dev) returns 'false'). When this happens, | |
533 | the subsystem-level system suspend callback is responsible for changing the | |
534 | device's wake-up setting (it may leave that to the device driver's system | |
535 | suspend routine). It may be necessary to resume the device and suspend it again | |
536 | in order to do so. The same is true if the driver uses different power levels | |
537 | or other settings for run-time suspend and system sleep. | |
538 | ||
539 | During system resume, devices generally should be brought back to full power, | |
540 | even if they were suspended before the system sleep began. There are several | |
541 | reasons for this, including: | |
542 | ||
543 | * The device might need to switch power levels, wake-up settings, etc. | |
544 | ||
545 | * Remote wake-up events might have been lost by the firmware. | |
546 | ||
547 | * The device's children may need the device to be at full power in order | |
548 | to resume themselves. | |
549 | ||
550 | * The driver's idea of the device state may not agree with the device's | |
551 | physical state. This can happen during resume from hibernation. | |
552 | ||
553 | * The device might need to be reset. | |
554 | ||
555 | * Even though the device was suspended, if its usage counter was > 0 then most | |
556 | likely it would need a run-time resume in the near future anyway. | |
557 | ||
558 | * Always going back to full power is simplest. | |
559 | ||
560 | If the device was suspended before the sleep began, then its run-time PM status | |
561 | will have to be updated to reflect the actual post-system sleep status. The way | |
562 | to do this is: | |
563 | ||
564 | pm_runtime_disable(dev); | |
565 | pm_runtime_set_active(dev); | |
566 | pm_runtime_enable(dev); | |
567 | ||
568 | The PM core always increments the run-time usage counter before calling the | |
569 | ->prepare() callback and decrements it after calling the ->complete() callback. | |
570 | Hence disabling run-time PM temporarily like this will not cause any run-time | |
571 | suspend callbacks to be lost. | |
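
As an illustration, the three calls above could sit at the end of a driver's
system resume callback.  The following is only a sketch under stated
assumptions: it is written as a userspace model so that it compiles on its own,
the pm_runtime_*() functions are simplified stand-ins for the real helpers, and
foo_resume() and foo_hw_power_up() are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace stand-ins so the sketch compiles outside the kernel.  They model
 * only the bookkeeping; the real helpers are in drivers/base/power/runtime.c
 * and use a disable-depth counter rather than a single flag.
 */
enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED };

struct device {
	bool runtime_pm_enabled;
	enum rpm_status runtime_status;
};

static void pm_runtime_disable(struct device *dev)
{
	dev->runtime_pm_enabled = false;
}

static void pm_runtime_enable(struct device *dev)
{
	dev->runtime_pm_enabled = true;
}

/* The status may only be changed while run-time PM is disabled. */
static int pm_runtime_set_active(struct device *dev)
{
	if (dev->runtime_pm_enabled)
		return -EAGAIN;
	dev->runtime_status = RPM_ACTIVE;
	return 0;
}

/* Hypothetical hardware hook: bring the device back to full power. */
static int foo_hw_power_up(struct device *dev)
{
	(void)dev;
	return 0;
}

/* Hypothetical system resume callback for a driver "foo". */
static int foo_resume(struct device *dev)
{
	int ret = foo_hw_power_up(dev);

	if (ret)
		return ret;

	/* The device is now at full power, so tell the PM core about it. */
	pm_runtime_disable(dev);
	ret = pm_runtime_set_active(dev);
	pm_runtime_enable(dev);
	return ret;
}
```

In this model pm_runtime_set_active() refuses to change the status while
run-time PM is enabled, which is why the disable/enable pair surrounds it.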
d690b2cd RW |
572 | |
573 | 7. Generic subsystem callbacks | |
574 | ||
575 | Subsystems may wish to conserve code space by using the set of generic power | |
576 | management callbacks provided by the PM core, defined in | |
577 | drivers/base/power/generic_ops.c: | |
578 | ||
579 | int pm_generic_runtime_idle(struct device *dev); | |
580 | - invoke the ->runtime_idle() callback provided by the driver of this | |
581 | device, if defined, and call pm_runtime_suspend() for this device if the | |
582 | return value is 0 or the callback is not defined | |
583 | ||
584 | int pm_generic_runtime_suspend(struct device *dev); | |
585 | - invoke the ->runtime_suspend() callback provided by the driver of this | |
586 | device and return its result, or return -EINVAL if not defined | |
587 | ||
588 | int pm_generic_runtime_resume(struct device *dev); | |
589 | - invoke the ->runtime_resume() callback provided by the driver of this | |
590 | device and return its result, or return -EINVAL if not defined | |
591 | ||
592 | int pm_generic_suspend(struct device *dev); | |
593 | - if the device has not been suspended at run time, invoke the ->suspend() | |
594 | callback provided by its driver and return its result, or return 0 if not | |
595 | defined | |
596 | ||
597 | int pm_generic_resume(struct device *dev); | |
598 | - invoke the ->resume() callback provided by the driver of this device and, | |
599 | if successful, change the device's runtime PM status to 'active' | |
600 | ||
601 | int pm_generic_freeze(struct device *dev); | |
602 | - if the device has not been suspended at run time, invoke the ->freeze() | |
603 | callback provided by its driver and return its result, or return 0 if not | |
604 | defined | |
605 | ||
606 | int pm_generic_thaw(struct device *dev); | |
607 | - if the device has not been suspended at run time, invoke the ->thaw() | |
608 | callback provided by its driver and return its result, or return 0 if not | |
609 | defined | |
610 | ||
611 | int pm_generic_poweroff(struct device *dev); | |
612 | - if the device has not been suspended at run time, invoke the ->poweroff() | |
613 | callback provided by its driver and return its result, or return 0 if not | |
614 | defined | |
615 | ||
616 | int pm_generic_restore(struct device *dev); | |
617 | - invoke the ->restore() callback provided by the driver of this device and, | |
618 | if successful, change the device's runtime PM status to 'active' | |
619 | ||
620 | These functions can be assigned to the ->runtime_idle(), ->runtime_suspend(), | |
621 | ->runtime_resume(), ->suspend(), ->resume(), ->freeze(), ->thaw(), ->poweroff(), | |
622 | or ->restore() callback pointers in the subsystem-level dev_pm_ops structures. | |
623 | ||
624 | If a subsystem wishes to use all of them at the same time, it can simply assign | |
625 | the GENERIC_SUBSYS_PM_OPS macro, defined in include/linux/pm.h, to its | |
626 | dev_pm_ops structure pointer. | |
627 | ||
628 | Device drivers that wish to use the same function as a system suspend, freeze, | |
629 | poweroff and run-time suspend callback, and similarly for system resume, thaw, | |
630 | restore, and run-time resume, can achieve this with the help of the | |
631 | UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its | |
632 | last argument to NULL). | |
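
A driver using UNIVERSAL_DEV_PM_OPS might look roughly like the sketch below.
It is a userspace model, not kernel code: struct dev_pm_ops is reduced to the
callbacks named in this section, the macro is a simplified stand-in for the one
in include/linux/pm.h, and foo_suspend(), foo_resume() and foo_pm_ops are
hypothetical names.

```c
#include <stddef.h>

/*
 * Reduced userspace stand-ins for declarations from include/linux/pm.h, so
 * that the sketch compiles outside the kernel.  The real struct dev_pm_ops
 * has more members and the real macro is built from helper macros.
 */
struct device;

struct dev_pm_ops {
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
	int (*freeze)(struct device *dev);
	int (*thaw)(struct device *dev);
	int (*poweroff)(struct device *dev);
	int (*restore)(struct device *dev);
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
};

#define UNIVERSAL_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \
const struct dev_pm_ops name = { \
	.suspend = suspend_fn, \
	.resume = resume_fn, \
	.freeze = suspend_fn, \
	.thaw = resume_fn, \
	.poweroff = suspend_fn, \
	.restore = resume_fn, \
	.runtime_suspend = suspend_fn, \
	.runtime_resume = resume_fn, \
	.runtime_idle = idle_fn, \
}

/* Hypothetical driver routines shared by system sleep and run-time PM. */
static int foo_suspend(struct device *dev)
{
	(void)dev;
	/* ... put the hardware into its low-power state ... */
	return 0;
}

static int foo_resume(struct device *dev)
{
	(void)dev;
	/* ... bring the hardware back to full power ... */
	return 0;
}

/* No ->runtime_idle() callback is needed, so the last argument is NULL. */
static UNIVERSAL_DEV_PM_OPS(foo_pm_ops, foo_suspend, foo_resume, NULL);
```

The driver would then point its 'pm' field at foo_pm_ops, so that one pair of
routines serves both system sleep and run-time PM.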
7490e442 AS |
633 | |
634 | 8. "No-Callback" Devices | |
635 | ||
636 | Some "devices" are only logical sub-devices of their parent and cannot be | |
637 | power-managed on their own. (The prototype example is a USB interface. Entire | |
638 | USB devices can go into low-power mode or send wake-up requests, but neither is | |
639 | possible for individual interfaces.) The drivers for these devices have no | |
640 | need of run-time PM callbacks; if the callbacks did exist, ->runtime_suspend() | |
641 | and ->runtime_resume() would always return 0 without doing anything else and | |
642 | ->runtime_idle() would always call pm_runtime_suspend(). | |
643 | ||
644 | Subsystems can tell the PM core about these devices by calling | |
645 | pm_runtime_no_callbacks(). This should be done after the device structure is | |
646 | initialized and before it is registered (although after device registration is | |
647 | also okay). The routine will set the device's power.no_callbacks flag and | |
648 | prevent the non-debugging run-time PM sysfs attributes from being created. | |
649 | ||
650 | When power.no_callbacks is set, the PM core will not invoke the | |
651 | ->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks. | |
652 | Instead it will assume that suspends and resumes always succeed and that idle | |
653 | devices should be suspended. | |
654 | ||
655 | As a consequence, the PM core will never directly inform the device's subsystem | |
656 | or driver about run-time power changes. Instead, the driver for the device's | |
657 | parent must take responsibility for telling the device's driver when the | |
658 | parent's power state changes. | |
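
The recommended ordering (initialize, mark as "no-callback", then register) can
be sketched as follows.  This is a userspace model with simplified stand-ins
for device_initialize(), pm_runtime_no_callbacks() and device_add();
foo_register_interface() is a hypothetical subsystem helper.

```c
#include <stdbool.h>

/*
 * Userspace stand-ins so the sketch compiles outside the kernel.  Only the
 * power.no_callbacks flag is modeled; the 'registered' field is an assumption
 * made for this illustration and does not exist in the real struct device.
 */
struct dev_pm_info {
	bool no_callbacks;
};

struct device {
	struct dev_pm_info power;
	bool registered;
};

static void device_initialize(struct device *dev)
{
	dev->power.no_callbacks = false;
	dev->registered = false;
}

static void pm_runtime_no_callbacks(struct device *dev)
{
	dev->power.no_callbacks = true;
}

static int device_add(struct device *dev)
{
	dev->registered = true;
	return 0;
}

/* Hypothetical subsystem helper registering a logical sub-device. */
static int foo_register_interface(struct device *intf)
{
	device_initialize(intf);

	/* Set the flag before registration, as recommended above. */
	pm_runtime_no_callbacks(intf);

	return device_add(intf);
}
```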
15bcb91d AS |
659 | |
660 | 9. Autosuspend, or automatically-delayed suspends | |
661 | ||
662 | Changing a device's power state isn't free; it requires both time and energy. | |
663 | A device should be put in a low-power state only when there's some reason to | |
664 | think it will remain in that state for a substantial time. A common heuristic | |
665 | says that a device which hasn't been used for a while is liable to remain | |
666 | unused; following this advice, drivers should not allow devices to be suspended | |
667 | at run-time until they have been inactive for some minimum period. Even when | |
668 | the heuristic ends up being non-optimal, it will still prevent devices from | |
669 | "bouncing" too rapidly between low-power and full-power states. | |
670 | ||
671 | The term "autosuspend" is an historical remnant. It doesn't mean that the | |
672 | device is automatically suspended (the subsystem or driver still has to call | |
673 | the appropriate PM routines); rather it means that run-time suspends will | |
674 | automatically be delayed until the desired period of inactivity has elapsed. | |
675 | ||
676 | Inactivity is determined based on the power.last_busy field. Drivers should | |
677 | call pm_runtime_mark_last_busy() to update this field after carrying out I/O, | |
678 | typically just before calling pm_runtime_put_autosuspend(). The desired length | |
679 | of the inactivity period is a matter of policy. Subsystems can set this length | |
680 | initially by calling pm_runtime_set_autosuspend_delay(), but after device | |
681 | registration the length should be controlled by user space, using the | |
682 | /sys/devices/.../power/autosuspend_delay_ms attribute. | |
683 | ||
684 | In order to use autosuspend, subsystems or drivers must call | |
685 | pm_runtime_use_autosuspend() (preferably before registering the device), and | |
686 | thereafter they should use the various *_autosuspend() helper functions instead | |
687 | of the non-autosuspend counterparts: | |
688 | ||
689 | Instead of: pm_runtime_suspend use: pm_runtime_autosuspend; | |
690 | Instead of: pm_schedule_suspend use: pm_request_autosuspend; | |
691 | Instead of: pm_runtime_put use: pm_runtime_put_autosuspend; | |
692 | Instead of: pm_runtime_put_sync use: pm_runtime_put_sync_autosuspend. | |
693 | ||
694 | Drivers may also continue to use the non-autosuspend helper functions; they | |
695 | will behave normally, not taking the autosuspend delay into account. | |
696 | Similarly, if the power.use_autosuspend field isn't set then the autosuspend | |
697 | helper functions will behave just like the non-autosuspend counterparts. | |
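
Setting up autosuspend before device registration might look like the sketch
below.  It is a userspace model: the two helpers are simplified stand-ins that
only record the settings, and foo_setup_autosuspend() and the 2000 ms default
are hypothetical choices made for this example.

```c
#include <stdbool.h>

/*
 * Userspace stand-ins so the sketch compiles outside the kernel.  Only the
 * two fields discussed in this section are modeled.
 */
struct dev_pm_info {
	bool use_autosuspend;
	int autosuspend_delay;		/* in milliseconds */
};

struct device {
	struct dev_pm_info power;
};

static void pm_runtime_set_autosuspend_delay(struct device *dev, int delay)
{
	dev->power.autosuspend_delay = delay;
}

static void pm_runtime_use_autosuspend(struct device *dev)
{
	dev->power.use_autosuspend = true;
}

/* Hypothetical initial policy; user space may change it via sysfs later. */
#define FOO_AUTOSUSPEND_DELAY_MS	2000

/* Hypothetical helper, meant to be called before the device is registered. */
static void foo_setup_autosuspend(struct device *dev)
{
	pm_runtime_set_autosuspend_delay(dev, FOO_AUTOSUSPEND_DELAY_MS);
	pm_runtime_use_autosuspend(dev);
}
```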
698 | ||
699 | The implementation is well suited for asynchronous use in interrupt contexts. | |
700 | However, such use inevitably involves races, because the PM core can't | |
701 | synchronize ->runtime_suspend() callbacks with the arrival of I/O requests. | |
702 | This synchronization must be handled by the driver, using its private lock. | |
703 | Here is a schematic pseudo-code example: | |
704 | ||
705 | void foo_read_or_write(struct foo_priv *foo, void *data) | |
706 | { | |
707 | 	lock(&foo->private_lock); | |
708 | 	add_request_to_io_queue(foo, data); | |
709 | 	if (foo->num_pending_requests++ == 0) | |
710 | 		pm_runtime_get(&foo->dev); | |
711 | 	if (!foo->is_suspended) | |
712 | 		foo_process_next_request(foo); | |
713 | 	unlock(&foo->private_lock); | |
714 | } | |
715 | ||
716 | void foo_io_completion(struct foo_priv *foo, void *req) | |
717 | { | |
718 | 	lock(&foo->private_lock); | |
719 | 	if (--foo->num_pending_requests == 0) { | |
720 | 		pm_runtime_mark_last_busy(&foo->dev); | |
721 | 		pm_runtime_put_autosuspend(&foo->dev); | |
722 | 	} else { | |
723 | 		foo_process_next_request(foo); | |
724 | 	} | |
725 | 	unlock(&foo->private_lock); | |
726 | 	/* Send req result back to the user ... */ | |
727 | } | |
728 | ||
729 | int foo_runtime_suspend(struct device *dev) | |
730 | { | |
731 | 	struct foo_priv *foo = container_of(dev, ...); | |
732 | 	int ret = 0; | |
733 | ||
734 | 	lock(&foo->private_lock); | |
735 | 	if (foo->num_pending_requests > 0) { | |
736 | 		ret = -EBUSY; | |
737 | 	} else { | |
738 | 		/* ... suspend the device ... */ | |
739 | 		foo->is_suspended = 1; | |
740 | 	} | |
741 | 	unlock(&foo->private_lock); | |
742 | 	return ret; | |
743 | } | |
744 | ||
745 | int foo_runtime_resume(struct device *dev) | |
746 | { | |
747 | 	struct foo_priv *foo = container_of(dev, ...); | |
748 | ||
749 | 	lock(&foo->private_lock); | |
750 | 	/* ... resume the device ... */ | |
751 | 	foo->is_suspended = 0; | |
752 | 	pm_runtime_mark_last_busy(&foo->dev); | |
753 | 	if (foo->num_pending_requests > 0) | |
754 | 		foo_process_next_request(foo); | |
755 | 	unlock(&foo->private_lock); | |
756 | 	return 0; | |
757 | } | |
758 | ||
759 | The important point is that after foo_io_completion() asks for an autosuspend, | |
760 | the foo_runtime_suspend() callback may race with foo_read_or_write(). | |
761 | Therefore foo_runtime_suspend() has to check whether there are any pending I/O | |
762 | requests (while holding the private lock) before allowing the suspend to | |
763 | proceed. | |
764 | ||
765 | In addition, the power.autosuspend_delay field can be changed by user space at | |
766 | any time. If a driver cares about this, it can call | |
767 | pm_runtime_autosuspend_expiration() from within the ->runtime_suspend() | |
768 | callback while holding its private lock. If the function returns a nonzero | |
769 | value then the delay has not yet expired and the callback should return | |
770 | -EAGAIN. |
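
A ->runtime_suspend() callback that performs this extra check could be sketched
as below.  It is a userspace model: pm_runtime_autosuspend_expiration() is a
stand-in that simply returns a stored value, lock() and unlock() are
placeholders for the driver's private lock, and the autosuspend_expires field
is an assumption made for this illustration.

```c
#include <errno.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in: nonzero means the autosuspend delay has not expired yet. */
struct device {
	unsigned long autosuspend_expires;
};

static unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
{
	return dev->autosuspend_expires;
}

struct foo_priv {
	struct device dev;
	int private_lock;		/* placeholder for a real lock */
	int num_pending_requests;
	int is_suspended;
};

/* Placeholders for the driver's private locking primitives. */
static void lock(int *l)
{
	*l = 1;
}

static void unlock(int *l)
{
	*l = 0;
}

static int foo_runtime_suspend(struct device *dev)
{
	struct foo_priv *foo = container_of(dev, struct foo_priv, dev);
	int ret = 0;

	lock(&foo->private_lock);
	if (foo->num_pending_requests > 0) {
		ret = -EBUSY;
	} else if (pm_runtime_autosuspend_expiration(dev) != 0) {
		/* The delay was changed and has not expired yet. */
		ret = -EAGAIN;
	} else {
		/* ... suspend the device ... */
		foo->is_suspended = 1;
	}
	unlock(&foo->private_lock);
	return ret;
}
```

Both checks are made under the private lock, so neither a new I/O request nor a
change of the delay can slip in between the test and the actual suspend.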