The Linux Watchdog driver API.

Copyright 2002 Christer Weingel <wingel@nano-system.com>

Some parts of this document are copied verbatim from the sbc60xxwdt
driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>

This document describes the state of the Linux 2.4.18 kernel.

Introduction:

A Watchdog Timer (WDT) is a hardware circuit that can reset the
computer system in case of a software fault. You probably knew that
already.

Usually a userspace daemon notifies the kernel watchdog driver at
regular intervals, via the /dev/watchdog special device file, that
userspace is still alive. When such a notification occurs, the driver
will usually tell the hardware watchdog that everything is in order,
and that the watchdog should wait a while longer before resetting the
system. If userspace fails (RAM error, kernel bug, whatever), the
notifications cease to occur, and the hardware watchdog will reset the
system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad hoc construction and different
drivers implement different, and sometimes incompatible, parts of it.
This file is an attempt to document the existing usage and allow
future driver writers to use it as a reference.

The simplest API:

All drivers support the basic mode of operation, where the watchdog
activates as soon as /dev/watchdog is opened and will reboot unless
the watchdog is pinged within a certain time; this time is called the
timeout or margin. The simplest way to ping the watchdog is to write
some data to the device. So a very simple watchdog daemon would look
like this:

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		/* Opening the device starts the watchdog. */
		int fd = open("/dev/watchdog", O_WRONLY);
		if (fd == -1) {
			perror("watchdog");
			exit(EXIT_FAILURE);
		}
		/* Ping every 10 seconds, well inside the timeout. */
		while (1) {
			write(fd, "\0", 1);
			sleep(10);
		}
	}

A more advanced daemon could, for example, check that an HTTP server
is still responding before doing the write call to ping the watchdog.
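
A minimal sketch of such a health-gated ping, assuming the caller supplies its own health check. The names ping_if_healthy, watchdog_loop and check_service are hypothetical and not part of any driver API; how check_service probes the HTTP server is deliberately left open.

```c
#include <unistd.h>

/*
 * Ping the watchdog by writing one byte (the simple API) only when
 * the caller's health check passed.  Returns 1 if a ping was sent,
 * 0 if it was deliberately skipped, and -1 if the write failed.
 */
int ping_if_healthy(int wdt_fd, int healthy)
{
	if (!healthy)
		return 0;	/* skip the ping; the watchdog keeps counting down */
	if (write(wdt_fd, "\0", 1) != 1)
		return -1;
	return 1;
}

/*
 * Daemon main loop.  check_service is a hypothetical callback that
 * returns nonzero when the monitored service answered in time.
 * If it keeps failing, no pings are sent and the watchdog resets
 * the machine once the timeout expires.
 */
void watchdog_loop(int wdt_fd, int (*check_service)(void), unsigned int interval)
{
	for (;;) {
		ping_if_healthy(wdt_fd, check_service());
		sleep(interval);
	}
}
```

The interval passed to watchdog_loop has to stay well below the driver's timeout, otherwise even a healthy system gets reset.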

When the device is closed, the watchdog is disabled. This is not
always such a good idea, since if there is a bug in the watchdog
daemon and it crashes, the system will not reboot. Because of this,
some of the drivers support the configuration option "Disable watchdog
shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when
compiling the kernel, there is no way of disabling the watchdog once
it has been started. So, if the watchdog daemon crashes, the system
will reboot after the timeout has passed.

Some other drivers will not disable the watchdog unless a specific
magic character 'V' has been sent to /dev/watchdog just before closing
the file. If the userspace daemon closes the file without sending
this special character, the driver will assume that the daemon (and
userspace in general) died, and will stop pinging the watchdog without
disabling it first. This will then cause a reboot.
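
A sketch of an orderly shutdown for drivers with magic-close handling: write 'V' as the very last byte, then close. The function name wdt_magic_close is hypothetical. On drivers without magic-close handling, or when CONFIG_WATCHDOG_NOWAYOUT keeps the watchdog armed, the extra byte should simply count as one more ping.

```c
#include <unistd.h>

/*
 * Disarm the watchdog on drivers that support the magic close:
 * the character 'V' must be the last thing written before close().
 * Returns 0 on success, -1 if either the write or the close failed.
 */
int wdt_magic_close(int fd)
{
	int ret = 0;

	if (write(fd, "V", 1) != 1)
		ret = -1;	/* still close the fd, but report the failure */
	if (close(fd) != 0)
		ret = -1;
	return ret;
}
```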

The ioctl API:

All conforming drivers also support an ioctl API. The ioctl numbers,
flags and structures used below are defined in <linux/watchdog.h>.

Pinging the watchdog using an ioctl:

All drivers that have an ioctl interface support at least one ioctl,
KEEPALIVE. This ioctl does exactly the same thing as a write to the
watchdog device, so the main loop in the above program could be
replaced with:

	while (1) {
		ioctl(fd, WDIOC_KEEPALIVE, 0);
		sleep(10);
	}

The argument to the ioctl is ignored.
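
Since not every driver implements ioctls, a daemon that wants to work everywhere can try KEEPALIVE first and fall back to the write-based ping. A minimal sketch (wdt_keepalive is a hypothetical name; falling back on any ioctl error is a simplifying choice of this sketch):

```c
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/*
 * Ping the watchdog with the KEEPALIVE ioctl.  If the ioctl fails
 * (for instance because the driver has no ioctl interface), fall back
 * to the write-based ping that every driver supports.
 * Returns 0 on success, -1 if both methods failed.
 */
int wdt_keepalive(int fd)
{
	if (ioctl(fd, WDIOC_KEEPALIVE, 0) == 0)
		return 0;
	return write(fd, "\0", 1) == 1 ? 0 : -1;
}
```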

Setting and getting the timeout:

For some drivers it is possible to modify the watchdog timeout on the
fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT
flag set in their options field. The argument is a pointer to an
integer holding the timeout in seconds. The driver returns the real
timeout used in the same variable, and this timeout might differ from
the requested one due to limitations of the hardware.

	int timeout = 45;
	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) == 0)
		printf("The timeout was set to %d seconds\n", timeout);
	else
		perror("WDIOC_SETTIMEOUT");

This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.
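
To make the rounding concrete, here is a hypothetical model of a driver whose hardware only counts whole minutes. It models no particular driver; rounding up (so the system is never reset earlier than requested) is an assumption of this sketch.

```c
/*
 * Hypothetical model of a driver whose hardware timer only counts
 * whole minutes.  The requested timeout (seconds) is rounded up to
 * the next minute, and the value actually used is returned, just as
 * WDIOC_SETTIMEOUT writes the real timeout back to the caller.
 * Non-positive requests are clamped to one minute.
 */
int minute_granularity_timeout(int requested_seconds)
{
	int minutes;

	if (requested_seconds <= 0)
		return 60;
	minutes = (requested_seconds + 59) / 60;	/* round up */
	return minutes * 60;
}
```

With this model a request for 45 seconds yields 60, matching the example above, while 61 seconds becomes 120.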

Starting with the Linux 2.4.18 kernel, it is possible to query the
current timeout using the GETTIMEOUT ioctl.

	ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
	printf("The timeout is %d seconds\n", timeout);
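
The two calls can be combined into a helper that requests a timeout and reports the value the driver really uses. A sketch under the assumption that GETTIMEOUT may be missing on older drivers, in which case the value written back by SETTIMEOUT is trusted; wdt_set_timeout is a hypothetical name.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Request a new timeout and return the value the driver is really
 * using, or -1 if the driver rejected the request.  The value written
 * back by WDIOC_SETTIMEOUT is used; if the driver also supports
 * WDIOC_GETTIMEOUT (Linux 2.4.18 and later), it is re-read to confirm.
 */
int wdt_set_timeout(int fd, int seconds)
{
	int timeout = seconds;
	int current;

	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) != 0)
		return -1;
	/* GETTIMEOUT is optional; keep the SETTIMEOUT result if it fails. */
	if (ioctl(fd, WDIOC_GETTIMEOUT, &current) == 0)
		timeout = current;
	return timeout;
}
```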

Environmental monitoring:

All watchdog drivers are required to return more information about the
system; some do temperature, fan and power level monitoring, and some
can tell you the reason for the last reboot of the system. The
GETSUPPORT ioctl is available to ask what the device can do:

	struct watchdog_info ident;
	ioctl(fd, WDIOC_GETSUPPORT, &ident);

The fields returned in the ident struct are:

	identity          a string identifying the watchdog driver
	firmware_version  the firmware version of the card if available
	options           a set of flags describing what the device supports
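
A sketch that queries GETSUPPORT, prints the three fields and tests a single capability bit. The helper names are hypothetical; identity is a fixed-size byte array, so the print is bounded by its size rather than relying on a terminating NUL.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Nonzero if the driver advertises the given WDIOF_* capability. */
int wdt_has_option(const struct watchdog_info *info, unsigned int flag)
{
	return (info->options & flag) != 0;
}

/*
 * Query WDIOC_GETSUPPORT and print the three fields.
 * Returns 0 on success, -1 on failure (e.g. the driver implements
 * no ioctls at all).
 */
int wdt_print_info(int fd, struct watchdog_info *info)
{
	if (ioctl(fd, WDIOC_GETSUPPORT, info) != 0)
		return -1;
	printf("identity:         %.*s\n", (int)sizeof(info->identity),
	       (const char *)info->identity);
	printf("firmware_version: %u\n", (unsigned int)info->firmware_version);
	printf("options:          0x%08x\n", (unsigned int)info->options);
	return 0;
}
```

For example, wdt_has_option(&info, WDIOF_SETTIMEOUT) tells a daemon whether the SETTIMEOUT ioctl is worth trying.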

The options field can have the following bits set, and describes what
kind of information the GETSTATUS and GETBOOTSTATUS ioctls can
return. [FIXME -- Is this correct?]

	WDIOF_OVERHEAT       Reset due to CPU overheat

The machine was last rebooted by the watchdog because the thermal limit was
exceeded.

	WDIOF_FANFAULT       Fan failed

A system fan monitored by the watchdog card has failed.

	WDIOF_EXTERN1        External relay 1

External monitoring relay/source 1 was triggered. Controllers intended for
real-world applications include external monitoring pins that will trigger
a reset.

	WDIOF_EXTERN2        External relay 2

External monitoring relay/source 2 was triggered.

	WDIOF_POWERUNDER     Power bad/power fault

The machine is showing an undervoltage status.

	WDIOF_CARDRESET      Card previously reset the CPU

The last reboot was caused by the watchdog card.

	WDIOF_POWEROVER      Power over voltage

The machine is showing an overvoltage status. Note that if one level is
under and one over, both bits will be set; this may seem odd but makes
sense.

	WDIOF_KEEPALIVEPING  Keep alive ping reply

The watchdog saw a keepalive ping since it was last queried.

	WDIOF_SETTIMEOUT     Can set/get the timeout

The driver supports the SETTIMEOUT ioctl.
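
When logging the result of GETSUPPORT, GETSTATUS or GETBOOTSTATUS it helps to turn the bit mask into names. A sketch covering only the bits listed above; wdt_describe_flags is a hypothetical helper, and unknown bits are silently ignored.

```c
#include <stdio.h>
#include <string.h>
#include <linux/watchdog.h>

/*
 * Render a WDIOF_* bit mask as a space-separated list of names.
 * Unknown bits are ignored.  The output is truncated (but always
 * NUL-terminated) if buf is too small.
 */
void wdt_describe_flags(unsigned int flags, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } names[] = {
		{ WDIOF_OVERHEAT,      "OVERHEAT" },
		{ WDIOF_FANFAULT,      "FANFAULT" },
		{ WDIOF_EXTERN1,       "EXTERN1" },
		{ WDIOF_EXTERN2,       "EXTERN2" },
		{ WDIOF_POWERUNDER,    "POWERUNDER" },
		{ WDIOF_CARDRESET,     "CARDRESET" },
		{ WDIOF_POWEROVER,     "POWEROVER" },
		{ WDIOF_KEEPALIVEPING, "KEEPALIVEPING" },
		{ WDIOF_SETTIMEOUT,    "SETTIMEOUT" },
	};
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(flags & names[i].bit))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return;	/* out of space: stop here */
		used += (size_t)n;
	}
}
```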

For those drivers that return any bits set in the options field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
status, and the status at the last reboot, respectively.

	int flags;
	ioctl(fd, WDIOC_GETSTATUS, &flags);

or

	ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);

Note that not all devices support these two calls, and some only
support the GETBOOTSTATUS call.
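
A typical use of GETBOOTSTATUS is to find out at startup whether the previous reboot was forced by the watchdog. A sketch with hypothetical helper names; keep in mind that some drivers always report 0 here, so a 0 result does not prove the watchdog was not involved.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Nonzero if WDIOF_CARDRESET is set in a GETBOOTSTATUS value. */
int wdt_card_reset(int bootstatus)
{
	return (bootstatus & WDIOF_CARDRESET) != 0;
}

/*
 * Returns 1 if the last reboot was caused by the watchdog card,
 * 0 if not (or the driver does not record it), and -1 if the driver
 * does not implement WDIOC_GETBOOTSTATUS.
 */
int wdt_rebooted_by_watchdog(int fd)
{
	int flags = 0;

	if (ioctl(fd, WDIOC_GETBOOTSTATUS, &flags) != 0)
		return -1;
	return wdt_card_reset(flags);
}
```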

Some drivers can measure the temperature using the GETTEMP ioctl. The
returned value is the temperature in degrees Fahrenheit.

	int temperature;
	ioctl(fd, WDIOC_GETTEMP, &temperature);
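
Because the value is in Fahrenheit, a monitoring tool that reports Celsius has to convert it. A small sketch using integer arithmetic (helper names are hypothetical; the division truncates toward zero):

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Convert whole degrees Fahrenheit to whole degrees Celsius. */
int fahrenheit_to_celsius(int f)
{
	return (f - 32) * 5 / 9;
}

/*
 * Read the card temperature and convert it to Celsius.
 * Returns 0 on success, -1 if the driver does not implement
 * WDIOC_GETTEMP.
 */
int wdt_temp_celsius(int fd, int *celsius)
{
	int f;

	if (ioctl(fd, WDIOC_GETTEMP, &f) != 0)
		return -1;
	*celsius = fahrenheit_to_celsius(f);
	return 0;
}
```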

Finally, the SETOPTIONS ioctl can be used to control some aspects of
the card's operation; right now the pcwd driver is the only one
supporting this ioctl. The option word is passed as a pointer to an
int:

	int options = 0;
	ioctl(fd, WDIOC_SETOPTIONS, &options);

The following options are available:

	WDIOS_DISABLECARD  Turn off the watchdog timer
	WDIOS_ENABLECARD   Turn on the watchdog timer
	WDIOS_TEMPPANIC    Kernel panic on temperature trip

[FIXME -- better explanations]
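
A sketch of switching the card on or off through SETOPTIONS. The helper names are hypothetical; on drivers that do not implement the ioctl the call simply fails with -1.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Map a boolean to the matching WDIOS_* option word. */
int wdt_enable_option(int enable)
{
	return enable ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;
}

/*
 * Enable or disable the watchdog card with WDIOC_SETOPTIONS.
 * The option word is passed by address.  Returns 0 on success,
 * -1 if the driver rejects or does not implement the call.
 */
int wdt_set_enabled(int fd, int enable)
{
	int options = wdt_enable_option(enable);

	return ioctl(fd, WDIOC_SETOPTIONS, &options) == 0 ? 0 : -1;
}
```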

Implementations in the current drivers in the kernel tree:

Here I have tried to summarize what the different drivers support and
where they do strange things compared to the other drivers.

acquirewdt.c -- Acquire Single Board Computer

	This driver has a hardcoded timeout of 1 minute.

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns WDIOF_KEEPALIVEPING. GETSTATUS will return 1 if
	the device is open, 0 if not. [FIXME -- isn't this rather
	silly? To be able to use the ioctl, the device must be open
	and so GETSTATUS will always return 1].

advantechwdt.c -- Advantech Single Board Computer

	The timeout defaults to 60 seconds; supports SETTIMEOUT.

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
	The GETSTATUS call returns whether the device is open or not.
	[FIXME -- silliness again?]

eurotechwdt.c -- Eurotech CPU-1220/1410

	The timeout can be set using the SETTIMEOUT ioctl and defaults
	to 60 seconds.

	Also has a module parameter "ev" (event type) which controls
	what should happen on a timeout: either the string "int" or
	anything else, which causes a reboot. [FIXME -- better description]

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns WDIOF_CARDRESET and WDIOF_SETTIMEOUT, but
	GETSTATUS is not supported and GETBOOTSTATUS just returns 0.

i810-tco.c -- Intel 810 chipset

	Also has support for a lot of other i8x0 hardware; the watchdog
	is just one part of it.

	The timeout is set using the module parameter "i810_margin",
	which is in steps of 0.6 seconds where 2 < i810_margin < 64. The
	driver supports the SETTIMEOUT ioctl.
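
Because i810_margin counts 0.6-second steps, a timeout in seconds has to be converted before it is passed as a module parameter. A sketch of that arithmetic, assuming round-up and clamping to the documented range; i810_seconds_to_margin is a hypothetical helper and the driver's own conversion may differ.

```c
/*
 * Hypothetical helper: convert a timeout in seconds to i810_margin
 * units of 0.6 s, rounding up so the timeout is never shorter than
 * requested, and clamping to the valid range 2 < i810_margin < 64.
 */
int i810_seconds_to_margin(int seconds)
{
	int ticks;

	if (seconds <= 0)
		return 3;
	/* ceil(seconds / 0.6) == ceil(seconds * 10 / 6), in integers */
	ticks = (seconds * 10 + 5) / 6;
	if (ticks < 3)
		ticks = 3;
	if (ticks > 63)
		ticks = 63;
	return ticks;
}
```

For example, a 30-second timeout corresponds to i810_margin=50 (50 steps of 0.6 s); anything above about 37.8 seconds is out of range and gets clamped to 63.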

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call
	returns some kind of timer value which is not compatible with
	the other drivers. GETBOOTSTATUS returns some kind of
	hardware-specific boot status. [FIXME -- describe this]

ib700wdt.c -- IB700 Single Board Computer

	Default timeout of 30 seconds and the timeout is settable
	using the SETTIMEOUT ioctl. Note that only a few timeout
	values are supported.

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
	The GETSTATUS call returns whether the device is open or not.
	[FIXME -- silliness again?]

machzwd.c -- MachZ ZF-Logic

	Hardcoded timeout of 10 seconds.

	Has a module parameter "action" that controls what happens
	when the timeout runs out, which can be 0 = RESET (default),
	1 = SMI, 2 = NMI, 3 = SCI.

	Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character
	'V' close handling.

	GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
	returns whether the device is open or not. [FIXME -- silliness
	again?]

mixcomwd.c -- MixCom Watchdog

	[FIXME -- I'm unable to tell what the timeout is]

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns whether
	the device is opened or not. [FIXME -- I'm not really sure how
	this works, there seems to be some magic connected to
	CONFIG_WATCHDOG_NOWAYOUT]

pcwd.c -- Berkshire PC Watchdog

	Hardcoded timeout of 1.5 seconds.

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both
	GETSTATUS and GETBOOTSTATUS return something useful.

	The SETOPTIONS call can be used to enable and disable the card
	and to ask the driver to call panic if the system overheats.

sbc60xxwdt.c -- 60xx Single Board Computer

	Hardcoded timeout of 10 seconds.

	Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
	character 'V' close handling.

	No bits set in GETSUPPORT.

scx200.c -- National SCx200 CPUs

	Not in the kernel yet.

	The timeout is set using a module parameter "margin" which
	defaults to 60 seconds. The timeout can also be set using
	SETTIMEOUT and read using GETTIMEOUT.

	Supports a module parameter "nowayout" that is initialized
	with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the
	magic character 'V' handling.

shwdt.c -- SuperH 3/4 processors

	[FIXME -- I'm unable to tell what the timeout is]

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
	returns whether the device is open or not. [FIXME -- silliness
	again?]

softdog.c -- Software watchdog

	The timeout is set with the module parameter "soft_margin",
	which defaults to 60 seconds; the timeout is also settable
	using the SETTIMEOUT ioctl.

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	WDIOF_SETTIMEOUT bit set in GETSUPPORT.

w83877f_wdt.c -- W83877F Computer

	Hardcoded timeout of 30 seconds.

	Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
	character 'V' close handling.

	No bits set in GETSUPPORT.

w83627hf_wdt.c -- w83627hf watchdog

	The timeout defaults to 60 seconds; supports SETTIMEOUT.

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
	The GETSTATUS call returns whether the device is open or not.

wdt.c -- ICS WDT500/501 ISA and
wdt_pci.c -- ICS WDT500/501 PCI

	Default timeout of 60 seconds. The timeout is also settable
	using the SETTIMEOUT ioctl.

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	GETSUPPORT returns with bits set depending on the actual
	card. The WDT501 supports a lot of external monitoring, the
	WDT500 much less.

wdt285.c -- Footbridge watchdog

	The timeout is set with the module parameter "soft_margin",
	which defaults to 60 seconds. The timeout is also settable
	using the SETTIMEOUT ioctl.

	Does not support CONFIG_WATCHDOG_NOWAYOUT.

	WDIOF_SETTIMEOUT bit set in GETSUPPORT.

wdt977.c -- Netwinder W83977AF chip

	Hardcoded timeout of 3 minutes.

	Supports CONFIG_WATCHDOG_NOWAYOUT.

	Does not support any ioctls at all.