Commit | Line | Data |
---|---|---|
1da177e4 LT |
1 | ------------------------------------------------------------------------------ |
2 | T H E /proc F I L E S Y S T E M | |
3 | ------------------------------------------------------------------------------ | |
4 | /proc/sys Terrehon Bowden <terrehon@pacbell.net> October 7 1999 | |
5 | Bodo Bauer <bb@ricochet.net> | |
6 | ||
7 | 2.4.x update Jorge Nerin <comandante@zaralinux.com> November 14 2000 | |
8 | ------------------------------------------------------------------------------ | |
9 | Version 1.3 Kernel version 2.2.12 | |
10 | Kernel version 2.4.0-test11-pre4 | |
11 | ------------------------------------------------------------------------------ | |
12 | ||
13 | Table of Contents | |
14 | ----------------- | |
15 | ||
16 | 0 Preface | |
17 | 0.1 Introduction/Credits | |
18 | 0.2 Legal Stuff | |
19 | ||
20 | 1 Collecting System Information | |
21 | 1.1 Process-Specific Subdirectories | |
22 | 1.2 Kernel data | |
23 | 1.3 IDE devices in /proc/ide | |
24 | 1.4 Networking info in /proc/net | |
25 | 1.5 SCSI info | |
26 | 1.6 Parallel port info in /proc/parport | |
27 | 1.7 TTY info in /proc/tty | |
28 | 1.8 Miscellaneous kernel statistics in /proc/stat | |
29 | ||
30 | 2 Modifying System Parameters | |
31 | 2.1 /proc/sys/fs - File system data | |
32 | 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats | |
33 | 2.3 /proc/sys/kernel - general kernel parameters | |
34 | 2.4 /proc/sys/vm - The virtual memory subsystem | |
35 | 2.5 /proc/sys/dev - Device specific parameters | |
36 | 2.6 /proc/sys/sunrpc - Remote procedure calls | |
37 | 2.7 /proc/sys/net - Networking stuff | |
38 | 2.8 /proc/sys/net/ipv4 - IPV4 settings | |
39 | 2.9 Appletalk | |
40 | 2.10 IPX | |
41 | 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem | |
d7ff0dbf JFM |
42 | 2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score |
43 | 2.13 /proc/<pid>/oom_score - Display current oom-killer score | |
f9c99463 | 44 | 2.14 /proc/<pid>/io - Display the IO accounting fields |
bb90110d | 45 | 2.15 /proc/<pid>/coredump_filter - Core dump filtering settings |
2d4d4864 | 46 | 2.16 /proc/<pid>/mountinfo - Information about mounts |
7ef9964e | 47 | 2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface |
1da177e4 LT |
48 | |
49 | ------------------------------------------------------------------------------ | |
50 | Preface | |
51 | ------------------------------------------------------------------------------ | |
52 | ||
53 | 0.1 Introduction/Credits | |
54 | ------------------------ | |
55 | ||
56 | This documentation is part of a soon (or so we hope) to be released book on | |
57 | the SuSE Linux distribution. As there is no complete documentation for the | |
58 | /proc file system and we've used many freely available sources to write these | |
59 | chapters, it seems only fair to give the work back to the Linux community. | |
60 | This work is based on the 2.2.* kernel version and the upcoming 2.4.*. I'm | |
61 | afraid it's still far from complete, but we hope it will be useful. As far as | |
62 | we know, it is the first 'all-in-one' document about the /proc file system. It | |
63 | is focused on the Intel x86 hardware, so if you are looking for PPC, ARM, | |
64 | SPARC, AXP, etc., features, you probably won't find what you are looking for. | |
65 | It also only covers IPv4 networking, not IPv6 nor other protocols - sorry. But | |
66 | additions and patches are welcome and will be added to this document if you | |
67 | mail them to Bodo. | |
68 | ||
69 | We'd like to thank Alan Cox, Rik van Riel, and Alexey Kuznetsov and a lot of | |
70 | other people for help compiling this documentation. We'd also like to extend a | |
71 | special thank you to Andi Kleen for documentation, which we relied on heavily | |
72 | to create this document, as well as the additional information he provided. | |
73 | Thanks to everybody else who contributed source or docs to the Linux kernel | |
74 | and helped create a great piece of software... :) | |
75 | ||
76 | If you have any comments, corrections or additions, please don't hesitate to | |
77 | contact Bodo Bauer at bb@ricochet.net. We'll be happy to add them to this | |
78 | document. | |
79 | ||
80 | The latest version of this document is available online at | |
81 | http://skaro.nightcrawler.com/~bb/Docs/Proc as HTML version. | |
82 | ||
83 | If the above address does not work for you, you could try the kernel | |
84 | mailing list at linux-kernel@vger.kernel.org and/or try to reach me at | |
85 | comandante@zaralinux.com. | |
86 | ||
87 | 0.2 Legal Stuff | |
88 | --------------- | |
89 | ||
90 | We don't guarantee the correctness of this document, and if you come to us | |
91 | complaining about how you screwed up your system because of incorrect | |
92 | documentation, we won't feel responsible... | |
93 | ||
94 | ------------------------------------------------------------------------------ | |
95 | CHAPTER 1: COLLECTING SYSTEM INFORMATION | |
96 | ------------------------------------------------------------------------------ | |
97 | ||
98 | ------------------------------------------------------------------------------ | |
99 | In This Chapter | |
100 | ------------------------------------------------------------------------------ | |
101 | * Investigating the properties of the pseudo file system /proc and its | |
102 | ability to provide information on the running Linux system | |
103 | * Examining /proc's structure | |
104 | * Uncovering various information about the kernel and the processes running | |
105 | on the system | |
106 | ------------------------------------------------------------------------------ | |
107 | ||
108 | ||
109 | The proc file system acts as an interface to internal data structures in the | |
110 | kernel. It can be used to obtain information about the system and to change | |
111 | certain kernel parameters at runtime (sysctl). | |
112 | ||
113 | First, we'll take a look at the read-only parts of /proc. In Chapter 2, we | |
114 | show you how you can use /proc/sys to change settings. | |
115 | ||
116 | 1.1 Process-Specific Subdirectories | |
117 | ----------------------------------- | |
118 | ||
119 | The directory /proc contains (among other things) one subdirectory for each | |
120 | process running on the system, which is named after the process ID (PID). | |
121 | ||
122 | The link self points to the process reading the file system. Each process | |
123 | subdirectory has the entries listed in Table 1-1. | |
124 | ||
125 | ||
126 | Table 1-1: Process specific entries in /proc | |
127 | .............................................................................. | |
b813e931 DR |
128 | File Content |
129 | clear_refs Clears page referenced bits shown in smaps output | |
130 | cmdline Command line arguments | |
131 | cpu Current and last cpu in which it was executed (2.4)(smp) | |
132 | cwd Link to the current working directory | |
133 | environ Values of environment variables | |
134 | exe Link to the executable of this process | |
135 | fd Directory, which contains all file descriptors | |
136 | maps Memory maps to executables and library files (2.4) | |
137 | mem Memory held by this process | |
138 | root Link to the root directory of this process | |
139 | stat Process status | |
140 | statm Process memory status information | |
141 | status Process status in human readable form | |
142 | wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan | |
143 | smaps Extension based on maps, the rss size for each mapped file | |
1da177e4 LT |
144 | .............................................................................. |
145 | ||
146 | For example, to get the status information of a process, all you have to do is | |
147 | read the file /proc/PID/status: | |
148 | ||
149 | >cat /proc/self/status | |
150 | Name: cat | |
151 | State: R (running) | |
152 | Pid: 5452 | |
153 | PPid: 743 | |
154 | TracerPid: 0 (2.4) | |
155 | Uid: 501 501 501 501 | |
156 | Gid: 100 100 100 100 | |
157 | Groups: 100 14 16 | |
158 | VmSize: 1112 kB | |
159 | VmLck: 0 kB | |
160 | VmRSS: 348 kB | |
161 | VmData: 24 kB | |
162 | VmStk: 12 kB | |
163 | VmExe: 8 kB | |
164 | VmLib: 1044 kB | |
165 | SigPnd: 0000000000000000 | |
166 | SigBlk: 0000000000000000 | |
167 | SigIgn: 0000000000000000 | |
168 | SigCgt: 0000000000000000 | |
169 | CapInh: 00000000fffffeff | |
170 | CapPrm: 0000000000000000 | |
171 | CapEff: 0000000000000000 | |
172 | ||
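Because the status file is a list of "Key: value" lines, it is easy to read
from a program. The following is a minimal sketch in Python, not part of any
kernel tool; the function name parse_status and the SAMPLE text (a few lines
copied from the output above) are our own illustrative choices.

```python
def parse_status(text):
    """Turn the 'Key: value' lines of a /proc/PID/status dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore any line that has no colon
            fields[key.strip()] = value.strip()
    return fields


# A few lines taken from the example output above (tab separated).
SAMPLE = (
    "Name:\tcat\n"
    "State:\tR (running)\n"
    "Pid:\t5452\n"
    "PPid:\t743\n"
    "VmRSS:\t     348 kB\n"
)

status = parse_status(SAMPLE)
print(status["Name"], status["State"], status["VmRSS"])
```

On a running Linux system the same function can be fed the contents of
/proc/self/status or /proc/PID/status instead of SAMPLE.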
173 | ||
174 | This shows you nearly the same information you would get if you viewed it with | |
175 | the ps command. In fact, ps uses the proc file system to obtain its | |
176 | information. The statm file contains more detailed information about the | |
18d96779 KC |
177 | process memory usage. Its seven fields are explained in Table 1-2. The stat |
178 | file contains detailed information about the process itself. Its fields are | |
179 | explained in Table 1-3. | |
1da177e4 LT |
180 | |
181 | ||
182 | Table 1-2: Contents of the statm files (as of 2.6.8-rc3) | |
183 | .............................................................................. | |
184 | Field Content | |
185 | size total program size (pages) (same as VmSize in status) | |
186 | resident size of memory portions (pages) (same as VmRSS in status) | |
187 | shared number of pages that are shared (i.e. backed by a file) | |
188 | trs number of pages that are 'code' (not including libs; broken, | |
189 | includes data segment) | |
190 | lrs number of pages of library (always 0 on 2.6) | |
191 | drs number of pages of data/stack (including libs; broken, | |
192 | includes library text) | |
193 | dt number of dirty pages (always 0 on 2.6) | |
194 | .............................................................................. | |
195 | ||
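The statm fields are counted in pages, so they have to be multiplied by the
page size before they can be compared with the kB values in status. A small
sketch, assuming a page size of 4 kB (typical on x86, but an assumption here);
the function name parse_statm and the sample line are hypothetical, chosen so
that the result matches the VmSize and VmRSS values of the status example.

```python
STATM_FIELDS = ("size", "resident", "shared", "trs", "lrs", "drs", "dt")


def parse_statm(text, page_kb=4):
    """Map the seven statm fields to sizes in kB (page_kb is an assumption)."""
    values = [int(v) for v in text.split()]
    if len(values) != len(STATM_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(STATM_FIELDS), len(values)))
    return {name: pages * page_kb
            for name, pages in zip(STATM_FIELDS, values)}


# Hypothetical statm line: 278 pages total, 87 resident, and so on.
SAMPLE = "278 87 70 2 0 74 0"

usage = parse_statm(SAMPLE)
print("size: %d kB, resident: %d kB" % (usage["size"], usage["resident"]))
```

A real program would ask the system for the page size rather than assume it,
for example with os.sysconf("SC_PAGE_SIZE").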
18d96779 KC |
196 | |
197 | Table 1-3: Contents of the stat files (as of 2.6.22-rc3) | |
198 | .............................................................................. | |
199 | Field Content | |
200 | pid process id | |
201 | tcomm filename of the executable | |
202 | state state (R is running, S is sleeping, D is sleeping in an | |
203 | uninterruptible wait, Z is zombie, T is traced or stopped) | |
204 | ppid process id of the parent process | |
205 | pgrp pgrp of the process | |
206 | sid session id | |
207 | tty_nr tty the process uses | |
208 | tty_pgrp pgrp of the tty | |
209 | flags task flags | |
210 | min_flt number of minor faults | |
211 | cmin_flt number of minor faults with child's | |
212 | maj_flt number of major faults | |
213 | cmaj_flt number of major faults with child's | |
214 | utime user mode jiffies | |
215 | stime kernel mode jiffies | |
216 | cutime user mode jiffies with child's | |
217 | cstime kernel mode jiffies with child's | |
218 | priority priority level | |
219 | nice nice level | |
220 | num_threads number of threads | |
2e01e00e | 221 | it_real_value (obsolete, always 0) |
18d96779 KC |
222 | start_time time the process started after system boot |
223 | vsize virtual memory size | |
224 | rss resident set memory size | |
225 | rsslim current limit in bytes on the rss | |
226 | start_code address above which program text can run | |
227 | end_code address below which program text can run | |
228 | start_stack address of the start of the stack | |
229 | esp current value of ESP | |
230 | eip current value of EIP | |
231 | pending bitmap of pending signals (obsolete) | |
232 | blocked bitmap of blocked signals (obsolete) | |
233 | sigign bitmap of ignored signals (obsolete) | |
234 | sigcatch bitmap of caught signals (obsolete) | |
235 | wchan address where process went to sleep | |
236 | 0 (place holder) | |
237 | 0 (place holder) | |
238 | exit_signal signal to send to parent thread on exit | |
239 | task_cpu which CPU the task is scheduled on | |
240 | rt_priority realtime priority | |
241 | policy scheduling policy (man sched_setscheduler) | |
242 | blkio_ticks time spent waiting for block IO | |
243 | .............................................................................. | |
244 | ||
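The stat file is a single line of space-separated fields in the order of
Table 1-3. One detail not shown in the table: the tcomm field is written in
parentheses and may itself contain spaces, so a naive split on whitespace can
go wrong. The sketch below splits at the last closing parenthesis instead. The
function name parse_stat and the sample line are hypothetical, and only the
first fifteen fields are named.

```python
STAT_FIELDS = (
    "pid", "tcomm", "state", "ppid", "pgrp", "sid", "tty_nr", "tty_pgrp",
    "flags", "min_flt", "cmin_flt", "maj_flt", "cmaj_flt", "utime", "stime",
)


def parse_stat(line):
    """Name the leading fields of a /proc/PID/stat line (values stay strings)."""
    # Everything up to the LAST ')' is "pid (tcomm"; the rest is plain fields.
    left, _, rest = line.rpartition(")")
    pid, _, tcomm = left.partition("(")
    values = [pid.strip(), tcomm] + rest.split()
    return dict(zip(STAT_FIELDS, values))


# Hypothetical stat line for a process whose name contains a space.
SAMPLE = "5452 (my prog) R 743 5452 743 34816 5452 4194304 95 0 0 0 1 2 0 0 20 0 1 0"

stat = parse_stat(SAMPLE)
print(stat["tcomm"], stat["state"], stat["utime"], stat["stime"])
```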
245 | ||
1da177e4 LT |
246 | 1.2 Kernel data |
247 | --------------- | |
248 | ||
249 | Similar to the process entries, the kernel data files give information about | |
250 | the running kernel. The files used to obtain this information are contained in | |
18d96779 | 251 | /proc and are listed in Table 1-4. Not all of these will be present in your |
1da177e4 LT |
252 | system. Which files are present depends on the kernel configuration and the |
253 | loaded modules. | |
254 | ||
18d96779 | 255 | Table 1-4: Kernel info in /proc |
1da177e4 LT |
256 | .............................................................................. |
257 | File Content | |
258 | apm Advanced power management info | |
259 | buddyinfo Kernel memory allocator information (see text) (2.5) | |
260 | bus Directory containing bus specific information | |
261 | cmdline Kernel command line | |
262 | cpuinfo Info about the CPU | |
263 | devices Available devices (block and character) | |
264 | dma Used DMA channels | |
265 | filesystems Supported filesystems | |
266 | driver Various drivers grouped here, currently rtc (2.4) | |
267 | execdomains Execdomains, related to security (2.4) | |
268 | fb Frame Buffer devices (2.4) | |
269 | fs File system parameters, currently nfs/exports (2.4) | |
270 | ide Directory containing info about the IDE subsystem | |
271 | interrupts Interrupt usage | |
272 | iomem Memory map (2.4) | |
273 | ioports I/O port usage | |
274 | irq Masks for irq to cpu affinity (2.4)(smp?) | |
275 | isapnp ISA PnP (Plug&Play) Info (2.4) | |
276 | kcore Kernel core image (can be ELF or A.OUT(deprecated in 2.4)) | |
277 | kmsg Kernel messages | |
278 | ksyms Kernel symbol table | |
279 | loadavg Load average of last 1, 5 & 15 minutes | |
280 | locks Kernel locks | |
281 | meminfo Memory info | |
282 | misc Miscellaneous | |
283 | modules List of loaded modules | |
284 | mounts Mounted filesystems | |
285 | net Networking info (see text) | |
286 | partitions Table of partitions known to the system | |
8b60756a | 287 | pci Deprecated info of PCI bus (new way -> /proc/bus/pci/, |
1da177e4 LT |
288 | decoupled by lspci) (2.4) |
289 | rtc Real time clock | |
290 | scsi SCSI info (see text) | |
291 | slabinfo Slab pool info | |
292 | stat Overall statistics | |
293 | swaps Swap space utilization | |
294 | sys See chapter 2 | |
295 | sysvipc Info of SysVIPC Resources (msg, sem, shm) (2.4) | |
296 | tty Info of tty drivers | |
297 | uptime System uptime | |
298 | version Kernel version | |
299 | video bttv info of video resources (2.4) | |
a47a126a | 300 | vmallocinfo Show vmalloced areas |
1da177e4 LT |
301 | .............................................................................. |
302 | ||
303 | You can, for example, check which interrupts are currently in use and what | |
304 | they are used for by looking in the file /proc/interrupts: | |
305 | ||
306 | > cat /proc/interrupts | |
307 | CPU0 | |
308 | 0: 8728810 XT-PIC timer | |
309 | 1: 895 XT-PIC keyboard | |
310 | 2: 0 XT-PIC cascade | |
311 | 3: 531695 XT-PIC aha152x | |
312 | 4: 2014133 XT-PIC serial | |
313 | 5: 44401 XT-PIC pcnet_cs | |
314 | 8: 2 XT-PIC rtc | |
315 | 11: 8 XT-PIC i82365 | |
316 | 12: 182918 XT-PIC PS/2 Mouse | |
317 | 13: 1 XT-PIC fpu | |
318 | 14: 1232265 XT-PIC ide0 | |
319 | 15: 7 XT-PIC ide1 | |
320 | NMI: 0 | |
321 | ||
322 | In 2.4.* a couple of lines were added to this file, LOC and ERR (this time the | |
323 | output is from an SMP machine): | |
324 | ||
325 | > cat /proc/interrupts | |
326 | ||
327 | CPU0 CPU1 | |
328 | 0: 1243498 1214548 IO-APIC-edge timer | |
329 | 1: 8949 8958 IO-APIC-edge keyboard | |
330 | 2: 0 0 XT-PIC cascade | |
331 | 5: 11286 10161 IO-APIC-edge soundblaster | |
332 | 8: 1 0 IO-APIC-edge rtc | |
333 | 9: 27422 27407 IO-APIC-edge 3c503 | |
334 | 12: 113645 113873 IO-APIC-edge PS/2 Mouse | |
335 | 13: 0 0 XT-PIC fpu | |
336 | 14: 22491 24012 IO-APIC-edge ide0 | |
337 | 15: 2183 2415 IO-APIC-edge ide1 | |
338 | 17: 30564 30414 IO-APIC-level eth0 | |
339 | 18: 177 164 IO-APIC-level bttv | |
340 | NMI: 2457961 2457959 | |
341 | LOC: 2457882 2457881 | |
342 | ERR: 2155 | |
343 | ||
344 | NMI is incremented in this case because every timer interrupt generates an NMI | |
345 | (Non Maskable Interrupt) which is used by the NMI Watchdog to detect lockups. | |
346 | ||
347 | LOC is the local interrupt counter of the internal APIC of every CPU. | |
348 | ||
349 | ERR is incremented in the case of errors in the IO-APIC bus (the bus that | |
350 | connects the CPUs in an SMP system). This means that an error has been detected; | |
351 | the IO-APIC automatically retries the transmission, so it should not be a big | |
352 | problem, but you should read the SMP-FAQ. | |
353 | ||
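The layout of /proc/interrupts (a header naming the CPUs, then one row per
interrupt with one counter per CPU) can be read with a few lines of code. A
rough sketch follows; the function name parse_interrupts is our own, and
SAMPLE is a shortened copy of the SMP output shown above.

```python
def parse_interrupts(text):
    """Return (cpu_names, {label: [count per CPU]}) from /proc/interrupts text."""
    lines = [line for line in text.splitlines() if line.strip()]
    cpus = lines[0].split()  # header row, e.g. ['CPU0', 'CPU1']
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        # At most one counter per CPU; rows such as ERR carry fewer.
        for token in rest.split()[:len(cpus)]:
            if not token.isdigit():
                break
            counts.append(int(token))
        table[label.strip()] = counts
    return cpus, table


SAMPLE = """\
           CPU0       CPU1
  0:    1243498    1214548    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
 17:      30564      30414   IO-APIC-level  eth0
NMI:    2457961    2457959
ERR:       2155
"""

cpus, table = parse_interrupts(SAMPLE)
for label, counts in table.items():
    print(label, sum(counts))
```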
38e760a1 JK |
354 | In 2.6.2* /proc/interrupts was expanded again. This time the goal was for |
355 | /proc/interrupts to display every IRQ vector in use by the system, not | |
356 | just those considered 'most important'. The new vectors are: | |
357 | ||
358 | THR -- interrupt raised when a machine check threshold counter | |
359 | (typically counting ECC corrected errors of memory or cache) exceeds | |
360 | a configurable threshold. Only available on some systems. | |
361 | ||
362 | TRM -- a thermal event interrupt occurs when a temperature threshold | |
363 | has been exceeded for the CPU. This interrupt may also be generated | |
364 | when the temperature drops back to normal. | |
365 | ||
366 | SPU -- a spurious interrupt is some interrupt that was raised then lowered | |
367 | by some IO device before it could be fully processed by the APIC. Hence | |
368 | the APIC sees the interrupt but does not know what device it came from. | |
369 | For this case the APIC will generate the interrupt with an IRQ vector | |
370 | of 0xff. This might also be generated by chipset bugs. | |
371 | ||
372 | RES, CAL, TLB -- rescheduling, call and TLB flush interrupts are | |
373 | sent from one CPU to another per the needs of the OS. Typically, | |
374 | their statistics are used by kernel developers and interested users to | |
375 | determine the occurrence of interrupts of the given type. | |
376 | ||
377 | The above IRQ vectors are displayed only when relevant. For example, | |
378 | the threshold vector does not exist on x86_64 platforms. Others are | |
379 | suppressed when the system is a uniprocessor. As of this writing, only | |
380 | i386 and x86_64 platforms support the new IRQ vector displays. | |
381 | ||
382 | Of some interest is the introduction of the /proc/irq directory to 2.4. | |
1da177e4 LT |
383 | It can be used to set IRQ-to-CPU affinity. This means that you can "hook" an |
384 | IRQ to only one CPU, or exclude a CPU from handling IRQs. The irq subdir | |
18404756 MK |
385 | contains one subdir for each IRQ, and two files: default_smp_affinity and |
386 | prof_cpu_mask. | |
1da177e4 LT |
387 | |
388 | For example | |
389 | > ls /proc/irq/ | |
390 | 0 10 12 14 16 18 2 4 6 8 prof_cpu_mask | |
18404756 | 391 | 1 11 13 15 17 19 3 5 7 9 default_smp_affinity |
1da177e4 LT |
392 | > ls /proc/irq/0/ |
393 | smp_affinity | |
394 | ||
18404756 MK |
395 | smp_affinity is a bitmask, in which you can specify which CPUs can handle the |
396 | IRQ. You can set it by doing: | |
1da177e4 | 397 | |
18404756 MK |
398 | > echo 1 > /proc/irq/10/smp_affinity |
399 | ||
400 | This means that only the first CPU will handle the IRQ, but you can also echo | |
401 | 5 (binary 101), which means that only the first and third CPU can handle the IRQ. | |
1da177e4 | 402 | |
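Since smp_affinity is a hexadecimal bitmask with bit N standing for CPU N
(counting from zero), the mapping between a mask and a list of CPUs is plain
bit arithmetic. The two helper names below are hypothetical and only
illustrate the encoding; they do not touch /proc.

```python
def cpus_in_mask(mask_hex):
    """List the CPU numbers (0-based) whose bits are set in a hex mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def mask_for_cpus(cpus):
    """Build the hex mask string that selects the given 0-based CPU numbers."""
    mask = 0
    for cpu in set(cpus):
        mask |= 1 << cpu
    return format(mask, "x")


print(cpus_in_mask("1"))       # mask written by the echo example above
print(cpus_in_mask("5"))       # binary 101
print(mask_for_cpus([0, 1]))   # first and second CPU
```

With 0-based numbering, the mask 5 selects CPU0 and CPU2, that is, the first
and the third CPU.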
18404756 MK |
403 | The contents of each smp_affinity file are the same by default: |
404 | ||
405 | > cat /proc/irq/0/smp_affinity | |
406 | ffffffff | |
1da177e4 | 407 | |
18404756 MK |
408 | The default_smp_affinity mask applies to all non-active IRQs, which are the |
409 | IRQs which have not yet been allocated/activated, and hence which lack a | |
410 | /proc/irq/[0-9]* directory. | |
1da177e4 | 411 | |
18404756 MK |
412 | prof_cpu_mask specifies which CPUs are to be profiled by the system wide |
413 | profiler. Default value is ffffffff (all cpus). | |
1da177e4 LT |
414 | |
415 | The way IRQs are routed is handled by the IO-APIC, and it's Round Robin | |
416 | between all the CPUs which are allowed to handle it. As usual the kernel has | |
417 | more info than you and does a better job than you, so the defaults are the | |
418 | best choice for almost everyone. | |
419 | ||
420 | There are three more important subdirectories in /proc: net, scsi, and sys. | |
421 | The general rule is that the contents, or even the existence of these | |
422 | directories, depend on your kernel configuration. If SCSI is not enabled, the | |
423 | directory scsi may not exist. The same is true of net, which is there | |
424 | only when networking support is present in the running kernel. | |
425 | ||
426 | The slabinfo file gives information about memory usage at the slab level. | |
427 | Linux uses slab pools for memory management above page level in version 2.2. | |
428 | Commonly used objects have their own slab pool (such as network buffers, | |
429 | directory cache, and so on). | |
430 | ||
431 | .............................................................................. | |
432 | ||
433 | > cat /proc/buddyinfo | |
434 | ||
435 | Node 0, zone DMA 0 4 5 4 4 3 ... | |
436 | Node 0, zone Normal 1 0 0 1 101 8 ... | |
437 | Node 0, zone HighMem 2 0 0 1 1 0 ... | |
438 | ||
439 | Memory fragmentation is a problem under some workloads, and buddyinfo is a | |
440 | useful tool for helping diagnose these problems. Buddyinfo will give you a | |
441 | clue as to how big an area you can safely allocate, or why a previous | |
442 | allocation failed. | |
443 | ||
444 | Each column represents the number of pages of a certain order which are | |
445 | available. In this case, there are 0 chunks of 2^0*PAGE_SIZE available in | |
446 | ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE | |
447 | available in ZONE_NORMAL, etc... | |
448 | ||
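To turn the buddyinfo columns into amounts of memory, each count is multiplied
by 2^order pages. The sketch below does this for the columns shown above (the
output there is truncated, so only the first six orders are used). The
function names are our own, and the 4 kB page size is an assumption.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free chunks of order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        if len(parts) < 5 or parts[0] != "Node":
            continue
        zones[(int(parts[1]), parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_kb(counts, page_kb=4):
    """Total free memory in kB: count * 2^order pages, summed over orders."""
    return sum(n * (2 ** order) * page_kb for order, n in enumerate(counts))


# Only the columns printed in the example above.
SAMPLE = """\
Node 0, zone      DMA      0      4      5      4      4      3
Node 0, zone   Normal      1      0      0      1    101      8
Node 0, zone  HighMem      2      0      0      1      1      0
"""

for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
    print(node, zone, free_kb(counts), "kB in the listed orders")
```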
449 | .............................................................................. | |
450 | ||
451 | meminfo: | |
452 | ||
453 | Provides information about distribution and utilization of memory. This | |
454 | varies by architecture and compile options. The following is from a | |
455 | 16GB PIII, which has highmem enabled. You may not have all of these fields. | |
456 | ||
457 | > cat /proc/meminfo | |
458 | ||
459 | ||
460 | MemTotal: 16344972 kB | |
461 | MemFree: 13634064 kB | |
462 | Buffers: 3656 kB | |
463 | Cached: 1195708 kB | |
464 | SwapCached: 0 kB | |
465 | Active: 891636 kB | |
466 | Inactive: 1077224 kB | |
467 | HighTotal: 15597528 kB | |
468 | HighFree: 13629632 kB | |
469 | LowTotal: 747444 kB | |
470 | LowFree: 4432 kB | |
471 | SwapTotal: 0 kB | |
472 | SwapFree: 0 kB | |
473 | Dirty: 968 kB | |
474 | Writeback: 0 kB | |
b88473f7 | 475 | AnonPages: 861800 kB |
1da177e4 | 476 | Mapped: 280372 kB |
b88473f7 MS |
477 | Slab: 284364 kB |
478 | SReclaimable: 159856 kB | |
479 | SUnreclaim: 124508 kB | |
480 | PageTables: 24448 kB | |
481 | NFS_Unstable: 0 kB | |
482 | Bounce: 0 kB | |
483 | WritebackTmp: 0 kB | |
1da177e4 LT |
484 | CommitLimit: 7669796 kB |
485 | Committed_AS: 100056 kB | |
1da177e4 LT |
486 | VmallocTotal: 112216 kB |
487 | VmallocUsed: 428 kB | |
488 | VmallocChunk: 111088 kB | |
489 | ||
490 | MemTotal: Total usable ram (i.e. physical ram minus a few reserved | |
491 | bits and the kernel binary code) | |
492 | MemFree: The sum of LowFree+HighFree | |
493 | Buffers: Relatively temporary storage for raw disk blocks; | |
494 | shouldn't get tremendously large (20MB or so) | |
495 | Cached: in-memory cache for files read from the disk (the | |
496 | pagecache). Doesn't include SwapCached | |
497 | SwapCached: Memory that once was swapped out, is swapped back in but | |
498 | still also is in the swapfile (if memory is needed it | |
499 | doesn't need to be swapped out AGAIN because it is already | |
500 | in the swapfile. This saves I/O) | |
501 | Active: Memory that has been used more recently and usually not | |
502 | reclaimed unless absolutely necessary. | |
503 | Inactive: Memory which has been less recently used. It is more | |
504 | eligible to be reclaimed for other purposes | |
505 | HighTotal: | |
506 | HighFree: Highmem is all memory above ~860MB of physical memory | |
507 | Highmem areas are for use by userspace programs, or | |
508 | for the pagecache. The kernel must use tricks to access | |
509 | this memory, making it slower to access than lowmem. | |
510 | LowTotal: | |
511 | LowFree: Lowmem is memory which can be used for everything that | |
3f6dee9b | 512 | highmem can be used for, but it is also available for the |
1da177e4 LT |
513 | kernel's use for its own data structures. Among many |
514 | other things, it is where everything from the Slab is | |
515 | allocated. Bad things happen when you're out of lowmem. | |
516 | SwapTotal: total amount of swap space available | |
517 | SwapFree: amount of swap space which is currently unused; the rest holds | |
518 | memory which has been evicted from RAM and is temporarily on the disk | |
519 | Dirty: Memory which is waiting to get written back to the disk | |
520 | Writeback: Memory which is actively being written back to the disk | |
b88473f7 | 521 | AnonPages: Non-file backed pages mapped into userspace page tables |
1da177e4 | 522 | Mapped: files which have been mmaped, such as libraries |
e82443c0 | 523 | Slab: in-kernel data structures cache |
b88473f7 MS |
524 | SReclaimable: Part of Slab, that might be reclaimed, such as caches |
525 | SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure | |
526 | PageTables: amount of memory dedicated to the lowest level of page | |
527 | tables. | |
528 | NFS_Unstable: NFS pages sent to the server, but not yet committed to stable | |
529 | storage | |
530 | Bounce: Memory used for block device "bounce buffers" | |
531 | WritebackTmp: Memory used by FUSE for temporary writeback buffers | |
1da177e4 LT |
532 | CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), |
533 | this is the total amount of memory currently available to | |
534 | be allocated on the system. This limit is only adhered to | |
535 | if strict overcommit accounting is enabled (mode 2 in | |
536 | 'vm.overcommit_memory'). | |
537 | The CommitLimit is calculated with the following formula: | |
538 | CommitLimit = ('vm.overcommit_ratio' / 100 * Physical RAM) + Swap | |
539 | For example, on a system with 1G of physical RAM and 7G | |
540 | of swap with a `vm.overcommit_ratio` of 30 it would | |
541 | yield a CommitLimit of 7.3G. | |
542 | For more details, see the memory overcommit documentation | |
543 | in vm/overcommit-accounting. | |
544 | Committed_AS: The amount of memory presently allocated on the system. | |
545 | The committed memory is a sum of all of the memory which | |
546 | has been allocated by processes, even if it has not been | |
547 | "used" by them as of yet. A process which malloc()'s 1G | |
548 | of memory, but only touches 300M of it will only show up | |
549 | as using 300M of memory even if it has the address space | |
550 | allocated for the entire 1G. This 1G is memory which has | |
551 | been "committed" to by the VM and can be used at any time | |
552 | by the allocating application. With strict overcommit | |
553 | enabled on the system (mode 2 in 'vm.overcommit_memory'), | |
554 | allocations which would exceed the CommitLimit (detailed | |
555 | above) will not be permitted. This is useful if one needs | |
556 | to guarantee that processes will not fail due to lack of | |
557 | memory once that memory has been successfully allocated. | |
1da177e4 LT |
558 | VmallocTotal: total size of vmalloc memory area |
559 | VmallocUsed: amount of vmalloc area which is used | |
560 | VmallocChunk: largest contiguous block of vmalloc area which is free | |
561 | ||
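Some of the relations described above can be checked directly against the
example output. The sketch below parses a few meminfo lines and evaluates the
CommitLimit formula as given in the text, treating vm.overcommit_ratio as a
percentage. The function names are hypothetical, and the formula is only the
one quoted above, not necessarily everything the kernel takes into account.

```python
def parse_meminfo(text):
    """Return {field: value in kB} for 'Name:   value kB' lines."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """CommitLimit per the formula in the text; the ratio is a percentage."""
    return ram_kb * overcommit_ratio // 100 + swap_kb


# Lines copied from the example output above.
SAMPLE = """\
MemTotal:     16344972 kB
MemFree:      13634064 kB
HighTotal:    15597528 kB
HighFree:     13629632 kB
LowTotal:       747444 kB
LowFree:          4432 kB
"""

info = parse_meminfo(SAMPLE)
print(info["LowFree"] + info["HighFree"] == info["MemFree"])

GB = 1024 * 1024  # one gigabyte expressed in kB
print(round(commit_limit_kb(1 * GB, 7 * GB, 30) / GB, 1), "G")
```

The second print reproduces the worked example from the CommitLimit
description: 1G of RAM, 7G of swap and a ratio of 30.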
a47a126a ED |
562 | .............................................................................. |
563 | ||
564 | vmallocinfo: | |
565 | ||
566 | Provides information about vmalloced/vmapped areas. One line per area, | |
567 | containing the virtual address range of the area, size in bytes, | |
568 | caller information of the creator, and optional information depending | |
569 | on the kind of area: | |
570 | ||
571 | pages=nr number of pages | |
572 | phys=addr if a physical address was specified | |
573 | ioremap I/O mapping (ioremap() and friends) | |
574 | vmalloc vmalloc() area | |
575 | vmap vmap()ed pages | |
576 | user VM_USERMAP area | |
577 | vpages buffer for pages pointers was vmalloced (huge area) | |
578 | N<node>=nr (Only on NUMA kernels) | |
579 | Number of pages allocated on memory node <node> | |
580 | ||
581 | > cat /proc/vmallocinfo | |
582 | 0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ... | |
583 | /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128 | |
584 | 0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ... | |
585 | /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64 | |
586 | 0xffffc20000302000-0xffffc20000304000 8192 acpi_tb_verify_table+0x21/0x4f... | |
587 | phys=7fee8000 ioremap | |
588 | 0xffffc20000304000-0xffffc20000307000 12288 acpi_tb_verify_table+0x21/0x4f... | |
589 | phys=7fee7000 ioremap | |
590 | 0xffffc2000031d000-0xffffc2000031f000 8192 init_vdso_vars+0x112/0x210 | |
591 | 0xffffc2000031f000-0xffffc2000032b000 49152 cramfs_uncompress_init+0x2e ... | |
592 | /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3 | |
593 | 0xffffc2000033a000-0xffffc2000033d000 12288 sys_swapon+0x640/0xac0 ... | |
594 | pages=2 vmalloc N1=2 | |
595 | 0xffffc20000347000-0xffffc2000034c000 20480 xt_alloc_table_info+0xfe ... | |
596 | /0x130 [x_tables] pages=4 vmalloc N0=4 | |
597 | 0xffffffffa0000000-0xffffffffa000f000 61440 sys_init_module+0xc27/0x1d00 ... | |
598 | pages=14 vmalloc N2=14 | |
599 | 0xffffffffa000f000-0xffffffffa0014000 20480 sys_init_module+0xc27/0x1d00 ... | |
600 | pages=4 vmalloc N1=4 | |
601 | 0xffffffffa0014000-0xffffffffa0017000 12288 sys_init_module+0xc27/0x1d00 ... | |
602 | pages=2 vmalloc N1=2 | |
603 | 0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 ... | |
604 | pages=10 vmalloc N0=10 | |
1da177e4 LT |
605 | |
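Each vmallocinfo entry can be split into the address range, the size, the
caller and the optional items listed above. The following is only a sketch:
the function name is our own, and SAMPLE is the sys_swapon entry from the
example, joined back into one line (the listing above wraps long entries).

```python
KINDS = {"ioremap", "vmalloc", "vmap", "user", "vpages"}


def parse_vmallocinfo_line(line):
    """Split one vmallocinfo entry into range, size, caller and options."""
    tokens = line.split()
    start, end = (int(addr, 16) for addr in tokens[0].split("-"))
    entry = {"start": start, "end": end, "size": int(tokens[1]),
             "caller": None, "kinds": [], "opts": {}}
    for token in tokens[2:]:
        if "=" in token:                 # pages=nr, phys=addr, N<node>=nr
            key, value = token.split("=", 1)
            entry["opts"][key] = value
        elif token in KINDS:             # ioremap, vmalloc, vmap, ...
            entry["kinds"].append(token)
        elif entry["caller"] is None:    # first remaining token is the caller
            entry["caller"] = token
    return entry


SAMPLE = ("0xffffc2000033a000-0xffffc2000033d000 12288 "
          "sys_swapon+0x640/0xac0 pages=2 vmalloc N1=2")

entry = parse_vmallocinfo_line(SAMPLE)
print(entry["caller"], entry["size"], entry["kinds"], entry["opts"])
```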
606 | 1.3 IDE devices in /proc/ide | |
607 | ---------------------------- | |
608 | ||
609 | The subdirectory /proc/ide contains information about all IDE devices of which | |
610 | the kernel is aware. There is one subdirectory for each IDE controller, the | |
611 | file drivers and a link for each IDE device, pointing to the device directory | |
612 | in the controller specific subtree. | |
613 | ||
614 | The file drivers contains general information about the drivers used for the | |
615 | IDE devices: | |
616 | ||
617 | > cat /proc/ide/drivers | |
618 | ide-cdrom version 4.53 | |
619 | ide-disk version 1.08 | |
620 | ||
621 | More detailed information can be found in the controller specific | |
622 | subdirectories. These are named ide0, ide1 and so on. Each of these | |
18d96779 | 623 | directories contains the files shown in table 1-5. |
1da177e4 LT |
624 | |
625 | ||
18d96779 | 626 | Table 1-5: IDE controller info in /proc/ide/ide? |
1da177e4 LT |
627 | .............................................................................. |
628 | File Content | |
629 | channel IDE channel (0 or 1) | |
630 | config Configuration (only for PCI/IDE bridge) | |
631 | mate Mate name | |
632 | model Type/Chipset of IDE controller | |
633 | .............................................................................. | |
634 | ||
635 | Each device connected to a controller has a separate subdirectory in the | |
18d96779 | 636 | controller's directory. The files listed in table 1-6 are contained in these |
1da177e4 LT |
637 | directories. |
638 | ||
639 | ||
18d96779 | 640 | Table 1-6: IDE device information |
1da177e4 LT |
641 | .............................................................................. |
642 | File Content | |
643 | cache Size of the drive's cache | |
644 | capacity Capacity of the medium (in 512-byte blocks) | |
645 | driver driver and version | |
646 | geometry physical and logical geometry | |
647 | identify device identify block | |
648 | media media type | |
649 | model device identifier | |
650 | settings device setup | |
651 | smart_thresholds IDE disk management thresholds | |
652 | smart_values IDE disk management values | |
653 | .............................................................................. | |
654 | ||
655 | The most interesting file is settings. This file contains a nice overview of | |
656 | the drive parameters: | |
657 | ||
658 | # cat /proc/ide/ide0/hda/settings | |
659 | name value min max mode | |
660 | ---- ----- --- --- ---- | |
661 | bios_cyl 526 0 65535 rw | |
662 | bios_head 255 0 255 rw | |
663 | bios_sect 63 0 63 rw | |
664 | breada_readahead 4 0 127 rw | |
665 | bswap 0 0 1 r | |
666 | file_readahead 72 0 2097151 rw | |
667 | io_32bit 0 0 3 rw | |
668 | keepsettings 0 0 1 rw | |
669 | max_kb_per_request 122 1 127 rw | |
670 | multcount 0 0 8 rw | |
671 | nice1 1 0 1 rw | |
672 | nowerr 0 0 1 rw | |
673 | pio_mode write-only 0 255 w | |
674 | slow 0 0 1 rw | |
675 | unmaskirq 0 0 1 rw | |
676 | using_dma 0 0 1 rw | |
677 | ||
678 | ||
679 | 1.4 Networking info in /proc/net | |
680 | -------------------------------- | |
681 | ||
682 | The subdirectory /proc/net follows the usual pattern. Table 1-6 shows the | |
683 | additional values you get for IP version 6 if you configure the kernel to | |
684 | support this. Table 1-7 lists the files and their meaning. | |
685 | ||
686 | ||
687 | Table 1-6: IPv6 info in /proc/net | |
688 | .............................................................................. | |
689 | File Content | |
690 | udp6 UDP sockets (IPv6) | |
691 | tcp6 TCP sockets (IPv6) | |
692 | raw6 Raw device statistics (IPv6) | |
693 | igmp6 IP multicast addresses, which this host joined (IPv6) | |
694 | if_inet6 List of IPv6 interface addresses | |
695 | ipv6_route Kernel routing table for IPv6 | |
696 | rt6_stats Global IPv6 routing tables statistics | |
697 | sockstat6 Socket statistics (IPv6) | |
698 | snmp6 Snmp data (IPv6) | |
699 | .............................................................................. | |
700 | ||
701 | ||
702 | Table 1-7: Network info in /proc/net | |
703 | .............................................................................. | |
704 | File Content | |
705 | arp Kernel ARP table | |
706 | dev network devices with statistics | |
707 | dev_mcast the Layer2 multicast groups a device is listening to | |
708 | (interface index, label, number of references, number of bound | |
709 | addresses). | |
710 | dev_stat network device status | |
711 | ip_fwchains Firewall chain linkage | |
712 | ip_fwnames Firewall chain names | |
713 | ip_masq Directory containing the masquerading tables | |
714 | ip_masquerade Major masquerading table | |
715 | netstat Network statistics | |
716 | raw raw device statistics | |
717 | route Kernel routing table | |
718 | rpc Directory containing rpc info | |
719 | rt_cache Routing cache | |
720 | snmp SNMP data | |
721 | sockstat Socket statistics | |
722 | tcp TCP sockets | |
723 | tr_rif Token ring RIF routing table | |
724 | udp UDP sockets | |
725 | unix UNIX domain sockets | |
726 | wireless Wireless interface data (Wavelan etc) | |
727 | igmp IP multicast addresses, which this host joined | |
728 | psched Global packet scheduler parameters. | |
729 | netlink List of PF_NETLINK sockets | |
730 | ip_mr_vifs List of multicast virtual interfaces | |
731 | ip_mr_cache List of multicast routing cache | |
732 | .............................................................................. | |
733 | ||
734 | You can use this information to see which network devices are available in | |
735 | your system and how much traffic was routed over those devices: | |
736 | ||
737 | > cat /proc/net/dev | |
738 | Inter-|Receive |[... | |
739 | face |bytes packets errs drop fifo frame compressed multicast|[... | |
740 | lo: 908188 5596 0 0 0 0 0 0 [... | |
741 | ppp0:15475140 20721 410 0 0 410 0 0 [... | |
742 | eth0: 614530 7085 0 0 0 0 0 1 [... | |
743 | ||
744 | ...] Transmit | |
745 | ...] bytes packets errs drop fifo colls carrier compressed | |
746 | ...] 908188 5596 0 0 0 0 0 0 | |
747 | ...] 1375103 17405 0 0 0 0 0 0 | |
748 | ...] 1703981 5535 0 0 0 3 0 0 | |
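As a rough illustration of how the /proc/net/dev layout above can be consumed, the following Python sketch splits the second header line on "|" to learn the receive and transmit column names, then maps each interface's counters onto them. The function name parse_net_dev and the SAMPLE text (assembled from the figures shown above, without the line wrapping) are assumptions made for this example, not part of the kernel interface.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {iface: {"rx": {...}, "tx": {...}}}."""
    lines = text.strip().splitlines()
    # The second header line names the columns: "face |<rx cols>|<tx cols>".
    _, rx_part, tx_part = lines[1].split("|")
    rx_names, tx_names = rx_part.split(), tx_part.split()
    stats = {}
    for line in lines[2:]:
        name, _, values = line.partition(":")
        numbers = [int(v) for v in values.split()]
        stats[name.strip()] = {
            "rx": dict(zip(rx_names, numbers[:len(rx_names)])),
            "tx": dict(zip(tx_names, numbers[len(rx_names):])),
        }
    return stats


# Illustrative sample built from the values in the listing above.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  908188    5596    0    0    0     0          0         0   908188    5596    0    0    0     0       0          0
  eth0:  614530    7085    0    0    0     0          0         1  1703981    5535    0    0    0     3       0          0
"""

if __name__ == "__main__":
    for iface, counters in parse_net_dev(SAMPLE).items():
        print(iface, counters["rx"]["bytes"], counters["tx"]["bytes"])
```

On a live system the same function could be fed the contents of /proc/net/dev directly. Reading the column names from the header rather than hard-coding them keeps the sketch tolerant of kernels that report a different set of counters.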
749 | ||
750 | In addition, each Channel Bond interface has its own directory. For | |
751 | example, the bond0 device will have a directory called /proc/net/bond0/. | |
752 | It will contain information that is specific to that bond, such as the | |
753 | current slaves of the bond, the link status of the slaves, and how | |
754 | many times each slave's link has failed. | |
755 | ||
756 | 1.5 SCSI info | |
757 | ------------- | |
758 | ||
759 | If you have a SCSI host adapter in your system, you'll find a subdirectory | |
760 | named after the driver for this adapter in /proc/scsi. You'll also see a list | |
761 | of all recognized SCSI devices in /proc/scsi: | |
762 | ||
763 | > cat /proc/scsi/scsi | |
764 | Attached devices: | |
765 | Host: scsi0 Channel: 00 Id: 00 Lun: 00 | |
766 | Vendor: IBM Model: DGHS09U Rev: 03E0 | |
767 | Type: Direct-Access ANSI SCSI revision: 03 | |
768 | Host: scsi0 Channel: 00 Id: 06 Lun: 00 | |
769 | Vendor: PIONEER Model: CD-ROM DR-U06S Rev: 1.04 | |
770 | Type: CD-ROM ANSI SCSI revision: 02 | |
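The three-line records in /proc/scsi/scsi lend themselves to a regular expression. The sketch below is one possible way to turn the listing above into a list of dictionaries; the name parse_scsi, the field names, and the column spacing of SAMPLE are assumptions for illustration only.

```python
import re

# One record spans three lines: Host/Channel/Id/Lun, Vendor/Model/Rev, Type/ANSI.
DEVICE_RE = re.compile(
    r"Host:\s*(?P<host>\S+)\s+Channel:\s*(?P<channel>\d+)\s+"
    r"Id:\s*(?P<id>\d+)\s+Lun:\s*(?P<lun>\d+)\s+"
    r"Vendor:\s*(?P<vendor>.*?)\s+Model:\s*(?P<model>.*?)\s+Rev:\s*(?P<rev>\S+)\s+"
    r"Type:\s*(?P<type>.*?)\s+ANSI\s+SCSI revision:\s*(?P<ansi>\S+)"
)


def parse_scsi(text):
    """Return one dict of string fields per attached device."""
    return [m.groupdict() for m in DEVICE_RE.finditer(text)]


# Illustrative sample based on the listing above.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: DGHS09U          Rev: 03E0
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: PIONEER  Model: CD-ROM DR-U06S   Rev: 1.04
  Type:   CD-ROM                           ANSI SCSI revision: 02
"""

if __name__ == "__main__":
    for dev in parse_scsi(SAMPLE):
        print(dev["host"], dev["id"], dev["vendor"], dev["model"], dev["type"])
```

The lazy `.*?` groups let model names that contain spaces (such as "CD-ROM DR-U06S") be captured whole, since matching only stops at the next field label.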
771 | ||
772 | ||
773 | The directory named after the driver has one file for each adapter found in | |
774 | the system. These files contain information about the controller, including | |
775 | the IRQ in use and the I/O address range. The amount of information shown is | |
776 | dependent on the adapter you use. The example shows the output for an Adaptec | |
777 | AHA-2940 SCSI adapter: | |
778 | ||
779 | > cat /proc/scsi/aic7xxx/0 | |
780 | ||
781 | Adaptec AIC7xxx driver version: 5.1.19/3.2.4 | |
782 | Compile Options: | |
783 | TCQ Enabled By Default : Disabled | |
784 | AIC7XXX_PROC_STATS : Disabled | |
785 | AIC7XXX_RESET_DELAY : 5 | |
786 | Adapter Configuration: | |
787 | SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter | |
788 | Ultra Wide Controller | |
789 | PCI MMAPed I/O Base: 0xeb001000 | |
790 | Adapter SEEPROM Config: SEEPROM found and used. | |
791 | Adaptec SCSI BIOS: Enabled | |
792 | IRQ: 10 | |
793 | SCBs: Active 0, Max Active 2, | |
794 | Allocated 15, HW 16, Page 255 | |
795 | Interrupts: 160328 | |
796 | BIOS Control Word: 0x18b6 | |
797 | Adapter Control Word: 0x005b | |
798 | Extended Translation: Enabled | |
799 | Disconnect Enable Flags: 0xffff | |
800 | Ultra Enable Flags: 0x0001 | |
801 | Tag Queue Enable Flags: 0x0000 | |
802 | Ordered Queue Tag Flags: 0x0000 | |
803 | Default Tag Queue Depth: 8 | |
804 | Tagged Queue By Device array for aic7xxx host instance 0: | |
805 | {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255} | |
806 | Actual queue depth per device for aic7xxx host instance 0: | |
807 | {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1} | |
808 | Statistics: | |
809 | (scsi0:0:0:0) | |
810 | Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8 | |
811 | Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0) | |
812 | Total transfers 160151 (74577 reads and 85574 writes) | |
813 | (scsi0:0:6:0) | |
814 | Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15 | |
815 | Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0) | |
816 | Total transfers 0 (0 reads and 0 writes) | |
817 | ||
818 | ||
819 | 1.6 Parallel port info in /proc/parport | |
820 | --------------------------------------- | |
821 | ||
822 | The directory /proc/parport contains information about the parallel ports of | |
823 | your system. It has one subdirectory for each port, named after the port | |
824 | number (0,1,2,...). | |
825 | ||
826 | These directories contain the four files shown in Table 1-8. | |
827 | ||
828 | ||
829 | Table 1-8: Files in /proc/parport | |
830 | .............................................................................. | |
831 | File Content | |
832 | autoprobe Any IEEE-1284 device ID information that has been acquired. | |
833 | devices list of the device drivers using that port. A + will appear by the | |
834 | name of the device currently using the port (it might not appear | |
835 | against any). | |
836 | hardware Parallel port's base address, IRQ line and DMA channel. | |
837 | irq IRQ that parport is using for that port. This is in a separate | |
838 | file to allow you to alter it by writing a new value in (IRQ | |
839 | number or none). | |
840 | .............................................................................. | |
841 | ||
842 | 1.7 TTY info in /proc/tty | |
843 | ------------------------- | |
844 | ||
845 | Information about the available and actually used ttys can be found in the | |
846 | directory /proc/tty. You'll find entries for drivers and line disciplines in | |
847 | this directory, as shown in Table 1-9. | |
848 | ||
849 | ||
850 | Table 1-9: Files in /proc/tty | |
851 | .............................................................................. | |
852 | File Content | |
853 | drivers list of drivers and their usage | |
854 | ldiscs registered line disciplines | |
855 | driver/serial usage statistics and status of single tty lines | |
856 | .............................................................................. | |
857 | ||
858 | To see which ttys are currently in use, you can simply look into the file | |
859 | /proc/tty/drivers: | |
860 | ||
861 | > cat /proc/tty/drivers | |
862 | pty_slave /dev/pts 136 0-255 pty:slave | |
863 | pty_master /dev/ptm 128 0-255 pty:master | |
864 | pty_slave /dev/ttyp 3 0-255 pty:slave | |
865 | pty_master /dev/pty 2 0-255 pty:master | |
866 | serial /dev/cua 5 64-67 serial:callout | |
867 | serial /dev/ttyS 4 64-67 serial | |
868 | /dev/tty0 /dev/tty0 4 0 system:vtmaster | |
869 | /dev/ptmx /dev/ptmx 5 2 system | |
870 | /dev/console /dev/console 5 1 system:console | |
871 | /dev/tty /dev/tty 5 0 system:/dev/tty | |
872 | unknown /dev/tty 4 1-63 console | |
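Each line of /proc/tty/drivers in the listing above has five whitespace-separated columns: driver name, device path, major number, minor number or range, and type. A small Python sketch of splitting them follows; parse_tty_drivers and the dictionary keys are hypothetical names chosen for this example.

```python
def parse_tty_drivers(text):
    """Split /proc/tty/drivers-style lines into dictionaries."""
    drivers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        name, device, major, minors, kind = fields
        # The minor column is either a single number ("1") or a range ("0-255").
        low, _, high = minors.partition("-")
        drivers.append({
            "name": name,
            "device": device,
            "major": int(major),
            "minors": (int(low), int(high or low)),
            "type": kind,
        })
    return drivers


# Three lines taken from the listing above.
SAMPLE = """\
pty_slave            /dev/pts      136   0-255 pty:slave
/dev/console         /dev/console    5       1 system:console
serial               /dev/ttyS       4   64-67 serial
"""

if __name__ == "__main__":
    for drv in parse_tty_drivers(SAMPLE):
        print(drv["device"], drv["major"], drv["minors"], drv["type"])
```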
873 | ||
874 | ||
875 | 1.8 Miscellaneous kernel statistics in /proc/stat | |
876 | ------------------------------------------------- | |
877 | ||
878 | Various pieces of information about kernel activity are available in the | |
879 | /proc/stat file. All of the numbers reported in this file are aggregates | |
880 | since the system first booted. For a quick look, simply cat the file: | |
881 | ||
882 | > cat /proc/stat | |
b68f2c3a LC |
883 | cpu 2255 34 2290 22625563 6290 127 456 0 |
884 | cpu0 1132 34 1441 11311718 3675 127 438 0 | |
885 | cpu1 1123 0 849 11313845 2614 0 18 0 | |
1da177e4 LT |
886 | intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...] |
887 | ctxt 1990473 | |
888 | btime 1062191376 | |
889 | processes 2915 | |
890 | procs_running 1 | |
891 | procs_blocked 0 | |
892 | ||
893 | The very first "cpu" line aggregates the numbers in all of the other "cpuN" | |
894 | lines. These numbers identify the amount of time the CPU has spent performing | |
895 | different kinds of work. Time units are in USER_HZ (typically hundredths of a | |
896 | second). The meanings of the columns are as follows, from left to right: | |
897 | ||
898 | - user: normal processes executing in user mode | |
899 | - nice: niced processes executing in user mode | |
900 | - system: processes executing in kernel mode | |
901 | - idle: twiddling thumbs | |
902 | - iowait: waiting for I/O to complete | |
903 | - irq: servicing interrupts | |
904 | - softirq: servicing softirqs | |
b68f2c3a | 905 | - steal: involuntary wait |
1da177e4 LT |
906 | |
907 | The "intr" line gives counts of interrupts serviced since boot time, for each | |
908 | of the possible system interrupts. The first column is the total of all | |
909 | interrupts serviced; each subsequent column is the total for that particular | |
910 | interrupt. | |
911 | ||
912 | The "ctxt" line gives the total number of context switches across all CPUs. | |
913 | ||
914 | The "btime" line gives the time at which the system booted, in seconds since | |
915 | the Unix epoch. | |
916 | ||
917 | The "processes" line gives the number of processes and threads created, which | |
918 | includes (but is not limited to) those created by calls to the fork() and | |
919 | clone() system calls. | |
920 | ||
921 | The "procs_running" line gives the number of processes currently running on | |
922 | CPUs. | |
923 | ||
924 | The "procs_blocked" line gives the number of processes currently blocked, | |
925 | waiting for I/O to complete. | |
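The fields described above can be read programmatically. The following Python sketch, which is only one way to do it, maps the eight "cpu" columns onto the names listed earlier, collects the single-valued lines, and derives an idle fraction and a boot timestamp. The helper names parse_stat, idle_fraction and boot_time are assumptions for this example, and the sample omits most of the "intr" line.

```python
from datetime import datetime, timezone

CPU_FIELDS = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")
SCALAR_KEYS = ("ctxt", "btime", "processes", "procs_running", "procs_blocked")


def parse_stat(text):
    """Return (cpus, scalars) parsed from /proc/stat-style text."""
    cpus, scalars = {}, {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        key, values = parts[0], parts[1:]
        if key.startswith("cpu"):
            cpus[key] = dict(zip(CPU_FIELDS, map(int, values)))
        elif key in SCALAR_KEYS:
            scalars[key] = int(values[0])
    return cpus, scalars


def idle_fraction(times):
    """Share of all accounted time spent idle since boot."""
    total = sum(times.values())
    return times["idle"] / total if total else 0.0


def boot_time(scalars):
    """Convert btime (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(scalars["btime"], tz=timezone.utc)


# Sample taken from the listing above.
SAMPLE = """\
cpu  2255 34 2290 22625563 6290 127 456 0
cpu0 1132 34 1441 11311718 3675 127 438 0
cpu1 1123 0 849 11313845 2614 0 18 0
intr 114930548 113199788 3 0 5 263 0 4
ctxt 1990473
btime 1062191376
processes 2915
procs_running 1
procs_blocked 0
"""

if __name__ == "__main__":
    cpus, scalars = parse_stat(SAMPLE)
    print(f"idle: {idle_fraction(cpus['cpu']):.4f}")
    print("booted:", boot_time(scalars).isoformat())
```

Because the counters are totals since boot, a monitoring tool would normally sample the file twice and work with the differences; the idle fraction computed here is the long-run average only.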
926 | ||
37515fac | 927 | |
c9de560d AT |
928 | 1.9 Ext4 file system parameters |
929 | ------------------------------- | |
37515fac TT |
930 | |
931 | Information about mounted ext4 file systems can be found in | |
932 | /proc/fs/ext4. Each mounted filesystem will have a directory in | |
933 | /proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or | |
934 | /proc/fs/ext4/dm-0). The files in each per-device directory are shown | |
935 | in Table 1-10, below. | |
936 | ||
937 | Table 1-10: Files in /proc/fs/ext4/<devname> | |
938 | .............................................................................. | |
939 | File Content | |
940 | mb_groups details of multiblock allocator buddy cache of free blocks | |
941 | mb_history multiblock allocation history | |
942 | stats controls whether the multiblock allocator should start | |
943 | collecting statistics, which are shown during the unmount | |
944 | group_prealloc the multiblock allocator will round up allocation | |
945 | requests to a multiple of this tuning parameter if the | |
946 | stripe size is not set in the ext4 superblock | |
947 | max_to_scan The maximum number of extents the multiblock allocator | |
948 | will search to find the best extent | |
949 | min_to_scan The minimum number of extents the multiblock allocator | |
950 | will search to find the best extent | |
951 | order2_req Tuning parameter which controls the minimum size for | |
952 | requests (as a power of 2) where the buddy cache is | |
953 | used | |
954 | stream_req Files which have fewer blocks than this tunable | |
955 | parameter will have their blocks allocated out of a | |
956 | block group specific preallocation pool, so that small | |
957 | files are packed closely together. Each large file | |
958 | will have its blocks allocated out of its own unique | |
959 | preallocation pool. | |
240799cd TT |
960 | inode_readahead Tuning parameter which controls the maximum number of |
961 | inode table blocks that ext4's inode table readahead | |
962 | algorithm will pre-read into the buffer cache | |
37515fac TT |
963 | .............................................................................. |
964 | ||
1da177e4 LT |
965 | |
966 | ------------------------------------------------------------------------------ | |
967 | Summary | |
968 | ------------------------------------------------------------------------------ | |
969 | The /proc file system serves information about the running system. It not only | |
970 | allows access to process data but also allows you to request the kernel status | |
971 | by reading files in the hierarchy. | |
972 | ||
973 | The directory structure of /proc reflects the types of information it holds, | |
974 | which makes it easy to work out where to look for specific data. | |
975 | ------------------------------------------------------------------------------ | |
976 | ||
977 | ------------------------------------------------------------------------------ | |
978 | CHAPTER 2: MODIFYING SYSTEM PARAMETERS | |
979 | ------------------------------------------------------------------------------ | |
980 | ||
981 | ------------------------------------------------------------------------------ | |
982 | In This Chapter | |
983 | ------------------------------------------------------------------------------ | |
984 | * Modifying kernel parameters by writing into files found in /proc/sys | |
985 | * Exploring the files which modify certain parameters | |
986 | * Review of the /proc/sys file tree | |
987 | ------------------------------------------------------------------------------ | |
988 | ||
989 | ||
990 | A very interesting part of /proc is the directory /proc/sys. This is not only | |
991 | a source of information, it also allows you to change parameters within the | |
992 | kernel. Be very careful when attempting this. You can optimize your system, | |
993 | but you can also cause it to crash. Never alter kernel parameters on a | |
994 | production system. Set up a development machine and test to make sure that | |
995 | everything works the way you want it to. You may have no alternative but to | |
996 | reboot the machine once an error has been made. | |
997 | ||
998 | To change a value, simply echo the new value into the file. An example is | |
999 | given below in the section on the file system data. You need to be root to do | |
1000 | this. You can create your own boot script to perform this every time your | |
1001 | system boots. | |
1002 | ||
1003 | The files in /proc/sys can be used to fine tune and monitor miscellaneous and | |
1004 | general things in the operation of the Linux kernel. Since some of the files | |
1005 | can inadvertently disrupt your system, it is advisable to read both | |
1006 | documentation and source before actually making adjustments. In any case, be | |
1007 | very careful when writing to any of these files. The entries in /proc may | |
1008 | change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt | |
1009 | review the kernel documentation in the directory /usr/src/linux/Documentation. | |
1010 | This chapter is heavily based on the documentation included in the pre 2.2 | |
1011 | kernels, and became part of it in version 2.2.1 of the Linux kernel. | |
1012 | ||
1013 | 2.1 /proc/sys/fs - File system data | |
1014 | ----------------------------------- | |
1015 | ||
1016 | This subdirectory contains specific file system, file handle, inode, dentry | |
1017 | and quota information. | |
1018 | ||
1019 | Currently, these files are in /proc/sys/fs: | |
1020 | ||
1021 | dentry-state | |
1022 | ------------ | |
1023 | ||
1024 | Status of the directory cache. Since directory entries are dynamically | |
1025 | allocated and deallocated, this file indicates the current status. It holds | |
1026 | six values, of which the last two are not used and are always zero. The others | |
1027 | are listed in table 2-1. | |
1028 | ||
1029 | ||
1030 | Table 2-1: Status files of the directory cache | |
1031 | .............................................................................. | |
1032 | File Content | |
1033 | nr_dentry  Almost always zero | |
1034 | nr_unused  Number of unused cache entries | |
1035 | age_limit  Age in seconds after which the entry may be reclaimed, when | |
1036 |            memory is short | |
1037 | want_pages Used internally | |
1038 | .............................................................................. | |
1039 | ||
1040 | dquot-nr and dquot-max | |
1041 | ---------------------- | |
1042 | ||
1043 | The file dquot-max shows the maximum number of cached disk quota entries. | |
1044 | ||
1045 | The file dquot-nr shows the number of allocated disk quota entries and the | |
1046 | number of free disk quota entries. | |
1047 | ||
1048 | If the number of available cached disk quotas is very low and you have a large | |
1049 | number of simultaneous system users, you might want to raise the limit. | |
1050 | ||
1051 | file-nr and file-max | |
1052 | -------------------- | |
1053 | ||
1054 | The kernel allocates file handles dynamically, but doesn't free them again at | |
1055 | this time. | |
1056 | ||
1057 | The value in file-max denotes the maximum number of file handles that the | |
1058 | Linux kernel will allocate. When you get a lot of error messages about running | |
1059 | out of file handles, you might want to raise this limit. The default value is | |
1060 | 10% of RAM in kilobytes. To change it, just write the new number into the | |
1061 | file: | |
1062 | ||
1063 | # cat /proc/sys/fs/file-max | |
1064 | 4096 | |
1065 | # echo 8192 > /proc/sys/fs/file-max | |
1066 | # cat /proc/sys/fs/file-max | |
1067 | 8192 | |
1068 | ||
1069 | ||
1070 | This method of revision is useful for all customizable parameters of the | |
1071 | kernel - simply echo the new value to the corresponding file. | |
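The same read-then-write pattern can be scripted. The Python sketch below is a minimal illustration, assuming parameters are addressed by their path relative to /proc/sys (for example "fs/file-max"); read_param, write_param and the root argument are names invented for this example. Writing requires root, just as with echo.

```python
import os


def read_param(name, root="/proc/sys"):
    """Return the current value of a parameter, e.g. name="fs/file-max"."""
    with open(os.path.join(root, name)) as f:
        return f.read().strip()


def write_param(name, value, root="/proc/sys"):
    """Write a new value, equivalent to: echo VALUE > /proc/sys/NAME."""
    with open(os.path.join(root, name), "w") as f:
        f.write(f"{value}\n")


# Example use (as root):
#   old = read_param("fs/file-max")
#   write_param("fs/file-max", 8192)
```

The root argument exists only so the helpers can be pointed at a scratch directory while experimenting, instead of at the live /proc/sys tree.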
1072 | ||
1073 | Historically, the three values in file-nr denoted the number of allocated file | |
1074 | handles, the number of allocated but unused file handles, and the maximum | |
1075 | number of file handles. Linux 2.6 always reports 0 as the number of free file | |
1076 | handles -- this is not an error, it just means that the number of allocated | |
1077 | file handles exactly matches the number of used file handles. | |
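To make the three file-nr values concrete, here is a short Python sketch that names them and derives the number of handles in use. The FileNr tuple, the helper names, and the sample numbers are assumptions for illustration, not values from a real system.

```python
from collections import namedtuple

# Order follows the description above: allocated, allocated-but-unused, maximum.
FileNr = namedtuple("FileNr", "allocated unused maximum")


def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr."""
    return FileNr(*(int(v) for v in text.split()))


def handles_in_use(nr):
    """Allocated handles minus those that are allocated but unused."""
    return nr.allocated - nr.unused


if __name__ == "__main__":
    nr = parse_file_nr("1344\t0\t8192\n")  # made-up sample values
    print(handles_in_use(nr), "of", nr.maximum, "file handles in use")
```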
1078 | ||
1079 | Attempts to allocate more file descriptors than file-max are reported with | |
1080 | printk; look for "VFS: file-max limit <number> reached". | |
1081 | ||
1082 | inode-state and inode-nr | |
1083 | ------------------------ | |
1084 | ||
1085 | The file inode-nr contains the first two items from inode-state, so we'll skip | |
1086 | to that file... | |
1087 | ||
1088 | inode-state contains two actual numbers and five dummy values. The numbers | |
1089 | are nr_inodes and nr_free_inodes (in order of appearance). | |
1090 | ||
1091 | nr_inodes | |
1092 | ~~~~~~~~~ | |
1093 | ||
1094 | Denotes the number of inodes the system has allocated. This number will | |
1095 | grow and shrink dynamically. | |
1096 | ||
9cfe015a ED |
1097 | nr_open |
1098 | ------- | |
1099 | ||
1100 | Denotes the maximum number of file handles a process can | |
1101 | allocate. The default value is 1024*1024 (1048576), which should be | |
1102 | enough for most machines. The actual limit depends on the RLIMIT_NOFILE | |
1103 | resource limit. | |
1104 | ||
1da177e4 LT |
1105 | nr_free_inodes |
1106 | -------------- | |
1107 | ||
1108 | Represents the number of free inodes. The number of in-use inodes is therefore | |
1109 | (nr_inodes - nr_free_inodes). | |
1110 | ||
1da177e4 LT |
1111 | aio-nr and aio-max-nr |
1112 | --------------------- | |
1113 | ||
1114 | aio-nr is the running total of the number of events specified on the | |
1115 | io_setup system call for all currently active aio contexts. If aio-nr | |
1116 | reaches aio-max-nr then io_setup will fail with EAGAIN. Note that | |
1117 | raising aio-max-nr does not result in the pre-allocation or re-sizing | |
1118 | of any kernel data structures. | |
1119 | ||
1120 | 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats | |
1121 | ----------------------------------------------------------- | |
1122 | ||
1123 | Besides these files, there is the subdirectory /proc/sys/fs/binfmt_misc. This | |
1124 | handles the kernel support for miscellaneous binary formats. | |
1125 | ||
1126 | Binfmt_misc provides the ability to register additional binary formats with the | |
1127 | kernel without compiling an additional module/kernel. Therefore, binfmt_misc | |
1128 | needs to know magic numbers at the beginning or the filename extension of the | |
1129 | binary. | |
1130 | ||
1131 | It works by maintaining a linked list of structs that contain a description of | |
1132 | a binary format, including a magic with size (or the filename extension), | |
1133 | offset and mask, and the interpreter name. On request it invokes the given | |
1134 | interpreter with the original program as argument, as binfmt_java and | |
1135 | binfmt_em86 and binfmt_mz do. Since binfmt_misc does not define any default | |
1136 | binary-formats, you have to register an additional binary-format. | |
1137 | ||
1138 | There are two general files in binfmt_misc and one file per registered format. | |
1139 | The two general files are register and status. | |
1140 | ||
1141 | Registering a new binary format | |
1142 | ------------------------------- | |
1143 | ||
1144 | To register a new binary format you have to issue the command | |
1145 | ||
1146 | echo :name:type:offset:magic:mask:interpreter: > /proc/sys/fs/binfmt_misc/register | |
1147 | ||
1148 | ||
1149 | ||
1150 | with appropriate name (the name for the /proc-dir entry), offset (defaults to | |
1151 | 0, if omitted), magic, mask (which can be omitted, defaults to all 0xff) and | |
1152 | last but not least, the interpreter that is to be invoked (for example, | |
1153 | /bin/echo for testing). Type can be M for usual magic matching or E for filename | |
1154 | extension matching (give extension in place of magic). | |
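The colon-separated registration string described above is easy to get wrong by hand, so a small helper can assemble it. The Python sketch below is an illustration only: binfmt_entry and its argument names are hypothetical, and the check that no field contains ":" is a simplification assumed for this example.

```python
def binfmt_entry(name, kind, magic, interpreter, offset="", mask=""):
    """Build a ':name:type:offset:magic:mask:interpreter:' string.

    kind is "M" (magic matching) or "E" (filename extension matching).
    offset and mask may be left empty to use the defaults.
    """
    if kind not in ("M", "E"):
        raise ValueError("type must be 'M' or 'E'")
    fields = (name, kind, str(offset), magic, mask, interpreter)
    if any(":" in field for field in fields):
        raise ValueError("fields must not contain ':' in this sketch")
    return ":" + ":".join(fields) + ":"


if __name__ == "__main__":
    # A raw string keeps the \x escapes literal, as they appear in the
    # registration string itself.
    print(binfmt_entry("Java", "M", r"\xca\xfe\xba\xbe",
                       "/usr/local/java/bin/javawrapper"))
    print(binfmt_entry("HTML", "E", "html",
                       "/usr/local/java/bin/appletviewer"))
```

The resulting string would then be written to /proc/sys/fs/binfmt_misc/register as root.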
1155 | ||
1156 | Check or reset the status of the binary format handler | |
1157 | ------------------------------------------------------ | |
1158 | ||
1159 | If you do a cat on the file /proc/sys/fs/binfmt_misc/status, you will get the | |
1160 | current status (enabled/disabled) of binfmt_misc. Change the status by echoing | |
1161 | 0 (disables) or 1 (enables) or -1 (caution: this clears all previously | |
1162 | registered binary formats) to status. For example echo 0 > status to disable | |
1163 | binfmt_misc (temporarily). | |
1164 | ||
1165 | Status of a single handler | |
1166 | -------------------------- | |
1167 | ||
1168 | Each registered handler has an entry in /proc/sys/fs/binfmt_misc. These files | |
1169 | perform the same function as status, but their scope is limited to the actual | |
1170 | binary format. By reading this file with cat, you also receive all related information | |
1171 | about the interpreter/magic of the binfmt. | |
1172 | ||
1173 | Example usage of binfmt_misc (emulate binfmt_java) | |
1174 | -------------------------------------------------- | |
1175 | ||
1176 | cd /proc/sys/fs/binfmt_misc | |
1177 | echo ':Java:M::\xca\xfe\xba\xbe::/usr/local/java/bin/javawrapper:' > register | |
1178 | echo ':HTML:E::html::/usr/local/java/bin/appletviewer:' > register | |
1179 | echo ':Applet:M::<!--applet::/usr/local/java/bin/appletviewer:' > register | |
1180 | echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register | |
1181 | ||
1182 | ||
1183 | These four lines add support for Java executables and Java applets (like | |
1184 | binfmt_java, additionally recognizing the .html extension with no need to put | |
1185 | <!--applet> to every applet file). You have to install the JDK and the | |
1186 | shell-script /usr/local/java/bin/javawrapper too. It works around the | |
1187 | brokenness of the Java filename handling. To add a Java binary, just create a | |
1188 | link to the class-file somewhere in the path. | |
1189 | ||
1190 | 2.3 /proc/sys/kernel - general kernel parameters | |
1191 | ------------------------------------------------ | |
1192 | ||
1193 | This directory reflects general kernel behaviors. As I've said before, the | |
1194 | contents depend on your configuration. Here you'll find the most important | |
1195 | files, along with descriptions of what they mean and how to use them. | |
1196 | ||
1197 | acct | |
1198 | ---- | |
1199 | ||
1200 | The file contains three values: highwater, lowwater, and frequency. | |
1201 | ||
1202 | It exists only when BSD-style process accounting is enabled. These values | |
1203 | control its behavior. If the free space on the file system where the log lives | |
1204 | goes below lowwater percentage, accounting suspends. If it goes above | |
1205 | highwater percentage, accounting resumes. Frequency determines how often you | |
1206 | check the amount of free space (value is in seconds). Default settings are: 4, | |
1207 | 2, and 30. That is, suspend accounting if there is less than 2 percent free; | |
1208 | resume it if we have a value of 4 or more percent; consider information about | |
1209 | the amount of free space valid for 30 seconds. | |
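The suspend/resume behaviour is a simple hysteresis between the two thresholds. The Python sketch below models it under stated assumptions: accounting is suspended when free space is below lowwater, resumed when it is at or above highwater, and otherwise left unchanged. How the exact boundary values are treated is an assumption here, and accounting_active is a hypothetical name, not a kernel function.

```python
def accounting_active(free_percent, active, highwater=4, lowwater=2):
    """Return whether accounting should be active after a free-space check.

    free_percent -- free space on the log's file system, in percent
    active       -- whether accounting was active before this check
    """
    if free_percent < lowwater:
        return False   # too little space: suspend
    if free_percent >= highwater:
        return True    # enough space again: resume
    return active      # in between: keep the previous state


if __name__ == "__main__":
    state = True
    for free in (10, 3, 1, 3, 5):
        state = accounting_active(free, state)
        print(f"{free:>3}% free -> {'active' if state else 'suspended'}")
```

The gap between the two thresholds keeps accounting from toggling repeatedly when free space hovers around a single limit.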
1210 | ||
1211 | ctrl-alt-del | |
1212 | ------------ | |
1213 | ||
1214 | When the value in this file is 0, ctrl-alt-del is trapped and sent to the init | |
1215 | program to handle a graceful restart. However, when the value is greater than | |
1216 | zero, Linux's reaction to this key combination will be an immediate reboot, | |
1217 | without syncing its dirty buffers. | |
1218 | ||
1219 | [NOTE] | |
1220 | When a program (like dosemu) has the keyboard in raw mode, the | |
1221 | ctrl-alt-del is intercepted by the program before it ever reaches the | |
1222 | kernel tty layer, and it is up to the program to decide what to do with | |
1223 | it. | |
1224 | ||
1225 | domainname and hostname | |
1226 | ----------------------- | |
1227 | ||
1228 | These files can be used to set the NIS domainname and hostname of your | |
1229 | box. For the classic darkstar.frop.org a simple: | |
1230 | ||
1231 | # echo "darkstar" > /proc/sys/kernel/hostname | |
1232 | # echo "frop.org" > /proc/sys/kernel/domainname | |
1233 | ||
1234 | ||
1235 | would suffice to set your hostname and NIS domainname. | |
1236 | ||
1237 | osrelease, ostype and version | |
1238 | ----------------------------- | |
1239 | ||
1240 | The names make it pretty obvious what these fields contain: | |
1241 | ||
1242 | > cat /proc/sys/kernel/osrelease | |
1243 | 2.2.12 | |
1244 | ||
1245 | > cat /proc/sys/kernel/ostype | |
1246 | Linux | |
1247 | ||
1248 | > cat /proc/sys/kernel/version | |
1249 | #4 Fri Oct 1 12:41:14 PDT 1999 | |
1250 | ||
1251 | ||
1252 | The files osrelease and ostype should be clear enough. Version needs a little | |
1253 | more clarification. The #4 means that this is the 4th kernel built from this | |
1254 | source base and the date after it indicates the time the kernel was built. The | |
1255 | only way to tune these values is to rebuild the kernel. | |
1256 | ||
1257 | panic | |
1258 | ----- | |
1259 | ||
1260 | The value in this file represents the number of seconds the kernel waits | |
1261 | before rebooting on a panic. When you use the software watchdog, the | |
1262 | recommended setting is 60. If set to 0, the auto reboot after a kernel panic | |
1263 | is disabled, which is the default setting. | |
1264 | ||
1265 | printk | |
1266 | ------ | |
1267 | ||
1268 | The four values in printk denote | |
1269 | * console_loglevel, | |
1270 | * default_message_loglevel, | |
1271 | * minimum_console_loglevel and | |
1272 | * default_console_loglevel | |
1273 | respectively. | |
1274 | ||
1275 | These values influence printk() behavior when printing or logging error | |
1276 | messages, which come from inside the kernel. See syslog(2) for more | |
1277 | information on the different log levels. | |
1278 | ||
1279 | console_loglevel | |
1280 | ---------------- | |
1281 | ||
1282 | Messages with a higher priority than this will be printed to the console. | |
1283 | ||
1284 | default_message_loglevel | |
1285 | ------------------------ | |
1286 | ||
1287 | Messages without an explicit priority will be printed with this priority. | |
1288 | ||
1289 | minimum_console_loglevel | |
1290 | ------------------------ | |
1291 | ||
1292 | Minimum (highest) value to which the console_loglevel can be set. | |
1293 | ||
1294 | default_console_loglevel | |
1295 | ------------------------ | |
1296 | ||
1297 | Default value for console_loglevel. | |
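A short Python sketch can tie the four printk values to the names above and show the console filtering rule. It relies on the syslog(2) convention that a numerically lower level means a higher priority; parse_printk, reaches_console and the sample values "4 4 1 7" are assumptions for illustration.

```python
PRINTK_FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk(text):
    """Parse the contents of /proc/sys/kernel/printk into a dict."""
    return dict(zip(PRINTK_FIELDS, map(int, text.split())))


def reaches_console(message_level, levels):
    """True if a message at message_level is printed to the console.

    Lower numbers are higher priorities, so only levels strictly below
    console_loglevel get through.
    """
    return message_level < levels["console_loglevel"]


if __name__ == "__main__":
    levels = parse_printk("4\t4\t1\t7\n")  # illustrative values
    for level in (0, 3, 4, 7):
        print(level, reaches_console(level, levels))
```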
1298 | ||
1299 | sg-big-buff | |
1300 | ----------- | |
1301 | ||
1302 | This file shows the size of the generic SCSI (sg) buffer. At this point, you | |
1303 | can't tune it yet, but you can change it at compile time by editing | |
1304 | include/scsi/sg.h and changing the value of SG_BIG_BUFF. | |
1305 | ||
1306 | If you use a scanner with SANE (Scanner Access Now Easy) you might want to set | |
1307 | this to a higher value. Refer to the SANE documentation on this issue. | |
1308 | ||
1309 | modprobe | |
1310 | -------- | |
1311 | ||
1312 | The path to the modprobe binary. The kernel uses this | |
1313 | program to load modules on demand. | |
1314 | ||
1315 | unknown_nmi_panic | |
1316 | ----------------- | |
1317 | ||
1318 | The value in this file affects how the kernel handles NMIs. When the value is | |
1319 | non-zero, an unknown NMI is trapped and the kernel panics. At that time, kernel | |
1320 | debugging information is displayed on the console. | |
1321 | ||
1322 | For example, the NMI switch found on most IA32 servers raises an unknown NMI. | |
1323 | If a system hangs, try pressing the NMI switch. | |
1324 | ||
22b8ab66 BW |
1325 | panic_on_unrecovered_nmi |
1326 | ------------------------ | |
1327 | ||
1328 | The default Linux behaviour on an NMI of either memory or unknown is to continue | |
1329 | operation. For many environments such as scientific computing it is preferable | |
1330 | that the box is taken out and the error dealt with than that an uncorrected | |
1331 | parity/ECC error gets propagated. | |
1332 | ||
1333 | A small number of systems do generate NMIs for bizarre random reasons such as | |
1334 | power management, so the default is off. This sysctl works like the existing | |
1335 | panic controls already in that directory. | |
1336 | ||
e33e89ab DZ |
1337 | nmi_watchdog |
1338 | ------------ | |
1339 | ||
1340 | Enables/Disables the NMI watchdog on x86 systems. When the value is non-zero | |
1341 | the NMI watchdog is enabled and will continuously test all online cpus to | |
1342 | determine whether or not they are still functioning properly. | |
1343 | ||
1344 | Because the NMI watchdog shares registers with oprofile, by disabling the NMI | |
1345 | watchdog, oprofile may have more registers to utilize. | |
1da177e4 | 1346 | |
61e55d05 ND |
1347 | msgmni |
1348 | ------ | |
1349 | ||
1350 | Maximum number of message queue ids on the system. | |
1351 | This value scales to the amount of lowmem. It is automatically recomputed | |
1352 | upon memory add/remove or ipc namespace creation/removal. | |
1353 | When a value is written into this file, msgmni's value becomes fixed, i.e. it | |
1354 | is not recomputed anymore when one of the above events occurs. | |
1355 | Use auto_msgmni to change this behavior. | |
1356 | ||
1357 | auto_msgmni | |
1358 | ----------- | |
1359 | ||
1360 | Enables/Disables automatic recomputing of msgmni upon memory add/remove or | |
1361 | upon ipc namespace creation/removal (see the msgmni description above). | |
1362 | Echoing "1" into this file enables msgmni automatic recomputing. | |
1363 | Echoing "0" turns it off. | |
1364 | auto_msgmni default value is 1. | |
1365 | ||
1da177e4 LT |
1366 | |
1367 | 2.4 /proc/sys/vm - The virtual memory subsystem | |
1368 | ----------------------------------------------- | |
1369 | ||
1370 | The files in this directory can be used to tune the operation of the virtual | |
1371 | memory (VM) subsystem of the Linux kernel. | |
1372 | ||
1373 | vfs_cache_pressure | |
1374 | ------------------ | |
1375 | ||
1376 | Controls the tendency of the kernel to reclaim the memory which is used for | |
1377 | caching of directory and inode objects. | |
1378 | ||
1379 | At the default value of vfs_cache_pressure=100 the kernel will attempt to | |
1380 | reclaim dentries and inodes at a "fair" rate with respect to pagecache and | |
1381 | swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer | |
1382 | to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100 | |
1383 | causes the kernel to prefer to reclaim dentries and inodes. | |
1384 | ||
1385 | dirty_background_ratio | |
1386 | ---------------------- | |
1387 | ||
7a6560e0 AR |
1388 | Contains, as a percentage of the dirtyable system memory (free pages + mapped |
1389 | pages + file cache, not including locked pages and HugePages), the number of | |
1390 | pages at which the pdflush background writeback daemon will start writing out | |
1391 | dirty data. | |
1da177e4 LT |
1392 | |
1393 | dirty_ratio | |
1394 | ----------- | |
1395 | ||
7a6560e0 AR |
1396 | Contains, as a percentage of the dirtyable system memory (free pages + mapped |
1397 | pages + file cache, not including locked pages and HugePages), the number of | |
1398 | pages at which a process which is generating disk writes will itself start | |
1399 | writing out dirty data. | |
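The two percentages translate into page counts as follows. This is only a rough sketch of the arithmetic described above, assuming the amount of dirtyable memory is already known in pages; the function name and the numbers in the usage note are hypothetical, and the kernel's own computation may apply further adjustments.

```python
from typing import Tuple


def dirty_thresholds(dirtyable_pages: int,
                     background_ratio: int,
                     ratio: int) -> Tuple[int, int]:
    """Return (background_pages, foreground_pages) for the two percentages.

    background_pages: dirty pages at which background writeback starts.
    foreground_pages: dirty pages at which a writing process starts
    writing out dirty data itself.
    """
    background = dirtyable_pages * background_ratio // 100
    foreground = dirtyable_pages * ratio // 100
    return background, foreground
```

With 1000000 dirtyable pages, a dirty_background_ratio of 5 and a dirty_ratio of 10, background writeback would start at 50000 dirty pages and a writing process would be throttled into writeback at 100000.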
1da177e4 LT |
1400 | |
1401 | dirty_writeback_centisecs | |
1402 | ------------------------- | |
1403 | ||
1404 | The pdflush writeback daemons will periodically wake up and write `old' data | |
1405 | out to disk. This tunable expresses the interval between those wakeups, in | |
1406 | hundredths of a second. | |
1407 | ||
1408 | Setting this to zero disables periodic writeback altogether. | |
1409 | ||
1410 | dirty_expire_centisecs | |
1411 | ---------------------- | |
1412 | ||
1413 | This tunable is used to define when dirty data is old enough to be eligible | |
1414 | for writeout by the pdflush daemons. It is expressed in hundredths of a second. | |
1415 | Data which has been dirty in-memory for longer than this interval will be | |
1416 | written out next time a pdflush daemon wakes up. | |
1417 | ||
195cf453 BG |
1418 | highmem_is_dirtyable |
1419 | -------------------- | |
1420 | ||
1421 | Only present if CONFIG_HIGHMEM is set. | |
1422 | ||
1423 | This defaults to 0 (false), meaning that the ratios set above are calculated | |
1424 | as a percentage of lowmem only. This protects against excessive scanning | |
1425 | in page reclaim, swapping and general VM distress. | |
1426 | ||
1427 | Setting this to 1 can be useful on 32 bit machines where you want to make | |
1428 | random changes within an MMAPed file that is larger than your available | |
1429 | lowmem without causing large quantities of random IO. It is safe if the | |
1430 | behavior of all programs running on the machine is known and memory will | |
1431 | not be otherwise stressed. | |
1432 | ||
1da177e4 LT |
1433 | legacy_va_layout |
1434 | ---------------- | |
1435 | ||
1436 | If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel | |
1437 | will use the legacy (2.4) layout for all processes. | |
1438 | ||
7786fa9a | 1439 | lowmem_reserve_ratio |
1da177e4 LT |
1440 | -------------------- | |
1441 | ||
1442 | For some specialised workloads on highmem machines it is dangerous for | |
1443 | the kernel to allow process memory to be allocated from the "lowmem" | |
1444 | zone. This is because that memory could then be pinned via the mlock() | |
1445 | system call, or by unavailability of swapspace. | |
1446 | ||
1447 | And on large highmem machines this lack of reclaimable lowmem memory | |
1448 | can be fatal. | |
1449 | ||
1450 | So the Linux page allocator has a mechanism which prevents allocations | |
1451 | which _could_ use highmem from using too much lowmem. This means that | |
1452 | a certain amount of lowmem is defended from the possibility of being | |
1453 | captured into pinned user memory. | |
1454 | ||
1455 | (The same argument applies to the old 16 megabyte ISA DMA region. This | |
1456 | mechanism will also defend that region from allocations which could use | |
1457 | highmem or lowmem). | |
1458 | ||
7786fa9a YG |
1459 | The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is |
1460 | in defending these lower zones. | |
1da177e4 LT |
1461 | |
1462 | If you have a machine which uses highmem or ISA DMA and your | |
1463 | applications are using mlock(), or if you are running with no swap then | |
7786fa9a YG |
1464 | you probably should change the lowmem_reserve_ratio setting. |
1465 | ||
1466 | The lowmem_reserve_ratio is an array. You can see its elements by reading this file. | |
1467 | - | |
1468 | % cat /proc/sys/vm/lowmem_reserve_ratio | |
1469 | 256 256 32 | |
1470 | - | |
1471 | Note: the number of elements is one fewer than the number of zones, because | |
1472 | the highest zone's value is not needed for the following calculation. | |
1473 | ||
1474 | These values are not used directly. The kernel calculates the number of | |
1475 | protection pages for each zone from them. They are shown as an array of | |
1476 | protection pages in /proc/zoneinfo, as in the following example from an | |
1477 | x86-64 box. Each zone has an array of protection pages like this. | |
1478 | ||
1479 | - | |
1480 | Node 0, zone DMA | |
1481 | pages free 1355 | |
1482 | min 3 | |
1483 | low 3 | |
1484 | high 4 | |
1485 | : | |
1486 | : | |
1487 | numa_other 0 | |
1488 | protection: (0, 2004, 2004, 2004) | |
1489 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
1490 | pagesets | |
1491 | cpu: 0 pcp: 0 | |
1492 | : | |
1493 | - | |
1494 | These protections are added to the score used to judge whether this zone | |
1495 | should be used for page allocation or should be reclaimed. | |
1496 | ||
1497 | In this example, if normal pages (index=2) are requested from this DMA zone and | |
1498 | pages_high is used as the watermark, the kernel judges that this zone should | |
1499 | not be used, because pages_free (1355) is smaller than watermark + | |
1500 | protection[2] (4 + 2004 = 2008). If this protection value were 0, this zone | |
1501 | would be used for the normal page request. If the request is for the DMA zone | |
1502 | (index=0), protection[0] (=0) is used. | |
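The comparison in the example can be written out as a small check. This is a sketch of the rule as described in the text, not the allocator's actual code; the function name zone_usable is hypothetical, and treating "equal to the threshold" as usable is an assumption the text does not settle.

```python
from typing import Sequence


def zone_usable(pages_free: int,
                watermark: int,
                protection: Sequence[int],
                request_index: int) -> bool:
    """True if the zone may serve a request of the given zone index.

    The zone is skipped when pages_free is smaller than
    watermark + protection[request_index].
    """
    return pages_free >= watermark + protection[request_index]
```

With the figures from the /proc/zoneinfo example (1355 free pages, a high watermark of 4 and protection (0, 2004, 2004, 2004)), a request with index 2 is refused because 1355 < 2008, while a DMA request with index 0 is allowed.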
1503 | ||
d9195881 | 1504 | zone[i]'s protection[j] is calculated by the following expression. |
7786fa9a YG |
1505 | |
1506 | (i < j): | |
1507 | zone[i]->protection[j] | |
1508 | = (total sums of present_pages from zone[i+1] to zone[j] on the node) | |
1509 | / lowmem_reserve_ratio[i]; | |
1510 | (i = j): | |
1511 | (should not be protected) = 0; | |
1512 | (i > j): | |
1513 | (not necessary, but shown as 0) | |
1514 | ||
1515 | The default values of lowmem_reserve_ratio[i] are | |
1516 | 256 (if zone[i] means DMA or DMA32 zone) | |
1517 | 32 (others). | |
1518 | As the expression above shows, they are the reciprocals of the ratios. | |
1519 | 256 means 1/256: the number of protection pages becomes about 0.39% of the | |
1520 | total present pages of the higher zones on the node. | |
1521 | ||
1522 | If you would like to protect more pages, smaller values are effective. | |
1523 | The minimum value is 1 (1/1 -> 100%). | |
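The expression for zone[i]->protection[j] can be tried out with a short script. This is a minimal sketch, assuming integer division and a single node; the function name protection_table and the page counts in the usage note are hypothetical and do not come from a real machine.

```python
from typing import List, Sequence


def protection_table(present_pages: Sequence[int],
                     reserve_ratio: Sequence[int]) -> List[List[int]]:
    """Build protection[i][j] for every pair of zones on one node.

    present_pages[k] is the number of present pages in zone k.
    reserve_ratio has one element fewer than present_pages, because the
    highest zone needs no ratio.
    """
    zones = len(present_pages)
    table = [[0] * zones for _ in range(zones)]
    for i in range(zones):
        # Only the i < j case is non-zero; i == j and i > j stay 0.
        for j in range(i + 1, zones):
            higher = sum(present_pages[i + 1:j + 1])
            table[i][j] = higher // reserve_ratio[i]
    return table
```

For three zones with 4000, 512000 and 1024000 present pages and ratios of 256 and 256, the lowest zone gets a protection array of [0, 2000, 6000] and the middle zone gets [0, 0, 4000]. Lowering a ratio to 1 protects as many pages as the higher zones contain.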
1da177e4 LT |
1524 | |
1525 | page-cluster | |
1526 | ------------ | |
1527 | ||
1528 | page-cluster controls the number of pages which are written to swap in | |
1529 | a single attempt, that is, the swap I/O size. | |
1530 | ||
1531 | It is a logarithmic value - setting it to zero means "1 page", setting | |
1532 | it to 1 means "2 pages", setting it to 2 means "4 pages", etc. | |
1533 | ||
1534 | The default value is three (eight pages at a time). There may be some | |
1535 | small benefits in tuning this to a different value if your workload is | |
1536 | swap-intensive. | |
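Because the value is a base-2 logarithm, the page count follows from a single shift. A minimal sketch; the function name is hypothetical.

```python
def pages_per_swap_io(page_cluster: int) -> int:
    """Number of pages written to swap in a single attempt."""
    if page_cluster < 0:
        raise ValueError("page-cluster must not be negative")
    return 1 << page_cluster
```

The default of three gives pages_per_swap_io(3) == 8.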
1537 | ||
1538 | overcommit_memory | |
1539 | ----------------- | |
1540 | ||
af97c722 CE |
1541 | Controls overcommit of system memory, possibly allowing processes |
1542 | to allocate (but not use) more memory than is actually available. | |
1543 | ||
1544 | ||
1545 | 0 - Heuristic overcommit handling. Obvious overcommits of | |
1546 | address space are refused. Used for a typical system. It | |
1547 | ensures a seriously wild allocation fails while allowing | |
1548 | overcommit to reduce swap usage. root is allowed to | |
53cb4726 | 1549 | allocate slightly more memory in this mode. This is the |
af97c722 CE |
1550 | default. |
1551 | ||
1552 | 1 - Always overcommit. Appropriate for some scientific | |
1553 | applications. | |
1554 | ||
1555 | 2 - Don't overcommit. The total address space commit | |
1556 | for the system is not permitted to exceed swap plus a | |
1557 | configurable percentage (default is 50) of physical RAM. | |
1558 | Depending on the percentage you use, in most situations | |
1559 | this means a process will not be killed while attempting | |
1560 | to use already-allocated memory but will receive errors | |
1561 | on memory allocation as appropriate. | |
1562 | ||
1563 | overcommit_ratio | |
1564 | ---------------- | |
1565 | ||
1566 | Percentage of physical memory size to include in overcommit calculations | |
1567 | (see above.) | |
1568 | ||
1569 | Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100) | |
1570 | ||
1571 | swapspace = total size of all swap areas | |
1572 | physmem = size of physical memory in system | |
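The limit can be computed directly from the formula above. A minimal sketch, assuming both sizes are given in the same unit and using integer arithmetic; the function name commit_limit and the sizes in the usage note are hypothetical.

```python
def commit_limit(swapspace: int, physmem: int, overcommit_ratio: int = 50) -> int:
    """Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100).

    swapspace: total size of all swap areas.
    physmem:   size of physical memory, in the same unit as swapspace.
    """
    return swapspace + physmem * overcommit_ratio // 100
```

A machine with 2048 MB of swap and 4096 MB of RAM at the default ratio of 50 would get a limit of 4096 MB when overcommit_memory is set to 2.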
1da177e4 LT |
1573 | |
1574 | nr_hugepages and hugetlb_shm_group | |
1575 | ---------------------------------- | |
1576 | ||
1577 | nr_hugepages configures the number of hugetlb pages reserved for the system. | |
1578 | ||
1579 | hugetlb_shm_group contains the group id that is allowed to create SysV shared | |
1580 | memory segments using hugetlb pages. | |
1581 | ||
ed7ed365 MG |
1582 | hugepages_treat_as_movable |
1583 | -------------------------- | |
1584 | ||
1585 | This parameter is only useful when kernelcore= is specified at boot time to | |
1586 | create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages | |
1587 | are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero | |
1588 | value written to hugepages_treat_as_movable allows huge pages to be allocated | |
1589 | from ZONE_MOVABLE. | |
1590 | ||
1591 | Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge | |
1592 | pages pool can easily grow or shrink within. Assuming that applications are | |
1593 | not running that mlock() a lot of memory, it is likely the huge pages pool | |
1594 | can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value | |
1595 | into nr_hugepages and triggering page reclaim. | |
1596 | ||
1da177e4 LT |
1597 | laptop_mode |
1598 | ----------- | |
1599 | ||
1600 | laptop_mode is a knob that controls "laptop mode". All the things that are | |
a09a20b5 | 1601 | controlled by this knob are discussed in Documentation/laptops/laptop-mode.txt. |
1da177e4 LT |
1602 | |
1603 | block_dump | |
1604 | ---------- | |
1605 | ||
1606 | block_dump enables block I/O debugging when set to a nonzero value. More | |
a09a20b5 | 1607 | information on block I/O debugging is in Documentation/laptops/laptop-mode.txt. |
1da177e4 LT |
1608 | |
1609 | swap_token_timeout | |
1610 | ------------------ | |
1611 | ||
1612 | This file contains the valid hold time of the swap out protection token. The | |
1613 | Linux VM has a token based thrashing control mechanism and uses the token to | |
1614 | prevent unnecessary page faults in thrashing situations. The value is given in | |
1615 | seconds and can be useful for tuning thrashing behavior. | |
1616 | ||
9d0243bc AM |
1617 | drop_caches |
1618 | ----------- | |
1619 | ||
1620 | Writing to this will cause the kernel to drop clean caches, dentries and | |
1621 | inodes from memory, causing that memory to become free. | |
1622 | ||
1623 | To free pagecache: | |
1624 | echo 1 > /proc/sys/vm/drop_caches | |
1625 | To free dentries and inodes: | |
1626 | echo 2 > /proc/sys/vm/drop_caches | |
1627 | To free pagecache, dentries and inodes: | |
1628 | echo 3 > /proc/sys/vm/drop_caches | |
1629 | ||
1630 | As this is a non-destructive operation and dirty objects are not freeable, the | |
1631 | user should run `sync' first. | |
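The same sequence (sync first, then write the value) can be scripted. This is a minimal sketch, assuming Python 3 on Linux and sufficient privileges to write to the file; the function and constant names are hypothetical, and the path parameter exists only so the helper can be tried against an ordinary file.

```python
import os

PAGECACHE = 1         # "To free pagecache"
DENTRIES_INODES = 2   # "To free dentries and inodes"


def drop_caches(value: int, path: str = "/proc/sys/vm/drop_caches") -> None:
    """Flush dirty data to disk, then ask the kernel to drop clean caches."""
    if value not in (1, 2, 3):
        raise ValueError("value must be 1, 2 or 3")
    os.sync()  # dirty objects are not freeable, so write them out first
    with open(path, "w") as handle:
        handle.write("%d\n" % value)
```

Calling drop_caches(PAGECACHE | DENTRIES_INODES) corresponds to echoing 3 into the file.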
1632 | ||
1633 | ||
1da177e4 LT |
1634 | 2.5 /proc/sys/dev - Device specific parameters |
1635 | ---------------------------------------------- | |
1636 | ||
1637 | Currently there is only support for CDROM drives, and for those, there is only | |
1638 | one read-only file containing information about the CD-ROM drives attached to | |
1639 | the system: | |
1640 | ||
1641 | >cat /proc/sys/dev/cdrom/info | |
1642 | CD-ROM information, Id: cdrom.c 2.55 1999/04/25 | |
1643 | ||
1644 | drive name: sr0 hdb | |
1645 | drive speed: 32 40 | |
1646 | drive # of slots: 1 0 | |
1647 | Can close tray: 1 1 | |
1648 | Can open tray: 1 1 | |
1649 | Can lock tray: 1 1 | |
1650 | Can change speed: 1 1 | |
1651 | Can select disk: 0 1 | |
1652 | Can read multisession: 1 1 | |
1653 | Can read MCN: 1 1 | |
1654 | Reports media changed: 1 1 | |
1655 | Can play audio: 1 1 | |
1656 | ||
1657 | ||
1658 | You see two drives, sr0 and hdb, along with a list of their features. | |
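The column layout lends itself to simple parsing. This is a minimal sketch, assuming every feature line has the form "name: value value ..." with one value per drive, as in the listing above; the function name parse_cdrom_info is hypothetical.

```python
from typing import Dict, List


def parse_cdrom_info(text: str) -> Dict[str, Dict[str, str]]:
    """Turn the contents of the cdrom info file into {drive: {feature: value}}."""
    rows: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the version banner on the first line.
        if not line or line.startswith("CD-ROM information"):
            continue
        key, _, rest = line.partition(":")
        rows[key.strip()] = rest.split()
    names = rows.pop("drive name", [])
    drives: Dict[str, Dict[str, str]] = {name: {} for name in names}
    for key, values in rows.items():
        for name, value in zip(names, values):
            drives[name][key] = value
    return drives
```

Applied to the listing above, the result has the keys sr0 and hdb, and the "Can select disk" entry is "0" for sr0 and "1" for hdb.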
1659 | ||
1660 | 2.6 /proc/sys/sunrpc - Remote procedure calls | |
1661 | --------------------------------------------- | |
1662 | ||
1663 | This directory contains four files, which enable or disable debugging for the | |
1664 | RPC functions NFS, NFS-daemon, RPC and NLM. The default value of each file is 0. | |
1665 | They can be set to one to turn debugging on. | |
1666 | ||
1667 | 2.7 /proc/sys/net - Networking stuff | |
1668 | ------------------------------------ | |
1669 | ||
1670 | The interface to the networking parts of the kernel is located in | |
1671 | /proc/sys/net. Table 2-3 shows all possible subdirectories. You may see only | |
1672 | some of them, depending on your kernel's configuration. | |
1673 | ||
1674 | ||
1675 | Table 2-3: Subdirectories in /proc/sys/net | |
1676 | .............................................................................. | |
1677 | Directory  Content              Directory   Content | |
1678 | core       General parameter    appletalk   Appletalk protocol | |
1679 | unix       Unix domain sockets  netrom      NET/ROM | |
1680 | 802        E802 protocol        ax25        AX25 | |
1681 | ethernet   Ethernet protocol    rose        X.25 PLP layer | |
1682 | ipv4       IP version 4         x25         X.25 protocol | |
1683 | ipx        IPX                  token-ring  IBM token ring | |
1684 | bridge     Bridging             decnet      DEC net | |
1685 | ipv6       IP version 6 | |
1686 | .............................................................................. | |
1687 | ||
1688 | We will concentrate on IP networking here. Since AX25, X.25, and DEC Net are | |
1689 | only minor players in the Linux world, we'll skip them in this chapter. You'll | |
1690 | find some short info on Appletalk and IPX further on in this chapter. Review | |
1691 | the online documentation and the kernel source to get a detailed view of the | |
1692 | parameters for those protocols. In this section we'll discuss the | |
1693 | subdirectories printed in bold letters in the table above. As default values | |
1694 | are suitable for most needs, there is no need to change these values. | |
1695 | ||
1696 | /proc/sys/net/core - Network core options | |
1697 | ----------------------------------------- | |
1698 | ||
1699 | rmem_default | |
1700 | ------------ | |
1701 | ||
1702 | The default setting of the socket receive buffer in bytes. | |
1703 | ||
1704 | rmem_max | |
1705 | -------- | |
1706 | ||
1707 | The maximum receive socket buffer size in bytes. | |
1708 | ||
1709 | wmem_default | |
1710 | ------------ | |
1711 | ||
1712 | The default setting (in bytes) of the socket send buffer. | |
1713 | ||
1714 | wmem_max | |
1715 | -------- | |
1716 | ||
1717 | The maximum send socket buffer size in bytes. | |
1718 | ||
1719 | message_burst and message_cost | |
1720 | ------------------------------ | |
1721 | ||
1722 | These parameters are used to limit the warning messages written to the kernel | |
1723 | log from the networking code. They enforce a rate limit to make a | |
1724 | denial-of-service attack impossible. A higher message_cost factor results in | |
1725 | fewer messages being written. Message_burst controls when messages will | |
1726 | be dropped. The default settings limit warning messages to one every five | |
1727 | seconds. | |
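One common way to combine a per-message cost with a burst allowance is a token bucket. The sketch below is a generic token bucket written for illustration only; it is not taken from the kernel source, the class name and the interpretation of cost as "seconds of credit per message" are assumptions, and the caller supplies the time so that the behavior is easy to follow.

```python
class MessageRateLimiter:
    """Generic token bucket: each message costs `cost` seconds of credit."""

    def __init__(self, cost: float, burst: int) -> None:
        self.cost = cost                # credit one message consumes
        self.capacity = cost * burst    # at most `burst` messages back-to-back
        self.tokens = self.capacity     # start with a full bucket
        self.last = 0.0                 # time of the previous call

    def allow(self, now: float) -> bool:
        """Return True if a message arriving at time `now` may be logged."""
        # Credit accumulates with elapsed time, up to the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last))
        self.last = now
        if self.tokens >= self.cost:
            self.tokens -= self.cost
            return True
        return False
```

With a cost of 5 and a burst of 2, two messages at time 0 pass, a third at time 0 is dropped, and the next message passes once five seconds of credit have accumulated. A higher cost lets fewer messages through, and the burst value decides how many may pass before dropping starts.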
1728 | ||
a2a316fd SH |
1729 | warnings |
1730 | -------- | |
1731 | ||
1732 | This controls console messages from the networking stack that can occur because | |
1733 | of problems on the network like duplicate address or bad checksums. Normally, | |
1734 | this should be enabled, but if the problem persists the messages can be | |
1735 | disabled. | |
1736 | ||
1737 | ||
1da177e4 LT |
1738 | netdev_max_backlog |
1739 | ------------------ | |
1740 | ||
1741 | Maximum number of packets, queued on the INPUT side, when the interface | |
1742 | receives packets faster than kernel can process them. | |
1743 | ||
1744 | optmem_max | |
1745 | ---------- | |
1746 | ||
1747 | Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence | |
1748 | of struct cmsghdr structures with appended data. | |
1749 | ||
1750 | /proc/sys/net/unix - Parameters for Unix domain sockets | |
1751 | ------------------------------------------------------- | |
1752 | ||
1753 | There are only two files in this subdirectory. They control the delays for | |
1754 | deleting and destroying socket descriptors. | |
1755 | ||
1756 | 2.8 /proc/sys/net/ipv4 - IPV4 settings | |
1757 | -------------------------------------- | |
1758 | ||
1759 | IP version 4 is still the most used protocol in Unix networking. It will be | |
1760 | replaced by IP version 6 in the next couple of years, but for the moment it's | |
1761 | the de facto standard for the internet and is used in most networking | |
1762 | environments around the world. Because of the importance of this protocol, | |
1763 | we'll have a deeper look into the subtree controlling the behavior of the IPv4 | |
1764 | subsystem of the Linux kernel. | |
1765 | ||
1766 | Let's start with the entries in /proc/sys/net/ipv4. | |
1767 | ||
1768 | ICMP settings | |
1769 | ------------- | |
1770 | ||
1771 | icmp_echo_ignore_all and icmp_echo_ignore_broadcasts | |
1772 | ---------------------------------------------------- | |
1773 | ||
1774 | If set to 1, the kernel ignores all ICMP ECHO requests, or just those sent to | |
1775 | broadcast and multicast addresses, respectively. If set to 0, it answers them. | |
1776 | ||
1777 | Please note that if you accept ICMP echo requests with a broadcast/multicast | |
1778 | destination address your network may be used as an amplifier for denial of | |
1779 | service packet flooding attacks against other hosts. | |
1780 | ||
1781 | icmp_destunreach_rate, icmp_echoreply_rate, icmp_paramprob_rate and icmp_timeexeed_rate | |
1782 | --------------------------------------------------------------------------------------- | |
1783 | ||
1784 | Sets limits for sending ICMP packets to specific targets. A value of zero | |
1785 | disables all limiting. Any positive value sets the maximum packet rate in | |
1786 | hundredths of a second (on Intel systems). | |
1787 | ||
1788 | IP settings | |
1789 | ----------- | |
1790 | ||
1791 | ip_autoconfig | |
1792 | ------------- | |
1793 | ||
1794 | This file contains the number one if the host received its IP configuration by | |
1795 | RARP, BOOTP, DHCP or a similar mechanism. Otherwise it is zero. | |
1796 | ||
1797 | ip_default_ttl | |
1798 | -------------- | |
1799 | ||
1800 | TTL (Time To Live) for IPv4 interfaces. This is simply the maximum number of | |
1801 | hops a packet may travel. | |
1802 | ||
1803 | ip_dynaddr | |
1804 | ---------- | |
1805 | ||
1806 | Enable dynamic socket address rewriting on interface address change. This is | |
1807 | useful for dialup interfaces with changing IP addresses. | |
1808 | ||
1809 | ip_forward | |
1810 | ---------- | |
1811 | ||
1812 | Enable or disable forwarding of IP packets between interfaces. Changing this | |
1813 | value resets all other parameters to their default values, which differ | |
1814 | depending on whether the kernel is configured as a host or a router. | |
1815 | ||
1816 | ip_local_port_range | |
1817 | ------------------- | |
1818 | ||
1819 | Range of ports used by TCP and UDP to choose the local port. Contains two | |
1820 | numbers, the first number is the lowest port, the second number the highest | |
1821 | local port. Default is 1024-4999. Should be changed to 32768-61000 for | |
1822 | high-usage systems. | |
1823 | ||
1824 | ip_no_pmtu_disc | |
1825 | --------------- | |
1826 | ||
1827 | Global switch to turn path MTU discovery off. It can also be set on a per | |
1828 | socket basis by the applications or on a per route basis. | |
1829 | ||
1830 | ip_masq_debug | |
1831 | ------------- | |
1832 | ||
1833 | Enable/disable debugging of IP masquerading. | |
1834 | ||
1835 | IP fragmentation settings | |
1836 | ------------------------- | |
1837 | ||
1838 | ipfrag_high_thresh and ipfrag_low_thresh | |
1839 | ---------------------------------------- | |
1840 | ||
1841 | Maximum memory used to reassemble IP fragments. When ipfrag_high_thresh bytes | |
1842 | of memory is allocated for this purpose, the fragment handler will toss | |
1843 | packets until ipfrag_low_thresh is reached. | |
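The two thresholds form a high/low watermark pair. The sketch below models that behavior in a simplified way; it is not the kernel's fragment handler, the class name is hypothetical, and tossing the oldest fragments first is an assumption made for illustration.

```python
from collections import deque


class FragmentQueue:
    """Toy model of fragment memory bounded by a high and a low threshold."""

    def __init__(self, high_thresh: int, low_thresh: int) -> None:
        if low_thresh > high_thresh:
            raise ValueError("low_thresh must not exceed high_thresh")
        self.high_thresh = high_thresh
        self.low_thresh = low_thresh
        self.used = 0               # bytes currently held
        self.fragments = deque()    # sizes of held fragments, oldest first

    def add(self, size: int) -> int:
        """Store a fragment and return how many fragments were tossed."""
        self.fragments.append(size)
        self.used += size
        tossed = 0
        if self.used >= self.high_thresh:
            # Toss fragments until usage is back down at the low threshold.
            while self.fragments and self.used > self.low_thresh:
                self.used -= self.fragments.popleft()
                tossed += 1
        return tossed
```

With a high threshold of 100 and a low threshold of 40, fragments of 30, 30 and 40 bytes fill the queue to 100 bytes; the third add then tosses the two oldest fragments and leaves 40 bytes in use.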
1844 | ||
1845 | ipfrag_time | |
1846 | ----------- | |
1847 | ||
1848 | Time in seconds to keep an IP fragment in memory. | |
1849 | ||
1850 | TCP settings | |
1851 | ------------ | |
1852 | ||
1853 | tcp_ecn | |
1854 | ------- | |
1855 | ||
fa00e7e1 | 1856 | This file controls the use of the ECN bit in the IPv4 headers. Explicit |
1da177e4 | 1857 | Congestion Notification is a new feature, but some routers and firewalls |
fa00e7e1 ML |
1858 | block traffic that has this bit set, so it could be necessary to echo 0 to |
1859 | /proc/sys/net/ipv4/tcp_ecn if you want to talk to these sites. For more info | |
1da177e4 LT |
1860 | you could read RFC2481. |
1861 | ||
1862 | tcp_retrans_collapse | |
1863 | -------------------- | |
1864 | ||
1865 | Bug-to-bug compatibility with some broken printers. On retransmit, try to send | |
1866 | larger packets to work around bugs in certain TCP stacks. Can be turned off by | |
1867 | setting it to zero. | |
1868 | ||
1869 | tcp_keepalive_probes | |
1870 | -------------------- | |
1871 | ||
1872 | Number of keep alive probes TCP sends out, until it decides that the | |
1873 | connection is broken. | |
1874 | ||
1875 | tcp_keepalive_time | |
1876 | ------------------ | |
1877 | ||
1878 | How often TCP sends out keep alive messages, when keep alive is enabled. The | |
1879 | default is 2 hours. | |
1880 | ||
1881 | tcp_syn_retries | |
1882 | --------------- | |
1883 | ||
1884 | Number of times initial SYNs for a TCP connection attempt will be | |
1885 | retransmitted. Should not be higher than 255. This applies only to | |
1886 | outgoing connections; for incoming connections the number of retransmits is | |
1887 | defined by tcp_retries1. | |
1888 | ||
1889 | tcp_sack | |
1890 | -------- | |
1891 | ||
1892 | Enable selective acknowledgments as defined in RFC2018. | |
1893 | ||
1894 | tcp_timestamps | |
1895 | -------------- | |
1896 | ||
1897 | Enable timestamps as defined in RFC1323. | |
1898 | ||
1899 | tcp_stdurg | |
1900 | ---------- | |
1901 | ||
1902 | Enable the strict RFC793 interpretation of the TCP urgent pointer field. The | |
1903 | default is to use the BSD compatible interpretation of the urgent pointer | |
1904 | pointing to the first byte after the urgent data. The RFC793 interpretation is | |
1905 | to have it point to the last byte of urgent data. Enabling this option may | |
2fe0ae78 | 1906 | lead to interoperability problems. Disabled by default. |
1da177e4 LT |
1907 | |
1908 | tcp_syncookies | |
1909 | -------------- | |
1910 | ||
1911 | Only valid when the kernel was compiled with CONFIG_SYNCOOKIES. Send out | |
1912 | syncookies when the syn backlog queue of a socket overflows. This is to ward | |
1913 | off the common 'syn flood attack'. Disabled by default. | |
1914 | ||
1915 | Note that the concept of a socket backlog is abandoned. This means the peer | |
1916 | may not receive reliable error messages from an over loaded server with | |
1917 | syncookies enabled. | |
1918 | ||
1919 | tcp_window_scaling | |
1920 | ------------------ | |
1921 | ||
1922 | Enable window scaling as defined in RFC1323. | |
1923 | ||
1924 | tcp_fin_timeout | |
1925 | --------------- | |
1926 | ||
1927 | The length of time in seconds to wait for a final FIN before the | |
1928 | socket is forcibly closed. This is strictly a violation of the TCP | |
1929 | specification, but required to prevent denial-of-service attacks. | |
1930 | ||
1931 | tcp_max_ka_probes | |
1932 | ----------------- | |
1933 | ||
1934 | Indicates how many keep alive probes are sent per slow timer run. Should not | |
1935 | be set too high to prevent bursts. | |
1936 | ||
1937 | tcp_max_syn_backlog | |
1938 | ------------------- | |
1939 | ||
1940 | Length of the per socket backlog queue. Since Linux 2.2 the backlog specified | |
1941 | in listen(2) only specifies the length of the backlog queue of already | |
1942 | established sockets. When more connection requests arrive Linux starts to drop | |
1943 | packets. When syncookies are enabled the packets are still answered and the | |
1944 | maximum queue is effectively ignored. | |
1945 | ||
1946 | tcp_retries1 | |
1947 | ------------ | |
1948 | ||
1949 | Defines how often an answer to a TCP connection request is retransmitted | |
1950 | before giving up. | |
1951 | ||
1952 | tcp_retries2 | |
1953 | ------------ | |
1954 | ||
1955 | Defines how often a TCP packet is retransmitted before giving up. | |
1956 | ||
1957 | Interface specific settings | |
1958 | --------------------------- | |
1959 | ||
1960 | In the directory /proc/sys/net/ipv4/conf you'll find one subdirectory for each | |
1961 | interface the system knows about and one directory called all. Changes in the | |
1962 | all subdirectory affect all interfaces, whereas changes in the other | |
1963 | subdirectories affect only one interface. All directories have the same | |
1964 | entries: | |
1965 | ||
1966 | accept_redirects | |
1967 | ---------------- | |
1968 | ||
1969 | This switch decides if the kernel accepts ICMP redirect messages or not. The | |
1970 | default is 'yes' if the kernel is configured for a regular host and 'no' for a | |
1971 | router configuration. | |
1972 | ||
1973 | accept_source_route | |
1974 | ------------------- | |
1975 | ||
1976 | Determines whether source routed packets are accepted or declined. The default is | |
1977 | dependent on the kernel configuration. It's 'yes' for routers and 'no' for | |
1978 | hosts. | |
1979 | ||
1980 | bootp_relay | |
1981 | ----------- | |
1982 | ||
1983 | Accept packets with source address 0.b.c.d and destinations other than this | |
1984 | host as local ones. It is assumed that a BOOTP relay daemon will catch and forward | |
1985 | such packets. | |
1986 | ||
1987 | The default is 0, since this feature is not implemented yet (kernel version | |
1988 | 2.2.12). | |
1989 | ||
1990 | forwarding | |
1991 | ---------- | |
1992 | ||
1993 | Enable or disable IP forwarding on this interface. | |
1994 | ||
1995 | log_martians | |
1996 | ------------ | |
1997 | ||
1998 | Log packets whose source addresses have no known route to the kernel log. | |
1999 | ||
2000 | mc_forwarding | |
2001 | ------------- | |
2002 | ||
2003 | Do multicast routing. The kernel needs to be compiled with CONFIG_MROUTE and a | |
2004 | multicast routing daemon is required. | |
2005 | ||
2006 | proxy_arp | |
2007 | --------- | |
2008 | ||
2009 | Does (1) or does not (0) perform proxy ARP. | |
2010 | ||
2011 | rp_filter | |
2012 | --------- | |
2013 | ||
2014 | Integer value that determines whether source validation is performed. 1 means | |
2015 | yes, 0 means no. Disabled by default, but protection against local/broadcast | |
2016 | address spoofing is always on. | |
2017 | ||
2018 | If you set this to 1 on a router that is the only connection for a network to | |
2019 | the net, it will prevent spoofing attacks against your internal networks | |
2020 | (external addresses can still be spoofed), without the need for additional | |
2021 | firewall rules. | |
2022 | ||
2023 | secure_redirects | |
2024 | ---------------- | |
2025 | ||
2026 | Accept ICMP redirect messages only for gateways listed in the default gateway | |
2027 | list. Enabled by default. | |
2028 | ||
2029 | shared_media | |
2030 | ------------ | |
2031 | ||
2032 | If it is not set the kernel does not assume that different subnets on this | |
2033 | device can communicate directly. Default setting is 'yes'. | |
2034 | ||
2035 | send_redirects | |
2036 | -------------- | |
2037 | ||
2038 | Determines whether to send ICMP redirects to other hosts. | |
2039 | ||
2040 | Routing settings | |
2041 | ---------------- | |
2042 | ||
2043 | The directory /proc/sys/net/ipv4/route contains several files to control | |
2044 | routing issues. | |
2045 | ||
2046 | error_burst and error_cost | |
2047 | -------------------------- | |
2048 | ||
2049 | These parameters are used to limit how many ICMP destination unreachable | |
2050 | messages the host in question sends. ICMP destination unreachable messages are | |
84eb8d06 | 2051 | sent when we cannot reach the next hop while trying to transmit a packet. |
1da177e4 LT |
2052 | The kernel will also print some error messages to its logs if someone is ignoring | |
2053 | our ICMP redirects. The higher the error_cost factor is, the fewer | |
2054 | destination unreachable and error messages will be let through. Error_burst | |
2055 | controls when destination unreachable messages and error messages will be | |
2056 | dropped. The default settings limit warning messages to five every second. | |
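
The interplay of error_cost and error_burst can be pictured as a token bucket.
The Python sketch below is only an illustration under stated assumptions: one
token accrues per elapsed tick, error_burst caps the bucket, and error_cost is
the price of each message. The class name, the tick-based clock and the example
values (cost 100, burst 500) are hypothetical and are not taken from this
document.

```python
class ErrorRateLimiter:
    """Token bucket: each message costs `cost` tokens, bucket holds at most `burst`."""

    def __init__(self, cost, burst, now=0):
        self.cost = cost
        self.burst = burst
        self.tokens = burst      # assumption: start with a full bucket
        self.last = now

    def allow(self, now):
        # Credit one token per elapsed tick, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last))
        self.last = now
        if self.tokens >= self.cost:
            self.tokens -= self.cost
            return True
        return False


limiter = ErrorRateLimiter(cost=100, burst=500)
# Ten back-to-back messages at tick 0: only the burst allowance gets through.
sent_at_once = sum(limiter.allow(0) for _ in range(10))
# After 100 more ticks, one further message is allowed.
sent_later = sum(limiter.allow(100) for _ in range(3))
print(sent_at_once, sent_later)
```

In this model, raising the cost relative to the burst lets fewer messages
through, which matches the description of error_cost above.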
2057 | ||
2058 | flush | |
2059 | ----- | |
2060 | ||
2061 | Writing to this file results in a flush of the routing cache. | |
2062 | ||
2063 | gc_elasticity, gc_interval, gc_min_interval_ms, gc_timeout, gc_thresh | |
2064 | --------------------------------------------------------------------- | |
2065 | ||
2066 | Values to control the frequency and behavior of the garbage collection | |
2067 | algorithm for the routing cache. gc_min_interval is deprecated and replaced | |
2068 | by gc_min_interval_ms. | |
2069 | ||
2070 | ||
2071 | max_size | |
2072 | -------- | |
2073 | ||
2074 | Maximum size of the routing cache. Old entries will be purged once the cache | |
2075 | has reached this size. | |
2076 | ||
1da177e4 LT |
2077 | redirect_load, redirect_number |
2078 | ------------------------------ | |
2079 | ||
2080 | Factors which determine if more ICMP redirects should be sent to a specific | |
2081 | host. No redirects will be sent once the load limit or the maximum number of | |
2082 | redirects has been reached. | |
2083 | ||
2084 | redirect_silence | |
2085 | ---------------- | |
2086 | ||
2087 | Timeout for redirects. After this period redirects will be sent again, even if | |
2088 | they had been stopped because the load or number limit was reached. | |
2089 | ||
2090 | Network Neighbor handling | |
2091 | ------------------------- | |
2092 | ||
2093 | Settings about how to handle connections with direct neighbors (nodes attached | |
2094 | to the same link) can be found in the directory /proc/sys/net/ipv4/neigh. | |
2095 | ||
2096 | As we saw in the conf directory, there is a default subdirectory which | |
2097 | holds the default values, and one directory for each interface. The contents | |
2098 | of the directories are identical, with the single exception that the default | |
2099 | settings contain additional options to set garbage collection parameters. | |
2100 | ||
2101 | In the interface directories you'll find the following entries: | |
2102 | ||
2103 | base_reachable_time, base_reachable_time_ms | |
2104 | ------------------------------------------- | |
2105 | ||
2106 | A base value used for computing the random reachable time value as specified | |
2107 | in RFC2461. | |
2108 | ||
2109 | base_reachable_time, which is deprecated, is expressed in seconds. | |
2110 | base_reachable_time_ms is expressed in milliseconds. | |
2111 | ||
2112 | retrans_time, retrans_time_ms | |
2113 | ----------------------------- | |
2114 | ||
2115 | The time between retransmitted Neighbor Solicitation messages. | |
2116 | Used for address resolution and to determine if a neighbor is | |
2117 | unreachable. | |
2118 | ||
2119 | retrans_time, which is deprecated, is expressed in 1/100 seconds (for | |
2120 | IPv4) or in jiffies (for IPv6). | |
2121 | retrans_time_ms is expressed in milliseconds. | |
2122 | ||
2123 | unres_qlen | |
2124 | ---------- | |
2125 | ||
2126 | Maximum queue length for a pending ARP request - the number of packets which | |
2127 | are accepted from other layers while the ARP address is still being resolved. | |
2128 | ||
2129 | anycast_delay | |
2130 | ------------- | |
2131 | ||
2132 | Maximum for random delay of answers to neighbor solicitation messages in | |
2133 | jiffies (1/100 sec). Not yet implemented (Linux does not have anycast support | |
2134 | yet). | |
2135 | ||
2136 | ucast_solicit | |
2137 | ------------- | |
2138 | ||
2139 | Maximum number of retries for unicast solicitation. | |
2140 | ||
2141 | mcast_solicit | |
2142 | ------------- | |
2143 | ||
2144 | Maximum number of retries for multicast solicitation. | |
2145 | ||
2146 | delay_first_probe_time | |
2147 | ---------------------- | |
2148 | ||
2149 | Delay before the first probe if the neighbor is reachable. (see | |
2150 | gc_stale_time) | |
2151 | ||
2152 | locktime | |
2153 | -------- | |
2154 | ||
2155 | An ARP/neighbor entry is only replaced with a new one if the old one is at least | |
2156 | locktime old. This prevents ARP cache thrashing. | |
2157 | ||
2158 | proxy_delay | |
2159 | ----------- | |
2160 | ||
2161 | Maximum time (the real time is random in [0..proxy_delay]) before answering an | |
2162 | ARP request for which we have a proxy ARP entry. In some cases, this is used to | |
2163 | prevent network flooding. | |
2164 | ||
2165 | proxy_qlen | |
2166 | ---------- | |
2167 | ||
2168 | Maximum queue length of the delayed proxy arp timer. (see proxy_delay). | |
2169 | ||
53cb4726 | 2170 | app_solicit |
1da177e4 LT |
2171 | ----------- | |
2172 | ||
2173 | Determines the number of requests to send to the user level ARP daemon. Use 0 | |
2174 | to turn off. | |
2175 | ||
2176 | gc_stale_time | |
2177 | ------------- | |
2178 | ||
2179 | Determines how often to check for stale ARP entries. After an ARP entry is | |
2180 | stale it will be resolved again (which is useful when an IP address migrates | |
2181 | to another machine). When ucast_solicit is greater than 0 it first tries to | |
2182 | send an ARP packet directly to the known host. When that fails and | |
2183 | mcast_solicit is greater than 0, an ARP request is broadcast. | |
2184 | ||
2185 | 2.9 Appletalk | |
2186 | ------------- | |
2187 | ||
2188 | The /proc/sys/net/appletalk directory holds the Appletalk configuration data | |
2189 | when Appletalk is loaded. The configurable parameters are: | |
2190 | ||
2191 | aarp-expiry-time | |
2192 | ---------------- | |
2193 | ||
2194 | The amount of time we keep an ARP entry before expiring it. Used to age out | |
2195 | old hosts. | |
2196 | ||
2197 | aarp-resolve-time | |
2198 | ----------------- | |
2199 | ||
2200 | The amount of time we will spend trying to resolve an Appletalk address. | |
2201 | ||
2202 | aarp-retransmit-limit | |
2203 | --------------------- | |
2204 | ||
2205 | The number of times we will retransmit a query before giving up. | |
2206 | ||
2207 | aarp-tick-time | |
2208 | -------------- | |
2209 | ||
2210 | Controls the rate at which expires are checked. | |
2211 | ||
2212 | The directory /proc/net/appletalk holds the list of active Appletalk sockets | |
2213 | on a machine. | |
2214 | ||
2215 | The fields indicate the DDP type, the local address (in network:node format), | |
2216 | the remote address, the size of the transmit pending queue, the size of the | |
2217 | received queue (bytes waiting for applications to read), the state and the uid | |
2218 | owning the socket. | |
2219 | ||
2220 | /proc/net/atalk_iface lists all the interfaces configured for Appletalk. It | |
2221 | shows the name of the interface, its Appletalk address, the network range on | |
2222 | that address (or network number for phase 1 networks), and the status of the | |
2223 | interface. | |
2224 | ||
2225 | /proc/net/atalk_route lists each known network route. It lists the target | |
2226 | (network) that the route leads to, the router (may be directly connected), the | |
2227 | route flags, and the device the route is using. | |
2228 | ||
2229 | 2.10 IPX | |
2230 | -------- | |
2231 | ||
2232 | The IPX protocol has no tunable values in /proc/sys/net. | |
2233 | ||
2234 | The IPX protocol does, however, provide /proc/net/ipx. This lists each IPX | |
2235 | socket giving the local and remote addresses in Novell format (that is | |
2236 | network:node:port). In accordance with the strange Novell tradition, | |
2237 | everything but the port is in hex. Not_Connected is displayed for sockets that | |
2238 | are not tied to a specific remote address. The Tx and Rx queue sizes indicate | |
2239 | the number of bytes pending for transmission and reception. The state | |
2240 | indicates the state the socket is in and the uid is the owning uid of the | |
2241 | socket. | |
2242 | ||
2243 | The /proc/net/ipx_interface file lists all IPX interfaces. For each interface | |
2244 | it gives the network number, the node number, and indicates if the network is | |
2245 | the primary network. It also indicates which device it is bound to (or | |
2246 | Internal for internal networks) and the Frame Type if appropriate. Linux | |
2247 | supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) ethernet framing for | |
2248 | IPX. | |
2249 | ||
2250 | The /proc/net/ipx_route table holds a list of IPX routes. For each route it | |
2251 | gives the destination network, the router node (or Directly) and the network | |
2252 | address of the router (or Connected) for internal networks. | |
2253 | ||
2254 | 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem | |
2255 | ---------------------------------------------------------- | |
2256 | ||
2257 | The "mqueue" filesystem provides the necessary kernel features to enable the | |
2258 | creation of a user space library that implements the POSIX message queues | |
2259 | API (as noted by the MSG tag in the POSIX 1003.1-2001 version of the System | |
2260 | Interfaces specification.) | |
2261 | ||
2262 | The "mqueue" filesystem contains values for determining/setting the amount of | |
2263 | resources used by the file system. | |
2264 | ||
2265 | /proc/sys/fs/mqueue/queues_max is a read/write file for setting/getting the | |
2266 | maximum number of message queues allowed on the system. | |
2267 | ||
2268 | /proc/sys/fs/mqueue/msg_max is a read/write file for setting/getting the | |
2269 | maximum number of messages in a queue. In fact it is the ceiling for another | |
2270 | (user) limit which is set in the mq_open invocation. This attribute of a queue | |
2271 | must be less than or equal to msg_max. | |
2272 | ||
2273 | /proc/sys/fs/mqueue/msgsize_max is a read/write file for setting/getting the | |
2274 | maximum message size value (it is every message queue's attribute set during | |
2275 | its creation). | |
2276 | ||
d7ff0dbf JFM |
2277 | 2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score |
2278 | ------------------------------------------------------ | |
2279 | ||
2280 | This file can be used to adjust the score used to select which processes | |
2281 | should be killed in an out-of-memory situation. Giving it a high score will | |
2282 | increase the likelihood of this process being killed by the oom-killer. Valid | |
2283 | values are in the range -16 to +15, plus the special value -17, which disables | |
2284 | oom-killing altogether for this process. | |
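
The accepted values can be checked before anything is written to the file. The
following Python sketch is illustrative only; the constant and function names
are hypothetical, and the optional proc_root parameter exists solely so the
write can be tried against a scratch directory.

```python
import os

OOM_DISABLE = -17                  # special value: never oom-kill this process
OOM_ADJ_MIN, OOM_ADJ_MAX = -16, 15


def is_valid_oom_adj(value):
    """Return True if `value` is acceptable for /proc/<pid>/oom_adj."""
    return value == OOM_DISABLE or OOM_ADJ_MIN <= value <= OOM_ADJ_MAX


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write `value` to <proc_root>/<pid>/oom_adj after validating it."""
    if not is_valid_oom_adj(value):
        raise ValueError("oom_adj must be -17 or within -16..15: %r" % value)
    with open(os.path.join(proc_root, str(pid), "oom_adj"), "w") as f:
        f.write("%d\n" % value)
```

Writing to the file of another user's process normally requires appropriate
privileges.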
2285 | ||
2286 | 2.13 /proc/<pid>/oom_score - Display current oom-killer score | |
2287 | ------------------------------------------------------------- | |
2288 | ||
2290 | This file can be used to check the current score used by the oom-killer for | |
2291 | any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which | |
2292 | process should be killed in an out-of-memory situation. | |
1da177e4 LT |
2293 | |
2294 | ------------------------------------------------------------------------------ | |
2295 | Summary | |
2296 | ------------------------------------------------------------------------------ | |
2297 | Certain aspects of kernel behavior can be modified at runtime, without the | |
2298 | need to recompile the kernel, or even to reboot the system. The files in the | |
2299 | /proc/sys tree can not only be read, but also modified. You can use the echo | |
2300 | command to write values into these files, thereby changing the default settings | |
2301 | of the kernel. | |
2302 | ------------------------------------------------------------------------------ | |
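
The same read and write access is available from a program. The Python sketch
below is a minimal illustration, not a definitive tool: the helper names are
hypothetical, and mapping a dotted name to a path is an assumed convention that
breaks for directory names which themselves contain dots.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted name such as 'net.ipv4.conf.all.rp_filter' to its file path."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a parameter as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()


def write_sysctl(name, value, root="/proc/sys"):
    """Same effect as echoing `value` into the file; usually needs root privileges."""
    with open(sysctl_path(name, root), "w") as f:
        f.write("%s\n" % value)
```

For example, write_sysctl("net.ipv4.conf.all.rp_filter", 1) would correspond to
echoing 1 into /proc/sys/net/ipv4/conf/all/rp_filter.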
f9c99463 RK |
2303 | |
2304 | 2.14 /proc/<pid>/io - Display the IO accounting fields | |
2305 | ------------------------------------------------------- | |
2306 | ||
2307 | This file contains IO statistics for each running process. | |
2308 | ||
2309 | Example | |
2310 | ------- | |
2311 | ||
2312 | test:/tmp # dd if=/dev/zero of=/tmp/test.dat & | |
2313 | [1] 3828 | |
2314 | ||
2315 | test:/tmp # cat /proc/3828/io | |
2316 | rchar: 323934931 | |
2317 | wchar: 323929600 | |
2318 | syscr: 632687 | |
2319 | syscw: 632675 | |
2320 | read_bytes: 0 | |
2321 | write_bytes: 323932160 | |
2322 | cancelled_write_bytes: 0 | |
2323 | ||
2324 | ||
2325 | Description | |
2326 | ----------- | |
2327 | ||
2328 | rchar | |
2329 | ----- | |
2330 | ||
2331 | I/O counter: chars read | |
2332 | The number of bytes which this task has caused to be read from storage. This | |
2333 | is simply the sum of bytes which this process passed to read() and pread(). | |
2334 | It includes things like tty IO and it is unaffected by whether or not actual | |
2335 | physical disk IO was required (the read might have been satisfied from | |
2336 | pagecache). | |
2337 | ||
2338 | ||
2339 | wchar | |
2340 | ----- | |
2341 | ||
2342 | I/O counter: chars written | |
2343 | The number of bytes which this task has caused, or shall cause, to be written | |
2344 | to disk. Similar caveats apply here as with rchar. | |
2345 | ||
2346 | ||
2347 | syscr | |
2348 | ----- | |
2349 | ||
2350 | I/O counter: read syscalls | |
2351 | Attempt to count the number of read I/O operations, i.e. syscalls like read() | |
2352 | and pread(). | |
2353 | ||
2354 | ||
2355 | syscw | |
2356 | ----- | |
2357 | ||
2358 | I/O counter: write syscalls | |
2359 | Attempt to count the number of write I/O operations, i.e. syscalls like | |
2360 | write() and pwrite(). | |
2361 | ||
2362 | ||
2363 | read_bytes | |
2364 | ---------- | |
2365 | ||
2366 | I/O counter: bytes read | |
2367 | Attempt to count the number of bytes which this process really did cause to | |
2368 | be fetched from the storage layer. Done at the submit_bio() level, so it is | |
2369 | accurate for block-backed filesystems. <please add status regarding NFS and | |
2370 | CIFS at a later time> | |
2371 | ||
2372 | ||
2373 | write_bytes | |
2374 | ----------- | |
2375 | ||
2376 | I/O counter: bytes written | |
2377 | Attempt to count the number of bytes which this process caused to be sent to | |
2378 | the storage layer. This is done at page-dirtying time. | |
2379 | ||
2380 | ||
2381 | cancelled_write_bytes | |
2382 | --------------------- | |
2383 | ||
2384 | The big inaccuracy here is truncate. If a process writes 1MB to a file and | |
2385 | then deletes the file, it will in fact perform no writeout. But it will have | |
2386 | been accounted as having caused 1MB of write. | |
2387 | In other words: The number of bytes which this process caused to not happen, | |
2388 | by truncating pagecache. A task can cause "negative" IO too. If this task | |
2389 | truncates some dirty pagecache, some IO which another task has been accounted | |
2390 | for (in its write_bytes) will not be happening. We _could_ just subtract that | |
2391 | from the truncating task's write_bytes, but there is information loss in doing | |
2392 | that. | |
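
The counters are plain "name: value" lines, so they are easy to read from a
program. The Python sketch below parses the sample output shown in the example
above; the function names are hypothetical. The difference it computes is only
a rough estimate of write-out and can be negative, because the cancelled bytes
may have been accounted to another task.

```python
SAMPLE = """\
rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
"""


def parse_io(text):
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def net_write_bytes(counters):
    """write_bytes minus cancelled_write_bytes (a rough estimate only)."""
    return counters["write_bytes"] - counters["cancelled_write_bytes"]


io = parse_io(SAMPLE)
print(io["syscw"], net_write_bytes(io))
```

On a live system the text would come from reading /proc/<pid>/io instead of the
SAMPLE string.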
2393 | ||
2394 | ||
2395 | Note | |
2396 | ---- | |
2397 | ||
2398 | At its current implementation state, this is a bit racy on 32-bit machines: if | |
2399 | process A reads process B's /proc/pid/io while process B is updating one of | |
2400 | those 64-bit counters, process A could see an intermediate result. | |
2401 | ||
2402 | ||
2403 | More information about this can be found within the taskstats documentation in | |
2404 | Documentation/accounting. | |
2405 | ||
bb90110d KH |
2406 | 2.15 /proc/<pid>/coredump_filter - Core dump filtering settings |
2407 | --------------------------------------------------------------- | |
2408 | When a process is dumped, all anonymous memory is written to a core file as | |
2409 | long as the size of the core file isn't limited. But sometimes we don't want | |
2410 | to dump some memory segments, for example, huge shared memory. Conversely, | |
2411 | sometimes we want to save file-backed memory segments into a core file, not | |
2412 | only the individual files. | |
2413 | ||
2414 | /proc/<pid>/coredump_filter allows you to customize which memory segments | |
2415 | will be dumped when the <pid> process is dumped. coredump_filter is a bitmask | |
2416 | of memory types. If a bit of the bitmask is set, memory segments of the | |
2417 | corresponding memory type are dumped, otherwise they are not dumped. | |
2418 | ||
e575f111 | 2419 | The following 7 memory types are supported: |
bb90110d KH |
2420 | - (bit 0) anonymous private memory |
2421 | - (bit 1) anonymous shared memory | |
2422 | - (bit 2) file-backed private memory | |
2423 | - (bit 3) file-backed shared memory | |
b261dfea HK |
2424 | - (bit 4) ELF header pages in file-backed private memory areas (it is |
2425 | effective only if the bit 2 is cleared) | |
e575f111 KM |
2426 | - (bit 5) hugetlb private memory |
2427 | - (bit 6) hugetlb shared memory | |
bb90110d KH |
2428 | |
2429 | Note that MMIO pages such as frame buffer are never dumped and vDSO pages | |
2430 | are always dumped regardless of the bitmask status. | |
2431 | ||
e575f111 KM |
2432 | Note that bits 0-4 don't affect any hugetlb memory. hugetlb memory is only | |
2433 | affected by bits 5-6. | |
2434 | ||
2435 | Default value of coredump_filter is 0x23; this means all anonymous memory | |
2436 | segments and hugetlb private memory are dumped. | |
bb90110d KH |
2437 | |
2438 | If you don't want to dump all shared memory segments attached to pid 1234, | |
e575f111 | 2439 | write 0x21 to the process's proc file. |
bb90110d | 2440 | |
e575f111 | 2441 | $ echo 0x21 > /proc/1234/coredump_filter |
bb90110d KH |
2442 | |
2443 | When a new process is created, the process inherits the bitmask status from its | |
2444 | parent. It is useful to set up coredump_filter before the program runs. | |
2445 | For example: | |
2446 | ||
2447 | $ echo 0x7 > /proc/self/coredump_filter | |
2448 | $ ./some_program | |
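
To see what a given bitmask selects, the bits can be decoded against the list
of memory types above. The Python sketch below is illustrative; the table and
function names are hypothetical, and the labels are shortened forms of the
descriptions in this section.

```python
MEMORY_TYPES = [
    "anonymous private",     # bit 0
    "anonymous shared",      # bit 1
    "file-backed private",   # bit 2
    "file-backed shared",    # bit 3
    "ELF header pages",      # bit 4
    "hugetlb private",       # bit 5
    "hugetlb shared",        # bit 6
]


def decode_coredump_filter(mask):
    """Return the names of the memory types whose bit is set in `mask`."""
    return [name for bit, name in enumerate(MEMORY_TYPES) if mask & (1 << bit)]


for mask in (0x23, 0x21, 0x7):
    print(hex(mask), decode_coredump_filter(mask))
```

Decoding 0x23 yields both anonymous types plus hugetlb private memory, which
agrees with the default described above.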
2449 | ||
2d4d4864 RP |
2450 | 2.16 /proc/<pid>/mountinfo - Information about mounts |
2451 | ----------------------------------------------------- | |
2452 | ||
2453 | This file contains lines of the form: | |
2454 | ||
2455 | 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue | |
2456 | (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11) | |
2457 | ||
2458 | (1) mount ID: unique identifier of the mount (may be reused after umount) | |
2459 | (2) parent ID: ID of parent (or of self for the top of the mount tree) | |
2460 | (3) major:minor: value of st_dev for files on filesystem | |
2461 | (4) root: root of the mount within the filesystem | |
2462 | (5) mount point: mount point relative to the process's root | |
2463 | (6) mount options: per mount options | |
2464 | (7) optional fields: zero or more fields of the form "tag[:value]" | |
2465 | (8) separator: marks the end of the optional fields | |
2466 | (9) filesystem type: name of filesystem of the form "type[.subtype]" | |
2467 | (10) mount source: filesystem specific information or "none" | |
2468 | (11) super options: per super block options | |
2469 | ||
2470 | Parsers should ignore all unrecognised optional fields. Currently the | |
2471 | possible optional fields are: | |
2472 | ||
2473 | shared:X mount is shared in peer group X | |
2474 | master:X mount is slave to peer group X | |
97e7e0f7 | 2475 | propagate_from:X mount is slave and receives propagation from peer group X (*) |
2d4d4864 RP |
2476 | unbindable mount is unbindable |
2477 | ||
97e7e0f7 MS |
2478 | (*) X is the closest dominant peer group under the process's root. If |
2479 | X is the immediate master of the mount, or if there's no dominant peer | |
2480 | group under the same root, then only the "master:X" field is present | |
2481 | and not the "propagate_from:X" field. | |
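
Because the number of optional fields varies, a parser has to locate the
separator (8) before it can read the last three fields. The Python sketch below
shows one way to do that for the example line above; the function name and the
dictionary keys are hypothetical. It keeps every optional field it finds, so a
caller can simply ignore tags it does not recognise.

```python
def parse_mountinfo_line(line):
    """Split one /proc/<pid>/mountinfo line into its eleven fields."""
    tokens = line.split()
    sep = tokens.index("-", 6)          # field (8): end of the optional fields
    optional = {}
    for field in tokens[6:sep]:
        tag, _, value = field.partition(":")
        optional[tag] = value or None   # e.g. "unbindable" carries no value
    return {
        "mount_id": int(tokens[0]),
        "parent_id": int(tokens[1]),
        "major_minor": tokens[2],
        "root": tokens[3],
        "mount_point": tokens[4],
        "mount_options": tokens[5].split(","),
        "optional": optional,
        "fs_type": tokens[sep + 1],
        "mount_source": tokens[sep + 2],
        "super_options": tokens[sep + 3].split(","),
    }


LINE = "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue"
info = parse_mountinfo_line(LINE)
print(info["mount_point"], info["fs_type"], info["optional"])
```

Searching for the separator from index 6 onward avoids mistaking an earlier
field for it when no optional fields are present.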
2482 | ||
2d4d4864 RP |
2483 | For more information on mount propagation see: |
2484 | ||
2485 | Documentation/filesystems/sharedsubtree.txt | |
2486 | ||
7ef9964e DL |
2487 | 2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface |
2488 | ----------------------------------------------------------------------- | |
2489 | ||
2490 | This directory contains configuration options for the epoll(7) interface. | |
2491 | ||
2492 | max_user_instances | |
2493 | ------------------ | |
2494 | ||
2495 | This is the maximum number of epoll file descriptors that a single user can | |
2496 | have open at a given time. The default value is 128, and should be enough | |
2497 | for normal users. | |
2498 | ||
2499 | max_user_watches | |
2500 | ---------------- | |
2501 | ||
2502 | Every epoll file descriptor can store a number of files to be monitored | |
2503 | for event readiness. Each one of these monitored files constitutes a "watch". | |
2504 | This configuration option sets the maximum number of "watches" that are | |
2505 | allowed for each user. | |
2506 | Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes | |
2507 | on a 64-bit one. | |
2508 | The current default value for max_user_watches is 1/32 of the available | |
2509 | low memory, divided by the "watch" cost in bytes. | |
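
As a worked example of that rule, the Python sketch below computes the default
for an assumed 1 GiB of low memory. The function name, the memory size and the
use of integer division are assumptions made for illustration; the per-watch
costs are the rough figures quoted above.

```python
def default_max_user_watches(low_memory_bytes, watch_cost_bytes):
    """1/32 of low memory, divided by the per-watch cost (integer division assumed)."""
    return (low_memory_bytes // 32) // watch_cost_bytes


GIB = 1 << 30   # hypothetical amount of low memory
print(default_max_user_watches(GIB, 160))   # 64-bit cost estimate
print(default_max_user_watches(GIB, 90))    # 32-bit cost estimate
```

With these inputs the limit comes out at roughly two hundred thousand watches
per user on a 64-bit kernel.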
2510 | ||
2511 | ||
f9c99463 | 2512 | ------------------------------------------------------------------------------ |
7ef9964e | 2513 |