Commit | Line | Data |
---|---|---|
aa46a7e0 EGM |
1 | kmemtrace - Kernel Memory Tracer |
2 | ||
3 | by Eduard - Gabriel Munteanu | |
4 | <eduard.munteanu@linux360.ro> | |
5 | ||
6 | I. Introduction | |
7 | =============== | |
8 | ||
9 | kmemtrace helps kernel developers figure out two things: | |
10 | 1) how different allocators (SLAB, SLUB etc.) perform | |
11 | 2) how kernel code allocates memory and how much | |
12 | ||
13 | To do this, we trace every allocation and export information to userspace | |
14 | through the relay interface. We export things such as the number of requested | |
15 | bytes, the number of bytes actually allocated (i.e. including internal | |
16 | fragmentation), whether this is a slab allocation or a plain kmalloc() and so | |
17 | on. | |
18 | ||
19 | The actual analysis is performed by a userspace tool (see section III for | |
20 | details on where to get it from). It logs the data exported by the kernel, | |
21 | processes it and (as of writing this) can provide the following information: | |
22 | - the total amount of memory allocated and fragmentation per call-site | |
23 | - the amount of memory allocated and fragmentation per allocation | |
24 | - total memory allocated and fragmentation in the collected dataset | |
25 | - number of cross-CPU allocations and frees (makes sense in NUMA environments) | |
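The per-call-site figures listed above boil down to comparing requested bytes with allocated bytes. The following is a minimal sketch of that arithmetic only. It assumes a made-up text input with one allocation per line (call-site, bytes requested, bytes allocated); this is not kmemtrace's real record format, and the function name frag_per_call_site is hypothetical.

```shell
#!/bin/sh
# Hypothetical input, one allocation per line:
#   <call-site> <bytes requested> <bytes allocated>
# This is NOT the real kmemtrace record format, only an illustration.
frag_per_call_site() {
    awk '
        { req[$1] += $2; alloc[$1] += $3 }
        END {
            for (site in req) {
                wasted = alloc[site] - req[site]
                # Share of allocated bytes lost to internal fragmentation.
                pct = alloc[site] ? 100 * wasted / alloc[site] : 0
                printf "%s requested=%d allocated=%d wasted=%d (%.1f%%)\n",
                       site, req[site], alloc[site], wasted, pct
            }
        }
    ' "$@" | sort
}
```

Feeding it the three lines "foo 100 128", "bar 24 32" and "foo 28 32" would report 8 wasted bytes for bar and 32 for foo. The real tool works on the binary data exported through relay, so treat this purely as an explanation of what "fragmentation per call-site" means.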
26 | ||
27 | Moreover, it can potentially find inconsistent and erroneous behavior in | |
28 | kernel code, such as using slab free functions on kmalloc'ed memory or | |
29 | allocating less memory than requested (but not truly failed allocations). | |
30 | ||
31 | kmemtrace also makes provisions for tracing on one arch and analysing the | |
32 | data on another. | |
33 | ||
34 | II. Design and goals | |
35 | ==================== | |
36 | ||
37 | kmemtrace was designed to handle rather large amounts of data. Thus, it uses | |
38 | the relay interface to export whatever is logged to userspace, which then | |
39 | stores it. Analysis and reporting are done asynchronously, that is, after the | |
40 | data is collected and stored. By design, it allows one to log and analyse | |
41 | on different machines and different arches. | |
42 | ||
43 | As of writing this, the ABI is not considered stable, though it might not | |
44 | change much. However, no guarantees are made about compatibility yet. When | |
45 | deemed stable, the ABI should still allow easy extension while maintaining | |
46 | backward compatibility. This is described further in Documentation/ABI. | |
47 | ||
48 | Summary of design goals: | |
49 | - allow logging and analysis to be done across different machines | |
50 | - be fast and anticipate usage in high-load environments (*) | |
51 | - be reasonably extensible | |
52 | - make it possible for GNU/Linux distributions to have kmemtrace | |
53 | included in their repositories | |
54 | ||
55 | (*) - one of the reasons Pekka Enberg's original userspace data analysis | |
56 | tool's code was rewritten from Perl to C (although this is more than a | |
57 | simple conversion) | |
58 | ||
59 | ||
60 | III. Quick usage guide | |
61 | ====================== | |
62 | ||
63 | 1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable | |
bf6803d6 | 64 | CONFIG_KMEMTRACE). |
65 | |
66 | 2) Get the userspace tool and build it: | |
ff2f5ff0 | 67 | $ git clone git://repo.or.cz/kmemtrace-user.git # current repository |
68 | $ cd kmemtrace-user/ |
69 | $ ./autogen.sh | |
70 | $ ./configure | |
71 | $ make | |
72 | ||
73 | 3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the | |
74 | 'single' runlevel (so that relay buffers don't fill up easily), and run | |
75 | kmemtrace: | |
76 | # '$' does not mean user, but root here. | |
77 | $ mount -t debugfs none /sys/kernel/debug | |
78 | $ mount -t proc none /proc | |
79 | $ cd path/to/kmemtrace-user/ | |
80 | $ ./kmemtraced | |
81 | Wait a bit, then stop it with CTRL+C. | |
82 | $ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't | |
83 | # overrun, should | |
84 | # be zero. | |
85 | $ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to | |
86 | check its correctness] | |
87 | $ ./kmemtrace-report | |
88 | ||
89 | Now you should have a nice and short summary of how the allocator performs. | |
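The optional step of running kmemtrace_check on each cpu[0-9]*.out file can be scripted with a small loop. The sketch below is one possible way to do it. It assumes the checker takes the trace file name as its only argument and signals problems through a non-zero exit status; neither assumption is confirmed by this document. The function name check_traces is hypothetical.

```shell
# check_traces CHECKER FILE...
# Runs CHECKER once per trace file and prints the files it rejected.
# Returns non-zero if any file failed.
check_traces() {
    checker=$1
    shift
    status=0
    for trace in "$@"; do
        [ -e "$trace" ] || continue   # unmatched glob: nothing to check
        if ! "$checker" "$trace"; then
            echo "check failed: $trace"
            status=1
        fi
    done
    return "$status"
}

# Usage (assumption: the checker accepts a file name argument):
#   check_traces ./kmemtrace_check cpu[0-9]*.out
```

Passing the checker as a parameter keeps the loop independent of the tool's exact name and location.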
90 | ||
91 | IV. FAQ and known issues | |
92 | ======================== | |
93 | ||
94 | Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix | |
95 | this? Should I worry? | |
96 | A: If it's non-zero, this affects kmemtrace's accuracy, depending on how | |
97 | large the number is. You can fix it by supplying a higher | |
98 | 'kmemtrace.subbufs=N' kernel parameter. | |
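The overrun check described in this answer can be wrapped in a small helper. This is a sketch only: it assumes total_overruns contains a single decimal number, and the function name report_overruns is hypothetical.

```shell
# report_overruns FILE
# Prints a hint based on the counter stored in FILE.
# Returns 0 if the counter is zero, 1 if it is not, 2 if FILE is unreadable.
report_overruns() {
    count=$(cat "$1") || return 2
    if [ "$count" -eq 0 ]; then
        echo "no overruns"
    else
        echo "$count overruns: consider booting with a larger kmemtrace.subbufs=N"
        return 1
    fi
}

# Usage, as root with debugfs mounted:
#   report_overruns /sys/kernel/debug/kmemtrace/total_overruns
```

The helper only reports the counter; a suitable value of N still has to be found by experiment.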
99 | --- | |
100 | ||
101 | Q: kmemtrace_check reports errors, how do I fix this? Should I worry? | |
102 | A: This is a bug and should be reported. It can occur for a variety of | |
103 | reasons: | |
104 | - possible bugs in relay code | |
105 | - possible misuse of relay by kmemtrace | |
106 | - timestamps being collected out of order | |
107 | Or you may fix it yourself and send us a patch. | |
108 | --- | |
109 | ||
110 | Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? | |
111 | A: This is a known issue and I'm working on it. These might be true errors | |
112 | in kernel code, which may have inconsistent behavior (e.g. allocating memory | |
113 | with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed | |
114 | out this behavior may work with SLAB, but may fail with other allocators. | |
115 | ||
116 | It may also be due to lack of tracing in some unusual allocator functions. | |
117 | ||
118 | We don't want bug reports regarding this issue yet. | |
119 | --- | |
120 | ||
121 | V. See also | |
122 | =========== | |
123 | ||
124 | Documentation/kernel-parameters.txt | |
125 | Documentation/ABI/testing/debugfs-kmemtrace | |
126 |