Commit | Line | Data |
---|---|---|
e33e0a43 | 1 | perf-record(1) |
c1c2365a | 2 | ============== |
e33e0a43 IM |
3 | |
4 | NAME | |
5 | ---- | |
23ac9cbe | 6 | perf-record - Run a command and record its profile into perf.data |
e33e0a43 IM |
7 | |
8 | SYNOPSIS | |
9 | -------- | |
10 | [verse] | |
11 | 'perf record' [-e <EVENT> | --event=EVENT] [-l] [-a] <command> | |
9e096753 | 12 | 'perf record' [-e <EVENT> | --event=EVENT] [-l] [-a] -- <command> [<options>] |
e33e0a43 IM |
13 | |
14 | DESCRIPTION | |
15 | ----------- | |
16 | This command runs a command and gathers a performance counter profile | |
23ac9cbe | 17 | from it, into perf.data - without displaying anything. |
e33e0a43 IM |
18 | |
19 | This file can then be inspected later on, using 'perf report'. | |
20 | ||
21 | ||
22 | OPTIONS | |
23 | ------- | |
24 | <command>...:: | |
25 | Any command you can specify in a shell. | |
26 | ||
27 | -e:: | |
28 | --event=:: | |
1b290d67 | 29 | Select the PMU event. Selection can be: |
e33e0a43 | 30 | |
1b290d67 FW |
31 | - a symbolic event name (use 'perf list' to list all events) |
32 | ||
33 | - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a | |
34 | hexadecimal event descriptor. | |
35 | ||
f9ab9c19 CS |
36 | - a symbolically formed PMU event like 'pmu/param1=0x3,param2/' where |
37 | 'param1', 'param2', etc. are defined as formats for the PMU in | |
38 | /sys/bus/event_source/devices/<pmu>/format/*. | |
39 | ||
40 | - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/' | |
41 | ||
42 | where M, N, K are numbers (in decimal, hex, octal format). Acceptable | |
43 | values for each of 'config', 'config1' and 'config2' are defined by | |
44 | corresponding entries in /sys/bus/event_source/devices/<pmu>/format/* | |
45 | param1 and param2 are defined as formats for the PMU in: | |
46 | /sys/bus/event_source/devices/<pmu>/format/* | |
47 | ||
3d5d68aa | 48 | There are also some params which are not defined in .../<pmu>/format/*. |
ee4c7588 | 49 | These params can be used to override default config values per event. |
3d5d68aa KL |
50 | Here is a list of the params. |
51 | - 'period': Set event sampling period | |
09af2a55 | 52 | - 'freq': Set event sampling frequency |
32067712 KL |
53 | - 'time': Disable/enable time stamping. Acceptable values are 1 to |
54 | enable time stamping and 0 to disable it. | |
55 | The default is 1. | |
d457c963 | 56 | - 'call-graph': Disable/enable callgraph. Acceptable strings are "fp" for |
f9db0d0f KL |
57 | FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and |
58 | "no" to disable callgraph. | |
d457c963 | 59 | - 'stack-size': user stack size for dwarf mode |
3d5d68aa KL |
60 | Note: If the user explicitly sets options that conflict with the params, |
61 | the values set by the params will be overridden. | |
62 | ||
3741eb9f | 63 | - a hardware breakpoint event in the form of '\mem:addr[/len][:access]' |
1b290d67 FW |
64 | where addr is the memory address you want to break on. |
65 | Access is the memory access type (read, write, execute); it can | |
3741eb9f JS |
66 | be passed as follows: '\mem:addr[:[r][w][x]]'. len is the range, |
67 | the number of bytes from the specified addr that the breakpoint will cover. | |
1b290d67 FW |
68 | If you want to profile read-write accesses at 0x1000, just set |
69 | 'mem:0x1000:rw'. | |
3741eb9f JS |
70 | If you want to profile write accesses in [0x1000, 0x1008), just set |
71 | 'mem:0x1000/8:w'. | |
08dbd7e3 | 72 | |
9a75606c NK |
73 | - a group of events surrounded by a pair of braces ("{event1,event2,...}"). |
74 | Events are separated by commas and the group should be quoted to | |
75 | prevent shell interpretation. You also need to use --group on | |
76 | "perf report" to view group events together. | |
77 | ||
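The event forms listed above can be mixed on one command line by repeating -e. The following is a minimal sketch that only assembles and prints such a command. The workload name ./myapp, the raw descriptor r1a8 and the address 0x1000 are placeholders chosen for illustration, not recommendations.

```shell
#!/bin/sh
# Placeholder workload; substitute the program you want to profile.
workload=./myapp

# A quoted group, so the shell does not interpret the braces.
group='{cycles,instructions}'

# One symbolic event, one raw event (rNNN, hexadecimal descriptor) and one
# hardware breakpoint covering 8 bytes from 0x1000 for write accesses.
# The command is only printed here, so the sketch is safe to try anywhere.
cmd="perf record -e cycles -e r1a8 -e mem:0x1000/8:w -e '$group' -- $workload"
echo "$cmd"
```

To view the grouped events together afterwards, 'perf report' needs --group, as noted in the list above.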
08dbd7e3 | 78 | --filter=<filter>:: |
4ba1faa1 WN |
79 | Event filter. This option should follow an event selector (-e) which |
80 | selects tracepoint event(s). Multiple '--filter' options are combined | |
81 | using '&&'. | |
82 | ||
83 | --exclude-perf:: | |
84 | Don't record events issued by perf itself. This option should follow | |
85 | an event selector (-e) which selects tracepoint event(s). It adds a | |
86 | filter expression 'common_pid != $PERFPID' to filters. If other | |
87 | '--filter' exists, the new filter expression will be combined with | |
88 | them by '&&'. | |
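As a rough illustration of how several '--filter' expressions and the one added by '--exclude-perf' end up as a single expression, the sketch below joins its arguments with '&&'. The helper name combine_filters is hypothetical, and the exact string perf builds internally (for example any parentheses it adds) may differ.

```shell
#!/bin/sh
# Hypothetical helper: join filter expressions with '&&', as the text describes.
combine_filters() {
    combined=""
    for f in "$@"; do
        if [ -z "$combined" ]; then
            combined="$f"
        else
            combined="$combined && $f"
        fi
    done
    printf '%s\n' "$combined"
}

# Two tracepoint filters plus the expression that --exclude-perf adds.
combine_filters 'prev_pid != 0' 'next_pid != 0' 'common_pid != $PERFPID'

# A matching invocation might look like this (not executed here):
#   perf record -e sched:sched_switch --filter 'prev_pid != 0' \
#       --filter 'next_pid != 0' --exclude-perf -a
```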
08dbd7e3 | 89 | |
e33e0a43 | 90 | -a:: |
08dbd7e3 SB |
91 | --all-cpus:: |
92 | System-wide collection from all CPUs. | |
e33e0a43 | 93 | |
386c0b70 ACM |
94 | -p:: |
95 | --pid=:: | |
b52956c9 | 96 | Record events on existing process ID (comma separated list). |
08dbd7e3 SB |
97 | |
98 | -t:: | |
99 | --tid=:: | |
b52956c9 | 100 | Record events on existing thread ID (comma separated list). |
69e7e5b0 AH |
101 | This option also disables inheritance by default. Enable it by adding |
102 | --inherit. | |
386c0b70 | 103 | |
0d37aa34 ACM |
104 | -u:: |
105 | --uid=:: | |
106 | Record events in threads owned by uid. Name or number. | |
107 | ||
386c0b70 ACM |
108 | -r:: |
109 | --realtime=:: | |
110 | Collect data with this RT SCHED_FIFO priority. | |
563aecb2 | 111 | |
509051ea | 112 | --no-buffering:: |
acac03fa | 113 | Collect data without buffering. |
386c0b70 | 114 | |
386c0b70 ACM |
115 | -c:: |
116 | --count=:: | |
117 | Event period to sample. | |
118 | ||
119 | -o:: | |
120 | --output=:: | |
121 | Output file name. | |
122 | ||
123 | -i:: | |
2e6cdf99 SE |
124 | --no-inherit:: |
125 | Child tasks do not inherit counters. | |
386c0b70 ACM |
126 | -F:: |
127 | --freq=:: | |
128 | Profile at this frequency. | |
129 | ||
130 | -m:: | |
131 | --mmap-pages=:: | |
27050f53 JO |
132 | Number of mmap data pages (must be a power of two) or size |
133 | specification with appended unit character - B/K/M/G. The | |
134 | size is rounded up to the nearest power-of-two number of pages. | |
e9db1310 AH |
135 | Also, by adding a comma, the number of mmap pages for AUX |
136 | area tracing can be specified. | |
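The rounding described for size specifications can be sketched as follows. The helper name pages_for_size is hypothetical, and the 4096-byte default page size is an assumption; check `getconf PAGESIZE` on the target system.

```shell
#!/bin/sh
# Hypothetical helper: number of mmap pages for a size given in bytes,
# rounded up to a power of two. The second argument is the page size in
# bytes and defaults to 4096 (an assumption).
pages_for_size() {
    bytes=$1
    page_size=${2:-4096}
    # Pages needed to hold the requested size (ceiling division).
    needed=$(( (bytes + page_size - 1) / page_size ))
    pages=1
    while [ "$pages" -lt "$needed" ]; do
        pages=$(( pages * 2 ))
    done
    echo "$pages"
}

# 3M is 768 pages of 4096 bytes, which rounds up to 1024 pages.
pages_for_size $(( 3 * 1024 * 1024 ))
```

Under these assumptions, `-m 3M` would behave like `-m 1024`.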
386c0b70 | 137 | |
9a75606c NK |
138 | --group:: |
139 | Put all events in a single event group. This option predates the --event | |
140 | option and remains only for backward compatibility. See --event. | |
141 | ||
386c0b70 | 142 | -g:: |
09b0fd45 JO |
143 | Enables call-graph (stack chain/backtrace) recording. |
144 | ||
386c0b70 | 145 | --call-graph:: |
09b0fd45 JO |
146 | Setup and enable call-graph (stack chain/backtrace) recording, |
147 | implies -g. | |
148 | ||
149 | Allows specifying "fp" (frame pointer) or "dwarf" | |
aad2b21c KL |
150 | (DWARF's CFI - Call Frame Information) or "lbr" |
151 | (Hardware Last Branch Record facility) as the method to collect | |
09b0fd45 JO |
152 | the information used to show the call graphs. |
153 | ||
154 | In some systems, where binaries are built with gcc | |
155 | -fomit-frame-pointer, using the "fp" method will produce bogus | |
156 | call graphs; "dwarf", if available (perf tools linked to | |
157 | the libunwind library), should be used instead. | |
aad2b21c KL |
158 | Using the "lbr" method doesn't require any compiler options. It |
159 | will produce call graphs from the hardware LBR registers. The | |
160 | main limitation is that it is only available on new Intel | |
161 | platforms, such as Haswell. It can only capture the user call chain. It | |
162 | doesn't work with branch stack sampling at the same time. | |
386c0b70 | 163 | |
b44308f5 ACM |
164 | -q:: |
165 | --quiet:: | |
166 | Don't print any message, useful for scripting. | |
167 | ||
386c0b70 ACM |
168 | -v:: |
169 | --verbose:: | |
170 | Be more verbose (show counter open errors, etc). | |
171 | ||
172 | -s:: | |
173 | --stat:: | |
1f91d5fd NK |
174 | Record per-thread event counts. Use it with 'perf report -T' to see |
175 | the values. | |
386c0b70 ACM |
176 | |
177 | -d:: | |
178 | --data:: | |
56100321 | 179 | Record the sample addresses. |
386c0b70 | 180 | |
9c90a61c ACM |
181 | -T:: |
182 | --timestamp:: | |
56100321 PZ |
183 | Record the sample timestamps. Use it with 'perf report -D' to see the |
184 | timestamps, for instance. | |
185 | ||
186 | -P:: | |
187 | --period:: | |
188 | Record the sample period. | |
9c90a61c | 189 | |
386c0b70 ACM |
190 | -n:: |
191 | --no-samples:: | |
192 | Don't sample. | |
e33e0a43 | 193 | |
ec7ba4ea FW |
194 | -R:: |
195 | --raw-samples:: | |
bdef3b02 | 196 | Collect raw sample records from all opened counters (default for tracepoint counters). |
ec7ba4ea | 197 | |
c45c6ea2 SE |
198 | -C:: |
199 | --cpu:: | |
08dbd7e3 SB |
200 | Collect samples only on the list of CPUs provided. Multiple CPUs can be provided as a |
201 | comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. | |
c45c6ea2 SE |
202 | In per-thread mode with inheritance mode on (default), samples are captured only when |
203 | the thread executes on the designated CPUs. Default is to monitor all CPUs. | |
204 | ||
a1ac1d3c SE |
205 | -N:: |
206 | --no-buildid-cache:: | |
96355f2c | 207 | Do not update the buildid cache. This saves some overhead in situations |
a1ac1d3c SE |
208 | where the information in the perf.data file (which includes buildids) |
209 | is sufficient. | |
210 | ||
023695d9 SE |
211 | -G name,...:: |
212 | --cgroup name,...:: | |
213 | Monitor only in the container (cgroup) called "name". This option is available only | |
214 | in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to | |
215 | container "name" are monitored when they run on the monitored CPUs. Multiple cgroups | |
216 | can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup | |
217 | to first event, second cgroup to second event and so on. It is possible to provide | |
218 | an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have | |
219 | corresponding events, i.e., they always refer to events defined earlier on the command | |
220 | line. | |
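The positional pairing of cgroups with events can be illustrated with a small sketch. The helper name pair_cgroups is hypothetical; it handles only plain comma-separated event names (no brace groups) and prints the pairing rather than invoking perf.

```shell
#!/bin/sh
# Hypothetical helper: show which cgroup applies to which event.
# $1 is a comma separated event list, $2 a comma separated cgroup list.
pair_cgroups() {
    events=$1
    cgroups=$2
    while [ -n "$events" ]; do
        ev=${events%%,*}      # first remaining event
        cg=${cgroups%%,*}     # first remaining cgroup (may be empty)
        echo "$ev -> ${cg:-<all>}"
        # Drop the entry just consumed from each list.
        case $events in *,*) events=${events#*,} ;; *) events= ;; esac
        case $cgroups in *,*) cgroups=${cgroups#*,} ;; *) cgroups= ;; esac
    done
}

# An empty cgroup entry means the event is monitored all the time.
pair_cgroups cycles,instructions,branches foo,,bar
```

With the example from the text, -G foo,,bar, the first event is tied to "foo", the second is monitored all the time, and the third is tied to "bar".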
221 | ||
bdfebd84 | 222 | -b:: |
a5aabdac SE |
223 | --branch-any:: |
224 | Enable taken branch stack sampling. Any type of taken branch may be sampled. | |
225 | This is a shortcut for --branch-filter any. See --branch-filter for more information. | |
226 | ||
227 | -j:: | |
228 | --branch-filter:: | |
bdfebd84 RAV |
229 | Enable taken branch stack sampling. Each sample captures a series of consecutive |
230 | taken branches. The number of branches captured with each sample depends on the | |
231 | underlying hardware, the type of branches of interest, and the executed code. | |
232 | It is possible to select the types of branches captured by enabling filters. The | |
233 | following filters are defined: | |
234 | ||
a5aabdac | 235 | - any: any type of branch |
bdfebd84 RAV |
236 | - any_call: any function call or system call |
237 | - any_ret: any function return or system call return | |
2e49a948 | 238 | - ind_call: any indirect branch |
bdfebd84 RAV |
239 | - u: only when the branch target is at the user level |
240 | - k: only when the branch target is in the kernel | |
241 | - hv: only when the target is at the hypervisor level | |
0126d493 AK |
242 | - in_tx: only when the target is in a hardware transaction |
243 | - no_tx: only when the target is not in a hardware transaction | |
244 | - abort_tx: only when the target is a hardware transaction abort | |
3e39db4a | 245 | - cond: conditional branches |
bdfebd84 RAV |
246 | |
247 | + | |
3e39db4a | 248 | The option requires at least one branch type among any, any_call, any_ret, ind_call, cond. |
9c768207 | 249 | The privilege levels may be omitted, in which case, the privilege levels of the associated |
a5aabdac SE |
250 | event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege |
251 | levels are subject to permissions. When sampling on multiple events, branch stack sampling | |
252 | is enabled for all the sampling events. The sampled branch type is the same for all events. | |
253 | The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k | |
254 | Note that this feature may not be available on all processors. | |
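The requirement that a filter list name at least one branch type can be checked before invoking perf. The sketch below follows the rule as stated in the text; the helper name has_branch_type is hypothetical, and perf's own parsing may be more lenient.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the comma separated filter list in $1
# contains at least one branch type (any, any_call, any_ret, ind_call, cond).
has_branch_type() {
    old_ifs=$IFS
    IFS=,
    for item in $1; do
        case $item in
            any|any_call|any_ret|ind_call|cond)
                IFS=$old_ifs
                return 0 ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

filter=any_ret,u,k
if has_branch_type "$filter"; then
    # Printed rather than executed; ./myapp is a placeholder workload.
    echo "perf record --branch-filter $filter -- ./myapp"
else
    echo "filter '$filter' names only privilege levels" >&2
fi
```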
bdfebd84 | 255 | |
05484298 AK |
256 | --weight:: |
257 | Enable weighted sampling. An additional weight is recorded per sample and can be | |
258 | displayed with the weight and local_weight sort keys. This currently works for TSX | |
259 | abort events and some memory events in precise mode on modern Intel CPUs. | |
260 | ||
475eeab9 AK |
261 | --transaction:: |
262 | Record transaction flags for transaction related events. | |
263 | ||
3aa5939d AH |
264 | --per-thread:: |
265 | Use per-thread mmaps. By default per-cpu mmaps are created. This option | |
266 | overrides that and uses per-thread mmaps. A side-effect of that is that | |
267 | inheritance is automatically disabled. --per-thread is ignored with a warning | |
268 | if combined with -a or -C options. | |
539e6bb7 | 269 | |
a6205a35 ACM |
270 | -D:: |
271 | --delay=:: | |
6619a53e AK |
272 | After starting the program, wait msecs before measuring. This is useful to |
273 | filter out the startup phase of the program, which is often very different. | |
274 | ||
4b6c5177 SE |
275 | -I:: |
276 | --intr-regs:: | |
277 | Capture machine state (registers) at interrupt, i.e., on counter overflows for | |
278 | each sample. List of captured registers depends on the architecture. This option | |
bcc84ec6 SE |
279 | is off by default. It is possible to select the registers to sample using their |
280 | symbolic names, e.g. on x86, ax, si. To list the available registers use | |
281 | --intr-regs=\?. To name registers, pass a comma separated list such as | |
282 | --intr-regs=ax,bx. The list of registers is architecture dependent. | |
283 | ||
4b6c5177 | 284 | |
85c273d2 AK |
285 | --running-time:: |
286 | Record running and enabled time for read events (:S) | |
287 | ||
814c8c38 PZ |
288 | -k:: |
289 | --clockid:: | |
290 | Sets the clock id to use for the various time fields in the perf_event_type | |
291 | records. See clock_gettime(). In particular CLOCK_MONOTONIC and | |
292 | CLOCK_MONOTONIC_RAW are supported, some events might also allow | |
293 | CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI. | |
294 | ||
2dd6d8a1 AH |
295 | -S:: |
296 | --snapshot:: | |
297 | Select AUX area tracing Snapshot Mode. This option is valid only with an | |
298 | AUX area tracing event. Optionally the number of bytes to capture per | |
299 | snapshot can be specified. In Snapshot Mode, trace data is captured only when | |
300 | signal SIGUSR2 is received. | |
301 | ||
9d9cad76 KL |
302 | --proc-map-timeout:: |
303 | When processing the /proc/XXX/maps files of pre-existing threads, it may take a long time, | |
304 | because the files may be huge. A timeout is needed in such cases. | |
305 | This option sets the timeout limit. The default value is 500 ms. | |
306 | ||
b757bb09 AH |
307 | --switch-events:: |
308 | Record context switch events i.e. events of type PERF_RECORD_SWITCH or | |
309 | PERF_RECORD_SWITCH_CPU_WIDE. | |
310 | ||
e33e0a43 IM |
311 | SEE ALSO |
312 | -------- | |
386b05e3 | 313 | linkperf:perf-stat[1], linkperf:perf-list[1] |