| 1 | perf-report(1) |
| 2 | ============== |
| 3 | |
| 4 | NAME |
| 5 | ---- |
| 6 | perf-report - Read perf.data (created by perf record) and display the profile |
| 7 | |
| 8 | SYNOPSIS |
| 9 | -------- |
| 10 | [verse] |
| 11 | 'perf report' [-i <file> | --input=file] |
| 12 | |
| 13 | DESCRIPTION |
| 14 | ----------- |
| 15 | This command displays the performance counter profile information recorded |
| 16 | via perf record. |
| 17 | |
| 18 | OPTIONS |
| 19 | ------- |
| 20 | -i:: |
| 21 | --input=:: |
| 22 | Input file name. (default: perf.data unless stdin is a fifo) |
| 23 | |
| 24 | -v:: |
| 25 | --verbose:: |
| 26 | Be more verbose. (show symbol address, etc) |
| 27 | |
| 28 | -n:: |
| 29 | --show-nr-samples:: |
| 30 | Show the number of samples for each symbol |
| 31 | |
| 32 | --show-cpu-utilization:: |
| 33 | Show sample percentage for different cpu modes. |
| 34 | |
| 35 | -T:: |
| 36 | --threads:: |
| 37 | Show per-thread event counters. The input data file should be recorded |
| 38 | with the -s option. |
| 39 | -c:: |
| 40 | --comms=:: |
| 41 | Only consider symbols in these comms. CSV that understands |
| 42 | file://filename entries. This option will affect the percentage of |
| 43 | the overhead column. See --percentage for more info. |
| 44 | --pid=:: |
| 45 | Only show events for given process ID (comma separated list). |
| 46 | |
| 47 | --tid=:: |
| 48 | Only show events for given thread ID (comma separated list). |
| 49 | -d:: |
| 50 | --dsos=:: |
| 51 | Only consider symbols in these dsos. CSV that understands |
| 52 | file://filename entries. This option will affect the percentage of |
| 53 | the overhead column. See --percentage for more info. |
| 54 | -S:: |
| 55 | --symbols=:: |
| 56 | Only consider these symbols. CSV that understands |
| 57 | file://filename entries. This option will affect the percentage of |
| 58 | the overhead column. See --percentage for more info. |
| 59 | |
| 60 | --symbol-filter=:: |
| 61 | Only show symbols that match (partially) with this filter. |
| 62 | |
| 63 | -U:: |
| 64 | --hide-unresolved:: |
| 65 | Only display entries resolved to a symbol. |
| 66 | |
| 67 | -s:: |
| 68 | --sort=:: |
| 69 | Sort histogram entries by given key(s) - multiple keys can be specified |
| 70 | in CSV format. Following sort keys are available: |
| 71 | pid, comm, dso, symbol, parent, cpu, socket, srcline, weight, local_weight. |
| 72 | |
| 73 | Each key has the following meaning: |
| 74 | |
| 75 | - comm: command (name) of the task which can be read via /proc/<pid>/comm |
| 76 | - pid: command and tid of the task |
| 77 | - dso: name of library or module executed at the time of sample |
| 78 | - symbol: name of function executed at the time of sample |
| 79 | - parent: name of function matched to the parent regex filter. Unmatched |
| 80 | entries are displayed as "[other]". |
| 81 | - cpu: cpu number the task ran on at the time of sample |
| 82 | - socket: processor socket number the task ran on at the time of sample |
| 83 | - srcline: filename and line number executed at the time of sample. The |
| 84 | DWARF debugging info must be provided. |
| 85 | - srcfile: file name of the source file of the sample. Requires DWARF |
| 86 | information. |
| 87 | - weight: Event specific weight, e.g. memory latency or transaction |
| 88 | abort cost. This is the global weight. |
| 89 | - local_weight: Local weight version of the weight above. |
| 90 | - transaction: Transaction abort flags. |
| 91 | - overhead: Overhead percentage of sample |
| 92 | - overhead_sys: Overhead percentage of sample running in system mode |
| 93 | - overhead_us: Overhead percentage of sample running in user mode |
| 94 | - overhead_guest_sys: Overhead percentage of sample running in system mode |
| 95 | on guest machine |
| 96 | - overhead_guest_us: Overhead percentage of sample running in user mode on |
| 97 | guest machine |
| 98 | - sample: Number of samples |
| 99 | - period: Raw number of event count of sample |
| 100 | |
| 101 | By default, comm, dso and symbol keys are used. |
| 102 | (i.e. --sort comm,dso,symbol) |
| 103 | |
| 104 | If the --branch-stack option is used, the following sort keys are also |
| 105 | available: |
| 106 | |
| 107 | - dso_from: name of library or module branched from |
| 108 | - dso_to: name of library or module branched to |
| 109 | - symbol_from: name of function branched from |
| 110 | - symbol_to: name of function branched to |
| 111 | - srcline_from: source file and line branched from |
| 112 | - srcline_to: source file and line branched to |
| 113 | - mispredict: "N" for predicted branch, "Y" for mispredicted branch |
| 114 | - in_tx: branch in TSX transaction |
| 115 | - abort: TSX transaction abort. |
| 116 | - cycles: Cycles in basic block |
| 117 | |
| 118 | And default sort keys are changed to comm, dso_from, symbol_from, dso_to |
| 119 | and symbol_to, see '--branch-stack'. |
| 120 | |
| 121 | If the --mem-mode option is used, the following sort keys are also available |
| 122 | (incompatible with --branch-stack): |
| 123 | symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline. |
| 124 | |
| 125 | - symbol_daddr: name of data symbol being executed on at the time of sample |
| 126 | - dso_daddr: name of library or module containing the data being executed |
| 127 | on at the time of the sample |
| 128 | - locked: whether the bus was locked at the time of the sample |
| 129 | - tlb: type of tlb access for the data at the time of the sample |
| 130 | - mem: type of memory access for the data at the time of the sample |
| 131 | - snoop: type of snoop (if any) for the data at the time of the sample |
| 132 | - dcacheline: the cacheline the data address is on at the time of the sample |
| 133 | |
| 134 | And the default sort keys are changed to local_weight, mem, sym, dso, |
| 135 | symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'. |
| 136 | |
| 137 | If the data file has tracepoint event(s), the following (dynamic) sort keys |
| 138 | are also available: |
| 139 | trace, trace_fields, [<event>.]<field>[/raw] |
| 140 | |
| 141 | - trace: pretty printed trace output in a single column |
| 142 | - trace_fields: fields in tracepoints in separate columns |
| 143 | - <field name>: optional event and field name for a specific field |
| 144 | |
| 145 | The last form consists of event and field names. If the event name is |
| 146 | omitted, all events are searched for a matching field name. The matched |
| 147 | field is shown only for events that have the field. The event name |
| 148 | supports substring matching, so the user doesn't need to specify the full |
| 149 | subsystem and event name every time. For example, the 'sched:sched_switch' |
| 150 | event can be shortened to 'switch' as long as it's not ambiguous. An event |
| 151 | can also be specified by its index (starting from 1) preceded by '%'. |
| 152 | So '%1' is the first event, '%2' is the second, and so on. |
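The event lookup described above can be sketched in Python. This is only an illustration of the documented behaviour, not perf's implementation; `resolve_event` and the event list are hypothetical names, and treating an ambiguous substring as an error is an assumption.

```python
def resolve_event(spec, events):
    """Resolve an event given as '%<index>' or as a substring of its name.

    `events` is the list of event names in the data file, in order.
    Hypothetical helper: it mirrors the description above, not perf's code.
    """
    if spec.startswith("%"):
        index = int(spec[1:])  # indexes start from 1
        if not 1 <= index <= len(events):
            raise ValueError("no event with index %d" % index)
        return events[index - 1]
    # Substring match: accept it only when exactly one event matches.
    matches = [name for name in events if spec in name]
    if len(matches) != 1:
        raise ValueError("%r matches %d events" % (spec, len(matches)))
    return matches[0]


events = ["sched:sched_switch", "sched:sched_wakeup", "irq:irq_handler_entry"]
print(resolve_event("switch", events))
print(resolve_event("%2", events))
```

With this event list, 'switch' is unambiguous, while 'sched' would match two events and be rejected.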
| 153 | |
| 154 | The field name can have '/raw' suffix which disables pretty printing |
| 155 | and shows raw field value like hex numbers. The --raw-trace option |
| 156 | has the same effect for all dynamic sort keys. |
| 157 | |
| 158 | The default sort keys are changed to 'trace' if all events in the data |
| 159 | file are tracepoints. |
| 160 | |
| 161 | -F:: |
| 162 | --fields=:: |
| 163 | Specify output field - multiple keys can be specified in CSV format. |
| 164 | Following fields are available: |
| 165 | overhead, overhead_sys, overhead_us, overhead_children, sample and period. |
| 166 | Also it can contain any sort key(s). |
| 167 | |
| 168 | By default, every sort key not specified in -F will be appended |
| 169 | automatically. |
| 170 | |
| 171 | -p:: |
| 172 | --parent=<regex>:: |
| 173 | A regex filter to identify parent. The parent is a caller of this |
| 174 | function and searched through the callchain, thus it requires callchain |
| 175 | information recorded. The pattern is in the extended regex format and |
| 176 | defaults to "\^sys_|^do_page_fault", see '--sort parent'. |
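As a rough illustration of how the parent filter classifies an entry, the Python sketch below walks a callchain and reports the first symbol matching the default pattern, falling back to "[other]" as described for the parent sort key. The helper name `find_parent` and the assumption that the first match in the chain wins are illustrative, not taken from perf's source.

```python
import re

# Default pattern quoted from the option description above.
DEFAULT_PARENT_PATTERN = r"^sys_|^do_page_fault"


def find_parent(callchain, pattern=DEFAULT_PARENT_PATTERN):
    """Return the first symbol in the callchain that matches the regex.

    Assumption: the chain is a list of symbol names and the first match
    is taken as the parent. Unmatched chains are reported as "[other]".
    """
    regex = re.compile(pattern)
    for symbol in callchain:
        if regex.search(symbol):
            return symbol
    return "[other]"


print(find_parent(["memcpy", "vfs_read", "sys_read"]))
print(find_parent(["foo", "main"]))
```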
| 177 | |
| 178 | -x:: |
| 179 | --exclude-other:: |
| 180 | Only display entries with parent-match. |
| 181 | |
| 182 | -w:: |
| 183 | --column-widths=<width[,width...]>:: |
| 184 | Force each column width to the provided list, for large terminal |
| 185 | readability. 0 means no limit (default behavior). |
| 186 | |
| 187 | -t:: |
| 188 | --field-separator=:: |
| 189 | Use a special separator character and don't pad with spaces, replacing |
| 190 | all occurrences of this separator in symbol names (and other output) |
| 191 | with a '.' character, which is thus the only invalid separator. |
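The replacement rule can be sketched as follows. This is a minimal Python illustration with a hypothetical `format_row` helper, not the code perf uses.

```python
def format_row(fields, separator):
    """Join fields with `separator`, without padding.

    Any occurrence of the separator inside a field is replaced by '.',
    so the separator appears only between fields in the output.
    """
    return separator.join(str(f).replace(separator, ".") for f in fields)


# A C++ symbol that contains the chosen separator (',').
print(format_row(["12.50%", "app", "ns::f(int,int)"], ","))
```

If '.' itself were chosen as the separator, the replacement would change nothing and fields containing '.' could no longer be told apart, which is why '.' is not a usable separator.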
| 192 | |
| 193 | -D:: |
| 194 | --dump-raw-trace:: |
| 195 | Dump raw trace in ASCII. |
| 196 | |
| 197 | -g:: |
| 198 | --call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>:: |
| 199 | Display call chains using type, min percent threshold, print limit, |
| 200 | call order, sort key, optional branch and value. Note that ordering of |
| 201 | parameters is not fixed so any parameter can be given in an arbitrary order. |
| 202 | One exception is the print_limit which should be preceded by threshold. |
| 203 | |
| 204 | print_type can be either: |
| 205 | - flat: single column, linear exposure of call chains. |
| 206 | - graph: use a graph tree, displaying absolute overhead rates. (default) |
| 207 | - fractal: like graph, but displays relative rates. Each branch of |
| 208 | the tree is considered as a new profiled object. |
| 209 | - folded: call chains are displayed in a line, separated by semicolons |
| 210 | - none: disable call chain display. |
| 211 | |
| 212 | threshold is a percentage value which specifies a minimum percent to be |
| 213 | included in the output call graph. Default is 0.5 (%). |
| 214 | |
| 215 | print_limit is only applied when the stdio interface is used. It limits |
| 216 | the number of call graph entries in a single hist entry. Note that it needs |
| 217 | to be given after threshold (but not necessarily consecutive). |
| 218 | Default is 0 (unlimited). |
| 219 | |
| 220 | order can be either: |
| 221 | - callee: callee based call graph. |
| 222 | - caller: inverted caller based call graph. |
| 223 | Default is 'caller' when --children is used, otherwise 'callee'. |
| 224 | |
| 225 | sort_key can be: |
| 226 | - function: compare on functions (default) |
| 227 | - address: compare on individual code addresses |
| 228 | |
| 229 | branch can be: |
| 230 | - branch: include last branch information in callgraph when available. |
| 231 | Usually more convenient to use --branch-history for this. |
| 232 | |
| 233 | value can be: |
| 234 | - percent: display overhead percent (default) |
| 235 | - period: display event period |
| 236 | - count: display event count |
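To make the ordering rule concrete, here is a small Python sketch of a parser for the --call-graph argument. It uses only the keywords and defaults listed above; the function name `parse_call_graph`, the dictionary layout and the handling of empty tokens are assumptions made for illustration and do not reflect perf's actual parser.

```python
def parse_call_graph(arg, children=False):
    """Parse a --call-graph style argument into a dict of settings."""
    opts = {
        "print_type": "graph",
        "threshold": 0.5,
        "print_limit": 0,
        # Default order depends on whether --children is in use.
        "order": "caller" if children else "callee",
        "sort_key": "function",
        "branch": False,
        "value": "percent",
    }
    keywords = {
        "print_type": {"flat", "graph", "fractal", "folded", "none"},
        "order": {"caller", "callee"},
        "sort_key": {"function", "address"},
        "value": {"percent", "period", "count"},
    }
    seen_threshold = False
    for token in arg.split(","):
        token = token.strip()
        if not token:
            continue
        if token == "branch":
            opts["branch"] = True
            continue
        for name, allowed in keywords.items():
            if token in allowed:
                opts[name] = token
                break
        else:
            # Not a keyword, so it must be a number: the first number is
            # the threshold, a later one is the print_limit.
            if not seen_threshold:
                opts["threshold"] = float(token)
                seen_threshold = True
            else:
                opts["print_limit"] = int(token)
    return opts


print(parse_call_graph("caller,fractal,1.5,10"))
```

Keywords may appear in any position, but a print_limit is only recognised once a threshold has been seen.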
| 237 | |
| 238 | --children:: |
| 239 | Accumulate callchain of children to parent entry so that they can |
| 240 | show up in the output. The output will have a new "Children" column |
| 241 | and will be sorted on the data. It requires that callchains are recorded. |
| 242 | See the `overhead calculation' section for more details. |
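The difference between an entry's own overhead and its "Children" overhead can be sketched in Python as below. The sample data, the `accumulate` helper and the convention that each callchain is listed from the sampled function up to its outermost caller are all assumptions made for this illustration.

```python
from collections import defaultdict


def accumulate(samples):
    """Return (self_period, children_period) per symbol.

    `samples` is a list of (callchain, period) pairs. Each callchain is
    assumed to be non-empty and ordered from the sampled function to its
    outermost caller.
    """
    self_period = defaultdict(int)
    children_period = defaultdict(int)
    for callchain, period in samples:
        # Self overhead goes only to the function that was executing.
        self_period[callchain[0]] += period
        # Children overhead goes to every function on the chain, counted
        # once per sample even if the function recurses.
        for symbol in set(callchain):
            children_period[symbol] += period
    return dict(self_period), dict(children_period)


samples = [
    (["foo", "bar", "main"], 60),  # sampled in foo, called from bar
    (["bar", "main"], 40),         # sampled in bar itself
]
self_p, children_p = accumulate(samples)
print(self_p)
print(children_p)
```

In this made-up profile main never appears as the sampled function, yet it accumulates the full period through its callees.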
| 243 | |
| 244 | --max-stack:: |
| 245 | Set the stack depth limit when parsing the callchain, anything |
| 246 | beyond the specified depth will be ignored. This is a trade-off |
| 247 | between information loss and faster processing especially for |
| 248 | workloads that can have a very long callchain stack. |
| 249 | Note that when using the --itrace option the synthesized callchain size |
| 250 | will override this value if the synthesized callchain size is bigger. |
| 251 | |
| 252 | Default: 127 |
| 253 | |
| 254 | -G:: |
| 255 | --inverted:: |
| 256 | alias for inverted caller based call graph. |
| 257 | |
| 258 | --ignore-callees=<regex>:: |
| 259 | Ignore callees of the function(s) matching the given regex. |
| 260 | This has the effect of collecting the callers of each such |
| 261 | function into one place in the call-graph tree. |
| 262 | |
| 263 | --pretty=<key>:: |
| 264 | Pretty printing style. key: normal, raw |
| 265 | |
| 266 | --stdio:: Use the stdio interface. |
| 267 | |
| 268 | --tui:: Use the TUI interface, which is integrated with annotate and allows |
| 269 | zooming into DSOs or threads, among other features. Use of --tui |
| 270 | requires a tty; if one is not present, as when piping to other |
| 271 | commands, the stdio interface is used. |
| 272 | |
| 273 | --gtk:: Use the GTK2 interface. |
| 274 | |
| 275 | -k:: |
| 276 | --vmlinux=<file>:: |
| 277 | vmlinux pathname |
| 278 | |
| 279 | --kallsyms=<file>:: |
| 280 | kallsyms pathname |
| 281 | |
| 282 | -m:: |
| 283 | --modules:: |
| 284 | Load module symbols. WARNING: This should only be used with -k and |
| 285 | a LIVE kernel. |
| 286 | |
| 287 | -f:: |
| 288 | --force:: |
| 289 | Don't do ownership validation. |
| 290 | |
| 291 | --symfs=<directory>:: |
| 292 | Look for files with symbols relative to this directory. |
| 293 | |
| 294 | -C:: |
| 295 | --cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can |
| 296 | be provided as a comma-separated list with no space: 0,1. Ranges of |
| 297 | CPUs are specified with -: 0-2. Default is to report samples on all |
| 298 | CPUs. |
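The CPU list syntax can be illustrated with a short Python sketch. `parse_cpu_list` is a hypothetical helper that accepts the two forms described above (comma-separated numbers and '-' ranges); it is not perf's parser.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0,1' or '0-2' into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range like '0-2' includes both end points.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))
print(parse_cpu_list("0-2"))
```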
| 299 | |
| 300 | -M:: |
| 301 | --disassembler-style=:: Set disassembler style for objdump. |
| 302 | |
| 303 | --source:: |
| 304 | Interleave source code with assembly code. Enabled by default, |
| 305 | disable with --no-source. |
| 306 | |
| 307 | --asm-raw:: |
| 308 | Show raw instruction encoding of assembly instructions. |
| 309 | |
| 310 | --show-total-period:: Show a column with the sum of periods. |
| 311 | |
| 312 | -I:: |
| 313 | --show-info:: |
| 314 | Display extended information about the perf.data file. This adds |
| 315 | information which may be very large and thus may clutter the display. |
| 316 | It currently includes: cpu and numa topology of the host system. |
| 317 | |
| 318 | -b:: |
| 319 | --branch-stack:: |
| 320 | Use the addresses of sampled taken branches instead of the instruction |
| 321 | address to build the histograms. To generate meaningful output, the |
| 322 | perf.data file must have been obtained using perf record -b or |
| 323 | perf record --branch-filter xxx where xxx is a branch filter option. |
| 324 | perf report is able to auto-detect whether a perf.data file contains |
| 325 | branch stacks and it will automatically switch to the branch view mode, |
| 326 | unless --no-branch-stack is used. |
| 327 | |
| 328 | --branch-history:: |
| 329 | Add the addresses of sampled taken branches to the callstack. |
| 330 | This allows examining the path the program took to each sample. |
| 331 | The data collection must have used -b (or -j) and -g. |
| 332 | |
| 333 | --objdump=<path>:: |
| 334 | Path to objdump binary. |
| 335 | |
| 336 | --group:: |
| 337 | Show event group information together. |
| 338 | |
| 339 | --demangle:: |
| 340 | Demangle symbol names to human readable form. It's enabled by default, |
| 341 | disable with --no-demangle. |
| 342 | |
| 343 | --demangle-kernel:: |
| 344 | Demangle kernel symbol names to human readable form (for C++ kernels). |
| 345 | |
| 346 | --mem-mode:: |
| 347 | Use the data addresses of samples in addition to instruction addresses |
| 348 | to build the histograms. To generate meaningful output, the perf.data |
| 349 | file must have been obtained using perf record -d -W and using a |
| 350 | special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See |
| 351 | 'perf mem' for simpler access. |
| 352 | |
| 353 | --percent-limit:: |
| 354 | Do not show entries which have an overhead under that percent. |
| 355 | (Default: 0). Note that this option also sets the percent limit (threshold) |
| 356 | of callchains. However the default value of callchain threshold is |
| 357 | different than the default value of hist entries. Please see the |
| 358 | --call-graph option for details. |
| 359 | |
| 360 | --percentage:: |
| 361 | Determine how to display the overhead percentage of filtered entries. |
| 362 | Filters can be applied by --comms, --dsos and/or --symbols options and |
| 363 | Zoom operations on the TUI (thread, dso, etc). |
| 364 | |
| 365 | "relative" means it's relative to filtered entries only so that the |
| 366 | sum of shown entries will always be 100%. "absolute" means it retains |
| 367 | the original value before and after the filter is applied. |
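The two modes can be compared with a small Python sketch. The periods, the filter and the `overhead_percentages` helper are made up for illustration; only the relative/absolute distinction comes from the description above.

```python
def overhead_percentages(periods, keep, mode="absolute"):
    """Compute the overhead column for the entries that pass a filter.

    `periods` maps an entry name to its event period and `keep` is a
    predicate deciding which entries stay visible after filtering.
    """
    total = sum(periods.values())
    shown = {name: p for name, p in periods.items() if keep(name)}
    # "relative" divides by the filtered total, "absolute" by the
    # unfiltered total.
    base = sum(shown.values()) if mode == "relative" else total
    if base == 0:
        return {name: 0.0 for name in shown}
    return {name: 100.0 * p / base for name, p in shown.items()}


periods = {"a": 50, "b": 30, "c": 20}


def not_c(name):
    return name != "c"


print(overhead_percentages(periods, not_c, mode="absolute"))
print(overhead_percentages(periods, not_c, mode="relative"))
```

With entry "c" filtered out, the absolute values keep their original share of the whole profile, while the relative values are rescaled so that the visible entries add up to 100%.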
| 368 | |
| 369 | --header:: |
| 370 | Show header information in the perf.data file. This includes |
| 371 | various information like hostname, OS and perf version, cpu/mem |
| 372 | info, perf command line, event list and so on. Currently only |
| 373 | --stdio output supports this feature. |
| 374 | |
| 375 | --header-only:: |
| 376 | Show only perf.data header (forces --stdio). |
| 377 | |
| 378 | --itrace:: |
| 379 | Options for decoding instruction tracing data. The options are: |
| 380 | |
| 381 | include::itrace.txt[] |
| 382 | |
| 383 | To disable decoding entirely, use --no-itrace. |
| 384 | |
| 385 | --full-source-path:: |
| 386 | Show the full path for source files for srcline output. |
| 387 | |
| 388 | --show-ref-call-graph:: |
| 389 | When multiple events are sampled, it may not be necessary to collect |
| 390 | callgraphs for all of them. The sample sites are usually nearby, |
| 391 | and it's enough to collect the callgraphs on a reference event. |
| 392 | So the user can use the "call-graph=no" event modifier to disable callgraphs |
| 393 | for other events to reduce the overhead. |
| 394 | However, perf report cannot show callgraphs for events that have |
| 395 | callgraphs disabled. |
| 396 | This option extends perf report to show the reference callgraphs, |
| 397 | which were collected by the reference event, in the no-callgraph events. |
| 398 | |
| 399 | --socket-filter:: |
| 400 | Only report the samples on the processor socket that matches this filter. |
| 401 | |
| 402 | --raw-trace:: |
| 403 | When displaying traceevent output, do not use print fmt or plugins. |
| 404 | |
| 405 | --hierarchy:: |
| 406 | Enable hierarchical output. |
| 407 | |
| 408 | include::callchain-overhead-calculation.txt[] |
| 409 | |
| 410 | SEE ALSO |
| 411 | -------- |
| 412 | linkperf:perf-stat[1], linkperf:perf-annotate[1] |