tools/perf/Documentation/perf-report.txt

   1 perf-report(1)
   2 ==============
   3
   4 NAME
   5 ----
   6 perf-report - Read perf.data (created by perf record) and display the profile
   7
   8 SYNOPSIS
   9 --------
  10 [verse]
  11 'perf report' [-i <file> | --input=file]
  12
  13 DESCRIPTION
  14 -----------
  15 This command displays the performance counter profile information recorded
  16 via perf record.
  17
  18 OPTIONS
  19 -------
  20 -i::
  21 --input=::
  22         Input file name. (default: perf.data unless stdin is a fifo)
  23
  24 -v::
  25 --verbose::
  26         Be more verbose. (show symbol address, etc)
  27
  28 -n::
  29 --show-nr-samples::
  30         Show the number of samples for each symbol
  31
  32 --showcpuutilization::
  33         Show sample percentage for different cpu modes.
  34
  35 -T::
  36 --threads::
  37         Show per-thread event counters.  The input data file should be recorded
  38         with -s option.
  39 -c::
  40 --comms=::
  41         Only consider symbols in these comms. CSV that understands
  42         file://filename entries.  This option will affect the percentage of
  43         the overhead column.  See --percentage for more info.
  44 --pid=::
  45         Only show events for given process ID (comma separated list).
  46
  47 --tid=::
  48         Only show events for given thread ID (comma separated list).
  49 -d::
  50 --dsos=::
  51         Only consider symbols in these dsos. CSV that understands
  52         file://filename entries.  This option will affect the percentage of
  53         the overhead column.  See --percentage for more info.
  54 -S::
  55 --symbols=::
  56         Only consider these symbols. CSV that understands
  57         file://filename entries.  This option will affect the percentage of
  58         the overhead column.  See --percentage for more info.
  59
  60 --symbol-filter=::
  61         Only show symbols that match (partially) with this filter.
  62
  63 -U::
  64 --hide-unresolved::
  65         Only display entries resolved to a symbol.
  66
  67 -s::
  68 --sort=::
  69         Sort histogram entries by given key(s) - multiple keys can be specified
  70         in CSV format.  Following sort keys are available:
  71         pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight.
  72
  73         Each key has following meaning:
  74
  75         - comm: command (name) of the task which can be read via /proc/<pid>/comm
  76         - pid: command and tid of the task
  77         - dso: name of library or module executed at the time of sample
  78         - symbol: name of function executed at the time of sample
  79         - parent: name of function matched to the parent regex filter. Unmatched
  80         entries are displayed as "[other]".
  81         - cpu: cpu number the task ran at the time of sample
  82         - srcline: filename and line number executed at the time of sample.  The
  83         DWARF debugging info must be provided.
  84         - weight: Event specific weight, e.g. memory latency or transaction
  85         abort cost. This is the global weight.
  86         - local_weight: Local weight version of the weight above.
  87         - transaction: Transaction abort flags.
  88         - overhead: Overhead percentage of sample
  89         - overhead_sys: Overhead percentage of sample running in system mode
  90         - overhead_us: Overhead percentage of sample running in user mode
  91         - overhead_guest_sys: Overhead percentage of sample running in system mode
  92         on guest machine
  93         - overhead_guest_us: Overhead percentage of sample running in user mode on
  94         guest machine
   95         - sample: Number of samples
  96         - period: Raw number of event count of sample
  97
  98         By default, comm, dso and symbol keys are used.
  99         (i.e. --sort comm,dso,symbol)
 100
 101         If --branch-stack option is used, following sort keys are also
 102         available:
 103         dso_from, dso_to, symbol_from, symbol_to, mispredict.
 104
 105         - dso_from: name of library or module branched from
 106         - dso_to: name of library or module branched to
 107         - symbol_from: name of function branched from
 108         - symbol_to: name of function branched to
 109         - mispredict: "N" for predicted branch, "Y" for mispredicted branch
 110         - in_tx: branch in TSX transaction
 111         - abort: TSX transaction abort.
 112         - cycles: Cycles in basic block
 113
 114         And default sort keys are changed to comm, dso_from, symbol_from, dso_to
 115         and symbol_to, see '--branch-stack'.
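The effect of the sort keys can be pictured as grouping samples into histogram entries. The following is a minimal Python sketch of that idea, not perf's actual implementation; the sample dictionaries, the `period` field layout and the `build_histogram` helper are hypothetical names used for illustration only.

```python
from collections import defaultdict

def build_histogram(samples, sort_keys=("comm", "dso", "symbol")):
    """Group samples by the given sort keys and compute each entry's overhead.

    Illustrative only: 'samples' is a hypothetical list of dicts, one per sample.
    """
    totals = defaultdict(int)
    for sample in samples:
        # The tuple of key values identifies one histogram entry.
        entry = tuple(sample[key] for key in sort_keys)
        totals[entry] += sample["period"]
    grand_total = sum(totals.values())
    rows = [
        (entry, period, 100.0 * period / grand_total)
        for entry, period in totals.items()
    ]
    # Largest overhead first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

samples = [
    {"comm": "app", "dso": "libc.so", "symbol": "memcpy", "period": 300},
    {"comm": "app", "dso": "app", "symbol": "main", "period": 100},
    {"comm": "app", "dso": "libc.so", "symbol": "memcpy", "period": 500},
    {"comm": "sh", "dso": "libc.so", "symbol": "memcpy", "period": 100},
]

for entry, period, percent in build_histogram(samples):
    print(f"{percent:6.2f}%  {period:6d}  {' '.join(entry)}")
```

Passing fewer keys, for example `build_histogram(samples, ("dso",))`, merges entries that differ only in the omitted keys, which mirrors what a shorter --sort list does.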
 116
 117 -F::
 118 --fields=::
 119         Specify output field - multiple keys can be specified in CSV format.
 120         Following fields are available:
 121         overhead, overhead_sys, overhead_us, overhead_children, sample and period.
 122         Also it can contain any sort key(s).
 123
  124         By default, every sort key not specified in -F will be appended
 125         automatically.
 126
 127         If --mem-mode option is used, following sort keys are also available
 128         (incompatible with --branch-stack):
 129         symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.
 130
 131         - symbol_daddr: name of data symbol being executed on at the time of sample
 132         - dso_daddr: name of library or module containing the data being executed
 133         on at the time of sample
 134         - locked: whether the bus was locked at the time of sample
 135         - tlb: type of tlb access for the data at the time of sample
 136         - mem: type of memory access for the data at the time of sample
 137         - snoop: type of snoop (if any) for the data at the time of sample
 138         - dcacheline: the cacheline the data address is on at the time of sample
 139
 140         And default sort keys are changed to local_weight, mem, sym, dso,
 141         symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.
 142
 143 -p::
 144 --parent=<regex>::
 145         A regex filter to identify parent. The parent is a caller of this
 146         function and searched through the callchain, thus it requires callchain
  147         information recorded. The pattern is in the extended regex format and
 148         defaults to "\^sys_|^do_page_fault", see '--sort parent'.
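The parent lookup can be sketched as a walk along the callchain. This is a rough Python illustration under stated assumptions, not perf's code: the callchain is assumed to be a list ordered from the sampled function outward to its callers, the sampled function itself is assumed not to count as its own parent, and `find_parent` is a hypothetical helper.

```python
import re

# Default pattern as documented for --parent.
DEFAULT_PARENT_PATTERN = r"^sys_|^do_page_fault"

def find_parent(callchain, pattern=DEFAULT_PARENT_PATTERN):
    """Return the first caller matching the regex, or "[other]" if none does.

    Assumption: callchain[0] is the sampled function and the remaining
    elements are its callers, innermost first.
    """
    regex = re.compile(pattern)
    for function in callchain[1:]:  # skip the sampled function itself
        if regex.search(function):
            return function
    return "[other]"

print(find_parent(["memcpy", "copy_user", "sys_read", "entry"]))
print(find_parent(["main", "_start"]))
```

Entries for which the helper returns "[other]" correspond to the ones that -x (--exclude-other) leaves out.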
 149
 150 -x::
 151 --exclude-other::
 152         Only display entries with parent-match.
 153
 154 -w::
 155 --column-widths=<width[,width...]>::
 156         Force each column width to the provided list, for large terminal
 157         readability.  0 means no limit (default behavior).
 158
 159 -t::
 160 --field-separator=::
  161         Use a special separator character and don't pad with spaces, replacing
  162         all occurrences of this separator in symbol names (and other output)
  163         with a '.' character, which is therefore the only invalid separator.
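The substitution rule can be illustrated with a small Python sketch. It only mimics the behavior described for this option; `format_row` is a hypothetical helper and the field values are made up.

```python
def format_row(fields, separator):
    """Join fields with the separator, replacing it with '.' inside each field."""
    cleaned = [str(field).replace(separator, ".") for field in fields]
    return separator.join(cleaned)

# The comma inside the symbol name becomes '.', so the row still splits
# into exactly three columns.
print(format_row(["12.50%", "app", "std::map<int, int>::find"], ","))

# With '.' as the separator, dots inside fields cannot be told apart from
# column boundaries, which is why '.' is not a usable separator.
print(format_row(["1.5", "a"], "."))
```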
 164
 165 -D::
 166 --dump-raw-trace::
 167         Dump raw trace in ASCII.
 168
 169 -g [type,min[,limit],order[,key][,branch]]::
 170 --call-graph::
 171         Display call chains using type, min percent threshold, optional print
 172         limit and order.
 173         type can be either:
 174         - flat: single column, linear exposure of call chains.
 175         - graph: use a graph tree, displaying absolute overhead rates.
 176         - fractal: like graph, but displays relative rates. Each branch of
 177                  the tree is considered as a new profiled object. +
 178
 179         order can be either:
 180         - callee: callee based call graph.
 181         - caller: inverted caller based call graph.
 182
 183         key can be:
 184         - function: compare on functions
 185         - address: compare on individual code addresses
 186
 187         branch can be:
 188         - branch: include last branch information in callgraph
 189         when available. Usually more convenient to use --branch-history
 190         for this.
 191
 192         Default: fractal,0.5,callee,function.
 193
 194 --children::
  195         Accumulate callchain of children to parent entry so that they can
  196         show up in the output.  The output will have a new "Children" column
  197         and will be sorted on that data.  It requires that callchains are recorded.
 198         See the `overhead calculation' section for more details.
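As a rough illustration of the accumulation, the Python sketch below credits each sample's period to the sampled function as "self" and to every function on its callchain as "children". It is a simplified model under stated assumptions, not perf's implementation: samples are hypothetical `(callchain, period)` pairs, with `callchain[0]` being the sampled function followed by its callers.

```python
from collections import defaultdict

def accumulate_children(samples):
    """Return {function: (self_period, children_period)}.

    Assumption: each sample is (callchain, period) and callchain[0] is the
    sampled function, followed by its callers.
    """
    self_period = defaultdict(int)
    children_period = defaultdict(int)
    for callchain, period in samples:
        self_period[callchain[0]] += period
        # A set ensures a recursive function is counted once per sample.
        for function in set(callchain):
            children_period[function] += period
    return {f: (self_period[f], children_period[f]) for f in children_period}

# main calls foo, foo calls bar; 60 units sampled in bar, 40 in foo.
samples = [
    (["bar", "foo", "main"], 60),
    (["foo", "main"], 40),
]
for name, (own, total) in sorted(accumulate_children(samples).items()):
    print(f"{name}: self={own} children={total}")
```

In this model `main` has no self period at all but still appears with the full children total, which is the kind of entry the "Children" column is meant to surface.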
 199
 200 --max-stack::
 201         Set the stack depth limit when parsing the callchain, anything
 202         beyond the specified depth will be ignored. This is a trade-off
 203         between information loss and faster processing especially for
 204         workloads that can have a very long callchain stack.
 205
 206         Default: 127
 207
 208 -G::
 209 --inverted::
 210         alias for inverted caller based call graph.
 211
 212 --ignore-callees=<regex>::
 213         Ignore callees of the function(s) matching the given regex.
 214         This has the effect of collecting the callers of each such
 215         function into one place in the call-graph tree.
 216
 217 --pretty=<key>::
 218         Pretty printing style.  key: normal, raw
 219
 220 --stdio:: Use the stdio interface.
 221
 222 --tui:: Use the TUI interface, that is integrated with annotate and allows
 223         zooming into DSOs or threads, among other features. Use of --tui
  224         requires a tty; if one is not present, as when piping to other
 225         commands, the stdio interface is used.
 226
 227 --gtk:: Use the GTK2 interface.
 228
 229 -k::
 230 --vmlinux=<file>::
 231         vmlinux pathname
 232
 233 --kallsyms=<file>::
 234         kallsyms pathname
 235
 236 -m::
 237 --modules::
 238         Load module symbols. WARNING: This should only be used with -k and
 239         a LIVE kernel.
 240
 241 -f::
 242 --force::
 243         Don't complain, do it.
 244
 245 --symfs=<directory>::
 246         Look for files with symbols relative to this directory.
 247
 248 -C::
 249 --cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
 250         be provided as a comma-separated list with no space: 0,1. Ranges of
 251         CPUs are specified with -: 0-2. Default is to report samples on all
 252         CPUs.
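The list syntax described above (comma-separated values and ranges written with '-') can be expanded as in the following Python sketch. `parse_cpu_list` is a hypothetical helper written for illustration; it does not reproduce perf's own parser or its error handling.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0,1" or "0-2" into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range such as "0-2" includes both endpoints.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

print(parse_cpu_list("0,1"))
print(parse_cpu_list("0-2"))
print(parse_cpu_list("0,2-4,7"))
```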
 253
 254 -M::
 255 --disassembler-style=:: Set disassembler style for objdump.
 256
 257 --source::
 258         Interleave source code with assembly code. Enabled by default,
 259         disable with --no-source.
 260
 261 --asm-raw::
 262         Show raw instruction encoding of assembly instructions.
 263
 264 --show-total-period:: Show a column with the sum of periods.
 265
 266 -I::
 267 --show-info::
 268         Display extended information about the perf.data file. This adds
 269         information which may be very large and thus may clutter the display.
 270         It currently includes: cpu and numa topology of the host system.
 271
 272 -b::
 273 --branch-stack::
 274         Use the addresses of sampled taken branches instead of the instruction
 275         address to build the histograms. To generate meaningful output, the
 276         perf.data file must have been obtained using perf record -b or
 277         perf record --branch-filter xxx where xxx is a branch filter option.
 278         perf report is able to auto-detect whether a perf.data file contains
 279         branch stacks and it will automatically switch to the branch view mode,
 280         unless --no-branch-stack is used.
 281
 282 --branch-history::
 283         Add the addresses of sampled taken branches to the callstack.
  284         This allows examining the path the program took to each sample.
 285         The data collection must have used -b (or -j) and -g.
 286
 287 --objdump=<path>::
 288         Path to objdump binary.
 289
 290 --group::
 291         Show event group information together.
 292
 293 --demangle::
 294         Demangle symbol names to human readable form. It's enabled by default,
 295         disable with --no-demangle.
 296
 297 --demangle-kernel::
 298         Demangle kernel symbol names to human readable form (for C++ kernels).
 299
 300 --mem-mode::
 301         Use the data addresses of samples in addition to instruction addresses
 302         to build the histograms.  To generate meaningful output, the perf.data
 303         file must have been obtained using perf record -d -W and using a
 304         special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See
 305         'perf mem' for simpler access.
 306
 307 --percent-limit::
 308         Do not show entries which have an overhead under that percent.
 309         (Default: 0).
 310
 311 --percentage::
 312         Determine how to display the overhead percentage of filtered entries.
 313         Filters can be applied by --comms, --dsos and/or --symbols options and
 314         Zoom operations on the TUI (thread, dso, etc).
 315
 316         "relative" means it's relative to filtered entries only so that the
 317         sum of shown entries will be always 100%.  "absolute" means it retains
 318         the original value before and after the filter is applied.
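The difference between the two modes comes down to which total the percentage is divided by. The Python sketch below shows this arithmetic; `overhead_percentages`, the entry names and the periods are hypothetical and only illustrate the description above.

```python
def overhead_percentages(entries, keep, mode="relative"):
    """Compute the percentage shown for each entry that passes the filter.

    entries: {name: period}; keep: predicate deciding which entries are shown.
    """
    shown = {name: period for name, period in entries.items() if keep(name)}
    if mode == "relative":
        total = sum(shown.values())    # only the filtered entries
    elif mode == "absolute":
        total = sum(entries.values())  # all entries, filtered or not
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {name: 100.0 * period / total for name, period in shown.items()}

entries = {"memcpy": 500, "main": 300, "idle": 200}
not_idle = lambda name: name != "idle"

print(overhead_percentages(entries, not_idle, "relative"))
print(overhead_percentages(entries, not_idle, "absolute"))
```

With the filter applied, the relative values add up to 100%, while the absolute values keep what each entry had before filtering and add up to less.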
 319
 320 --header::
 321         Show header information in the perf.data file.  This includes
 322         various information like hostname, OS and perf version, cpu/mem
 323         info, perf command line, event list and so on.  Currently only
 324         --stdio output supports this feature.
 325
 326 --header-only::
 327         Show only perf.data header (forces --stdio).
 328
 329 --itrace::
 330         Options for decoding instruction tracing data. The options are:
 331
 332                 i       synthesize instructions events
 333                 b       synthesize branches events
 334                 c       synthesize branches events (calls only)
 335                 r       synthesize branches events (returns only)
 336                 x       synthesize transactions events
 337                 e       synthesize error events
 338                 d       create a debug log
 339                 g       synthesize a call chain (use with i or x)
 340
 341         The default is all events i.e. the same as --itrace=ibxe
 342
 343         In addition, the period (default 100000) for instructions events
 344         can be specified in units of:
 345
 346                 i       instructions
 347                 t       ticks
 348                 ms      milliseconds
 349                 us      microseconds
 350                 ns      nanoseconds (default)
 351
 352         Also the call chain size (default 16, max. 1024) for instructions or
 353         transactions events can be specified.
 354
 355         To disable decoding entirely, use --no-itrace.
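The option letters listed above can be decoded with a simple lookup, as in this Python sketch. It handles only the single-letter flags; the period and call chain size suffixes are deliberately left out because their exact syntax is not shown here. `describe_itrace` and `ITRACE_FLAGS` are hypothetical names for illustration.

```python
# Flag letters and meanings as listed for --itrace.
ITRACE_FLAGS = {
    "i": "instructions events",
    "b": "branches events",
    "c": "branches events (calls only)",
    "r": "branches events (returns only)",
    "x": "transactions events",
    "e": "error events",
    "d": "debug log",
    "g": "call chain",
}

DEFAULT_ITRACE = "ibxe"  # documented default: all events

def describe_itrace(options=DEFAULT_ITRACE):
    """Translate a string of flag letters into what each one requests."""
    unknown = [letter for letter in options if letter not in ITRACE_FLAGS]
    if unknown:
        raise ValueError(f"unknown itrace option(s): {''.join(unknown)}")
    return [ITRACE_FLAGS[letter] for letter in options]

print(describe_itrace())
print(describe_itrace("ig"))
```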
 356
 357
 358 include::callchain-overhead-calculation.txt[]
 359
 360 SEE ALSO
 361 --------
 362 linkperf:perf-stat[1], linkperf:perf-annotate[1]