| 1 | perf-report(1) |
| 2 | ============== |
| 3 | |
| 4 | NAME |
| 5 | ---- |
| 6 | perf-report - Read perf.data (created by perf record) and display the profile |
| 7 | |
| 8 | SYNOPSIS |
| 9 | -------- |
| 10 | [verse] |
| 11 | 'perf report' [-i <file> | --input=file] |
| 12 | |
| 13 | DESCRIPTION |
| 14 | ----------- |
| 15 | This command displays the performance counter profile information recorded |
| 16 | via perf record. |
| 17 | |
| 18 | OPTIONS |
| 19 | ------- |
| 20 | -i:: |
| 21 | --input=:: |
| 22 | Input file name. (default: perf.data unless stdin is a fifo) |
| 23 | |
| 24 | -v:: |
| 25 | --verbose:: |
| 26 | Be more verbose. (show symbol address, etc) |
| 27 | |
| 28 | -n:: |
| 29 | --show-nr-samples:: |
| 30 | Show the number of samples for each symbol |
| 31 | |
| 32 | --showcpuutilization:: |
| 33 | Show sample percentage for different cpu modes. |
| 34 | |
| 35 | -T:: |
| 36 | --threads:: |
| 37 | Show per-thread event counters |
| 38 | -c:: |
| 39 | --comms=:: |
| 40 | Only consider symbols in these comms. CSV that understands |
| 41 | file://filename entries. This option will affect the percentage of |
| 42 | the overhead column. See --percentage for more info. |
| 43 | --pid=:: |
| 44 | Only show events for given process ID (comma separated list). |
| 45 | |
| 46 | --tid=:: |
| 47 | Only show events for given thread ID (comma separated list). |
| 48 | -d:: |
| 49 | --dsos=:: |
| 50 | Only consider symbols in these dsos. CSV that understands |
| 51 | file://filename entries. This option will affect the percentage of |
| 52 | the overhead column. See --percentage for more info. |
| 53 | -S:: |
| 54 | --symbols=:: |
| 55 | Only consider these symbols. CSV that understands |
| 56 | file://filename entries. This option will affect the percentage of |
| 57 | the overhead column. See --percentage for more info. |
| 58 | |
| 59 | --symbol-filter=:: |
| 60 | Only show symbols that match (partially) with this filter. |
| 61 | |
| 62 | -U:: |
| 63 | --hide-unresolved:: |
| 64 | Only display entries resolved to a symbol. |
| 65 | |
| 66 | -s:: |
| 67 | --sort=:: |
| 68 | Sort histogram entries by the given key(s). Multiple keys can be specified |
| 69 | in CSV format. The following sort keys are available: |
| 70 | pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight. |
| 71 | |
| 72 | Each key has the following meaning: |
| 73 | |
| 74 | - comm: command (name) of the task which can be read via /proc/<pid>/comm |
| 75 | - pid: command and tid of the task |
| 76 | - dso: name of library or module executed at the time of sample |
| 77 | - symbol: name of function executed at the time of sample |
| 78 | - parent: name of function matched to the parent regex filter. Unmatched |
| 79 | entries are displayed as "[other]". |
| 80 | - cpu: CPU number the task ran on at the time of sample |
| 81 | - srcline: filename and line number executed at the time of sample. The |
| 82 | DWARF debugging info must be provided. |
| 83 | - weight: Event specific weight, e.g. memory latency or transaction |
| 84 | abort cost. This is the global weight. |
| 85 | - local_weight: Local weight version of the weight above. |
| 86 | - transaction: Transaction abort flags. |
| 87 | - overhead: Overhead percentage of sample |
| 88 | - overhead_sys: Overhead percentage of sample running in system mode |
| 89 | - overhead_us: Overhead percentage of sample running in user mode |
| 90 | - overhead_guest_sys: Overhead percentage of sample running in system mode |
| 91 | on guest machine |
| 92 | - overhead_guest_us: Overhead percentage of sample running in user mode on |
| 93 | guest machine |
| 94 | - sample: Number of samples |
| 95 | - period: Raw event count of the sample |
| 96 | |
| 97 | By default, comm, dso and symbol keys are used. |
| 98 | (i.e. --sort comm,dso,symbol) |
| 99 | |
| 100 | If the --branch-stack option is used, the following sort keys are also |
| 101 | available: |
| 102 | dso_from, dso_to, symbol_from, symbol_to, mispredict, in_tx, abort. |
| 103 | |
| 104 | - dso_from: name of library or module branched from |
| 105 | - dso_to: name of library or module branched to |
| 106 | - symbol_from: name of function branched from |
| 107 | - symbol_to: name of function branched to |
| 108 | - mispredict: "N" for predicted branch, "Y" for mispredicted branch |
| 109 | - in_tx: branch in TSX transaction |
| 110 | - abort: TSX transaction abort. |
| 111 | |
| 112 | The default sort keys then change to comm, dso_from, symbol_from, dso_to |
| 113 | and symbol_to, see '--branch-stack'. |
| 114 | |
| 115 | -F:: |
| 116 | --fields=:: |
| 117 | Specify output fields. Multiple keys can be specified in CSV format. |
| 118 | The following fields are available: |
| 119 | overhead, overhead_sys, overhead_us, overhead_children, sample and period. |
| 120 | The list can also contain any sort key(s). |
| 121 | |
| 122 | By default, all sort keys not specified in -F are appended |
| 123 | automatically. |
| 124 | |
| 125 | If the --mem-mode option is used, the following sort keys are also available |
| 126 | (incompatible with --branch-stack): |
| 127 | symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline. |
| 128 | |
| 129 | - symbol_daddr: name of data symbol being executed on at the time of sample |
| 130 | - dso_daddr: name of library or module containing the data being executed |
| 131 | on at the time of sample |
| 132 | - locked: whether the bus was locked at the time of sample |
| 133 | - tlb: type of tlb access for the data at the time of sample |
| 134 | - mem: type of memory access for the data at the time of sample |
| 135 | - snoop: type of snoop (if any) for the data at the time of sample |
| 136 | - dcacheline: the cacheline the data address is on at the time of sample |
| 137 | |
| 138 | The default sort keys then change to local_weight, mem, sym, dso, |
| 139 | symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'. |
| 140 | |
| 141 | -p:: |
| 142 | --parent=<regex>:: |
| 143 | A regex filter to identify the parent. The parent is a caller of this |
| 144 | function and is searched for through the callchain, so callchain |
| 145 | information must have been recorded. The pattern is in the extended regex format and |
| 146 | defaults to "\^sys_|^do_page_fault", see '--sort parent'. |
| 147 | |
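As a rough illustration of the parent lookup described for --parent, the sketch below walks a callchain and reports the first symbol that matches the default pattern, or "[other]" when nothing matches. It uses grep -E as a stand-in for perf's own extended-regex matching. The helper name find_parent and the sample symbol names are hypothetical, and taking the first match in the order given is an assumption of this sketch.

```shell
#!/bin/sh
# Default --parent pattern (extended regex), without the AsciiDoc escape.
parent_regex='^sys_|^do_page_fault'

# find_parent SYMBOL...: hypothetical helper. Prints the first symbol in the
# given callchain that matches the parent pattern, or "[other]" if none does.
find_parent() {
    for sym in "$@"; do
        if printf '%s\n' "$sym" | grep -Eq "$parent_regex"; then
            printf '%s\n' "$sym"
            return 0
        fi
    done
    printf '[other]\n'
}

find_parent memcpy vfs_read sys_read   # prints: sys_read
find_parent main compute_sum           # prints: [other]
```

Entries reported as "[other]" here correspond to the ones that --exclude-other would hide.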
| 148 | -x:: |
| 149 | --exclude-other:: |
| 150 | Only display entries with parent-match. |
| 151 | |
| 152 | -w:: |
| 153 | --column-widths=<width[,width...]>:: |
| 154 | Force each column width to the provided list, for large terminal |
| 155 | readability. 0 means no limit (default behavior). |
| 156 | |
| 157 | -t:: |
| 158 | --field-separator=:: |
| 159 | Use a special separator character and don't pad with spaces. All |
| 160 | occurrences of this separator in symbol names (and other output) are |
| 161 | replaced with a '.' character, so '.' is the only invalid separator. |
| 162 | |
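The separator replacement described for --field-separator can be pictured with a small sketch. It is only an approximation of the behaviour using tr, not perf's implementation. The helper name sanitize_field and the sample symbol are hypothetical, and separators that tr treats specially (such as a backslash) are not handled.

```shell
#!/bin/sh
# sanitize_field SEP TEXT: hypothetical helper. Replaces every occurrence of
# the single-character separator SEP in TEXT with '.', so that TEXT can be
# emitted as one field of SEP-separated output.
sanitize_field() {
    printf '%s\n' "$2" | tr "$1" '.'
}

sanitize_field ',' 'std::map<int, int>::find'   # prints: std::map<int. int>::find
```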
| 163 | -D:: |
| 164 | --dump-raw-trace:: |
| 165 | Dump raw trace in ASCII. |
| 166 | |
| 167 | -g [type,min[,limit],order[,key][,branch]]:: |
| 168 | --call-graph:: |
| 169 | Display call chains using type, min percent threshold, optional print |
| 170 | limit and order. |
| 171 | type can be either: |
| 172 | - flat: single column, linear exposure of call chains. |
| 173 | - graph: use a graph tree, displaying absolute overhead rates. |
| 174 | - fractal: like graph, but displays relative rates. Each branch of |
| 175 | the tree is considered as a new profiled object. + |
| 176 | |
| 177 | order can be either: |
| 178 | - callee: callee based call graph. |
| 179 | - caller: inverted caller based call graph. |
| 180 | |
| 181 | key can be: |
| 182 | - function: compare on functions |
| 183 | - address: compare on individual code addresses |
| 184 | |
| 185 | branch can be: |
| 186 | - branch: include last branch information in callgraph |
| 187 | when available. Usually more convenient to use --branch-history |
| 188 | for this. |
| 189 | |
| 190 | Default: fractal,0.5,callee,function. |
| 191 | |
| 192 | --children:: |
| 193 | Accumulate the callchain of children into the parent entry so that they can |
| 194 | show up in the output. The output will have a new "Children" column |
| 195 | and will be sorted on that data. It requires that callchains are recorded. |
| 196 | |
| 197 | --max-stack:: |
| 198 | Set the stack depth limit when parsing the callchain; anything |
| 199 | beyond the specified depth will be ignored. This is a trade-off |
| 200 | between information loss and faster processing especially for |
| 201 | workloads that can have a very long callchain stack. |
| 202 | |
| 203 | Default: 127 |
| 204 | |
| 205 | -G:: |
| 206 | --inverted:: |
| 207 | Alias for inverted caller based call graph. |
| 208 | |
| 209 | --ignore-callees=<regex>:: |
| 210 | Ignore callees of the function(s) matching the given regex. |
| 211 | This has the effect of collecting the callers of each such |
| 212 | function into one place in the call-graph tree. |
| 213 | |
| 214 | --pretty=<key>:: |
| 215 | Pretty printing style. key: normal, raw |
| 216 | |
| 217 | --stdio:: Use the stdio interface. |
| 218 | |
| 219 | --tui:: Use the TUI interface, which is integrated with annotate and allows |
| 220 | zooming into DSOs or threads, among other features. Use of --tui |
| 221 | requires a tty; if one is not present, as when piping to other |
| 222 | commands, the stdio interface is used. |
| 223 | |
| 224 | --gtk:: Use the GTK2 interface. |
| 225 | |
| 226 | -k:: |
| 227 | --vmlinux=<file>:: |
| 228 | vmlinux pathname |
| 229 | |
| 230 | --kallsyms=<file>:: |
| 231 | kallsyms pathname |
| 232 | |
| 233 | -m:: |
| 234 | --modules:: |
| 235 | Load module symbols. WARNING: This should only be used with -k and |
| 236 | a LIVE kernel. |
| 237 | |
| 238 | -f:: |
| 239 | --force:: |
| 240 | Don't complain, do it. |
| 241 | |
| 242 | --symfs=<directory>:: |
| 243 | Look for files with symbols relative to this directory. |
| 244 | |
| 245 | -C:: |
| 246 | --cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can |
| 247 | be provided as a comma-separated list with no space: 0,1. Ranges of |
| 248 | CPUs are specified with -: 0-2. Default is to report samples on all |
| 249 | CPUs. |
| 250 | |
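The CPU list syntax accepted by --cpu can be illustrated with a short sketch that expands a list into individual CPU numbers. It is not perf's parser. The helper name expand_cpu_list is hypothetical, and mixing single CPUs and ranges in one list (as in 0-2,5) is an assumption of this sketch.

```shell
#!/bin/sh
# expand_cpu_list LIST: hypothetical helper. Expands a comma-separated list
# of CPUs and first-last ranges into a space-separated list of CPU numbers.
expand_cpu_list() {
    out=''
    old_ifs=$IFS
    IFS=','
    for item in $1; do
        case $item in
            *-*)
                # A range such as 0-2: emit every CPU from first to last.
                i=${item%-*}
                last=${item#*-}
                while [ "$i" -le "$last" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                out="$out $item"
                ;;
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "${out# }"
}

expand_cpu_list 0,1     # prints: 0 1
expand_cpu_list 0-2     # prints: 0 1 2
```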
| 251 | -M:: |
| 252 | --disassembler-style=:: Set disassembler style for objdump. |
| 253 | |
| 254 | --source:: |
| 255 | Interleave source code with assembly code. Enabled by default, |
| 256 | disable with --no-source. |
| 257 | |
| 258 | --asm-raw:: |
| 259 | Show raw instruction encoding of assembly instructions. |
| 260 | |
| 261 | --show-total-period:: Show a column with the sum of periods. |
| 262 | |
| 263 | -I:: |
| 264 | --show-info:: |
| 265 | Display extended information about the perf.data file. This adds |
| 266 | information which may be very large and thus may clutter the display. |
| 267 | It currently includes: cpu and numa topology of the host system. |
| 268 | |
| 269 | -b:: |
| 270 | --branch-stack:: |
| 271 | Use the addresses of sampled taken branches instead of the instruction |
| 272 | address to build the histograms. To generate meaningful output, the |
| 273 | perf.data file must have been obtained using perf record -b or |
| 274 | perf record --branch-filter xxx where xxx is a branch filter option. |
| 275 | perf report is able to auto-detect whether a perf.data file contains |
| 276 | branch stacks and it will automatically switch to the branch view mode, |
| 277 | unless --no-branch-stack is used. |
| 278 | |
| 279 | --branch-history:: |
| 280 | Add the addresses of sampled taken branches to the callstack. |
| 281 | This allows examining the path the program took to each sample. |
| 282 | The data collection must have used -b (or -j) and -g. |
| 283 | |
| 284 | --objdump=<path>:: |
| 285 | Path to objdump binary. |
| 286 | |
| 287 | --group:: |
| 288 | Show event group information together. |
| 289 | |
| 290 | --demangle:: |
| 291 | Demangle symbol names to human readable form. It's enabled by default, |
| 292 | disable with --no-demangle. |
| 293 | |
| 294 | --demangle-kernel:: |
| 295 | Demangle kernel symbol names to human readable form (for C++ kernels). |
| 296 | |
| 297 | --mem-mode:: |
| 298 | Use the data addresses of samples in addition to instruction addresses |
| 299 | to build the histograms. To generate meaningful output, the perf.data |
| 300 | file must have been obtained using perf record -d -W and using a |
| 301 | special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See |
| 302 | 'perf mem' for simpler access. |
| 303 | |
| 304 | --percent-limit:: |
| 305 | Do not show entries which have an overhead under the given percentage. |
| 306 | (Default: 0). |
| 307 | |
| 308 | --percentage:: |
| 309 | Determine how to display the overhead percentage of filtered entries. |
| 310 | Filters can be applied by --comms, --dsos and/or --symbols options and |
| 311 | Zoom operations on the TUI (thread, dso, etc). |
| 312 | |
| 313 | "relative" means it's relative to filtered entries only so that the |
| 314 | sum of shown entries will always be 100%. "absolute" means it retains |
| 315 | the original value before and after the filter is applied. |
| 316 | |
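The difference between the two --percentage modes is plain arithmetic, sketched below with made-up numbers. The helper name overhead_pct is hypothetical. It takes the mode, an entry's overhead as a percentage of all samples, and the summed overhead of the entries that survive the filter.

```shell
#!/bin/sh
# overhead_pct MODE ENTRY KEPT: hypothetical helper.
#   MODE  - "relative" or "absolute"
#   ENTRY - overhead of one entry, as a percentage of all samples
#   KEPT  - summed overhead of all entries that pass the filter
overhead_pct() {
    awk -v mode="$1" -v entry="$2" -v kept="$3" 'BEGIN {
        if (mode == "relative")
            printf "%.2f\n", 100 * entry / kept   # rescaled to the filtered set
        else
            printf "%.2f\n", entry                # original value is retained
    }'
}

# Two entries with 30% and 10% overhead pass the filter (40% in total).
overhead_pct absolute 30 40   # prints: 30.00
overhead_pct relative 30 40   # prints: 75.00
overhead_pct relative 10 40   # prints: 25.00
```

In relative mode the displayed values of the surviving entries add up to 100%, as in the 75.00 and 25.00 above.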
| 317 | --header:: |
| 318 | Show header information in the perf.data file. This includes |
| 319 | various information like hostname, OS and perf version, cpu/mem |
| 320 | info, perf command line, event list and so on. Currently only |
| 321 | --stdio output supports this feature. |
| 322 | |
| 323 | --header-only:: |
| 324 | Show only perf.data header (forces --stdio). |
| 325 | |
| 326 | SEE ALSO |
| 327 | -------- |
| 328 | linkperf:perf-stat[1], linkperf:perf-annotate[1] |