| 1 | README for GPROF |
| 2 | |
| 3 | This is the GNU profiler. It is distributed with other "binary |
| 4 | utilities" which should be in ../binutils. See ../binutils/README for |
| 5 | more general notes, including where to send bug reports. |
| 6 | |
| 7 | This file documents the changes and new features available with this |
| 8 | version of GNU gprof. |
| 9 | |
| 10 | * New Features |
| 11 | |
| 12 | o Long options |
| 13 | |
| 14 | o Supports generalized file format, without breaking backward compatibility: |
| 15 | new file format supports basic-block execution counts and non-realtime |
| 16 | histograms (see below) |
| 17 | |
| 18 | o Supports profiling at the line level: flat profiles, call-graph profiles, |
| 19 | and execution-counts can all be displayed at a level that identifies |
| 20 | individual lines rather than just functions |
| 21 | |
| 22 | o Test-coverage support (similar to Sun tcov program): source files |
| 23 | can be annotated with the number of times a function was invoked |
| 24 | or with the number of times each basic-block in a function was |
| 25 | executed |
| 26 | |
| 27 | o Generalized histograms: not just execution-time, but arbitrary |
| 28 | histograms are supported (for example, performance counter based |
| 29 | profiles) |
| 30 | |
| 31 | o Powerful mechanism to select data to be included/excluded from |
| 32 | analysis and/or output |
| 33 | |
| 34 | o Support for DEC OSF/1 v3.0 |
| 35 | |
| 36 | o Full cross-platform profiling support: gprof uses BFD to support |
| 37 | arbitrary, non-native object file formats and non-native byte-orders |
| 38 | (this feature has not been tested yet) |
| 39 | |
| 40 | o In the call-graph function index, static function names are now |
| 41 | printed together with the filename in which the function was defined |
| 42 | (requires bfd_find_nearest_line() support and symbolic debugging |
| 43 | information to be present in the executable file) |
| 44 | |
| 45 | o Major overhaul of source code (compiles cleanly with -Wall, etc.) |
| 46 | |
| 47 | * Supported Platforms |
| 48 | |
| 49 | The current version is known to work on: |
| 50 | |
| 51 | o DEC OSF/1 v3.0 |
| 52 | All features supported. |
| 53 | |
| 54 | o SunOS 4.1.x |
| 55 | All features supported. |
| 56 | |
| 57 | o Solaris 2.3 |
| 58 | Line-level profiling unsupported because bfd_find_nearest_line() |
| 59 | is not fully implemented for ELF binaries. |
| 60 | |
| 61 | o HP-UX 9.01 |
| 62 | Line-level profiling unsupported because bfd_find_nearest_line() |
| 63 | is not fully implemented for SOM binaries. |
| 64 | |
| 65 | * Detailed Description |
| 66 | |
| 67 | ** User Interface Changes |
| 68 | |
| 69 | The command-line interface is backwards compatible with earlier |
| 70 | versions of GNU gprof and Berkeley gprof. The only exception is |
| 71 | the option to delete arcs from the call graph. The old syntax |
| 72 | was: |
| 73 | |
| 74 | -k fromname toname |
| 75 | |
| 76 | while the new syntax is: |
| 77 | |
| 78 | -k fromname/toname |
| 79 | |
| 80 | This change was necessary to be compatible with long-option parsing. |
| 81 | Also, "fromname" and "toname" can now be arbitrary symspecs rather |
| 82 | than just function names (see below for an explanation of symspecs). |
| 83 | For example, option "-k gprof.c/" suppresses all arcs due to calls out |
| 84 | of file "gprof.c". |
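The new "-k" syntax can be illustrated with a short sketch. It is written in Python purely for illustration and is not taken from the gprof sources; the function name split_arc_spec is hypothetical, and splitting at the first slash is an assumption of this sketch.

```python
def split_arc_spec(arg):
    """Split a "-k" argument of the form "fromname/toname" into two symspecs.

    An empty side is returned as "" and is assumed here to place no
    restriction on that end of the arc, as in the "-k gprof.c/" example.
    """
    if "/" not in arg:
        raise ValueError("expected fromname/toname, got %r" % (arg,))
    # Assumption: the first slash separates the two symspecs.
    from_spec, to_spec = arg.split("/", 1)
    return from_spec, to_spec
```

For example, split_arc_spec("gprof.c/") yields the pair ("gprof.c", ""), that is, all arcs leaving file "gprof.c".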
| 85 | |
| 86 | *** Sym Specs |
| 87 | |
| 88 | It is often necessary to apply gprof only to specific parts of a |
| 89 | program. GNU gprof has a simple but powerful mechanism to achieve |
| 90 | this. So-called symspecs provide the foundation for this |
| 91 | mechanism. A symspec selects the parts of a profiled program to which |
| 92 | an operation should be applied. The syntax of a symspec is |
| 93 | simple: |
| 94 | |
| 95 | filename_containing_a_dot |
| 96 | | funcname_not_containing_a_dot |
| 97 | | linenumber |
| 98 | | ( [ any_filename ] `:' ( any_funcname | linenumber ) ) |
| 99 | |
| 100 | Here are some examples: |
| 101 | |
| 102 | main.c Selects everything in file "main.c"---the |
| 103 | dot in the string tells gprof to interpret |
| 104 | the string as a filename, rather than as |
| 105 | a function name. To select a file whose |
| 106 | name does not contain a dot, a trailing colon |
| 107 | should be specified. For example, "odd:" is |
| 108 | interpreted as the file named "odd". |
| 109 | |
| 110 | main Selects all functions named "main". Notice |
| 111 | that there may be multiple instances of the |
| 112 | same function name because some of the |
| 113 | definitions may be local (i.e., static). |
| 114 | Unless a function name is unique in a program, |
| 115 | you must use the colon notation explained |
| 116 | below to specify a function from a specific |
| 117 | source file. Sometimes, function names contain |
| 118 | dots. In such cases, it is necessary to |
| 119 | add a leading colon to the name. For example, |
| 120 | ":.mul" selects function ".mul". |
| 121 | |
| 122 | main.c:main Selects function "main" in file "main.c". |
| 123 | |
| 124 | main.c:134 Selects line 134 in file "main.c". |
| 125 | |
| 126 | IMPLEMENTATION NOTE: The source code uses the type sym_id for symspecs. |
| 127 | At some point, this probably ought to be changed to "sym_spec" to make |
| 128 | reading the code easier. |
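The symspec rules above can be summarized in a small classifier. This is a minimal sketch in Python, not the parser used by gprof; the names SymSpec and parse_symspec are hypothetical, and splitting at the last colon is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class SymSpec(NamedTuple):
    filename: Optional[str]
    funcname: Optional[str]
    line: Optional[int]


def parse_symspec(spec: str) -> SymSpec:
    """Classify a symspec string following the rules in the text."""
    if ":" in spec:
        # Colon notation: [filename] ':' (funcname | linenumber).
        # Assumption: the last colon is the separator.
        filename, _, rest = spec.rpartition(":")
        name = filename or None
        if not rest:
            # Trailing colon, e.g. "odd:" selects the file named "odd".
            return SymSpec(name, None, None)
        if rest.isdecimal():
            return SymSpec(name, None, int(rest))
        return SymSpec(name, rest, None)
    if "." in spec:
        # A dot marks the string as a filename.
        return SymSpec(spec, None, None)
    if spec.isdecimal():
        return SymSpec(None, None, int(spec))
    # No dot and not a number: a function name.
    return SymSpec(None, spec, None)
```

With this sketch, "main.c" and "odd:" are classified as filenames, "main" and ":.mul" as function names, and "main.c:134" as line 134 in file "main.c".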
| 129 | |
| 130 | *** Long options |
| 131 | |
| 132 | GNU gprof now supports long options. The following is a list of all |
| 133 | supported options. Options that are listed without description |
| 134 | operate in the same manner as the corresponding option in older |
| 135 | versions of gprof. |
| 136 | |
| 137 | Short Form: Long Form: |
| 138 | ----------- ---------- |
| 139 | -l --line |
| 140 | Request profiling at the line-level rather |
| 141 | than just at the function level. Source |
| 142 | lines are identified by symbols of the form: |
| 143 | |
| 144 | func (file:line) |
| 145 | |
| 146 | where "func" is the function name, "file" is the |
| 147 | file name and "line" is the line-number that |
| 148 | corresponds to the line. |
| 149 | |
| 150 | To work properly, the binary must contain symbolic |
| 151 | debugging information. This means that the source |
| 152 | has to be compiled with option "-g" specified. |
| 153 | Functions for which there is no symbolic debugging |
| 154 | information available are treated as if "--line" |
| 155 | had not been specified. However, the line number |
| 156 | printed with such symbols is usually incorrect |
| 157 | and should be ignored. |
| 158 | |
| 159 | -a --no-static |
| 160 | -A[symspec] --annotated-source[=symspec] |
| 161 | Request output in the form of annotated source |
| 162 | files. If "symspec" is specified, print output only |
| 163 | for symbols selected by "symspec". If the option |
| 164 | is specified multiple times, annotated output is |
| 165 | generated for the union of all symspecs. |
| 166 | |
| 167 | Examples: |
| 168 | |
| 169 | -A Prints annotated source for all |
| 170 | source files. |
| 171 | -Agprof.c Prints annotated source for file |
| 172 | gprof.c. |
| 173 | -Afoobar Prints annotated source for files |
| 174 | containing a function named "foobar". |
| 175 | The entire file will be printed, but |
| 176 | only the function itself will be |
| 177 | annotated with profile data. |
| 178 | |
| 179 | -J[symspec] --no-annotated-source[=symspec] |
| 180 | Suppress annotated source output. If specified |
| 181 | without argument, annotated output is suppressed |
| 182 | completely. With an argument, annotated output |
| 183 | is suppressed only for the symbols selected by |
| 184 | "symspec". If the option is specified multiple |
| 185 | times, annotated output is suppressed for the |
| 186 | union of all symspecs. This option has lower |
| 187 | precedence than --annotated-source. |
| 188 | |
| 189 | -p[symspec] --flat-profile[=symspec] |
| 190 | Request output in the form of a flat profile |
| 191 | (unless any other output-style option is specified, |
| 192 | this option is turned on by default). If |
| 193 | "symspec" is specified, include only symbols |
| 194 | selected by "symspec" in flat profile. If the |
| 195 | option is specified multiple times, the flat |
| 196 | profile includes symbols selected by the union |
| 197 | of all symspecs. |
| 198 | |
| 199 | -P[symspec] --no-flat-profile[=symspec] |
| 200 | Suppress output in the flat profile. If given |
| 201 | without an argument, the flat profile is suppressed |
| 202 | completely. If "symspec" is specified, suppress |
| 203 | the selected symbols in the flat profile. If the |
| 204 | option is specified multiple times, the union of |
| 205 | the selected symbols is suppressed. This option |
| 206 | has lower precedence than --flat-profile. |
| 207 | |
| 208 | -q[symspec] --graph[=symspec] |
| 209 | Request output in the form of a call-graph |
| 210 | (unless any other output-style option is specified, |
| 211 | this option is turned on by default). If "symspec" |
| 212 | is specified, include only symbols selected by |
| 213 | "symspec" in the call-graph. If the option is |
| 214 | specified multiple times, the call-graph includes |
| 215 | symbols selected by the union of all symspecs. |
| 216 | |
| 217 | -Q[symspec] --no-graph[=symspec] |
| 218 | Suppress output in the call-graph. If given without |
| 219 | an argument, the call-graph is suppressed completely. |
| 220 | With a "symspec", suppress the selected symbols |
| 221 | from the call-graph. If the option is specified |
| 222 | multiple times, the union of the selected symbols |
| 223 | is suppressed. This option has lower precedence |
| 224 | than --graph. |
| 225 | |
| 226 | -C[symspec] --exec-counts[=symspec] |
| 227 | Request output in the form of execution counts. |
| 228 | If "symspec" is present, include only symbols |
| 229 | selected by "symspec" in the execution count |
| 230 | listing. If the option is specified multiple |
| 231 | times, the execution count listing includes |
| 232 | symbols selected by the union of all symspecs. |
| 233 | |
| 234 | -Z[symspec] --no-exec-counts[=symspec] |
| 235 | Suppress output in the execution count listing. |
| 236 | If given without an argument, the listing is |
| 237 | suppressed completely. With a "symspec", suppress |
| 238 | the selected symbols from the listing. If the |
| 239 | option is specified multiple times, the union of |
| 240 | the selected symbols is suppressed. This option |
| 241 | has lower precedence than --exec-counts. |
| 242 | |
| 243 | -i --file-info |
| 244 | Print information about the profile files that |
| 245 | are read. The information consists of the |
| 246 | number and types of records present in the |
| 247 | profile file. Currently, a profile file can |
| 248 | contain any number and any combination of histogram, |
| 249 | call-graph, or basic-block count records. |
| 250 | |
| 251 | -s --sum |
| 252 | |
| 253 | -x --all-lines |
| 254 | This option affects annotated source output only. |
| 255 | By default, only the lines at the beginning of |
| 256 | a basic-block are annotated. If this option is |
| 257 | specified, every line in a basic-block is annotated |
| 258 | by repeating the annotation for the first line. |
| 259 | This option is identical to tcov's "-a". |
| 260 | |
| 261 | -I dirs --directory-path=dirs |
| 262 | This option affects annotated source output only. |
| 263 | Specifies the list of directories to be searched |
| 264 | for source files. The argument "dirs" is a |
| 265 | colon-separated list of directories. By default, gprof |
| 266 | searches for source files relative to the current |
| 267 | working directory only. |
| 268 | |
| 269 | -z --display-unused-functions |
| 270 | |
| 271 | -m num --min-count=num |
| 272 | This option affects annotated source and execution |
| 273 | count output only. Symbols that are executed |
| 274 | fewer than "num" times are suppressed. For annotated |
| 275 | source output, suppressed symbols are marked |
| 276 | by five hash-marks (#####). In an execution count |
| 277 | output, suppressed symbols do not appear at all. |
| 278 | |
| 279 | -L --print-path |
| 280 | Normally, source filenames are printed with the path |
| 281 | component suppressed. With this option, gprof |
| 282 | can be forced to print the full pathname of |
| 283 | source filenames. The full pathname is determined |
| 284 | from symbolic debugging information in the image file |
| 285 | and is relative to the directory in which the compiler |
| 286 | was invoked. |
| 287 | |
| 288 | -y --separate-files |
| 289 | This option affects annotated source output only. |
| 290 | Normally, gprof prints annotated source files |
| 291 | to standard-output. If this option is specified, |
| 292 | annotated source for a file named "path/filename" |
| 293 | is generated in the file "filename-ann". That is, |
| 294 | annotated output is always generated in |
| 295 | gprof's current working directory. Care has to |
| 296 | be taken if a program consists of files that have |
| 297 | identical filenames, but distinct paths. |
| 298 | |
| 299 | -c --static-call-graph |
| 300 | |
| 301 | -t num --table-length=num |
| 302 | This option affects annotated source output only. |
| 303 | After annotating a source file, gprof generates |
| 304 | an execution count summary consisting of a table |
| 305 | of lines with the top execution counts. By |
| 306 | default, this table is ten entries long. |
| 307 | This option can be used to change the table length |
| 308 | or, by specifying an argument value of 0, it can be |
| 309 | suppressed completely. |
| 310 | |
| 311 | -n symspec --time=symspec |
| 312 | Only symbols selected by "symspec" are considered |
| 313 | in total and percentage time computations. |
| 314 | However, this option does not affect percentage time |
| 315 | computation for the flat profile. |
| 316 | If the option is specified multiple times, the union |
| 317 | of all selected symbols is used in time computations. |
| 318 | |
| 319 | -N symspec --no-time=symspec |
| 320 | Exclude the symbols selected by "symspec" from |
| 321 | total and percentage time computations. |
| 322 | However, this option does not affect percentage time |
| 323 | computation for the flat profile. |
| 324 | This option is ignored if any --time options are |
| 325 | specified. |
| 326 | |
| 327 | -w num --width=num |
| 328 | Sets the output line width. Currently, this option |
| 329 | affects the printing of the call-graph function index |
| 330 | only. |
| 331 | |
| 332 | -e <no long form---for backwards compatibility only> |
| 333 | -E <no long form---for backwards compatibility only> |
| 334 | -f <no long form---for backwards compatibility only> |
| 335 | -F <no long form---for backwards compatibility only> |
| 336 | -k <no long form---for backwards compatibility only> |
| 337 | -b --brief |
| 338 | -dnum --debug[=num] |
| 339 | |
| 340 | -h --help |
| 341 | Prints a usage message. |
| 342 | |
| 343 | -O name --file-format=name |
| 344 | Selects the format of the profile data files. |
| 345 | Recognized formats are "auto", "bsd", "magic", |
| 346 | and "prof". The last one is not yet supported. |
| 347 | Format "auto" attempts to detect the file format |
| 348 | automatically (this is the default behavior). |
| 349 | It attempts to read the profile data files as |
| 350 | "magic" files and if this fails, falls back to |
| 351 | the "bsd" format. "bsd" forces gprof to read |
| 352 | the data files in the BSD format. "magic" forces |
| 353 | gprof to read the data files in the "magic" format. |
| 354 | |
| 355 | -T --traditional |
| 356 | -v --version |
| 357 | |
| 358 | ** File Format Changes |
| 359 | |
| 360 | The old BSD-derived format used for profile data does not contain a |
| 361 | magic cookie that makes it possible to check whether a data file really is a |
| 362 | gprof file. Furthermore, it does not provide a version number, thus |
| 363 | rendering changes to the file format almost impossible. GNU gprof |
| 364 | uses a new file format that provides these features. For backward |
| 365 | compatibility, GNU gprof continues to support the old BSD-derived |
| 366 | format, but not all features are supported with it. For example, |
| 367 | basic-block execution counts cannot be accommodated by the old file |
| 368 | format. |
| 369 | |
| 370 | The new file format is defined in header file gmon_out.h. It |
| 371 | consists of a header containing the magic cookie and a version number, |
| 372 | as well as some spare bytes available for future extensions. All data |
| 373 | in a profile data file is in the native format of the host on which |
| 374 | the profile was collected. GNU gprof adapts automatically to the |
| 375 | byte-order in use. |
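The header check can be pictured with the following Python sketch. It is not the gprof implementation. The cookie value "gmon", the 20-byte header layout (4 bytes cookie, 4 bytes version, 12 spare bytes) and version number 1 are assumptions that should be checked against gmon_out.h. Guessing the byte-order from the version field is a simplification made for this sketch only.

```python
import struct

GMON_MAGIC = b"gmon"    # assumed cookie value; check gmon_out.h
HEADER_SIZE = 20        # assumed: 4 cookie + 4 version + 12 spare bytes
KNOWN_VERSIONS = {1}    # assumed set of versions this sketch accepts


def read_header(data):
    """Return (version, byteorder) for a "magic" file, or None otherwise.

    None means the cookie is missing, which is the cue to fall back to
    the old BSD-derived format.
    """
    if len(data) < HEADER_SIZE or data[:4] != GMON_MAGIC:
        return None
    # Try both byte-orders and keep the one that gives a known version.
    for byteorder, fmt in (("little", "<I"), ("big", ">I")):
        (version,) = struct.unpack(fmt, data[4:8])
        if version in KNOWN_VERSIONS:
            return version, byteorder
    raise ValueError("magic cookie found, but unknown version number")
```

A caller would try read_header() first and treat a result of None as "read this file in the BSD format".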
| 376 | |
| 377 | In the new file format, the header is followed by a sequence of |
| 378 | records. Currently, there are three different record types: histogram |
| 379 | records, call-graph arc records, and basic-block execution count |
| 380 | records. Each file can contain any number of each record type. When |
| 381 | reading a file, GNU gprof will ensure records of the same type are |
| 382 | compatible with each other and compute the union of all records. For |
| 383 | example, for basic-block execution counts, the union is simply the sum |
| 384 | of all execution counts for each basic-block. |
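The union of basic-block execution counts described above amounts to a per-address sum. A minimal Python sketch follows; the function name merge_bb_counts and the representation of a record as a list of (address, count) pairs are assumptions made for illustration.

```python
from collections import Counter


def merge_bb_counts(*record_sets):
    """Sum execution counts per basic-block address across all records."""
    total = Counter()
    for records in record_sets:
        for address, count in records:
            total[address] += count
    return dict(total)
```

For example, merging a record holding (0x1000, 3) and (0x1010, 1) with a record holding (0x1000, 2) gives a count of 5 for address 0x1000 and 1 for address 0x1010.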
| 385 | |
| 386 | *** Histogram Records |
| 387 | |
| 388 | Histogram records consist of a header that is followed by an array of |
| 389 | bins. The header contains the text-segment range that the histogram |
| 390 | spans, the size of the histogram in bytes (unlike in the old BSD |
| 391 | format, this does not include the size of the header), the rate of the |
| 392 | profiling clock, and the physical dimension that the bin counts |
| 393 | represent after being scaled by the profiling clock rate. The |
| 394 | physical dimension is specified in two parts: a long name of up to 15 |
| 395 | characters and a single character abbreviation. For example, a |
| 396 | histogram representing real-time would specify the long name as |
| 397 | "seconds" and the abbreviation as "s". This feature is useful for |
| 398 | architectures that support performance monitor hardware (which, |
| 399 | fortunately, is becoming increasingly common). For example, under DEC |
| 400 | OSF/1, the "uprofile" command can be used to produce a histogram of, |
| 401 | say, instruction cache misses. In this case, the dimension in the |
| 402 | histogram header could be set to "i-cache misses" and the abbreviation |
| 403 | could be set to "1" (because it is simply a count, not a physical |
| 404 | dimension). Also, the profiling rate would have to be set to 1 in |
| 405 | this case. |
| 406 | |
| 407 | Histogram bins are 16-bit numbers and each bin represents an equal |
| 408 | amount of text-space. For example, if the text-segment is one |
| 409 | thousand bytes long and if there are ten bins in the histogram, each |
| 410 | bin represents one hundred bytes. |
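The mapping from a text-segment address to its histogram bin can be written down as follows. This is a sketch in Python, not code from gprof; the names bin_index, low_pc and high_pc are hypothetical, and the range is assumed to be half-open (low_pc included, high_pc excluded).

```python
def bin_index(address, low_pc, high_pc, num_bins):
    """Return the index of the bin that covers the given address."""
    if not low_pc <= address < high_pc:
        raise ValueError("address outside the histogram's text range")
    # Integer arithmetic keeps the result exact even when the range
    # does not divide evenly by the number of bins.
    return (address - low_pc) * num_bins // (high_pc - low_pc)
```

With the numbers from the text (one thousand bytes, ten bins), offsets 200 through 299 all fall into bin 2.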
| 411 | |
| 412 | |
| 413 | *** Call-Graph Records |
| 414 | |
| 415 | Call-graph records have a format that is identical to the one used in |
| 416 | the BSD-derived file format. It consists of an arc in the call graph |
| 417 | and a count indicating the number of times the arc was traversed |
| 418 | during program execution. Arcs are specified by a pair of addresses: |
| 419 | the first must be within the caller's function and the second must be |
| 420 | within the callee's function. When performing profiling at the |
| 421 | function level, these addresses can point anywhere within the |
| 422 | respective function. However, when profiling at the line-level, it is |
| 423 | better if the addresses are as close to the call-site/entry-point as |
| 424 | possible. This will ensure that the line-level call-graph is able to |
| 425 | identify exactly which line of source code performed calls to a |
| 426 | function. |
| 427 | |
| 428 | *** Basic-Block Execution Count Records |
| 429 | |
| 430 | Basic-block execution count records consist of a header followed by a |
| 431 | sequence of address/count pairs. The header simply specifies the |
| 432 | length of the sequence. In an address/count pair, the address |
| 433 | identifies a basic-block and the count specifies the number of times |
| 434 | that basic-block was executed. Any address within the basic-block can |
| 435 | be used. |
| 436 | |
| 437 | IMPLEMENTATION NOTE: gcc -a can be used to instrument a program to |
| 438 | record basic-block execution counts. However, the __bb_exit_func() |
| 439 | that is currently present in libgcc2.c does not generate a gmon.out |
| 440 | file in a suitable format. This should be fixed for future releases |
| 441 | of gcc. In the meantime, contact davidm@cs.arizona.edu for a version |
| 442 | of __bb_exit_func() that is appropriate. |