perf counters: update docs
[deliverable/linux.git] / Documentation / perf-counters.txt
CommitLineData
e7bc62b6
IM
1
2Performance Counters for Linux
3------------------------------
4
5Performance counters are special hardware registers available on most modern
6CPUs. These registers count the number of certain types of hw events: such
7as instructions executed, cachemisses suffered, or branches mis-predicted -
8without slowing down the kernel or applications. These registers can also
9trigger interrupts when a threshold number of events have passed - and can
10thus be used to profile the code that runs on that CPU.
11
12The Linux Performance Counter subsystem provides an abstraction of these
447557ac
IM
13hardware capabilities. It provides per task and per CPU counters, counter
14groups, and it provides event capabilities on top of those.
e7bc62b6
IM
15
16Performance counters are accessed via special file descriptors.
17There's one file descriptor per virtual counter used.
18
19The special file descriptor is opened via the perf_counter_open()
20system call:
21
447557ac
IM
22 int sys_perf_counter_open(struct perf_counter_hw_event *hw_event_uptr,
23 pid_t pid, int cpu, int group_fd);
e7bc62b6
IM
24
25The syscall returns the new fd. The fd can be used via the normal
26VFS system calls: read() can be used to read the counter, fcntl()
27can be used to set the blocking mode, etc.
28
29Multiple counters can be kept open at a time, and the counters
30can be poll()ed.
31
447557ac
IM
32When creating a new counter fd, 'perf_counter_hw_event' is:
33
34/*
35 * Hardware event to monitor via a performance monitoring counter:
36 */
37struct perf_counter_hw_event {
38 s64 type;
39
40 u64 irq_period;
41 u32 record_type;
42
43 u32 disabled : 1, /* off by default */
44 nmi : 1, /* NMI sampling */
45 raw : 1, /* raw event type */
46 __reserved_1 : 29;
47
48 u64 __reserved_2;
49};
50
51/*
52 * Generalized performance counter event types, used by the hw_event.type
53 * parameter of the sys_perf_counter_open() syscall:
54 */
55enum hw_event_types {
56 /*
57 * Common hardware events, generalized by the kernel:
58 */
59 PERF_COUNT_CYCLES = 0,
60 PERF_COUNT_INSTRUCTIONS = 1,
61 PERF_COUNT_CACHE_REFERENCES = 2,
62 PERF_COUNT_CACHE_MISSES = 3,
63 PERF_COUNT_BRANCH_INSTRUCTIONS = 4,
64 PERF_COUNT_BRANCH_MISSES = 5,
65
66 /*
67 * Special "software" counters provided by the kernel, even if
68 * the hardware does not support performance counters. These
69 * counters measure various physical and sw events of the
70 * kernel (and allow the profiling of them as well):
71 */
72 PERF_COUNT_CPU_CLOCK = -1,
73 PERF_COUNT_TASK_CLOCK = -2,
74 /*
75 * Future software events:
76 */
77 /* PERF_COUNT_PAGE_FAULTS = -3,
78 PERF_COUNT_CONTEXT_SWITCHES = -4, */
79};
e7bc62b6
IM
80
81These are standardized types of events that work uniformly on all CPUs
82that implements Performance Counters support under Linux. If a CPU is
83not able to count branch-misses, then the system call will return
84-EINVAL.
85
447557ac
IM
86More hw_event_types are supported as well, but they are CPU
87specific and are enumerated via /sys on a per CPU basis. Raw hw event
88types can be passed in under hw_event.type if hw_event.raw is 1.
89For example, to count "External bus cycles while bus lock signal asserted"
90events on Intel Core CPUs, pass in a 0x4064 event type value and set
91hw_event.raw to 1.
e7bc62b6
IM
92
93'record_type' is the type of data that a read() will provide for the
94counter, and it can be one of:
95
447557ac
IM
96/*
97 * IRQ-notification data record type:
98 */
99enum perf_counter_record_type {
100 PERF_RECORD_SIMPLE = 0,
101 PERF_RECORD_IRQ = 1,
102 PERF_RECORD_GROUP = 2,
103};
e7bc62b6
IM
104
105a "simple" counter is one that counts hardware events and allows
106them to be read out into a u64 count value. (read() returns 8 on
107a successful read of a simple counter.)
108
109An "irq" counter is one that will also provide an IRQ context information:
110the IP of the interrupted context. In this case read() will return
111the 8-byte counter value, plus the Instruction Pointer address of the
112interrupted context.
113
447557ac
IM
114The parameter 'hw_event_period' is the number of events before waking up
115a read() that is blocked on a counter fd. Zero value means a non-blocking
116counter.
117
e7bc62b6
IM
118The 'pid' parameter allows the counter to be specific to a task:
119
120 pid == 0: if the pid parameter is zero, the counter is attached to the
121 current task.
122
123 pid > 0: the counter is attached to a specific task (if the current task
124 has sufficient privilege to do so)
125
126 pid < 0: all tasks are counted (per cpu counters)
127
128The 'cpu' parameter allows a counter to be made specific to a full
129CPU:
130
131 cpu >= 0: the counter is restricted to a specific CPU
132 cpu == -1: the counter counts on all CPUs
133
447557ac 134(Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.)
e7bc62b6
IM
135
136A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts
137events of that task and 'follows' that task to whatever CPU the task
138gets schedule to. Per task counters can be created by any user, for
139their own tasks.
140
141A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
142all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege.
143
447557ac
IM
144Group counters are created by passing in a group_fd of another counter.
145Groups are scheduled at once and can be used with PERF_RECORD_GROUP
146to record multi-dimensional timestamps.
147
This page took 0.045416 seconds and 5 git commands to generate.