     CPU frequency and voltage scaling code in the Linux(TM) kernel


                         L i n u x    C P U F r e q

                      C P U F r e q   G o v e r n o r s

                   - information for users and developers -


                    Dominik Brodowski  <linux@brodo.de>
            some additions and corrections by Nico Golde <nico@ngolde.de>



Clock scaling allows you to change the clock speed of the CPUs on the
fly. This is a nice method to save battery power, because the lower
the clock speed, the less power the CPU consumes.


Contents:
---------
1.   What Is A CPUFreq Governor?

2.   Governors In the Linux Kernel
2.1  Performance
2.2  Powersave
2.3  Userspace
2.4  Ondemand
2.5  Conservative

3.   The Governor Interface in the CPUfreq Core



1. What Is A CPUFreq Governor?
==============================

Most cpufreq drivers (in fact, all except one, longrun), and even most
CPU frequency scaling algorithms, only allow the CPU to be set to one
fixed frequency. In order to offer dynamic frequency scaling, the
cpufreq core must be able to tell these drivers a "target frequency".
So these specific drivers will be transformed to offer a "->target"
call instead of the existing "->setpolicy" call. For "longrun", all
stays the same, though.

How is the frequency to be used within the CPUfreq policy decided?
That's done using "cpufreq governors". Two are already in this patch
-- they're the already existing "powersave" and "performance" which
set the frequency statically to the lowest or highest frequency,
respectively. At least two more such governors will be ready for
addition in the near future, and likely many more will follow, as
there are various theories and models about dynamic frequency
scaling around. Because cpufreq offers a generic interface to scaling
governors, these can be tested extensively, and the best one can be
selected for each specific use.

Basically, it's the following flow graph:

CPU can be set to switch independently   |         CPU can only be set
      within specific "limits"           |       to specific frequencies

                                 "CPUfreq policy"
                consists of frequency limits (policy->{min,max})
                     and CPUfreq governor to be used
                         /                    \
                        /                      \
                       /                       the cpufreq governor decides
                      /                        (dynamically or statically)
                     /                         what target_freq to set within
                    /                          the limits of policy->{min,max}
                   /                                \
                  /                                  \
        Using the ->setpolicy call,              Using the ->target call,
            the limits and the                    the frequency closest
             "policy" is set.                     to target_freq is set.
                                                  It is assured that it
                                                  is within policy->{min,max}


2. Governors In the Linux Kernel
================================

2.1 Performance
---------------

The CPUfreq governor "performance" sets the CPU statically to the
highest frequency within the borders of scaling_min_freq and
scaling_max_freq.


2.2 Powersave
-------------

The CPUfreq governor "powersave" sets the CPU statically to the
lowest frequency within the borders of scaling_min_freq and
scaling_max_freq.
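
The two static governors can be pictured with a minimal userspace
sketch. It is not kernel code: struct policy_limits and both function
names are hypothetical, and frequencies are assumed to be in kHz.

```c
#include <assert.h>

/* Hypothetical stand-in for the policy limits (assumed to be in kHz). */
struct policy_limits {
    unsigned int min;   /* corresponds to scaling_min_freq */
    unsigned int max;   /* corresponds to scaling_max_freq */
};

/* "performance": always pick the upper limit. */
unsigned int performance_target(const struct policy_limits *p)
{
    assert(p->min <= p->max);
    return p->max;
}

/* "powersave": always pick the lower limit. */
unsigned int powersave_target(const struct policy_limits *p)
{
    assert(p->min <= p->max);
    return p->min;
}
```

Neither function looks at the CPU load, which is what "statically"
means here: the result only changes when the limits change.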


2.3 Userspace
-------------

The CPUfreq governor "userspace" allows the user, or any userspace
program running with UID "root", to set the CPU to a specific frequency
by making a sysfs file "scaling_setspeed" available in the CPU-device
directory.
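
A userspace program could drive this file roughly as sketched below.
The sysfs path layout and the kHz unit are assumptions based on the
usual cpufreq sysfs layout, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the (assumed) sysfs path of scaling_setspeed for one CPU.
 * Returns 0 on success, -1 if the buffer is too small. */
int setspeed_path(char *buf, size_t len, unsigned int cpu)
{
    int n = snprintf(buf, len,
                     "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed",
                     cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a frequency (assumed to be in kHz) to the given file.
 * Returns 0 on success, -1 on any error. */
int write_speed_khz(const char *path, unsigned int khz)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    if (strlen(path) == 0)
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;          /* e.g. not root, or governor not "userspace" */
    ok = fprintf(f, "%u\n", khz) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The write is expected to fail unless the caller runs as root and the
"userspace" governor is active for that CPU.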


2.4 Ondemand
------------

The CPUfreq governor "ondemand" sets the CPU depending on the
current usage. To do this the CPU must have the capability to
switch the frequency very quickly. The governor is tuned through a
number of parameters exposed as sysfs files:

sampling_rate: measured in microseconds (10^-6 seconds), this is how
often you want the kernel to look at the CPU usage and to make
decisions on what to do about the frequency. Typically this is set to
values of around '10000' or more.

show_sampling_rate_(min|max): the minimum and maximum sampling rates
available that you may set 'sampling_rate' to.

up_threshold: defines what the average CPU usage between the samplings
of 'sampling_rate' needs to be for the kernel to make a decision on
whether it should increase the frequency. For example, when it is set
to its default value of '80', the CPU needs to be on average more than
80% in use between the checking intervals before the kernel decides
that the CPU frequency needs to be increased.

sampling_down_factor: this parameter controls the rate at which the
kernel makes a decision on when to decrease the frequency. When set to
its default value of '5', the decision to lower the frequency is made
at 1/5 of the sampling_rate: five "lower rate" decisions have to be
made in a row before the CPU frequency is actually lowered. If set to
'1' the frequency decreases as quickly as it increases; if set to '2'
it decreases at half the rate of the increase.

ignore_nice_load: this parameter takes a value of '0' or '1'. When
set to '0' (its default), all processes are counted towards the
'cpu utilisation' value. When set to '1', the processes that are
run with a 'nice' value will not count (and thus be ignored) in the
overall usage calculation. This is useful if you are running a CPU
intensive calculation on your laptop and do not care how long it
takes to complete: you can 'nice' it and so prevent it from taking
part in the decision on whether to increase your CPU frequency.
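
The interplay of up_threshold and ignore_nice_load can be sketched in
plain C as below. This is an illustration under stated assumptions,
not the governor's actual code: the function names and the way busy
and nice time are passed in are hypothetical.

```c
#include <assert.h>

/* Load in percent over one sampling interval.
 * busy_us is the time the CPU was in use, including nice_us, the part
 * spent in niced processes. With ignore_nice_load set, that part is
 * left out of the calculation. */
unsigned int load_percent(unsigned int busy_us, unsigned int nice_us,
                          unsigned int total_us, int ignore_nice_load)
{
    assert(total_us > 0 && nice_us <= busy_us && busy_us <= total_us);
    if (ignore_nice_load)
        busy_us -= nice_us;
    return (unsigned int)((unsigned long long)busy_us * 100 / total_us);
}

/* The frequency is raised only if the load exceeds up_threshold. */
int should_increase(unsigned int load, unsigned int up_threshold)
{
    return load > up_threshold;
}
```

For a 10000 us interval with 9000 us busy, 5000 us of which is niced,
the load is 90% with ignore_nice_load=0 (above the default threshold
of 80) and 40% with ignore_nice_load=1 (below it).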


2.5 Conservative
----------------

The CPUfreq governor "conservative", much like the "ondemand"
governor, sets the CPU depending on the current usage. It differs in
behaviour in that it gracefully increases and decreases the CPU speed
rather than jumping to max speed the moment there is any load on the
CPU. This behaviour is more suitable in a battery powered environment.
The governor is tweaked in the same manner as the "ondemand" governor
through sysfs with the addition of:

freq_step: this describes the percentage steps by which the cpu freq
should be increased and decreased smoothly. By default the cpu
frequency will increase in 5% chunks of your maximum cpu frequency.
You can change this value to anywhere between 0 and 100 where '0' will
effectively lock your CPU at a speed regardless of its load whilst
'100' will, in theory, make it behave identically to the "ondemand"
governor.

down_threshold: same as the 'up_threshold' found for the "ondemand"
governor but for the opposite direction. For example, when set to its
default value of '20', the CPU usage needs to be below 20% between
samples for the frequency to be decreased.
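
The stepping behaviour described above could look roughly like the
following sketch. It assumes the step is freq_step percent of the
maximum frequency, as the text states for the default; the struct and
function names are hypothetical and this is not the governor's code.

```c
#include <assert.h>

/* Hypothetical container for the tunables named in the text. */
struct cons_tunables {
    unsigned int up_threshold;    /* percent */
    unsigned int down_threshold;  /* percent */
    unsigned int freq_step;       /* percent of the maximum frequency */
};

/* Next frequency for a given load, clamped to [min, max]. */
unsigned int conservative_next(unsigned int cur, unsigned int min,
                               unsigned int max, unsigned int load,
                               const struct cons_tunables *t)
{
    unsigned int step;

    assert(min <= cur && cur <= max && t->freq_step <= 100);
    step = (unsigned int)((unsigned long long)max * t->freq_step / 100);

    if (load > t->up_threshold)
        return (max - cur < step) ? max : cur + step;
    if (load < t->down_threshold)
        return (cur - min < step) ? min : cur - step;
    return cur;   /* load between the thresholds: keep the frequency */
}
```

With freq_step set to 0 the step is zero, so the frequency never
moves, which matches the "lock your CPU at a speed" remark above.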

3. The Governor Interface in the CPUfreq Core
=============================================

A new governor must register itself with the CPUfreq core using
"cpufreq_register_governor". The struct cpufreq_governor, which has to
be passed to that function, must contain the following values:

governor->name -        A unique name for this governor
governor->governor -    The governor callback function
governor->owner -       THIS_MODULE for the governor module (if
                        appropriate)

The governor->governor callback is called with the current (or to-be-set)
cpufreq_policy struct for that CPU, and an unsigned int event. The
following events are currently defined:

CPUFREQ_GOV_START:   This governor shall start its duty for the CPU
                     policy->cpu
CPUFREQ_GOV_STOP:    This governor shall end its duty for the CPU
                     policy->cpu
CPUFREQ_GOV_LIMITS:  The limits for CPU policy->cpu have changed to
                     policy->min and policy->max.

If you need other "events" outside of your driver, _only_ use the
cpufreq_governor_l(unsigned int cpu, unsigned int event) call to the
CPUfreq core to ensure proper locking.


The CPUfreq governor may call the CPU processor driver using one of
these two functions:

int cpufreq_driver_target(struct cpufreq_policy *policy,
                          unsigned int target_freq,
                          unsigned int relation);

int __cpufreq_driver_target(struct cpufreq_policy *policy,
                            unsigned int target_freq,
                            unsigned int relation);

target_freq must be within policy->min and policy->max, of course.
What's the difference between these two functions? When your governor
is still in a direct code path of a call to governor->governor, the
per-CPU cpufreq lock is still held in the cpufreq core, and there's
no need to lock it again (in fact, this would cause a deadlock). So
use __cpufreq_driver_target only in these cases. In all other cases
(for example, when there's a "daemonized" function that wakes up
every second), use cpufreq_driver_target to lock the cpufreq per-CPU
lock before the command is passed to the cpufreq processor driver.
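
The locking rule can be made concrete with a small userspace mock. The
lock is modelled as a plain flag and all names are hypothetical; the
only point is to show why the locking variant must not be called while
the lock is already held.

```c
#include <assert.h>

int lock_held;              /* mock of the per-CPU cpufreq lock */
unsigned int driver_freq;   /* frequency the mock driver was last given */

/* Mock of the unlocked variant: the caller must already hold the lock,
 * as is the case inside a governor->governor call path. */
int mock_unlocked_target(unsigned int min, unsigned int max,
                         unsigned int target_freq)
{
    assert(lock_held);
    if (target_freq < min || target_freq > max)
        return -1;          /* must be within policy->{min,max} */
    driver_freq = target_freq;
    return 0;
}

/* Mock of the locking variant: takes the lock itself. */
int mock_locked_target(unsigned int min, unsigned int max,
                       unsigned int target_freq)
{
    int ret;

    if (lock_held)
        return -2;          /* a real lock would deadlock here */
    lock_held = 1;
    ret = mock_unlocked_target(min, max, target_freq);
    lock_held = 0;
    return ret;
}
```

Calling mock_locked_target from a context that already holds the lock
returns -2 in this mock; with a real lock the caller would simply hang.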