Commit | Line | Data |
---|---|---|
faebe9fd PE |
1 | |
2 | The Resource Counter | |
3 | ||
4 | The resource counter, declared in include/linux/res_counter.h, | |
5 | facilitates resource management by controllers by providing | |
6 | common infrastructure for accounting. | |
7 | ||
8 | This infrastructure consists of the res_counter structure and the | |
9 | routines that work with it. | |
10 | ||
11 | ||
12 | ||
13 | 1. Crucial parts of the res_counter structure | |
14 | ||
15 | a. unsigned long long usage | |
16 | ||
17 | The usage value shows the amount of a resource that is consumed | |
18 | by a group at a given time. The units of measurement should be | |
19 | determined by the controller that uses this counter. E.g. it can | |
20 | be bytes, items or any other unit the controller operates on. | |
21 | ||
22 | b. unsigned long long max_usage | |
23 | ||
24 | The maximal value of the usage over time. | |
25 | ||
26 | This value is useful when gathering statistical information about | |
27 | the particular group, as it shows the actual resource requirements | |
28 | for a particular group, not just some usage snapshot. | |
29 | ||
30 | c. unsigned long long limit | |
31 | ||
32 | The maximal amount of the resource the group is allowed to consume. If | |
33 | the group requests more resources, so that the usage value | |
34 | would exceed the limit, the resource allocation is rejected (see | |
35 | the next section). | |
36 | ||
37 | d. unsigned long long failcnt | |
38 | ||
39 | The failcnt stands for "failures counter". This is the number of | |
40 | resource allocation attempts that failed. | |
41 | ||
42 | e. spinlock_t lock | |
43 | ||
44 | Protects changes of the above values. | |
45 | ||
46 | ||
47 | ||
48 | 2. Basic accounting routines | |
49 | ||
5341cfab AR |
50 | a. void res_counter_init(struct res_counter *rc, |
51 | struct res_counter *rc_parent) | |
faebe9fd PE |
52 | |
53 | Initializes the resource counter. As usual, should be the first | |
54 | routine called for a new counter. | |
55 | ||
5341cfab AR |
56 | The rc_parent argument can be used to define a hierarchical |
57 | child -> parent relationship directly in the res_counter structure. | |
58 | Pass NULL to define no relationship. | |
59 | ||
60 | c. int res_counter_charge(struct res_counter *rc, unsigned long val, | |
61 | struct res_counter **limit_fail_at) | |
faebe9fd PE |
62 | |
63 | When a resource is about to be allocated it has to be accounted | |
64 | with the appropriate resource counter (the controller should determine | |
65 | which one to use on its own). This operation is called "charging". | |
66 | ||
67 | It is not very important which operation - resource allocation | |
68 | or charging - is performed first, but | |
69 | * if the allocation is performed first, this may create a | |
70 | temporary resource over-usage until the resource counter is | |
71 | charged; | |
72 | * if the charging is performed first, then it should be uncharged | |
73 | on the error path (if it is taken). | |
74 | ||
5341cfab AR |
75 | If the charging fails and a hierarchical dependency exists, the |
76 | limit_fail_at parameter is set to the particular res_counter element | |
77 | where the charging failed. | |
78 | ||
79 | d. int res_counter_charge_locked | |
4d8438f0 | 80 | (struct res_counter *rc, unsigned long val, bool force) |
5341cfab AR |
81 | |
82 | The same as res_counter_charge(), but it must not acquire/release the | |
83 | res_counter->lock internally (it must be called with res_counter->lock | |
4d8438f0 | 84 | held). The force parameter indicates whether we can bypass the limit. |
5341cfab | 85 | |
50bdd430 | 86 | e. u64 res_counter_uncharge[_locked] |
faebe9fd PE |
87 | (struct res_counter *rc, unsigned long val) |
88 | ||
89 | When a resource is released (freed) it should be de-accounted | |
90 | from the resource counter it was accounted to. This is called | |
50bdd430 GC |
91 | "uncharging". The return value of this function indicates the amount |
92 | of charges still present in the counter. | |
faebe9fd | 93 | |
5341cfab | 94 | The _locked routines must be called with res_counter->lock already held. |
faebe9fd | 95 | |
50bdd430 | 96 | f. u64 res_counter_uncharge_until |
2bb2ba9d FW |
97 | (struct res_counter *rc, struct res_counter *top, |
98 | unsigned long val) | |
99 | ||
100 | Almost the same as res_counter_uncharge(), but propagation of the uncharge | |
101 | stops when rc == top. This is useful when killing a res_counter in a | |
102 | child cgroup. | |
103 | ||
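The hierarchical behavior of the charge and uncharge routines described above can be sketched with a small userspace model. This is only an illustration, not the kernel implementation: the model_* function names are hypothetical, locking is left out entirely, and the choice of -ENOMEM as the failure code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model only: the kernel structure also holds a spinlock_t. */
struct res_counter {
	unsigned long long usage;
	unsigned long long max_usage;
	unsigned long long limit;
	unsigned long long failcnt;
	struct res_counter *parent;
};

/* Hypothetical stand-in for res_counter_init(). */
static void model_init(struct res_counter *rc, struct res_counter *rc_parent)
{
	assert(rc != NULL);
	rc->usage = 0;
	rc->max_usage = 0;
	rc->limit = ~0ULL;	/* effectively unlimited until a limit is set */
	rc->failcnt = 0;
	rc->parent = rc_parent;
}

/*
 * Uncharge rc and its ancestors, stopping before top is reached.
 * Passing top == NULL walks the whole chain. Returns the usage
 * left in rc itself.
 */
static unsigned long long model_uncharge_until(struct res_counter *rc,
					       struct res_counter *top,
					       unsigned long val)
{
	unsigned long long left = 0;
	struct res_counter *c;

	for (c = rc; c != NULL && c != top; c = c->parent) {
		c->usage = (val > c->usage) ? 0 : c->usage - val;
		if (c == rc)
			left = c->usage;
	}
	return left;
}

/*
 * Charge rc and every ancestor. On failure, record the counter that
 * refused the charge and roll back the levels already charged.
 */
static int model_charge(struct res_counter *rc, unsigned long val,
			struct res_counter **limit_fail_at)
{
	struct res_counter *c;

	*limit_fail_at = NULL;
	for (c = rc; c != NULL; c = c->parent) {
		if (c->usage + val > c->limit) {
			c->failcnt++;
			*limit_fail_at = c;
			model_uncharge_until(rc, c, val);
			return -ENOMEM;
		}
		c->usage += val;
		if (c->usage > c->max_usage)
			c->max_usage = c->usage;
	}
	return 0;
}
```

In this model a charge that fits the child but not the parent fails at the parent, leaves both usage values unchanged, and increments only the parent's failcnt.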
faebe9fd PE |
104 | 2.1 Other accounting routines |
105 | ||
106 | There are more routines that may help you with common needs, like | |
107 | checking whether the limit is reached or resetting the max_usage | |
108 | value. They are all declared in include/linux/res_counter.h. | |
109 | ||
110 | ||
111 | ||
112 | 3. Analyzing the resource counter registrations | |
113 | ||
114 | a. If the failcnt value constantly grows, this means that the counter's | |
115 | limit is too tight. Either the group is misbehaving and consumes too | |
116 | many resources, or the configuration is not suitable for the group | |
117 | and the limit should be increased. | |
118 | ||
119 | b. The max_usage value can be used to quickly tune the group. One may | |
120 | set the limits to maximal values and either load the container with | |
121 | a common pattern or leave it running for a while. After this the max_usage | |
122 | value shows the amount of the resource the container would require during | |
123 | its common activity. | |
124 | ||
125 | Setting the limit a bit above this value gives a pretty good | |
126 | configuration that works in most of the cases. | |
127 | ||
128 | c. If the max_usage is much less than the limit, but the failcnt value | |
129 | is growing, then the group tries to allocate a big chunk of resource | |
130 | at once. | |
131 | ||
132 | d. If the max_usage is much less than the limit, and the failcnt value | |
133 | is 0, then this group has been given a higher limit than it | |
134 | requires. It is better to lower the limit a bit, leaving more resource | |
135 | for other groups. | |
136 | ||
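Rules a, c and d above can be written down as a small decision function. The following is a sketch under stated assumptions: the struct, the enum and classify_sample() are hypothetical names, "much less than the limit" is arbitrarily taken to mean less than half of it, and "growing" means that failcnt increased between two readings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of a counter, with failcnt read at two points in time. */
struct counter_sample {
	unsigned long long max_usage;
	unsigned long long limit;
	unsigned long long failcnt_before;
	unsigned long long failcnt_now;
};

enum tuning_hint {
	HINT_NONE,
	HINT_LIMIT_TOO_TIGHT,	/* rule a: failcnt keeps growing */
	HINT_BIG_CHUNKS,	/* rule c: failures although max_usage is low */
	HINT_LIMIT_TOO_HIGH	/* rule d: no failures and max_usage is low */
};

static enum tuning_hint classify_sample(const struct counter_sample *s)
{
	int failing, far_below;

	assert(s != NULL);
	failing = s->failcnt_now > s->failcnt_before;
	far_below = s->max_usage < s->limit / 2;	/* assumed threshold */

	if (failing)
		return far_below ? HINT_BIG_CHUNKS : HINT_LIMIT_TOO_TIGHT;
	if (far_below && s->failcnt_now == 0)
		return HINT_LIMIT_TOO_HIGH;
	return HINT_NONE;
}
```

The threshold is the part most likely to need adjusting for a real controller; the ordering of the checks follows the text, where a growing failcnt is examined first.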
137 | ||
138 | ||
139 | 4. Communication with the control groups subsystem (cgroups) | |
140 | ||
141 | All the resource controllers that are using cgroups and resource counters | |
142 | should provide files (in the cgroup filesystem) to work with the resource | |
143 | counter fields. They are recommended to adhere to the following rules: | |
144 | ||
145 | a. File names | |
146 | ||
147 | Field name File name | |
148 | --------------------------------------------------- | |
149 | usage usage_in_<unit_of_measurement> | |
150 | max_usage max_usage_in_<unit_of_measurement> | |
151 | limit limit_in_<unit_of_measurement> | |
152 | failcnt failcnt | |
153 | lock no file :) | |
154 | ||
155 | b. Reading from file should show the corresponding field value in the | |
156 | appropriate format. | |
157 | ||
158 | c. Writing to file | |
159 | ||
160 | Field Expected behavior | |
161 | ---------------------------------- | |
162 | usage prohibited | |
163 | max_usage reset to usage | |
164 | limit set the limit | |
165 | failcnt reset to zero | |
166 | ||
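The expected write behavior from the table above could be modeled as follows. This is a userspace sketch, not a cgroup file handler: model_write(), the FIELD_* names and the use of -EINVAL for the prohibited write are all assumptions made for illustration, and the lock is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model of the counter fields; the lock is omitted. */
struct res_counter {
	unsigned long long usage;
	unsigned long long max_usage;
	unsigned long long limit;
	unsigned long long failcnt;
};

/* Hypothetical identifiers for the four files a controller exposes. */
enum model_field {
	FIELD_USAGE,
	FIELD_MAX_USAGE,
	FIELD_LIMIT,
	FIELD_FAILCNT
};

/* Apply a write to one field; val is only used for the limit. */
static int model_write(struct res_counter *rc, enum model_field field,
		       unsigned long long val)
{
	assert(rc != NULL);

	switch (field) {
	case FIELD_USAGE:
		return -EINVAL;			/* writing usage is prohibited */
	case FIELD_MAX_USAGE:
		rc->max_usage = rc->usage;	/* reset to the current usage */
		return 0;
	case FIELD_LIMIT:
		rc->limit = val;		/* set the limit */
		return 0;
	case FIELD_FAILCNT:
		rc->failcnt = 0;		/* reset to zero */
		return 0;
	}
	return -EINVAL;
}
```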
167 | ||
168 | ||
169 | 5. Usage example | |
170 | ||
171 | a. Declare a task group (take a look at cgroups subsystem for this) and | |
172 | fold a res_counter into it | |
173 | ||
174 | struct my_group { | |
175 | struct res_counter res; | |
176 | ||
177 | <other fields> | |
178 | }; | |
179 | ||
180 | b. Put hooks in resource allocation/release paths | |
181 | ||
182 | int alloc_something(...) | |
183 | { | |
184 | if (res_counter_charge(res_counter_ptr, amount, &fail_at) < 0) | |
185 | return -ENOMEM; | |
186 | ||
187 | <allocate the resource and return to the caller> | |
188 | } | |
189 | ||
190 | void release_something(...) | |
191 | { | |
192 | res_counter_uncharge(res_counter_ptr, amount); | |
193 | ||
194 | <release the resource> | |
195 | } | |
196 | ||
197 | In order to keep the usage value self-consistent, both the | |
198 | "res_counter_ptr" and the "amount" in release_something() should be | |
199 | the same as they were in alloc_something() when the resource being | |
200 | released was allocated. | |
201 | ||
202 | c. Provide a way to read res_counter values and set them (cgroups | |
203 | can help with this). | |
204 | ||
205 | d. Compile and run :) |