Commit | Line | Data |
---|---|---|
3b1b3f6e | 1 | The cgroup freezer is useful to batch job management systems which start |
bde5ab65 MH |
2 | and stop sets of tasks in order to schedule the resources of a machine |
3 | according to the desires of a system administrator. This sort of program | |
4 | is often used on HPC clusters to schedule access to the cluster as a | |
5 | whole. The cgroup freezer uses cgroups to describe the set of tasks to | |
6 | be started/stopped by the batch job management system. It also provides | |
7 | a means to start and stop the tasks composing the job. | |
8 | ||
3b1b3f6e | 9 | The cgroup freezer will also be useful for checkpointing running groups |
bde5ab65 MH |
10 | of tasks. The freezer allows the checkpoint code to obtain a consistent |
11 | image of the tasks by attempting to force the tasks in a cgroup into a | |
12 | quiescent state. Once the tasks are quiescent, another task can | |
13 | walk /proc or invoke a kernel interface to gather information about the | |
14 | quiesced tasks. Checkpointed tasks can be restarted later should a | |
15 | recoverable error occur. This also allows the checkpointed tasks to be | |
16 | migrated between nodes in a cluster by copying the gathered information | |
17 | to another node and restarting the tasks there. | |
18 | ||
3b1b3f6e | 19 | Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping |
bde5ab65 MH |
20 | and resuming tasks in userspace. Both of these signals are observable |
21 | from within the tasks we wish to freeze. While SIGSTOP cannot be caught, | |
22 | blocked, or ignored, it can be seen by waiting or ptracing parent tasks. | |
23 | SIGCONT is especially unsuitable since it can be caught by the task. Any | |
24 | programs designed to watch for SIGSTOP and SIGCONT could be broken by | |
25 | attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can | |
26 | demonstrate this problem using nested bash shells: | |
27 | ||
28 | $ echo $$ | |
29 | 16644 | |
30 | $ bash | |
31 | $ echo $$ | |
32 | 16690 | |
33 | ||
34 | From a second, unrelated bash shell: | |
35 | $ kill -SIGSTOP 16690 | |
5f111616 | 36 | $ kill -SIGCONT 16690 |
bde5ab65 | 37 | |
5f111616 | 38 | <at this point 16690 exits and causes 16644 to exit too> |
bde5ab65 | 39 | |
3b1b3f6e | 40 | This happens because bash can observe both signals and choose how it |
bde5ab65 MH |
41 | responds to them. |
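That SIGCONT is visible to the receiving task can be seen in isolation with a minimal sketch. The function name sigcont_is_observable is hypothetical and not part of the freezer interface; the sketch only assumes a POSIX sh in which a trap can be installed for CONT.

```shell
# Hypothetical demonstration: a task can install a handler for SIGCONT,
# so a SIGSTOP/SIGCONT cycle is never fully invisible to it.
sigcont_is_observable() {
    # The child shell traps CONT, signals itself, then runs a no-op so
    # the pending trap is executed before the shell exits.
    sh -c 'trap "echo caught SIGCONT" CONT; kill -CONT $$; :'
}
```

Calling sigcont_is_observable prints the message from the trap handler, which is the kind of side effect a job manager using SIGCONT cannot prevent.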
42 | ||
3b1b3f6e | 43 | Another example of a program which catches and responds to these |
bde5ab65 MH |
44 | signals is gdb. In fact any program designed to use ptrace is likely to |
45 | have a problem with this method of stopping and resuming tasks. | |
46 | ||
3b1b3f6e | 47 | In contrast, the cgroup freezer uses the kernel freezer code to |
bde5ab65 MH |
48 | prevent the freeze/unfreeze cycle from becoming visible to the tasks |
49 | being frozen. This allows the bash example above and gdb to run as | |
50 | expected. | |
51 | ||
ef9fe980 | 52 | The cgroup freezer is hierarchical. Freezing a cgroup freezes all |
55d01595 | 53 | tasks belonging to the cgroup and all its descendant cgroups. Each |
ef9fe980 TH |
54 | cgroup has its own state (self-state) and the state inherited from the |
55 | parent (parent-state). Iff both states are THAWED, the cgroup is | |
56 | THAWED. | |
bde5ab65 | 57 | |
ef9fe980 TH |
58 | The following cgroupfs files are created by cgroup freezer. |
59 | ||
60 | * freezer.state: Read-write. | |
61 | ||
62 | When read, returns the effective state of the cgroup - "THAWED", | |
63 | "FREEZING" or "FROZEN". This is the combined self and parent-states. | |
64 | If either is freezing, the cgroup is freezing (FREEZING or FROZEN). | |
65 | ||
66 | A FREEZING cgroup transitions into the FROZEN state when all tasks | |
67 | belonging to the cgroup and its descendants become frozen. Note that | |
68 | a cgroup reverts to FREEZING from FROZEN after a new task is added | |
69 | to the cgroup or one of its descendant cgroups until the new task is | |
70 | frozen. | |
71 | ||
72 | When written, sets the self-state of the cgroup. Two values are | |
73 | allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup, | |
74 | if not already freezing, enters FREEZING state along with all its | |
75 | descendant cgroups. | |
76 | ||
77 | If THAWED is written, the self-state of the cgroup is changed to | |
78 | THAWED. Note that the effective state may not change to THAWED if | |
79 | the parent-state is still freezing. If a cgroup's effective state | |
80 | becomes THAWED, all its descendants which are freezing because of | |
81 | the cgroup also leave the freezing state. | |
82 | ||
83 | * freezer.self_freezing: Read only. | |
84 | ||
85 | Shows the self-state. 0 if the self-state is THAWED; otherwise, 1. | |
86 | This value is 1 iff the last write to freezer.state was "FROZEN". | |
87 | ||
88 | * freezer.parent_freezing: Read only. | |
89 | ||
90 | Shows the parent-state. 0 if none of the cgroup's ancestors is | |
91 | freezing (FREEZING or FROZEN); otherwise, 1. | |
92 | ||
93 | The root cgroup is non-freezable and the above interface files don't | |
94 | exist in it. | |
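The two read-only files make it possible to tell why a cgroup is freezing. The following is a minimal sketch, not part of the freezer interface: the helper name freezer_explain is hypothetical, and the directory passed to it is assumed to be a non-root freezer cgroup such as /sys/fs/cgroup/freezer/0.

```shell
# Hypothetical helper: report whether a cgroup is freezing because of
# its own self-state, because of an ancestor, or both.
# $1: cgroup directory containing the freezer.* files.
freezer_explain() {
    dir=$1
    self=$(cat "$dir/freezer.self_freezing") || return 1
    parent=$(cat "$dir/freezer.parent_freezing") || return 1

    if [ "$self" = 0 ] && [ "$parent" = 0 ]; then
        # Both states are THAWED, so the effective state is THAWED.
        echo "thawed"
    elif [ "$self" = 1 ] && [ "$parent" = 1 ]; then
        echo "freezing: self and ancestor"
    elif [ "$self" = 1 ]; then
        echo "freezing: self"
    else
        echo "freezing: ancestor"
    fi
}
```

A result of "freezing: ancestor" means that writing THAWED to this cgroup's freezer.state will not change its effective state.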
3b1b3f6e | 95 | |
bde5ab65 MH |
96 | * Examples of usage : |
97 | ||
f6e07d38 JS |
98 | # mkdir /sys/fs/cgroup/freezer |
99 | # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer | |
100 | # mkdir /sys/fs/cgroup/freezer/0 | |
101 | # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks | |
bde5ab65 MH |
102 | |
103 | to get status of the freezer subsystem : | |
104 | ||
f6e07d38 | 105 | # cat /sys/fs/cgroup/freezer/0/freezer.state |
bde5ab65 MH |
106 | THAWED |
107 | ||
108 | to freeze all tasks in the container : | |
109 | ||
f6e07d38 JS |
110 | # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state |
111 | # cat /sys/fs/cgroup/freezer/0/freezer.state | |
bde5ab65 | 112 | FREEZING |
f6e07d38 | 113 | # cat /sys/fs/cgroup/freezer/0/freezer.state |
bde5ab65 MH |
114 | FROZEN |
115 | ||
116 | to unfreeze all tasks in the container : | |
117 | ||
f6e07d38 JS |
118 | # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state |
119 | # cat /sys/fs/cgroup/freezer/0/freezer.state | |
bde5ab65 MH |
120 | THAWED |
121 | ||
122 | This is the basic mechanism which should do the right thing for user space tasks | |
123 | in a simple scenario. |
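Because a read of freezer.state may return FREEZING before it returns FROZEN, a script usually has to poll. The following is a minimal sketch under stated assumptions: the helper name freeze_and_wait, the default of 50 polls, and the 0.1 second interval are all hypothetical, and fractional arguments to sleep are assumed to be supported (they are not required by POSIX).

```shell
# Hypothetical helper: request a freeze, then poll until the cgroup
# reports FROZEN or the poll budget is exhausted.
# $1: cgroup directory, e.g. /sys/fs/cgroup/freezer/0
# $2: maximum number of polls (assumed default: 50)
freeze_and_wait() {
    dir=$1
    tries=${2:-50}

    # Writing FROZEN sets the self-state; the cgroup enters FREEZING.
    echo FROZEN > "$dir/freezer.state" || return 1

    while [ "$tries" -gt 0 ]; do
        state=$(cat "$dir/freezer.state") || return 1
        if [ "$state" = "FROZEN" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 0.1
    done

    # Still FREEZING after all polls; the caller decides whether to thaw.
    return 1
}
```

A caller could run "freeze_and_wait /sys/fs/cgroup/freezer/0 || echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state" to back out of a freeze that does not complete in time.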