Commit | Line | Data |
---|---|---|
4047f8b1 JC |
1 | The padata parallel execution mechanism |
2 | Last updated for 2.6.34 | |
3 | ||
4 | Padata is a mechanism by which the kernel can farm work out to be done in | |
5 | parallel on multiple CPUs while retaining the ordering of tasks. It was | |
6 | developed for use with the IPsec code, which needs to be able to perform | |
7 | encryption and decryption on large numbers of packets without reordering | |
8 | those packets. The crypto developers made a point of writing padata in a | |
9 | sufficiently general fashion that it could be put to other uses as well. | |
10 | ||
11 | The first step in using padata is to set up a padata_instance structure for | |
12 | overall control of how tasks are to be run: | |
13 | ||
14 | #include <linux/padata.h> | |
15 | ||
16 | struct padata_instance *padata_alloc(const struct cpumask *cpumask, | |
17 | struct workqueue_struct *wq); | |
18 | ||
19 | The cpumask describes which processors will be used to execute work | |
20 | submitted to this instance. The workqueue wq is where the work will | |
21 | actually be done; it should be a multithreaded queue, naturally. | |
22 | ||
23 | There are functions for enabling and disabling the instance: | |
24 | ||
25 | void padata_start(struct padata_instance *pinst); | |
26 | void padata_stop(struct padata_instance *pinst); | |
27 | ||
28 | These functions literally do nothing beyond setting or clearing the | |
29 | "padata_start() was called" flag; if that flag is not set, other functions | |
30 | will refuse to work. | |
31 | ||
32 | The list of CPUs to be used can be adjusted with these functions: | |
33 | ||
34 | int padata_set_cpumask(struct padata_instance *pinst, | |
35 | cpumask_var_t cpumask); | |
36 | int padata_add_cpu(struct padata_instance *pinst, int cpu); | |
37 | int padata_remove_cpu(struct padata_instance *pinst, int cpu); | |
38 | ||
39 | Changing the CPU mask has the look of an expensive operation, though, so it | |
40 | probably should not be done with great frequency. | |
41 | ||
42 | Actually submitting work to the padata instance requires the creation of a | |
43 | padata_priv structure: | |
44 | ||
45 | struct padata_priv { | |
46 | /* Other stuff here... */ | |
47 | void (*parallel)(struct padata_priv *padata); | |
48 | void (*serial)(struct padata_priv *padata); | |
49 | }; | |
50 | ||
51 | This structure will almost certainly be embedded within some larger | |
52 | structure specific to the work to be done. Most of its fields are private to | |
53 | padata, but the structure should be zeroed at initialization time, and the | |
54 | parallel() and serial() functions should be provided. Those functions will | |
55 | be called in the process of getting the work done, as we will see | |
56 | momentarily. | |
57 | ||
58 | The submission of work is done with: | |
59 | ||
60 | int padata_do_parallel(struct padata_instance *pinst, | |
61 | struct padata_priv *padata, int cb_cpu); | |
62 | ||
63 | The pinst and padata structures must be set up as described above. cb_cpu | |
64 | specifies which CPU will be used for the final callback when the work is | |
65 | done; it must be in the current instance's CPU mask. The return value from | |
66 | padata_do_parallel() is a little strange; zero is an error return | |
67 | indicating that the caller forgot the padata_start() formalities. -EBUSY | |
68 | means that somebody, somewhere else is messing with the instance's CPU | |
69 | mask, while -EINVAL is a complaint about cb_cpu not being in that CPU mask. | |
70 | If all goes well, this function will return -EINPROGRESS, indicating that | |
71 | the work is in progress. | |
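
Because zero signals failure here, the return convention is easy to misread. The following user-space sketch shows one way a caller might interpret the result. padata_submit_ok() and padata_submit_strerror() are hypothetical helpers invented for this illustration, not part of padata, and the errno values come from the C library's <errno.h> rather than from the kernel headers.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not part of padata): map the return value of
 * padata_do_parallel(), as described in the text, to a short explanation.
 */
static const char *padata_submit_strerror(int ret)
{
	switch (ret) {
	case -EINPROGRESS:
		return "queued";		/* the only success case */
	case 0:
		return "instance not started";	/* padata_start() not called */
	case -EBUSY:
		return "cpumask being changed";
	case -EINVAL:
		return "cb_cpu not in cpumask";
	default:
		return "unexpected";
	}
}

/* Hypothetical helper: nonzero only if the job was accepted. Zero is a failure. */
static int padata_submit_ok(int ret)
{
	return ret == -EINPROGRESS;
}
```

A caller following this sketch would test for -EINPROGRESS explicitly rather than using the common "nonzero means error" idiom, which would get both the success and the not-started case wrong.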
72 | ||
73 | Each task submitted to padata_do_parallel() will, in turn, be passed to | |
74 | exactly one call to the above-mentioned parallel() function, on one CPU, so | |
75 | true parallelism is achieved by submitting multiple tasks. Despite the | |
76 | fact that the workqueue is used to make these calls, parallel() is run with | |
77 | software interrupts disabled and thus cannot sleep. The parallel() | |
78 | function gets the padata_priv structure pointer as its lone parameter; | |
79 | information about the actual work to be done is probably obtained by using | |
80 | container_of() to find the enclosing structure. | |
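
As a rough illustration of the embedding pattern, here is a user-space sketch. struct my_job, my_parallel(), my_serial() and my_job_init() are hypothetical names. struct padata_priv is reduced to a stand-in holding only the two callbacks, and container_of() is redefined locally because the kernel macro is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-in for struct padata_priv: only the two callbacks. */
struct padata_priv {
	void (*parallel)(struct padata_priv *padata);
	void (*serial)(struct padata_priv *padata);
};

/* Hypothetical job: the padata_priv is embedded in the larger structure. */
struct my_job {
	int input;
	int result;
	struct padata_priv padata;
};

/* parallel() receives only the padata_priv; recover the enclosing job. */
static void my_parallel(struct padata_priv *padata)
{
	struct my_job *job;

	assert(padata != NULL);
	job = container_of(padata, struct my_job, padata);
	job->result = job->input * 2;	/* placeholder for the real work */
	/* A real callback would finish by calling padata_do_serial(padata). */
}

static void my_serial(struct padata_priv *padata)
{
	(void)padata;	/* nothing to do in this sketch */
}

/* Zero the whole job (and so the embedded padata_priv), then set callbacks. */
static void my_job_init(struct my_job *job, int input)
{
	memset(job, 0, sizeof(*job));
	job->input = input;
	job->padata.parallel = my_parallel;
	job->padata.serial = my_serial;
}
```

In this sketch, calling job.padata.parallel(&job.padata) after my_job_init(&job, 21) leaves job.result at 42. In the kernel, that call would be made by padata itself after padata_do_parallel().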
81 | ||
82 | Note that parallel() has no return value; the padata subsystem assumes that | |
83 | parallel() will take responsibility for the task from this point. The work | |
84 | need not be completed during this call, but, if parallel() leaves work | |
85 | outstanding, it should be prepared to be called again with a new job before | |
86 | the previous one completes. When a task does complete, parallel() (or | |
87 | whatever function actually finishes the job) should inform padata of the | |
88 | fact with a call to: | |
89 | ||
90 | void padata_do_serial(struct padata_priv *padata); | |
91 | ||
92 | At some point in the future, padata_do_serial() will trigger a call to the | |
93 | serial() function in the padata_priv structure. That call will happen on | |
94 | the CPU requested in the initial call to padata_do_parallel(); it, too, is | |
95 | done through the workqueue, but with local software interrupts disabled. | |
96 | Note that this call may be deferred for a while since the padata code takes | |
97 | pains to ensure that tasks are completed in the order in which they were | |
98 | submitted. | |
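
The ordering guarantee can be pictured with a toy user-space model. This is only a sketch of the observable behavior and makes no claim about how padata is implemented internally; struct reorder, toy_do_serial() and NTASKS are hypothetical names. Tasks are numbered in submission order, and a finished task is handed to serial() only once every earlier task has been handed over.

```c
#include <assert.h>

#define NTASKS 4

/* Toy model only: not padata's internal data structure. */
struct reorder {
	int done[NTASKS];	/* done[i] != 0 once task i has finished */
	int next;		/* next sequence number to serialize */
	int order[NTASKS];	/* sequence numbers in the order serialized */
	int nserialized;	/* how many entries of order[] are valid */
};

/* Stand-in for "padata_do_serial() was called for task seq". */
static void toy_do_serial(struct reorder *r, int seq)
{
	assert(seq >= 0 && seq < NTASKS);
	r->done[seq] = 1;

	/* Release every finished task that is next in submission order. */
	while (r->next < NTASKS && r->done[r->next]) {
		r->order[r->nserialized++] = r->next;	/* serial() would run here */
		r->next++;
	}
}
```

Starting from a zeroed struct reorder, if tasks 2, 0, 1 and 3 finish in that order, nothing is released after task 2, task 0 is released alone, tasks 1 and 2 are released together when task 1 finishes, and task 3 follows, so order[] ends up as 0, 1, 2, 3.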
99 | ||
100 | The one remaining function in the padata API should be called to clean up | |
101 | when a padata instance is no longer needed: | |
102 | ||
103 | void padata_free(struct padata_instance *pinst); | |
104 | ||
105 | This function will busy-wait while any remaining tasks are completed, so it | |
106 | might be best not to call it while there is work outstanding. Shutting | |
107 | down the workqueue, if necessary, should be done separately. |