Wait/Wound Deadlock-Proof Mutex Design
======================================

Please read mutex-design.txt first, as it applies to wait/wound mutexes too.

Motivation for WW-Mutexes
-------------------------

GPUs do operations that commonly involve many buffers. Those buffers
can be shared across contexts/processes, exist in different memory
domains (for example VRAM vs system memory), and so on. And with
PRIME / dmabuf, they can even be shared across devices. So there are
a handful of situations where the driver needs to wait for buffers to
become ready. If you think about this in terms of waiting on a buffer
mutex for it to become available, this presents a problem because
there is no way to guarantee that buffers appear in an execbuf/batch in
the same order in all contexts. That is directly under the control of
userspace, and a result of the sequence of GL calls that an application
makes, which results in the potential for deadlock. The problem gets
more complex when you consider that the kernel may need to migrate the
buffer(s) into VRAM before the GPU operates on the buffer(s), which
may in turn require evicting some other buffers (and you don't want to
evict other buffers which are already queued up to the GPU), but for a
simplified understanding of the problem you can ignore this.

The algorithm that the TTM graphics subsystem came up with for dealing with
this problem is quite simple. For each group of buffers (execbuf) that need
to be locked, the caller would be assigned a unique reservation id/ticket,
from a global counter. In case of deadlock while locking all the buffers
associated with an execbuf, the one with the lowest reservation ticket (i.e.
the oldest task) wins, and the one with the higher reservation id (i.e. the
younger task) unlocks all of the buffers that it has already locked, and then
tries again.

In the RDBMS literature this deadlock handling approach is called wait/wound:
The older task waits until it can acquire the contended lock. The younger task
needs to back off and drop all the locks it is currently holding, i.e. the
younger task is wounded. For example, if the older task holds buffer A and
contends on buffer B while the younger task holds B and contends on A, the
younger task drops B and retries, which lets the older task acquire B and
make progress.

Concepts
--------

Compared to normal mutexes two additional concepts/objects show up in the lock
interface for w/w mutexes:

Acquire context: To ensure eventual forward progress it is important that a task
trying to acquire locks doesn't grab a new reservation id, but keeps the one it
acquired when starting the lock acquisition. This ticket is stored in the
acquire context. Furthermore the acquire context keeps track of debugging state
to catch w/w mutex interface abuse.

W/w class: In contrast to normal mutexes the lock class needs to be explicit for
w/w mutexes, since it is required to initialize the acquire context.

Furthermore there are three different classes of w/w lock acquire functions:

* Normal lock acquisition with a context, using ww_mutex_lock.

* Slowpath lock acquisition on the contending lock, used by the wounded task
  after having dropped all already acquired locks. These functions have the
  _slow postfix.

  From a simple semantics point-of-view the _slow functions are not strictly
  required, since simply calling the normal ww_mutex_lock functions on the
  contending lock (after having dropped all other already acquired locks) will
  work correctly. After all, if no other ww mutex has been acquired yet, there's
  no deadlock potential and hence the ww_mutex_lock call will block and not
  prematurely return -EDEADLK. The advantage of the _slow functions is in
  interface safety:
  - ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow
    has a void return type. Note that since ww mutex code needs loops/retries
    anyway the __must_check doesn't result in spurious warnings, even though the
    very first lock operation can never fail.
  - When full debugging is enabled ww_mutex_lock_slow checks that all acquired
    ww mutexes have been released (preventing deadlocks) and makes sure that we
    block on the contending lock (preventing spinning through the -EDEADLK
    slowpath until the contended lock can be acquired).

* Functions to only acquire a single w/w mutex, which results in the exact same
  semantics as a normal mutex. This is done by calling ww_mutex_lock with a NULL
  context.

  Again this is not strictly required. But often you only want to acquire a
  single lock in which case it's pointless to set up an acquire context (and so
  better to avoid grabbing a deadlock avoidance ticket).

Of course, all the usual variants for handling wake-ups due to signals are also
provided.

Usage
-----

There are three different ways to acquire locks within the same w/w class.
Common definitions for methods #1 and #2:

static DEFINE_WW_CLASS(ww_class);

struct obj {
        struct ww_mutex lock;
        /* obj data */
};

struct obj_entry {
        struct list_head head;
        struct obj *obj;
};

Method 1, using a list in execbuf->buffers that's not allowed to be reordered.
This is useful if a list of required objects is already tracked somewhere.
Furthermore the lock helper can propagate the -EALREADY return code back to
the caller as a signal that an object is on the list twice. This is useful if
the list is constructed from userspace input and the ABI requires userspace to
not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl).

int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{
        struct obj *res_obj = NULL;
        struct obj_entry *contended_entry = NULL;
        struct obj_entry *entry;
        int ret;

        ww_acquire_init(ctx, &ww_class);

retry:
        list_for_each_entry (entry, list, head) {
                if (entry->obj == res_obj) {
                        /* already locked with _slow in the retry path below */
                        res_obj = NULL;
                        continue;
                }
                ret = ww_mutex_lock(&entry->obj->lock, ctx);
                if (ret < 0) {
                        contended_entry = entry;
                        goto err;
                }
        }

        ww_acquire_done(ctx);
        return 0;

err:
        list_for_each_entry_continue_reverse (entry, list, head)
                ww_mutex_unlock(&entry->obj->lock);

        if (res_obj)
                ww_mutex_unlock(&res_obj->lock);

        if (ret == -EDEADLK) {
                /* we lost out in a seqno race, lock and retry.. */
                ww_mutex_lock_slow(&contended_entry->obj->lock, ctx);
                res_obj = contended_entry->obj;
                goto retry;
        }
        ww_acquire_fini(ctx);

        return ret;
}

Method 2, using a list in execbuf->buffers that can be reordered. The same
duplicate entry detection using -EALREADY as in method #1 above applies, but
the list reordering allows for a bit more idiomatic code.

int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{
        struct obj_entry *entry, *entry2;
        int ret;

        ww_acquire_init(ctx, &ww_class);

        list_for_each_entry (entry, list, head) {
                ret = ww_mutex_lock(&entry->obj->lock, ctx);
                if (ret < 0) {
                        entry2 = entry;

                        list_for_each_entry_continue_reverse (entry2, list, head)
                                ww_mutex_unlock(&entry2->obj->lock);

                        if (ret != -EDEADLK) {
                                ww_acquire_fini(ctx);
                                return ret;
                        }

                        /* we lost out in a seqno race, lock and retry.. */
                        ww_mutex_lock_slow(&entry->obj->lock, ctx);

                        /*
                         * Move this entry to the head of the list. The
                         * loop then continues from entry->next, i.e. the
                         * first entry that is still unlocked.
                         */
                        list_del(&entry->head);
                        list_add(&entry->head, list);
                }
        }

        ww_acquire_done(ctx);
        return 0;
}

Unlocking works the same way for both methods #1 and #2:

void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{
        struct obj_entry *entry;

        list_for_each_entry (entry, list, head)
                ww_mutex_unlock(&entry->obj->lock);

        ww_acquire_fini(ctx);
}

Method 3 is useful if the list of objects is constructed ad-hoc and not upfront,
e.g. when adjusting edges in a graph where each node has its own ww_mutex lock,
and edges can only be changed when holding the locks of all involved nodes. w/w
mutexes are a natural fit for such a case for two reasons:
- They can handle lock-acquisition in any order, which allows us to start
  walking a graph from a starting point and then iteratively discover new edges
  and lock down the nodes those edges connect to.
- Due to the -EALREADY return code signalling that a given object is already
  held there's no need for additional book-keeping to break cycles in the graph
  or keep track of which locks are already held (when using more than one node
  as a starting point).

Note that this approach differs in two important ways from the above methods:
- Since the list of objects is dynamically constructed (and might very well be
  different when retrying due to hitting the -EDEADLK wound condition) there's
  no need to keep any object on a persistent list when it's not locked. We can
  therefore move the list_head into the object itself.
- On the other hand the dynamic object list construction also means that the
  -EALREADY return code can't be propagated.

Note also that methods #1 and #2 can be combined with method #3, e.g. to first
lock a list of starting nodes (passed in from userspace) using one of the above
methods, and then lock any additional objects affected by the operations using
method #3 below. The backoff/retry procedure will be a bit more involved, since
when the dynamic locking step hits -EDEADLK we also need to unlock all the
objects acquired with the fixed list. But the w/w mutex debug checks will catch
any interface misuse for these cases.

Also, method #3 can't fail the lock acquisition step: -EALREADY is handled
internally by just moving on to the next object, and -EDEADLK triggers the
backoff and retry. Of course this would be different when using the
_interruptible variants, but that's outside of the scope of these examples
here.

struct obj {
        struct ww_mutex ww_mutex;
        struct list_head locked_list;
};

static DEFINE_WW_CLASS(ww_class);

void __unlock_objs(struct list_head *list)
{
        struct obj *entry, *temp;

        list_for_each_entry_safe (entry, temp, list, locked_list) {
                /* need to do that before unlocking, since only the current
                 * lock holder is allowed to use the object */
                list_del(&entry->locked_list);
                ww_mutex_unlock(&entry->ww_mutex);
        }
}

void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{
        struct obj *obj;
        int ret;

        ww_acquire_init(ctx, &ww_class);

retry:
        /* re-init loop start state */
        loop {
                /* magic code which walks over a graph and decides which objects
                 * to lock */

                ret = ww_mutex_lock(&obj->ww_mutex, ctx);
                if (ret == -EALREADY) {
                        /* we have that one already, get to the next object */
                        continue;
                }
                if (ret == -EDEADLK) {
                        /* drop everything, grab the contended lock, restart */
                        __unlock_objs(list);

                        ww_mutex_lock_slow(&obj->ww_mutex, ctx);
                        list_add(&obj->locked_list, list);
                        goto retry;
                }

                /* locked a new object, add it to the list */
                list_add_tail(&obj->locked_list, list);
        }

        ww_acquire_done(ctx);
}

void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{
        __unlock_objs(list);
        ww_acquire_fini(ctx);
}

Method 4: Only locking one single object. In that case deadlock detection and
prevention is obviously overkill, since with grabbing just one lock you can't
produce a deadlock within just one class. To simplify this case the w/w mutex
api can be used with a NULL context.

Implementation Details
----------------------

Design:
ww_mutex currently encapsulates a struct mutex; this means no extra overhead
for normal mutex locks, which are far more common. As such there is only a
small increase in code size if wait/wound mutexes are not used.

In general, not much contention is expected. The locks are typically used to
serialize access to resources for devices. The only way to make wakeups
smarter would be at the cost of adding a field to struct mutex_waiter. This
would add overhead to all cases where normal mutexes are used, and
ww_mutexes are generally less performance sensitive.

Lockdep:
Special care has been taken to warn for as many cases of api abuse
as possible. Some common api abuses will be caught with
CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended.

Some of the errors which will be warned about:
- Forgetting to call ww_acquire_fini or ww_acquire_init.
- Attempting to lock more mutexes after ww_acquire_done.
- Attempting to lock the wrong mutex after -EDEADLK and
  unlocking all mutexes.
- Attempting to lock the right mutex after -EDEADLK,
  before unlocking all mutexes.
- Calling ww_mutex_lock_slow before -EDEADLK was returned.
- Unlocking mutexes with the wrong unlock function.
- Calling one of the ww_acquire_* twice on the same context.
- Using a different ww_class for the mutex than for the ww_acquire_ctx.
- Normal lockdep errors that can result in deadlocks.

Some of the lockdep errors that can result in deadlocks:
- Calling ww_acquire_init to initialize a second ww_acquire_ctx before
  having called ww_acquire_fini on the first.
- 'normal' deadlocks that can occur.

FIXME: Update this section once we have the TASK_DEADLOCK task state flag magic
implemented.