deliverable/linux.git
12 years agoMerge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck...
Ingo Molnar [Wed, 14 Dec 2011 07:16:43 +0000 (08:16 +0100)] 
Merge branch 'rcu/next' of git://git./linux/kernel/git/paulmck/linux-rcu into core/rcu

12 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 13 Dec 2011 23:02:31 +0000 (15:02 -0800)] 
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86, efi: Calling __pa() with an ioremap()ed address is invalid"
  x86, efi: Make efi_call_phys_{prelog,epilog} CONFIG_RELOCATABLE-aware

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph...
Linus Torvalds [Tue, 13 Dec 2011 22:59:42 +0000 (14:59 -0800)] 
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: add missing spin_unlock at ceph_mdsc_build_path()
  ceph: fix SEEK_CUR, SEEK_SET regression
  crush: fix mapping calculation when force argument doesn't exist
  ceph: use i_ceph_lock instead of i_lock
  rbd: remove buggy rollback functionality
  rbd: return an error when an invalid header is read
  ceph: fix rasize reporting by ceph_show_options

12 years agoMerge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 13 Dec 2011 22:58:56 +0000 (14:58 -0800)] 
Merge branch 'writeback-for-linus' of git://git./linux/kernel/git/wfg/linux

* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: set max_pause to lowest value on zero bdi_dirty
  writeback: permit through good bdi even when global dirty exceeded
  writeback: comment on the bdi dirty threshold
  fs: Make write(2) interruptible by a fatal signal
  writeback: Fix issue on make htmldocs

12 years agoceph: add missing spin_unlock at ceph_mdsc_build_path()
Yehuda Sadeh [Tue, 13 Dec 2011 17:57:44 +0000 (09:57 -0800)] 
ceph: add missing spin_unlock at ceph_mdsc_build_path()

one of the paths was missing spin_unlock

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
12 years agoMerge branch 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur...
Linus Torvalds [Tue, 13 Dec 2011 17:28:23 +0000 (09:28 -0800)] 
Merge branch 'fixes' of ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm

* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
  ARM: 7204/1: arch/arm/kernel/setup.c: initialize arm_dma_zone_size earlier
  ARM: 7185/1: perf: don't assign platform_device on unsupported CPUs
  ARM: 7187/1: fix unwinding for XIP kernels
  ARM: 7186/1: fix Kconfig issue with PHYS_OFFSET and !MMU

12 years agoceph: fix SEEK_CUR, SEEK_SET regression
Sage Weil [Tue, 13 Dec 2011 17:19:26 +0000 (09:19 -0800)] 
ceph: fix SEEK_CUR, SEEK_SET regression

Commit 06222e491e663dac939f04b125c9dc52126a75c4 got the if wrong so that
it always evaluates as true.  This is semantically harmless, but makes
SEEK_CUR and SEEK_SET needlessly query the server.

Rewrite the if to explicitly enumerate the cases we DO need a valid i_size
to make this code less fragile.

Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
12 years agolinux/log2.h: Fix rounddown_pow_of_two(1)
Linus Torvalds [Tue, 13 Dec 2011 06:06:55 +0000 (22:06 -0800)] 
linux/log2.h: Fix rounddown_pow_of_two(1)

Exactly like roundup_pow_of_two(1), the rounddown version was buggy for
the case of a compile-time constant '1' argument.  Probably because it
originated from the same code, sharing history with the roundup version
from before the bugfix (for that one, see commit 1a06a52ee1b0: "Fix
roundup_pow_of_two(1)").

However, unlike the roundup version, the fix for rounddown is to just
remove the broken special case entirely.  It's simply not needed - the
generic code

    1UL << ilog2(n)

does the right thing for the constant '1' argment too.  The only reason
roundup needed that special case was because rounding up does so by
subtracting one from the argument (and then adding one to the result)
causing the obvious problems with "ilog2(0)".

But rounddown doesn't do any of that, since ilog2() naturally truncates
(ie "rounds down") to the right rounded down value.  And without the
ilog2(0) case, there's no reason for the special case that had the wrong
value.

tl;dr: rounddown_pow_of_two(1) should be 1, not 0.

Acked-by: Dmitry Torokhov <dtor@vmware.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groec...
Linus Torvalds [Tue, 13 Dec 2011 04:08:27 +0000 (20:08 -0800)] 
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (jz4740) Staticise jz4740_hwmon_driver
  hwmon: (jz4740) fix signedness bug

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
Linus Torvalds [Tue, 13 Dec 2011 04:06:13 +0000 (20:06 -0800)] 
Merge branch 'for-linus' of git://git./linux/kernel/git/cjb/mmc

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: core: Fix deadlock when the CONFIG_MMC_UNSAFE_RESUME is not defined
  mmc: sdhci-s3c: Remove old and misprototyped suspend operations
  mmc: tmio: fix clock gating on platforms with a .set_pwr() method
  mmc: sh_mmcif: fix clock gating on platforms with a .down_pwr() method
  mmc: core: Fix typo at mmc_card_sleep
  mmc: core: Fix power_off_notify during suspend
  mmc: core: Fix setting power notify state variable for non-eMMC
  mmc: core: Add quirk for long data read time
  mmc: Add module.h include to sdhci-cns3xxx.c
  mmc: mxcmmc: fix falling back to PIO
  mmc: omap_hsmmc: DMA unmap only once in case of MMC error

12 years agocpu: Export cpu_up()
Paul E. McKenney [Mon, 12 Dec 2011 05:54:45 +0000 (21:54 -0800)] 
cpu: Export cpu_up()

Building rcutorture as a module requires cpu_up() as well as cpu_down()
exported, so apply EXPORT_SYMBOL_GPL().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agoRevert "x86, efi: Calling __pa() with an ioremap()ed address is invalid"
Keith Packard [Mon, 12 Dec 2011 00:12:42 +0000 (16:12 -0800)] 
Revert "x86, efi: Calling __pa() with an ioremap()ed address is invalid"

This hangs my MacBook Air at boot time; I get no console
messages at all. I reverted this on top of -rc5 and my machine
boots again.

This reverts commit e8c7106280a305e1ff2a3a8a4dfce141469fb039.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console
Signed-off-by: Ingo Molnar <mingo@elte.hu>
12 years agocrush: fix mapping calculation when force argument doesn't exist
Sage Weil [Wed, 7 Dec 2011 17:10:26 +0000 (09:10 -0800)] 
crush: fix mapping calculation when force argument doesn't exist

If the force argument isn't valid, we should continue calculating a
mapping as if it weren't specified.

Signed-off-by: Sage Weil <sage@newdream.net>
12 years agohwmon: (jz4740) Staticise jz4740_hwmon_driver
Axel Lin [Thu, 8 Dec 2011 13:33:37 +0000 (08:33 -0500)] 
hwmon: (jz4740) Staticise jz4740_hwmon_driver

It is not used outside this driver so no need to make the symbol global.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
12 years agohwmon: (jz4740) fix signedness bug
Axel Lin [Thu, 8 Dec 2011 13:04:12 +0000 (08:04 -0500)] 
hwmon: (jz4740) fix signedness bug

wait_for_completion_interruptible_timeout() may return negative value.
In this case, checking if (t > 0)  will return true if t is unsigned.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: stable@kernel.org (3.0+)
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
12 years agoARM: 7204/1: arch/arm/kernel/setup.c: initialize arm_dma_zone_size earlier
Arnaud Patard [Sun, 11 Dec 2011 19:32:25 +0000 (20:32 +0100)] 
ARM: 7204/1: arch/arm/kernel/setup.c: initialize arm_dma_zone_size earlier

arm_dma_zone_size is used by arm_bootmem_free() which is called by
paging_init(). Thus it needs to be set before calling it.

Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: stable@kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
12 years agorcu: Apply ACCESS_ONCE() to rcu_boost() return value
Paul E. McKenney [Fri, 9 Dec 2011 22:43:47 +0000 (14:43 -0800)] 
rcu: Apply ACCESS_ONCE() to rcu_boost() return value

Both TINY_RCU's and TREE_RCU's implementations of rcu_boost() access
the ->boost_tasks and ->exp_tasks fields without preventing concurrent
changes to these fields.  This commit therefore applies ACCESS_ONCE in
order to prevent compiler mischief.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agoRevert "rcu: Permit rt_mutex_unlock() with irqs disabled"
Paul E. McKenney [Fri, 9 Dec 2011 22:00:06 +0000 (14:00 -0800)] 
Revert "rcu: Permit rt_mutex_unlock() with irqs disabled"

This reverts commit 5342e269b2b58ee0b0b4168a94087faaa60d0567.

The approach taken in this patch was deemed too abusive to mutexes,
and thus too likely to result in maintenance problems in the future.
Instead, we will disallow RCU read-side critical sections that partially
overlap with interrupt-disbled code segments.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agodocs: Additional LWN links to RCU API
Kees Cook [Wed, 7 Dec 2011 23:11:23 +0000 (15:11 -0800)] 
docs: Additional LWN links to RCU API

Tyler Hicks pointed me at an additional article on RCU and I figured
it should probably be mentioned with the others.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Augment rcu_batch_end tracing for idle and callback state
Paul E. McKenney [Thu, 8 Dec 2011 00:32:40 +0000 (16:32 -0800)] 
rcu: Augment rcu_batch_end tracing for idle and callback state

The current rcu_batch_end event trace records only the name of the RCU
flavor and the total number of callbacks that remain queued on the
current CPU.  This is insufficient for testing and tuning the new
dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along
with whether or not any of the callbacks that were ready to invoke
at the beginning of rcu_do_batch() are still queued.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Add rcutorture tests for srcu_read_lock_raw()
Paul E. McKenney [Mon, 5 Dec 2011 23:36:31 +0000 (15:36 -0800)] 
rcu: Add rcutorture tests for srcu_read_lock_raw()

This commit adds simple rcutorture tests for srcu_read_lock_raw() and
srcu_read_unlock_raw().  It does not test doing srcu_read_lock_raw()
in an exception handler and releasing it in the corresponding process
context.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Make rcutorture test for hotpluggability before offlining CPUs
Paul E. McKenney [Sat, 3 Dec 2011 23:09:28 +0000 (15:09 -0800)] 
rcu: Make rcutorture test for hotpluggability before offlining CPUs

The rcutorture test now can automatically exercise CPU hotplug and
collect success statistics, which can be correlated with other rcutorture
activity.  This permits rcutorture to completely exercise RCU regardless
of what sort of userspace and filesystem layout is in use.  Unfortunately,
rcutorture is happy to attempt to offline CPUs that cannot be offlined,
for example, CPU 0 in both the x86 and ARM architectures.  Although this
allows rcutorture testing to proceed normally, it confounds attempts at
error analysis due to the resulting flood of spurious CPU-hotplug errors.

Therefore, this commit uses the new cpu_is_hotpluggable() function to
avoid attempting to offline CPUs that are not hotpluggable, which in
turn avoids spurious CPU-hotplug errors.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agodriver-core/cpu: Expose hotpluggability to the rest of the kernel
Josh Triplett [Sat, 3 Dec 2011 21:06:50 +0000 (13:06 -0800)] 
driver-core/cpu: Expose hotpluggability to the rest of the kernel

When architectures register CPUs, they indicate whether the CPU allows
hotplugging; notably, x86 and ARM don't allow hotplugging CPU 0.
Userspace can easily query the hotpluggability of a CPU via sysfs;
however, the kernel has no convenient way of accessing that property in
an architecture-independent way.  While the kernel can simply try it and
see, some code needs to distinguish between "hotplug failed" and
"hotplug has no hope of working on this CPU"; for example, rcutorture's
CPU hotplug tests want to avoid drowning out real hotplug failures with
expected failures.

Expose this property via a new cpu_is_hotpluggable function, so that the
rest of the kernel can access it in an architecture-independent way.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Remove redundant rcu_cpu_stall_suppress declaration
Paul E. McKenney [Thu, 1 Dec 2011 01:29:18 +0000 (17:29 -0800)] 
rcu: Remove redundant rcu_cpu_stall_suppress declaration

No point in having two identical rcu_cpu_stall_suppress declarations,
so remove the more obscure of the two.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Adaptive dyntick-idle preparation
Paul E. McKenney [Wed, 30 Nov 2011 23:41:14 +0000 (15:41 -0800)] 
rcu: Adaptive dyntick-idle preparation

If there are other CPUs active at a given point in time, then there is a
limit to what a given CPU can do to advance the current RCU grace period.
Beyond this limit, attempting to force the RCU grace period forward will
do nothing but consume energy burning CPU cycles.

Therefore, this commit takes an adaptive approach to RCU_FAST_NO_HZ
preparations for idle.  It pushes the RCU core state machine for
two cycles unconditionally, and then it will push from zero to three
additional cycles, but only as long as the RCU core has work for this
CPU to do immediately.  The rcu_pending() function is used to check
whether the RCU core has such work.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Keep invoking callbacks if CPU otherwise idle
Paul E. McKenney [Tue, 29 Nov 2011 23:57:13 +0000 (15:57 -0800)] 
rcu: Keep invoking callbacks if CPU otherwise idle

The rcu_do_batch() function that invokes callbacks for TREE_RCU and
TREE_PREEMPT_RCU normally throttles callback invocation to avoid degrading
scheduling latency.  However, as long as the CPU would otherwise be idle,
there is no downside to continuing to invoke any callbacks that have passed
through their grace periods.  In fact, processing such callbacks in a
timely manner has the benefit of increasing the probability that the
CPU can enter the power-saving dyntick-idle mode.

Therefore, this commit allows callback invocation to continue beyond the
preset limit as long as the scheduler does not have some other task to
run and as long as context is that of the idle task or the relevant
RCU kthread.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Irq nesting is always 0 on rcu_enter_idle_common
Frederic Weisbecker [Tue, 29 Nov 2011 00:26:56 +0000 (16:26 -0800)] 
rcu: Irq nesting is always 0 on rcu_enter_idle_common

Because tasks don't nest, the ->dyntick_nesting must always be zero upon
entry to rcu_idle_enter_common().  Therefore, pass "0" rather than the
counter itself.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Don't check irq nesting from rcu idle entry/exit
Frederic Weisbecker [Tue, 29 Nov 2011 00:18:56 +0000 (16:18 -0800)] 
rcu: Don't check irq nesting from rcu idle entry/exit

Because tasks do not nest, rcu_idle_enter() and rcu_idle_exit() do
not need to check for nesting.  This commit therefore moves nesting
checks from rcu_idle_enter_common() to rcu_irq_exit() and from
rcu_idle_exit_common() to rcu_irq_enter().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Permit dyntick-idle with callbacks pending
Paul E. McKenney [Mon, 28 Nov 2011 20:28:34 +0000 (12:28 -0800)] 
rcu: Permit dyntick-idle with callbacks pending

The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering
dyntick-idle state if they have RCU callbacks pending.  Unfortunately,
this has the side-effect of often preventing them from entering this
state, especially if at least one other CPU is not in dyntick-idle state.
However, the resulting per-tick wakeup is wasteful in many cases: if the
CPU has already fully responded to the current RCU grace period, there
will be nothing for it to do until this grace period ends, which will
frequently take several jiffies.

This commit therefore permits a CPU that has done everything that the
current grace period has asked of it (rcu_pending() == 0) even if it
still as RCU callbacks pending.  However, such a CPU posts a timer to
wake it up several jiffies later (6 jiffies, based on experience with
grace-period lengths).  This wakeup is required to handle situations
that can result in all CPUs being in dyntick-idle mode, thus failing
to ever complete the current grace period.  If a CPU wakes up before
the timer goes off, then it cancels that timer, thus avoiding spurious
wakeups.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Document same-context read-side constraints
Paul E. McKenney [Mon, 28 Nov 2011 18:42:42 +0000 (10:42 -0800)] 
rcu: Document same-context read-side constraints

The intent is that a given RCU read-side critical section be confined
to a single context.  For example, it is illegal to invoke rcu_read_lock()
in an exception handler and then invoke rcu_read_unlock() from the
context of the task that received the exception.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass
Paul E. McKenney [Wed, 23 Nov 2011 23:02:05 +0000 (15:02 -0800)] 
rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass

Fixes and workarounds for a number of issues (for example, that in
df4012edc) make it safe to once again detect dyntick-idle CPUs on the
first pass of force_quiescent_state(), so this commit makes that change.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Remove dynticks false positives and RCU failures
Paul E. McKenney [Wed, 23 Nov 2011 21:38:58 +0000 (13:38 -0800)] 
rcu: Remove dynticks false positives and RCU failures

Assertions in rcu_init_percpu_data() unknowingly relied on outgoing
CPUs being turned off before reaching the idle loop.  Unfortunately,
when running under kvm/qemu on x86, CPUs really can get to idle before
begin shut off.  These CPUs are then born in dyntick-idle mode from an
RCU perspective, which results in splats in rcu_init_percpu_data() and
in RCU wrongly ignoring those CPUs despite them being active.  This in
turn can cause RCU to end grace periods prematurely, potentially freeing
up memory that the newly onlined CPUs were still using.  This is most
decidedly not what we need to see in an RCU implementation.

This commit therefore replaces the assertions in rcu_init_percpu_data()
with code that forces RCU's dyntick-idle view of newly onlined CPUs to
match reality.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Reduce latency of rcu_prepare_for_idle()
Paul E. McKenney [Wed, 23 Nov 2011 05:08:13 +0000 (21:08 -0800)] 
rcu: Reduce latency of rcu_prepare_for_idle()

Re-enable interrupts across calls to quiescent-state functions and
also across force_quiescent_state() to reduce latency.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Eliminate RCU_FAST_NO_HZ grace-period hang
Paul E. McKenney [Wed, 23 Nov 2011 04:43:02 +0000 (20:43 -0800)] 
rcu: Eliminate RCU_FAST_NO_HZ grace-period hang

With the new implementation of RCU_FAST_NO_HZ, it was possible to hang
RCU grace periods as follows:

o CPU 0 attempts to go idle, cycles several times through the
rcu_prepare_for_idle() loop, then goes dyntick-idle when
RCU needs nothing more from it, while still having at least
on RCU callback pending.

o CPU 1 goes idle with no callbacks.

Both CPUs can then stay in dyntick-idle mode indefinitely, preventing
the RCU grace period from ever completing, possibly hanging the system.

This commit therefore prevents CPUs that have RCU callbacks from entering
dyntick-idle mode.  This approach also eliminates the need for the
end-of-grace-period IPIs used previously.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Avoid needlessly IPIing CPUs at GP end
Paul E. McKenney [Wed, 23 Nov 2011 01:46:19 +0000 (17:46 -0800)] 
rcu: Avoid needlessly IPIing CPUs at GP end

If a CPU enters dyntick-idle mode with callbacks pending, it will need
an IPI at the end of the grace period.  However, if it exits dyntick-idle
mode before the grace period ends, it will be needlessly IPIed at the
end of the grace period.

Therefore, this commit clears the per-CPU rcu_awake_at_gp_end flag
when a CPU determines that it does not need it.  This in turn requires
disabling interrupts across much of rcu_prepare_for_idle() in order to
avoid having nested interrupts clearing this state out from under us.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Go dyntick-idle more quickly if CPU has serviced current grace period
Paul E. McKenney [Wed, 23 Nov 2011 01:07:11 +0000 (17:07 -0800)] 
rcu: Go dyntick-idle more quickly if CPU has serviced current grace period

The earlier version would attempt to push callbacks through five times
before going into dyntick-idle mode if callbacks remained, but the CPU
had done all that it needed to do for the current RCU grace periods.
This is wasteful:  In most cases, once the CPU has done all that it
needs to for the current RCU grace periods, it will make no further
progress on the callbacks no matter how many times it loops through
the RCU core processing and the idle-entry code.

This commit therefore goes to dyntick-idle mode whenever the current
CPU has done all it can for the current grace period.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Add tracing for RCU_FAST_NO_HZ
Paul E. McKenney [Tue, 22 Nov 2011 22:58:03 +0000 (14:58 -0800)] 
rcu: Add tracing for RCU_FAST_NO_HZ

This commit adds trace_rcu_prep_idle(), which is invoked from
rcu_prepare_for_idle() and rcu_wake_cpu() to trace attempts on
the part of RCU to force CPUs into dyntick-idle mode.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Update trace_rcu_dyntick() header comment
Paul E. McKenney [Tue, 22 Nov 2011 20:13:03 +0000 (12:13 -0800)] 
rcu: Update trace_rcu_dyntick() header comment

This commit updates the trace_rcu_dyntick() header comment to reflect
events added by commit 4b4f421.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agodoc: Add load/store guarantees to Documentation/atomic-ops.txt
Paul E. McKenney [Tue, 22 Nov 2011 18:55:12 +0000 (10:55 -0800)] 
doc: Add load/store guarantees to Documentation/atomic-ops.txt

An IRC discussion uncovered many conflicting opinions on what types
of data may be atomically loaded and stored.  This commit therefore
calls this out the official set: pointers, longs, ints, and chars (but
not shorts).  This commit also gives some examples of compiler mischief
that can thwart atomicity.

Please note that this discussion is relevant to !SMP kernels if
CONFIG_PREEMPT=y: preemption can cause almost as much trouble as can SMP.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Chris Zankel <chris@zankel.net>
12 years agonohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu()
Frederic Weisbecker [Thu, 17 Nov 2011 17:48:14 +0000 (18:48 +0100)] 
nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu()

Those two APIs were provided to optimize the calls of
tick_nohz_idle_enter() and rcu_idle_enter() into a single
irq disabled section. This way no interrupt happening in-between would
needlessly process any RCU job.

Now we are talking about an optimization for which benefits
have yet to be measured. Let's start simple and completely decouple
idle rcu and dyntick idle logics to simplify.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Add rcutorture CPU-hotplug capability
Paul E. McKenney [Thu, 17 Nov 2011 01:48:21 +0000 (17:48 -0800)] 
rcu: Add rcutorture CPU-hotplug capability

Running CPU-hotplug operations concurrently with rcutorture has
historically been a good way to find bugs in both RCU and CPU hotplug.
This commit therefore adds an rcutorture module parameter called
"onoff_interval" that causes a randomly selected CPU-hotplug operation to
be executed at the specified interval, in seconds.  The default value of
"onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
operations.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agotile: Make tile use the new is_idle_task() API
Paul E. McKenney [Fri, 11 Nov 2011 00:16:13 +0000 (16:16 -0800)] 
tile: Make tile use the new is_idle_task() API

Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
12 years agoevents: Make events use the new is_idle_task() API
Paul E. McKenney [Fri, 11 Nov 2011 00:02:52 +0000 (16:02 -0800)] 
events: Make events use the new is_idle_task() API

Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agokdb: Make KDB use the new is_idle_task() API
Paul E. McKenney [Thu, 10 Nov 2011 23:59:58 +0000 (15:59 -0800)] 
kdb: Make KDB use the new is_idle_task() API

Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agosparc: Make SPARC use the new is_idle_task() API
Paul E. McKenney [Thu, 10 Nov 2011 23:56:46 +0000 (15:56 -0800)] 
sparc: Make SPARC use the new is_idle_task() API

Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Make RCU use the new is_idle_task() API
Paul E. McKenney [Thu, 10 Nov 2011 23:48:45 +0000 (15:48 -0800)] 
rcu: Make RCU use the new is_idle_task() API

Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agosched: Add is_idle_task() to handle invalidated uses of idle_cpu()
Paul E. McKenney [Thu, 10 Nov 2011 20:41:56 +0000 (12:41 -0800)] 
sched: Add is_idle_task() to handle invalidated uses of idle_cpu()

Commit 908a3283 (Fix idle_cpu()) invalidated some uses of idle_cpu(),
which used to say whether or not the CPU was running the idle task,
but now instead says whether or not the CPU is running the idle task
in the absence of pending wakeups.  Although this new implementation
gives a better answer to the question "is this CPU idle?", it also
invalidates other uses that were made of idle_cpu().

This commit therefore introduces a new is_idle_task() API member
that determines whether or not the specified task is one of the
idle tasks, allowing open-coded "->pid == 0" sequences to be replaced
by something more meaningful.

Suggested-by: Josh Triplett <josh@joshtriplett.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Control rcutorture startup from kernel boot parameters
Paul E. McKenney [Fri, 4 Nov 2011 20:24:07 +0000 (13:24 -0700)] 
rcu: Control rcutorture startup from kernel boot parameters

Currently, if rcutorture is built into the kernel, it must be manually
started or started from an init script.  This is inconvenient for
automated KVM testing, where it is good to be able to fully control
rcutorture execution from the kernel parameters.  This patch therefore
adds a module parameter named "rcutorture_runnable" that defaults
to zero ("don't start automatically"), but which can be set to one
to cause rcutorture to start up immediately during boot.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Add rcutorture system-shutdown capability
Paul E. McKenney [Fri, 4 Nov 2011 18:44:12 +0000 (11:44 -0700)] 
rcu: Add rcutorture system-shutdown capability

Although it is easy to run rcutorture tests under KVM, there is currently
no nice way to run such a test for a fixed time period, collect all of
the rcutorture data, and then shut the system down cleanly.  This commit
therefore adds an rcutorture module parameter named "shutdown_secs" that
specified the run duration in seconds, after which rcutorture terminates
the test and powers the system down.  The default value for "shutdown_secs"
is zero, which disables shutdown.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Permit RCU_FAST_NO_HZ to be used by TREE_PREEMPT_RCU
Paul E. McKenney [Thu, 3 Nov 2011 21:56:12 +0000 (14:56 -0700)] 
rcu: Permit RCU_FAST_NO_HZ to be used by TREE_PREEMPT_RCU

The new implementation of RCU_FAST_NO_HZ is compatible with preemptible
RCU, so this commit removes the Kconfig restriction that previously
prohibited this.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Fix idle-task checks
Paul E. McKenney [Wed, 2 Nov 2011 14:38:25 +0000 (07:38 -0700)] 
rcu: Fix idle-task checks

RCU has traditionally relied on idle_cpu() to determine whether a given
CPU is running in the context of an idle task, but commit 908a3283
(Fix idle_cpu()) has invalidated this approach.  After commit 908a3283,
idle_cpu() will return true if the current CPU is currently running the
idle task, and will be doing so for the foreseeable future.  RCU instead
needs to know whether or not the current CPU is currently running the
idle task, regardless of what the near future might bring.

This commit therefore switches from idle_cpu() to "current->pid != 0".

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Suggested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Allow dyntick-idle mode for CPUs with callbacks
Paul E. McKenney [Wed, 2 Nov 2011 13:54:54 +0000 (06:54 -0700)] 
rcu: Allow dyntick-idle mode for CPUs with callbacks

Currently, RCU does not permit a CPU to enter dyntick-idle mode if that
CPU has any RCU callbacks queued.  This means that workloads for which
each CPU wakes up and does some RCU updates every few ticks will never
enter dyntick-idle mode.  This can result in significant unnecessary power
consumption, so this patch permits a given to enter dyntick-idle mode if
it has callbacks, but only if that same CPU has completed all current
work for the RCU core.  We determine use rcu_pending() to determine
whether a given CPU has completed all current work for the RCU core.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Add more information to the wrong-idle-task complaint
Paul E. McKenney [Tue, 1 Nov 2011 15:57:21 +0000 (08:57 -0700)] 
rcu: Add more information to the wrong-idle-task complaint

The current code just complains if the current task is not the idle task.
This commit therefore adds printing of the identity of the idle task.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Deconfuse dynticks entry-exit tracing
Paul E. McKenney [Mon, 31 Oct 2011 22:01:54 +0000 (15:01 -0700)] 
rcu: Deconfuse dynticks entry-exit tracing

The trace_rcu_dyntick() trace event did not print both the old and
the new value of the nesting level, and furthermore printed only
the low-order 32 bits of it.  This could result in some confusion
when interpreting trace-event dumps, so this commit prints both
the old and the new value, prints the full 64 bits, and also selects
the process-entry/exit increment to print nicely in hexadecimal.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Add documentation for raw SRCU read-side primitives
Paul E. McKenney [Thu, 3 Nov 2011 20:43:24 +0000 (13:43 -0700)] 
rcu: Add documentation for raw SRCU read-side primitives

Update various files in Documentation/RCU to reflect srcu_read_lock_raw()
and srcu_read_unlock_raw().  Credit to Peter Zijlstra for suggesting
use of the existing _raw suffix instead of the earlier bulkref names.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Introduce raw SRCU read-side primitives
Paul E. McKenney [Sun, 9 Oct 2011 22:13:11 +0000 (15:13 -0700)] 
rcu: Introduce raw SRCU read-side primitives

The RCU implementations, including SRCU, are designed to be used in a
lock-like fashion, so that the read-side lock and unlock primitives must
execute in the same context for any given read-side critical section.
This constraint is enforced by lockdep-RCU.  However, there is a need
to enter an SRCU read-side critical section within the context of an
exception and then exit in the context of the task that encountered the
exception.  The cost of this capability is that the read-side operations
incur the overhead of disabling interrupts.

Note that although the current implementation allows a given read-side
critical section to be entered by one task and then exited by another, all
known possible implementations that allow this have scalability problems.
Therefore, a given read-side critical section must be exited by the same
task that entered it, though perhaps from an interrupt or exception
handler running within that task's context.  But if you are thinking
in terms of interrupt handlers, make sure that you have considered the
possibility of threaded interrupt handlers.

Credit goes to Peter Zijlstra for suggesting use of the existing _raw
suffix to indicate disabling lockdep over the earlier "bulkref" names.

Requested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
12 years agopowerpc: Tell RCU about idle after hcall tracing
Paul E. McKenney [Fri, 14 Oct 2011 02:05:12 +0000 (19:05 -0700)] 
powerpc: Tell RCU about idle after hcall tracing

The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables
hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels.  One of the
hypervisor calls that is traced is the H_CEDE call in the idle loop
that tells the hypervisor that this OS instance no longer needs the
current CPU.  However, tracing uses RCU, so this combination of kernel
configuration variables needs to avoid telling RCU about the current CPU's
idleness until after the H_CEDE-entry tracing completes on the one hand,
and must tell RCU that the the current CPU is no longer idle before the
H_CEDE-exit tracing starts.

In all other cases, it suffices to inform RCU of CPU idleness upon
idle-loop entry and exit.

This commit makes the required adjustments.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Fix early call to rcu_idle_enter()
Frederic Weisbecker [Fri, 7 Oct 2011 23:31:02 +0000 (16:31 -0700)] 
rcu: Fix early call to rcu_idle_enter()

On the irq exit path, tick_nohz_irq_exit()
may raise a softirq, which action leads to the wake up
path and select_task_rq_fair() that makes use of rcu
to iterate the domains.

This is an illegal use of RCU because we may be in RCU
extended quiescent state if we interrupted an RCU-idle
window in the idle loop:

[  132.978883] ===============================
[  132.978883] [ INFO: suspicious RCU usage. ]
[  132.978883] -------------------------------
[  132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
[  132.978883]
[  132.978883] other info that might help us debug this:
[  132.978883]
[  132.978883]
[  132.978883] rcu_scheduler_active = 1, debug_locks = 0
[  132.978883] RCU used illegally from extended quiescent state!
[  132.978883] 2 locks held by swapper/0:
[  132.978883]  #0:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
[  132.978883]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
[  132.978883]
[  132.978883] stack backtrace:
[  132.978883] Pid: 0, comm: swapper Tainted: G        W   3.0.0+ #178
[  132.978883] Call Trace:
[  132.978883]  <IRQ>  [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
[  132.978883]  [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
[  132.978883]  [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
[  132.978883]  [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
[  132.978883]  [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
[  132.978883]  [<ffffffff8105aa35>] wake_up_process+0x15/0x20
[  132.978883]  [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
[  132.978883]  [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
[  132.978883]  [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
[  132.978883]  [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
[  132.978883]  [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
[  132.978883]  [<ffffffff810694f2>] irq_exit+0xd2/0x100
[  132.978883]  [<ffffffff81829e96>] do_IRQ+0x66/0xe0
[  132.978883]  [<ffffffff81820d53>] common_interrupt+0x13/0x13
[  132.978883]  <EOI>  [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff810144ea>] default_idle+0xba/0x370
[  132.978883]  [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
[  132.978883]  [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
[  132.978883]  [<ffffffff817f217f>] rest_init+0xef/0x150
[  132.978883]  [<ffffffff817f20e2>] ? rest_init+0x52/0x150
[  132.978883]  [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
[  132.978883]  [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
[  132.978883]  [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112

Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agox86: Call idle notifier after irq_enter()
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:09 +0000 (18:22 +0200)] 
x86: Call idle notifier after irq_enter()

Interrupts notify the idle exit state before calling irq_enter().
But the notifier code calls rcu_read_lock() and this is not
allowed while rcu is in an extended quiescent state. We need
to wait for irq_enter() -> rcu_idle_exit() to be called before
doing so otherwise this results in a grumpy RCU:

[    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    0.099991] Hardware name: AMD690VM-FMH
[    0.099991] Modules linked in:
[    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
[    0.099991] Call Trace:
[    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
[    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
[    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
[    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
[    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
[    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
[    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
[    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
[    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
[    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
[    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
[    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
[    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Henroid <andrew.d.henroid@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agox86: Enter rcu extended qs after idle notifier call
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:08 +0000 (18:22 +0200)] 
x86: Enter rcu extended qs after idle notifier call

The idle notifier, called by enter_idle(), enters into rcu read
side critical section but at that time we already switched into
the RCU-idle window (rcu_idle_enter() has been called). And it's
illegal to use rcu_read_lock() in that state.

This results in rcu reporting its bad mood:

[    1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    1.275635] Hardware name: AMD690VM-FMH
[    1.275635] Modules linked in:
[    1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
[    1.275635] Call Trace:
[    1.275635]  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    1.275635]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    1.275635]  [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
[    1.275635]  [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
[    1.275635]  [<ffffffff810018a0>] enter_idle+0x20/0x30
[    1.275635]  [<ffffffff81001995>] cpu_idle+0xa5/0x110
[    1.275635]  [<ffffffff817a7465>] rest_init+0xe5/0x140
[    1.275635]  [<ffffffff817a73c8>] ? rest_init+0x48/0x140
[    1.275635]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    1.275635]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    1.275635]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
[    1.275635] ---[ end trace a22d306b065d4a66 ]---

Fix this by entering rcu extended quiescent state later, just before
the CPU goes to sleep.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agonohz: Allow rcu extended quiescent state handling seperately from tick stop
Frederic Weisbecker [Sat, 8 Oct 2011 14:01:00 +0000 (16:01 +0200)] 
nohz: Allow rcu extended quiescent state handling seperately from tick stop

It is assumed that rcu won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as in x86-64 where we dereference the idle notifiers after
the tick is stopped.

To prepare for fixing this, add two new APIs:
tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().

If no use of RCU is made in the idle loop between
tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
must instead call the new *_norcu() version such that the arch doesn't
need to call rcu_idle_enter() and rcu_idle_exit().

Otherwise the arch must call tick_nohz_enter_idle() and
tick_nohz_exit_idle() and also call explicitly:

- rcu_idle_enter() after its last use of RCU before the CPU is put
to sleep.
- rcu_idle_exit() before the first use of RCU after the CPU is woken
up.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agonohz: Separate out irq exit and idle loop dyntick logic
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:06 +0000 (18:22 +0200)] 
nohz: Separate out irq exit and idle loop dyntick logic

The tick_nohz_stop_sched_tick() function, which tries to delay
the next timer tick as long as possible, can be called from two
places:

- From the idle loop to start the dytick idle mode
- From interrupt exit if we have interrupted the dyntick
idle mode, so that we reprogram the next tick event in
case the irq changed some internal state that requires this
action.

There are only few minor differences between both that
are handled by that function, driven by the ts->inidle
cpu variable and the inidle parameter. The whole guarantees
that we only update the dyntick mode on irq exit if we actually
interrupted the dyntick idle mode, and that we enter in RCU extended
quiescent state from idle loop entry only.

Split this function into:

- tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
dynticks idle mode unconditionally if it can, and enters into RCU
extended quiescent state.

- tick_nohz_irq_exit() which only updates the dynticks idle mode
when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).

To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
into tick_nohz_idle_exit().

This simplifies the code and micro-optimize the irq exit path (no need
for local_irq_save there). This also prepares for the split between
dynticks and rcu extended quiescent state logics. We'll need this split to
further fix illegal uses of RCU in extended quiescent states in the idle
loop.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Make srcu_read_lock_held() call common lockdep-enabled function
Paul E. McKenney [Fri, 7 Oct 2011 16:22:05 +0000 (18:22 +0200)] 
rcu: Make srcu_read_lock_held() call common lockdep-enabled function

A common debug_lockdep_rcu_enabled() function is used to check whether
RCU lockdep splats should be reported, but srcu_read_lock() does not
use it.  This commit therefore brings srcu_read_lock_held() up to date.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Warn when srcu_read_lock() is used in an extended quiescent state
Paul E. McKenney [Fri, 7 Oct 2011 16:22:04 +0000 (18:22 +0200)] 
rcu: Warn when srcu_read_lock() is used in an extended quiescent state

Catch SRCU up to the other variants of RCU by making PROVE_RCU
complain if either srcu_read_lock() or srcu_read_lock_held() are
used from within RCU-idle mode.

Frederic reworked this to allow for the new versions of his patches
that check for extended quiescent states.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Remove one layer of abstraction from PROVE_RCU checking
Paul E. McKenney [Fri, 7 Oct 2011 16:22:03 +0000 (18:22 +0200)] 
rcu: Remove one layer of abstraction from PROVE_RCU checking

Simplify things a bit by substituting the definitions of the single-line
rcu_read_acquire(), rcu_read_release(), rcu_read_acquire_bh(),
rcu_read_release_bh(), rcu_read_acquire_sched(), and
rcu_read_release_sched() functions at their call points.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Warn when rcu_read_lock() is used in extended quiescent state
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:02 +0000 (18:22 +0200)] 
rcu: Warn when rcu_read_lock() is used in extended quiescent state

We are currently able to detect uses of rcu_dereference_check() inside
extended quiescent states (such as the RCU-free window in idle).
But rcu_read_lock() and friends can be used without rcu_dereference(),
so that the earlier commit checking for use of rcu_dereference() and
friends while in RCU idle mode miss some error conditions.  This commit
therefore adds extended quiescent state checking to rcu_read_lock() and
friends.

Uses of RCU from within RCU-idle mode are totally ignored by
RCU, hence the importance of these checks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Inform the user about extended quiescent state on PROVE_RCU warning
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:01 +0000 (18:22 +0200)] 
rcu: Inform the user about extended quiescent state on PROVE_RCU warning

Inform the user if an RCU usage error is detected by lockdep while in
an extended quiescent state (in this case, the RCU-free window in idle).
This is accomplished by adding a line to the RCU lockdep splat indicating
whether or not the splat occurred in extended quiescent state.

Uses of RCU from within extended quiescent state mode are totally ignored
by RCU, hence the importance of this diagnostic.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Detect illegal rcu dereference in extended quiescent state
Frederic Weisbecker [Fri, 7 Oct 2011 23:25:18 +0000 (16:25 -0700)] 
rcu: Detect illegal rcu dereference in extended quiescent state

Report that none of the rcu read lock maps are held while in an RCU
extended quiescent state (the section between rcu_idle_enter()
and rcu_idle_exit()). This helps detect any use of rcu_dereference()
and friends from within the section in idle where RCU is not allowed.

This way we can guarantee an extended quiescent window where the CPU
can be put in dyntick idle mode or can simply aoid to be part of any
global grace period completion while in the idle loop.

Uses of RCU from such mode are totally ignored by RCU, hence the
importance of these checks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Remove redundant return from rcu_report_exp_rnp()
Thomas Gleixner [Thu, 3 Nov 2011 19:08:17 +0000 (12:08 -0700)] 
rcu: Remove redundant return from rcu_report_exp_rnp()

Empty void functions do not need "return", so this commit removes it
from rcu_report_exp_rnp().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Omit self-awaken when setting up expedited grace period
Thomas Gleixner [Sat, 22 Oct 2011 14:12:34 +0000 (07:12 -0700)] 
rcu: Omit self-awaken when setting up expedited grace period

When setting up an expedited grace period, if there were no readers, the
task will awaken itself.  This commit removes this useless self-awakening.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Disable preemption in rcu_is_cpu_idle()
Paul E. McKenney [Mon, 3 Oct 2011 18:38:52 +0000 (11:38 -0700)] 
rcu: Disable preemption in rcu_is_cpu_idle()

Because rcu_is_cpu_idle() is to be used to check for extended quiescent
states in RCU-preempt read-side critical sections, it cannot assume that
preemption is disabled.  And preemption must be disabled when accessing
the dyntick-idle state, because otherwise the following sequence of events
could occur:

1. Task A on CPU 1 enters rcu_is_cpu_idle() and picks up the pointer
to CPU 1's per-CPU variables.

2. Task B preempts Task A and starts running on CPU 1.

3. Task A migrates to CPU 2.

4. Task B blocks, leaving CPU 1 idle.

5. Task A continues execution on CPU 2, accessing CPU 1's dyntick-idle
information using the pointer fetched in step 1 above, and finds
that CPU 1 is idle.

6. Task A therefore incorrectly concludes that it is executing in
an extended quiescent state, possibly issuing a spurious splat.

Therefore, this commit disables preemption within the rcu_is_cpu_idle()
function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Document failing tick as cause of RCU CPU stall warning
Paul E. McKenney [Mon, 3 Oct 2011 00:21:18 +0000 (17:21 -0700)] 
rcu: Document failing tick as cause of RCU CPU stall warning

One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
These turned out to be caused by a bug that stopped scheduling-clock
tick interrupts from being sent to a given CPU for several hundred seconds.
This commit therefore updates the documentation to call this out as a
possible cause for RCU CPU stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Add failure tracing to rcutorture
Paul E. McKenney [Sun, 2 Oct 2011 14:44:32 +0000 (07:44 -0700)] 
rcu: Add failure tracing to rcutorture

Trace the rcutorture RCU accesses and dump the trace buffer when the
first failure is detected.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agotrace: Allow ftrace_dump() to be called from modules
Paul E. McKenney [Sun, 2 Oct 2011 18:01:15 +0000 (11:01 -0700)] 
trace: Allow ftrace_dump() to be called from modules

Add an EXPORT_SYMBOL_GPL() so that rcutorture can dump the trace buffer
upon detection of an RCU error.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Track idleness independent of idle tasks
Paul E. McKenney [Fri, 30 Sep 2011 19:10:22 +0000 (12:10 -0700)] 
rcu: Track idleness independent of idle tasks

Earlier versions of RCU used the scheduling-clock tick to detect idleness
by checking for the idle task, but handled idleness differently for
CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
critical sections in the idle task, for example, for tracing.  A more
fine-grained detection of idleness is therefore required.

This commit presses the old dyntick-idle code into full-time service,
so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
always invoked at the beginning of an idle loop iteration.  Similarly,
rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
at the end of an idle-loop iteration.  This allows the idle task to
use RCU everywhere except between consecutive rcu_idle_enter() and
rcu_idle_exit() calls, in turn allowing architecture maintainers to
specify exactly where in the idle loop that RCU may be used.

Because some of the userspace upcall uses can result in what looks
to RCU like half of an interrupt, it is not possible to expect that
the irq_enter() and irq_exit() hooks will give exact counts.  This
patch therefore expands the ->dynticks_nesting counter to 64 bits
and uses two separate bitfields to count process/idle transitions
and interrupt entry/exit transitions.  It is presumed that userspace
upcalls do not happen in the idle loop or from usermode execution
(though usermode might do a system call that results in an upcall).
The counter is hard-reset on each process/idle transition, which
avoids the interrupt entry/exit error from accumulating.  Overflow
is avoided by the 64-bitness of the ->dyntick_nesting counter.

This commit also adds warnings if a non-idle task asks RCU to enter
idle state (and these checks will need some adjustment before applying
Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246).
In addition, validation of ->dynticks and ->dynticks_nesting is added.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agolockdep: Update documentation for lock-class leak detection
Paul E. McKenney [Wed, 28 Sep 2011 17:23:39 +0000 (10:23 -0700)] 
lockdep: Update documentation for lock-class leak detection

There are a number of bugs that can leak or overuse lock classes,
which can cause the maximum number of lock classes (currently 8191)
to be exceeded.  However, the documentation does not tell you how to
track down these problems.  This commit addresses this shortcoming.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Make synchronize_sched_expedited() better at work sharing
Paul E. McKenney [Thu, 22 Sep 2011 20:18:44 +0000 (13:18 -0700)] 
rcu: Make synchronize_sched_expedited() better at work sharing

When synchronize_sched_expedited() takes its second and subsequent
snapshots of sync_sched_expedited_started, it subtracts 1.  This
means that the concurrent caller of synchronize_sched_expedited()
that incremented to that value sees our successful completion, it
will not be able to take advantage of it.  This restriction is
pointless, given that our full expedited grace period would have
happened after the other guy started, and thus should be able to
serve as a proxy for the other guy successfully executing
try_stop_cpus().

This commit therefore removes the subtraction of 1.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Avoid RCU-preempt expedited grace-period botch
Paul E. McKenney [Wed, 21 Sep 2011 21:41:37 +0000 (14:41 -0700)] 
rcu: Avoid RCU-preempt expedited grace-period botch

Because rcu_read_unlock_special() samples rcu_preempted_readers_exp(rnp)
after dropping rnp->lock, the following sequence of events is possible:

1. Task A exits its RCU read-side critical section, and removes
itself from the ->blkd_tasks list, releases rnp->lock, and is
then preempted.  Task B remains on the ->blkd_tasks list, and
blocks the current expedited grace period.

2. Task B exits from its RCU read-side critical section and removes
itself from the ->blkd_tasks list.  Because it is the last task
blocking the current expedited grace period, it ends that
expedited grace period.

3. Task A resumes, and samples rcu_preempted_readers_exp(rnp) which
of course indicates that nothing is blocking the nonexistent
expedited grace period. Task A is again preempted.

4. Some other CPU starts an expedited grace period.  There are several
tasks blocking this expedited grace period queued on the
same rcu_node structure that Task A was using in step 1 above.

5. Task A examines its state and incorrectly concludes that it was
the last task blocking the expedited grace period on the current
rcu_node structure.  It therefore reports completion up the
rcu_node tree.

6. The expedited grace period can then incorrectly complete before
the tasks blocked on this same rcu_node structure exit their
RCU read-side critical sections.  Arbitrarily bad things happen.

This commit therefore takes a snapshot of rcu_preempted_readers_exp(rnp)
prior to dropping the lock, so that only the last task thinks that it is
the last task, thus avoiding the failure scenario laid out above.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: ->signaled better named ->fqs_state
Paul E. McKenney [Sun, 11 Sep 2011 04:54:08 +0000 (21:54 -0700)] 
rcu: ->signaled better named ->fqs_state

The ->signaled field was named before complications in the form of
dyntick-idle mode and offlined CPUs.  These complications have required
that force_quiescent_state() be implemented as a state machine, instead
of simply unconditionally sending reschedule IPIs.  Therefore, this
commit renames ->signaled to ->fqs_state to catch up with the new
force_quiescent_state() reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agommc: core: Fix deadlock when the CONFIG_MMC_UNSAFE_RESUME is not defined
Sujit Reddy Thumma [Wed, 23 Nov 2011 03:13:18 +0000 (08:43 +0530)] 
mmc: core: Fix deadlock when the CONFIG_MMC_UNSAFE_RESUME is not defined

mmc_suspend_host() tries to claim host during suspend
and release it only when the bus suspend operation is
compeleted. If CONFIG_MMC_UNSAFE_RESUME is defined and
the host is flagged as removable, mmc_suspend_host()
tries to remove the card. In this process, the file system
sync can get blocked trying to acquire host which is already
claimed by mmc_suspend_host() causing deadlock.

Fix this deadlock by releasing host before ->remove() is called.

Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Acked-by: Ulf Hansson <ulf.hansson@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: sdhci-s3c: Remove old and misprototyped suspend operations
Mark Brown [Mon, 21 Nov 2011 18:00:51 +0000 (18:00 +0000)] 
mmc: sdhci-s3c: Remove old and misprototyped suspend operations

Now that the driver is using dev_pm_ops the suspend operations in the
platform_driver structure won't get called so don't need to be there,
and certainly shouldn't be the same function as dev_pm_ops since the
signatures are different.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: tmio: fix clock gating on platforms with a .set_pwr() method
Guennadi Liakhovetski [Wed, 16 Nov 2011 09:11:45 +0000 (10:11 +0100)] 
mmc: tmio: fix clock gating on platforms with a .set_pwr() method

Do not power down the card in .set_ios(), unless MMC_POWER_OFF is
requested. This fixes the SDHI functionality on ecovec.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: sh_mmcif: fix clock gating on platforms with a .down_pwr() method
Guennadi Liakhovetski [Wed, 16 Nov 2011 09:10:41 +0000 (10:10 +0100)] 
mmc: sh_mmcif: fix clock gating on platforms with a .down_pwr() method

Do not power down the card in .set_ios(), unless MMC_POWER_OFF is
requested. This fixes the MMCIF interface functionality on ecovec boards.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: core: Fix typo at mmc_card_sleep
Kyungmin Park [Thu, 17 Nov 2011 04:34:33 +0000 (13:34 +0900)] 
mmc: core: Fix typo at mmc_card_sleep

Fix wrong bus_ops->sleep check.  (This isn't expected to have real-world
consequences, because the mmc core always defines both 'awake' and
'sleep' ops.)

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: core: Fix power_off_notify during suspend
Girish K S [Tue, 15 Nov 2011 06:25:46 +0000 (11:55 +0530)] 
mmc: core: Fix power_off_notify during suspend

The eMMC 4.5 devices respond to only RESET and AWAKE command in the
sleep state. Hence the mmc switch command to notify power off state
should be sent before the device enters sleep state.

This patch fixes the same.

Signed-off-by: Girish K S <girish.shivananjappa@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: core: Fix setting power notify state variable for non-eMMC
Girish K S [Fri, 4 Nov 2011 10:52:47 +0000 (16:22 +0530)] 
mmc: core: Fix setting power notify state variable for non-eMMC

This patch skips the setting of the power notify state variable
for non eMMC 4.5 devices. Also fixes the problem of omap_hsmmc
noisy/broken for suspend resume reported by Kevin Hilman.

Signed-off-by: Girish K S <girish.shivananjappa@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@stericsson.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: core: Add quirk for long data read time
Stefan Nilsson XK [Thu, 3 Nov 2011 08:44:12 +0000 (09:44 +0100)] 
mmc: core: Add quirk for long data read time

Adds a quirk that sets the data read timeout to a fixed value instead
of relying on the information in the CSD. The timeout value chosen
is 300ms since that has proven enough for the problematic cards found,
but could be increased if other cards require this.

This patch also enables this quirk for certain Micron cards known to
have this problem.

Signed-off-by: Stefan Nilsson XK <stefan.xk.nilsson@stericsson.com>
Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Cc: <stable@kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: Add module.h include to sdhci-cns3xxx.c
Chris Ball [Sat, 12 Nov 2011 00:54:53 +0000 (19:54 -0500)] 
mmc: Add module.h include to sdhci-cns3xxx.c

Fixes: drivers/mmc/host/sdhci-cns3xxx.c:110: error: 'THIS_MODULE'
       undeclared here (not in a function)

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: mxcmmc: fix falling back to PIO
Sascha Hauer [Fri, 11 Nov 2011 15:28:05 +0000 (16:28 +0100)] 
mmc: mxcmmc: fix falling back to PIO

When we can't configure the dma channel we want to fall
back to PIO. We do this by setting host->do_dma to zero.
This does not work as do_dma is used to see whether dma
can be used for the current transfer. Instead, we have
to set host->dma to NULL.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agox86, efi: Make efi_call_phys_{prelog,epilog} CONFIG_RELOCATABLE-aware
Matt Fleming [Thu, 8 Dec 2011 12:10:50 +0000 (12:10 +0000)] 
x86, efi: Make efi_call_phys_{prelog,epilog} CONFIG_RELOCATABLE-aware

efi_call_phys_prelog() sets up a 1:1 mapping of the physical address
range in swapper_pg_dir. Instead of replacing then restoring entries
in swapper_pg_dir we should be using initial_page_table which already
contains the 1:1 mapping.

It's safe to blindly switch back to swapper_pg_dir in the epilog
because the physical EFI routines are only called before
efi_enter_virtual_mode(), e.g. before any user processes have been
forked. Therefore, we don't need to track which pgd was in %cr3 when
we entered the prelog.

The previous code actually contained a bug because it assumed that the
kernel was loaded at a physical address within the first 8MB of ram,
usually at 0x100000. However, this isn't the case with a
CONFIG_RELOCATABLE=y kernel which could have been loaded anywhere in
the physical address space.

Also delete the ancient (and bogus) comments about the page table
being restored after the lock is released. There is no locking.

Cc: Matthew Garrett <mjg@redhat.com>
Cc: Darrent Hart <dvhart@linux.intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1323346250.3894.74.camel@mfleming-mobl1.ger.corp.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
12 years agoLinux 3.2-rc5
Linus Torvalds [Fri, 9 Dec 2011 23:09:32 +0000 (15:09 -0800)] 
Linux 3.2-rc5

12 years agoMerge git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 9 Dec 2011 22:45:44 +0000 (14:45 -0800)] 
Merge git://git.samba.org/sfrench/cifs-2.6

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: check for NULL last_entry before calling cifs_save_resume_key
  cifs: attempt to freeze while looping on a receive attempt
  cifs: Fix sparse warning when calling cifs_strtoUCS
  CIFS: Add descriptions to the brlock cache functions

12 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 9 Dec 2011 22:45:12 +0000 (14:45 -0800)] 
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Calling __pa() with an ioremap()ed address is invalid
  x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
  x86/intel_mid: Kconfig select fix
  x86/intel_mid: Fix the Kconfig for MID selection

12 years agoMerge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6
Linus Torvalds [Fri, 9 Dec 2011 22:41:50 +0000 (14:41 -0800)] 
Merge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6

* 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6:
  spi/gpio: fix section mismatch warning
  spi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build
  spi/nuc900: Include linux/module.h
  spi/ath79: fix compile error due to missing include

12 years agoMerge branch 'for-linus' of git://neil.brown.name/md
Linus Torvalds [Fri, 9 Dec 2011 16:18:08 +0000 (08:18 -0800)] 
Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
  md: raid5 crash during degradation
  md/raid5: never wait for bad-block acks on failed device.
  md: ensure new badblocks are handled promptly.
  md: bad blocks shouldn't cause a Blocked status on a Faulty device.
  md: take a reference to mddev during sysfs access.
  md: refine interpretation of "hold_active == UNTIL_IOCTL".
  md/lock: ensure updates to page_attrs are properly locked.

12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Linus Torvalds [Fri, 9 Dec 2011 16:08:57 +0000 (08:08 -0800)] 
Merge git://git./linux/kernel/git/cmetcalf/linux-tile

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: use new generic {enable,disable}_percpu_irq() routines
  drivers/net/ethernet/tile: use skb_frag_page() API
  asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
  arch/tile: fix double-free bug in homecache_free_pages()
  arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.

12 years agoMerge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro...
Linus Torvalds [Fri, 9 Dec 2011 16:08:14 +0000 (08:08 -0800)] 
Merge branch 'iommu/fixes' of git://git./linux/kernel/git/joro/iommu

* 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  MAINTAINERS: Update amd-iommu F: patterns
  iommu/amd: Fix typo in kernel-parameters.txt
  iommu/msm: Fix compile error in mach-msm/devices-iommu.c
  Fix comparison using wrong pointer variable in dma debug code

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds [Fri, 9 Dec 2011 16:07:42 +0000 (08:07 -0800)] 
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Fix lost speaker volume controls
  ALSA: hda/realtek - Create "Bass Speaker" for two speaker pins
  ALSA: hda/realtek - Don't create extra controls with channel suffix
  ALSA: hda - Fix remaining VREF mute-LED NID check in post-3.1 changes
  ALSA: hda - Fix GPIO LED setup for IDT 92HD75 codecs
  ASoC: Provide a more complete DMA driver stub
  ASoC: Remove references to corgi and spitz from machine driver document
  ASoC: Make SND_SOC_MX27VIS_AIC32X4 depend on I2C
  ASoC: Fix dependency for SND_SOC_RAUMFELD and SND_PXA2XX_SOC_HX4700
  ASoC: uda1380: Return proper error in uda1380_modinit failure path
  ASoC: kirkwood: Make SND_KIRKWOOD_SOC_OPENRD and SND_KIRKWOOD_SOC_T5325 depend on I2C
  ASoC: Mark WM8994 ADC muxes as virtual
  ALSA: hda/realtek - Fix Oops in alc_mux_select()
  ALSA: sis7019 - give slow codecs more time to reset

12 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 9 Dec 2011 16:07:24 +0000 (08:07 -0800)] 
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Do no try to schedule task events if there are none
  lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()
  perf header: Use event_name() to get an event name
  perf stat: Failure with "Operation not supported"

12 years agosys_getppid: add missing rcu_dereference
Mandeep Singh Baines [Thu, 8 Dec 2011 22:34:44 +0000 (14:34 -0800)] 
sys_getppid: add missing rcu_dereference

In order to safely dereference current->real_parent inside an
rcu_read_lock, we need an rcu_dereference.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This page took 0.088708 seconds and 5 git commands to generate.