deliverable/linux.git
12 years agomm: memcg: shorten preempt-disabled section around event checks
Johannes Weiner [Fri, 13 Jan 2012 01:18:23 +0000 (17:18 -0800)] 
mm: memcg: shorten preempt-disabled section around event checks

Only the ratelimit checks themselves have to run with preemption
disabled, the resulting actions - checking for usage thresholds,
updating the soft limit tree - can and should run with preemption
enabled.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reported-by: Yong Zhang <yong.zhang0@gmail.com>
Tested-by: Yong Zhang <yong.zhang0@gmail.com>
Reported-by: Luis Henriques <henrix@camandro.org>
Tested-by: Luis Henriques <henrix@camandro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomemcg: make mem_cgroup_split_huge_fixup() more efficient
KAMEZAWA Hiroyuki [Fri, 13 Jan 2012 01:18:20 +0000 (17:18 -0800)] 
memcg: make mem_cgroup_split_huge_fixup() more efficient

In split_huge_page(), mem_cgroup_split_huge_fixup() is called to handle
page_cgroup modifcations.  It takes move_lock_page_cgroup() and modifies
page_cgroup and LRU accounting jobs and called HPAGE_PMD_SIZE - 1 times.

But thinking again,
  - compound_lock() is held at move_accout...then, it's not necessary
    to take move_lock_page_cgroup().
  - LRU is locked and all tail pages will go into the same LRU as
    head is now on.
  - page_cgroup is contiguous in huge page range.

This patch fixes mem_cgroup_split_huge_fixup() as to be called once per
hugepage and reduce costs for spliting.

[akpm@linux-foundation.org: fix typo, per Michal]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: memcg: remove unused node/section info from pc->flags
Johannes Weiner [Fri, 13 Jan 2012 01:18:18 +0000 (17:18 -0800)] 
mm: memcg: remove unused node/section info from pc->flags

To find the page corresponding to a certain page_cgroup, the pc->flags
encoded the node or section ID with the base array to compare the pc
pointer to.

Now that the per-memory cgroup LRU lists link page descriptors directly,
there is no longer any code that knows the struct page_cgroup of a PFN
but not the struct page.

[hughd@google.com: remove unused node/section info from pc->flags fix]
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: make per-memcg LRU lists exclusive
Johannes Weiner [Fri, 13 Jan 2012 01:18:15 +0000 (17:18 -0800)] 
mm: make per-memcg LRU lists exclusive

Now that all code that operated on global per-zone LRU lists is
converted to operate on per-memory cgroup LRU lists instead, there is no
reason to keep the double-LRU scheme around any longer.

The pc->lru member is removed and page->lru is linked directly to the
per-memory cgroup LRU lists, which removes two pointers from a
descriptor that exists for every page frame in the system.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Ying Han <yinghan@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: collect LRU list heads into struct lruvec
Johannes Weiner [Fri, 13 Jan 2012 01:18:10 +0000 (17:18 -0800)] 
mm: collect LRU list heads into struct lruvec

Having a unified structure with a LRU list set for both global zones and
per-memcg zones allows to keep that code simple which deals with LRU
lists and does not care about the container itself.

Once the per-memcg LRU lists directly link struct pages, the isolation
function and all other list manipulations are shared between the memcg
case and the global LRU case.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: vmscan: convert global reclaim to per-memcg LRU lists
Johannes Weiner [Fri, 13 Jan 2012 01:18:06 +0000 (17:18 -0800)] 
mm: vmscan: convert global reclaim to per-memcg LRU lists

The global per-zone LRU lists are about to go away on memcg-enabled
kernels, global reclaim must be able to find its pages on the per-memcg
LRU lists.

Since the LRU pages of a zone are distributed over all existing memory
cgroups, a scan target for a zone is complete when all memory cgroups
are scanned for their proportional share of a zone's memory.

The forced scanning of small scan targets from kswapd is limited to
zones marked unreclaimable, otherwise kswapd can quickly overreclaim by
force-scanning the LRU lists of multiple memory cgroups.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: memcg: remove optimization of keeping the root_mem_cgroup LRU lists empty
Johannes Weiner [Fri, 13 Jan 2012 01:18:02 +0000 (17:18 -0800)] 
mm: memcg: remove optimization of keeping the root_mem_cgroup LRU lists empty

root_mem_cgroup, lacking a configurable limit, was never subject to
limit reclaim, so the pages charged to it could be kept off its LRU
lists.  They would be found on the global per-zone LRU lists upon
physical memory pressure and it made sense to avoid uselessly linking
them to both lists.

The global per-zone LRU lists are about to go away on memcg-enabled
kernels, with all pages being exclusively linked to their respective
per-memcg LRU lists.  As a result, pages of the root_mem_cgroup must
also be linked to its LRU lists again.  This is purely about the LRU
list, root_mem_cgroup is still not charged.

The overhead is temporary until the double-LRU scheme is going away
completely.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: move memcg hierarchy reclaim to generic reclaim code
Johannes Weiner [Fri, 13 Jan 2012 01:17:59 +0000 (17:17 -0800)] 
mm: move memcg hierarchy reclaim to generic reclaim code

Memory cgroup limit reclaim and traditional global pressure reclaim will
soon share the same code to reclaim from a hierarchical tree of memory
cgroups.

In preparation of this, move the two right next to each other in
shrink_zone().

The mem_cgroup_hierarchical_reclaim() polymath is split into a soft
limit reclaim function, which still does hierarchy walking on its own,
and a limit (shrinking) reclaim function, which relies on generic
reclaim code to walk the hierarchy.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: memcg: per-priority per-zone hierarchy scan generations
Johannes Weiner [Fri, 13 Jan 2012 01:17:55 +0000 (17:17 -0800)] 
mm: memcg: per-priority per-zone hierarchy scan generations

Memory cgroup limit reclaim currently picks one memory cgroup out of the
target hierarchy, remembers it as the last scanned child, and reclaims
all zones in it with decreasing priority levels.

The new hierarchy reclaim code will pick memory cgroups from the same
hierarchy concurrently from different zones and priority levels, it
becomes necessary that hierarchy roots not only remember the last
scanned child, but do so for each zone and priority level.

Until now, we reclaimed memcgs like this:

    mem = mem_cgroup_iter(root)
    for each priority level:
      for each zone in zonelist:
        reclaim(mem, zone)

But subsequent patches will move the memcg iteration inside the loop
over the zones:

    for each priority level:
      for each zone in zonelist:
        mem = mem_cgroup_iter(root)
        reclaim(mem, zone)

And to keep with the original scan order - memcg -> priority -> zone -
the last scanned memcg has to be remembered per zone and per priority
level.

Furthermore, global reclaim will be switched to the hierarchy walk as
well.  Different from limit reclaim, which can just recheck the limit
after some reclaim progress, its target is to scan all memcgs for the
desired zone pages, proportional to the memcg size, and so reliably
detecting a full hierarchy round-trip will become crucial.

Currently, the code relies on one reclaimer encountering the same memcg
twice, but that is error-prone with concurrent reclaimers.  Instead, use
a generation counter that is increased every time the child with the
highest ID has been visited, so that reclaimers can stop when the
generation changes.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: vmscan: distinguish between memcg triggering reclaim and memcg being scanned
Johannes Weiner [Fri, 13 Jan 2012 01:17:52 +0000 (17:17 -0800)] 
mm: vmscan: distinguish between memcg triggering reclaim and memcg being scanned

Memory cgroup hierarchies are currently handled completely outside of
the traditional reclaim code, which is invoked with a single memory
cgroup as an argument for the whole call stack.

Subsequent patches will switch this code to do hierarchical reclaim, so
there needs to be a distinction between a) the memory cgroup that is
triggering reclaim due to hitting its limit and b) the memory cgroup
that is being scanned as a child of a).

This patch introduces a struct mem_cgroup_zone that contains the
combination of the memory cgroup and the zone being scanned, which is
then passed down the stack instead of the zone argument.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: vmscan: distinguish global reclaim from global LRU scanning
Johannes Weiner [Fri, 13 Jan 2012 01:17:50 +0000 (17:17 -0800)] 
mm: vmscan: distinguish global reclaim from global LRU scanning

The traditional zone reclaim code is scanning the per-zone LRU lists
during direct reclaim and kswapd, and the per-zone per-memory cgroup LRU
lists when reclaiming on behalf of a memory cgroup limit.

Subsequent patches will convert the traditional reclaim code to reclaim
exclusively from the per-memory cgroup LRU lists.  As a result, using
the predicate for which LRU list is scanned will no longer be
appropriate to tell global reclaim from limit reclaim.

This patch adds a global_reclaim() predicate to tell direct/kswapd
reclaim from memory cgroup limit reclaim and substitutes it in all
places where currently scanning_global_lru() is used for that.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: memcg: consolidate hierarchy iteration primitives
Johannes Weiner [Fri, 13 Jan 2012 01:17:48 +0000 (17:17 -0800)] 
mm: memcg: consolidate hierarchy iteration primitives

The memcg naturalization series:

Memory control groups are currently bolted onto the side of
traditional memory management in places where better integration would
be preferrable.  To reclaim memory, for example, memory control groups
maintain their own LRU list and reclaim strategy aside from the global
per-zone LRU list reclaim.  But an extra list head for each existing
page frame is expensive and maintaining it requires additional code.

This patchset disables the global per-zone LRU lists on memory cgroup
configurations and converts all its users to operate on the per-memory
cgroup lists instead.  As LRU pages are then exclusively on one list,
this saves two list pointers for each page frame in the system:

page_cgroup array size with 4G physical memory

  vanilla: allocated 31457280 bytes of page_cgroup
  patched: allocated 15728640 bytes of page_cgroup

At the same time, system performance for various workloads is
unaffected:

100G sparse file cat, 4G physical memory, 10 runs, to test for code
bloat in the traditional LRU handling and kswapd & direct reclaim
paths, without/with the memory controller configured in

  vanilla: 71.603(0.207) seconds
  patched: 71.640(0.156) seconds

  vanilla: 79.558(0.288) seconds
  patched: 77.233(0.147) seconds

100G sparse file cat in 1G memory cgroup, 10 runs, to test for code
bloat in the traditional memory cgroup LRU handling and reclaim path

  vanilla: 96.844(0.281) seconds
  patched: 94.454(0.311) seconds

4 unlimited memcgs running kbuild -j32 each, 4G physical memory, 500M
swap on SSD, 10 runs, to test for regressions in kswapd & direct
reclaim using per-memcg LRU lists with multiple memcgs and multiple
allocators within each memcg

  vanilla: 717.722(1.440) seconds [ 69720.100(11600.835) majfaults ]
  patched: 714.106(2.313) seconds [ 71109.300(14886.186) majfaults ]

16 unlimited memcgs running kbuild, 1900M hierarchical limit, 500M
swap on SSD, 10 runs, to test for regressions in hierarchical memcg
setups

  vanilla: 2742.058(1.992) seconds [ 26479.600(1736.737) majfaults ]
  patched: 2743.267(1.214) seconds [ 27240.700(1076.063) majfaults ]

This patch:

There are currently two different implementations of iterating over a
memory cgroup hierarchy tree.

Consolidate them into one worker function and base the convenience
looping-macros on top of it.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomemcg: add mem_cgroup_replace_page_cache() to fix LRU issue
KAMEZAWA Hiroyuki [Fri, 13 Jan 2012 01:17:44 +0000 (17:17 -0800)] 
memcg: add mem_cgroup_replace_page_cache() to fix LRU issue

Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a
function replace_page_cache_page().  This function replaces a page in the
radix-tree with a new page.  WHen doing this, memory cgroup needs to fix
up the accounting information.  memcg need to check PCG_USED bit etc.

In some(many?) cases, 'newpage' is on LRU before calling
replace_page_cache().  So, memcg's LRU accounting information should be
fixed, too.

This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
 In that function, old pages will be unaccounted without touching
res_counter and new page will be accounted to the memcg (of old page).
WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid
races with LRU handling.

Background:
  replace_page_cache_page() is called by FUSE code in its splice() handling.
  Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated
  page and may be on LRU. LRU mis-accounting will be critical for memory cgroup
  because rmdir() checks the whole LRU is empty and there is no account leak.
  If a page is on the other LRU than it should be, rmdir() will fail.

This bug was added in March 2011, but no bug report yet.  I guess there
are not many people who use memcg and FUSE at the same time with upstream
kernels.

The result of this bug is that admin cannot destroy a memcg because of
account leak.  So, no panic, no deadlock.  And, even if an active cgroup
exist, umount can succseed.  So no problem at shutdown.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoepoll: limit paths
Jason Baron [Fri, 13 Jan 2012 01:17:43 +0000 (17:17 -0800)] 
epoll: limit paths

The current epoll code can be tickled to run basically indefinitely in
both loop detection path check (on ep_insert()), and in the wakeup paths.
The programs that tickle this behavior set up deeply linked networks of
epoll file descriptors that cause the epoll algorithms to traverse them
indefinitely.  A couple of these sample programs have been previously
posted in this thread: https://lkml.org/lkml/2011/2/25/297.

To fix the loop detection path check algorithms, I simply keep track of
the epoll nodes that have been already visited.  Thus, the loop detection
becomes proportional to the number of epoll file descriptor and links.
This dramatically decreases the run-time of the loop check algorithm.  In
one diabolical case I tried it reduced the run-time from 15 mintues (all
in kernel time) to .3 seconds.

Fixing the wakeup paths could be done at wakeup time in a similar manner
by keeping track of nodes that have already been visited, but the
complexity is harder, since there can be multiple wakeups on different
cpus...Thus, I've opted to limit the number of possible wakeup paths when
the paths are created.

This is accomplished, by noting that the end file descriptor points that
are found during the loop detection pass (from the newly added link), are
actually the sources for wakeup events.  I keep a list of these file
descriptors and limit the number and length of these paths that emanate
from these 'source file descriptors'.  In the current implemetation I
allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of
length 4 and 10 of length 5.  Note that it is sufficient to check the
'source file descriptors' reachable from the newly added link, since no
other 'source file descriptors' will have newly added links.  This allows
us to check only the wakeup paths that may have gotten too long, and not
re-check all possible wakeup paths on the system.

In terms of the path limit selection, I think its first worth noting that
the most common case for epoll, is probably the model where you have 1
epoll file descriptor that is monitoring n number of 'source file
descriptors'.  In this case, each 'source file descriptor' has a 1 path of
length 1.  Thus, I believe that the limits I'm proposing are quite
reasonable and in fact may be too generous.  Thus, I'm hoping that the
proposed limits will not prevent any workloads that currently work to
fail.

In terms of locking, I have extended the use of the 'epmutex' to all
epoll_ctl add and remove operations.  Currently its only used in a subset
of the add paths.  I need to hold the epmutex, so that we can correctly
traverse a coherent graph, to check the number of paths.  I believe that
this additional locking is probably ok, since its in the setup/teardown
paths, and doesn't affect the running paths, but it certainly is going to
add some extra overhead.  Also, worth noting is that the epmuex was
recently added to the ep_ctl add operations in the initial path loop
detection code using the argument that it was not on a critical path.

Another thing to note here, is the length of epoll chains that is allowed.
Currently, eventpoll.c defines:

/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4

This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS
+ 1).  However, this limit is currently only enforced during the loop
check detection code, and only when the epoll file descriptors are added
in a certain order.  Thus, this limit is currently easily bypassed.  The
newly added check for wakeup paths, stricly limits the wakeup paths to a
length of 5, regardless of the order in which ep's are linked together.
Thus, a side-effect of the new code is a more consistent enforcement of
the graph depth.

Thus far, I've tested this, using the sample programs previously
mentioned, which now either return quickly or return -EINVAL.  I've also
testing using the piptest.c epoll tester, which showed no difference in
performance.  I've also created a number of different epoll networks and
tested that they behave as expectded.

I believe this solves the original diabolical test cases, while still
preserving the sane epoll nesting.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Nelson Elhage <nelhage@ksplice.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agopipe: fail cleanly when root tries F_SETPIPE_SZ with big size
Sasha Levin [Fri, 13 Jan 2012 01:17:40 +0000 (17:17 -0800)] 
pipe: fail cleanly when root tries F_SETPIPE_SZ with big size

When a user with the CAP_SYS_RESOURCE cap tries to F_SETPIPE_SZ a pipe
with size bigger than kmalloc() can alloc it spits out an ugly warning:

  ------------[ cut here ]------------
  WARNING: at mm/page_alloc.c:2095 __alloc_pages_nodemask+0x5d3/0x7a0()
  Pid: 733, comm: a.out Not tainted 3.2.0-rc1+ #4
  Call Trace:
     warn_slowpath_common+0x75/0xb0
     warn_slowpath_null+0x15/0x20
     __alloc_pages_nodemask+0x5d3/0x7a0
     __get_free_pages+0x12/0x50
     __kmalloc+0x12b/0x150
     pipe_set_size+0x75/0x120
     pipe_fcntl+0xf8/0x140
     do_fcntl+0x2d4/0x410
     sys_fcntl+0x66/0xa0
     system_call_fastpath+0x16/0x1b
  ---[ end trace 432f702e6db7b5ee ]---

Instead, make kcalloc() handle the overflow case and fail quietly.

[akpm@linux-foundation.org: switch to sizeof(*bufs) for 80-column niceness]
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoslub: document setting min order with debug_guardpage_minorder > 0
Stanislaw Gruszka [Fri, 13 Jan 2012 01:17:39 +0000 (17:17 -0800)] 
slub: document setting min order with debug_guardpage_minorder > 0

Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoparisc, exec: remove redundant set_fs(USER_DS)
Mathias Krause [Fri, 13 Jan 2012 01:17:38 +0000 (17:17 -0800)] 
parisc, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoia64, exec: remove redundant set_fs(USER_DS)
Mathias Krause [Fri, 13 Jan 2012 01:17:36 +0000 (17:17 -0800)] 
ia64, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agodrivers/video/nvidia/nvidia.c: fix warning
Andrew Morton [Fri, 13 Jan 2012 01:17:34 +0000 (17:17 -0800)] 
drivers/video/nvidia/nvidia.c: fix warning

Fix the int/bool confusion in there.

  drivers/video/nvidia/nvidia.c:1602: warning: return from incompatible pointer type

Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm,x86,um: move CMPXCHG_DOUBLE config option
Heiko Carstens [Fri, 13 Jan 2012 01:17:33 +0000 (17:17 -0800)] 
mm,x86,um: move CMPXCHG_DOUBLE config option

Move CMPXCHG_DOUBLE and rename it to HAVE_CMPXCHG_DOUBLE so architectures
can simply select the option if it is supported.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm,x86,um: move CMPXCHG_LOCAL config option
Heiko Carstens [Fri, 13 Jan 2012 01:17:30 +0000 (17:17 -0800)] 
mm,x86,um: move CMPXCHG_LOCAL config option

Move CMPXCHG_LOCAL and rename it to HAVE_CMPXCHG_LOCAL so architectures
can simply select the option if it is supported.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm,slub,x86: decouple size of struct page from CONFIG_CMPXCHG_LOCAL
Heiko Carstens [Fri, 13 Jan 2012 01:17:27 +0000 (17:17 -0800)] 
mm,slub,x86: decouple size of struct page from CONFIG_CMPXCHG_LOCAL

While implementing cmpxchg_double() on s390 I realized that we don't set
CONFIG_CMPXCHG_LOCAL despite the fact that we have support for it.

However setting that option will increase the size of struct page by
eight bytes on 64 bit, which we certainly do not want.  Also, it doesn't
make sense that a present cpu feature should increase the size of struct
page.

Besides that it looks like the dependency to CMPXCHG_LOCAL is wrong and
that it should depend on CMPXCHG_DOUBLE instead.

This patch:

If an architecture supports CMPXCHG_LOCAL this shouldn't result
automatically in larger struct pages if the SLUB allocator is used.
Instead introduce a new config option "HAVE_ALIGNED_STRUCT_PAGE" which
can be selected if a double word aligned struct page is required.  Also
update x86 Kconfig so that it should work as before.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoinclude/linux/linkage.h: remove unused ATTRIB_NORET macro
Joe Perches [Fri, 13 Jan 2012 01:17:25 +0000 (17:17 -0800)] 
include/linux/linkage.h: remove unused ATTRIB_NORET macro

The uses have been renamed so delete the unused macro.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agotreewide: convert uses of ATTRIB_NORETURN to __noreturn
Joe Perches [Fri, 13 Jan 2012 01:17:21 +0000 (17:17 -0800)] 
treewide: convert uses of ATTRIB_NORETURN to __noreturn

Use the more commonly used __noreturn instead of ATTRIB_NORETURN.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agotreewide: remove useless NORET_TYPE macro and uses
Joe Perches [Fri, 13 Jan 2012 01:17:17 +0000 (17:17 -0800)] 
treewide: remove useless NORET_TYPE macro and uses

It's a very old and now unused prototype marking so just delete it.

Neaten panic pointer argument style to keep checkpatch quiet.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoinclude/linux/linkage.h: remove unused NORET_AND macro
Joe Perches [Fri, 13 Jan 2012 01:17:14 +0000 (17:17 -0800)] 
include/linux/linkage.h: remove unused NORET_AND macro

The only use in kernel.h is gone so remove the macro.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agokernel.h: neaten panic prototype
Joe Perches [Fri, 13 Jan 2012 01:17:13 +0000 (17:17 -0800)] 
kernel.h: neaten panic prototype

Use __printf macro.
Convert NORET_AND to ATTRIB_NORET.
Use the normal kernel style for pointer arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agokprobes: silence DEBUG_STRICT_USER_COPY_CHECKS=y warning
Stephen Boyd [Fri, 13 Jan 2012 01:17:11 +0000 (17:17 -0800)] 
kprobes: silence DEBUG_STRICT_USER_COPY_CHECKS=y warning

Enabling DEBUG_STRICT_USER_COPY_CHECKS causes the following warning:

  In file included from arch/x86/include/asm/uaccess.h:573,
                   from kernel/kprobes.c:55:
  In function 'copy_from_user',
      inlined from 'write_enabled_file_bool' at
      kernel/kprobes.c:2191:
  arch/x86/include/asm/uaccess_64.h:65:
  warning: call to 'copy_from_user_overflow' declared with attribute warning: copy_from_user() buffer size is not provably correct

presumably due to buf_size being signed causing GCC to fail to see that
buf_size can't become negative.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoproc: fix null pointer deref in proc_pid_permission()
Xiaotian Feng [Fri, 13 Jan 2012 01:17:08 +0000 (17:17 -0800)] 
proc: fix null pointer deref in proc_pid_permission()

get_proc_task() can fail to search the task and return NULL,
put_task_struct() will then bomb the kernel with following oops:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
  IP: [<ffffffff81217d34>] proc_pid_permission+0x64/0xe0
  PGD 112075067 PUD 112814067 PMD 0
  Oops: 0002 [#1] PREEMPT SMP

This is a regression introduced by commit 0499680a ("procfs: add hidepid=
and gid= mount options").  The kernel should return -ESRCH if
get_proc_task() failed.

Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Stephen Wilson <wilsons@start.ca>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agopptp: Accept packet with seq zero
Bradley Peterson [Thu, 12 Jan 2012 10:39:16 +0000 (10:39 +0000)] 
pptp: Accept packet with seq zero

Initialize the PPTP "seq received" value to 0xffffffff, so we don't
ignore packets with seq zero.

Signed-off-by: Bradley Peterson <despite@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoRDS: Remove some unused iWARP code
Roland Dreier [Thu, 12 Jan 2012 08:57:56 +0000 (08:57 +0000)] 
RDS: Remove some unused iWARP code

rds_iw_flush_goal() just returns a count, but it is only called in one
place and its return value is ignored there.  So delete all the dead code.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: fsl: fec: handle 10Mbps speed in RMII mode
Eric Benard [Thu, 12 Jan 2012 06:10:28 +0000 (06:10 +0000)] 
net: fsl: fec: handle 10Mbps speed in RMII mode

when the link is 10 Mbps and the mode is RMII, it's necessary
to set FRCONT to 1 in MIIGSK_CFGR to divide the RMII source
clock by 10 in order to support 10 Mbps operations.

Signed-off-by: Eric Bénard <eric@eukrea.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agodrivers/net/ethernet/stmicro/stmmac/stmmac_platform.c: add missing iounmap
Julia Lawall [Wed, 11 Jan 2012 23:55:10 +0000 (23:55 +0000)] 
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c: add missing iounmap

Add missing iounmap in error handling code, in a case where the function
already preforms iounmap on some other execution path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
statement S,S1;
int ret;
@@
e = \(ioremap\|ioremap_nocache\)(...)
... when != iounmap(e)
if (<+...e...+>) S
... when any
    when != iounmap(e)
*if (...)
   { ... when != iounmap(e)
     return ...; }
... when any
iounmap(e);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agodrivers/net/ethernet/tundra/tsi108_eth.c: add missing iounmap
Julia Lawall [Wed, 11 Jan 2012 23:55:04 +0000 (23:55 +0000)] 
drivers/net/ethernet/tundra/tsi108_eth.c: add missing iounmap

Add missing iounmap in error handling code, in a case where the function
already preforms iounmap on some other execution path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
statement S,S1;
int ret;
@@
e = \(ioremap\|ioremap_nocache\)(...)
... when != iounmap(e)
if (<+...e...+>) S
... when any
    when != iounmap(e)
*if (...)
   { ... when != iounmap(e)
     return ...; }
... when any
iounmap(e);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoksz884x: fix mtu for VLAN
Doug Kehn [Wed, 11 Jan 2012 13:12:43 +0000 (13:12 +0000)] 
ksz884x: fix mtu for VLAN

The Ethernet header does not account for the addition of a VLAN header.
Full size Ethernet frames containing VLAN header are not processed
because the frame is larger than the resulting hw mtu.

Signed-off-by: Doug Kehn <rdkehn@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet_sched: sfq: add optional RED on top of SFQ
Eric Dumazet [Fri, 6 Jan 2012 06:31:44 +0000 (06:31 +0000)] 
net_sched: sfq: add optional RED on top of SFQ

Adds an optional Random Early Detection on each SFQ flow queue.

Traditional SFQ limits count of packets, while RED permits to also
control number of bytes per flow, and adds ECN capability as well.

1) We dont handle the idle time management in this RED implementation,
since each 'new flow' begins with a null qavg. We really want to address
backlogged flows.

2) if headdrop is selected, we try to ecn mark first packet instead of
currently enqueued packet. This gives faster feedback for tcp flows
compared to traditional RED [ marking the last packet in queue ]

Example of use :

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \
limit 3000 headdrop flows 512 divisor 16384 \
redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
 ewma 6 min 8000b max 60000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 4876 prob_drop 6131
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007
requeues 0)
 rate 99483Kbit 8219pps backlog 689392b 456p requeues 0

In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled
flows, we can see number of packets CE marked is smaller than number of
drops (for non ECN flows)

If same test is run, without RED, we can check backlog is much bigger.

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
 Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0)
 rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Dave Taht <dave.taht@gmail.com>
Tested-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agodp83640: Fix NOHZ local_softirq_pending 08 warning
Manfred Rudigier [Mon, 9 Jan 2012 23:52:15 +0000 (23:52 +0000)] 
dp83640: Fix NOHZ local_softirq_pending 08 warning

Similar problem as in 481a8199142c050b72bff8a1956a49fd0a75bbe0 ("can:
fix NOHZ local_softirq_pending 08 warning"). This fix replaces
netif_rx() with netif_rx_ni() which has to be used from
process/softirq context.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agogianfar: Fix invalid TX frames returned on error queue when time stamping
Manfred Rudigier [Mon, 9 Jan 2012 23:26:51 +0000 (23:26 +0000)] 
gianfar: Fix invalid TX frames returned on error queue when time stamping

When TX time stamping for PTP messages is enabled on a socket, a time
stamp is returned on the socket error queue to the user space application
after the frame was transmitted. The transmitted frame is also returned on
the error queue so that an application knows to which frame the time stamp
belongs.

In the current implementation the TxFCB is immediately followed by the
frame. Since the eTSEC inserts the TX time stamp 8 bytes after the TxFCB,
parts of the frame have been overwritten and an invalid frame was returned
on the socket error queue.

This patch fixes the described problem by adding additional 16 padding
bytes between the TxFCB and the frame for all messages sent from a time
stamping enabled socket (other sockets are not affected).

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agogianfar: Fix missing sock reference when processing TX time stamps
Manfred Rudigier [Mon, 9 Jan 2012 23:26:50 +0000 (23:26 +0000)] 
gianfar: Fix missing sock reference when processing TX time stamps

When there is not enough headroom in the skb a private copy will be made.
However, the private copy had no reference to the socket and consequently
no time stamp could be queued on the socket error queue during the
skb_tstamp_tx function. This patch fixes this issue by also stealing the
sock reference from the original skb after making the private copy.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agophylib: introduce mdiobus_alloc_size()
Timur Tabi [Thu, 12 Jan 2012 23:23:04 +0000 (15:23 -0800)] 
phylib: introduce mdiobus_alloc_size()

Introduce function mdiobus_alloc_size() as an alternative to mdiobus_alloc().
Most callers of mdiobus_alloc() also allocate a private data structure, and
then manually point bus->priv to this object.  mdiobus_alloc_size()
combines the two operations into one, which simplifies memory management.

The original mdiobus_alloc() now just calls mdiobus_alloc_size(0).

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agomodule_param: check that bool parameters really are bool.
Rusty Russell [Thu, 12 Jan 2012 23:02:28 +0000 (09:32 +1030)] 
module_param: check that bool parameters really are bool.

module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

This tightens the check (you'll get a warning about incompatible
return type) but still allows it.  Next kernel version, we'll remove
it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agointelfbdrv.c: bailearly is an int module_param
Rusty Russell [Thu, 12 Jan 2012 23:02:28 +0000 (09:32 +1030)] 
intelfbdrv.c: bailearly is an int module_param

Dan Carpenter points out that it's an int, not a bool:
intelfbdrv.c:818: if (bailearly == 1)
intelfbdrv.c:828: if (bailearly == 2)
intelfbdrv.c:836: if (bailearly == 3)
intelfbdrv.c:842: if (bailearly == 4)
intelfbdrv.c:851: if (bailearly == 5)
intelfbdrv.c:859: if (bailearly == 6)
intelfbdrv.c:866:     bailearly > 6 ? bailearly - 6 : 0);
intelfbdrv.c:874: if (bailearly == 18)
intelfbdrv.c:886: if (bailearly == 19)
intelfbdrv.c:893: if (bailearly == 20)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
12 years agoparide/pcd: fix bool verbose module parameter.
Rusty Russell [Thu, 12 Jan 2012 23:02:26 +0000 (09:32 +1030)] 
paride/pcd: fix bool verbose module parameter.

Dan Carpenter points out that it's an int, not a bool:

pcd.c:427: if (verbose > 1)
pcd.c:433: if (verbose > 1)
pcd.c:437: if (verbose < 2)
pcd.c:506:#define DBMSG(msg) ((verbose>1)?(msg):NULL)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
12 years agomodule_param: make bool parameters really bool (drivers & misc)
Rusty Russell [Thu, 12 Jan 2012 23:02:20 +0000 (09:32 +1030)] 
module_param: make bool parameters really bool (drivers & misc)

module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agomodule_param: make bool parameters really bool (arch)
Rusty Russell [Thu, 12 Jan 2012 23:02:18 +0000 (09:32 +1030)] 
module_param: make bool parameters really bool (arch)

module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agomodule_param: make bool parameters really bool (core code)
Rusty Russell [Thu, 12 Jan 2012 23:02:18 +0000 (09:32 +1030)] 
module_param: make bool parameters really bool (core code)

module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agokernel/async: remove redundant declaration.
Rusty Russell [Thu, 12 Jan 2012 23:02:18 +0000 (09:32 +1030)] 
kernel/async: remove redundant declaration.

It's in linux/init.h, and I'm about to change it to a bool.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agoprintk: fix unnecessary module_param_name.
Rusty Russell [Thu, 12 Jan 2012 23:02:17 +0000 (09:32 +1030)] 
printk: fix unnecessary module_param_name.

You don't need module_param_name if the name is the same!

Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agolirc_parallel: fix module parameter description.
Rusty Russell [Thu, 12 Jan 2012 23:02:17 +0000 (09:32 +1030)] 
lirc_parallel: fix module parameter description.

Cut and paste bug.

Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: devel@driverdev.osuosl.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agomodule_param: avoid bool abuse, add bint for special cases.
Rusty Russell [Thu, 12 Jan 2012 23:02:17 +0000 (09:32 +1030)] 
module_param: avoid bool abuse, add bint for special cases.

For historical reasons, we allow module_param(bool) to take an int (or
an unsigned int).  That's going away.

A few drivers really want an int: they set it to -1 and a parameter
will set it to 0 or 1.  This sucks: reading them from sysfs will give
'Y' for both -1 and 1, but if we change it to an int, then the users
might be broken (if they did "param" instead of "param=1").

Use a new 'bint' parser for them.

(ntfs has a different problem: it needs an int for debug_msgs because
it's also exposed via sysctl.)

Cc: Steve Glendinning <steve.glendinning@smsc.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux390@de.ibm.com
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: lm-sensors@lm-sensors.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: alsa-devel@alsa-project.org
Acked-by: Takashi Iwai <tiwai@suse.de> (For the sound part)
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> (For the hwmon driver)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agomodule_param: check type correctness for module_param_array
Rusty Russell [Thu, 12 Jan 2012 23:02:16 +0000 (09:32 +1030)] 
module_param: check type correctness for module_param_array

module_param_array(), unlike its non-array cousins, didn't check the type
of the variable.  Fixing this found two bugs.

Cc: Luca Risolia <luca.risolia@studio.unibo.it>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Cc: linux-media@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agomodpost: use linker section to generate table.
Rusty Russell [Thu, 12 Jan 2012 23:02:16 +0000 (09:32 +1030)] 
modpost: use linker section to generate table.

This means (most) future busses need only have one hunk in their
patch.  Also took the opportunity to check that function matches the
type.

Again, inspired by Alessandro's patch series.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alessandro Rubini <rubini@gnudd.com>
12 years agomodpost: use a table rather than a giant if/else statement.
Rusty Russell [Thu, 12 Jan 2012 23:02:15 +0000 (09:32 +1030)] 
modpost: use a table rather than a giant if/else statement.

We look for symbols of form __mod_<busname>_device_table, and for all
but three cases we use a standard interation function (do_table) to
walk over the contents and dump out the aliases.

Alessandro Rubini did this first, I just repainted the bikeshed a bit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alessandro Rubini <rubini@gnudd.com>
12 years agomodules: sysfs - export: taint, coresize, initsize
Kay Sievers [Thu, 12 Jan 2012 23:02:15 +0000 (09:32 +1030)] 
modules: sysfs - export: taint, coresize, initsize

Recent tools do not want to use /proc to retrieve module information. A few
values are currently missing from sysfs to replace the information available
in /proc/modules.

This adds /sys/module/*/{coresize,initsize,taint} attributes.

TAINT_PROPRIETARY_MODULE (P) and TAINT_OOT_MODULE (O) flags are both always
shown now, and do no longer exclude each other, also in /proc/modules.

Replace the open-coded sysfs attribute initializers with the __ATTR() macro.

Add the new attributes to Documentation/ABI.

Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agokernel/params: replace DEBUGP with pr_debug
Jim Cromie [Tue, 6 Dec 2011 19:11:31 +0000 (12:11 -0700)] 
kernel/params: replace DEBUGP with pr_debug

Use more flexible pr_debug.  This allows:

  echo "module params +p" > /dbg/dynamic_debug/control

to turn on debug messages when needed.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agomodule: replace DEBUGP with pr_debug
Jim Cromie [Tue, 6 Dec 2011 19:11:31 +0000 (12:11 -0700)] 
module: replace DEBUGP with pr_debug

Use more flexible pr_debug.  This allows:

  echo "module module +p" > /dbg/dynamic_debug/control

to turn on debug messages when needed.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agomodule: struct module_ref should contains long fields
Eric Dumazet [Thu, 12 Jan 2012 23:02:14 +0000 (09:32 +1030)] 
module: struct module_ref should contains long fields

module_ref contains two "unsigned int" fields.

Thats now too small, since some machines can open more than 2^32 files.

Check commit 518de9b39e8 (fs: allow for more than 2^31 files) for
reference.

We can add an aligned(2 * sizeof(unsigned long)) attribute to force
alloc_percpu() allocating module_ref areas in single cache lines.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Tejun Heo <tj@kernel.org>
CC: Robin Holt <holt@sgi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agomodule: Fix performance regression on modules with large symbol tables
Kevin Cernekee [Thu, 12 Jan 2012 23:02:14 +0000 (09:32 +1030)] 
module: Fix performance regression on modules with large symbol tables

Looking at /proc/kallsyms, one starts to ponder whether all of the extra
strtab-related complexity in module.c is worth the memory savings.

Instead of making the add_kallsyms() loop even more complex, I tried the
other route of deleting the strmap logic and naively copying each string
into core_strtab with no consideration for consolidating duplicates.

Performance on an "already exists" insmod of nvidia.ko (runs
add_kallsyms() but does not actually initialize the module):

Original scheme: 1.230s
With naive copying: 0.058s

Extra space used: 35k (of a 408k module).

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <73defb5e4bca04a6431392cc341112b1@localhost>

12 years agomodule: Add comments describing how the "strmap" logic works
Kevin Cernekee [Sun, 13 Nov 2011 03:08:55 +0000 (19:08 -0800)] 
module: Add comments describing how the "strmap" logic works

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Thu, 12 Jan 2012 20:40:41 +0000 (12:40 -0800)] 
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: bcm5974 - set BUTTONPAD property
  Input: serio_raw - return proper result when serio_raw_write fails
  Input: serio_raw - really signal HUP upon disconnect
  Input: serio_raw - remove stray semicolon
  Input: revert some over-zealous conversions to module_platform_driver()

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi...
Linus Torvalds [Thu, 12 Jan 2012 20:39:21 +0000 (12:39 -0800)] 
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  FUSE: Notifying the kernel of deletion.
  fuse: support ioctl on directories
  fuse: Use kcalloc instead of kzalloc to allocate array
  fuse: llseek optimize SEEK_CUR and SEEK_SET

12 years agoMerge tag 'to-linus' of git://github.com/rustyrussell/linux
Linus Torvalds [Thu, 12 Jan 2012 20:37:27 +0000 (12:37 -0800)] 
Merge tag 'to-linus' of git://github.com/rustyrussell/linux

* tag 'to-linus' of git://github.com/rustyrussell/linux: (24 commits)
  lguest: Make sure interrupt is allocated ok by lguest_setup_irq
  lguest: move the lguest tool to the tools directory
  lguest: switch segment-voodoo-numbers to readable symbols
  virtio: balloon: Add freeze, restore handlers to support S4
  virtio: balloon: Move vq initialization into separate function
  virtio: net: Add freeze, restore handlers to support S4
  virtio: net: Move vq and vq buf removal into separate function
  virtio: net: Move vq initialization into separate function
  virtio: blk: Add freeze, restore handlers to support S4
  virtio: blk: Move vq initialization to separate function
  virtio: console: Disable callbacks for virtqueues at start of S4 freeze
  virtio: console: Add freeze and restore handlers to support S4
  virtio: console: Move vq and vq buf removal into separate functions
  virtio: pci: add PM notification handlers for restore, freeze, thaw, poweroff
  virtio: pci: switch to new PM API
  virtio_blk: fix config handler race
  virtio: add debugging if driver doesn't kick.
  virtio: expose added descriptors immediately.
  virtio: avoid modulus operation.
  virtio: support unlocked queue kick
  ...

12 years agonet: decrement memcg jump label when limit, not usage, is changed
Glauber Costa [Thu, 12 Jan 2012 02:16:06 +0000 (02:16 +0000)] 
net: decrement memcg jump label when limit, not usage, is changed

The logic of the current code is that whenever we destroy
a cgroup that had its limit set (set meaning different than
maximum), we should decrement the jump_label counter.
Otherwise we assume it was never incremented.

But what the code actually does is test for RES_USAGE
instead of RES_LIMIT. Usage being different than maximum
is likely to be true most of the time.

The effect of this is that the key must become negative,
and since the jump_label test says:

        !!atomic_read(&key->enabled);

we'll have jump_labels still on when no one else is
using this functionality.

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: reintroduce missing rcu_assign_pointer() calls
Eric Dumazet [Thu, 12 Jan 2012 04:41:32 +0000 (04:41 +0000)] 
net: reintroduce missing rcu_assign_pointer() calls

commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
y).

We miss needed barriers, even on x86, when y is not NULL.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agobrcmsmac: fix reading of PCI sprom contents
Linus Torvalds [Thu, 12 Jan 2012 20:19:34 +0000 (12:19 -0800)] 
brcmsmac: fix reading of PCI sprom contents

It appears that you can only read the sprom contents with aligned 16-bit
reads: anything else causes at least some versions of the broadcom
chipset to abort the PCI transaction, returning 0xff.

This apparently doesn't trigger very often, because most setups don't
use an external srom chip, and the OTP sprom loading doesn't have this
issue.  But at least the current 11" Macbook Air does trigger it, and
wireless communications were broken as a result.

Acked-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agommc: fix a deadlock between system suspend and MMC block IO
Guennadi Liakhovetski [Wed, 4 Jan 2012 14:28:45 +0000 (15:28 +0100)] 
mmc: fix a deadlock between system suspend and MMC block IO

Performing MMC block IO with simultaneous STR can lead to a deadlock: the
mmc_pm_notify() function claims the host and then calls bus .remove()
method, which lands in mmc_blk_remove(), which calls mmc_blk_remove_req()
then it goes to -> mmc_cleanup_queue() -> kthread_stop(), which waits for
the mmc-block thread to stop. If the mmc-block thread at that time is
processing block requests, it will also try to claim the host in
mmc_blk_issue_rq() and block there. This patch fixes the problem by
calling .remove() before claiming the host.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Arindam Nath <arindam.nath@amd.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: sdhci: restore the enabled dma when do reset all
Shaohui Xie [Thu, 29 Dec 2011 08:33:00 +0000 (16:33 +0800)] 
mmc: sdhci: restore the enabled dma when do reset all

If dma is enabled, it'll be cleared when reset all is performed, this can
be observed on some platforms, such as P2041 which has a version 2.3
controller, but platform like P4080 which has a version 2.2 controller,
does not suffer this, so we will check if the dma is enabled, we should
restore it after reset all.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: dw_mmc: miscaculated the fifo-depth with wrong bit operation
Jaehoon Chung [Wed, 11 Jan 2012 09:28:21 +0000 (09:28 +0000)] 
mmc: dw_mmc: miscaculated the fifo-depth with wrong bit operation

In FIFOTH register, the RX_WMark field (bits[27:16]) defaults to
FIFO_DEPTH - 1. When reading it, bits[26:16] were being used, so
fix it to use the mask 0xfff instead of 0x7ff.

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: host: Adds support for eMMC 4.5 HS200 mode
Girish K S [Fri, 6 Jan 2012 04:26:39 +0000 (09:56 +0530)] 
mmc: host: Adds support for eMMC 4.5 HS200 mode

This patch adds support for the HS200 mode on the host side.
Also enables the tuning feature required when the HS200 mode
is selected.

Signed-off-by: Girish K S <girish.shivananjappa@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: core: HS200 mode support for eMMC 4.5
Girish K S [Wed, 11 Jan 2012 19:04:52 +0000 (14:04 -0500)] 
mmc: core: HS200 mode support for eMMC 4.5

This patch adds the support of the HS200 bus speed for eMMC 4.5 devices.
The eMMC 4.5 devices have support for 200MHz bus speed. The function
prototype of the tuning function is modified to handle the tuning
command number which is different in sd and mmc case.

Signed-off-by: Girish K S <girish.shivananjappa@linaro.org>
Signed-off-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: dw_mmc: fixed wrong bit operation for SDMMC_GET_FCNT()
Jaehoon Chung [Thu, 5 Jan 2012 10:12:57 +0000 (19:12 +0900)] 
mmc: dw_mmc: fixed wrong bit operation for SDMMC_GET_FCNT()

In status register, fifo_count is bit[29:17].
(0x1FFF is correct)

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: core: Separate the timeout value for cache-ctrl
Seungwon Jeon [Fri, 9 Dec 2011 08:47:17 +0000 (17:47 +0900)] 
mmc: core: Separate the timeout value for cache-ctrl

Turning the cache off implies flushing cache which doesn't define
maximum timeout unlike cache-on. This patch will apply the generic
CMD6 timeout only for cache-on. Additionally the kernel message is
added for checking failure case of cache-on.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: sdhci-spear: Fix compilation error
Viresh Kumar [Wed, 4 Jan 2012 06:18:42 +0000 (11:48 +0530)] 
mmc: sdhci-spear: Fix compilation error

With the inclusion of following patch (59b5bc3929b37):
"mmc: sdhci: remove "state" argument from sdhci_suspend_host"

we get a compilation error for sdhci-spear:
drivers/mmc/host/sdhci-spear.c:283:2: error: too many arguments to function
‘sdhci_suspend_host’

This patch fixes this error.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: sdhci: Deal with failure case in sdhci_suspend_host
Aaron Lu [Wed, 4 Jan 2012 02:07:43 +0000 (10:07 +0800)] 
mmc: sdhci: Deal with failure case in sdhci_suspend_host

If there are errors happened in sdhci_suspend_host, handle it so that
when the function returns with an error, the host's behaviour is the
same before this function call, e.g. card detection is enabled and
tuning timer is active, etc.

Signed-off-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Aaron Lu <aaron.lu@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: dw_mmc: Clear the DDR mode for non-DDR
Seungwon Jeon [Mon, 2 Jan 2012 07:00:02 +0000 (16:00 +0900)] 
mmc: dw_mmc: Clear the DDR mode for non-DDR

UHS_REG should be cleared for non-DDR mode. But currently there is
no way to clear DDR mode, if it is already set once. This patch adds
clearing DDR mode for non-DDR mode.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: sd: Fix SDR12 timing regression
Alexander Elbs [Wed, 4 Jan 2012 04:26:53 +0000 (23:26 -0500)] 
mmc: sd: Fix SDR12 timing regression

This patch fixes a failure to recognize SD cards reported on a Dell
Vostro with O2 Micro SD card reader.  Patch 49c468f ("mmc: sd: add
support for uhs bus speed mode selection") caused the problem, by
setting the SDHCI_CTRL_HISPD flag even for legacy timings.

Signed-off-by: Alexander Elbs <alex@segv.de>
Acked-by: Philip Rakity <prakity@marvell.com>
Acked-by: Arindam Nath <arindam.nath@amd.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agommc: sdhci: Fix tuning timer incorrect setting when suspending host
Aaron Lu [Wed, 28 Dec 2011 03:11:12 +0000 (11:11 +0800)] 
mmc: sdhci: Fix tuning timer incorrect setting when suspending host

When suspending host, the tuning timer shoule be deactivated.
And the HOST_NEEDS_TUNING flag should be set after tuning timer is
deactivated.

Signed-off-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Aaron Lu <aaron.lu@amd.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
12 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
David S. Miller [Thu, 12 Jan 2012 20:10:00 +0000 (12:10 -0800)] 
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless

12 years agoMAINTAINERS: List i2c-omap and i2c-davinci drivers
Jean Delvare [Thu, 12 Jan 2012 19:32:05 +0000 (20:32 +0100)] 
MAINTAINERS: List i2c-omap and i2c-davinci drivers

This will ensure that the right people and lists are notified when
these drivers are modified.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Tony Lindgren <tony@atomide.com>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Cc: Kevin Hilman <khilman@ti.com>
12 years agoMAINTAINERS: i2c: Add third maintainer
Wolfram Sang [Thu, 12 Jan 2012 19:32:05 +0000 (20:32 +0100)] 
MAINTAINERS: i2c: Add third maintainer

Add me as a third maintainer to help out in the i2c subsystem.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Ben Dooks <ben-linux@fluff.org>
12 years agoi2c/gpio-i2cmux: Convert to use module_platform_driver()
Axel Lin [Thu, 12 Jan 2012 19:32:04 +0000 (20:32 +0100)] 
i2c/gpio-i2cmux: Convert to use module_platform_driver()

Convert gpio-i2cmux to use the module_platform_driver() macro which
makes the code smaller and a bit simpler.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Peter Korsgaard <peter.korsgaard@barco.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
12 years agoi2c/busses: Use module_platform_driver()
Axel Lin [Thu, 12 Jan 2012 19:32:04 +0000 (20:32 +0100)] 
i2c/busses: Use module_platform_driver()

Convert the drivers in drivers/i2c/busses/* to use the
module_platform_driver() macro which makes the code smaller and a bit
simpler.

Cc: Ben Dooks <ben-linux@fluff.org>
Acked-by: Jochen Friedrich <jochen@scram.de>
Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Manuel Lauss <manuel.lauss@googlemail.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Yong Zhang <yong.zhang0@gmail.com>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
12 years agoi2c-dev: Use memdup_user
Thomas Meyer [Thu, 12 Jan 2012 19:32:04 +0000 (20:32 +0100)] 
i2c-dev: Use memdup_user

Use memdup_user rather than duplicating its implementation.
This is a little bit restricted to reduce false positives.

The semantic patch that makes this output is available
in scripts/coccinelle/api/memdup_user.cocci.

More information about semantic patching is available at
http://coccinelle.lip6.fr/

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
12 years agoi2c: Convert to DEFINE_PCI_DEVICE_TABLE
Axel Lin [Thu, 12 Jan 2012 19:32:04 +0000 (20:32 +0100)] 
i2c: Convert to DEFINE_PCI_DEVICE_TABLE

Convert static struct pci_device_id *[] to static DEFINE_PCI_DEVICE_TABLE
tables.

Use DEFINE_PCI_DEVICE_TABLE ensures we make the pci_device_id table const
and marked as __devinitconst.

This also fixes some warnings from checkpatch:
e.g.
WARNING: Use DEFINE_PCI_DEVICE_TABLE for struct pci_device_id
#1096: FILE: i2c/busses/i2c-intel-mid.c:1096:
+static struct pci_device_id intel_mid_i2c_ids[] = {

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Ben Dooks <ben-linux@fluff.org>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Acked-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Cc: Tomoya MORINAGA <tomoya-linux@dsn.lapis-semi.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
12 years agoi2c-ali1535: enable SPARC support
corentin.labbe [Thu, 12 Jan 2012 19:32:04 +0000 (20:32 +0100)] 
i2c-ali1535: enable SPARC support

The i2c-ali1535 driver doesn't work on SPARC, because it assumes that
ioport address are 16-bit wide (address stored in an unsigned short).
But on SPARC arch, ioports are mapped in memory and so must be stored
in an unsigned long.

Use pci_resource_start for getting IOMEM base address, then read the
SMBBA of the i2c bus and use these together for I/O access.

I would like to thank Jean DELVARE for reviewing my patch.

Signed-off-by: LABBE Corentin <corentin.labbe@geomatys.fr>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
12 years agoi2c: Fix error value returned by several bus drivers
Jean Delvare [Thu, 12 Jan 2012 19:32:03 +0000 (20:32 +0100)] 
i2c: Fix error value returned by several bus drivers

When adding checks for ACPI resource conflicts to many bus drivers,
not enough attention was paid to the error paths, and for several
drivers this causes 0 to be returned on error in some cases. Fix this
by properly returning a non-zero value on every error.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org
12 years agoceph: ensure prealloc_blob is in place when removing xattr
Alex Elder [Thu, 12 Jan 2012 01:41:01 +0000 (17:41 -0800)] 
ceph: ensure prealloc_blob is in place when removing xattr

In __ceph_build_xattrs_blob(), if a ceph inode's extended attributes
are marked dirty, all attributes recorded in its rb_tree index are
formatted into a "blob" buffer.  The target buffer is recorded in
ceph_inode->i_xattrs.prealloc_blob, and it is expected to exist and
be of sufficient size to hold the attributes.

The extended attributes are marked dirty in two cases: when a new
attribute is added to the inode; or when one is removed.  In the
former case work is done to ensure the prealloc_blob buffer is
properly set up, but in the latter it is not.

Change the logic in ceph_removexattr() so it matches what is
done in ceph_setxattr().  Note that this is done in a way that
keeps the two blocks of code nearly identical, in anticipation
of a subsequent patch that encapsulates some of this logic into
one or more helper routines.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
12 years agorbd: initialize snap_rwsem in rbd_add()
Alex Elder [Thu, 12 Jan 2012 03:42:15 +0000 (19:42 -0800)] 
rbd: initialize snap_rwsem in rbd_add()

New rbd device structures get initialized in rbd_add().  Many of
the fields rely on being initially zero-filled.  However we lockdep
was noticing that the rw_semaphore embedded in the header field
was not getting properly initialized.  Fix that.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
12 years agoceph: enable/disable dentry complete flags via mount option
Sage Weil [Tue, 10 Jan 2012 17:12:55 +0000 (09:12 -0800)] 
ceph: enable/disable dentry complete flags via mount option

Enable/disable use of the dentry dir 'complete' flag via a mount option.
This lets the admin control whether ceph uses the dcache to satisfy
negative lookups or readdir when it has the entire directory contents in
its cache.

This is purely a performance optimization; correctness is guaranteed
whether it is enabled or not.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
12 years agovfs: export symbol d_find_any_alias()
Sage Weil [Tue, 10 Jan 2012 17:04:37 +0000 (09:04 -0800)] 
vfs: export symbol d_find_any_alias()

Ceph needs this.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
12 years agox86: Get rid of 'dubious one-bit signed bitfield' sprase warning
Anton Vorontsov [Wed, 11 Jan 2012 01:11:46 +0000 (05:11 +0400)] 
x86: Get rid of 'dubious one-bit signed bitfield' sprase warning

This very noisy sparse warning appears on almost every file in the
kernel:

  CHECK   init/main.c
  arch/x86/include/asm/thread_info.h:43:55: error: dubious one-bit signed bitfield
  arch/x86/include/asm/thread_info.h:44:46: error: dubious one-bit signed bitfield

This patch changes sig_on_uaccess_error and uaccess_err flags to unsigned
type and thus fixes the warning.

Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds [Thu, 12 Jan 2012 16:00:30 +0000 (08:00 -0800)] 
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (526 commits)
  ASoC: twl6040 - Add method to query optimum PDM_DL1 gain
  ALSA: hda - Fix the lost power-setup of seconary pins after PM resume
  ALSA: usb-audio: add Yamaha MOX6/MOX8 support
  ALSA: virtuoso: add S/PDIF input support for all Xonars
  ALSA: ice1724 - Support for ooAoo SQ210a
  ALSA: ice1724 - Allow card info based on model only
  ALSA: ice1724 - Create capture pcm only for ADC-enabled configurations
  ALSA: hdspm - Provide unique driver id based on card serial
  ASoC: Dynamically allocate the rtd device for a non-empty release()
  ASoC: Fix recursive dependency due to select ATMEL_SSC in SND_ATMEL_SOC_SSC
  ALSA: hda - Fix the detection of "Loopback Mixing" control for VIA codecs
  ALSA: hda - Return the error from get_wcaps_type() for invalid NIDs
  ALSA: hda - Use auto-parser for HP laptops with cx20459 codec
  ALSA: asihpi - Fix potential Oops in snd_asihpi_cmode_info()
  ALSA: hdsp - Fix potential Oops in snd_hdsp_info_pref_sync_ref()
  ALSA: hda/cirrus - support for iMac12,2 model
  ASoC: cx20442: add bias control over a platform provided regulator
  ALSA: usb-audio - Avoid flood of frame-active debug messages
  ALSA: snd-usb-us122l: Delete calls to preempt_disable
  mfd: Put WM8994 into cache only mode when suspending
  ...

Fix up trivial conflicts in:
 - arch/arm/mach-s3c64xx/mach-crag6410.c:
renamed speyside_wm8962 to tobermory, added littlemill right
next to it
 - drivers/base/regmap/{regcache.c,regmap.c}:
duplicate diff that had already come in with other changes in
the regmap tree

12 years agox86/PCI: build amd_bus.o only when CONFIG_AMD_NB=y
Bjorn Helgaas [Thu, 12 Jan 2012 15:01:40 +0000 (08:01 -0700)] 
x86/PCI: build amd_bus.o only when CONFIG_AMD_NB=y

We only need amd_bus.o for AMD systems with PCI.  arch/x86/pci/Makefile
already depends on CONFIG_PCI=y, so this patch just adds the dependency
on CONFIG_AMD_NB.

Cc: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org # 2.6.34+ (needs adjustment for k8 -> amd rename)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge branch 'topic/hda' into for-linus
Takashi Iwai [Thu, 12 Jan 2012 08:59:18 +0000 (09:59 +0100)] 
Merge branch 'topic/hda' into for-linus

12 years agoMerge branch 'topic/misc' into for-linus
Takashi Iwai [Thu, 12 Jan 2012 08:59:14 +0000 (09:59 +0100)] 
Merge branch 'topic/misc' into for-linus

12 years agoMerge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound...
Takashi Iwai [Thu, 12 Jan 2012 08:48:20 +0000 (09:48 +0100)] 
Merge branch 'for-3.3' of git://git./linux/kernel/git/broonie/sound into topic/asoc

12 years agoMerge tag 'rmobile-for-linus' of git://github.com/pmundt/linux-sh
Linus Torvalds [Thu, 12 Jan 2012 07:29:20 +0000 (23:29 -0800)] 
Merge tag 'rmobile-for-linus' of git://github.com/pmundt/linux-sh

SH/R-Mobile updates for 3.3 merge window.

* tag 'rmobile-for-linus' of git://github.com/pmundt/linux-sh: (32 commits)
  arm: mach-shmobile: add a resource name for shdma
  ARM: mach-shmobile: r8a7779 SMP support V3
  ARM: mach-shmobile: Add kota2 defconfig.
  ARM: mach-shmobile: Add marzen defconfig.
  ARM: mach-shmobile: r8a7779 power domain support V2
  ARM: mach-shmobile: Fix up marzen build for recent GIC changes.
  ARM: mach-shmobile: r8a7779 PFC function support
  ARM: mach-shmobile: Flush caches in platform_cpu_die()
  ARM: mach-shmobile: Allow SoC specific CPU kill code
  ARM: mach-shmobile: Fix headsmp.S code to use CPUINIT
  ARM: mach-shmobile: clock-r8a7779: clkz/clkzs support
  ARM: mach-shmobile: clock-r8a7779: add DIV4 clock support
  ARM: mach-shmobile: Marzen LAN89218 support
  ARM: mach-shmobile: Marzen SCIF2/SCIF4 support
  ARM: mach-shmobile: r8a7779 PFC GPIO-only support V2
  ARM: mach-shmobile: r8a7779 and Marzen base support V2
  sh: pfc: Unlock register support
  sh: pfc: Variable bitfield width config register support
  sh: pfc: Add config_reg_helper() function
  sh: pfc: Convert index to field and value pair
  ...

12 years agoMerge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh
Linus Torvalds [Thu, 12 Jan 2012 07:22:52 +0000 (23:22 -0800)] 
Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

SuperH updates for 3.3 merge window.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh: (38 commits)
  sh: magicpanelr2: Update for parse_mtd_partitions() fallout.
  sh: mach-rsk: Update for parse_mtd_partitions() fallout.
  sh: sh2a: Improve cache flush/invalidate functions
  sh: also without PM_RUNTIME pm_runtime.o must be built
  sh: add a resource name for shdma
  sh: Remove redundant try_to_freeze() invocations.
  sh: Ensure IRQs are enabled across do_notify_resume().
  sh: Fix up store queue code for subsys_interface changes.
  sh: clkfwk: sh_clk_init_parent() should be called after clk_register()
  sh: add platform_device for renesas_usbhs in board-sh7757lcr
  sh: modify clock-sh7757 for renesas_usbhs
  sh: pfc: ioremap() support
  sh: use ioread32/iowrite32 and mapped_reg for div6
  sh: use ioread32/iowrite32 and mapped_reg for div4
  sh: use ioread32/iowrite32 and mapped_reg for mstp32
  sh: extend clock struct with mapped_reg member
  sh: clkfwk: clock-sh73a0: all div6_clks use SH_CLK_DIV6_EXT()
  sh: clkfwk: clock-sh7724: all div6_clks use SH_CLK_DIV6_EXT()
  sh: clock-sh7723: add CLKDEV_ICK_ID for cleanup
  serial: sh-sci: Handle GPIO function requests.
  ...

12 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 12 Jan 2012 06:52:48 +0000 (22:52 -0800)] 
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix lockup by limiting load-balance retries on lock-break
  sched: Fix CONFIG_CGROUP_SCHED dependency
  sched: Remove empty #ifdefs

12 years agolguest: Make sure interrupt is allocated ok by lguest_setup_irq
Stratos Psomadakis [Thu, 12 Jan 2012 05:14:47 +0000 (15:44 +1030)] 
lguest: Make sure interrupt is allocated ok by lguest_setup_irq

Make sure the interrupt is allocated correctly by lguest_setup_irq (check the
return value of irq_alloc_desc_at for -ENOMEM)

Signed-off-by: Stratos Psomadakis <psomas@cslab.ece.ntua.gr>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleanups and commentry)
This page took 0.075125 seconds and 5 git commands to generate.