Eric W. Biederman [Mon, 2 Oct 2006 09:17:27 +0000 (02:17 -0700)]
[PATCH] file: Add locking to f_getown
This has been needed for a long time, but now with the advent of a
reference counted struct pid there are real consequences for getting this
wrong.
Someone I think it was Oleg Nesterov pointed out that this construct was
missing locking, when I introduced struct pid. After taking time to review
the locking construct already present I figured out which lock needs to be
taken. The other paths that access f_owner.pid take either the f_owner
read or the write lock.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cedric Le Goater [Mon, 2 Oct 2006 09:17:26 +0000 (02:17 -0700)]
[PATCH] update mq_notify to use a struct pid
Message queues can signal a process waiting for a message.
This patch replaces the pid_t value with a struct pid to avoid pid wrap
around problems.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:25 +0000 (02:17 -0700)]
[PATCH] Use struct pspace in next_pidmap and find_ge_pid
This updates my proc: readdir race fix (take 3) patch
to account for the changes made by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
to introduce struct pspace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sukadev Bhattiprolu [Mon, 2 Oct 2006 09:17:24 +0000 (02:17 -0700)]
[PATCH] Define struct pspace
Define a per-container pid space object. And create one instance of this
object, init_pspace, to define the entire pid space. Subsequent patches
will provide/use interfaces to create/destroy pid spaces.
Its a subset/rework of Eric Biederman's patch
http://lkml.org/lkml/2006/2/6/285 .
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sukadev Bhattiprolu [Mon, 2 Oct 2006 09:17:23 +0000 (02:17 -0700)]
[PATCH] Move pidmap to pspace.h
Move struct pidmap and PIDMAP_ENTRIES to a new file, include/linux/pspace.h
where it will be used in subsequent patches to define pid spaces.
Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/285
[akpm@osdl.org: cleanups]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Mon, 2 Oct 2006 09:17:22 +0000 (02:17 -0700)]
[PATCH] pid: simplify pid iterators
I think it is hardly possible to read the current do_each_task_pid(). The
new version is much simpler and makes the code smaller.
Only the do_each_task_pid change is tested, the do_each_pid_task isn't.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:21 +0000 (02:17 -0700)]
[PATCH] pids coding style use struct pidmap in next_pidmap
Use struct pidmap instead of pidmap_t.
This updates my proc: readdir race fix (take 3) patch
to account for the changes made by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
to kill pidmap_t.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sukadev Bhattiprolu [Mon, 2 Oct 2006 09:17:20 +0000 (02:17 -0700)]
[PATCH] pids: coding style: use struct pidmap
Use struct pidmap instead of pidmap_t.
Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/271.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Mon, 2 Oct 2006 09:17:18 +0000 (02:17 -0700)]
[PATCH] const struct tty_operations
As part of an SMP cleanliness pass over UML, I consted a bunch of
structures in order to not have to document their locking. One of these
structures was a struct tty_operations. In order to const it in UML
without introducing compiler complaints, the declaration of
tty_set_operations needs to be changed, and then all of its callers need to
be fixed.
This patch declares all struct tty_operations in the tree as const. In all
cases, they are static and used only as input to tty_set_operations. As an
extra check, I ran an i386 allyesconfig build which produced no extra
warnings.
53 drivers are affected. I checked the history of a bunch of them, and in
most cases, there have been only a handful of maintenance changes in the
last six months. serial_core.c was the busiest one that I looked at.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andreas Mohr [Mon, 2 Oct 2006 09:17:17 +0000 (02:17 -0700)]
[PATCH] fs/inode.c tweaks
Only touch inode's i_mtime and i_ctime to make them equal to "now" in case
they aren't yet (don't just update timestamp unconditionally). Uninline
the hash function to save 259 Bytes.
This tiny inode change which may improve cache behaviour also shaves off 8
Bytes from file_update_time() on i386.
Included a tiny codestyle cleanup, too.
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alexey Dobriyan [Mon, 2 Oct 2006 09:17:16 +0000 (02:17 -0700)]
[PATCH] Remove NULL check in register_nls()
Everybody passes valid pointer there.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:15 +0000 (02:17 -0700)]
[PATCH] file: modify struct fown_struct to use a struct pid
File handles can be requested to send sigio and sigurg to processes. By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:14 +0000 (02:17 -0700)]
[PATCH] vt: Make vt_pid a struct pid (making it pid wrap around safe).
I took a good hard look at the locking and it appears the locking on vt_pid
is the console semaphore. Every modified path is called under the console
semaphore except reset_vc when it is called from fn_SAK or do_SAK both of
which appear to be in interrupt context. In addition I need to be careful
because in the presence of an oops the console_sem may be arbitrarily
dropped.
Which leads me to conclude the current locking is inadequate for my needs.
Given the weird cases we could hit because of oops printing instead of
introducing an extra spin lock to protect the data and keep the pid to
signal and the signal to send in sync, I have opted to use xchg on just the
struct pid * pointer instead.
Due to console_sem we will stay in sync between vt_pid and vt_mode except
for a small window during a SAK, or oops handling. SAK handling should
kill any user space process that care, and oops handling we are broken
anyway. Besides the worst that can happen is that I try to send the wrong
signal.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:13 +0000 (02:17 -0700)]
[PATCH] vt: rework the console spawning variables
This is such a rare path it took me a while to figure out how to test
this after soring out the locking.
This patch does several things.
- The variables used are moved into a structure and declared in vt_kern.h
- A spinlock is added so we don't have SMP races updating the values.
- Instead of raw pid_t value a struct_pid is used to guard against
pid wrap around issues, if the daemon to spawn a new console dies.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:12 +0000 (02:17 -0700)]
[PATCH] pid: implement pid_nr
As we stop storing pid_t's and move to storing struct pid *. We need a way to
get the pid_t from the struct pid to report to user space what we have stored.
Having a clean well defined way to do this is especially important as we move
to multiple pid spaces as may need to report a different value to the caller
depending on which pid space the caller is in.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:11 +0000 (02:17 -0700)]
[PATCH] pid: export the symbols needed to use struct pid *
pids aren't something that drivers should care about. However there are a lot
of helper layers in the kernel that do care, and are built as modules. Before
I can convert them to using struct pid instead of pid_t I need to export the
appropriate symbols so they can continue to be built.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:10 +0000 (02:17 -0700)]
[PATCH] pid: implement signal functions that take a struct pid *
Currently the signal functions all either take a task or a pid_t argument.
This patch implements variants that take a struct pid *. After all of the
users have been update it is my intention to remove the variants that take a
pid_t as using pid_t can be more work (an extra hash table lookup) and
difficult to get right in the presence of multiple pid namespaces.
There are two kinds of functions introduced in this patch. The are the
general use functions kill_pgrp and kill_pid which take a priv argument that
is ultimately used to create the appropriate siginfo information, Then there
are _kill_pgrp_info, kill_pgrp_info, kill_pid_info the internal implementation
helpers that take an explicit siginfo.
The distinction is made because filling out an explcit siginfo is tricky, and
will be even more tricky when pid namespaces are introduced.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:09 +0000 (02:17 -0700)]
[PATCH] pid: add do_each_pid_task
To avoid pid rollover confusion the kernel needs to work with struct pid *
instead of pid_t. Currently there is not an iterator that walks through all
of the tasks of a given pid type starting with a struct pid. This prevents us
replacing some pid_t instances with struct pid. So this patch adds
do_each_pid_task which walks through the set of task for a given pid type
starting with a struct pid.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:09 +0000 (02:17 -0700)]
[PATCH] pid: implement access helpers for a tacks various process groups
In the last round of cleaning up the pid hash table a more general struct pid
was introduced, that can be referenced counted.
With the more general struct pid most if not all places where we store a pid_t
we can now store a struct pid * and remove the need for a hash table lookup,
and avoid any possible problems with pid roll over.
Looking forward to the pid namespaces struct pid * gives us an absolute form a
pid so we can compare and use them without caring which pid namespace we are
in.
This patchset introduces the infrastructure needed to use struct pid instead
of pid_t, and then it goes on to convert two different kernel users that
currently store a pid_t value.
There are a lot more places to go but this is enough to get the basic idea.
Before we can merge a pid namespace patch all of the kernel pid_t users need
to be examined. Those that deal with user space processes need to be
converted to using a struct pid *. Those that deal with kernel processes need
to converted to using the kthread api. A rare few that only use their current
processes pid values get to be left alone.
This patch:
task_session returns the struct pid of a tasks session.
task_pgrp returns the struct pid of a tasks process group.
task_tgid returns the struct pid of a tasks thread group.
task_pid returns the struct pid of a tasks process id.
These can be used to avoid unnecessary hash table lookups, and to implement
safe pid comparisions in the face of a pid namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:07 +0000 (02:17 -0700)]
[PATCH] proc: give the root directory a task
Helper functions in base.c like proc_pident_readdir and proc_pident_lookup
assume the directories have an associated task, and cannot currently be used
on the /proc root directory because it does not have such a task.
This small changes allows for base.c to be simplified and later when multiple
pid spaces are introduced it makes getting the needed context information
trivial.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:07 +0000 (02:17 -0700)]
[PATCH] proc: modify proc_pident_lookup to be completely table driven
Currently proc_pident_lookup gets the names and types from a table and then
has a huge switch statement to get the inode and file operations it needs.
That is silly and is becoming increasingly hard to maintain so I just put all
of the information in the table.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:05 +0000 (02:17 -0700)]
[PATCH] proc: reorder the functions in base.c
There were enough changes in my last round of cleaning up proc I had to break
up the patch series into smaller chunks, and my last chunk never got resent.
This patchset gives proc dynamic inode numbers (the static inode numbers were
a pain to maintain and prevent all kinds of things), and removes the horrible
switch statements that had to be kept in sync with everything else. Being
fully table driver takes us 90% of the way of being able to register new
process specific attributes in proc.
This patch:
Group the functions by what they implement instead of by type of operation.
As it existed base.c was quickly approaching the point where it could not be
followed.
No functionality or code changes asside from adding/removing forward
declartions are implemented in this patch.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric W. Biederman [Mon, 2 Oct 2006 09:17:04 +0000 (02:17 -0700)]
[PATCH] proc: readdir race fix (take 3)
The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of system
calls. For this race to trigger the process that proc_pid_readdir stops at
must exit before readdir is called again.
This can cause ps to fail to report processes, and it is in violation of
posix guarantees and normal application expectations with respect to
readdir.
Currently there is no way to work around this problem in user space short
of providing a gargantuan buffer to user space so the directory read all
happens in on system call.
This patch implements the normal directory semantics for proc, that
guarantee that a directory entry that is neither created nor destroyed
while reading the directory entry will be returned. For directory that are
either created or destroyed during the readdir you may or may not see them.
Furthermore you may seek to a directory offset you have previously seen.
These are the guarantee that ext[23] provides and that posix requires, and
more importantly that user space expects. Plus it is a simple semantic to
implement reliable service. It is just a matter of calling readdir a
second time if you are wondering if something new has show up.
These better semantics are implemented by scanning through the pids in
numerical order and by making the file offset a pid plus a fixed offset.
The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm. Given that a typical
cache line is 64 bytes and thus covers space for 64*8 == 200 pids. There
are only 40 cache lines for the entire 32K pid space. A typical system
will have 100 pids or more so this is actually fewer cache lines we have to
look at to scan a linked list, and the worst case of having to scan the
entire pid bitmap is pretty reasonable.
If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.
In addition this takes no additional locks and is actually less code than
what we are doing now.
Also another very subtle bug in this area has been fixed. It is possible
to catch a task in the middle of de_thread where a thread is assuming the
thread of it's thread group leader. This patch carefully handles that case
so if we hit it we don't fail to return the pid, that is undergoing the
de_thread dance.
Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
providing the first fix, pointing this out and working on it.
[oleg@tv-sign.ru: fix it]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Mon, 2 Oct 2006 09:17:02 +0000 (02:17 -0700)]
[PATCH] list module taint flags in Oops/panic
When listing loaded modules during an oops or panic, also list each
module's Tainted flags if non-zero (P: Proprietary or F: Forced load only).
If a module is did not taint the kernel, it is just listed like
usbcore
but if it did taint the kernel, it is listed like
wizmodem(PF)
Example:
[ 3260.121718] Unable to handle kernel NULL pointer dereference at
0000000000000000 RIP:
[ 3260.121729] [<
ffffffff8804c099>] :dump_test:proc_dump_test+0x99/0xc8
[ 3260.121742] PGD
fe8d067 PUD
264a6067 PMD 0
[ 3260.121748] Oops: 0002 [1] SMP
[ 3260.121753] CPU 1
[ 3260.121756] Modules linked in: dump_test(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ide_cd generic ohci1394 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd ieee1394 snd_page_alloc piix ide_core arcmsr aic79xx scsi_transport_spi usblp
[ 3260.121785] Pid: 5556, comm: bash Tainted: P 2.6.18-git10 #1
[Alternatively, I can look into listing tainted flags with 'lsmod',
but that won't help in oopsen/panics so much.]
[akpm@osdl.org: cleanup]
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dean Nelson [Mon, 2 Oct 2006 09:17:01 +0000 (02:17 -0700)]
[PATCH] make genpool allocator adhere to kernel-doc standards
The exported kernel interfaces of genpool allocator need to adhere to
the requirements of kernel-doc.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Steve Wise [Mon, 2 Oct 2006 09:17:00 +0000 (02:17 -0700)]
[PATCH] LIB: add gen_pool_destroy()
Modules using the genpool allocator need to be able to destroy the data
structure when unloading.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Dean Nelson <dcn@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Sun, 1 Oct 2006 20:17:44 +0000 (13:17 -0700)]
pccard_store_cis: fix wrong error handling
The test for the error from pcmcia_replace_cis() was incorrect, and
would always trigger (because if an error didn't happen, the "ret" value
would not be zero, it would be the passed-in count).
Reported and debugged by Fabrice Bellet <fabrice@bellet.info>
Rather than just fix the single broken test, make the code in question
use an understandable code-sequence instead, fixing the whole function
to be more readable.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sun, 1 Oct 2006 09:22:41 +0000 (02:22 -0700)]
[PATCH] rtc-sysfs fix
It's not clear how this thinko got through..
Cc: Olaf Hering <olaf@aepfle.de>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Sun, 1 Oct 2006 07:40:55 +0000 (00:40 -0700)]
Merge /pub/scm/linux/kernel/git/davej/agpgart
* master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart:
[AGPGART] printk fixups.
[AGPGART] Use pci_get_slot not pci_find_slot
Linus Torvalds [Sun, 1 Oct 2006 07:40:35 +0000 (00:40 -0700)]
Merge /pub/scm/linux/kernel/git/davej/cpufreq
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Make acpi-cpufreq unsticky again.
[CPUFREQ] longhaul: remove duplicated code.
[CPUFREQ] Longhaul - Disable arbiter CLE266
[CPUFREQ] Fix section mismatch warning
[CPUFREQ] Fix cut-n-paste bug in suspend printk
Zachary Amsden [Sun, 1 Oct 2006 06:29:38 +0000 (23:29 -0700)]
[PATCH] Some config.h removals
During tracking down a PAE compile failure, I found that config.h was being
included in a bunch of places in i386 code. It is no longer necessary, so
drop it.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zachary Amsden [Sun, 1 Oct 2006 06:29:38 +0000 (23:29 -0700)]
[PATCH] paravirt: update pte hook
Add a pte_update_hook which notifies about pte changes that have been made
without using the set_pte / clear_pte interfaces. This allows shadow mode
hypervisors which do not trap on page table access to maintain synchronized
shadows.
It also turns out, there was one pte update in PAE mode that wasn't using any
accessor interface at all for setting NX protection. Considering it is PAE
specific, and the accessor is i386 specific, I didn't want to add a generic
encapsulation of this behavior yet.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zachary Amsden [Sun, 1 Oct 2006 06:29:37 +0000 (23:29 -0700)]
[PATCH] paravirt: remove set pte atomic
Now that ptep_establish has a definition in PAE i386 3-level paging code, the
only paging model which is insane enough to have multi-word hardware PTEs
which are not efficient to set atomically, we can remove the ghost of
set_pte_atomic from other architectures which falesly duplicated it, and
remove all knowledge of it from the generic pgtable code.
set_pte_atomic is now a private pte operator which is specific to i386
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zachary Amsden [Sun, 1 Oct 2006 06:29:36 +0000 (23:29 -0700)]
[PATCH] paravirt: optimize ptep establish for pae
The ptep_establish macro is only used on user-level PTEs, for P->P mapping
changes. Since these always happen under protection of the pagetable lock,
the strong synchronization of a 64-bit cmpxchg is not needed, in fact, not
even a lock prefix needs to be used. We can simply instead clear the P-bit,
followed by a normal set. The write ordering is still important to avoid the
possibility of the TLB snooping a partially written PTE and getting a bad
mapping installed.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zachary Amsden [Sun, 1 Oct 2006 06:29:35 +0000 (23:29 -0700)]
[PATCH] paravirt: kpte flush
Create a new PTE function which combines clearing a kernel PTE with the
subsequent flush. This allows the two to be easily combined into a single
hypercall or paravirt-op. More subtly, reverse the order of the flush for
kmap_atomic. Instead of flushing on establishing a mapping, flush on clearing
a mapping. This eliminates the possibility of leaving stale kmap entries
which may still have valid TLB mappings. This is required for direct mode
hypervisors, which need to reprotect all mappings of a given page when
changing the page type from a normal page to a protected page (such as a page
table or descriptor table page). But it also provides some nicer semantics
for real hardware, by providing extra debug-proofing against using stale
mappings, as well as ensuring that no stale mappings exist when changing the
cacheability attributes of a page, which could lead to cache conflicts when
two different types of mappings exist for the same page.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zachary Amsden [Sun, 1 Oct 2006 06:29:34 +0000 (23:29 -0700)]
[PATCH] paravirt: combine flush accessed dirty.patch
Remove ptep_test_and_clear_{dirty|young} from i386, and instead use the
dominating functions, ptep_clear_flush_{dirty|young}. This allows the TLB
page flush to be contained in the same macro, and allows for an eager
optimization - if reading the PTE initially returned dirty/accessed, we can
assume the fact that no subsequent update to the PTE which cleared accessed /
dirty has occurred, as the only way A/D bits can change without holding the
page table lock is if a remote processor clears them. This eliminates an
extra branch which came from the generic version of the code, as we know that
no other CPU could have cleared the A/D bit, so the flush will always be
needed.
We still export these two defines, even though we do not actually define
the macros in the i386 code:
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
The reason for this is that the only use of these functions is within the
generic clear_flush functions, and we want a strong guarantee that there
are no other users of these functions, so we want to prevent the generic
code from defining them for us.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zachary Amsden [Sun, 1 Oct 2006 06:29:33 +0000 (23:29 -0700)]
[PATCH] paravirt: lazy mmu mode hooks.patch
Implement lazy MMU update hooks which are SMP safe for both direct and shadow
page tables. The idea is that PTE updates and page invalidations while in
lazy mode can be batched into a single hypercall. We use this in VMI for
shadow page table synchronization, and it is a win. It also can be used by
PPC and for direct page tables on Xen.
For SMP, the enter / leave must happen under protection of the page table
locks for page tables which are being modified. This is because otherwise,
you end up with stale state in the batched hypercall, which other CPUs can
race ahead of. Doing this under the protection of the locks guarantees the
synchronization is correct, and also means that spurious faults which are
generated during this window by remote CPUs are properly handled, as the page
fault handler must re-check the PTE under protection of the same lock.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zachary Amsden [Sun, 1 Oct 2006 06:29:31 +0000 (23:29 -0700)]
[PATCH] paravirt: pte clear not present
Change pte_clear_full to a more appropriately named pte_clear_not_present,
allowing optimizations when not-present mapping changes need not be reflected
in the hardware TLB for protected page table modes. There is also another
case that can use it in the fremap code.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zachary Amsden [Sun, 1 Oct 2006 06:29:30 +0000 (23:29 -0700)]
[PATCH] paravirt: remove read hazard from cow
We don't want to read PTEs directly like this after they have been modified,
as a lazy MMU implementation of direct page tables may not have written the
updated PTE back to memory yet.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sun, 1 Oct 2006 06:29:29 +0000 (23:29 -0700)]
[PATCH] invalidate_inode_pages2(): ignore page refcounts
The recent fix to invalidate_inode_pages() (git commit
016eb4a) managed to
unfix invalidate_inode_pages2().
The problem is that various bits of code in the kernel can take transient refs
on pages: the page scanner will do this when inspecting a batch of pages, and
the lru_cache_add() batching pagevecs also hold a ref.
Net result is transient failures in invalidate_inode_pages2(). This affects
NFS directory invalidation (observed) and presumably also block-backed
direct-io (not yet reported).
Fix it by reverting invalidate_inode_pages2() back to the old version which
ignores the page refcounts.
We may come up with something more clever later, but for now we need a 2.6.18
fix for NFS.
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Sun, 1 Oct 2006 06:29:28 +0000 (23:29 -0700)]
[PATCH] Support piping into commands in /proc/sys/kernel/core_pattern
Using the infrastructure created in previous patches implement support to
pipe core dumps into programs.
This is done by overloading the existing core_pattern sysctl
with a new syntax:
|program
When the first character of the pattern is a '|' the kernel will instead
threat the rest of the pattern as a command to run. The core dump will be
written to the standard input of that program instead of to a file.
This is useful for having automatic core dump analysis without filling up
disks. The program can do some simple analysis and save only a summary of
the core dump.
The core dump proces will run with the privileges and in the name space of
the process that caused the core dump.
I also increased the core pattern size to 128 bytes so that longer command
lines fit.
Most of the changes comes from allowing core dumps without seeks. They are
fairly straight forward though.
One small incompatibility is that if someone had a core pattern previously
that started with '|' they will get suddenly new behaviour. I think that's
unlikely to be a real problem though.
Additional background:
> Very nice, do you happen to have a program that can accept this kind of
> input for crash dumps? I'm guessing that the embedded people will
> really want this functionality.
I had a cheesy demo/prototype. Basically it wrote the dump to a file again,
ran gdb on it to get a backtrace and wrote the summary to a shared directory.
Then there was a simple CGI script to generate a "top 10" crashes HTML
listing.
Unfortunately this still had the disadvantage to needing full disk space for a
dump except for deleting it afterwards (in fact it was worse because over the
pipe holes didn't work so if you have a holey address map it would require
more space).
Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
cores (at least it worked with zsh's =(cat core) syntax), so it would be
likely possible to do it without temporary space with a simple wrapper that
calls it in the right way. I ran out of time before doing that though.
The demo prototype scripts weren't very good. If there is really interest I
can dig them out (they are currently on a laptop disk on the desk with the
laptop itself being in service), but I would recommend to rewrite them for any
serious application of this and fix the disk space problem.
Also to be really useful it should probably find a way to automatically fetch
the debuginfos (I cheated and just installed them in advance). If nobody else
does it I can probably do the rewrite myself again at some point.
My hope at some point was that desktops would support it in their builtin
crash reporters, but at least the KDE people I talked too seemed to be happy
with their user space only solution.
Alan sayeth:
I don't believe that piping as such as neccessarily the right model, but
the ability to intercept and processes core dumps from user space is asked
for by many enterprise users as well. They want to know about, capture,
analyse and process core dumps, often centrally and in automated form.
[akpm@osdl.org: loff_t != unsigned long]
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Sun, 1 Oct 2006 06:29:27 +0000 (23:29 -0700)]
[PATCH] Create call_usermodehelper_pipe()
A new member in the ever growing family of call_usermode* functions is
born. The new call_usermodehelper_pipe() function allows to pipe data to
the stdin of the called user mode progam and behaves otherwise like the
normal call_usermodehelp() (except that it always waits for the child to
finish)
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Sun, 1 Oct 2006 06:29:26 +0000 (23:29 -0700)]
[PATCH] Some cleanup in the pipe code
Split the big and hard to read do_pipe function into smaller pieces.
This creates new create_write_pipe/free_write_pipe/create_read_pipe
functions. These functions are made global so that they can be used by
other parts of the kernel.
The resulting code is more generic and easier to read and has cleaner error
handling and less gotos.
[akpm@osdl.org: cleanup]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Amol Lad [Sun, 1 Oct 2006 06:29:25 +0000 (23:29 -0700)]
[PATCH] ioremap balanced with iounmap for drivers/serial/sunsu.c
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David S. Miller <davem@sunset.davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Amol Lad [Sun, 1 Oct 2006 06:29:25 +0000 (23:29 -0700)]
[PATCH] ioremap balanced with iounmap for drivers/serial/mux.c
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Amol Lad [Sun, 1 Oct 2006 06:29:24 +0000 (23:29 -0700)]
[PATCH] ioremap balanced with iounmap for drivers/serial/mpsc.c
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Mark A. Greer <mgreer@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Amol Lad [Sun, 1 Oct 2006 06:29:23 +0000 (23:29 -0700)]
[PATCH] ioremap balanced with iounmap for drivers/serial/mpc52xx_uart.c
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Amol Lad [Sun, 1 Oct 2006 06:29:22 +0000 (23:29 -0700)]
[PATCH] ioremap balanced with iounmap for drivers/serial/ip22zilog.c
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Amol Lad [Sun, 1 Oct 2006 06:29:21 +0000 (23:29 -0700)]
[PATCH] ioremap balanced with iounmap for drivers/serial/ioc4_serial.c
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Brent Casavant <bcasavan@sgi.com>
Cc: Pat Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Amol Lad [Sun, 1 Oct 2006 06:29:21 +0000 (23:29 -0700)]
[PATCH] ioremap balanced with iounmap for drivers/serial/8250_gsc.c
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Amol Lad [Sun, 1 Oct 2006 06:29:20 +0000 (23:29 -0700)]
[PATCH] ioremap balanced with iounmap for drivers/serial/8250_acorn,c
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Haavard Skinnemoen [Sun, 1 Oct 2006 06:29:19 +0000 (23:29 -0700)]
[PATCH] Generic ioremap_page_range: x86_64 conversion
Convert x86_64 to use generic ioremap_page_range()
[akpm@osdl.org: build fix]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Haavard Skinnemoen [Sun, 1 Oct 2006 06:29:18 +0000 (23:29 -0700)]
[PATCH] Generic ioremap_page_range: m32r conversion
Convert m32r to use generic ioremap_page_range()
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: <linux-m32r@ml.linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Haavard Skinnemoen [Sun, 1 Oct 2006 06:29:17 +0000 (23:29 -0700)]
[PATCH] Generic ioremap_page_range: i386 conversion
Convert i386 to use generic ioremap_page_range()
[bunk@stusta.de: build fix]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Haavard Skinnemoen [Sun, 1 Oct 2006 06:29:17 +0000 (23:29 -0700)]
[PATCH] Generic ioremap_page_range: cris conversion
Convert CRIS to use generic ioremap_page_range()
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Haavard Skinnemoen [Sun, 1 Oct 2006 06:29:16 +0000 (23:29 -0700)]
[PATCH] Generic ioremap_page_range: avr32 conversion
Convert AVR32 to use generic ioremap_page_range()
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Haavard Skinnemoen [Sun, 1 Oct 2006 06:29:15 +0000 (23:29 -0700)]
[PATCH] Generic ioremap_page_range: alpha conversion
Convert Alpha to use generic ioremap_page_range() by turning
__alpha_remap_area_pages() into an inline wrapper around ioremap_page_range().
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Haavard Skinnemoen [Sun, 1 Oct 2006 06:29:14 +0000 (23:29 -0700)]
[PATCH] Generic ioremap_page_range: flush_cache_vmap
The existing implementation of ioremap_page_range(), which was taken
from i386, does this:
flush_cache_all();
/* modify page tables */
flush_tlb_all();
I think this is a bit defensive, so this patch changes the generic
implementation to do:
/* modify page tables */
flush_cache_vmap(start, end);
instead, which is similar to what vmalloc() does. This should still
be correct because we never modify existing PTEs. According to
James Bottomley:
The problem the flush_tlb_all() is trying to solve is to avoid stale tlb
entries in the ioremap area. We're just being conservative by flushing
on both map and unmap. Technically what vmalloc/vfree does (only flush
the tlb on unmap) is just fine because it means that the only tlb
entries in the remap area must belong to in-use mappings.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-m32r@ml.linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Haavard Skinnemoen [Sun, 1 Oct 2006 06:29:12 +0000 (23:29 -0700)]
[PATCH] Generic ioremap_page_range: implementation
This patch adds a generic implementation of ioremap_page_range() in
lib/ioremap.c based on the i386 implementation. It differs from the
i386 version in the following ways:
* The PTE flags are passed as a pgprot_t argument and must be
determined up front by the arch-specific code. No additional
PTE flags are added.
* Uses set_pte_at() instead of set_pte()
[bunk@stusta.de: warning fix]
]dhowells@redhat.com: nommu build fix]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-m32r@ml.linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fernando Vazquez [Sun, 1 Oct 2006 06:29:10 +0000 (23:29 -0700)]
[PATCH] stack overflow safe kdump: safe smp_send_nmi_allbutself()
Re-implement smp_send_nmi_allbutself() so that calls to smp_processor_id
(through send_IPI_allbutself) can be replaced with safe_smp_processor_id
without affecting other parts of the kernel (as suggested by Eric Biederman).
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fernando Vazquez [Sun, 1 Oct 2006 06:29:09 +0000 (23:29 -0700)]
[PATCH] stack overflow safe kdump: crash: use safe_smp_processor_id()
Substitute "smp_processor_id" with the stack overflow-safe
"safe_smp_processor_id" in the reboot path to the second kernel.
[akpm@osdl.org: build fix]
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fernando Vazquez [Sun, 1 Oct 2006 06:29:08 +0000 (23:29 -0700)]
[PATCH] stack overflow safe kdump: safe_smp_processor_id(): voyager
"safe_smp_processor_id" implementation for i386-Voyager.
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fernando Vazquez [Sun, 1 Oct 2006 06:29:07 +0000 (23:29 -0700)]
[PATCH] stack overflow safe kdump: safe_smp_processor_id()
This is a the first of a series of patch-sets aiming at making kdump more
robust against stack overflows.
This patch set does the following:
* Add safe_smp_processor_id function to i386 architecture (this function was
inspired by the x86_64 function of the same name).
* Substitute "smp_processor_id" with the stack overflow-safe
"safe_smp_processor_id" in the reboot path to the second kernel.
This patch:
On the event of a stack overflow critical data that usually resides at the
bottom of the stack is likely to be stomped and, consequently, its use should
be avoided.
In particular, in the i386 and IA64 architectures the macro smp_processor_id
ultimately makes use of the "cpu" member of struct thread_info which resides
at the bottom of the stack. x86_64, on the other hand, is not affected by
this problem because it benefits from the use of the PDA infrastructure.
To circumvent this problem I suggest implementing "safe_smp_processor_id()"
(it already exists in x86_64) for i386 and IA64 and use it as a replacement
for smp_processor_id in the reboot path to the dump capture kernel. This is a
possible implementation for i386.
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Sun, 1 Oct 2006 06:29:06 +0000 (23:29 -0700)]
[PATCH] r/o bind mounts: monitor zeroing of i_nlink
Some filesystems, instead of simply decrementing i_nlink, simply zero it
during an unlink operation. We need to catch these in addition to the
decrement operations.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark Fasheh [Sun, 1 Oct 2006 06:29:05 +0000 (23:29 -0700)]
[PATCH] r/o bind mounts: clean up OCFS2 nlink handling
OCFS2 does some operations on i_nlink, then reverts them if some of its
operations fail to complete. This does not fit in well with the
drop_nlink() logic where we expect i_nlink to stay at zero once it gets
there.
So, delay all of the nlink operations until we're sure that the operations
have completed. Also, introduce a small helper to check whether an inode
has proper "unlinkable" i_nlink counts no matter whether it is a directory
or regular inode.
This patch is broken out from the others because it does contain some
logical changes.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Sun, 1 Oct 2006 06:29:04 +0000 (23:29 -0700)]
[PATCH] r/o bind mount prepwork: inc_nlink() helper
This is mostly included for parity with dec_nlink(), where we will have some
more hooks. This one should stay pretty darn straightforward for now.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Sun, 1 Oct 2006 06:29:03 +0000 (23:29 -0700)]
[PATCH] r/o bind mounts: unlink: monitor i_nlink
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.
We're shortly going to have keep filesystems from being remounted r/o between
the time that this i_nlink decrement and that write occurs.
So, add a little helper function to do the decrements. We'll tie into it in a
bit to note when i_nlink hits zero.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Sun, 1 Oct 2006 06:29:02 +0000 (23:29 -0700)]
[PATCH] r/o bind mount prepwork: move open_namei()'s vfs_create()
The code around vfs_create() in open_namei() is getting a bit too complex.
Right now, there is at least the reference count on the dentry, and the
i_mutex to worry about. Soon, we'll also have mnt_writecount.
So, break the vfs_create() call out of open_namei(), and into a helper
function. This duplicates the call to may_open(), but that isn't such a bad
thing since the arguments (acc_mode and flag) were being heavily massaged
anyway.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Sun, 1 Oct 2006 06:29:01 +0000 (23:29 -0700)]
[PATCH] r/o bind mounts: prepare for write access checks: collapse if()
We're shortly going to be adding a bunch more permission checks in these
functions. That requires adding either a bunch of new if() conditions, or
some gotos. This patch collapses existing if()s and uses gotos instead to
prepare for the upcoming changes.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jay Lan [Sun, 1 Oct 2006 06:29:00 +0000 (23:29 -0700)]
[PATCH] csa accounting taskstats update
ChangeLog:
Feedbacks from Andrew Morton:
- define TS_COMM_LEN to 32
- change acct_stimexpd field of task_struct to be of
cputime_t, which is to be used to save the tsk->stime
of last timer interrupt update.
- a new Documentation/accounting/taskstats-struct.txt
to describe fields of taskstats struct.
Feedback from Balbir Singh:
- keep the stime of a task to be zero when both stime
and utime are zero as recoreded in task_struct.
Misc:
- convert accumulated RSS/VM from platform dependent
pages-ticks to MBytes-usecs in the kernel
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jay Lan [Sun, 1 Oct 2006 06:28:59 +0000 (23:28 -0700)]
[PATCH] csa: convert CONFIG tag for extended accounting routines
There were a few accounting data/macros that are used in CSA but are #ifdef'ed
inside CONFIG_BSD_PROCESS_ACCT. This patch is to change those ifdef's from
CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT. A few defines are moved from
kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
include/linux/tsacct_kern.h.
Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jay Lan [Sun, 1 Oct 2006 06:28:58 +0000 (23:28 -0700)]
[PATCH] csa: Extended system accounting over taskstats
Add extended system accounting handling over taskstats interface. A
CONFIG_TASK_XACCT flag is created to enable the extended accounting code.
Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jay Lan [Sun, 1 Oct 2006 06:28:55 +0000 (23:28 -0700)]
[PATCH] csa: basic accounting over taskstats
Add some basic accounting fields to the taskstats struct, add a new
kernel/tsacct.c to handle basic accounting data handling upon exit. A handle
is added to taskstats.c to invoke the basic accounting data handling.
Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: "Michal Piotrowski" <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Balbir Singh [Sun, 1 Oct 2006 06:28:54 +0000 (23:28 -0700)]
[PATCH] Fix getdelays.c - cpumask length and error reporting
Fix the length passed while (un)registering cpumask. We were passing sizeof
the array, make it strlen().
Error value printed in fatal errors should be derived from the message. The
message contains an nlmsgerr embedded with an error value. We must report
that value to the user.
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jamal Hadi <hadi@cyberus.ca>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Balbir Singh [Sun, 1 Oct 2006 06:28:53 +0000 (23:28 -0700)]
[PATCH] Fix taskstats size calculation (use the new genetlink utility functions)
The addition of the CSA patch pushed the size of struct taskstats to 256
bytes. This exposed a problem with prepare_reply(), we were not allocating
space for the netlink and genetlink header. It worked earlier because
alloc_skb() would align the skb to SMP_CACHE_BYTES, which added some additonal
bytes.
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jamal Hadi <hadi@cyberus.ca>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Balbir Singh [Sun, 1 Oct 2006 06:28:51 +0000 (23:28 -0700)]
[PATCH] Add genetlink utilities for payload length calculation
Add two utility helper functions genlmsg_msg_size() and genlmsg_total_size().
These functions are derived from their netlink counterparts.
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jamal Hadi <hadi@cyberus.ca>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chen, Kenneth W [Sun, 1 Oct 2006 06:28:50 +0000 (23:28 -0700)]
[PATCH] clean up unused kiocb variables
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Badari Pulavarty [Sun, 1 Oct 2006 06:28:49 +0000 (23:28 -0700)]
[PATCH] Add vector AIO support
This work is initially done by Zach Brown to add support for vectored aio.
These are the core changes for AIO to support
IOCB_CMD_PREADV/IOCB_CMD_PWRITEV.
[akpm@osdl.org: huge build fix]
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Badari Pulavarty [Sun, 1 Oct 2006 06:28:48 +0000 (23:28 -0700)]
[PATCH] Streamline generic_file_* interfaces and filemap cleanups
This patch cleans up generic_file_*_read/write() interfaces. Christoph
Hellwig gave me the idea for this clean ups.
In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
do_sync_read/ do_sync_write() as their .read/.write methods. This allows us
to cleanup all variants of generic_file_* routines.
Final available interfaces:
generic_file_aio_read() - read handler
generic_file_aio_write() - write handler
generic_file_aio_write_nolock() - no lock write handler
__generic_file_aio_write_nolock() - internal worker routine
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Badari Pulavarty [Sun, 1 Oct 2006 06:28:47 +0000 (23:28 -0700)]
[PATCH] Remove readv/writev methods and use aio_read/aio_write instead
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Badari Pulavarty [Sun, 1 Oct 2006 06:28:46 +0000 (23:28 -0700)]
[PATCH] Vectorize aio_read/aio_write fileop methods
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Sun, 1 Oct 2006 06:28:45 +0000 (23:28 -0700)]
[PATCH] reiserfs: eliminate minimum window size for bitmap searching
When a file system becomes fragmented (using MythTV, for example), the
bigalloc window searching ends up causing huge performance problems. In a
file system presented by a user experiencing this bug, the file system was
90% free, but no 32-block free windows existed on the entire file system.
This causes the allocator to scan the entire file system for each 128k
write before backing down to searching for individual blocks.
In the end, finding a contiguous window for all the blocks in a write is an
advantageous special case, but one that can be found naturally when such a
window exists anyway.
This patch removes the bigalloc window searching, and has been proven to
fix the test case described above.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Sun, 1 Oct 2006 06:28:44 +0000 (23:28 -0700)]
[PATCH] reiserfs: use generic_file_open for open() checks
The other common disk-based file systems (I checked ext[23], xfs, jfs)
check to ensure that opens of files > 2 GB fail unless O_LARGEFILE is
specified. They check via generic_file_open or their own open routine.
ReiserFS doesn't have an f_op->open defined, and as such, it's possible to
open files > 2 GB without O_LARGEFILE.
This patch adds the f_op->open member to conform with the expected
behavior.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Sun, 1 Oct 2006 06:28:44 +0000 (23:28 -0700)]
[PATCH] reiserfs: on-demand bitmap loading
This is the patch the three previous ones have been leading up to.
It changes the behavior of ReiserFS from loading and caching all the bitmaps
as special, to treating the bitmaps like any other bit of metadata and just
letting the system-wide caches figure out what to hang on to.
Buffer heads are allocated on the fly, so there is no need to retain pointers
to all of them. The caching of the metadata occurs when the data is read and
updated, and is considered invalid and uncached until then.
I needed to remove the vs-4040 check for performing a duplicate operation on a
particular bit. The reason is that while the other sites for working with
bitmaps are allowed to schedule, is_reusable() is called from do_balance(),
which will panic if a schedule occurs in certain places.
The benefit of on-demand bitmaps clearly outweighs a sanity check that depends
on a compile-time option that is discouraged.
[akpm@osdl.org: warning fix]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Sun, 1 Oct 2006 06:28:43 +0000 (23:28 -0700)]
[PATCH] reiserfs: reorganize bitmap loading functions
This patch moves the bitmap loading code from super.c to bitmap.c
The code is also restructured somewhat. The only difference between new
format bitmaps and old format bitmaps is where they are. That's a two liner
before loading the block to use the correct one. There's no need for an
entirely separate code path.
The load path is generally the same, with the pattern being to throw out a
bunch of requests and then wait for them, then cache the metadata from the
contents.
Again, like the previous patches, the purpose is to set up for later ones.
Update: There was a bug in the previously posted version of this that resulted
in corruption. The problem was that bitmap 0 on new format file systems must
be treated specially, and wasn't. A stupid bug with an easy fix.
This is hopefully the last fix for the disaster that is the reiserfs bitmap
patch set.
If a bitmap block was full, first_zero_hint would end up at zero since it
would never be changed from it's zeroed out value. This just sets it
beyond the end of the bitmap block. If any bits are freed, it will be
reset to a valid bit. When info->free_count = 0, then we already know it's
full.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Sun, 1 Oct 2006 06:28:42 +0000 (23:28 -0700)]
[PATCH] reiserfs: clean up bitmap block buffer head references
Similar to the SB_JOURNAL cleanup that was accepted a while ago, this patch
uses a temporary variable for buffer head references from the bitmap info
array.
This makes the code much more readable in some areas.
It also uses proper reference counting, doing a get_bh() after using the
pointer from the array and brelse()'ing it later. This may seem silly, but a
later patch will replace the simple temporary variables with an actual read,
so the reference freeing will be used then.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Sun, 1 Oct 2006 06:28:40 +0000 (23:28 -0700)]
[PATCH] reiserfs: fix is_reusable bitmap check to not traverse the bitmap info array
There is a check in is_reusable to determine if a particular block is a bitmap
block. It verifies this by going through the array of bitmap block buffer
heads and comparing the block number to each one.
Bitmap blocks are at defined locations on the disk in both old and current
formats. Simply checking against the known good values is enough.
This is a trivial optimization for a non-production codepath, but this is the
first in a series of patches that will ultimately remove the buffer heads from
that array.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Atsushi Nemoto [Sun, 1 Oct 2006 06:28:31 +0000 (23:28 -0700)]
[PATCH] kill wall_jiffies
With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
So we can kill wall_jiffies completely.
This is just a cleanup and logically should not change any real behavior
except for one thing: RTC updating code in (old) ppc and xtensa use a
condition "jiffies - wall_jiffies == 1". This condition is never met so I
suppose it is just a bug. I just remove that condition only instead of
kill the whole "if" block.
[heiko.carstens@de.ibm.com: s390 build fix and cleanup]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Sun, 1 Oct 2006 06:28:29 +0000 (23:28 -0700)]
[PATCH] kernel/time/ntp.c: possible cleanups
This patch contains the following possible cleanups:
- make the following needlessly global function static:
- ntp_update_frequency()
- make the following needlessly global variables static:
- time_state
- time_offset
- time_constant
- time_reftime
- remove the following read-only global variable:
- time_precision
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 1 Oct 2006 06:28:29 +0000 (23:28 -0700)]
[PATCH] ntp: cleanup defines and comments
Remove a few unused defines and remove obsolete information from comments.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 1 Oct 2006 06:28:28 +0000 (23:28 -0700)]
[PATCH] ntp: convert to the NTP4 reference model
This converts the kernel ntp model into a model which matches the nanokernel
reference implementations. The previous patches already increased the
resolution and precision of the computations, so that this conversion becomes
quite simple.
<linux@horizon.com> explains:
The original NTP kernel interface was defined in units of microseconds.
That's what Linux implements. As computers have gotten faster and can now
split microseconds easily, a new kernel interface using nanosecond units was
defined ("the nanokernel", confusing as that name is to OS hackers), and
there's an STA_NANO bit in the adjtimex() status field to tell the application
which units it's using.
The current ntpd supports both, but Linux loses some possible timing
resolution because of quantization effects, and the ntpd hackers would really
like to be able to drop the backwards compatibility code.
Ulrich Windl has been maintaining a patch set to do the conversion for years,
but it's hard to keep in sync.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 1 Oct 2006 06:28:27 +0000 (23:28 -0700)]
[PATCH] ntp: convert time_freq to nsec value
This converts time_freq to a scaled nsec value and adds around 6bit of extra
resolution. This pushes the time_freq to its 32bit limits so the calculatons
have to be done with 64bit.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 1 Oct 2006 06:28:26 +0000 (23:28 -0700)]
[PATCH] ntp: remove time_tolerance
time_tolerance isn't changed at all in the kernel, so simply remove it, this
simplifies the next patch, as it avoids a number of conversions.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 1 Oct 2006 06:28:25 +0000 (23:28 -0700)]
[PATCH] ntp: add time_adjust to tick length
This folds update_ntp_one_tick() into second_overflow() and adds time_adjust
to the tick length, this makes time_next_adjust unnecessary. This slightly
changes the adjtime() behaviour, instead of applying it to the next tick, it's
applied to the next second.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 1 Oct 2006 06:28:25 +0000 (23:28 -0700)]
[PATCH] ntp: prescale time_offset
This converts time_offset into a scaled per tick value. This avoids now
completely the crude compensation in second_overflow().
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 1 Oct 2006 06:28:24 +0000 (23:28 -0700)]
[PATCH] ntp: add time_freq to tick length
This adds the frequency part to ntp_update_frequency().
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 1 Oct 2006 06:28:23 +0000 (23:28 -0700)]
[PATCH] ntp: add time_adj to tick length
This makes time_adj local to second_overflow() and integrates it into the tick
length instead of adding it everytime.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 1 Oct 2006 06:28:22 +0000 (23:28 -0700)]
[PATCH] ntp: add ntp_update_frequency
This introduces ntp_update_frequency() and deinlines ntp_clear() (as it's not
performance critical). ntp_update_frequency() calculates the base tick length
using tick_usec and adds a base adjustment, in case the frequency doesn't
divide evenly by HZ.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
john stultz [Sun, 1 Oct 2006 06:28:22 +0000 (23:28 -0700)]
[PATCH] NTP: Move all the NTP related code to ntp.c
Move all the NTP related code to ntp.c
[akpm@osdl.org: cleanups, build fix]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Josh Triplett [Sun, 1 Oct 2006 06:28:21 +0000 (23:28 -0700)]
[PATCH] Pass sparse the lock expression given to lock annotations
The lock annotation macros __acquires, __releases, __acquire, and __release
all currently throw away the lock expression passed as an argument. Now
that sparse can parse __context__ and __attribute__((context)) with a
context expression, pass the lock expression down to sparse as the context
expression. This requires a version of sparse from GIT commit
37475a6c1c3e66219e68d912d5eb833f4098fd72 or later.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This page took 0.049933 seconds and 5 git commands to generate.