drtracing.org Git - deliverable/linux.git/log

mmc: sdhci: Fix parameter of sdhci_do_start_signal_voltage_switch()

3714f4315354 ("mmc: sdhci: update signal voltage switch code") changed the
type of the second parameter of sdhci_do_start_signal_voltage_switch(),
from "struct mmc_ios *ios" to "int signal_voltage" which causes the
following build warning:

drivers/mmc/host/sdhci.c:2044:2: warning: initialization from incompatible pointer type [enabled by default]
drivers/mmc/host/sdhci.c:2044:2: warning: (near initialization for 'sdhci_ops.start_signal_voltage_switch') [enabled by default]

Use the previous type so that it matches the start_signal_voltage_switch()
definition from host.h.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Reviewed-by: Johan Rudholm <johan.rudholm@stericsson.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci: check voltage range only on regulators aware of voltage value

Some regulators don't report any voltage values, so checking supported
voltage range results in disabling all SDHCI_CAN_VDD_* flags and
registration failure. This patch finally provides a correct fix for the
registration of SDHCI driver with all possible voltage regulators:
dummy, fixed and regulated without using regulator_count_voltages()
hacks.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: bcm2835: set SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK

SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK does basically the same as
implementing struct sdhci_ops .get_timeout_clock, so simply set that
quirk and remove the custom code to simplify the driver.

Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: support packed write command for eMMC4.5 devices

This patch supports packed write command of eMMC4.5 devices.  Several
writes can be grouped in packed command and all data of the individual
commands can be sent in a single transfer on the bus. Large amounts of
data in one transfer rather than several data of small size are
effective for eMMC write internally.  As a result, packed command help
write throughput be improved.  The following tables show the results
of packed write.

Type A:
test     none |  packed
iozone   25.8 |  31
tiotest  27.6 |  31.2
lmdd     31.2 |  35.4

Type B:
test     none |  packed
iozone   44.1 |  51.1
tiotest  47.9 |  52.5
lmdd     51.6 |  59.2

Type C:
test     none |  packed
iozone   19.5 |  32
tiotest  19.9 |  34.5
lmdd     22.8 |  40.7

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Reviewed-by: Maya Erez <merez@codeaurora.org>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: add packed command feature of eMMC4.5

This patch adds packed command feature of eMMC4.5. The maximum number
for packing read (or write) is offered and exception event relevant to
packed command which is used for error handling is enabled. If host
wants to use this feature, MMC_CAP2_PACKED_CMD should be set.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Reviewed-by: Maya Erez <merez@codeaurora.org>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: rtsx: remove driving adjustment

Several new models of readers use different way to select driving
capability (a necessary adjustment along with voltage change). Removing
this from device-independent rtsx_pci_sdmmc module. It will be implemented
in device-depend calls encapsulated by rtsx_pci_switch_output_voltage().

Signed-off-by: Roger Tseng <rogerable@realtek.com>
Reviewed-by: Wei WANG <wei_wang@realsil.com.cn>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: use regulator_can_change_voltage() instead of regulator_count_voltages

mmc_regulator_set_ocr() depends on the ability of regulator to change the
voltage value. When regulator cannot change its voltage output, some code
is skipped to avoid reporting false errors on some boards, which use MMC
hosts with fixed regulators (e.g. Samsung Goni and UniversalC210 boards).

This patch replaces a hacky workaround based on regulator_count_voltages()
value with the correct call to recently introduced
regulator_can_change_voltage() function in regulators core.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-pxav3: add pm runtime support

Signed-off-by: Kevin Liu <kliu5@marvell.com>
Signed-off-by: Jialing Fu <jlfu@marvell.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: core: fix indentation

This patch fixes incorrect indentation. (Just code cleanup)

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: block: don't start new request when the card is removed

It's not necessary to start a new request while error handling if
the card was removed.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Tested-by: Konstantin Dorfman <kdorfman@codeaurora.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: core: fix permanent sleep of mmcqd during card removal

This patch is derived from:
"mmc: fix async request mechanism for sequential read scenarios".

According as async transfer, a request is handled with twice mmc_start_req.
When the card is removed, the request is actually not issued in the first
mmc_start_req [__mmc_start_data_req]. And then mmc_wait_for_data_req_done
will come in the next mmc_start_req. But there is no event for completions.
wake_up_interruptible is needed in __mmc_start_data_req for the case of
removed card.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Tested-by: Konstantin Dorfman <kdorfman@codeaurora.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci: enhance preset value function

4d55c5a1 ("mmc: sdhci: enable preset value after uhs initialization")
added preset value support and enabled it by default during sd card init.

Below are the enhancements introduced by this patch:

1. In current code, preset value is enabled after setting clock finished,
which means the clock is manually set by driver firstly and then suddenly
switched to preset value at this point. So the first setting is useless
and unnecessary. What's more, the first clock setting may differ from the
preset one.  The better way is enable preset value just after switch to
UHS mode so the preset value can take effect immediately. So move preset
value enable from mmc_sd_init_card to sdhci_set_ios which will be called
during set timing.

2. In current code, preset value is disabled at the beginning of
mmc_attach_sd.  It's too late since low freq (400khz) should be set in
mmc_power_up.  So move preset value disable to sdhci_set_ios which will
be called during power up.

3. host->clock and ios->drv_type should also be updated according to the
preset value if it's enabled. Current code missed this.

4. This patch also introduce a quirk to disable preset value in case
preset value doesn't work.

This patch has been verified on sdhci-pxav3 platform with both preset
enabled and disabled.

Signed-off-by: Kevin Liu <kliu5@marvell.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: mmc_spi: Fix return value evaluation of irq_of_parse_and_map()

When irq_of_parse_and_map() returns an error, it does as zero. But in
mmc_spi_get_pdata(), the error return case is compared against NO_IRQ.
This might work where NO_IRQ is zero (defaults to zero when undefined,
as on MIPS) but not where NO_IRQ is different, e.g. on ARM where it's -1.

This patch changes to comparison with 0 which is the error return value
of irq_of_parse_and_map().

Tested on ARM that mmc_spi is working now.

Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: MAINTAINERS update for omap_hsmmc

Update Maintainer email for omap_hsmmc, as Venkatraman will no longer
be able to maintain omap_hsmmc driver.

Signed-off-by: Balaji T K <balajitk@ti.com>
Acked-by: Venkatraman S <svenkatr@gmail.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-pltfm: Add a common clk API implementation of get_timeout_clock

Quite a few drivers have a implementation of the get_timeout_clock
callback which simply returns the result of clk_get_rate on the device's
clock. This patch adds a common implementation of this to the sdhci-pltfm
module and replaces all custom implementations with the common one.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Kevin Liu <kliu5@marvell.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci: update signal voltage switch code

The protocol related code is moved to core stack. So update the host
driver accordingly.

Signed-off-by: Kevin Liu <kliu5@marvell.com>
Tested-by: Tim Wang <wangtt@marvell.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: core: Fixup signal voltage switch

When switching SD and SDIO cards from 3.3V to 1.8V signal levels, the
clock should be gated for 5 ms during the step. After enabling the
clock, the host should wait for at least 1 ms before checking for
failure. Failure by the card to switch is indicated by dat[0:3] being
pulled low. The host should check for this condition and power-cycle
the card if failure is indicated.

Add a retry mechanism for the SDIO case.

If the voltage switch fails repeatedly, give up and continue the
initialization using the original voltage.

This patch places a couple of requirements on the host driver:

1) mmc_set_ios with ios.clock = 0 must gate the clock
2) mmc_power_off must actually cut the power to the card
3) The card_busy host_ops member must be implemented

if these requirements are not fulfilled, the 1.8V signal voltage switch
will still be attempted but may not be successful.

Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com>
Signed-off-by: Kevin Liu <kliu5@marvell.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Wei WANG <wei_wang@realsil.com.cn>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: core: Break out start_signal_voltage_switch

Allow callers to access the start_signal_voltage_switch host_ops
member without going through any cmd11 logic. This is mostly a
preparation for the following signal voltage switch patch.

Also, reset ios.signal_voltage to its original value if
start_signal_voltage_switch fails.

Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Wei WANG <wei_wang@realsil.com.cn>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: core: Add card_busy to host_ops

This host_ops member is used to test if the card is signaling busy by
pulling dat[0:3] low.

Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Wei WANG <wei_wang@realsil.com.cn>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: core: Add mmc_power_cycle

Add mmc_power_cycle which can be used to power cycle for instance
SD-cards.

Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Wei WANG <wei_wang@realsil.com.cn>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sd: Simplify by using mmc_host_uhs

Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: core: expose RPMB partition only for CMD23 capable hosts

SET_BLOCK_COUNT CMD23 is needed for all access to RPMB partition. If
block count is not set by CMD23, all subsequent read/write commands fail
as per eMMC specification. So, If the host does not support CMD23, do not
expose RPMB partition.

Accessing RPMB partition can cause hang / huge delay for hosts which do
not support CMD23.

Signed-off-by: Balaji T K <balajitk@ti.com>
Reported-and-Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: goldfish: emulated MMC device

This driver handles the virtual MMC device present in the Goldfish emulator.
The patch folds together initial work from Mike Lockwood and patches by
San Mehat, Jun Nakajima and Tom Keel <thomas.keel@intel.com> plus cleanups
by Alan Cox to get it all into 3.6 shape.

Signed-off-by: Mike A. Chan <mikechan@google.com>
[cleaned up and x86 support added]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Xiaohui Xin <xiaohui.xin@intel.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
[Moved to 3.4]
Signed-off-by: Tom Keel <thomas.keel@intel.com>
[Moved to 3.7]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dt: bus-width can be an optional property

None of mmc drivers implements bus-width as a required device tree
property. Instead, some drivers like atmel-mci, dw_mmc, sdhci-s3c
implement it as an optional one, and will force bus width to be 1
when the property is absent. Let's change the common binding to
reflect what the drivers are usually doing.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: support 8bit mode

The i.MX esdhc has a nonstandard bit layout for the SDHCI_HOST_CONTROL
register. To support 8bit bus width on i.MX populate the platform_bus_width
callback. This is tested on an i.MX25, but should according to the datasheets
work on the other i.MX using this hardware aswell. The i.MX6, while having
a SDHCI_SPEC_300 controller, still uses the same nonstandard register layout.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci: rename platform_8bit_width to platform_bus_width

The 8bit in the function name is misleading. When set, it will be
used to set the bus width, regardless of whether 8bit or another
bus width is requested, so change the function name to
platform_bus_width.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: Auto CMD23 support for usdhc

SDHCI core will try to use Auto CMD23 for mmc card. Currently, we will
see the following message with mmc card on usdhc due to the lacking of
Auto CMD23 support in the driver.

$ mmc0: new high speed MMC card at address 0001
mmcblk1: mmc0:0001 MMC02G 1.87 GiB
mmcblk1: error -84 transferring data, sector 0, nr 8, cmd response 0x900, card status 0xb00
mmcblk1: retrying using single block read
mmcblk1:

Enable Auto CMD23 support for usdhc so that mmc card can work in
multiple block mode.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: manually reset MIX_CTRL for usdhc

It's another violation to SDHC spec that software reset on usdhc
does not reset MIX_CTRL register. Have to do it manually, otherwise
the preserving of the register bits (e.g. AC23EN) may cause mmc card
fail to be initialized.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: separate transfer mode from command write for usdhc

The combining of SDHCI_TRANSFER_MODE and SDHCI_COMMAND writes is only
required for esdhc, but not necessarily for usdhc. Different from
esdhc where the bits for transfer mode and command are all in the same
register CMD_XFR_TYP, usdhc has a newly introduced register MIX_CTRL
to hold transfer mode bits. So it makes more sense to separate transfer
mode from command write for usdhc.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: core: move the cache disabling operation to mmc_suspend

Cache control is an eMMC feature and in therefore should be
part of MMC's bus resume operations, performed in mmc_suspend,
rather than in the generic mmc_suspend_host().

Signed-off-by: Maya Erez <merez@codeaurora.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

MAINTAINERS: mmc: add maintainer entry for dw_mmc driver

Add maintainer entry for the Synopsys DW host driver which is used in
various SOC including EXYNOS series. As Will Newton will no longer be
able to take care of dw_mmc*, I and Jaehoon Chung are willing to maintain
it.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: Remove unused variables

3f175a6e5 (mmc: sdhci-esdhc-imx: remove ESDHC_CD_GPIO handling from IO
accessory) introduced the following build warnings:

drivers/mmc/host/sdhci-esdhc-imx.c:149:30: warning: unused variable 'boarddata' [-Wunused-variable]
drivers/mmc/host/sdhci-esdhc-imx.c:181:30: warning: unused variable 'boarddata' [-Wunused-variable]

Remove the unused variables.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: fix DT binding documentation SDHCI left-over

The file Documentation/devicetree/bindings/mmc/mmc.txt is common for all
MMC host drivers. Use a generic MMC host reference instead of an SDHCI
left-over.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: name esdhc specific definitions with ESDHC_ prefix

Rename esdhc local definitions with ESDHC_ rather than SDHCI_ prefix,
so that we can distinguish them from SDHCI core definitions from name.

A couple of bit fields are also changed use shift for consistency and
better readability.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: remove D3CD check from SDHCI_HOST_CONTROL write

SDHCI_CTRL_D3CD is not a standard SDHCI_HOST_CONTROL, so there is no
need to check it in SDHCI_HOST_CONTROL write at all. Remove it.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: fix host version read

When commit 95a2482 (mmc: sdhci-esdhc-imx: add basic imx6q usdhc
support) works around host version issue on imx6q, it gets the
register address fixup "reg ^= 2" lost for imx25/35/51/53 esdhc.
Thus, the controller version on these SoCs is wrongly identified
as v1 while it's actually v2.

Add the address fixup back and take a different approach to correct
imx6q host version, so that the host version read gets back to work
for all SoCs.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: vt8500: Remove erroneous __exitp in wmt_mci_driver

With the __devinit/__devexit attributes having been removed, this
__exitp attribute causes an unused function warning and should be
removed as well.

Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dw_mmc: Remove DW_MCI_QUIRK_NO_WRITE_PROTECT

The original quirk was added in the change 'mmc: dw_mmc: add quirk to
indicate missing write protect line'. The original quirk was added at
a controller level even though each slot has its own write protect (so
the quirk should be at the slot level). A recent change (mmc: dw_mmc:
Add "disable-wp" device tree property) added a slot-level quirk and
support for the quirk directly to dw_mmc.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Will Newton <will.newton@imgtec.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dw_mmc: Handle wp-gpios from device tree

On some SoCs (like exynos5250) you need to use an external GPIO for
write protect. Add support for wp-gpios to the core dw_mmc driver
since it could be useful across multiple SoCs.

With this change I am able to make use of the write protect for the
external SD slot on exynos5250-snow.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dw_mmc: exynos: Remove code for wp-gpios

The exynos code claimed the write protect with devm_gpio_request() but
never did anything with it. That meant that anyone using a write
protect GPIO would effectively be write protected all the time.

The handling for wp-gpios belongs in the main dw_mmc driver and has
been moved there.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Chris Ball <cjb@laptop.org>

ARM: dts: Add disable-wp for sd card slot on smdk5250

The next change will remove the code from the dw_mmc-exynos that added
the DW_MCI_QUIRK_NO_WRITE_PROTECT. Keep existing functionality of
having no write protect pin on smdk5250 by adding the disable-wp
property.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dw_mmc: Add "disable-wp" device tree property

The "disable-wp" property is used to specify that a given SD card slot
doesn't have a concept of write protect. This eliminates the need for
special case code for SD slots that should never be write protected
(like a micro SD slot or a dev board).

The dw_mmc driver is special in needing to specify "disable-wp"
because the lack of a "wp-gpios" property means to use the special
purpose write protect line. On some other mmc devices the lack of
"wp-gpios" means that write protect should be disabled.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: correct the EXCEPTION_EVENTS_STATUS value in comment

The right value is 54 according to eMMC 4.5 specification.

Signed-off-by: ZhangYi <yix.x.zhang@intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: mxs-mmc: Fix warning due to incorrect type

Fixes the following warning when building with W=1 option:

drivers/mmc/host/mxs-mmc.c: In function 'mxs_mmc_adtc':
drivers/mmc/host/mxs-mmc.c:401:2: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

The warning happens because 'i' is used in 'for_each_sg(sgl, sg, sg_len, i)' and should be made unsigned.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Marek Vasut <marex@denx.de>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: mxs-mmc: Add MODULE_ALIAS()

Add an entry for MODULE_ALIAS().

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Marek Vasut <marex@denx.de>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: mvsdio: add pinctrl integration

On many Marvell SoCs, the pins used for the SDIO interface are part of
the MPP pins, that are muxable pins. In order to get the muxing of
those pins correct, this commit integrates the mvsdio driver with the
pinctrl infrastructure by calling devm_pinctrl_get_select_default()
during ->probe().

Note that we permit this function to fail because not all Marvell
platforms have yet been fully converted to using the pinctrl
infrastructure.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Stefan Peter <s.peter@mpl.ch>
Tested-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Chris Ball <cjb@laptop.org>

gianfar: fix compile fail for NET_POLL=y due to struct packing

Commit ee873fda3bec7c668407b837fc5519eb961fcd37 ("gianfar: Pack struct
gfar_priv_grp into three cachelines") moved the irq number and names
off into a separate struct and created accessors for them. However
it was never tested with NET_POLL enabled, and so some conversions
that were simply overlooked went undetected until now.

Make the netpoll ones also use the gfar_irq() accessors.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Cc: Jianhua Xie <jianhua.xie@freescale.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/vfio: remove depends on CONFIG_EXPERIMENTAL

The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

x86/mm/pageattr: Prevent PSE and GLOABL leftovers to confuse pmd/pte_present and pmd_huge

Without this patch any kernel code that reads kernel memory in
non present kernel pte/pmds (as set by pageattr.c) will crash.

With this kernel code:

static struct page *crash_page;
static unsigned long *crash_address;
[..]
crash_page = alloc_pages(GFP_KERNEL, 9);
crash_address = page_address(crash_page);
if (set_memory_np((unsigned long)crash_address, 1))
printk("set_memory_np failure\n");
[..]

The kernel will crash if inside the "crash tool" one would try
to read the memory at the not present address.

crash> p crash_address
crash_address = $8 = (long unsigned int *) 0xffff88023c000000
crash> rd 0xffff88023c000000
[ *lockup* ]

The lockup happens because _PAGE_GLOBAL and _PAGE_PROTNONE
shares the same bit, and pageattr leaves _PAGE_GLOBAL set on a
kernel pte which is then mistaken as _PAGE_PROTNONE (so
pte_present returns true by mistake and the kernel fault then
gets confused and loops).

With THP the same can happen after we taught pmd_present to
check _PAGE_PROTNONE and _PAGE_PSE in commit
027ef6c87853b0a9df5317 ("mm: thp: fix pmd_present for
split_huge_page and PROT_NONE with THP"). THP has the same
problem with _PAGE_GLOBAL as the 4k pages, but it also has a
problem with _PAGE_PSE, which must be cleared too.

After the patch is applied copy_user correctly returns -EFAULT
and doesn't lockup anymore.

crash> p crash_address
crash_address = $9 = (long unsigned int *) 0xffff88023c000000
crash> rd 0xffff88023c000000
rd: read error: kernel virtual address: ffff88023c000000 type:
"64-bit KVADDR"

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Revert "x86, mm: Make spurious_fault check explicitly check explicitly check the PRESENT bit"

I got a report for a minor regression introduced by commit
027ef6c87853b ("mm: thp: fix pmd_present for split_huge_page and
PROT_NONE with THP").

So the problem is, pageattr creates kernel pagetables (pte and
pmds) that breaks pte_present/pmd_present and the patch above
exposed this invariant breakage for pmd_present.

The same problem already existed for the pte and pte_present and
it was fixed by commit 660a293ea9be709 ("x86, mm: Make
spurious_fault check explicitly check the PRESENT bit") (if it
wasn't for that commit, it wouldn't even be a regression). That
fix avoids the pagefault to use pte_present. I could follow
through by stopping using pmd_present/pmd_huge too.

However I think it's more robust to fix pageattr and to clear
the PSE/GLOBAL bitflags too in addition to the present bitflag.
So the kernel page fault can keep using the regular
pte_present/pmd_present/pmd_huge.

The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE are
sharing the same bit, and in the pmd case we pretend _PAGE_PSE
to be set only in present pmds (to facilitate split_huge_page
final tlb flush).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/mm/numa: Don't check if node is NUMA_NO_NODE

If we aren't debugging per_cpu maps, the cpu's node is stored in
per_cpu variable numa_node. If `node' is NUMA_NO_NODE, it means
the caller wants to clear the cpu's node. So we should also
call set_cpu_numa_node() in this case.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

cputime: Use local_clock() for full dynticks cputime accounting

Running the full dynticks cputime accounting with preemptible
kernel debugging trigger the following warning:

[    4.488303] BUG: using smp_processor_id() in preemptible [00000000] code: init/1
[    4.490971] caller is native_sched_clock+0x22/0x80
[    4.493663] Pid: 1, comm: init Not tainted 3.8.0+ #13
[    4.496376] Call Trace:
[    4.498996]  [<ffffffff813410eb>] debug_smp_processor_id+0xdb/0xf0
[    4.501716]  [<ffffffff8101e642>] native_sched_clock+0x22/0x80
[    4.504434]  [<ffffffff8101db99>] sched_clock+0x9/0x10
[    4.507185]  [<ffffffff81096ccd>] fetch_task_cputime+0xad/0x120
[    4.509916]  [<ffffffff81096dd5>] task_cputime+0x35/0x60
[    4.512622]  [<ffffffff810f146e>] acct_update_integrals+0x1e/0x40
[    4.515372]  [<ffffffff8117d2cf>] do_execve_common+0x4ff/0x5c0
[    4.518117]  [<ffffffff8117cf14>] ? do_execve_common+0x144/0x5c0
[    4.520844]  [<ffffffff81867a10>] ? rest_init+0x160/0x160
[    4.523554]  [<ffffffff8117d457>] do_execve+0x37/0x40
[    4.526276]  [<ffffffff810021a3>] run_init_process+0x23/0x30
[    4.528953]  [<ffffffff81867aac>] kernel_init+0x9c/0xf0
[    4.531608]  [<ffffffff8188356c>] ret_from_fork+0x7c/0xb0

We use sched_clock() to perform and fixup the cputime
accounting. However we are calling it with preemption enabled
from the read side, which trigger the bug above.

To fix this up, use local_clock() instead. It takes care of
preemption and also provide a more reliable clock source. This
is welcome for this kind of statistic that is widely relied on
in userspace.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1361636925-22288-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

cputime: Constify timeval_to_cputime(timeval) argument

Saw the following compiler warning on the linux-next tree:

  kernel/itimer.c: In function 'set_cpu_itimer':
  kernel/itimer.c:152:2: warning: passing argument 1 of 'timeval_to_cputime' discards 'const' qualifier from pointer target type [enabled by default]
  ...

timeval_to_cputime() is always passed a constant timeval in
argument, we need to teach the nsecs based cputime
implementation about that.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1361636925-22288-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kevin Hilman <khilman@linaro.org>

xtensa: add support for TLS

The Xtensa architecture provides a global register called THREADPTR
for the purpose of Thread Local Storage (TLS) support. This allows us
to use a fairly simple implementation, keeping the thread pointer in
the regset and simply saving and restoring it upon entering/exiting
the from user space.

Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: add missing include asm/uaccess.h to checksum.h

This fixes the following build errors seen in the linux-next:

arch/xtensa/include/asm/checksum.h:247:2: error: implicit declaration of
function 'access_ok' [-Werror=implicit-function-declaration]
arch/xtensa/include/asm/checksum.h:247:16: error: 'VERIFY_WRITE' undeclared
(first use in this function)

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: do not enable GENERIC_GPIO by default

Now that drivers/gpio/devres.c build does not depend on GPIOLIB do not
enable GENERIC_GPIO by default to fix the following build errors seen
in the linux-next:

include/asm-generic/gpio.h:270:2: error: implicit declaration of function
'__gpio_get_value' [-Werror=implicit-function-declaration]
include/asm-generic/gpio.h:276:2: error: implicit declaration of function
'__gpio_set_value' [-Werror=implicit-function-declaration]
include/linux/gpio.h:60:19: error: redefinition of 'gpio_cansleep'
include/linux/gpio.h:62:2: error: implicit declaration of function
'__gpio_cansleep' [-Werror=implicit-function-declaration]
include/linux/gpio.h:67:2: error: implicit declaration of function
'__gpio_to_irq' [-Werror=implicit-function-declaration]
drivers/gpio/devres.c:26:2: error: implicit declaration of function
'gpio_free' [-Werror=implicit-function-declaration]
drivers/gpio/devres.c:60:2: error: implicit declaration of function
'gpio_request' [-Werror=implicit-function-declaration]
drivers/gpio/devres.c:90:2: error: implicit declaration of function
'gpio_request_one' [-Werror=implicit-function-declaration]

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: complete ptrace handling of register windows

Compute WindowBase and WindowMask registers correctly on ptrace calls.
Work done earlier by Maxim, Christian and Marc.

Signed-off-by: Marc Gauthier <marc@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: add support for oprofile

Support call graph profiling.
Keep upper two bits of PC unchanged through backtrace rather than take
them from sp (a1). The stack pointer is usually in the same GB (same
upper 2 bits) as PC, but technically doesn't always have to be (and
might not in the future, when taking full advantage of MMU v3).

Signed-off-by: Dan Nicolaescu <dann@xtensa-linux.org>
Signed-off-by: Pete Delaney <piet@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: move spill_registers to traps.h

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: ISS: add host file-based simulated disk

Simdisk is a block device that maps to a file in the host file system.
It is usable for testing in the simulated environment, like xt-sim or
QEMU. Device binding to host file may be changed at runtime via proc
interface provided the device is not in use. Number of block devices
and initial binding to host files is controlled via kernel/module
parameters, with defaults specified in the kernel configuration.

Signed-off-by: Victor Prupis <vnp@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: fix str[n]cmp return value

str[n]cmp functions return negative value if the first string is less
than the second, positive value if the first string is greater than the
second and zero if they are equal. This is important when these
functions are used for sorting/binary search.

With incorrect strcmp return value bsearch was always failing in the
find_symbol_in_section making it impossible to load any module.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: avoid mmap cache aliasing

Provide arch_get_unmapped_area function aligning shared memory mapping
addresses to the biggest of the page size or the cache way size. That
guarantees that corresponding virtual addresses of shared mappings are
cached by the same cache sets.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: add finit_module syscall

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: pull signal definitions from signal-defs.h

This fixes the following build error in the current linux-next:

include/linux/signal.h:261:2: error: unknown type name '__sigrestore_t'
make[2]: *** [arch/xtensa/kernel/asm-offsets.s] Error 1
make[1]: *** [prepare0] Error 2
make: *** [sub-make] Error 2

that appeared after 32dae82 'consolidate kernel-side struct sigaction declarations'

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: fix ipc_parse_version selection

shmctl may be called with IPC_64 flag, select function version of
ipc_parse_version to correctly handle that.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: dispatch medium-priority interrupts

Add support for dispatching medium-priority interrupts, that is,
interrupts of priority levels 2 to EXCM_LEVEL. IRQ handling may be
preempted by higher priority IRQ.

Signed-off-by: Marc Gauthier <marc@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: Add config files for Diamond 233L - Rev C processor variant

The Diamond 233L processor is a pre-configured Xtensa processor tailored
for Linux application.

Signed-off-by: Pete Delaney <piet@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: use new common dtc rule

The current rules have the .dtb files build in a different directory
from the .dts files. This patch changes xtensa to use the generic dtb
rule which builds .dtb files in the same directory as the source .dts.

This requires moving parts of arch/xtensa/boot/Makefile into newly
created arch/xtensa/boot/dts/Makefile, and updating arch/xtensa/Makefile
to call the new Makefile.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

xtensa: rename prom_update_property to of_update_property

This rename happened in 79d1c71 powerpc+of: Rename the drivers/of prom_*
functions to of_*.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/signal

Pull signal handling cleanups from Al Viro:
"This is the first pile; another one will come a bit later and will
  contain SYSCALL_DEFINE-related patches.

   - a bunch of signal-related syscalls (both native and compat)
     unified.

   - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
     (fixing several potential problems with missing argument
     validation, while we are at it)

   - a lot of now-pointless wrappers killed

   - a couple of architectures (cris and hexagon) forgot to save
     altstack settings into sigframe, even though they used the
     (uninitialized) values in sigreturn; fixed.

   - microblaze fixes for delivery of multiple signals arriving at once

   - saner set of helpers for signal delivery introduced, several
     architectures switched to using those."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
  x86: convert to ksignal
  sparc: convert to ksignal
  arm: switch to struct ksignal * passing
  alpha: pass k_sigaction and siginfo_t using ksignal pointer
  burying unused conditionals
  make do_sigaltstack() static
  arm64: switch to generic old sigaction() (compat-only)
  arm64: switch to generic compat rt_sigaction()
  arm64: switch compat to generic old sigsuspend
  arm64: switch to generic compat rt_sigqueueinfo()
  arm64: switch to generic compat rt_sigpending()
  arm64: switch to generic compat rt_sigprocmask()
  arm64: switch to generic sigaltstack
  sparc: switch to generic old sigsuspend
  sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
  sparc: kill sign-extending wrappers for native syscalls
  kill sparc32_open()
  sparc: switch to use of generic old sigaction
  sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
  mips: switch to generic sys_fork() and sys_clone()
  ...

Merge branch 'drm/hdmi-for-3.9' of git://anongit.freedesktop.org/tegra/linux into drm-next

Thierry writes:
"Remove a duplicate implementation of the CEA VIC lookup and move the CEA
and other mode tables to drm_edid.c to make it more difficult to create
duplicates of the tables.

Add some helpers to pack CEA-861/HDMI AVI, audio and SPD infoframes into
binary buffers that can easily be written into hardware registers. A new
helper function makes it easy construct an AVI infoframe from a DRM
display mode.

Convert the Tegra and Radeon drivers to use the new HDMI helpers."
* 'drm/hdmi-for-3.9' of git://anongit.freedesktop.org/tegra/linux:
  drm/radeon: Use generic HDMI infoframe helpers
  drm/tegra: Use generic HDMI infoframe helpers
  drm: Add EDID helper documentation
  drm: Add HDMI infoframe helpers
  video: Add generic HDMI infoframe helpers
  drm: Add some missing forward declarations
  drm: Move mode tables to drm_edid.c
  drm: Remove duplicate drm_mode_cea_vic()

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next

Two regressions fixes from snowboarding land

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: Revert hdmi HDP pin checks
drm/i915: Handle untiled planes when computing their offsets

Merge branch 'drm/tegra-for-3.9' of git://anongit.freedesktop.org/tegra/linux into drm-next

Thierry writes:
"Add support for 2 hardware overlays found on Tegra. These support YUV
pixel formats and can be used as video overlays. .mode_set_base() is
implemented and support for VBLANK and page-flipping is added.

A few minor bug fixes are also included and a new debugfs file allows
to inspect the framebuffers attached to the Tegra DRM device."

* 'drm/tegra-for-3.9' of git://anongit.freedesktop.org/tegra/linux:
  drm/tegra: Add list of framebuffers to debugfs
  drm/tegra: Fix color expansion
  drm/tegra: Split DC_CMD_STATE_CONTROL register write
  drm/tegra: Implement page-flipping support
  drm/tegra: Implement VBLANK support
  drm/tegra: Implement .mode_set_base()
  drm/tegra: Add plane support
  drm/tegra: Remove bogus tegra_framebuffer structure
  drm: Add consistency check for page-flipping

vlan: adjust vlan_set_encap_proto() for its callers

There are two places to call vlan_set_encap_proto():
vlan_untag() and __pop_vlan_tci().

vlan_untag() assumes skb->data points after mac addr, otherwise
the following code

        vhdr = (struct vlan_hdr *) skb->data;
        vlan_tci = ntohs(vhdr->h_vlan_TCI);
        __vlan_hwaccel_put_tag(skb, vlan_tci);

        skb_pull_rcsum(skb, VLAN_HLEN);

won't be correct. But __pop_vlan_tci() assumes points _before_
mac addr.

In vlan_set_encap_proto(), it looks for some magic L2 value
after mac addr:

        rawp = skb->data;
        if (*(unsigned short *) rawp == 0xFFFF)
...

Therefore __pop_vlan_tci() is obviously wrong.

A quick fix is avoiding using skb->data in vlan_set_encap_proto(),
use 'vhdr+1' is always correct in both cases.

Cc: David S. Miller <davem@davemloft.net>
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'akpm' (more incoming from Andrew)

Merge second patch-bomb from Andrew Morton:

- A little DM fix

- the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
  ksm: allocate roots when needed
  mm: cleanup "swapcache" in do_swap_page
  mm,ksm: swapoff might need to copy
  mm,ksm: FOLL_MIGRATION do migration_entry_wait
  ksm: shrink 32-bit rmap_item back to 32 bytes
  ksm: treat unstable nid like in stable tree
  ksm: add some comments
  tmpfs: fix mempolicy object leaks
  tmpfs: fix use-after-free of mempolicy object
  mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
  mm: export mmu notifier invalidates
  mm: accelerate mm_populate() treatment of THP pages
  mm: use long type for page counts in mm_populate() and get_user_pages()
  mm: accurately document nr_free_*_pages functions with code comments
  HWPOISON: change order of error_states[]'s elements
  HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
  memcg: stop warning on memcg_propagate_kmem
  net: change type of virtio_chan->p9_max_pages
  vmscan: change type of vm_total_pages to unsigned long
  fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
  ...

ksm: allocate roots when needed

It is a pity to have MAX_NUMNODES+MAX_NUMNODES tree roots statically
allocated, particularly when very few users will ever actually tune
merge_across_nodes 0 to use more than 1+1 of those trees. Not a big
deal (only 16kB wasted on each machine with CONFIG_MAXSMP), but a pity.

Start off with 1+1 statically allocated, then if merge_across_nodes is
ever tuned, allocate for nr_node_ids+nr_node_ids. Do not attempt to
free up the extra if it's tuned back, that would be a waste of effort.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: cleanup "swapcache" in do_swap_page

I dislike the way in which "swapcache" gets used in do_swap_page():
there is always a page from swapcache there (even if maybe uncached by
the time we lock it), but tests are made according to "swapcache".
Rework that with "page != swapcache", as has been done in unuse_pte().

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm,ksm: swapoff might need to copy

Before establishing that KSM page migration was the cause of my
WARN_ON_ONCE(page_mapped(page))s, I suspected that they came from the
lack of a ksm_might_need_to_copy() in swapoff's unuse_pte() - which in
many respects is equivalent to faulting in a page.

In fact I've never caught that as the cause: but in theory it does at
least need the KSM_RUN_UNMERGE check in ksm_might_need_to_copy(), to
avoid bringing a KSM page back in when it's not supposed to be.

I intended to copy how it's done in do_swap_page(), but have a strong
aversion to how "swapcache" ends up being used there: rework it with
"page != swapcache".

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm,ksm: FOLL_MIGRATION do migration_entry_wait

In "ksm: remove old stable nodes more thoroughly" I said that I'd never
seen its WARN_ON_ONCE(page_mapped(page)).  True at the time of writing,
but it soon appeared once I tried fuller tests on the whole series.

It turned out to be due to the KSM page migration itself: unmerge_and_
remove_all_rmap_items() failed to locate and replace all the KSM pages,
because of that hiatus in page migration when old pte has been replaced
by migration entry, but not yet by new pte.  follow_page() finds no page
at that instant, but a KSM page reappears shortly after, without a
fault.

Add FOLL_MIGRATION flag, so follow_page() can do migration_entry_wait()
for KSM's break_cow().  I'd have preferred to avoid another flag, and do
it every time, in case someone else makes the same easy mistake; but did
not find another transgressor (the common get_user_pages() is of course
safe), and cannot be sure that every follow_page() caller is prepared to
sleep - ia64's xencomm_vtop()? Now, THP's wait_split_huge_page() can
already sleep there, since anon_vma locking was changed to mutex, but
maybe that's somehow excluded.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ksm: shrink 32-bit rmap_item back to 32 bytes

Think of struct rmap_item as an extension of struct page (restricted to
MADV_MERGEABLE areas): there may be a lot of them, we need to keep them
small, especially on 32-bit architectures of limited lowmem.

Siting "int nid" after "unsigned int checksum" works nicely on 64-bit,
making no change to its 64-byte struct rmap_item; but bloats the 32-bit
struct rmap_item from (nicely cache-aligned) 32 bytes to 36 bytes, which
rounds up to 40 bytes once allocated from slab.  We'd better avoid that.

Hey, I only just remembered that the anon_vma pointer in struct
rmap_item has no purpose until the rmap_item is hung from a stable tree
node (which has its own nid field); and rmap_item's nid field no purpose
than to say which tree root to tell rb_erase() when unlinking from an
unstable tree.

Double them up in a union.  There's just one place where we set anon_vma
early (when we already hold mmap_sem): now we must remove tree_rmap_item
from its unstable tree there, before overwriting nid.  No need to
spatter BUG()s around: we'd be seeing oopses if this were wrong.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ksm: treat unstable nid like in stable tree

An inconsistency emerged in reviewing the NUMA node changes to KSM: when
meeting a page from the wrong NUMA node in a stable tree, we say that
it's okay for comparisons, but not as a leaf for merging; whereas when
meeting a page from the wrong NUMA node in an unstable tree, we bail out
immediately.

Now, it might be that a wrong NUMA node in an unstable tree is more
likely to correlate with instablility (different content, with rbnode
now misplaced) than page migration; but even so, we are accustomed to
instablility in the unstable tree.

Without strong evidence for which strategy is generally better, I'd
rather be consistent with what's done in the stable tree: accept a page
from the wrong NUMA node for comparison, but not as a leaf for merging.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ksm: add some comments

Added slightly more detail to the Documentation of merge_across_nodes, a
few comments in areas indicated by review, and renamed get_ksm_page()'s
argument from "locked" to "lock_it". No functional change.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tmpfs: fix mempolicy object leaks

Fix several mempolicy leaks in the tmpfs mount logic.  These leaks are
slow - on the order of one object leaked per mount attempt.

Leak 1 (umount doesn't free mpol allocated in mount):
    while true; do
        mount -t tmpfs -o mpol=interleave,size=100M nodev /mnt
        umount /mnt
    done

Leak 2 (errors parsing remount options will leak mpol):
    mount -t tmpfs -o size=100M nodev /mnt
    while true; do
        mount -o remount,mpol=interleave,size=x /mnt 2> /dev/null
    done
    umount /mnt

Leak 3 (multiple mpol per mount leak mpol):
    while true; do
        mount -t tmpfs -o mpol=interleave,mpol=interleave,size=100M nodev /mnt
        umount /mnt
    done

This patch fixes all of the above.  I could have broken the patch into
three pieces but is seemed easier to review as one.

[akpm@linux-foundation.org: fix handling of mpol_parse_str() errors, per Hugh]
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tmpfs: fix use-after-free of mempolicy object

The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request.  A new policy can be
specified if mpol=M is given.

Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.

To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
    # mkdir /tmp/x

    # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

    # mount -o remount,size=200M nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
        # note ? garbage in mpol=... output above

    # dd if=/dev/zero of=/tmp/x/f count=1
        # panic here

Panic:
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<          (null)>]           (null)
    [...]
    Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
    Call Trace:
      mpol_shared_policy_init+0xa5/0x160
      shmem_get_inode+0x209/0x270
      shmem_mknod+0x3e/0xf0
      shmem_create+0x18/0x20
      vfs_create+0xb5/0x130
      do_last+0x9a1/0xea0
      path_openat+0xb3/0x4d0
      do_filp_open+0x42/0xa0
      do_sys_open+0xfe/0x1e0
      compat_sys_open+0x1b/0x20
      cstar_dispatch+0x7/0x1f

Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault.  Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.

The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:

    config = *sbinfo
    shmem_parse_options(data, &config, true)
    mpol_put(sbinfo->mpol)
    sbinfo->mpol = config.mpol  /* BUG: saves unreferenced mpol */

This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.

How far back does this issue go? I see it in both 2.6.36 and 3.3.  I did
not look back further.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages

Rob van der Heij reported the following (paraphrased) on private mail.

The scenario is that I want to avoid backups to fill up the page
cache and purge stuff that is more likely to be used again (this is
with s390x Linux on z/VM, so I don't give it as much memory that
we don't care anymore). So I have something with LD_PRELOAD that
intercepts the close() call (from tar, in this case) and issues
a posix_fadvise() just before closing the file.

This mostly works, except for small files (less than 14 pages)
that remains in page cache after the face.

Unfortunately Rob has not had a chance to test this exact patch but the
test program below should be reproducing the problem he described.

The issue is the per-cpu pagevecs for LRU additions.  If the pages are
added by one CPU but fadvise() is called on another then the pages
remain resident as the invalidate_mapping_pages() only drains the local
pagevecs via its call to pagevec_release().  The user-visible effect is
that a program that uses fadvise() properly is not obeyed.

A possible fix for this is to put the necessary smarts into
invalidate_mapping_pages() to globally drain the LRU pagevecs if a
pagevec page could not be discarded.  The downside with this is that an
inode cache shrink would send a global IPI and memory pressure
potentially causing global IPI storms is very undesirable.

Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to
check if invalidate_mapping_pages() discarded all the requested pages.
If a subset of pages are discarded it drains the LRU pagevecs and tries
again.  If the second attempt fails, it assumes it is due to the pages
being mapped, locked or dirty and does not care.  With this patch, an
application using fadvise() correctly will be obeyed but there is a
downside that a malicious application can force the kernel to send
global IPIs and increase overhead.

If accepted, I would like this to be considered as a -stable candidate.
It's not an urgent issue but it's a system call that is not working as
advertised which is weak.

The following test program demonstrates the problem.  It should never
report that pages are still resident but will without this patch.  It
assumes that CPU 0 and 1 exist.

int main() {
int fd;
int pagesize = getpagesize();
ssize_t written = 0, expected;
char *buf;
unsigned char *vec;
int resident, i;
cpu_set_t set;

/* Prepare a buffer for writing */
expected = FILESIZE_PAGES * pagesize;
buf = malloc(expected + 1);
if (buf == NULL) {
printf("ENOMEM\n");
exit(EXIT_FAILURE);
}
buf[expected] = 0;
memset(buf, 'a', expected);

/* Prepare the mincore vec */
vec = malloc(FILESIZE_PAGES);
if (vec == NULL) {
printf("ENOMEM\n");
exit(EXIT_FAILURE);
}

/* Bind ourselves to CPU 0 */
CPU_ZERO(&set);
CPU_SET(0, &set);
if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
perror("sched_setaffinity");
exit(EXIT_FAILURE);
}

/* open file, unlink and write buffer */
fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR);
if (fd == -1) {
perror("open");
exit(EXIT_FAILURE);
}
unlink("fadvise-test-file");
while (written < expected) {
ssize_t this_write;
this_write = write(fd, buf + written, expected - written);

if (this_write == -1) {
perror("write");
exit(EXIT_FAILURE);
}

written += this_write;
}
free(buf);

/*
* Force ourselves to another CPU. If fadvise only flushes the local
* CPUs pagevecs then the fadvise will fail to discard all file pages
*/
CPU_ZERO(&set);
CPU_SET(1, &set);
if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
perror("sched_setaffinity");
exit(EXIT_FAILURE);
}

/* sync and fadvise to discard the page cache */
fsync(fd);
if (posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED) == -1) {
perror("posix_fadvise");
exit(EXIT_FAILURE);
}

/* map the file and use mincore to see which parts of it are resident */
buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0);
if (buf == NULL) {
perror("mmap");
exit(EXIT_FAILURE);
}
if (mincore(buf, expected, vec) == -1) {
perror("mincore");
exit(EXIT_FAILURE);
}

/* Check residency */
for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) {
if (vec[i])
resident++;
}
if (resident != 0) {
printf("Nr unexpected pages resident: %d\n", resident);
exit(EXIT_FAILURE);
}

munmap(buf, expected);
close(fd);
free(vec);
exit(EXIT_SUCCESS);
}

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Rob van der Heij <rvdheij@gmail.com>
Tested-by: Rob van der Heij <rvdheij@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: export mmu notifier invalidates

We at SGI have a need to address some very high physical address ranges
with our GRU (global reference unit), sometimes across partitioned
machine boundaries and sometimes with larger addresses than the cpu
supports.  We do this with the aid of our own 'extended vma' module
which mimics the vma.  When something (either unmap or exit) frees an
'extended vma' we use the mmu notifiers to clean them up.

We had been able to mimic the functions
__mmu_notifier_invalidate_range_start() and
__mmu_notifier_invalidate_range_end() by locking the per-mm lock and
walking the per-mm notifier list.  But with the change to a global srcu
lock (static in mmu_notifier.c) we can no longer do that.  Our module has
no access to that lock.

So we request that these two functions be exported.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: accelerate mm_populate() treatment of THP pages

This change adds a follow_page_mask function which is equivalent to
follow_page, but with an extra page_mask argument.

follow_page_mask sets *page_mask to HPAGE_PMD_NR - 1 when it encounters
a THP page, and to 0 in other cases.

__get_user_pages() makes use of this in order to accelerate populating
THP ranges - that is, when both the pages and vmas arrays are NULL, we
don't need to iterate HPAGE_PMD_NR times to cover a single THP page (and
we also avoid taking mm->page_table_lock that many times).

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: use long type for page counts in mm_populate() and get_user_pages()

Use long type for page counts in mm_populate() so as to avoid integer
overflow when running the following test code:

int main(void) {
  void *p = mmap(NULL, 0x100000000000, PROT_READ,
                 MAP_PRIVATE | MAP_ANON, -1, 0);
  printf("p: %p\n", p);
  mlockall(MCL_CURRENT);
  printf("done\n");
  return 0;
}

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: accurately document nr_free_*_pages functions with code comments

nr_free_zone_pages(), nr_free_buffer_pages() and nr_free_pagecache_pages()
are horribly badly named, so accurately document them with code comments
in case of the misuse of them.

[akpm@linux-foundation.org: tweak comments]
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

HWPOISON: change order of error_states[]'s elements

error_states[] has two separate states "unevictable LRU page" and
"mlocked LRU page", and the former one has the higher priority now. But
because of that the latter one is rarely chosen because pages with
PageMlocked highly likely have PG_unevictable set. On the other hand,
PG_unevictable without PageMlocked is common for ramfs or SHM_LOCKed
shared memory, so reversing the priority of these two states helps us
clearly distinguish them.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

HWPOISON: fix misjudgement of page_action() for errors on mlocked pages

memory_failure() can't handle memory errors on mlocked pages correctly,
because page_action() judges such errors as ones on "unknown pages"
instead of ones on "unevictable LRU page" or "mlocked LRU page".  In
order to determine page_state page_action() checks page flags at the
timing of the judgement, but such page flags are not the same with those
just after memory_failure() is called, because memory_failure() does
unmapping of the error pages before doing page_action().  This unmapping
changes the page state, especially page_remove_rmap() (called from
try_to_unmap_one()) clears PG_mlocked, so page_action() can't catch
mlocked pages after that.

With this patch, we store the page flag of the error page before doing
unmap, and (only) if the first check with page flags at the time decided
the error page is unknown, we do the second check with the stored page
flag.  This implementation doesn't change error handling for the page
types for which the first check can determine the page state correctly.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: stop warning on memcg_propagate_kmem

Whilst I run the risk of a flogging for disloyalty to the Lord of Sealand,
I do have CONFIG_MEMCG=y CONFIG_MEMCG_KMEM not set, and grow tired of the
"mm/memcontrol.c:4972:12: warning: `memcg_propagate_kmem' defined but not
used [-Wunused-function]" seen in 3.8-rc: move the #ifdef outwards.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

net: change type of virtio_chan->p9_max_pages

This member of struct virtio_chan is calculated from nr_free_buffer_pages
so change its type to unsigned long in case of overflow.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vmscan: change type of vm_total_pages to unsigned long

This variable is calculated from nr_free_pagecache_pages so
change its type to unsigned long.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used

The three variables are calculated from nr_free_buffer_pages so change
their types to unsigned long in case of overflow.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/buffer.c: change type of max_buffer_heads to unsigned long

max_buffer_heads is calculated from nr_free_buffer_pages(), so change
its type to unsigned long in case of overflow.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ia64: use %ld to print pages calculated in nr_free_buffer_pages

Now the function nr_free_buffer_pages returns unsigned long, so use %ld
to print its return value.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix return type for functions nr_free_*_pages

Currently, the amount of RAM that functions nr_free_*_pages return is
held in unsigned int. But in machines with big memory (exceeding 16TB),
the amount may be incorrect because of overflow, so fix it.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: cleanup mem_cgroup_init comment

We should encourage all memcg controller initialization independent on a
specific mem_cgroup to be done here rather than exploit css_alloc
callback and assume that nothing happens before root cgroup is created.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: move memcg_stock initialization to mem_cgroup_init

memcg_stock are currently initialized during the root cgroup allocation
which is OK but it pointlessly pollutes memcg allocation code with
something that can be called when the memcg subsystem is initialized by
mem_cgroup_init along with other controller specific parts.

This patch wraps the current memcg_stock initialization code into a
helper calls it from the controller subsystem initialization code.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>