Commit Graph

692872 Commits

Author SHA1 Message Date
Nick Terrell 5d2405227a lib: Add xxhash module
Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
extremely fast non-cryptographic hash algorithm for checksumming.
The zstd compression and decompression modules added in the next patch
require xxhash. I extracted it out from zstd since it is useful on its
own. I copied the code from the upstream XXHash source repository and
translated it into kernel style. I ran benchmarks and tests in the kernel
and tests in userland.

I benchmarked xxhash as a special character device. I ran in four modes,
no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
hashes on the copied data. I also ran it with four different buffer sizes.
The benchmark file is located in the upstream zstd source repository under
`contrib/linux-kernel/xxhash_test.c` [1].

I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and a SSD. I benchmarked using the file `filesystem.squashfs`
from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large.
Run the following commands for the benchmark:

    modprobe xxhash_test
    mknod xxhash_test c 245 0
    time cp filesystem.squashfs xxhash_test

The time is reported by the time of the userland `cp`.
The GB/s is computed with

    1,536,217,008 B / time(buffer size, hash)

which includes the time to copy from userland.
The Normalized GB/s is computed with

    1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

| Buffer Size (B) | Hash  | Time (s) | GB/s | Adjusted GB/s |
|-----------------|-------|----------|------|---------------|
|            1024 | none  |    0.408 | 3.77 |             - |
|            1024 | xxh32 |    0.649 | 2.37 |          6.37 |
|            1024 | xxh64 |    0.542 | 2.83 |         11.46 |
|            1024 | crc32 |    1.290 | 1.19 |          1.74 |
|            4096 | none  |    0.380 | 4.04 |             - |
|            4096 | xxh32 |    0.645 | 2.38 |          5.79 |
|            4096 | xxh64 |    0.500 | 3.07 |         12.80 |
|            4096 | crc32 |    1.168 | 1.32 |          1.95 |
|            8192 | none  |    0.351 | 4.38 |             - |
|            8192 | xxh32 |    0.614 | 2.50 |          5.84 |
|            8192 | xxh64 |    0.464 | 3.31 |         13.60 |
|            8192 | crc32 |    1.163 | 1.32 |          1.89 |
|           16384 | none  |    0.346 | 4.43 |             - |
|           16384 | xxh32 |    0.590 | 2.60 |          6.30 |
|           16384 | xxh64 |    0.466 | 3.30 |         12.80 |
|           16384 | crc32 |    1.183 | 1.30 |          1.84 |

Tested in userland using the test-suite in the zstd repo under
`contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
kernel functions. A line in each branch of every function in `xxhash.c`
was commented out to ensure that the test-suite fails. Additionally
tested while testing zstd and with SMHasher [3].

[1] https://phabricator.intern.facebook.com/P57526246
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
[3] https://github.com/aappleby/smhasher

zstd source repository: https://github.com/facebook/zstd
XXHash source repository: https://github.com/cyan4973/xxhash

Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2017-08-15 09:02:07 -07:00
Linus Torvalds ef954844c7 Linux 4.13-rc5 2017-08-13 16:01:32 -07:00
Linus Torvalds b2298fc900 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
 "Another round of MIPS fixes:

   - compressed boot: Ignore a generated .c file

   - VDSO: Fix a register clobber list

   - DECstation: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression

   - Octeon: Fix recent cleanups that cleaned away a bit too much thus
     breaking the arch side of the EDAC and USB drivers.

   - uasm: Fix duplicate const in "const struct foo const bar[]" which
     GCC 7.1 no longer accepts.

   - Fix race on setting and getting cpu_online_mask

   - Fix preemption issue. To do so cleanly introduce macro to get the
     size of L3 cache line.

   - Revert include cleanup that sometimes results in build error

   - MicroMIPS uses bit 0 of the PC to indicate microMIPS mode. Make
     sure this bit is set for kernel entry as well.

   - Prevent configuring the kernel for both microMIPS and MT. There are
     no such CPUs currently and thus the combination is unsupported and
     results in build errors.

  This has been sitting in linux-next for a few days and has survived
  automated testing by Imagination's test farm. No known regressions
  pending except a number of issues that crept up due to lots of people
  switching to GCC 7.1"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Set ISA bit in entry-y for microMIPS kernels
  MIPS: Prevent building MT support for microMIPS kernels
  MIPS: PCI: Fix smp_processor_id() in preemptible
  MIPS: Introduce cpu_tcache_line_size
  MIPS: DEC: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression
  MIPS: VDSO: Fix clobber lists in fallback code paths
  Revert "MIPS: Don't unnecessarily include kmalloc.h into <asm/cache.h>."
  MIPS: OCTEON: Fix USB platform code breakage.
  MIPS: Octeon: Fix broken EDAC driver.
  MIPS: gitignore: ignore generated .c files
  MIPS: Fix race on setting and getting cpu_online_mask
  MIPS: mm: remove duplicate "const" qualifier on insn_table
2017-08-13 15:34:28 -07:00
Linus Torvalds c9dc281d91 driver core fixes for 4.13-rc5
Here are 3 firmware core fixes for 4.13-rc5.
 
 All three of these fix reported issues and have been floating around for
 a few weeks.  They have been in linux-next with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWY+z8w8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yngsgCeJQzYKjLyfY1QXhhjLY2Xy/ufFYIAoM2WafSL
 7OnYFrt2s2VWPp3jEY9a
 =XEz8
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are three firmware core fixes for 4.13-rc5.

  All three of these fix reported issues and have been floating around
  for a few weeks. They have been in linux-next with no reported
  problems"

* tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  firmware: avoid invalid fallback aborts by using killable wait
  firmware: fix batched requests - send wake up on failure on direct lookups
  firmware: fix batched requests - wake all waiters
2017-08-13 12:44:18 -07:00
Linus Torvalds ce7ba95cf0 char/misc fixes for 4.13-rc5
Here are two patches for 4.13-rc5.
 
 One is a fix for a reported thunderbolt issue, and the other a fix for
 an MEI driver issue.  Both have been in linux-next with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWY+zfg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynOKQCfYPp20/4S/DRl/O9mtFG6+Iczmm8AnAnFUVrN
 EHmUF2ipsm4FQ4iZSot3
 =yLZO
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
 "Here are two patches for 4.13-rc5.

  One is a fix for a reported thunderbolt issue, and the other a fix for
  an MEI driver issue. Both have been in linux-next with no reported
  issues"

* tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  thunderbolt: Do not enumerate more ports from DROM than the controller has
  mei: exclude device from suspend direct complete optimization
2017-08-13 12:41:58 -07:00
Linus Torvalds 438630ef5b tty/serial fixes for 4.13-rc5
Here are two tty serial driver fixes for 4.13-rc5.  One is a revert of a
 -rc1 patch that turned out to not be a good idea, and the other is a fix
 for the pl011 serial driver.
 
 Both have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWY+1vw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykreACeLI2spfagnOcnW264PGHxLMgj47EAn0JPSNIg
 5FLEtVsgxMuc8kc4LkDb
 =58g+
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are two tty serial driver fixes for 4.13-rc5. One is a revert of
  a -rc1 patch that turned out to not be a good idea, and the other is a
  fix for the pl011 serial driver.

  Both have been in linux-next with no reported issues"

* tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: Delete dead code for CIR serial ports"
  tty: pl011: fix initialization order of QDF2400 E44
2017-08-13 12:33:35 -07:00
Linus Torvalds dd95f18607 staging/iio fixes for 4.13-rc5
Here are some Staging and IIO driver fixes for 4.13-rc5.
 
 Nothing major, just a number of small fixes for reported issues.  All of
 these have been in linux-next for a while now with no reported issues.
 Full details are in the shortlog.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWY+0+A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykwSgCgzvxKfEcAHXOxB90/bHl3yWh+Xe0AoNRX5xEX
 gavfFbhZlrPqtlZGX1ZI
 =NmcC
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/iio fixes from Greg KH:
 "Here are some Staging and IIO driver fixes for 4.13-rc5.

  Nothing major, just a number of small fixes for reported issues. All
  of these have been in linux-next for a while now with no reported
  issues. Full details are in the shortlog"

* tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: comedi: comedi_fops: do not call blocking ops when !TASK_RUNNING
  iio: aspeed-adc: wait for initial sequence.
  iio: accel: bmc150: Always restore device to normal mode after suspend-resume
  staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
  iio: adc: axp288: Fix the GPADC pin reading often wrongly returning 0
  iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
  iio: accel: st_accel: add SPI-3wire support
  iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications"
  iio: adc: sun4i-gpadc-iio: fix unbalanced irq enable/disable
  iio: pressure: st_pressure_core: disable multiread by default for LPS22HB
  iio: light: tsl2563: use correct event code
2017-08-13 12:30:17 -07:00
Linus Torvalds 10cec917d0 USB fixes for 4.13-rc5
Here are a number of small USB driver fixes and new device ids for
 4.13-rc5.  There is the usual gadget driver fixes, some new quirks for
 "messy" hardware, and some new device ids.
 
 All have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWY+2eA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynSGACgrMjsqOsgjgVEnc1IdjS2gHEZnfcAoMmyBGwo
 cNOM+8gnysJLFUTwkppl
 =+bPU
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a number of small USB driver fixes and new device ids for
  4.13-rc5. There is the usual gadget driver fixes, some new quirks for
  "messy" hardware, and some new device ids.

  All have been in linux-next with no reported issues"

* tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: pl2303: add new ATEN device id
  usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
  USB: Check for dropped connection before switching to full speed
  usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
  usb: renesas_usbhs: gadget: fix unused-but-set-variable warning
  usb: renesas_usbhs: Fix UGCTRL2 value for R-Car Gen3
  usb: phy: phy-msm-usb: Fix usage of devm_regulator_bulk_get()
  usb: gadget: udc: renesas_usb3: Fix usb_gadget_giveback_request() calling
  usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets
  USB: serial: option: add D-Link DWM-222 device ID
  usb: musb: fix tx fifo flush handling again
  usb: core: unlink urbs from the tail of the endpoint's urb_list
  usb-storage: fix deadlock involving host lock and scsi_done
  uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
  USB: hcd: Mark secondary HCD as dead if the primary one died
  USB: serial: cp210x: add support for Qivicon USB ZigBee dongle
2017-08-13 12:27:42 -07:00
Linus Torvalds 89a55278de Another MTD fix for v4.13-rc5:
An mtdblock regression occurred in -rc1 (all writes were broken!), in the
 process of some block subsystem refactoring. Noticed and fixed last week, but
 I'm a little slow on the uptake.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZj3ucAAoJEFySrpd9RFgtsLQP/AzsaCfuqELdXnuE583V2Ncx
 yZH6xqi/8m+74sz3jdfui+wger97/UnEDoRkfV7cdbaE74JPfVl0lmf/Ue6BXSe/
 pCM+QC26eEJVw+lsFL+AK6uVgcMXuGCAOYAvnjk0AwS/6qam2Er+f1CvqrD3FVDI
 63XH+nFR7q0RSzWpNtDGnaJ78dkuK1Q/qoK2Iqklwzqhdy6YNLF96tScH0ZvC0Jh
 vsx/IU/jQwGRjF2S6OiYnlB99p7x8D8KszntL+Y9LEsSaxqs0ZKlrwbCjj0qcKCQ
 4k+iwPXAtWia24N3pSPXyWX0vuhyAHZd+kNZ9/7obROnZwcU6fybqXrmj8+PtIhF
 JWctUrxELXsv89XBAD7LPcCT+E0s2RxbC49vVWvd6CvsPWfk15nUikKk8jsQK34g
 GErbxavN0cKaYHU5FDngRkl2uFZMvJf1CtKuxouo6wGAWQNScX6172dNHZX5lJ2Q
 O6FBj4+xxXSuHLIagBaiGC3EsIV1Dyl24xj3/dkWUr0qv+rAFmlEagnmCr2t3tDE
 ccjQZSOx9oVypMgr6GE5YdUwyMn4woGTc6npQxSuFyxAo41CDCbAWmZ06k60U5V/
 Uiz1XlwzqR/FhIayi21vPRKNdOug+U5KuO3UEGvGrdujitW/RiY8MvOkRmUDGc9h
 UgTyQ0BXcw9WsarHANFY
 =RDXe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd

Pull another MTD fix from Brian Norris:
 "An mtdblock regression occurred in -rc1 (all writes were broken!), in
  the process of some block subsystem refactoring. Noticed and fixed
  last week, but I'm a little slow on the uptake"

* tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd:
  mtd: blkdevs: Fix mtd block write failure
2017-08-12 16:19:43 -07:00
Abhishek Sahu 9a51544774 mtd: blkdevs: Fix mtd block write failure
All the MTD block write requests are failing with
following error messages

    mkfs.ext4  /dev/mtdblock0

    print_req_error: I/O error, dev mtdblock0, sector 0
    Buffer I/O error on dev mtdblock0, logical block 0,
    lost async page write

The control is going to default case after block write request
because of missing return.

Fixes: commit 2a842acab1 ("block: introduce new block status code type")
Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2017-08-12 14:53:24 -07:00
Linus Torvalds a99bcdce83 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "The highlights include:

   - Fix iscsi-target payload memory leak during
     ISCSI_FLAG_TEXT_CONTINUE (Varun Prakash)

   - Fix tcm_qla2xxx incorrect use of tcm_qla2xxx_free_cmd during ABORT
     (Pascal de Bruijn + Himanshu Madhani + nab)

   - Fix iscsi-target long-standing issue with parallel delete of a
     single network portal across multiple target instances (Gary Guo +
     nab)

   - Fix target dynamic se_node GPF during uncached shutdown regression
     (Justin Maggard + nab)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix node_acl demo-mode + uncached dynamic shutdown regression
  iscsi-target: Fix iscsi_np reset hung task during parallel delete
  qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2)
  cxgbit: fix sg_nents calculation
  iscsi-target: fix invalid flags in text response
  iscsi-target: fix memory leak in iscsit_setup_text_cmd()
  cxgbit: add missing __kfree_skb()
  tcmu: free old string on reconfig
  tcmu: Fix possible to/from address overflow when doing the memcpy
2017-08-12 12:08:59 -07:00
Linus Torvalds 043cd07c55 xen: Fixes for 4.13-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJZjs9BAAoJELDendYovxMvXXAH/j3pdoshQbflSUBsDAkybhv5
 BVe7+bhtwnoawcjCpXq27SMY3qG/YWnATW28XjxBCoe3t7StNcJr5QGXTWMnTjwN
 f/YA0aqtCoLp9JhovTi9WTTCf1/I9CKYFBdCaAmLkDeMudyifZkbXiDbDe0UZmAc
 UJt0Jx8KrdMGkuRVp92049calluv+PDHO7gUpGpzoHDJ0IXc1cH9caHTbL+LhioY
 o0qqQOz9FnJQIvqSGYRkjXudmGwHYCr61yXvWhwqa4PE3Tzss2ckGtzZPLI8s1QN
 p5m01FbIMQKjLbwpQZaRWmGxSzY2vYxf/TShK8eIsBfRYxsR4d7cXULC2vIJGFI=
 =jiAk
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Some fixes for Xen:

   - a fix for a regression introduced in 4.13 for a Xen HVM-guest
     configured with KASLR

   - a fix for a possible deadlock in the xenbus driver when booting the
     system

   - a fix for lost interrupts in Xen guests"

* tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: Fix interrupt lost during irq_disable and irq_enable
  xen: avoid deadlock in xenbus
  xen: fix hvm guest with kaslr enabled
  xen: split up xen_hvm_init_shared_info()
  x86: provide an init_mem_mapping hypervisor hook
2017-08-12 09:01:36 -07:00
Linus Torvalds 216e4a1def Some more NFS client bugfixes for 4.13
Stable fix:
 - Fix leaking nfs4_ff_ds_version array
 
 Other fixes:
 - Improve TEST_STATEID OLD_STATEID handling to prevent recovery loop
 - Require 64-bit sector_t for pNFS blocklayout to prevent 32-bit compile
 errors
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlmOFIIACgkQ18tUv7Cl
 QOsaUQ/9E7lAP6yYp8HfjIBayN1gcme0ZeGzmWVdP8R9isvqTE0MjrwoNxk7h61H
 La/qUcymE32bMX8qYlDs0mw+yhiTcR/UoP5lS/4FCSUZoQsE6BWXoh+O9QlqEcuE
 mFbA9SV52Pf5Mdc/bTNKyh7jgCjeqzlu2sRo5LUM+N7G/M2a5RPfJVGVNYpOmVs/
 ay30B5tHG/K3eeXECLjFTw3HeMorsS2coTaxtX6RghqPoVF6OFZarMUt69IX3zgg
 jBjokz7YfaPSeOEIOapGGRRARHRBAaPE8TvAtRd45R2pMk+Lr12cFWLjT72wRCCM
 nXrTpJc+q8feje9YpT5yoKtgRnW6etxKM8dtyYrXG1NO+dfZHNIe2Z1ARplhzhV3
 Rt8lBV0N0b7kHZfyMJjYINhAbUxvS8UghRpljuHm4+f1lkoV6cVhKoaat/7MQDwZ
 I55M2Edl+A6wPQA7hpFuIT++PVN6GDK7D1rZTKaDBfZ3OCTOQLx0g1kZwHYs/lmk
 gvvtkj82RmbIPoG1rbxHTJFoQdVrpVCYAWr4rbgqNvUrZCjxTRmwRmyMpC/M1cXI
 noyZ/F+VdVLa0mADKMUmiQJ6QkoHjRIAIqlJbLRRl2VFlWHfu7hUiXk7hqt5ocQW
 cpxwird0Fur8cbEKVriRcwNpqGBrDDO7bv1lyQkwEOeHWZ6Fv9o=
 =1/Ms
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "A few more NFS client bugfixes from me for rc5.

  Dros has a stable fix for flexfiles to prevent leaking the
  nfs4_ff_ds_version arrays when freeing a layout, Trond fixed a
  potential recovery loop situation with the TEST_STATEID operation, and
  Christoph fixed up the pNFS blocklayout Kconfig options to prevent
  unsafe use with kernels that don't have large block device support.
  Summary:

  Stable fix:
   - fix leaking nfs4_ff_ds_version array

  Other fixes:
   - improve TEST_STATEID OLD_STATEID handling to prevent recovery loop

   - require 64-bit sector_t for pNFS blocklayout to prevent 32-bit
     compile errors"

* tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  pnfs/blocklayout: require 64-bit sector_t
  NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid()
  nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays
2017-08-11 13:54:09 -07:00
Linus Torvalds e0d0e045b8 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A set of fixes that should go into this series. This contains:

   - Fix from Bart for blk-mq requeue queue running, preventing a
     continued loop of run/restart.

   - Fix for a bio/blk-integrity issue, in two parts. One from
     Christoph, fixing where verification happens, and one from Milan,
     for a NULL profile.

   - NVMe pull request, most of the changes being for nvme-fc, but also
     a few trivial core/pci fixes"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme: fix directive command numd calculation
  nvme: fix nvme reset command timeout handling
  nvme-pci: fix CMB sysfs file removal in reset path
  lpfc: support nvmet_fc defer_rcv callback
  nvmet_fc: add defer_req callback for deferment of cmd buffer return
  nvme: strip trailing 0-bytes in wwid_show
  block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time
  bio-integrity: only verify integrity on the lowest stacked driver
  bio-integrity: Fix regression if profile verify_fn is NULL
2017-08-11 12:26:49 -07:00
Linus Torvalds 0993133bb8 MMC core:
- Fix lockdep splat when removing mmc_block module
  - Fix the logic for setting eMMC HS400ES signal voltage
 MMC host:
  - omap_hsmmc: Add CMD23 capability to fix -EIO errors
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZjb2FAAoJEP4mhCVzWIwpESsQAKCbrohX+klA9WqYFIjGvX7W
 J24E6w3E8vOrtVh1QCX3zjHAcuzF5LxeHj8YZBDSdxuaIA0Cj86ruNo/YZOku2kG
 sZQiYZoQV/t4CSjgy0pqSEjU1g/eTxgwLq6AxuxQ8xkZvRgVJdjUgHJ1Qzt1p/uu
 qMQefXhD0C/0b1mAI9VL0aMss+lJJjAxgwFIQsOUBkuTrXSI1LVI+mvpJRiltU5k
 A2XGxsYjzpOOB2vJr0OtszcHIwFOXYWURIB3vicW/Sq37AW+D6D0nt92CE3sipBe
 vNZvxhf0I5WgFVGJ0F9G+sgP7pQPpDouMw5eLMf7SjNwEqPwRKjfJ+iz5XmV/RPF
 DHKYR6Civ+5UrakJt4w0MwJ/0tBrFApcBgbOTVLQM2cRk31QPmUJbh3V6uO++htN
 9kTphS/oUNACwnyB1VF+cNxKmvjEtNTfiEUCiVy7+H4LmQKE3oV+Xh9SwpSCEail
 gCD7TJ/WLnOuG1Nx65Gm2qj/99mVldBEkVQs0JMF5Pv/dZZuDAybd8oWtfdkChqO
 T9UQqSkr3NApynhtxBwjmExfDBPa6ESto5iRMWfWOFIygwvkGzA9IQBYH23Pstry
 ayPmEgtb9qcEb4xKhvbcikoN86/MtygpeDOw5yJmgGqNV8VJ1h3XEa99dlFSZzQF
 J9/v1zaK90DOS3s/KjHN
 =jwOl
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC core:

   - fix lockdep splat when removing mmc_block module

   - fix the logic for setting eMMC HS400ES signal voltage

  MMC host:

   - omap_hsmmc: add CMD23 capability to fix -EIO errors"

* tag 'mmc-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: block: fix lockdep splat when removing mmc_block module
  mmc: mmc: correct the logic for setting HS400ES signal voltage
  mmc: host: omap_hsmmc: Add CMD23 capability to omap_hsmmc driver
2017-08-11 11:56:54 -07:00
Linus Torvalds 7eb97ba611 fbdev fixes for v4.13-rc5:
- allow user to disable write combined mapping in efifb driver (Dave Airlie)
 - fix use after free bugs on driver removal in imxfb driver (Dan Carpenter)
 - fix unused variable warning in omapfb driver (Arnd Bergmann)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZjbrIAAoJEH4ztj+gR8IL89MP/3ByMRtx9Hw8NkYwRr4zHLK/
 87d0i+JWeD4fhgdV72QD5IdrrW0JKhnCF1zaW3iAlE8DAv0sGHlDbFtV7PCnzrL8
 FNuUEWHCox5ediUD7D8WtN5G9GkNcvpxgyi1xI08trgb6wEUM9JR2o81vDzI1+sO
 M5ovBQOKMYmopwCsQr35Y3+FDJzMxT1cQGwP+oyp1GcS34EgIlxfLONyLAGDVCf5
 v4Zg6vMDe5qZpMg/jVHYlBum/UVmgit+R5+PPLIEDAvyoAbyL3aK6jnjOz7Fc8ka
 sWHmcPMOhBJPMyl3m1gO9ZrfLwd1SpbO/MJukVQMBd6rYInb85lZGBF5pJkSDLvZ
 fV0cozf0LJc1CgkjguZ8uPK1go1qHypwSZCg4bv+iPY/MRjCAlRG3xRLbuPMYLRA
 PaQRvzB3sk2IA79uyCUFMRNY+clAXNeuFTdaWsBot4Esdic+mkRjKBjQMdCra8Id
 y8amNH31lUIBKl4kyJ2wbqBW4HS5hmR53X8pzz6O9InBz10Ojy9OHgHezhSbQeW9
 xdo257BWUjIfWKSiNgfkHy4xhVDpIkSw8fjyJt+Zt9owL69/6I3YXgsEiO+BZer/
 wrt/tTMR/ktM0i39ZguY6T5MjmdCKY51ViDIPbmz4tVcLraQodLy2OaGY9zw9tLa
 krbjZeuFtdcWbip76gJM
 =HcUv
 -----END PGP SIGNATURE-----

Merge tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linux

Pull fbdev fixes from Bartlomiej Zolnierkiewicz:

 - allow user to disable write combined mapping in efifb driver (Dave
   Airlie)

 - fix use after free bugs on driver removal in imxfb driver (Dan
   Carpenter)

 - fix unused variable warning in omapfb driver (Arnd Bergmann)

* tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linux:
  efifb: allow user to disable write combined mapping.
  fbdev: omapfb: remove unused variable
  video: fbdev: imxfb: use after free in imxfb_remove()
2017-08-11 11:44:18 -07:00
Linus Torvalds 2bfc37cdef Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "Fix a few bugs in fuse"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: set mapping error in writepage_locked when it fails
  fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio
  fuse: initialize the flock flag in fuse_file on allocation
2017-08-11 11:20:48 -07:00
Linus Torvalds 7d7a827ba9 IOMMU Fix for Linux v4.13-rc4
- Fix a NULL-pointer dereference in arm_smmu_add_device
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZjcvrAAoJECvwRC2XARrjM60P/AxI7iXgqonciTD/pEbH2yQI
 CdWhLuUhSakG5H33E9WzVw1AhxIOs79AMjjf9YHxC2mPZjH3Mkhx3zIZKwV09u/D
 Exs0MNI7KV+lv/dJ/0ttaOuPFmv9McFxKSf1XHEzRZfEAYkwNf31XNlUf4NoR1rm
 6izUo0CvGZo+wD+GRHW+KlUxFWW+YH8eFaZ3G6ErqPsBBgfyDwHilx85DzWQUazU
 ocHuR6J6uBtTr44oyEsXWkS8S+dIAim73vdnvChWqI5HRPbPZY2lmZN2o5LTe/if
 5BGrpZy1C94d/chw5DuAanX/cFtXrc48j9iKqVVM+vqZrhTMZleogWHI7NUSt2dP
 LyZvFpBjEowErTdxY3dIP2u7vKMjHniMFoalx24kSUNoHxCnPZx5gfrTbw7qGBic
 Py+wVNGhjtNGDSCPIwfMwLw7Xduqugx4XKZ3yya3ka87UUEix42UEIsYwvQZ82O8
 KYvkdwL5oChJH7BPG+iz19I8hIoPm2HHdCOVdVrkcRH13W70MwvLuVXqEOZD3/K0
 kasZo1YZgNn5a7U0/t0EMWuDHTvWnJn469YPbhd8HMJjGU7mty1O3lPcNzpyftuq
 ps7eRFYc7O3ab5A2naAI5f4qPI0igXOeXXHiz87bgS1x5AbcIdIGJyqyTfN7arUE
 tvdC4IDJFiaCO+lI3+Tg
 =fhCL
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fix from Joerg Roedel:
 "Fix a NULL-pointer dereference in arm_smmu_add_device"

* tag 'iommu-fixes-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device
2017-08-11 11:15:51 -07:00
Christoph Hellwig 8a9d6e964d pnfs/blocklayout: require 64-bit sector_t
The blocklayout code does not compile cleanly for a 32-bit sector_t,
and also has no reliable checks for devices sizes, which makes it
unsafe to use with a kernel that doesn't support large block devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5c83746a0c ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11 14:10:13 -04:00
Linus Torvalds 8001a975f9 powerpc fixes for 4.13 #6
All fixes for code that went in this cycle.
 
  - A revert of an optimisation to the syscall exit path, which could lead to an
    oops on either older machines or machines with > 1T of memory.
  - Disable some deep idle states if the firmware configuration for them fails.
  - Re-enable HARD/SOFT lockup detectors in defconfigs after a Kconfig change.
  - Six fairly small patches fixing bugs in our new watchdog code.
 
 Thanks to:
   Gautham R. Shenoy, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZjZBeAAoJEFHr6jzI4aWA0RoP/jreL470i8sjPvm7EuF0+jWb
 D3gaXx55jPo65CLxDH/qMElKk1VxAgCpWHqVN9e/wd9qKxE+xwQ890Enlx4wQRQe
 I7AkQQoATsf05medbvihTcNRxMcOvLPCi4GP64W/aegaTGliKyFWAu7w/ZsSzGi+
 jaskMrBJAR2X4OBfbUi0UEE/KsvDkUje2qJj1NiFyhFFXLyizy4dQknYXVCFD2LR
 IJah0ZaZM0scOd7YDfsfybBbBDnwi4VqdHTLA5NFBB+sz55+bO9lVOYWpnUYLouH
 sdZjpjtCuIS+aYOLHupCDXzW2ntv2AVDe4pckTpXrJbUYypXlg8lVx6kfbJ2QETr
 NrNHljcmmqgqDSglUrDdEpkuQT6Y6hJMroUvo8h20Y+8WXNaKFA5VhhGLkQuBcE4
 v/osXs1aiiBURilWTDS5Jo1f5FTZT8ZkNr1/n5jC7q+Y/qkVFglr32hjuomwWxOz
 SmnwOOcIdJnykr3/pUCvOAffApLfzWFrZeW1MpV1NVlodrubKU939wYZT0yU86b3
 ZSs7pK07eZhfdepanG3nCDahhl2qxMyuEjglSfX8WDvVB2MdoxhoUeLwbyMV9qIZ
 5A7Anozod4tOrU+qxtPp3uQ230nbQ9JW9yIQM/tdr88k8swpo3NO4O5kfcmH8FVN
 CBG5pCRNXIZhtJiYVAiv
 =Vy1K
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "All fixes for code that went in this cycle.

   - a revert of an optimisation to the syscall exit path, which could
     lead to an oops on either older machines or machines with > 1TB of
     memory

   - disable some deep idle states if the firmware configuration for
     them fails

   - re-enable HARD/SOFT lockup detectors in defconfigs after a Kconfig
     change

   - six fairly small patches fixing bugs in our new watchdog code

  Thanks to: Gautham R Shenoy, Nicholas Piggin"

* tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/watchdog: add locking around init/exit functions
  powerpc/watchdog: Fix marking of stuck CPUs
  powerpc/watchdog: Fix final-check recovered case
  powerpc/watchdog: Moderate touch_nmi_watchdog overhead
  powerpc/watchdog: Improve watchdog lock primitive
  powerpc: NMI IPI improve lock primitive
  powerpc/configs: Re-enable HARD/SOFT lockup detectors
  powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails
  Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"
2017-08-11 08:56:01 -07:00
Artem Savkov a7990c647b iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device
Commit c54451a "iommu/arm-smmu: Fix the error path in arm_smmu_add_device"
removed fwspec assignment in legacy_binding path as redundant which is
wrong. It needs to be updated after fwspec initialisation in
arm_smmu_register_legacy_master() as it is dereferenced later. Without
this there is a NULL-pointer dereference panic during boot on some hosts.

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-08-11 16:56:51 +02:00
Liu Shuo 020db9d3c1 xen/events: Fix interrupt lost during irq_disable and irq_enable
Here is a device has xen-pirq-MSI interrupt. Dom0 might lost interrupt
during driver irq_disable/irq_enable. Here is the scenario,
 1. irq_disable -> disable_dynirq -> mask_evtchn(irq channel)
 2. dev interrupt raised by HW and Xen mark its evtchn as pending
 3. irq_enable -> startup_pirq -> eoi_pirq ->
    clear_evtchn(channel of irq) -> clear pending status
 4. consume_one_event process the irq event without pending bit assert
    which result in interrupt lost once
 5. No HW interrupt raising anymore.

Now use enable_dynirq for enable_pirq of xen_pirq_chip to remove
eoi_pirq when irq_enable.

Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-11 16:46:01 +02:00
Juergen Gross 529871bb3c xen: avoid deadlock in xenbus
When starting the xenwatch thread a theoretical deadlock situation is
possible:

xs_init() contains:

    task = kthread_run(xenwatch_thread, NULL, "xenwatch");
    if (IS_ERR(task))
        return PTR_ERR(task);
    xenwatch_pid = task->pid;

And xenwatch_thread() does:

    mutex_lock(&xenwatch_mutex);
    ...
    event->handle->callback();
    ...
    mutex_unlock(&xenwatch_mutex);

The callback could call unregister_xenbus_watch() which does:

    ...
    if (current->pid != xenwatch_pid)
        mutex_lock(&xenwatch_mutex);
    ...

In case a watch is firing before xenwatch_pid could be set and the
callback of that watch unregisters a watch, then a self-deadlock would
occur.

Avoid this by setting xenwatch_pid in xenwatch_thread().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-11 16:45:56 +02:00
Jens Axboe 4a8b53be64 Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Christoph:

"A few more small fixes - the fc/lpfc update is the biggest by far."
2017-08-11 08:07:19 -06:00
Juergen Gross 4ca83dcf4e xen: fix hvm guest with kaslr enabled
A Xen HVM guest running with KASLR enabled will die rather soon today
because the shared info page mapping is using va() too early. This was
introduced by commit a5d5f328b0 ("xen:
allocate page for shared info page from low memory").

In order to fix this use early_memremap() to get a temporary virtual
address for shared info until va() can be used safely.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-11 15:50:26 +02:00
Juergen Gross 10231f69eb xen: split up xen_hvm_init_shared_info()
Instead of calling xen_hvm_init_shared_info() on boot and resume split
it up into a boot time function searching for the pfn to use and a
mapping function doing the hypervisor mapping call.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-11 15:50:24 +02:00
Juergen Gross c138d81163 x86: provide an init_mem_mapping hypervisor hook
Provide a hook in hypervisor_x86 called after setting up initial
memory mapping.

This is needed e.g. by Xen HVM guests to map the hypervisor shared
info page.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-11 15:50:21 +02:00
Jeff Layton 9183976ef1 fuse: set mapping error in writepage_locked when it fails
This ensures that we see errors on fsync when writeback fails.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-08-11 11:38:26 +02:00
Linus Torvalds b2dbdf2ca1 i915, msm, nouveau, rockchip, exynos, etnaviv, core fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZjSXnAAoJEAx081l5xIa+G70QAKF+APi228YgYrLBEZ5Y/lh5
 KaMT/vfKXcB1tEP7wVFGdeihDCICQTqtOtpd5Q+XwJbLlfP8kxGRntdotzsJwb9I
 m2okKW6umOhvC3jWUIgDzpitqOrWOPoQFPmiLf2er9RLfQDE7gmhB53obHcQahBt
 cplFwVN/k3oywGd0iJ6evRqD/ShSgy/m2tz2w7Pd8mGSJwEGZvvbYTMm3/s7Dbxh
 KNIfi7tHfQ9LULayjD2FJQ8o5/PN9A5bAVLraXj0XtOehSzwt+d7trvKPc1aZl9A
 Y6O7SUjaoqV5fkwqsosCZpyR+KVBwmXe8k4yA4hm2pHqpvzbfCiSzJhJg7h/klwI
 uZowRhNxxvBwt01CZ0rd8zFkZ8HJ28LsKfyF0BG2haJn7g7gZutQzLKRrFnnND1J
 jGGqVkH+US2Uqewmpr0h5lR7pglxBRXCnpI7ZTH9wNtiyf7z9wHhLrkYEveXcVf6
 DIEEf103C7lro+cE+iJU0OZPgLRWePqIrAmZtLD1VQr9jbXhkJphFpTb+0OJhpeb
 KscH4m8Q+T0JlqcyIBk+K2ZCT+zxM0LMohXrlwfd+QU38nIbgWXJY01lGyhXLhMS
 aSImarAGAb1xNru24ylsUgbALRXQfJNTxeNNUu55/MVLi2rSFvm0tb/yy1oU5Bpr
 J+guLSplCdrRQ81RTvR2
 =rjAR
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Nothing too earth shattering here, it just seems like lots of little
  things all over the place.

  msm has probably the larger amount of changes, but they all seem fine,
  otherwise, some rockchip, i915, etnaviv and exynos fixes, along with
  one nouveau regression fix for some older GPUs"

* tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux: (35 commits)
  drm/nouveau/disp/nv04: avoid creation of output paths
  drm: make DRM_STM default n
  drm/exynos: forbid creating framebuffers from too small GEM buffers
  drm/etnaviv: Fix off-by-one error in reloc checking
  drm/i915: fix backlight invert for non-zero minimum brightness
  drm/i915/shrinker: Wrap need_resched() inside preempt-disable
  drm/i915/perf: fix flex eu registers programming
  drm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut
  drm/i915/gvt: Change the max length of mmio_reg_rw from 4 to 8
  drm/i915/gvt: Initialize MMIO Block with HW state
  drm/rockchip: vop: report error when check resource error
  drm/rockchip: vop: round_up pitches to word align
  drm/rockchip: vop: fix NV12 video display error
  drm/rockchip: vop: fix iommu page fault when resume
  drm/i915/gvt: clean workload queue if error happened
  drm/i915/gvt: change resetting to resetting_eng
  drm/msm: gpu: don't abuse dma_alloc for non-DMA allocations
  drm/msm: gpu: call qcom_mdt interfaces only for ARCH_QCOM
  drm/msm/adreno: Prevent unclocked access when retrieving timestamps
  drm/msm: Remove __user from __u64 data types
  ...
2017-08-10 22:33:47 -07:00
Linus Torvalds 27df704d43 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "21 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
  userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
  zram: rework copy of compressor name in comp_algorithm_store()
  rmap: do not call mmu_notifier_invalidate_page() under ptl
  mm: fix list corruptions on shmem shrinklist
  mm/balloon_compaction.c: don't zero ballooned pages
  MAINTAINERS: copy virtio on balloon_compaction.c
  mm: fix KSM data corruption
  mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
  mm: make tlb_flush_pending global
  mm: refactor TLB gathering API
  Revert "mm: numa: defer TLB flush for THP migration as long as possible"
  mm: migrate: fix barriers around tlb_flush_pending
  mm: migrate: prevent racy access to tlb_flush_pending
  fault-inject: fix wrong should_fail() decision in task context
  test_kmod: fix small memory leak on filesystem tests
  test_kmod: fix the lock in register_test_dev_kmod()
  test_kmod: fix bug which allows negative values on two config options
  test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
  userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
  mm: ratelimit PFNs busy info message
  ...
2017-08-10 16:20:52 -07:00
Mike Rapoport e86b298beb userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
When the process exit races with outstanding mcopy_atomic, it would be
better to return ESRCH error.  When such race occurs the process and
it's mm are going away and returning "no such process" to the uffd
monitor seems better fit than ENOSPC.

Link: http://lkml.kernel.org/r/1502111545-32305-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Matthias Kaehlcke f357e345ee zram: rework copy of compressor name in comp_algorithm_store()
comp_algorithm_store() passes the size of the source buffer to strlcpy()
instead of the destination buffer size.  Make it explicit that the two
buffers have the same size and use strcpy() instead of strlcpy().  The
latter can be done safely since the function ensures that the string in
the source buffer is terminated.

Link: http://lkml.kernel.org/r/20170803163350.45245-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Kirill A. Shutemov aac2fea94f rmap: do not call mmu_notifier_invalidate_page() under ptl
MMU notifiers can sleep, but in page_mkclean_one() we call
mmu_notifier_invalidate_page() under page table lock.

Let's instead use mmu_notifier_invalidate_range() outside
page_vma_mapped_walk() loop.

[jglisse@redhat.com: try_to_unmap_one() do not call mmu_notifier under ptl]
  Link: http://lkml.kernel.org/r/20170809204333.27485-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170804134928.l4klfcnqatni7vsc@black.fi.intel.com
Fixes: c7ab0d2fdc ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reported-by: axie <axie@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Writer, Tim" <Tim.Writer@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Cong Wang d041353dc9 mm: fix list corruptions on shmem shrinklist
We saw many list corruption warnings on shmem shrinklist:

  WARNING: CPU: 18 PID: 177 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0
  list_del corruption. prev->next should be ffff9ae5694b82d8, but was ffff9ae5699ba960
  Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
  CPU: 18 PID: 177 Comm: kswapd1 Not tainted 4.9.34-t3.el7.twitter.x86_64 #1
  Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
  Call Trace:
    dump_stack+0x4d/0x66
    __warn+0xcb/0xf0
    warn_slowpath_fmt+0x4f/0x60
    __list_del_entry+0x9e/0xc0
    shmem_unused_huge_shrink+0xfa/0x2e0
    shmem_unused_huge_scan+0x20/0x30
    super_cache_scan+0x193/0x1a0
    shrink_slab.part.41+0x1e3/0x3f0
    shrink_slab+0x29/0x30
    shrink_node+0xf9/0x2f0
    kswapd+0x2d8/0x6c0
    kthread+0xd7/0xf0
    ret_from_fork+0x22/0x30

  WARNING: CPU: 23 PID: 639 at lib/list_debug.c:33 __list_add+0x89/0xb0
  list_add corruption. prev->next should be next (ffff9ae5699ba960), but was ffff9ae5694b82d8. (prev=ffff9ae5694b82d8).
  Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
  CPU: 23 PID: 639 Comm: systemd-udevd Tainted: G        W       4.9.34-t3.el7.twitter.x86_64 #1
  Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
  Call Trace:
    dump_stack+0x4d/0x66
    __warn+0xcb/0xf0
    warn_slowpath_fmt+0x4f/0x60
    __list_add+0x89/0xb0
    shmem_setattr+0x204/0x230
    notify_change+0x2ef/0x440
    do_truncate+0x5d/0x90
    path_openat+0x331/0x1190
    do_filp_open+0x7e/0xe0
    do_sys_open+0x123/0x200
    SyS_open+0x1e/0x20
    do_syscall_64+0x61/0x170
    entry_SYSCALL64_slow_path+0x25/0x25

The problem is that shmem_unused_huge_shrink() moves entries from the
global sbinfo->shrinklist to its local lists and then releases the
spinlock.  However, a parallel shmem_setattr() could access one of these
entries directly and add it back to the global shrinklist if it is
removed, with the spinlock held.

The logic itself looks solid since an entry could be either in a local
list or the global list, otherwise it is removed from one of them by
list_del_init().  So probably the race condition is that, one CPU is in
the middle of INIT_LIST_HEAD() but the other CPU calls list_empty()
which returns true too early then the following list_add_tail() sees a
corrupted entry.

list_empty_careful() is designed to fix this situation.

[akpm@linux-foundation.org: add comments]
Link: http://lkml.kernel.org/r/20170803054630.18775-1-xiyou.wangcong@gmail.com
Fixes: 779750d20b ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Wei Wang af54aed94b mm/balloon_compaction.c: don't zero ballooned pages
Revert commit bb01b64cfa ("mm/balloon_compaction.c: enqueue zero page
to balloon device")'

Zeroing ballon pages is rather time consuming, especially when a lot of
pages are in flight. E.g. 7GB worth of ballooned memory takes 2.8s with
__GFP_ZERO while it takes ~491ms without it.

The original commit argued that zeroing will help ksmd to merge these
pages on the host but this argument is assuming that the host actually
marks balloon pages for ksm which is not universally true.  So we pay
performance penalty for something that even might not be used in the end
which is wrong.  The host can zero out pages on its own when there is a
need.

[mhocko@kernel.org: new changelog text]
Link: http://lkml.kernel.org/r/1501761557-9758-1-git-send-email-wei.w.wang@intel.com
Fixes: bb01b64cfa ("mm/balloon_compaction.c: enqueue zero page to balloon device")
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: zhenwei.pi <zhenwei.pi@youruncloud.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Michael S. Tsirkin c0a6a5ae6b MAINTAINERS: copy virtio on balloon_compaction.c
Changes to mm/balloon_compaction.c can easily break virtio, and virtio
is the only user of that interface.  Add a line to MAINTAINERS so
whoever changes that file remembers to copy us.

Link: http://lkml.kernel.org/r/1501764010-24456-1-git-send-email-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Minchan Kim b3a81d0841 mm: fix KSM data corruption
Nadav reported KSM can corrupt the user data by the TLB batching
race[1].  That means data user written can be lost.

Quote from Nadav Amit:
 "For this race we need 4 CPUs:

  CPU0: Caches a writable and dirty PTE entry, and uses the stale value
  for write later.

  CPU1: Runs madvise_free on the range that includes the PTE. It would
  clear the dirty-bit. It batches TLB flushes.

  CPU2: Writes 4 to /proc/PID/clear_refs , clearing the PTEs soft-dirty.
  We care about the fact that it clears the PTE write-bit, and of
  course, batches TLB flushes.

  CPU3: Runs KSM. Our purpose is to pass the following test in
  write_protect_page():

	if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) ||
	    (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)))

  Since it will avoid TLB flush. And we want to do it while the PTE is
  stale. Later, and before replacing the page, we would be able to
  change the page.

  Note that all the operations the CPU1-3 perform canhappen in parallel
  since they only acquire mmap_sem for read.

  We start with two identical pages. Everything below regards the same
  page/PTE.

  CPU0        CPU1        CPU2        CPU3
  ----        ----        ----        ----
  Write the same
  value on page

  [cache PTE as
   dirty in TLB]

              MADV_FREE
              pte_mkclean()

                          4 > clear_refs
                          pte_wrprotect()

                                      write_protect_page()
                                      [ success, no flush ]

                                      pages_indentical()
                                      [ ok ]

  Write to page
  different value

  [Ok, using stale
   PTE]

                                      replace_page()

  Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late.
  CPU0 already wrote on the page, but KSM ignored this write, and it got
  lost"

In above scenario, MADV_FREE is fixed by changing TLB batching API
including [set|clear]_tlb_flush_pending.  Remained thing is soft-dirty
part.

This patch changes soft-dirty uses TLB batching API instead of
flush_tlb_mm and KSM checks pending TLB flush by using
mm_tlb_flush_pending so that it will flush TLB to avoid data lost if
there are other parallel threads pending TLB flush.

[1] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com

Link: http://lkml.kernel.org/r/20170802000818.4760-8-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Tested-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Minchan Kim 99baac21e4 mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
problem and Mel fixed it[1] and found same problem on MADV_FREE[2].

Quote from Mel Gorman:
 "The race in question is CPU 0 running madv_free and updating some PTEs
  while CPU 1 is also running madv_free and looking at the same PTEs.
  CPU 1 may have writable TLB entries for a page but fail the pte_dirty
  check (because CPU 0 has updated it already) and potentially fail to
  flush.

  Hence, when madv_free on CPU 1 returns, there are still potentially
  writable TLB entries and the underlying PTE is still present so that a
  subsequent write does not necessarily propagate the dirty bit to the
  underlying PTE any more. Reclaim at some unknown time at the future
  may then see that the PTE is still clean and discard the page even
  though a write has happened in the meantime. I think this is possible
  but I could have missed some protection in madv_free that prevents it
  happening."

This patch aims for solving both problems all at once and is ready for
other problem with KSM, MADV_FREE and soft-dirty story[3].

TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can
catch there are parallel threads going on.  In that case, forcefully,
flush TLB to prevent for user to access memory via stale TLB entry
although it fail to gather page table entry.

I confirmed this patch works with [4] test program Nadav gave so this
patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
v2" in current mmotm.

NOTE:

This patch modifies arch-specific TLB gathering interface(x86, ia64,
s390, sh, um).  It seems most of architecture are straightforward but
s390 need to be careful because tlb_flush_mmu works only if
mm->context.flush_mm is set to non-zero which happens only a pte entry
really is cleared by ptep_get_and_clear and friends.  However, this
problem never changes the pte entries but need to flush to prevent
memory access from stale tlb.

[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/

[minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
  Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Minchan Kim 0a2dd266dd mm: make tlb_flush_pending global
Currently, tlb_flush_pending is used only for CONFIG_[NUMA_BALANCING|
COMPACTION] but upcoming patches to solve subtle TLB flush batching
problem will use it regardless of compaction/NUMA so this patch doesn't
remove the dependency.

[akpm@linux-foundation.org: remove more ifdefs from world's ugliest printk statement]
Link: http://lkml.kernel.org/r/20170802000818.4760-6-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Minchan Kim 56236a5955 mm: refactor TLB gathering API
This patch is a preparatory patch for solving race problems caused by
TLB batch.  For that, we will increase/decrease TLB flush pending count
of mm_struct whenever tlb_[gather|finish]_mmu is called.

Before making it simple, this patch separates architecture specific part
and rename it to arch_tlb_[gather|finish]_mmu and generic part just
calls it.

It shouldn't change any behavior.

Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Nadav Amit a9b802500e Revert "mm: numa: defer TLB flush for THP migration as long as possible"
While deferring TLB flushes is a good practice, the reverted patch
caused pending TLB flushes to be checked while the page-table lock is
not taken.  As a result, in architectures with weak memory model (PPC),
Linux may miss a memory-barrier, miss the fact TLB flushes are pending,
and cause (in theory) a memory corruption.

Since the alternative of using smp_mb__after_unlock_lock() was
considered a bit open-coded, and the performance impact is expected to
be small, the previous patch is reverted.

This reverts b0943d61b8 ("mm: numa: defer TLB flush for THP migration
as long as possible").

Link: http://lkml.kernel.org/r/20170802000818.4760-4-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Nadav Amit 0a2c40487f mm: migrate: fix barriers around tlb_flush_pending
Reading tlb_flush_pending while the page-table lock is taken does not
require a barrier, since the lock/unlock already acts as a barrier.
Removing the barrier in mm_tlb_flush_pending() to address this issue.

However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
while the page-table lock is already released, which may present a
problem on architectures with weak memory model (PPC).  To deal with
this case, a new parameter is added to mm_tlb_flush_pending() to
indicate if it is read without the page-table lock taken, and calling
smp_mb__after_unlock_lock() in this case.

Link: http://lkml.kernel.org/r/20170802000818.4760-3-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Nadav Amit 16af97dc5a mm: migrate: prevent racy access to tlb_flush_pending
Patch series "fixes of TLB batching races", v6.

It turns out that Linux TLB batching mechanism suffers from various
races.  Races that are caused due to batching during reclamation were
recently handled by Mel and this patch-set deals with others.  The more
fundamental issue is that concurrent updates of the page-tables allow
for TLB flushes to be batched on one core, while another core changes
the page-tables.  This other core may assume a PTE change does not
require a flush based on the updated PTE value, while it is unaware that
TLB flushes are still pending.

This behavior affects KSM (which may result in memory corruption) and
MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior).  A
proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED.
Memory corruption in KSM is harder to produce in practice, but was
observed by hacking the kernel and adding a delay before flushing and
replacing the KSM page.

Finally, there is also one memory barrier missing, which may affect
architectures with weak memory model.

This patch (of 7):

Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work().  If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.

This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending.  The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.

An actual data corruption was not observed, yet the race was was
confirmed by adding assertion to check tlb_flush_pending is not set by
two threads, adding artificial latency in change_protection_range() and
using sysctl to reduce kernel.numa_balancing_scan_delay_ms.

Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com
Fixes: 2084140594 ("mm: fix TLB flush race between migration, and
change_protection_range")
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
Akinobu Mita 9eeb52ae71 fault-inject: fix wrong should_fail() decision in task context
Commit 1203c8e6fb ("fault-inject: simplify access check for fail-nth")
unintentionally broke a conditional statement in should_fail().  Any
faults are not injected in the task context by the change when the
systematic fault injection is not used.

This change restores to the previous correct behaviour.

Link: http://lkml.kernel.org/r/1501633700-3488-1-git-send-email-akinobu.mita@gmail.com
Fixes: 1203c8e6fb ("fault-inject: simplify access check for fail-nth")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Tested-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:06 -07:00
Dan Carpenter 4e98ebe5f4 test_kmod: fix small memory leak on filesystem tests
The break was in the wrong place so file system tests don't work as
intended, leaking memory at each test switch.

[mcgrof@kernel.org: massaged commit subject, noted memory leak issue without the fix]
Link: http://lkml.kernel.org/r/20170802211450.27928-6-mcgrof@kernel.org
Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reported-by: David Binderman <dcb314@hotmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:06 -07:00
Dan Carpenter 9c56771316 test_kmod: fix the lock in register_test_dev_kmod()
We accidentally just drop the lock twice instead of taking it and then
releasing it.  This isn't a big issue unless you are adding more than
one device to test on, and the kmod.sh doesn't do that yet, however this
obviously is the correct thing to do.

[mcgrof@kernel.org: massaged subject, explain what happens]
Link: http://lkml.kernel.org/r/20170802211450.27928-5-mcgrof@kernel.org
Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:06 -07:00
Luis R. Rodriguez 434b06ae23 test_kmod: fix bug which allows negative values on two config options
Parsing with kstrtol() enables values to be negative, and we failed to
check for negative values when parsing with test_dev_config_update_uint_sync()
or test_dev_config_update_uint_range().

test_dev_config_update_uint_range() has a minimum check though so an
issue is not present there.  test_dev_config_update_uint_sync() is only
used for the number of threads to use (config_num_threads_store()), and
indeed this would fail with an attempt for a large allocation.

Although the issue is only present in practice with the first fix both
by using kstrtoul() instead of kstrtol().

Link: http://lkml.kernel.org/r/20170802211450.27928-4-mcgrof@kernel.org
Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:06 -07:00
Colin Ian King a4afe8cdec test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
Trivial fix to spelling mistake in snprintf text

[mcgrof@kernel.org: massaged commit message]
Link: http://lkml.kernel.org/r/20170802211450.27928-3-mcgrof@kernel.org
Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:06 -07:00
Andrea Arcangeli 5af10dfd0a userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
huge_add_to_page_cache->add_to_page_cache implicitly unlocks the page
before returning in case of errors.

The error returned was -EEXIST by running UFFDIO_COPY on a non-hole
offset of a VM_SHARED hugetlbfs mapping.  It was an userland bug that
triggered it and the kernel must cope with it returning -EEXIST from
ioctl(UFFDIO_COPY) as expected.

  page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
  kernel BUG at mm/filemap.c:964!
  invalid opcode: 0000 [#1] SMP
  CPU: 1 PID: 22582 Comm: qemu-system-x86 Not tainted 4.11.11-300.fc26.x86_64 #1
  RIP: unlock_page+0x4a/0x50
  Call Trace:
    hugetlb_mcopy_atomic_pte+0xc0/0x320
    mcopy_atomic+0x96f/0xbe0
    userfaultfd_ioctl+0x218/0xe90
    do_vfs_ioctl+0xa5/0x600
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x1a/0xa9

Link: http://lkml.kernel.org/r/20170802165145.22628-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:06 -07:00
Jonathan Toppins 75dddef325 mm: ratelimit PFNs busy info message
The RDMA subsystem can generate several thousand of these messages per
second eventually leading to a kernel crash.  Ratelimit these messages
to prevent this crash.

Doug said:
 "I've been carrying a version of this for several kernel versions. I
  don't remember when they started, but we have one (and only one) class
  of machines: Dell PE R730xd, that generate these errors. When it
  happens, without a rate limit, we get rcu timeouts and kernel oopses.
  With the rate limit, we just get a lot of annoying kernel messages but
  the machine continues on, recovers, and eventually the memory
  operations all succeed"

And:
 "> Well... why are all these EBUSY's occurring? It sounds inefficient
  > (at least) but if it is expected, normal and unavoidable then
  > perhaps we should just remove that message altogether?

  I don't have an answer to that question. To be honest, I haven't
  looked real hard. We never had this at all, then it started out of the
  blue, but only on our Dell 730xd machines (and it hits all of them),
  but no other classes or brands of machines. And we have our 730xd
  machines loaded up with different brands and models of cards (for
  instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an
  ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines
  meant it wasn't tied to any particular brand/model of RDMA hardware.
  To me, it always smelled of a hardware oddity specific to maybe the
  CPUs or mainboard chipsets in these machines, so given that I'm not an
  mm expert anyway, I never chased it down.

  A few other relevant details: it showed up somewhere around 4.8/4.9 or
  thereabouts. It never happened before, but the prinkt has been there
  since the 3.18 days, so possibly the test to trigger this message was
  changed, or something else in the allocator changed such that the
  situation started happening on these machines?

  And, like I said, it is specific to our 730xd machines (but they are
  all identical, so that could mean it's something like their specific
  ram configuration is causing the allocator to hit this on these
  machine but not on other machines in the cluster, I don't want to say
  it's necessarily the model of chipset or CPU, there are other bits of
  identicalness between these machines)"

Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.com
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:06 -07:00