Commit Graph

14253 Commits

Author SHA1 Message Date
Andrey Zhizhikin e9646ca701 This is the 5.4.132 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDu+p0ACgkQONu9yGCS
 aT5SOw/9F58e4gz7PSTn4A9oCTNodRPe9B9rzf3y1Ol0k7T1aeQoWsPFOkZpNSOJ
 tdOGEXnwYnLpMC7nuFshWv1uKGAL/weHADyGV6J37AntYFjpEFhJhSH7pGGhDk7V
 EeIl98luBynPXOKNnDvcrQweeRaHKOInQBT8JJzwwsZbF2oqfOqdU0A787BiRu+3
 zoi/mV0upDB443ji/JY0xj+o4jlbsuD0WxEqgkcD2YHL+QvU5Wr0mGys7m5gG9x7
 TpKpMic0ILrF1vt/znLL5rOlX497prTvZ74ZXV/DYizeYxqtl/UG3CZjo1uf2yqk
 pAXA57paz6DY2Ct+3QbJBeuer27bTz6SCClSS1om9AcUk6oNSdULmMdTGvQb0SLU
 wx1Cy8b2ei04SVl96+McKKZ6ln47LJediGn0qIdwC6O/XHHrLq4u5PkSnQxRU4pA
 GH1tP5oYy4GzL9RbBeiDJQETFiXwkexSEWVyuSc6BhqQXao9yVzmLQbL1zgjH/zO
 m/tckZ3vEg+ll8j4QJCisHRyqYhwfru4PsJQH9Q7q6CtIuGOsd0Z/OUcLuF6knXg
 jDOrDIykE/PnkQ2Dc2RhdONP1ud5j3oBnHvNHs6FDghRKjaixMQzg3g/RNtnAaTj
 +7Xsfbi6ntpZSDOaY7YNgt+ZH3l4YRnUL/xBA6qIygayz374nzI=
 =LU0G
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmD25ikACgkQ7G51OISz
 Hs1dcw/8DKef1hGC5O2WKfpInTYtgnClkyD5/yOnGAPMvMRDGybA3dRejpIEefNM
 Qol1XICjb0wdBDV0I+n+fnGbBgaX3g0N/pn16pdbbSPBBe1L+d97gZNZznDGHYZu
 033qtbxii8e0QTxTvO7nx4L80ZZsyPLchpPxowS/vd1Ezti+pTIU4y43MCc2jYLL
 KqUBDz72TkPLhgVZdDJ1z9gb+OoJ+sJPaeBrO57hpY/os9SxlMPeY56YrD3Hyfy5
 IHZw3bTCDiIXpHBaJG8fvuudaM5M8V3dbD6oXnEPo1Gzb1Y7WR4Z7q28g7arSYjP
 fMPd243mCXd1V7LpmapxXvFsnbdsA7oauTho50dwmEvxQf9jEgX6thBWAFsrItaS
 crHdOppS7Lc3FK8cTMxZd6ZyZpaU6sF183tMOteuhtwmF/uoy1LBHqLnAvtfWYrl
 InGcImgABRkiYBRyODlgC4UNLd49Svon/8HcbBZlmeGIkosXjo5r1itnipgnF/TB
 /NkHRkixYTBCnJZyx+9Lihqw+HMnHVfjOnIBjbXjzX9ITH/tiMn4y87E+x9vRQqr
 Td5AKJwiSXSWZBQoX+XNLqXRwjZKHVQe45J4gzL9dhCzi9bwK99BvBPWr8+JyI7w
 83YQfkhPju47+KFrEN6DUBxdYrROsJLsjgdTl38IlCi4SKoQSkE=
 =Li9I
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.132' into 5.4-2.3.x-imx

This is the 5.4.132 stable release

Conflicts (manual resolve):
- drivers/gpu/drm/rockchip/cdn-dp-core.c:
Fix merge hiccup when integrating upstream commit 450c25b8a4
("drm/rockchip: cdn-dp-core: add missing clk_disable_unprepare() on
error in cdn_dp_grf_write()")

- drivers/perf/fsl_imx8_ddr_perf.c:
Port upstream commit 3fea9b708a ("drivers/perf: fix the missed
ida_simple_remove() in ddr_perf_probe()") manually to NXP version.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-07-20 15:04:13 +00:00
Miaohe Lin 49496327c2 mm/z3fold: fix potential memory leak in z3fold_destroy_pool()
[ Upstream commit dac0d1cfda56472378d330b1b76b9973557a7b1d ]

There is a memory leak in z3fold_destroy_pool() as it forgets to
free_percpu pool->unbuddied.  Call free_percpu for pool->unbuddied to fix
this issue.

Link: https://lkml.kernel.org/r/20210619093151.1492174-6-linmiaohe@huawei.com
Fixes: d30561c56f ("z3fold: use per-cpu unbuddied lists")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:47 +02:00
Miaohe Lin 4b515fa948 mm/huge_memory.c: don't discard hugepage if other processes are mapping it
[ Upstream commit babbbdd08af98a59089334eb3effbed5a7a0cf7f ]

If other processes are mapping any other subpages of the hugepage, i.e.
in pte-mapped thp case, page_mapcount() will return 1 incorrectly.  Then
we would discard the page while other processes are still mapping it.  Fix
it by using total_mapcount() which can tell whether other processes are
still mapping it.

Link: https://lkml.kernel.org/r/20210511134857.1581273-6-linmiaohe@huawei.com
Fixes: b8d3c4c300 ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called")
Reviewed-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:47 +02:00
Andrey Zhizhikin 3aef3ef268 Linux 5.4.129
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmDcbxkACgkQ3qZv95d3
 LNxZMBAArNPLhVYdEDDFosb6Y/5RGjjZ/79OGHH0p5YiTo8D+wBHi+wXRl5Jp0PA
 3YVVU8lDTbeDm7E7uWeduWjFwEpsPBL8395scbhC6VR3PfnyunjarVXZgi6EHnMl
 p6HjXXtQ1jTrdDSziGDIhZVQT5FGb2/MMx9m69mfi5BTLjGfWy8chHFbC2GZszlp
 Znu9syjisUBbc4I4XHFgXw0hoQSSig6SUTZCrdTpIW/PZ0swfl8ZPxREh0CZNMpw
 Y2orRt+oHlkWPw1/sSkoTE1PRvXwNWFXyw5caOu846jAfhKtxO54SsqJqhM7VLHZ
 pdH4eb6q7AFyt0A62HkIqa5oabs5Vk9G24b8m5ggc2F/UTkHqgwUcMCud0d3DYL0
 Q7OEAmThQzHHKJ+CeNRJLsiKqVBNHmeS24B+ELldlAiX22vLr9pUsIb342Au1ZjR
 S3BTnneAbYGBv4qUoV2yUF9wQ/LxsFMSl/vmjCBOxg7c3LbKYChUwskYnvd6EwWj
 ObCyLU6FK9HWXSBSp/X+irlF1CLla+HuOC+Aej2U5a8DtmHId4LHMeq/XOxZ9s/8
 QUoX4rh5P+TJ8PIiTqXKrQo5rnR79MiYssIhUozKTdt9ZoMtXzI4mVLXN/yzAVD9
 v4aWYx8m2x17Wq+ptaLMSTSed4m3c25uEl4MucLBmKQV8ClAxW8=
 =Sijo
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmDchOoACgkQ7G51OISz
 Hs3s9Q/+OfeIcTzOY6qDX53PSxK5iF3xZEiPR5HOSVu1cY4+1tyt0bkYdQJ2N//d
 NH6YCuOsVI5chRxKN8b6I01U6xxQ0VNvj5RYmuC0AqhhzJBpfhh6FrrGuVxP+dCj
 aa2twfRZO+Y0yzrIYdSvYdRhJlEhMbKGNyHu871tyQQTlp0eQKy9lBhcCuVSCMso
 p64lRlK35GDi6TaWQ7R0SYOziWWSHByf4p3h4ZlpyhB7w6yMNjNrts4qSoZb2uOY
 I453WroS7tWV3qkOZ9FeqjHyg4mjBg87/wSMZ0DbBANinlQZW5YTOjE/WyPossL3
 C4jTP8NBLsng+ZfhAaMRfbvxGj5gbY0ghSOmsuglAOQDd+tlTfB23pkgHFlj6aie
 FUplJnTLd8VXPWkoNVKH6GfG2rZAUtQKM4EWDPt2KK3Dch9W3YH1bHL9EyxR0G7f
 aDBHM9egTyPOYNcfeVDj0KEJTrOxkmCcYz83D+eEDGKHi//Gcb+k9Gsg6q8X6Yxa
 ejFx5RBM5SW57LA70ZK3+HWhqfqcILYJT7Lxoue7bH+15vSGnTkmJKS80OUrP8w8
 y9gQ9WNsmjQUHvtKPAIZUZYlUS05FhtQ+faC8XyrLnfZj+ZCGBZor/KsNe0SP/UN
 bAYudfCn0YkjGfTNcljOe4oAnwmnVKAjGlQkHOFw+llmM0czEBk=
 =vpHo
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.129' into 5.4-2.3.x-imx

Linux 5.4.129

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-06-30 14:51:20 +00:00
Hugh Dickins 61168eafe0 mm, futex: fix shared futex pgoff on shmem huge page
[ Upstream commit fe19bd3dae3d15d2fbfdb3de8839a6ea0fe94264 ]

If more than one futex is placed on a shmem huge page, it can happen
that waking the second wakes the first instead, and leaves the second
waiting: the key's shared.pgoff is wrong.

When 3.11 commit 13d60f4b6a ("futex: Take hugepages into account when
generating futex_key"), the only shared huge pages came from hugetlbfs,
and the code added to deal with its exceptional page->index was put into
hugetlb source.  Then that was missed when 4.8 added shmem huge pages.

page_to_pgoff() is what others use for this nowadays: except that, as
currently written, it gives the right answer on hugetlbfs head, but
nonsense on hugetlbfs tails.  Fix that by calling hugetlbfs-specific
hugetlb_basepage_index() on PageHuge tails as well as on head.

Yes, it's unconventional to declare hugetlb_basepage_index() there in
pagemap.h, rather than in hugetlb.h; but I do not expect anything but
page_to_pgoff() ever to need it.

[akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]

Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com
Fixes: 800d8c63b2 ("shmem: add huge pages support")
Reported-by: Neel Natu <neelnatu@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Zhang Yi <wetpzy@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: leave redundant #include <linux/hugetlb.h>
in kernel/futex.c, to avoid conflict over the header files included.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:55 -04:00
Hugh Dickins a33b70d625 mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
ptlock in the PVMW_SYNC case? That too might have been responsible for
BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
I've never seen any.

Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com
Link: https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Fixes: ace71a19ce ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Wang Yugui <wangyugui@e16-tech.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:55 -04:00
Hugh Dickins e045e9e79d mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
Running certain tests with a DEBUG_VM kernel would crash within hours,
on the total_mapcount BUG() in split_huge_page_to_list(), while trying
to free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which,
on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

Crash dumps showed two tail pages of a shmem huge page remained mapped
by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
of a long unmapped range; and no page table had yet been allocated for
the head of the huge page to be mapped into.

Although designed to handle these odd misaligned huge-page-mapped-by-pte
cases, page_vma_mapped_walk() falls short by returning false prematurely
when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
are cases when a huge page may span the boundary, with ptes present in
the next.

Restructure page_vma_mapped_walk() as a loop to continue in these cases,
while keeping its layout much as before.  Add a step_forward() helper to
advance pvmw->address across those boundaries: originally I tried to use
mm's standard p?d_addr_end() macros, but hit the same crash 512 times
less often: because of the way redundant levels are folded together, but
folded differently in different configurations, it was just too
difficult to use them correctly; and step_forward() is simpler anyway.

Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com
Fixes: ace71a19ce ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins 037a1d67d2 mm: page_vma_mapped_walk(): get vma_address_end() earlier
page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the
start, rather than later at next_pte.

It's a little unnecessary overhead on the first call, but makes for a
simpler loop in the following commit.

Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins fa89d53694 mm: page_vma_mapped_walk(): use goto instead of while (1)
page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
and use "goto this_pte", in place of the "while (1)" loop at the end.

Link: https://lkml.kernel.org/r/a52b234a-851-3616-2525-f42736e8934@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins a499febd99 mm: page_vma_mapped_walk(): add a level of indentation
page_vma_mapped_walk() cleanup: add a level of indentation to much of
the body, making no functional change in this commit, but reducing the
later diff when this is all converted to a loop.

[hughd@google.com: : page_vma_mapped_walk(): add a level of indentation fix]
  Link: https://lkml.kernel.org/r/7f817555-3ce1-c785-e438-87d8efdcaf26@google.com

Link: https://lkml.kernel.org/r/efde211-f3e2-fe54-977-ef481419e7f3@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins b1783bf8c8 mm: page_vma_mapped_walk(): crossing page table boundary
page_vma_mapped_walk() cleanup: adjust the test for crossing page table
boundary - I believe pvmw->address is always page-aligned, but nothing
else here assumed that; and remember to reset pvmw->pte to NULL after
unmapping the page table, though I never saw any bug from that.

Link: https://lkml.kernel.org/r/799b3f9c-2a9e-dfef-5d89-26e9f76fd97@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins 80b2270a14 mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
follow the same "return not_found, return not_found, return true"
pattern as the block above it (note: returning not_found there is never
premature, since existence or prior existence of huge pmd guarantees
good alignment).

Link: https://lkml.kernel.org/r/378c8650-1488-2edf-9647-32a53cf2e21@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins ef161ccaca mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
use it in subsequent tests, instead of repeatedly dereferencing pointer.

Link: https://lkml.kernel.org/r/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Hugh Dickins 4961160272 mm: page_vma_mapped_walk(): settle PageHuge on entry
page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of
the way at the start, so no need to worry about it later.

Link: https://lkml.kernel.org/r/e31a483c-6d73-a6bb-26c5-43c3b880a2@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Hugh Dickins 52e2b20fb5 mm: page_vma_mapped_walk(): use page for pvmw->page
Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes".

I've marked all of these for stable: many are merely cleanups, but I
think they are much better before the main fix than after.

This patch (of 11):

page_vma_mapped_walk() cleanup: sometimes the local copy of pvwm->page
was used, sometimes pvmw->page itself: use the local copy "page"
throughout.

Link: https://lkml.kernel.org/r/589b358c-febc-c88e-d4c2-7834b37fa7bf@google.com
Link: https://lkml.kernel.org/r/88e67645-f467-c279-bf5e-af4b5c6b13eb@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Yang Shi 82ee7326af mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
[ Upstream commit 504e070dc08f757bccaed6d05c0f53ecbfac8a23 ]

When debugging the bug reported by Wang Yugui [1], try_to_unmap() may
fail, but the first VM_BUG_ON_PAGE() just checks page_mapcount() however
it may miss the failure when head page is unmapped but other subpage is
mapped.  Then the second DEBUG_VM BUG() that check total mapcount would
catch it.  This may incur some confusion.

As this is not a fatal issue, so consolidate the two DEBUG_VM checks
into one VM_WARN_ON_ONCE_PAGE().

[1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/

Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: fixed up variables in split_huge_page_to_list(),
and fixed up the conflict on ttu_flags in unmap_page().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Hugh Dickins bd43892152 mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
[ Upstream commit 22061a1ffabdb9c3385de159c5db7aac3a4df1cc ]

There is a race between THP unmapping and truncation, when truncate sees
pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared
it, but before its page_remove_rmap() gets to decrement
compound_mapcount: generating false "BUG: Bad page cache" reports that
the page is still mapped when deleted.  This commit fixes that, but not
in the way I hoped.

The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
instead of unmap_mapping_range() in truncate_cleanup_page(): it has
often been an annoyance that we usually call unmap_mapping_range() with
no pages locked, but there apply it to a single locked page.
try_to_unmap() looks more suitable for a single locked page.

However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page):
it is used to insert THP migration entries, but not used to unmap THPs.
Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB
needs are different, I'm too ignorant of the DAX cases, and couldn't
decide how far to go for anon+swap.  Set that aside.

The second attempt took a different tack: make no change in truncate.c,
but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
clearing it initially, then pmd_clear() between page_remove_rmap() and
unlocking at the end.  Nice.  But powerpc blows that approach out of the
water, with its serialize_against_pte_lookup(), and interesting pgtable
usage.  It would need serious help to get working on powerpc (with a
minor optimization issue on s390 too).  Set that aside.

Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
that's likely to reduce or eliminate the number of incidents, it would
give less assurance of whether we had identified the problem correctly.

This successful iteration introduces "unmap_mapping_page(page)" instead
of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route,
with an addition to details.  Then zap_pmd_range() watches for this
case, and does spin_unlock(pmd_lock) if so - just like
page_vma_mapped_walk() now does in the PVMW_SYNC case.  Not pretty, but
safe.

Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to
assert its interface; but currently that's only used to make sure that
page->mapping is stable, and zap_pmd_range() doesn't care if the page is
locked or not.  Along these lines, in invalidate_inode_pages2_range()
move the initial unmap_mapping_range() out from under page lock, before
then calling unmap_mapping_page() under page lock if still mapped.

Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
Fixes: fc127da085 ("truncate: handle file thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: fixed up call to truncate_cleanup_page()
in truncate_inode_pages_range().  Use hpage_nr_pages() in
unmap_mapping_page().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Jue Wang b767134ec3 mm/thp: fix page_address_in_vma() on file THP tails
commit 31657170deaf1d8d2f6a1955fbc6fa9d228be036 upstream.

Anon THP tails were already supported, but memory-failure may need to
use page_address_in_vma() on file THP tails, which its page->mapping
check did not permit: fix it.

hughd adds: no current usage is known to hit the issue, but this does
fix a subtle trap in a general helper: best fixed in stable sooner than
later.

Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com
Fixes: 800d8c63b2 ("shmem: add huge pages support")
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Hugh Dickins 41432a8a67 mm/thp: fix vma_address() if virtual address below file offset
[ Upstream commit 494334e43c16d63b878536a26505397fce6ff3a2 ]

Running certain tests with a DEBUG_VM kernel would crash within hours,
on the total_mapcount BUG() in split_huge_page_to_list(), while trying
to free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which,
on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

When that BUG() was changed to a WARN(), it would later crash on the
VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
mm/internal.h:vma_address(), used by rmap_walk_file() for
try_to_unmap().

vma_address() is usually correct, but there's a wraparound case when the
vm_start address is unusually low, but vm_pgoff not so low:
vma_address() chooses max(start, vma->vm_start), but that decides on the
wrong address, because start has become almost ULONG_MAX.

Rewrite vma_address() to be more careful about vm_pgoff; move the
VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can
be safely used from page_mapped_in_vma() and page_address_in_vma() too.

Add vma_address_end() to apply similar care to end address calculation,
in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
though it raises a question of whether callers would do better to supply
pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.

An irritation is that their apparent generality breaks down on KSM
pages, which cannot be located by the page->index that page_to_pgoff()
uses: as commit 4b0ece6fa0 ("mm: migrate: fix remove_migration_pte()
for ksm pages") once discovered.  I dithered over the best thing to do
about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both
vma_address() and vma_address_end(); though the only place in danger of
using it on them was try_to_unmap_one().

Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
head page, instead of thp_size(): to make the right calculation on a
hugetlbfs page, whether or not THPs are configured.  try_to_unmap() is
used on hugetlbfs pages, but perhaps the wrong calculation never
mattered.

Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
Fixes: a8fa41ad2f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: fixed up conflicts on intervening thp_size().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Hugh Dickins 4b0a34e222 mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
[ Upstream commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 ]

Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
(!unmap_success): with dump_page() showing mapcount:1, but then its raw
struct page output showing _mapcount ffffffff i.e.  mapcount 0.

And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
all indicative of some mapcount difficulty in development here perhaps.
But the !CONFIG_DEBUG_VM path handles the failures correctly and
silently.

I believe the problem is that once a racing unmap has cleared pte or
pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
from try_to_unmap() before the racing task has reached decrementing
mapcount.

Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
TTU_SYNC to the options, and passing that from unmap_page().

When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
for both: the slight overhead added should rarely matter, except perhaps
if splitting sparsely-populated multiply-mapped shmem.  Once confident
that bugs are fixed, TTU_SYNC here can be removed, and the race
tolerated.

Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
Fixes: fec89c109f ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: upstream TTU_SYNC 0x10 takes the value which
5.11 commit 013339df116c ("mm/rmap: always do TTU_IGNORE_ACCESS") freed.
It is very tempting to backport that commit (as 5.10 already did) and
make no change here; but on reflection, good as that commit is, I'm
reluctant to include any possible side-effect of it in this series.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Hugh Dickins bd092a0f19 mm/thp: make is_huge_zero_pmd() safe and quicker
commit 3b77e8c8cde581dadab9a0f1543a347e24315f11 upstream.

Most callers of is_huge_zero_pmd() supply a pmd already verified
present; but a few (notably zap_huge_pmd()) do not - it might be a pmd
migration entry, in which the pfn is encoded differently from a present
pmd: which might pass the is_huge_zero_pmd() test (though not on x86,
since L1TF forced us to protect against that); or perhaps even crash in
pmd_page() applied to a swap-like entry.

Make it safe by adding pmd_present() check into is_huge_zero_pmd()
itself; and make it quicker by saving huge_zero_pfn, so that
is_huge_zero_pmd() will not need to do that pmd_page() lookup each time.

__split_huge_pmd_locked() checked pmd_trans_huge() before: that worked,
but is unnecessary now that is_huge_zero_pmd() checks present.

Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com
Fixes: e71769ae52 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Hugh Dickins 4c37d7f269 mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
[ Upstream commit 99fa8a48203d62b3743d866fc48ef6abaee682be ]

Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.

Here is v2 batch of long-standing THP bug fixes that I had not got
around to sending before, but prompted now by Wang Yugui's report
https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/

Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and
they have done no harm, but have *not* fixed that issue: something more
is needed and I have no idea of what.

This patch (of 7):

Stressing huge tmpfs page migration racing hole punch often crashed on
the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y
kernel; or shortly afterwards, on a bad dereference in
__split_huge_pmd_locked() when DEBUG_VM=n.  They forgot to allow for pmd
migration entries in the non-anonymous case.

Full disclosure: those particular experiments were on a kernel with more
relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the
vanilla kernel: it is conceivable that stricter locking happens to avoid
those cases, or makes them less likely; but __split_huge_pmd_locked()
already allowed for pmd migration entries when handling anonymous THPs,
so this commit brings the shmem and file THP handling into line.

And while there: use old_pmd rather than _pmd, as in the following
blocks; and make it clearer to the eye that the !vma_is_anonymous()
block is self-contained, making an early return after accounting for
unmapping.

Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com
Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com
Fixes: e71769ae52 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jue Wang <juew@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: this commit made intervening cleanups in
pmdp_huge_clear_flush() redundant: here it's rediffed to skip them.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Xu Yu 7ce4b73d34 mm, thp: use head page in __migration_entry_wait()
commit ffc90cbb2970ab88b66ea51dd580469eede57b67 upstream.

We notice that hung task happens in a corner but practical scenario when
CONFIG_PREEMPT_NONE is enabled, as follows.

Process 0                       Process 1                     Process 2..Inf
split_huge_page_to_list
    unmap_page
        split_huge_pmd_address
                                __migration_entry_wait(head)
                                                              __migration_entry_wait(tail)
    remap_page (roll back)
        remove_migration_ptes
            rmap_walk_anon
                cond_resched

Where __migration_entry_wait(tail) is occurred in kernel space, e.g.,
copy_to_user in fstat, which will immediately fault again without
rescheduling, and thus occupy the cpu fully.

When there are too many processes performing __migration_entry_wait on
tail page, remap_page will never be done after cond_resched.

This makes __migration_entry_wait operate on the compound head page,
thus waits for remap_page to complete, whether the THP is split
successfully or roll back.

Note that put_and_wait_on_page_locked helps to drop the page reference
acquired with get_page_unless_zero, as soon as the page is on the wait
queue, before actually waiting.  So splitting the THP is only prevented
for a brief interval.

Link: https://lkml.kernel.org/r/b9836c1dd522e903891760af9f0c86a2cce987eb.1623144009.git.xuyu@linux.alibaba.com
Fixes: ba98828088 ("thp: add option to setup migration entries during PMD split")
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Miaohe Lin 68ce37ebe0 mm/rmap: use page_not_mapped in try_to_unmap()
[ Upstream commit b7e188ec98b1644ff70a6d3624ea16aadc39f5e0 ]

page_mapcount_is_zero() calculates accurately how many mappings a hugepage
has in order to check against 0 only.  This is a waste of cpu time.  We
can do this via page_not_mapped() to save some possible atomic_read
cycles.  Remove the function page_mapcount_is_zero() as it's not used
anymore and move page_not_mapped() above try_to_unmap() to avoid
identifier undeclared compilation error.

Link: https://lkml.kernel.org/r/20210130084904.35307-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Miaohe Lin 432b61863a mm/rmap: remove unneeded semicolon in page_not_mapped()
[ Upstream commit e0af87ff7afcde2660be44302836d2d5618185af ]

Remove extra semicolon without any functional change intended.

Link: https://lkml.kernel.org/r/20210127093425.39640-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:51 -04:00
Andrey Zhizhikin db8ff65069 This is the 5.4.128 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDTLA4ACgkQONu9yGCS
 aT45xg/+IvxFaIOtutEBkFCJvEurRWSozjBKAfX9xtJQGSSKVyDvh7GZWfEXMxZc
 oNf8DWQKvaiZj2mRdgYp6Ilo27Ps6aN3vCo09z+U3mfGQLMbNpPYEvSq6Twl26NB
 8lL8b++0Jo7P+eOALHohBS125/E0etqhoc2HXDFp6pfksj6J7klxlyQ2NX9Ih8xm
 l7Cto5flCHM9g20/CNsqxXPWiuBKnzSvp9YH9HMDgjOV6YSktLGTHAJ8omjPm0V/
 pQVFOo4Kyx34exdA/IzrM/yV4iDThVtwL6+bNErWtl6LwiIcNK3esARYTNjbBBhK
 W156adxp6kl6LqMADr/y77WqvcH6H2PhpRnMj+6t21FpK7cTbXfqvxBfpOvE1Buh
 in95LJN1Iins1PTozBVHcUIpdESO5AN8/2aHq0LRLmVbaLlo6aj+sjdHNPvf7HwW
 8LDHtpGNao/spMuZmvvH+6i3iwuciINCRY9TVBDgkT5LhWhRHBl6+uSLEX/d+s3Z
 663Q6HPu+cfubR7UC8+QsMMtf7KD2yvQuadAz6n/Z41vvSYIUHPGsYtZUmsef3jP
 n4CTAmGavtyR5jaQNkuw8nnIn7cthONw94foFheBH0doxmkXPKcwqmWO9DH77n58
 unMT31ArVg9ObrO/YmLjEaV9X7VlfRf6yw7tey1RJXgrSD3nwgk=
 =9+GF
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmDTTjIACgkQ7G51OISz
 Hs1mJw/5AaIlIv+kMC0qP9aG5Y1uoUQ+fewr4Ofo9Kt8QlVcogdeM16URF7O2CDG
 mICWVXRqY+xWl/84XWJs11RRxfMnXp85myi1jyalOFDYojJp2qOkOyEi8zMg7wui
 N1kkPxEworm6ILrZRKNgT36ZfSEGAD2ogkp4wS/Fldwh2RNHtGUSw3xAPcqg0yDq
 +Ve3V06MDul/dyRup3MYtQ/Kxb5wCc+Lfwh+eXYZIu/DfXy3E4K3w8YB7NPazlTw
 dBlWU/S3Y0uyoq+DvfFQ+nL0GTe0ezGQB5n5n1eNRWg+mEn3IiWXkrBJfha2Swcl
 vv+Fy/5ro9elcn1oQF5uvwDoGHhq/KkR7TjRICY9zmVvJ79IcqZJ/Bdg5CEXM+xc
 EvZDAdXn26y2xKZJzKGQSe7QJurOuQpdW497h1DLGIhDs2sGDjhXuISPcNpjPJKG
 PdgpJ9kG4v2cYnW58UTdzQKNFg7jjFUiiQV/IIoiVlxhg7dVQnSKCgXHcngiZiuQ
 eUR9p7ahfEOlG2N5qTlJqom0HtiJ6I4a5TDQhXoEYlKfxr9711f2+cramhf2cle+
 +G64chvmKcv7Fyx5yIwSJrTJfi6vsuW5VMfXrvGIqjFUEjIGe9rQMhWH34A7s5tG
 RD0Tg90JJr0rNsNa+LZNPbRynbdPBbrVWdrVyXN+9HDs40oe8BY=
 =f7qZ
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.128' into 5.4-2.3.x-imx

This is the 5.4.128 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-06-23 15:07:27 +00:00
Andrew Morton 4f69c89306 mm/slub.c: include swab.h
commit 1b3865d016815cbd69a1879ca1c8a8901fda1072 upstream.

Fixes build with CONFIG_SLAB_FREELIST_HARDENED=y.

Hopefully.  But it's the right thing to do anwyay.

Fixes: 1ad53d9fa3f61 ("slub: improve bit diffusion for freelist ptr obfuscation")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213417
Reported-by: <vannguye@cisco.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23 14:41:30 +02:00
Kees Cook 9ddeea35c4 mm/slub: fix redzoning for small allocations
commit 74c1d3e081533825f2611e46edea1fcdc0701985 upstream.

The redzone area for SLUB exists between s->object_size and s->inuse
(which is at least the word-aligned object_size).  If a cache were
created with an object_size smaller than sizeof(void *), the in-object
stored freelist pointer would overwrite the redzone (e.g.  with boot
param "slub_debug=ZF"):

  BUG test (Tainted: G    B            ): Right Redzone overwritten
  -----------------------------------------------------------------------------

  INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
  INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
  INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620

  Redzone  (____ptrval____): bb bb bb bb bb bb bb bb    ........
  Object   (____ptrval____): f6 f4 a5 40 1d e8          ...@..
  Redzone  (____ptrval____): 1a aa                      ..
  Padding  (____ptrval____): 00 00 00 00 00 00 00 00    ........

Store the freelist pointer out of line when object_size is smaller than
sizeof(void *) and redzoning is enabled.

Additionally remove the "smaller than sizeof(void *)" check under
CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant:
SLAB and SLOB both handle small sizes.

(Note that no caches within this size range are known to exist in the
kernel currently.)

Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org
Fixes: 81819f0fc8 ("SLUB core")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Lin, Zhenpeng" <zplin@psu.edu>
Cc: Marco Elver <elver@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23 14:41:30 +02:00
Kees Cook c0837e021d mm/slub: clarify verification reporting
commit 8669dbab2ae56085c128894b181c2aa50f97e368 upstream.

Patch series "Actually fix freelist pointer vs redzoning", v4.

This fixes redzoning vs the freelist pointer (both for middle-position
and very small caches).  Both are "theoretical" fixes, in that I see no
evidence of such small-sized caches actually be used in the kernel, but
that's no reason to let the bugs continue to exist, especially since
people doing local development keep tripping over it.  :)

This patch (of 3):

Instead of repeating "Redzone" and "Poison", clarify which sides of
those zones got tripped.  Additionally fix column alignment in the
trailer.

Before:

  BUG test (Tainted: G    B            ): Redzone overwritten
  ...
  Redzone (____ptrval____): bb bb bb bb bb bb bb bb      ........
  Object (____ptrval____): f6 f4 a5 40 1d e8            ...@..
  Redzone (____ptrval____): 1a aa                        ..
  Padding (____ptrval____): 00 00 00 00 00 00 00 00      ........

After:

  BUG test (Tainted: G    B            ): Right Redzone overwritten
  ...
  Redzone  (____ptrval____): bb bb bb bb bb bb bb bb      ........
  Object   (____ptrval____): f6 f4 a5 40 1d e8            ...@..
  Redzone  (____ptrval____): 1a aa                        ..
  Padding  (____ptrval____): 00 00 00 00 00 00 00 00      ........

The earlier commits that slowly resulted in the "Before" reporting were:

  d86bd1bece ("mm/slub: support left redzone")
  ffc79d2880 ("slub: use print_hex_dump")
  2492268472 ("SLUB: change error reporting format to follow lockdep loosely")

Link: https://lkml.kernel.org/r/20210608183955.280836-1-keescook@chromium.org
Link: https://lkml.kernel.org/r/20210608183955.280836-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/cfdb11d7-fb8e-e578-c939-f7f5fb69a6bd@suse.cz/
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Marco Elver <elver@google.com>
Cc: "Lin, Zhenpeng" <zplin@psu.edu>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23 14:41:30 +02:00
yangerkun 566345aaab mm/memory-failure: make sure wait for page writeback in memory_failure
[ Upstream commit e8675d291ac007e1c636870db880f837a9ea112a ]

Our syzkaller trigger the "BUG_ON(!list_empty(&inode->i_wb_list))" in
clear_inode:

  kernel BUG at fs/inode.c:519!
  Internal error: Oops - BUG: 0 [#1] SMP
  Modules linked in:
  Process syz-executor.0 (pid: 249, stack limit = 0x00000000a12409d7)
  CPU: 1 PID: 249 Comm: syz-executor.0 Not tainted 4.19.95
  Hardware name: linux,dummy-virt (DT)
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : clear_inode+0x280/0x2a8
  lr : clear_inode+0x280/0x2a8
  Call trace:
    clear_inode+0x280/0x2a8
    ext4_clear_inode+0x38/0xe8
    ext4_free_inode+0x130/0xc68
    ext4_evict_inode+0xb20/0xcb8
    evict+0x1a8/0x3c0
    iput+0x344/0x460
    do_unlinkat+0x260/0x410
    __arm64_sys_unlinkat+0x6c/0xc0
    el0_svc_common+0xdc/0x3b0
    el0_svc_handler+0xf8/0x160
    el0_svc+0x10/0x218
  Kernel panic - not syncing: Fatal exception

A crash dump of this problem show that someone called __munlock_pagevec
to clear page LRU without lock_page: do_mmap -> mmap_region -> do_munmap
-> munlock_vma_pages_range -> __munlock_pagevec.

As a result memory_failure will call identify_page_state without
wait_on_page_writeback.  And after truncate_error_page clear the mapping
of this page.  end_page_writeback won't call sb_clear_inode_writeback to
clear inode->i_wb_list.  That will trigger BUG_ON in clear_inode!

Fix it by checking PageWriteback too to help determine should we skip
wait_on_page_writeback.

Link: https://lkml.kernel.org/r/20210604084705.3729204-1-yangerkun@huawei.com
Fixes: 0bc1f8b068 ("hwpoison: fix the handling path of the victimized page frame that belong to non-LRU")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23 14:41:23 +02:00
Andrey Zhizhikin 276aedc8f1 This is the 5.4.125 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDB+Z8ACgkQONu9yGCS
 aT5qig//WVut449WUeYQLKD8rAB5CUVm2Xl3509Ts8W6LSzYGHiYv1SRVeH2y1lS
 QnfCnBciopl2UyYxqXGQwoRYdY1T2E/MWUmwGUk0/qlZYOzg5xQ368Shm0lvohJI
 DsywZrYqJDUCoeyXoWJYrq/3RiAvMK30teKDcn1A2HhhWdo0nsGLp1GUX396ptcV
 3xw2ZvCVwuikwxq5jlQKUEkH59TD/ZkCzvn9gfd86FY1R0ohApLJckhGIuT3wA1c
 Tfekgvfngx1HcEWIAzWFqZPoB8mOF5pn06yZhuPdMKa8UUq78ckN7kbchERj2wJD
 cDFSQQrMI3nL9sA8ryYV1YFl3fyGX5Epm4O465whzjKWoZ9HwN+iwl6Qv+kOmX41
 YUmpUplhsPN+I7+cX1jF7Ohw583uDbFPw6XbyZ0ArZr03JVVv4Vjrv5QA9fVHR06
 OP7+zEUlBtu/g3k0Bj5MU8UKem0shXavkPqukrtB+MhrXh2VngEXEVOvKMOFgA4b
 BnBEga4SrCR/wB+SucIV4fqzV0tq4HD/cPpy67OafrWoqhwlnBsMCQUd+puxkCnM
 y+eEoRwTzRSW+U9y8KdAERW8qSR/vCyKCUoaKxOV3Jj0v8xp0Y6VHKlKmb//w5Gn
 Lk7sNjD60Um3Au53A5pJvh8qNg+OsNc46sEmGGndE4Mrada93gE=
 =O2C+
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmDCIf8ACgkQ7G51OISz
 Hs04IA/9HPtgSX+5Uha8T9IWUKxKwK8BwXAnBnowBkt76X50PLR7/i1wD3WmNdMc
 nVe+bScX6gTjhbqICO2toDZ+lcqWsM00cHPnjZGGwnGDFIvlbxYAYZt/dPTHxgze
 YXDu7dxY5Cb3tAYBX1Ng165Rti9gJC8QNGOLXiCOUhDSTNMepe02wi6bKR3jN/hm
 jjl02Qo9BQI70a1w3zOFHH8ffQuUdOoTFji8hq2u+cJ1tP6FuftJyPvIAm+MDLNd
 83dg1P4eg73Qk+tp93OKrSG3pnCngxgveCB+U3SQnCd2b83asNVigjxoxkrZZJ7L
 9kxq4ifyAfH9TLQJ5lo2xOdQ1ra0+KTBwYKr2X1/N5mrXnmi9OCt54tXFnkPcJLN
 S0HAP9cCf+NtoACirUfNETeZJDaISvHiYT8XhbJ+y1mr+3pbN/4DQ3P/4u1ykoyF
 XBQwuAEw6ljz22HbZBuLrsB339CSwVuJbSaFrmeuUsX7AKIA/p45lr6L5JQssTyD
 a0NWKWFMJ7rV/f10u/B24kZwcSNghx1xMuX8hfBnyPtbR4ChnlnKnSSLsF+AJQEW
 7chnIejPa0UwQAkZmd/b1qaqZq5oIct3BFRZUfcledlej8HweLNJGyy8+T5ZSu5d
 G5ku8CU2hIgnKSBaqF7AaqkcDga2fjelGJetJjqdPwgIzMJmqnE=
 =b+bL
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.125' into 5.4-2.3.x-imx

This is the 5.4.125 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-06-10 14:30:21 +00:00
Matthew Wilcox (Oracle) 3b7f3cab1d mm/filemap: fix storing to a THP shadow entry
commit 198b62f83eef1d605d70eca32759c92cdcc14175 upstream

When a THP is removed from the page cache by reclaim, we replace it with a
shadow entry that occupies all slots of the XArray previously occupied by
the THP.  If the user then accesses that page again, we only allocate a
single page, but storing it into the shadow entry replaces all entries
with that one page.  That leads to bugs like

page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
------------[ cut here ]------------
kernel BUG at mm/filemap.c:2529!

https://bugzilla.kernel.org/show_bug.cgi?id=206569

This is hard to reproduce with mainline, but happens regularly with the
THP patchset (as so many more THPs are created).  This solution is take
from the THP patchset.  It splits the shadow entry into order-0 pieces at
the time that we bring a new page into cache.

Fixes: 99cb0dbd47 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/20200903183029.14930-4-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:37:15 +02:00
Mina Almasry 14fd3da3e8 mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
[ Upstream commit d84cf06e3dd8c5c5b547b5d8931015fc536678e5 ]

The userfaultfd hugetlb tests cause a resv_huge_pages underflow.  This
happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on
an index for which we already have a page in the cache.  When this
happens, we allocate a second page, double consuming the reservation,
and then fail to insert the page into the cache and return -EEXIST.

To fix this, we first check if there is a page in the cache which
already consumed the reservation, and return -EEXIST immediately if so.

There is still a rare condition where we fail to copy the page contents
AND race with a call for hugetlb_no_page() for this index and again we
will underflow resv_huge_pages.  That is fixed in a more complicated
patch not targeted for -stable.

Test:

  Hacked the code locally such that resv_huge_pages underflows produce a
  warning, then:

  ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
  ./tools/testing/selftests/vm/userfaultfd hugetlb 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success

Both tests succeed and produce no warnings.  After the test runs number
of free/resv hugepages is correct.

[mike.kravetz@oracle.com: changelog fixes]

Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com
Fixes: 8fb5debc5f ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:37:14 +02:00
Andrey Zhizhikin 2cbb55e591 This is the 5.4.120 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCkyEcACgkQONu9yGCS
 aT70Qg//Rv09McvLQ+8E0OilJ7TdT0UthXQFP+uPTu+/HPeHQkCO168cn1hbwD9K
 i0YfFYB7PqPe/wccHNsmWHSUYCzA9NnwExA84/jofjswkEMMc95x/bow5/xmLe/5
 ImkjODPVHuQWewgMfbSmNu7Br4wmQC5U/K4r7hp/Aa0FdTjcHMI6Zw40FGbJrWmq
 kiqhW9CeagKbxWrihQNLrSB4E5CdpNNkug/zVus2n9jlFT4tltNGSd7bPsxrp7LN
 EdTfayyPUVZeCoysTNA0WZgz47f+z47vAdIlDHzWCIOZcM1RnJXKA5kFXRf8Fnfa
 +hyvaHSDqYGdRgZxYMcXLL+/cS4foQ/8iQxZBCMomABM0MNUuoJ5tYR6GVetlRcR
 46ZC/5OAvNoKY2Kj4Ky4ROF7aMR3NkYCY6wHUVRcw8778bmuReeLJJPsWojAI+4F
 pWT08+7OUJYb3hRnGxxzKot6CPztdkpQXfXMy+wyNlNbRZ/ivs9/f/GhdblXy/6T
 j12LKIh1IOxpB/wi7GRfeABUuC4MU8xqx6FuPDrBgCTMfVig/wcwF27AUr//a0F5
 xrrzCrDFNAvuyD1WyYilaxWDHAe2o9ROT0JZ4VB3zu40w2VlTT77aqA174xfQa6b
 418Eykw3O11dmsY8AQPTt1HhkDCiewEe4K58CJcmCNEf/inFbvI=
 =kNQc
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmCk3VMACgkQ7G51OISz
 Hs0Y+A/9HqeZOFBPE3BlnEKkdxEPp+zoOj/57pWlS+Xr8ySalZXonAAC975voKjS
 6IFejLkN83q5yCrwYBJ/PCc5Xm9deBf+6AE3KyaNV7CD5ycSBy8o9Fx7IH81wFXt
 quiVNlpi7iGiWsm3ubsvDHNgHwQlvN6mbNp+RIqD4YllheUsTXeR05rrfr+TauXv
 Bv7LEolcyUJo0LTzqPsTlVFG9ec+0qWZb1A7LgOZJFWrS/XaYpoXA49KaNbIZIPj
 jb+sWzTsXhOMc2yB1A6P8T3jsx7NxK4Y2rRyCk0l5Hm//Jl7Kcx+vbe0yQPf2DOs
 c0X+0s8Vwk/Ry606XLcy0nIIGePyj++u43r6noB/cN/LnZUbJY3zbywUNvrY1thS
 YG2MtSiV9TzzKP4xApPF7G/4G6H4HOaC1cQbxc3+7sklJf7Coh3pKv775POIfnLS
 UrQ2ToQRNWQC4/l/O9h9BYiIPWSAAUd0uBgZzcdQWeWbdPXMdb02cwZ5YuPoZgSV
 aLsDN8naaQlsZ20wdQ0DaqKoIryHnD0XHNSMkEvZ0bjPxOOtICekixLJQxjwFBQL
 cx3stcQs5PT8l3PQqc2xnVNTqnL7yWGdMYdfNE+4qTLSiw2s5uXbb7BJwttj9Pit
 QKVSVUtp12ObX1kJzqhKa9GQEUPYREVBpN7NwJL2E5hVObfT6OE=
 =ruSa
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.120' into 5.4-2.3.x-imx

This is the 5.4.120 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-05-19 09:41:35 +00:00
Peter Xu f77aa56ad9 mm/hugetlb: fix F_SEAL_FUTURE_WRITE
commit 22247efd822e6d263f3c8bd327f3f769aea9b1d9 upstream.

Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2.

Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to
hugetlbfs, which I can easily verify using the memfd_test program, which
seems that the program is hardly run with hugetlbfs pages (as by default
shmem).

Meanwhile I found another probably even more severe issue on that hugetlb
fork won't wr-protect child cow pages, so child can potentially write to
parent private pages.  Patch 2 addresses that.

After this series applied, "memfd_test hugetlbfs" should start to pass.

This patch (of 2):

F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
There is a test program for that and it fails constantly.

$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)

I think it's probably because no one is really running the hugetlbfs test.

Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we
do in shmem_mmap().  Generalize a helper for that.

Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com
Fixes: ab3948f58f ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19 10:08:29 +02:00
Axel Rasmussen b3f1731c6d userfaultfd: release page in error path to avoid BUG_ON
commit 7ed9d238c7dbb1fdb63ad96a6184985151b0171c upstream.

Consider the following sequence of events:

1. Userspace issues a UFFD ioctl, which ends up calling into
   shmem_mfill_atomic_pte(). We successfully account the blocks, we
   shmem_alloc_page(), but then the copy_from_user() fails. We return
   -ENOENT. We don't release the page we allocated.
2. Our caller detects this error code, tries the copy_from_user() after
   dropping the mmap_lock, and retries, calling back into
   shmem_mfill_atomic_pte().
3. Meanwhile, let's say another process filled up the tmpfs being used.
4. So shmem_mfill_atomic_pte() fails to account blocks this time, and
   immediately returns - without releasing the page.

This triggers a BUG_ON in our caller, which asserts that the page
should always be consumed, unless -ENOENT is returned.

To fix this, detect if we have such a "dangling" page when accounting
fails, and if so, release it before returning.

Link: https://lkml.kernel.org/r/20210428230858.348400-1-axelrasmussen@google.com
Fixes: cb658a453b ("userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY")
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19 10:08:29 +02:00
Miaohe Lin 6aeba28d12 ksm: fix potential missing rmap_item for stable_node
[ Upstream commit c89a384e2551c692a9fe60d093fd7080f50afc51 ]

When removing rmap_item from stable tree, STABLE_FLAG of rmap_item is
cleared with head reserved.  So the following scenario might happen: For
ksm page with rmap_item1:

cmp_and_merge_page
  stable_node->head = &migrate_nodes;
  remove_rmap_item_from_tree, but head still equal to stable_node;
  try_to_merge_with_ksm_page failed;
  return;

For the same ksm page with rmap_item2, stable node migration succeed this
time.  The stable_node->head does not equal to migrate_nodes now.  For ksm
page with rmap_item1 again:

cmp_and_merge_page
 stable_node->head != &migrate_nodes && rmap_item->head == stable_node
 return;

We would miss the rmap_item for stable_node and might result in failed
rmap_walk_ksm().  Fix this by set rmap_item->head to NULL when rmap_item
is removed from stable tree.

Link: https://lkml.kernel.org/r/20210330140228.45635-5-linmiaohe@huawei.com
Fixes: 4146d2d673 ("ksm: make !merge_across_nodes migration safe")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:27 +02:00
Miaohe Lin dde73137ce mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
[ Upstream commit 34f5e9b9d1990d286199084efa752530ee3d8297 ]

If the zone device page does not belong to un-addressable device memory,
the variable entry will be uninitialized and lead to indeterminate pte
entry ultimately.  Fix this unexpected case and warn about it.

Link: https://lkml.kernel.org/r/20210325131524.48181-4-linmiaohe@huawei.com
Fixes: df6ad69838 ("mm/device-public-memory: device memory cache coherent with CPU")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:27 +02:00
Miaohe Lin 262943265d mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts()
[ Upstream commit da56388c4397878a65b74f7fe97760f5aa7d316b ]

A rare out of memory error would prevent removal of the reserve map region
for a page.  hugetlb_fix_reserve_counts() handles this rare case to avoid
dangling with incorrect counts.  Unfortunately, hugepage_subpool_get_pages
and hugetlb_acct_memory could possibly fail too.  We should correctly
handle these cases.

Link: https://lkml.kernel.org/r/20210410072348.20437-5-linmiaohe@huawei.com
Fixes: b5cec28d36 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:27 +02:00
Miaohe Lin 3ddbd4bead khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
[ Upstream commit 74e579bf231a337ab3786d59e64bc94f45ca7b3f ]

In writable and !referenced case, the result value should be
SCAN_LACK_REFERENCED_PAGE for trace_mm_collapse_huge_page_isolate()
instead of default 0 (SCAN_FAIL) here.

Link: https://lkml.kernel.org/r/20210306032947.35921-5-linmiaohe@huawei.com
Fixes: 7d2eba0557 ("mm: add tracepoint for scanning pages")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:27 +02:00
Andrey Zhizhikin 6602fc5788 This is the 5.4.119 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCeKsMACgkQONu9yGCS
 aT4cYhAA0qDTHscvm641m/Dv4U9w3gWh2Fs8oPz43+nJ1/8CTrT/gSWA7IRDDHiV
 Dys2canDVLNYTEx1TqwmHbN3R+nvQTpdz2wuJuSf7GKYQj0n3S99BEN6uxod+puu
 /M7apBH5npjZKv1DMRUrQ/AUGVUuBQqtN7Hl5hEL8ibI/bsZV8+dhJJ8c8uyJpam
 peiP5n2lCz5HZ/K5OyEy1jCmWQLIcRN59SmiARy/xk739igoCMUajkY1mV0WVyks
 SKnZEP7tY1mLLYzpW/ZVSkXurx+ZtL1zUctRt5dh5US4uzNt/sfm8oDzyzvGojd/
 iWtXefprJXbI9BGyNaBwwNmzjSXabSkoI75wExxsMQKFZpsq12pz97dwy/pZyU+c
 NlzbmDQg8+Cs9dKsDw6jUXHYSJ9fb4mk6GOF9u0LXgyq1f15/DzjdzkYLXZ3tTOK
 exVFs/CKz7Dg6npdO5kl7mg18AxmVH+OJftltF2+MbUohBs2vRDRr+O4cY8Wlc2Q
 AF85uAE3Mo/yL1pi6O7lMW4ic5yJvTRCX/iPsxyDU8LvxM1Kc7u9CzykX3M0WFLz
 TsKxfPQvoc6WGf8IWy4j1nXMzXQTHL/6CrfzOSTFngR8eqcsbgU0nkKpZNEtvnxN
 k30ID+Mcl4B6k6XTECNJUXjcwg+TR+XtKOjwXAIDaVqwoW759BY=
 =25pJ
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmCe4QMACgkQ7G51OISz
 Hs1JdQ//aC82eBW+TH85PNlVaYn7DL8cU6zE/mX4dofughg8AIg84v0v30rRGF3m
 c0bxBhwcah+4TIPzWSs8QAT7pEcvR3Gj5JZrEeIuyXBwKhRSXgZJKbV4XAE9Z23W
 1Etxwh28NfLbLEnX6qJnejQhHckt4MK9j4W5Gg2L7o6TrZwB7gLjNPfck6t7DnJc
 9Fo4TxwgP+BFu30s9h9eQqv/EIVJPRfiAVgvK4o1CCEHya8bfnJgJDgyxKbdyBg6
 IXYfYWGVe26VCR7PreEFJ3iheVVlI9VicFfhRzbsCdYQYYNqa22VfyMFrOO1nyMO
 SPAQnDRqlpvXIUHZ70puDX9QYji2RhVjf6fnDoQRPlkDaKumO3+zdk8bcCJFk7kF
 f7BPc8babvAn4WeG2hQOlPSKf+Mcc5mFmh08bsb8y9/w2mNPhg8UkCsaymggabkP
 GK/MvaBGxSoV8ft+EInyRLN3qlLrinYZDOOTj40Pd/d3HrJQCcShXzOI4SRd73MU
 +ISKHU9BrsRvVQ5ICjVTRfFMcFDJUt60leVM+QGm6MxQmgAuW/Scy9bNb9sDUyJd
 6TUI2stzAxDRrjz8JDIXFyUImC3juL1c4WnrpP5I6ZWIeXIpVlQAH1Wxg18Dj0DC
 1W5Fb2cicRRLfSPBgjppeipChm9ecT6Fl+3n7kBo5tvSdiiJEo4=
 =Wb4x
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.119' into 5.4-2.3.x-imx

This is the 5.4.119 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-05-14 20:43:43 +00:00
Jane Chu 4a83a9deea mm/memory-failure: unnecessary amount of unmapping
[ Upstream commit 4d75136be8bf3ae01b0bc3e725b2cdc921e103bd ]

It appears that unmap_mapping_range() actually takes a 'size' as its third
argument rather than a location, the current calling fashion causes
unnecessary amount of unmapping to occur.

Link: https://lkml.kernel.org/r/20210420002821.2749748-1-jane.chu@oracle.com
Fixes: 6100e34b25 ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages")
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14 09:44:32 +02:00
Wang Wensheng de143fb2fe mm/sparse: add the missing sparse_buffer_fini() in error branch
[ Upstream commit 2284f47fe9fe2ed2ef619e5474e155cfeeebd569 ]

sparse_buffer_init() and sparse_buffer_fini() should appear in pair, or a
WARN issue would be through the next time sparse_buffer_init() runs.

Add the missing sparse_buffer_fini() in error branch.

Link: https://lkml.kernel.org/r/20210325113155.118574-1-wangwensheng4@huawei.com
Fixes: 85c77f7913 ("mm/sparse: add new sparse_init_nid() and sparse_init()")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14 09:44:32 +02:00
Andrey Zhizhikin e9a7181f47 This is the 5.4.110 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmBtqhEACgkQONu9yGCS
 aT4Xbw//SSc6S+So14ND1v6SFI1BvDpAooneM7qsNxh4OU53be/tJba0XosHu6B1
 Wk8fnNFtoDokfuHWHgQJ0g97SgOlHTnSs+wBGVa2Z0o+446Gf7FIFaH16QKVM7pC
 1t1Y3zxVJ6cKNhGJUOXNrCF+ktPHAAaugxPmFhiX9lacSnt9aKKjJUwgm/5OIPO1
 fbY3VcoaxAGPzqOuKE66nMLZwdLHs7ZNK74OGfr6oog+Rt6ZHwmto/AGdueZQmHh
 cwxPQwkkMWDf7ebihE/19YPWN6etCg7VNjYeGxZmy2c5Zar8mzr9Qi7HbpZOJsn1
 BUzWRX1fMi7DAvRUUQrCR01zAjP9uGCeny4NwnRjWl0PvD69AOQu/EWO7yp3Iy5e
 DmwHSHrH3p1JtJd0cxrDA5F2IjGu/FtiahrpJzqphBdGWDvKhdE4tQK4uZsGp/F2
 rdy4PI9ksy+YnJeXb/w/yRhm/tlzUwelfc/YuW31Y1l40XQRpm3IlZPCLpgswBhU
 MYHuVX2WCG5I7Rw88SU1995GypwLOtR3LxvBwUsbnQcwLGaJbd5S/2g/4Ad8MlyT
 x3ROfoOIwPcEh+sTe4nTstisEkZFE/nQBnAvkhS567LMDdpPxy5Lho51XpFk3Au2
 YSkHhb5OrwZ7pXhAdp4JeQtfmL11v9y5V/wY53iDYIWZWSXUqwI=
 =K+tH
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmButxcACgkQ7G51OISz
 Hs0adQ/8CNQUMjMVaV4YhnjHr7Cf3f6CeFQDkrsOYEWQNWViHKlBj/ixYApQYyYo
 VyUxJaL9xP4fDEHDwPSRJm8/A+BQAG/X32Qbs7r4abhJD89gFw7a0TD4zv4lZVjm
 wi1n9k9k2uOFoYD8lh7QLXJZVB9RCzzSxk37tHe//m8BX3Eo4Ho2Ce2i/C/qtTAp
 o7ztiD8Kpq1jqpi4yQhPX3FWg6q236nXWg83tSPCdKD3hFdf+YsRJ6YiAo/T0rQn
 +nEzI5bBIifCQs5yzm1A4VSKQRp1SDWAAxwRJvQMIhLr1rcZHxapfTxOVdIb1OKy
 uF0UHzqHhO3y9t4dREzlUYhkU0fn7c080eDOX1R+K58Njz0f7OOJZoccbu11jfy6
 96nWnHXLcqJ06N8kS0IOhd7Xg4ESnHpIv+Ae/tSQi7HV7QeSIsYrH+vFla8LPO3G
 1a5JUoMeMZ5m4Uzvp5pUY9uhzICcDyUIaqAXLf1cCg9kgJzU0xUWuTXFF5efzr2+
 X0kCM5yrHZBAKL8IK2nINyilV+QllB8Z3/XmFMbnzTHxbxSsKg1xdOzkypM8thMN
 sZRRsBDXm7B3hxadRoQQ88n+kYOiFphTfr80nKCTmJ/12VGyXRoa8iqAEfNfabLC
 EmrSgQh9zLDDdv6KBzgohHi5giLXnYthQ1ibzeh7HPr+1rUT6Ro=
 =P1S7
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.110' into 5.4-2.3.x-imx

This is the 5.4.110 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-04-08 07:56:04 +00:00
Ilya Lipnitskiy 00bd9c2240 mm: fix race by making init_zero_pfn() early_initcall
commit e720e7d0e983bf05de80b231bccc39f1487f0f16 upstream.

There are code paths that rely on zero_pfn to be fully initialized
before core_initcall.  For example, wq_sysfs_init() is a core_initcall
function that eventually results in a call to kernel_execve, which
causes a page fault with a subsequent mmput.  If zero_pfn is not
initialized by then it may not get cleaned up properly and result in an
error:

  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1

Here is an analysis of the race as seen on a MIPS device. On this
particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
initialized, at which point it becomes PFN 5120:

  1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
       kobject_uevent_env+0x7e4/0x7ec
       kset_register+0x68/0x88
       bus_register+0xdc/0x34c
       subsys_virtual_register+0x34/0x78
       wq_sysfs_init+0x1c/0x4c
       do_one_initcall+0x50/0x1a8
       kernel_init_freeable+0x230/0x2c8
       kernel_init+0x10/0x100
       ret_from_kernel_thread+0x14/0x1c

  2. kobject_uevent_env() calls call_usermodehelper_exec() which executes
     kernel_execve asynchronously.

  3. Memory allocations in kernel_execve cause a page fault, bumping the
     MM reference counter:
       add_mm_counter_fast+0xb4/0xc0
       handle_mm_fault+0x6e4/0xea0
       __get_user_pages.part.78+0x190/0x37c
       __get_user_pages_remote+0x128/0x360
       get_arg_page+0x34/0xa0
       copy_string_kernel+0x194/0x2a4
       kernel_execve+0x11c/0x298
       call_usermodehelper_exec_async+0x114/0x194

  4. In case zero_pfn has not been initialized yet, zap_pte_range does
     not decrement the MM_ANONPAGES RSS counter and the BUG message is
     triggered shortly afterwards when __mmdrop checks the ref counters:
       __mmdrop+0x98/0x1d0
       free_bprm+0x44/0x118
       kernel_execve+0x160/0x1d8
       call_usermodehelper_exec_async+0x114/0x194
       ret_from_kernel_thread+0x14/0x1c

To avoid races such as described above, initialize init_zero_pfn at
early_initcall level.  Depending on the architecture, ZERO_PAGE is
either constant or gets initialized even earlier, at paging_init, so
there is no issue with initializing zero_pfn earlier.

Link: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-07 14:47:42 +02:00
Andrey Zhizhikin c4253dfaf6 This is the 5.4.109 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmBjGy8ACgkQONu9yGCS
 aT6P4Q//RUTmWKIEvODK9Hyac0qfvd1CsIgebVR/1hkadYO8OVssIVjSZoyHvfgg
 B2rsjrY1+ywwPl+IYFe4V29SIEuy+YWNo7rjavAPP7W1ybYzhaUXog7KSapho8cy
 hqTlLyWq/TeSehdomz2Luv5vM794RgEV4NjgxnBsncfjUchx5smGQH80xbKRbWFB
 QNq2h1coPbABv3dj1cBb1v2jiCc58QD8rfJuguaHjAiGem2HaMat2iWYo8T2Qcre
 UDb1yrOxCbwltc8+aRRcXI4QuS/4edPz3ZH8H9zdqMQVoS5RX0Alse+w6+F26c1c
 fRZmtg6t70wsznIQ+Jn6ouMY3Ea1jtrF4oVjMCMnno+4V7BgDGW+A+CAqbCC90mt
 QTwaObNyJRjUYjlLmTml7t+S3GqW2YoC2jALs2P3hx/ht0wOl6TIt7YmHCh3/tnR
 wZjyofl+2ml/z+cPqP7/IWGJzzNCEwxreZNcvjgx+k/L/zeNri4q/+fLETe0VE0H
 LNU04JBl2oOOMpkyX8MJODH5Gm9sOg+GiQ3tEZWsgls0mwtxKMxRuu6zNPQvIY93
 cGntM1kVTtQ8fzIUugZR0JgElnosg1xFup3nQKyoids+SEGDgDpC4O5pxYvNW8oo
 jThLWud1waFzhnVXGRGviI0irQPUeYh7Bfw///c7hPHbqw9+F0k=
 =6s9w
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmBsF/oACgkQ7G51OISz
 Hs3CjRAApVP+mu2FCtefIpEnIJJnsPt3L3QQKq3OtChJb6pSRv9FZ7+KW9RYz+qg
 kTyYa1O8FPxaiUH1RUJnRt93qRYf12JitFAUOuEthJGL6Rac5pl2JfvMNAAWvhW7
 nin/Ci7klYyve7Fk3JsBnzpppQHgODTHbQ4TyUH0Mcx8l20o7uxoClevs9JqB3m8
 RbpwRjKKU+4dZU51ueLVPKenS2pk8Hz3kKLPDP27ZJh0PeJrLa4Och7rh3cqVBLW
 nYFJDLBRMRmuVnAHmzCclhH/YFzRC9TKI/iET2u14P1HPdHFO5TUqARbRY37iWLT
 w33qPohDAdR08Kzn7LWGxxoiyoYxqyWgCXhTaeuciMS+NipdHo8/dyDVQClnYZdh
 aPf0eqNciWEkE4KwltvKl4huYwzptsraBxa5eweHa4LhB6M21kmcOdenwPAz1dcO
 upiQqC68ZJAVoT75RD1M1cvQfVDPoNMcrl/Nw+sDqqawwUQ6jpyu46kRK3ttJ0Qx
 EiG5UhS50IyNPbgNUztnsezks26ajyNho69yQuiGA7qgPBKDBPuBrLnk3LlEkuzq
 8fpMRYDQqIYe0YaKb/DV/8X/c9WowskNv9VkGoqx4VbOFnQUFoxl9MEDjzWcksOq
 9OihF8flIwsY8rKFZGJ5EbMbgaw85DutEu6m830CMoIZfUTCRFo=
 =XQ0y
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.109' into 5.4-2.3.x-imx

This is the 5.4.109 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-04-06 08:12:40 +00:00
Mike Kravetz d0f5726ab1 hugetlbfs: hugetlb_fault_mutex_hash() cleanup
commit 552546366a30d88bd1d6f5efe848b2ab50fd57e5 upstream.

A new clang diagnostic (-Wsizeof-array-div) warns about the calculation
to determine the number of u32's in an array of unsigned longs.
Suppress warning by adding parentheses.

While looking at the above issue, noticed that the 'address' parameter
to hugetlb_fault_mutex_hash is no longer used.  So, remove it from the
definition and all callers.

No functional change.

Link: http://lkml.kernel.org/r/20190919011847.18400-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Ilie Halip <ilie.halip@gmail.com>
Cc: David Bolvansky <david.bolvansky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30 14:35:19 +02:00
Andrey Zhizhikin 960eed45f6 This is the 5.4.106 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmBSKIcACgkQONu9yGCS
 aT6nww//RYwO4quTQO9h/SnVtYta3C0bkgSjLCuLjM6LY20L5sHiPxMXKn3LTb67
 SSFtW7vyR4gOmIduQ783yoDxzSGuKZvQ48zh5OZYXD4GlhP9JZ5y4IkEf5r0SGIA
 k4pYYX8rPLNaeOu8TprjdGdaDFC4XplFfZEN19sympvv2q20qD+JzvcjjhyCFmvk
 4A9NibAStU4jUK8AvY4STJb9XmaYo337Btv3Y2j+qUBVj6fMsNCfUif1SdGHA4de
 TPzaPVOIm5p4USOy/m+hsc0e/q+nzz+VYYk+T7X9NDU+kAiEOjdyMqwNOtfAUl9A
 k7aca4oQMjO+MNVGrvER7xF0Se+wlTomTINzLYf0YTfkCMh9+Me+pFr8Fivdvhv9
 /mBFOJ0qqYXpezUETh7F5tgzMUHkzEcOiOpEG/sINxnsZXJaa09VJrS2GYIjILFN
 Epe83Z4ekbZtIzfUY+RWYVEP44fvV1lmLqKIs7z4xoz/IgF2NR++ABwyScCY1E2X
 GstK4fJ7wHA/usbmQofyfLMEF9hvawOu/GwWP2IVQRbK3E5Miux+tTkLXvVhqlr+
 CrLXHb8OZSb4+bzZb3fFLg/B6mR+MiNKXYp2WW1/7pqhTfJHHg8P7Ui72nAcM5Jw
 +W0Gezv/DtPqbhK6rGGTUxOTYOvWqJEuh6QAI4mDx1kIeevw13o=
 =MKFy
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmBTK2MACgkQ7G51OISz
 Hs0njRAAlH8lNVQeZfbDi0w6CfR/biyqTI1RCH3Kak6xTQPFv7zx19eejIj6liTf
 8N1U+MB5mzTzUlt7+fakKUBIbZFbf0gFXL0CgS3dIAyr/1AjZx5rvZP9PckH4pIJ
 m777SlNg07Rm6iVTkVBsES9+BhVvl5kLDRsx2WKQLg9rwaPsnxOR3aiQMqueLW9N
 /1l/Kxoz8jld+yK1mT4gmdTzc5zBJ8ywpyNMln0WWw2Esd+X8D+fc3iyj+DqKGaU
 Cdr8Wr1EoAnt5c9HMeuyTQiW5jyIvJwjtjBFJuy2hj0KH+7yeJen/fKBJXxp+J3Y
 FQbR0NLg25zWXxzNDyZ0mLFjB6KIF4IRFUAlEUBO4oavlyyLzAjg0GXE/Cy4ZVrw
 KE5sg6jcpkNNvwBx2rp+MKYhrRkZN26t7p21f8KMsoApyZR3cfT2RxhsFVZ650wi
 QAr2ZouNbwn5K6Bs7jVIv1VCVtW8uJyEeOuJph+4dSBv9fMzj1jVMXezXuKy1Yjn
 q66OkBjletI/hxaa6f/w4k+o20kxINexSmkg+6ZpvYTsBkElQW3GoHm1/QbW6b80
 KGA5MmwvsqXpp5HaepZ5QXpMZWmIGtYxXltBv+BFwQhjKHk+M8hCYKrmD0ZyHAN5
 mkh19gwi+3evxVhOMFb2WKuJngrH/om3fSftx1oJtHmAYadM+XA=
 =81mD
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.106' into 5.4-2.3.x-imx

This is the 5.4.106 stable release

Following conflicts were resolved during merge:
----
- drivers/net/can/flexcan.c:
Merge NXP commit c2aba4909d ("MLK-23225-2 can: flexcan: initialize all
flexcan memory for ECC function") with upstream commit fd872e63b274e ("can:
flexcan: invoke flexcan_chip_freeze() to enter freeze mode").

- drivers/net/ethernet/freescale/enetc/enetc_pf.c:
Merge upstream commit a8ecf0b2d9547 ("net: enetc: initialize RFS/RSS memories
for unused ports too") with NXP commits 7a5abf6a72 ("enetc: Remove mdio bus
on PF probe error path") and 501d929c03 ("enetc: Use DT protocol information
to set up the ports")
----

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-03-18 10:27:50 +00:00
Linus Torvalds 6547ec4286 Revert "mm, slub: consider rest of partial list if acquire_slab() fails"
commit 9b1ea29bc0d7b94d420f96a0f4121403efc3dd85 upstream.

This reverts commit 8ff60eb052eeba95cfb3efe16b08c9199f8121cf.

The kernel test robot reports a huge performance regression due to the
commit, and the reason seems fairly straightforward: when there is
contention on the page list (which is what causes acquire_slab() to
fail), we do _not_ want to just loop and try again, because that will
transfer the contention to the 'n->list_lock' spinlock we hold, and
just make things even worse.

This is admittedly likely a problem only on big machines - the kernel
test robot report comes from a 96-thread dual socket Intel Xeon Gold
6252 setup, but the regression there really is quite noticeable:

   -47.9% regression of stress-ng.rawpkt.ops_per_sec

and the commit that was marked as being fixed (7ced37197196: "slub:
Acquire_slab() avoid loop") actually did the loop exit early very
intentionally (the hint being that "avoid loop" part of that commit
message), exactly to avoid this issue.

The correct thing to do may be to pick some kind of reasonable middle
ground: instead of breaking out of the loop on the very first sign of
contention, or trying over and over and over again, the right thing may
be to re-try _once_, and then give up on the second failure (or pick
your favorite value for "once"..).

Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/
Cc: Jann Horn <jannh@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17 17:03:34 +01:00
Andrey Zhizhikin 653b37e2c7 This is the 5.4.103 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmBEt1AACgkQONu9yGCS
 aT5UcBAAuobHx4KFrA3/SWKQo81k7oyyXdbb8BJK3hYLCl+RFD7aYguZlvqITVw+
 Hme5PQPnLvY3jc/TwhuIDOG2o2020mT79J8Ggo5ccP/pMIumOwi4LXLvcFQiUevo
 PnYbXM+QCmxyrm+d10gYeARaGDjP+rI5V46AeB+lkn9SgjzJB649d7BQxnQUxPfB
 bm+PyOhX7WgqvZFmkPR4RmLBBC57OfUtZoPID/mLW0w6kKYcy3GD1uHEp3TeG3Pe
 PVxjC57kEiHqnEck2df2XhaB12QlGUGxJXPDmhx6djsvpr3Ss4XOMOYVkZcfsWW5
 hThRdiBEgoOhjhqpfpuKYXE0IrB41Uxd6LNd4piCGF1xfiPSWF2x0m8a0NmgAynN
 Ungl6BbvgSyawI0luZeocSStD2POEbx264qxvA2t+XlxGCxw8PpS4X5mtNAk7vao
 VathiQFdt9LVtFftq7tVcy1XMt7U6SSOj84opLig5S4LHrUPY+/E0qCgZ4V786eo
 fkE13zUpixsxgbvYphAYIErXrq9o0B3WtaAa4jFB/RBugHy5mDGCavM/MNVgitJr
 Y7L9dNzdx2FpAez/dbod+n/CH5wXsnmLa2ZGqqIBVhxHRMOlYTVKa1ioc0bHrjab
 8giS1lm1EXOFgIPW7xI9aEG5alLy9s2ai29nAPmhpi1N72WUtQQ=
 =OmY7
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmBFGe0ACgkQ7G51OISz
 Hs0dLg//Ujaz2OkNFcwt6/2qdRERYfAK76wJ8CuWToykhgUNKoPLtxHGCFphiDo4
 liE0ABJQLVSptyJSbkw0xcsqMav0EgdBXH3JNaU2Cr2cSW+BIOLblHbtTyLdPffr
 53NGwubt9kgZE2du2T0mBIjEnq2cespdN5v5zBPRkNvF6YwDzTP7864qsW6Ky75H
 KczmdP95uoWm33CDjLj4unMx/4bzhGtp40oae9YiToxXXq+eqjZGUxveOluP/gGX
 UpcQEM5+BAdDOTDcAiFTHqXI8fU6nvHOrl90yfyL7rbcY8K9wbvM1LkXdeHGaO+p
 F6Jf9Wo9k1LxRWCmq8PBK6PaPClrYaMLCs3YPCuTGsKhv5QfqezUshbTEmgoeEJs
 PZmTzbxeolItxhZmt2zsNm/nhGcfUivyvn100y0hSCxUAdctokn09OaB29JX4okt
 I1/gNBWJdzXdK878870MAUD31S3Z4cvrYRMYdHTCNG0SJ/uzcs8YftAIjKZjYb+u
 3jYcqRZrHiQAoC4NagKUsH4Hz/WGB9P7jyJO6dI/pJVVqSBWJwuGom1Mfs9wDkLt
 Z6BT7rg+CQSKj+8r7iq384ozqPXK8TRQYvUQ/vHZQ34S6jo93atIlsdsHrKc5VzE
 a55vkAYZKX/BhxVWYcfA+wOFBHqBNOBdY+Aekx6qpg115+lz/iA=
 =aN1o
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.103' into 5.4-2.3.x-imx

This is the 5.4.103 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-03-07 18:22:33 +00:00