Commit Graph

14265 Commits

Author SHA1 Message Date
Andrey Zhizhikin 8b231e0e8e This is the 5.4.148 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmFLBPMACgkQONu9yGCS
 aT6BIQ//Wb4ZQJtEVvaKnda7vFwe8BoZzPGYZA4Imn9KERDRgHuavEuRfMQtKc2y
 YHwe/PD2JreuDHcd+Wz32xsdMe045xNvgiE1oGcxq0jNBvhJqANSmVTWpdqAquON
 cTmwsK3roa7ELC2g1WjrYZDv6CrCggqvbuM9AJ/cLITtd8zerhLdZo+CCDG/28cH
 EosrWvkBcaGmX+r/IBC86Rt6K2OFQ/3LLbb79L4vjKi5lopsm5CTAmfOfIk8p1gB
 mGB3PkQZnIqphBfqGXLGuljl4e+zb1SONrugUh78Egom393Ex34oo+RjWEGe9dV2
 Stkuqo0GTi85X7JA7SGCA/xgF8A8yvaaLjQBsJsL9+2ji+GW+J7hfn4mE5h8H3Di
 UBjeLMFJA8Mge8Ng9xUSttvjRdwSTm0jWTS9SOl07w24b0pKYbMrQdWt2eI6CT+/
 ytq3nCxNJZKeVcAVH+OJNrbSLYvMy/PgYvGTbzASkNmpAeyNiHOyBz1sRcoiAM9U
 QCWDdZyaqDKktqEyKHxK3opqPzbnHfZFFlCxR7Gw7vvR+itIGJEh/50RNv2F6vnu
 wzowrVxe+Bf1h7JiNEqLLVHdiuygRqjH1ygepGM4+3TVF4jYHzDISyrqlA/Se3Pg
 Hhvlzsbv7PH+KiApwBFjSeHTs5WOrokGMFQ7ZYFDpPkleWiywS0=
 =50Hk
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.148' into 5.4-2.3.x-imx

This is the 5.4.148 stable release

Conflicts:
- drivers/dma/imx-sdma.c:
The following upstream patches are already applied to the NXP tree:
7cfbf391e8 ("dmaengine: imx-sdma: remove duplicated sdma_load_context")
788122c99d ("Revert "dmaengine: imx-sdma: refine to load context only
once"")

- drivers/usb/chipidea/host.c:
Merge upstream commit a18cfd715e ("usb: chipidea: host: fix port index
underflow and UBSAN complains") into the NXP version.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-09-22 13:19:32 +00:00
David Hildenbrand 8656566821 mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
commit 7cf209ba8a86410939a24cb1aeb279479a7e0ca6 upstream.

Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory"

These are all cleanups and one fix previously sent as part of [1]:
[PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory
groups.

These patches make sense even without the other series, therefore I pulled
them out to make the other series easier to digest.

[1] https://lkml.kernel.org/r/20210607195430.48228-1-david@redhat.com

This patch (of 4):

Checkpatch complained on a follow-up patch that we are using "unsigned"
here, which defaults to "unsigned int" and checkpatch is correct.

As we will search for a fitting zone using the wrong pfn, we might end
up onlining memory to one of the special kernel zones, such as ZONE_DMA,
which can end badly as the onlined memory does not satisfy properties of
these zones.

Use "unsigned long" instead, just as we do in other places when handling
PFNs.  This can bite us once we have physical addresses in the range of
multiple TB.
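
For illustration only (not part of the changelog; parameter names are
approximate), the declaration changed by this patch is roughly:

	struct zone *zone_for_pfn_range(int online_type, int nid,
					unsigned long start_pfn,  /* was: unsigned */
					unsigned long nr_pages);

	/* Example of the truncation being fixed: with 4K pages, a page at
	 * physical address 16TB + 4MB has pfn 0x100000400.  Truncated to
	 * "unsigned int" that becomes 0x400, i.e. physical address 4MB, so
	 * the zone lookup could land in a special zone such as ZONE_DMA.
	 */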

Link: https://lkml.kernel.org/r/20210712124052.26491-2-david@redhat.com
Fixes: e5e6893026 ("mm, memory_hotplug: display allowed zones in the preferred ordering")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:26:43 +02:00
Rik van Riel d0ddb80bbf mm,vmscan: fix divide by zero in get_scan_count
commit 32d4f4b782bb8f0ceb78c6b5dc46eb577ae25bf7 upstream.

Commit f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to
proportional memory.low reclaim") introduced a divide by zero corner
case when oomd is being used in combination with cgroup memory.low
protection.

When oomd decides to kill a cgroup, it will force the cgroup memory to
be reclaimed after killing the tasks, by writing to the memory.max file
for that cgroup, forcing the remaining page cache and reclaimable slab
to be reclaimed down to zero.

Previously, on cgroups with some memory.low protection, that would result
in the memory being reclaimed down to the memory.low limit, or likely
not at all, with the remaining page cache reclaimed asynchronously later.

With f56ce412a59d the oomd write to memory.max tries to reclaim all the
way down to zero, which may race with another reclaimer, to the point of
ending up with the divide by zero below.

This patch implements the obvious fix.
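
For readers following along, a rough sketch of where the division sits in
get_scan_count() (names approximate, not the literal diff):

	unsigned long cgroup_size = mem_cgroup_size(memcg);

	cgroup_size = max(cgroup_size, protection);

	/* When oomd forces memory.max down to zero and races with another
	 * reclaimer, both protection and cgroup_size can be 0 here, so this
	 * division trapped; the fix simply keeps the denominator non-zero.
	 */
	scan = lruvec_size - lruvec_size * protection / cgroup_size;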

Link: https://lkml.kernel.org/r/20210826220149.058089c6@imladris.surriel.com
Fixes: f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:26:37 +02:00
Andrey Zhizhikin e4dba8435a This is the 5.4.145 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmE9pNwACgkQONu9yGCS
 aT5g2Q//ZEiIQBvw6uZEA3z1Y6tuKFyIxxwOu24EbCTvB1oXXQX/XXQQPWori1Ny
 OpjuQwXJr2LW+/wEKvUEj8mTrpFD+LsZmXRLBHCw9EqqD5RURDqUZt+xh+4xtV/S
 2EgF0nCO9+84wo628Lc1C2LBJZZEo/kD7LnGeln+BXwRS1FQvGfD+5KIpOR2YzqI
 hrCtVfO5ZQpv59PrAQkfwnfITk9BM1cwA7LCD75WEN59405ZV3mzFyTdvz8s0iGt
 akxmwajLNGjQ/ro567tjpsWiK7EF26mNRTMZqu1jK6h/KjU9sQ4DzCqB+p5TPh/9
 mj/Rzq1lSjLodsR0OznKBqFIVaqXyTU+0cMItjos9MBsG/4GOj8ixbXdFRG99WmK
 bNsYucotSrE9ApYwsmqYaNcHcGeLUIsYCDFCQp3++oeF59+FA7Pp7B4bI/zcYRwY
 aqbfTkMzo8/e4OF0B2LCx+8r0xol1SoLwBfcP3hb7rlKp9OkSYsKrJ/29CUuINe1
 YC5HdrPf2HP36jlVCll5rQa+ERaxtNSCozgwxHG/x2yeOmiVqxdE+vUUmyRidah8
 DvYklCM7upUDi1ujbOwbor9R1jQSXkWMFK76EBB3GJPgguFNyczFXm8xBzfRLQvw
 H6YjIfnxNt+DLPn5uXIEhU7ISTkUno9i1BEd2NoeT1UiYTlk2bw=
 =/lic
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmE+VUgACgkQ7G51OISz
 Hs0D+A//c+QInKUmZ5jcHXXqaiD0JYt5+arGOIBAgIjRcWDL3yJpuuP5nwZRnuRf
 N0SyJjZIpb/WyS27zEaHwOg7xKkhCqagix/J9qdNDP94W0xoGKdXsxotnixXXjMo
 tY9RUqwA45i412x6RA4sXHBmyhgPwSuoLnHk+qVF4xOBD2rshVXzv6zQAO51Lg4J
 3/a2agXlrreCGtHmnKwn47pW+mkHEaquCmcXaz/6PNbWDHwZpmmtCRlsvMW1PyMe
 7TZ/mQ3fofSX3bxuzzYWmddcGTNtwe/IsmEFZmhC3Vl4BGtqE2YKj3439zca1572
 6ttLDy9K3pAg/rf0NhIe7nIlogmJyunP6U8CVnuAwH0esixWxyXSSXUKTOXhKYFG
 jgv/+adhqjSZQhm1p1HY8KItsJE8o5m87b2UKMe1UKGExAJ66qJsgZtuufbEDA5r
 S+qnQEzE4lfaYmCH6WBtn6a1fodONw/KmsJwk5UhNlv2BVS8LQpcpADVRmq/mCDj
 FpCe7uFJVaDnxpUc1Trv6D6YJpL6n1KIUQcVyExtmBtpuKK/KTLs+NP5hpUZ9xtV
 ftAEiZkplJP+qACdCeZt5QukiHBIfds32sUZ4PDp3ANbMmY6UpTWGQyZq+7PNL4d
 +OzCCDzlXRdZioua6z8oTaAkabo2pcK5ao7GXSor35mU6JZvMVg=
 =Z0Kz
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.145' into 5.4-2.3.x-imx

This is the 5.4.145 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-09-12 19:30:13 +00:00
Muchun Song cf1222b877 mm/page_alloc: speed up the iteration of max_order
commit 7ad69832f37e3cea8557db6df7c793905f1135e8 upstream.

When we free a page whose order is very close to MAX_ORDER and greater
than pageblock_order, it wastes some CPU cycles to increase max_order to
MAX_ORDER one by one and to check the pageblock migratetype of that page
repeatedly, especially when MAX_ORDER is much larger than pageblock_order.

We also should not be checking migratetype of buddy when "order ==
MAX_ORDER - 1" as the buddy pfn may be invalid, so adjust the condition.
With the new check, we don't need the max_order check anymore, so we
replace it.

Also adjust the max_order initialization so that it's lower by one than
previously, which hopefully makes the code clearer.
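
As a rough illustration of the described shape (simplified sketch, not
the literal patch):

	/* Cap the first merging pass at the pageblock boundary instead of
	 * creeping max_order up to MAX_ORDER one step at a time and
	 * re-checking the pageblock migratetype on every step.
	 */
	unsigned int max_order = min_t(unsigned int, MAX_ORDER - 1, pageblock_order);

continue_merging:
	while (order < max_order) {
		/* ... find buddy, merge, order++ ... */
	}
	if (order < MAX_ORDER - 1) {
		/* Crossing pageblock_order: check isolation once, then
		 * allow merging up to MAX_ORDER - 1.
		 */
		max_order = order + 1;
		goto continue_merging;
	}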

Link: https://lkml.kernel.org/r/20201204155109.55451-1-songmuchun@bytedance.com
Fixes: d9dddbf556 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-12 08:56:41 +02:00
Andrey Zhizhikin 79c30f58eb This is the 5.4.144 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEx2AAACgkQONu9yGCS
 aT7csg//ZhXXfRkPNMhpkkMjcV7F825mLAPs1vsluIEIZ0oInOpegu8SyDENOfui
 HyFLZ/2Stewa0mn7kNS1caAUXLpFvZ087sIz/SipzupFjLTUHFsNcMYrd19R1M4h
 UK/owAJeoq/pgR4kUck4o/r+47lo8CMqkscbEdKSvwxYUeANIcbGVB5Sf2UaJr5S
 lqBZeliWY/jYGvLWBoSc7mvUwWRbkKLnQu2JkfvGKM4ODOzpbh8TUhq8NxEL7ZFn
 mZxtNmWPvG2PHHvNP89pwKnKQx70ySKrlQdDv10gL6nIHhKuqwLxBo28Q+KcKMYr
 vfoOFS5Vk35jA7Xt8LhNF+lQtDTbN+2YLeDtoAq+aWMmEW/RUYXSU/3thh+WFuO5
 uZZAbrh4r3bew+PLFpEtnVjxkpMsU9EC33KuIZXIGlDEkFlEneJ9pMQYH7XIwQnV
 5sSSOnbyzkajxv9Kpu6XEg3kKyJf+gk/AB/psgfMR0v/jQ4PXVk9+cZDZxKFcxjj
 wGywDkgIb+/sPrABWici/yXjIup0OSG1fK9/Ki9uLgNzxXZ0h+4e3DcXNMxs1B/p
 GpBPP773qIff2lEDhAI+SbP8pHj5Mnc1j77WUQTU9vsIJcftYm4i0G+POpXnynzx
 gzbjJjOhTBL57OciLQlmL2s5ZZUPgPvu5VoHsRfwOu/bbarRADE=
 =RA6W
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmEx8s8ACgkQ7G51OISz
 Hs1JZA/8Cj7g56QymgMuXHEB1PecU7pLpO5egRK3X6xHxJwksD7Xp2LfpaRxjzGw
 XNQsp+4mbJX4oHiZPjD/RsFOdVuNU3ff3mliSmoH2Tdepa2TuKFt7T8V3GE7FN6K
 ns52rvIzbhF762nL1Vs+LE0YBq1w6rTvL7eenNxMo9pwUxJv95X91v7BpRQjTAY5
 /ngvj8tRKN10dSokwrCpzk47Sj/jhSoLlckJL7+iOopQdhOo/HTfWj1aPCaZC/AX
 q2EUg/L2GB1Ij342lDNEZSWn2xAvuAT6+45R8p3GxyG6TMihwiKGXQM922MJDZAV
 T3Chxgu//OlB/spPMAuFgfBNqaX1z+zxv3Dc1EvEbSNPhn6PwEZ2ck9hYkuPmvI3
 78dkyqj3x3AR5VKvc/CpnqSokXBjV7B1TOxJlHKvJ77lvWuDwujir+chmULjahA8
 bVPpbBC9BfF/nX0cYsjQuDNyddqTpt3cv1Cp9w5gXhs/Nj5MsNDRyZxaVHlGaI/W
 h3N6rAU2cNDDtI4Zqr8Lo5IgBLMVUPuj9ZUNUJBKq3YX5CooEmjCtBKZch55Ou5h
 6xmcaMgrFre3FHKfvVhJ5ACK/DoPuWLvr4Af4Q0v6kgif81Is4LQQ+EEXRLMryi2
 fTv2X5r2GLoMHlUH0WgBW8pY0NuoCJiZuxCY5T1c61pCbDCpRss=
 =yagG
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.144' into 5.4-2.3.x-imx

This is the 5.4.144 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-09-03 10:02:52 +00:00
Yafang Shao 765437d1f0 mm, oom: make the calculation of oom badness more accurate
[ Upstream commit 9066e5cfb73cdbcdbb49e87999482ab615e9fc76 ]

Recently we found an issue in our production environment: when memcg
oom is triggered, the oom killer doesn't choose the process with the
largest resident memory but chooses the first scanned process.  Note
that all processes in this memcg have the same oom_score_adj, so the
oom killer should choose the process with the largest resident memory.

Below is part of the oom info, which is enough to analyze this issue.
[7516987.983223] memory: usage 16777216kB, limit 16777216kB, failcnt 52843037
[7516987.983224] memory+swap: usage 16777216kB, limit 9007199254740988kB, failcnt 0
[7516987.983225] kmem: usage 301464kB, limit 9007199254740988kB, failcnt 0
[...]
[7516987.983293] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[7516987.983510] [ 5740]     0  5740      257        1    32768        0          -998 pause
[7516987.983574] [58804]     0 58804     4594      771    81920        0          -998 entry_point.bas
[7516987.983577] [58908]     0 58908     7089      689    98304        0          -998 cron
[7516987.983580] [58910]     0 58910    16235     5576   163840        0          -998 supervisord
[7516987.983590] [59620]     0 59620    18074     1395   188416        0          -998 sshd
[7516987.983594] [59622]     0 59622    18680     6679   188416        0          -998 python
[7516987.983598] [59624]     0 59624  1859266     5161   548864        0          -998 odin-agent
[7516987.983600] [59625]     0 59625   707223     9248   983040        0          -998 filebeat
[7516987.983604] [59627]     0 59627   416433    64239   774144        0          -998 odin-log-agent
[7516987.983607] [59631]     0 59631   180671    15012   385024        0          -998 python3
[7516987.983612] [61396]     0 61396   791287     3189   352256        0          -998 client
[7516987.983615] [61641]     0 61641  1844642    29089   946176        0          -998 client
[7516987.983765] [ 9236]     0  9236     2642      467    53248        0          -998 php_scanner
[7516987.983911] [42898]     0 42898    15543      838   167936        0          -998 su
[7516987.983915] [42900]  1000 42900     3673      867    77824        0          -998 exec_script_vr2
[7516987.983918] [42925]  1000 42925    36475    19033   335872        0          -998 python
[7516987.983921] [57146]  1000 57146     3673      848    73728        0          -998 exec_script_J2p
[7516987.983925] [57195]  1000 57195   186359    22958   491520        0          -998 python2
[7516987.983928] [58376]  1000 58376   275764    14402   290816        0          -998 rosmaster
[7516987.983931] [58395]  1000 58395   155166     4449   245760        0          -998 rosout
[7516987.983935] [58406]  1000 58406 18285584  3967322 37101568        0          -998 data_sim
[7516987.984221] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=3aa16c9482ae3a6f6b78bda68a55d32c87c99b985e0f11331cddf05af6c4d753,mems_allowed=0-1,oom_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184,task_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184/1f246a3eeea8f70bf91141eeaf1805346a666e225f823906485ea0b6c37dfc3d,task=pause,pid=5740,uid=0
[7516987.984254] Memory cgroup out of memory: Killed process 5740 (pause) total-vm:1028kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB
[7516988.092344] oom_reaper: reaped process 5740 (pause), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

We can see that the first scanned process, 5740 (pause), was killed, but
its rss is only one page.  That is because, when we calculate the oom
badness in oom_badness(), we always ignore negative points and convert
all of them to 1.  Now, as the oom_score_adj of all the processes in this
targeted memcg has the same value of -998, the points of these processes
are all negative.  As a result, the first scanned process will be killed.

The oom_score_adj (-998) in this memcg is set by kubelet, because it is a
Guaranteed pod, which has a higher priority that prevents it from being
killed by the system oom.

To fix this issue, we should make the calculation of the oom points more
accurate.  We can achieve that by converting chosen_points from 'unsigned
long' to 'long'.
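
For illustration only (a simplified sketch, not the exact patch; the
names follow oom_badness() but details are approximate): keeping the
badness signed preserves the ordering even when oom_score_adj pushes all
candidates below zero.

	long points;

	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
		 mm_pgtables_bytes(p->mm) / PAGE_SIZE;

	/* With 'unsigned long' points, a large negative adjustment (e.g.
	 * oom_score_adj = -998 on a Guaranteed pod) used to be clamped to 1
	 * for every task, erasing the size ordering; as 'long' the values
	 * remain comparable.
	 */
	points += (long)p->signal->oom_score_adj * totalpages / 1000;

	return points;	/* callers compare signed values now */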

[cai@lca.pw: reported an issue in the previous version]
[mhocko@suse.com: fixed the issue reported by Cai]
[mhocko@suse.com: add the comment in proc_oom_score()]
[laoar.shao@gmail.com: v3]
  Link: http://lkml.kernel.org/r/1594396651-9931-1-git-send-email-laoar.shao@gmail.com

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/1594309987-9919-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-03 10:08:12 +02:00
Andrey Zhizhikin 49ef8ef4cd Linux 5.4.143
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmEnj2AACgkQ3qZv95d3
 LNxNEQ//auFOSmgsMtI8LDmKlP/f22+FmICk8+IHeBMRBMDY0WGEEdsRZgcf4R7M
 hgyBn8ISmU5W0idpoxzVTiNxDJ0YVbVSIX12lZO6OHnwcv6hNW7iOW5TaGjd8EN+
 fkh8MtAToBQrp4fFb1QkC11pYNMPiuvDNB2nW+F3ixfYLyC1EF4g2/qVUKy7s6rZ
 dbqDfuI3Q7R2opsIkpmsV7ClKGbJzsP7oo0H5EOQMpmOowhg3oJy8oYqMMTgij1T
 bJU8kujNElsK+/nbpVzJPrpprQH9eGP+hB5ZAv6s/FuJ6RmkoAczYQnX3HL6TfCS
 ymoyJ01gsmDic9RnG6qei5LkCwf5Td2SKjRZdqGWKTluWD1ZAwzUX8Ww6K+t5uWk
 PQPyCfU2wk2D3JjJWt0vTxl/GZGAkYbZpy5ISZFJhK7/j9/oTSrPWra7/BRu4K2I
 2PK7XGjNyQxSguQqmG064Q0nYEOU03pR2H8tyG3iH0nBBd9p54D0Bg0D73I2h0az
 PoGhBo71m9SYCPP1zSXl+xLFyWGZZDUYCaU9KPlwkYCCcRUSQbfCKwrYHfEcHZgL
 4QtYlpUi+/C0Ga7gAK9ierqCKSNTOpoVna618j97uqCYVIU8estLBqX4mMAQquVF
 R8+cy6L/aTBVw4Zwd0Jmt85GwBHlHahUGEq87+Qpw/laqjkBFcg=
 =SIVq
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmEorqsACgkQ7G51OISz
 Hs3MNQ/9GOK/4DrXYH4RTeGJG2+tTk8VMOrmoDxQOkGXggLvpw06gP1Q9Dz8qzee
 hkILNVr4xC1qxzBzCIAOwzSM0DOQeGJrjtlDPOHxFB6akmwfZ9mU7W4k3YaBHz2c
 +pp4YM0YalmVoJSDBVyFrN67q8gorK39yPgi3BC2tUAz9OSJfPsmdKwqe8ICe7Lp
 hq7R8VfeZdcmYW2vRF5v2yOlzg9vlEd2JfXVL+LJ3R9Eo2Ytlam3gaeObJJqtjDw
 MObvvSLeG2QjZ38tvrWjudfR1Z0hDFy/E1E1AI4y9STLHHQj0eM+3dzEP1mugVVk
 bvLeas2Raf8IA0tJfNQIgz54DcrCT7vHGblKkqESg6kJ/fVc+7CxaLf72DdU0KrN
 0IzRHu3TX8LCj1BQax6eDWlysa3k837WkRMLK55NKPhqkDCb3IhpI3YB+rVgQUoa
 DpjmZ+sTDPypasZf2rGTdGosZBccsRdNbohmOuR202rKOeMnndGJRQrAwdYym/WO
 C9pKosNwBCFk4I97chihvamN/vH0MEJkICwR858I6+uCAMPISNqYcSbRYC3YXoyy
 VU0JBrp4hM5IlKvxKoRuT/Pplt54xhoWREXOS1ua+BOhcWox85QYt+d7uLxt02iZ
 4u9IEXwy0QM+qxZaWw+9+IP4DfOgVa3BAsrVzRfAoHrbwMte7xY=
 =W2Mz
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.143' into 5.4-2.3.x-imx

Linux 5.4.143

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-08-27 09:21:44 +00:00
Johannes Weiner 41c7f46c89 mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
[ Upstream commit f56ce412a59d7d938b81de8878faef128812482c ]

We've noticed occasional OOM killing when memory.low settings are in
effect for cgroups.  This is unexpected and undesirable as memory.low is
supposed to express non-OOMing memory priorities between cgroups.

The reason for this is proportional memory.low reclaim.  When cgroups
are below their memory.low threshold, reclaim passes them over in the
first round, and then retries if it couldn't find pages anywhere else.
But when cgroups are slightly above their memory.low setting, page scan
force is scaled down and diminished in proportion to the overage, to the
point where it can cause reclaim to fail as well - only in that case we
currently don't retry, and instead trigger OOM.

To fix this, hook proportional reclaim into the same retry logic we have
in place for when cgroups are skipped entirely.  This way if reclaim
fails and some cgroups were scanned with diminished pressure, we'll try
another full-force cycle before giving up and OOMing.
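
Roughly, the retry logic being hooked into looks like this in
do_try_to_free_pages() (sketch; field names from struct scan_control):

	/* Reclaim failed, but some cgroups were skipped or scanned with
	 * reduced pressure because of memory.low: retry one full-force
	 * cycle before giving up and letting the OOM killer run.
	 */
	if (sc->memcg_low_skipped) {
		sc->priority = initial_priority;
		sc->memcg_low_reclaim = 1;
		sc->memcg_low_skipped = 0;
		goto retry;
	}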

[akpm@linux-foundation.org: coding-style fixes]

Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org
Fixes: 9783aa9917 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Leon Yang <lnyng@fb.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>		[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-26 08:36:22 -04:00
Yafang Shao 1a3aa81444 mm, memcg: avoid stale protection values when cgroup is above protection
[ Upstream commit 22f7496f0b901249f23c5251eb8a10aae126b909 ]

Patch series "mm, memcg: memory.{low,min} reclaim fix & cleanup", v4.

This series contains a fix for an edge case in my earlier protection
calculation patches, and a patch to make the area overall a little more
robust to hopefully help avoid this in future.

This patch (of 2):

A cgroup can have both memory protection and a memory limit to isolate it
from its siblings in both directions - for example, to prevent it from
being shrunk below 2G under high pressure from outside, but also from
growing beyond 4G under low pressure.

Commit 9783aa9917 ("mm, memcg: proportional memory.{low,min} reclaim")
implemented proportional scan pressure so that multiple siblings in excess
of their protection settings don't get reclaimed equally but instead in
accordance to their unprotected portion.

During limit reclaim, this proportionality shouldn't apply of course:
there is no competition, all pressure is from within the cgroup and should
be applied as such.  Reclaim should operate at full efficiency.

However, mem_cgroup_protected() never expected anybody to look at the
effective protection values when it indicated that the cgroup is above its
protection.  As a result, a query during limit reclaim may return stale
protection values that were calculated by a previous reclaim cycle in
which the cgroup did have siblings.

When this happens, reclaim is unnecessarily hesitant and potentially slow
to meet the desired limit.  In theory this could lead to premature OOM
kills, although it's not obvious this has occurred in practice.

Work around the problem by special-casing reclaim roots in
mem_cgroup_protection().  These memcgs never participate in the
reclaim protection because the reclaim is internal.

We have to ignore effective protection values for reclaim roots because
mem_cgroup_protected might be called from racing reclaim contexts with
different roots.  Calculation is relying on root -> leaf tree traversal
therefore top-down reclaim protection invariants should hold.  The only
exception is the reclaim root which should have effective protection set
to 0 but that would be problematic for the following setup:

 Let's have global and A's reclaim in parallel:
  |
  A (low=2G, usage = 3G, max = 3G, children_low_usage = 1.5G)
  |\
  | C (low = 1G, usage = 2.5G)
  B (low = 1G, usage = 0.5G)

 for A reclaim we have
 B.elow = B.low
 C.elow = C.low

 For the global reclaim
 A.elow = A.low
 B.elow = min(B.usage, B.low) because children_low_usage <= A.elow
 C.elow = min(C.usage, C.low)

 With the effective values resetting we have A reclaim
 A.elow = 0
 B.elow = B.low
 C.elow = C.low

 and global reclaim could see the above and then
 B.elow = C.elow = 0 because children_low_usage > A.elow

Which means that protected memcgs would get reclaimed.

In future we would like to make mem_cgroup_protected more robust against
racing reclaim contexts but that is likely more complex solution than this
simple workaround.
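
A sketch of the special-casing described (approximate, not the exact
upstream hunk):

	static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
							  struct mem_cgroup *memcg,
							  bool in_low_reclaim)
	{
		if (mem_cgroup_disabled())
			return 0;

		/* Targeted (limit) reclaim: the reclaim root gets no
		 * protection, and its stale effective values must not be
		 * used, as they may have been computed for a different root.
		 */
		if (root == memcg)
			return 0;

		if (in_low_reclaim)
			return READ_ONCE(memcg->memory.emin);

		return max(READ_ONCE(memcg->memory.emin),
			   READ_ONCE(memcg->memory.elow));
	}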

[hannes@cmpxchg.org - large part of the changelog]
[mhocko@suse.com - workaround explanation]
[chris@chrisdown.name - retitle]

Fixes: 9783aa9917 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Roman Gushchin <guro@fb.com>
Link: http://lkml.kernel.org/r/cover.1594638158.git.chris@chrisdown.name
Link: http://lkml.kernel.org/r/044fb8ecffd001c7905d27c0c2ad998069fdc396.1594638158.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-26 08:36:22 -04:00
Andrey Zhizhikin 90c98361bb This is the 5.4.135 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmD9WokACgkQONu9yGCS
 aT5RTg/+KOmvPPq4DTSRwQqC7Zk1TzPUQ38H2iZxgpISds7Y0S3RKFmJvXcRoxe2
 z0y6b1XErmVvamAlULFEYMxkmpwAiUeO137UqJN/kwyybvEejrAKDiv9kOMcEwh9
 zKPfrDQ9UQVbInSMsjQrzaME1voYzdUfhd10vGCxFjQl4RFRy06Fj0SfRmsZeeB+
 geu5F6xnba5+IW07okT4FTAsMYPqc+PyP/sENiXQPHt43uSNMQTRdLCh0+7slJ0b
 Lr9S/euozG8L3wYrs7AUFPaMLDvaQoh4k2mp5oXk8MYYrmKWrLo3e7ZNxBptxjd8
 NmwfG9WWfCp4LpN8fMnhrUQxkIj+paDTg9ir1bKmpJwm81miXlWazTQHCw1Mige1
 u03P9Q0tUQP3khpVSEE583RLjr8NKR/zkXx97KTL54GsFmwSe4QdbXX3ZlVYj4md
 FN/8MBBqITNOwm4akObRN4ppOCSD+Qp5a94JOXqmmZ36u+wicAB7SZgVZq6PAmXv
 kQEYxkS0EALLyzMuK5DBB5zcEq6oT/9Gtr107An1gFGj1hqd1NeV0xPguSxUJLE8
 GEL2M9s5jyjbqFZHiz3hPDMB5SKY0T6y8sGtKNmAM6woaLxoRp++JcR/U8m3PpD/
 wJ432zHfi6ERp9WsAhyiYpijMj+xU3gCeo8JIP5vsQnaFtvqev8=
 =qauz
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmEKo44ACgkQ7G51OISz
 Hs0izw//RZp1Hn1xQTToi8PHof/qNviZESLMuhjtxXftG4bX1PZqvKDtBTYudo6+
 hsXyjHma/IFyRcNmzqookE1Fli5mrEm0FIdkyxfOTDur7JdRiTfDle7K2Gej0Maq
 DKuUO2qlXIxKZwe9YmPNKg+ZzFdlMmhdz6rCbAlumt859zErGK/1YLTqDZL4aiGS
 RZ43eY2BisU23JHbfIyVdvT4xdgL7vB4uadC7WIoM1WXTH/sv6VPd3rIC7oeAGBR
 q5/D1yfWvV6uyyX60WJnRH2vEUwv35UdNQkrIiFQ7SzonhbJbkE+ZL481g2IfZ3S
 OwdA2GMn/LE8+Q+IHtoISnUiyw8n7Lae69COHxUIcmggjIGSw5S5Bqoc4OuVw2dv
 BHICUux3IYwhHNv5Py3CNKiVLg9tKAvFoScrwofV5ToD/pgEBBjtbB0+OIoXtdMp
 yQVo/CKuiwwIDTrU1FpVC4rt90gS7EErpjOr/QG8paXMHiMxyhAPnBGLr9SPaueD
 LTXI3ZWNz+ZOFBLH34LZOMdyuWGNbQjwvi86Z5DuCaFL4ZXGAWVl5OUVF7oUyGkL
 vtgXfh6nzrsVoTBC7tfsuXuFossrTSvlpPtj2t2SB9hQEohE0pL6mS41inuJa4gP
 b6b0XRtazzskKT4ApEOoaNqlu0ZnDxC/xTdZN9nC5IZ/Mp+BrPQ=
 =603y
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.135' into 5.4-2.3.x-imx

This is the 5.4.135 stable release

Conflicts (manual resolve):
- drivers/usb/cdns3/gadget.c:
Use NXP version, as upstream commit f53729b828 ("usb: cdns3: Enable
TDL_CHK only for OUT ep") is already applied.

- arch/arm64/boot/dts/freescale/imx8mq.dtsi:
Merge upstream commit 556cf02830 ("arm64: dts: imx8mq: assign PCIe
clocks") manually into NXP tree.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-08-04 14:25:42 +00:00
Nanyong Sun 8a85afc662 mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
Commit d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root
kmem_cache destroy") introduced a problem: if one thread destroys a
kmem_cache A and another thread concurrently creates a kmem_cache B,
which is mergeable with A and has the same size as A, then B may fail to
be created due to the duplicate sysfs node.
The scenario in detail:
1) Thread 1 uses kmem_cache_destroy() to destroy kmem_cache A, which is
mergeable.  It decreases A's refcount and, if the refcount reaches 0,
calls memcg_set_kmem_cache_dying(), which sets A->memcg_params.dying =
true, then unlocks the slab_mutex and calls flush_memcg_workqueue(),
which may take a while.
Note: at this point the sysfs node of A (like '/kernel/slab/:0000248') is
still present; it will be deleted in shutdown_cache(), which is called
after flush_memcg_workqueue() is done and the slab_mutex is locked again.

2) Now thread 2 comes along and uses kmem_cache_create() to create B,
which is mergeable with A (their sizes are the same).  It takes the
slab_mutex and then calls __kmem_cache_alias(), trying to find a
mergeable cache; but because of the code added below in commit
d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache
destroy"), B is not mergeable with A, whose memcg_params.dying is true.

int slab_unmergeable(struct kmem_cache *s)
{
	if (s->refcount < 0)
		return 1;

	/*
	 * Skip the dying kmem_cache.
	 */
	if (s->memcg_params.dying)
		return 1;

	return 0;
}

So B has to create its own sysfs node by calling:
 create_cache->
	__kmem_cache_create->
		sysfs_slab_add->
			kobject_init_and_add
Because B is itself mergeable, the filename of its sysfs node is based on
its size, like '/kernel/slab/:0000248', which duplicates A's; and since
the sysfs node of A is still present, kobject_init_and_add() fails and
kmem_cache_create() fails as a result.

Concurrently modprobing and rmmoding the two modules below can reproduce
the issue quickly: nf_conntrack_expect, se_sess_cache.  See the call
trace at the end.

The LTS versions v4.19.y and v5.4.y have this problem, whereas kernels
after v5.9 do not, because the patchset ("The new cgroup slab memory
controller") largely refactored the memcg slab code.

A potential solution (the one this patch takes): just let the dying
kmem_cache be mergeable; the slab_mutex lock can prevent the race between
the thread creating the alias kmem_cache and the thread destroying the
root kmem_cache.  In the destroying thread, after flush_memcg_workqueue()
is done, check the refcount again; if someone took a reference during the
unlocked window, we don't need to destroy the kmem_cache completely, we
can reuse it.

Another potential solution: revert commit d38a2b7a9c93 ("mm: memcg/slab:
fix memory leak at non-root kmem_cache destroy"); compared to the failure
of kmem_cache_create(), the memory leak in this special scenario seems
less harmful.

Call trace:
 sysfs: cannot create duplicate filename '/kernel/slab/:0000248'
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xb0/0x100
  sysfs_warn_dup+0x6c/0x88
  sysfs_create_dir_ns+0x104/0x120
  kobject_add_internal+0xd0/0x378
  kobject_init_and_add+0x90/0xd8
  sysfs_slab_add+0x16c/0x2d0
  __kmem_cache_create+0x16c/0x1d8
  create_cache+0xbc/0x1f8
  kmem_cache_create_usercopy+0x1a0/0x230
  kmem_cache_create+0x50/0x68
  init_se_kmem_caches+0x38/0x258 [target_core_mod]
  target_core_init_configfs+0x8c/0x390 [target_core_mod]
  do_one_initcall+0x54/0x230
  do_init_module+0x64/0x1ec
  load_module+0x150c/0x16f0
  __se_sys_finit_module+0xf0/0x108
  __arm64_sys_finit_module+0x24/0x30
  el0_svc_common+0x80/0x1c0
  el0_svc_handler+0x78/0xe0
  el0_svc+0x10/0x260
 kobject_add_internal failed for :0000248 with -EEXIST, don't try to register things with the same name in the same directory.
 kmem_cache_create(se_sess_cache) failed with error -17
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xb0/0x100
  kmem_cache_create_usercopy+0xa8/0x230
  kmem_cache_create+0x50/0x68
  init_se_kmem_caches+0x38/0x258 [target_core_mod]
  target_core_init_configfs+0x8c/0x390 [target_core_mod]
  do_one_initcall+0x54/0x230
  do_init_module+0x64/0x1ec
  load_module+0x150c/0x16f0
  __se_sys_finit_module+0xf0/0x108
  __arm64_sys_finit_module+0x24/0x30
  el0_svc_common+0x80/0x1c0
  el0_svc_handler+0x78/0xe0
  el0_svc+0x10/0x260

Fixes: d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-25 14:35:14 +02:00
Andrey Zhizhikin e9646ca701 This is the 5.4.132 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDu+p0ACgkQONu9yGCS
 aT5SOw/9F58e4gz7PSTn4A9oCTNodRPe9B9rzf3y1Ol0k7T1aeQoWsPFOkZpNSOJ
 tdOGEXnwYnLpMC7nuFshWv1uKGAL/weHADyGV6J37AntYFjpEFhJhSH7pGGhDk7V
 EeIl98luBynPXOKNnDvcrQweeRaHKOInQBT8JJzwwsZbF2oqfOqdU0A787BiRu+3
 zoi/mV0upDB443ji/JY0xj+o4jlbsuD0WxEqgkcD2YHL+QvU5Wr0mGys7m5gG9x7
 TpKpMic0ILrF1vt/znLL5rOlX497prTvZ74ZXV/DYizeYxqtl/UG3CZjo1uf2yqk
 pAXA57paz6DY2Ct+3QbJBeuer27bTz6SCClSS1om9AcUk6oNSdULmMdTGvQb0SLU
 wx1Cy8b2ei04SVl96+McKKZ6ln47LJediGn0qIdwC6O/XHHrLq4u5PkSnQxRU4pA
 GH1tP5oYy4GzL9RbBeiDJQETFiXwkexSEWVyuSc6BhqQXao9yVzmLQbL1zgjH/zO
 m/tckZ3vEg+ll8j4QJCisHRyqYhwfru4PsJQH9Q7q6CtIuGOsd0Z/OUcLuF6knXg
 jDOrDIykE/PnkQ2Dc2RhdONP1ud5j3oBnHvNHs6FDghRKjaixMQzg3g/RNtnAaTj
 +7Xsfbi6ntpZSDOaY7YNgt+ZH3l4YRnUL/xBA6qIygayz374nzI=
 =LU0G
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmD25ikACgkQ7G51OISz
 Hs1dcw/8DKef1hGC5O2WKfpInTYtgnClkyD5/yOnGAPMvMRDGybA3dRejpIEefNM
 Qol1XICjb0wdBDV0I+n+fnGbBgaX3g0N/pn16pdbbSPBBe1L+d97gZNZznDGHYZu
 033qtbxii8e0QTxTvO7nx4L80ZZsyPLchpPxowS/vd1Ezti+pTIU4y43MCc2jYLL
 KqUBDz72TkPLhgVZdDJ1z9gb+OoJ+sJPaeBrO57hpY/os9SxlMPeY56YrD3Hyfy5
 IHZw3bTCDiIXpHBaJG8fvuudaM5M8V3dbD6oXnEPo1Gzb1Y7WR4Z7q28g7arSYjP
 fMPd243mCXd1V7LpmapxXvFsnbdsA7oauTho50dwmEvxQf9jEgX6thBWAFsrItaS
 crHdOppS7Lc3FK8cTMxZd6ZyZpaU6sF183tMOteuhtwmF/uoy1LBHqLnAvtfWYrl
 InGcImgABRkiYBRyODlgC4UNLd49Svon/8HcbBZlmeGIkosXjo5r1itnipgnF/TB
 /NkHRkixYTBCnJZyx+9Lihqw+HMnHVfjOnIBjbXjzX9ITH/tiMn4y87E+x9vRQqr
 Td5AKJwiSXSWZBQoX+XNLqXRwjZKHVQe45J4gzL9dhCzi9bwK99BvBPWr8+JyI7w
 83YQfkhPju47+KFrEN6DUBxdYrROsJLsjgdTl38IlCi4SKoQSkE=
 =Li9I
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.132' into 5.4-2.3.x-imx

This is the 5.4.132 stable release

Conflicts (manual resolve):
- drivers/gpu/drm/rockchip/cdn-dp-core.c:
Fix merge hiccup when integrating upstream commit 450c25b8a4
("drm/rockchip: cdn-dp-core: add missing clk_disable_unprepare() on
error in cdn_dp_grf_write()")

- drivers/perf/fsl_imx8_ddr_perf.c:
Port upstream commit 3fea9b708a ("drivers/perf: fix the missed
ida_simple_remove() in ddr_perf_probe()") manually to NXP version.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-07-20 15:04:13 +00:00
Miaohe Lin 49496327c2 mm/z3fold: fix potential memory leak in z3fold_destroy_pool()
[ Upstream commit dac0d1cfda56472378d330b1b76b9973557a7b1d ]

There is a memory leak in z3fold_destroy_pool() as it forgets to
free_percpu pool->unbuddied.  Call free_percpu for pool->unbuddied to fix
this issue.
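
A sketch of the fix described (approximate; other teardown in
z3fold_destroy_pool() elided):

	static void z3fold_destroy_pool(struct z3fold_pool *pool)
	{
		kmem_cache_destroy(pool->c_handle);
		destroy_workqueue(pool->compact_wq);
		destroy_workqueue(pool->release_wq);
		free_percpu(pool->unbuddied);	/* previously leaked */
		kfree(pool);
	}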

Link: https://lkml.kernel.org/r/20210619093151.1492174-6-linmiaohe@huawei.com
Fixes: d30561c56f ("z3fold: use per-cpu unbuddied lists")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:47 +02:00
Miaohe Lin 4b515fa948 mm/huge_memory.c: don't discard hugepage if other processes are mapping it
[ Upstream commit babbbdd08af98a59089334eb3effbed5a7a0cf7f ]

If other processes are mapping any other subpages of the hugepage, i.e.
in the pte-mapped THP case, page_mapcount() will return 1 incorrectly.  Then
we would discard the page while other processes are still mapping it.  Fix
it by using total_mapcount() which can tell whether other processes are
still mapping it.
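
Illustratively (a sketch of the check in madvise_free_huge_pmd(), hedged
rather than the literal hunk):

	/* Only discard the huge page if nobody else maps any part of it; a
	 * pte-mapped subpage elsewhere is missed by page_mapcount(page).
	 */
	if (total_mapcount(page) != 1)	/* was: page_mapcount(page) != 1 */
		goto out;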

Link: https://lkml.kernel.org/r/20210511134857.1581273-6-linmiaohe@huawei.com
Fixes: b8d3c4c300 ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called")
Reviewed-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:47 +02:00
Andrey Zhizhikin 3aef3ef268 Linux 5.4.129
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmDcbxkACgkQ3qZv95d3
 LNxZMBAArNPLhVYdEDDFosb6Y/5RGjjZ/79OGHH0p5YiTo8D+wBHi+wXRl5Jp0PA
 3YVVU8lDTbeDm7E7uWeduWjFwEpsPBL8395scbhC6VR3PfnyunjarVXZgi6EHnMl
 p6HjXXtQ1jTrdDSziGDIhZVQT5FGb2/MMx9m69mfi5BTLjGfWy8chHFbC2GZszlp
 Znu9syjisUBbc4I4XHFgXw0hoQSSig6SUTZCrdTpIW/PZ0swfl8ZPxREh0CZNMpw
 Y2orRt+oHlkWPw1/sSkoTE1PRvXwNWFXyw5caOu846jAfhKtxO54SsqJqhM7VLHZ
 pdH4eb6q7AFyt0A62HkIqa5oabs5Vk9G24b8m5ggc2F/UTkHqgwUcMCud0d3DYL0
 Q7OEAmThQzHHKJ+CeNRJLsiKqVBNHmeS24B+ELldlAiX22vLr9pUsIb342Au1ZjR
 S3BTnneAbYGBv4qUoV2yUF9wQ/LxsFMSl/vmjCBOxg7c3LbKYChUwskYnvd6EwWj
 ObCyLU6FK9HWXSBSp/X+irlF1CLla+HuOC+Aej2U5a8DtmHId4LHMeq/XOxZ9s/8
 QUoX4rh5P+TJ8PIiTqXKrQo5rnR79MiYssIhUozKTdt9ZoMtXzI4mVLXN/yzAVD9
 v4aWYx8m2x17Wq+ptaLMSTSed4m3c25uEl4MucLBmKQV8ClAxW8=
 =Sijo
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmDchOoACgkQ7G51OISz
 Hs3s9Q/+OfeIcTzOY6qDX53PSxK5iF3xZEiPR5HOSVu1cY4+1tyt0bkYdQJ2N//d
 NH6YCuOsVI5chRxKN8b6I01U6xxQ0VNvj5RYmuC0AqhhzJBpfhh6FrrGuVxP+dCj
 aa2twfRZO+Y0yzrIYdSvYdRhJlEhMbKGNyHu871tyQQTlp0eQKy9lBhcCuVSCMso
 p64lRlK35GDi6TaWQ7R0SYOziWWSHByf4p3h4ZlpyhB7w6yMNjNrts4qSoZb2uOY
 I453WroS7tWV3qkOZ9FeqjHyg4mjBg87/wSMZ0DbBANinlQZW5YTOjE/WyPossL3
 C4jTP8NBLsng+ZfhAaMRfbvxGj5gbY0ghSOmsuglAOQDd+tlTfB23pkgHFlj6aie
 FUplJnTLd8VXPWkoNVKH6GfG2rZAUtQKM4EWDPt2KK3Dch9W3YH1bHL9EyxR0G7f
 aDBHM9egTyPOYNcfeVDj0KEJTrOxkmCcYz83D+eEDGKHi//Gcb+k9Gsg6q8X6Yxa
 ejFx5RBM5SW57LA70ZK3+HWhqfqcILYJT7Lxoue7bH+15vSGnTkmJKS80OUrP8w8
 y9gQ9WNsmjQUHvtKPAIZUZYlUS05FhtQ+faC8XyrLnfZj+ZCGBZor/KsNe0SP/UN
 bAYudfCn0YkjGfTNcljOe4oAnwmnVKAjGlQkHOFw+llmM0czEBk=
 =vpHo
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.129' into 5.4-2.3.x-imx

Linux 5.4.129

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-06-30 14:51:20 +00:00
Hugh Dickins 61168eafe0 mm, futex: fix shared futex pgoff on shmem huge page
[ Upstream commit fe19bd3dae3d15d2fbfdb3de8839a6ea0fe94264 ]

If more than one futex is placed on a shmem huge page, it can happen
that waking the second wakes the first instead, and leaves the second
waiting: the key's shared.pgoff is wrong.

When the 3.11 commit 13d60f4b6a ("futex: Take hugepages into account when
generating futex_key") went in, the only shared huge pages came from
hugetlbfs, and the code added to deal with its exceptional page->index was
put into the hugetlb source.  Then that was missed when 4.8 added shmem huge pages.

page_to_pgoff() is what others use for this nowadays: except that, as
currently written, it gives the right answer on hugetlbfs head, but
nonsense on hugetlbfs tails.  Fix that by calling hugetlbfs-specific
hugetlb_basepage_index() on PageHuge tails as well as on head.

Yes, it's unconventional to declare hugetlb_basepage_index() there in
pagemap.h, rather than in hugetlb.h; but I do not expect anything but
page_to_pgoff() ever to need it.
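
A sketch of the resulting page_to_pgoff() shape (approximate):

	static inline pgoff_t page_to_pgoff(struct page *page)
	{
		struct page *head;

		if (unlikely(PageHuge(page)))
			/* hugetlbfs: correct for head and tail pages */
			return hugetlb_basepage_index(page);

		head = compound_head(page);
		/* works for base pages (head == page) and for THP tails */
		return head->index + (page - head);
	}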

[akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]

Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com
Fixes: 800d8c63b2 ("shmem: add huge pages support")
Reported-by: Neel Natu <neelnatu@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Zhang Yi <wetpzy@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: leave redundant #include <linux/hugetlb.h>
in kernel/futex.c, to avoid conflict over the header files included.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:55 -04:00
Hugh Dickins a33b70d625 mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
ptlock in the PVMW_SYNC case? That too might have been responsible for
BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
I've never seen any.

Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com
Link: https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Fixes: ace71a19ce ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Wang Yugui <wangyugui@e16-tech.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:55 -04:00
Hugh Dickins e045e9e79d mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
Running certain tests with a DEBUG_VM kernel would crash within hours,
on the total_mapcount BUG() in split_huge_page_to_list(), while trying
to free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which,
on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

Crash dumps showed two tail pages of a shmem huge page remained mapped
by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
of a long unmapped range; and no page table had yet been allocated for
the head of the huge page to be mapped into.

Although designed to handle these odd misaligned huge-page-mapped-by-pte
cases, page_vma_mapped_walk() falls short by returning false prematurely
when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
are cases when a huge page may span the boundary, with ptes present in
the next.

Restructure page_vma_mapped_walk() as a loop to continue in these cases,
while keeping its layout much as before.  Add a step_forward() helper to
advance pvmw->address across those boundaries: originally I tried to use
mm's standard p?d_addr_end() macros, but hit the same crash 512 times
less often: because of the way redundant levels are folded together, but
folded differently in different configurations, it was just too
difficult to use them correctly; and step_forward() is simpler anyway.
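
The helper mentioned is roughly (sketch):

	static void step_forward(struct page_vma_mapped_walk *pvmw,
				 unsigned long size)
	{
		/* Advance to the next size-aligned boundary */
		pvmw->address = (pvmw->address + size) & ~(size - 1);
		/* Stop cleanly if we wrapped at the end of the address space */
		if (!pvmw->address)
			pvmw->address = ULONG_MAX;
	}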

Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com
Fixes: ace71a19ce ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins 037a1d67d2 mm: page_vma_mapped_walk(): get vma_address_end() earlier
page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the
start, rather than later at next_pte.

It's a little unnecessary overhead on the first call, but makes for a
simpler loop in the following commit.

Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins fa89d53694 mm: page_vma_mapped_walk(): use goto instead of while (1)
page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
and use "goto this_pte", in place of the "while (1)" loop at the end.

Link: https://lkml.kernel.org/r/a52b234a-851-3616-2525-f42736e8934@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins a499febd99 mm: page_vma_mapped_walk(): add a level of indentation
page_vma_mapped_walk() cleanup: add a level of indentation to much of
the body, making no functional change in this commit, but reducing the
later diff when this is all converted to a loop.

[hughd@google.com: page_vma_mapped_walk(): add a level of indentation fix]
  Link: https://lkml.kernel.org/r/7f817555-3ce1-c785-e438-87d8efdcaf26@google.com

Link: https://lkml.kernel.org/r/efde211-f3e2-fe54-977-ef481419e7f3@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins b1783bf8c8 mm: page_vma_mapped_walk(): crossing page table boundary
page_vma_mapped_walk() cleanup: adjust the test for crossing page table
boundary - I believe pvmw->address is always page-aligned, but nothing
else here assumed that; and remember to reset pvmw->pte to NULL after
unmapping the page table, though I never saw any bug from that.

Link: https://lkml.kernel.org/r/799b3f9c-2a9e-dfef-5d89-26e9f76fd97@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins 80b2270a14 mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
follow the same "return not_found, return not_found, return true"
pattern as the block above it (note: returning not_found there is never
premature, since existence or prior existence of huge pmd guarantees
good alignment).

Link: https://lkml.kernel.org/r/378c8650-1488-2edf-9647-32a53cf2e21@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:54 -04:00
Hugh Dickins ef161ccaca mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
use it in subsequent tests, instead of repeatedly dereferencing pointer.

Link: https://lkml.kernel.org/r/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Hugh Dickins 4961160272 mm: page_vma_mapped_walk(): settle PageHuge on entry
page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of
the way at the start, so no need to worry about it later.

Link: https://lkml.kernel.org/r/e31a483c-6d73-a6bb-26c5-43c3b880a2@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Hugh Dickins 52e2b20fb5 mm: page_vma_mapped_walk(): use page for pvmw->page
Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes".

I've marked all of these for stable: many are merely cleanups, but I
think they are much better before the main fix than after.

This patch (of 11):

page_vma_mapped_walk() cleanup: sometimes the local copy of pvwm->page
was used, sometimes pvmw->page itself: use the local copy "page"
throughout.

Link: https://lkml.kernel.org/r/589b358c-febc-c88e-d4c2-7834b37fa7bf@google.com
Link: https://lkml.kernel.org/r/88e67645-f467-c279-bf5e-af4b5c6b13eb@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Yang Shi 82ee7326af mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
[ Upstream commit 504e070dc08f757bccaed6d05c0f53ecbfac8a23 ]

When debugging the bug reported by Wang Yugui [1], try_to_unmap() may
fail, but the first VM_BUG_ON_PAGE() only checks page_mapcount(); it may
therefore miss the failure when the head page is unmapped but another
subpage is still mapped.  The second DEBUG_VM BUG(), which checks the
total mapcount, would then catch it.  This can cause some confusion.

As this is not a fatal issue, consolidate the two DEBUG_VM checks
into one VM_WARN_ON_ONCE_PAGE().
[1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
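
Conceptually, the consolidated check in unmap_page() looks roughly like
this (sketch, not the literal hunk):

	try_to_unmap(page, ttu_flags);

	/* One non-fatal warning instead of two DEBUG_VM BUG paths */
	VM_WARN_ON_ONCE_PAGE(page_mapped(page), page);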

Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: fixed up variables in split_huge_page_to_list(),
and fixed up the conflict on ttu_flags in unmap_page().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Hugh Dickins bd43892152 mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
[ Upstream commit 22061a1ffabdb9c3385de159c5db7aac3a4df1cc ]

There is a race between THP unmapping and truncation, when truncate sees
pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared
it, but before its page_remove_rmap() gets to decrement
compound_mapcount: generating false "BUG: Bad page cache" reports that
the page is still mapped when deleted.  This commit fixes that, but not
in the way I hoped.

The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
instead of unmap_mapping_range() in truncate_cleanup_page(): it has
often been an annoyance that we usually call unmap_mapping_range() with
no pages locked, but there apply it to a single locked page.
try_to_unmap() looks more suitable for a single locked page.

However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page):
it is used to insert THP migration entries, but not used to unmap THPs.
Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB
needs are different, I'm too ignorant of the DAX cases, and couldn't
decide how far to go for anon+swap.  Set that aside.

The second attempt took a different tack: make no change in truncate.c,
but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
clearing it initially, then pmd_clear() between page_remove_rmap() and
unlocking at the end.  Nice.  But powerpc blows that approach out of the
water, with its serialize_against_pte_lookup(), and interesting pgtable
usage.  It would need serious help to get working on powerpc (with a
minor optimization issue on s390 too).  Set that aside.

Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
that's likely to reduce or eliminate the number of incidents, it would
give less assurance of whether we had identified the problem correctly.

This successful iteration introduces "unmap_mapping_page(page)" instead
of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route,
with an addition to details.  Then zap_pmd_range() watches for this
case, and does spin_unlock(pmd_lock) if so - just like
page_vma_mapped_walk() now does in the PVMW_SYNC case.  Not pretty, but
safe.
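
Roughly the shape of the new helper (a simplified sketch; the stable
backport uses hpage_nr_pages(), as noted below):

  void unmap_mapping_page(struct page *page)
  {
          struct address_space *mapping = page->mapping;
          struct zap_details details = { };

          VM_BUG_ON(!PageLocked(page));
          VM_BUG_ON(PageTail(page));

          details.check_mapping = mapping;
          details.first_index = page->index;
          details.last_index = page->index + hpage_nr_pages(page) - 1;
          details.single_page = page;    /* the addition to details */

          i_mmap_lock_write(mapping);
          if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
                  unmap_mapping_range_tree(&mapping->i_mmap, &details);
          i_mmap_unlock_write(mapping);
  }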

Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to
assert its interface; but currently that's only used to make sure that
page->mapping is stable, and zap_pmd_range() doesn't care if the page is
locked or not.  Along these lines, in invalidate_inode_pages2_range()
move the initial unmap_mapping_range() out from under page lock, before
then calling unmap_mapping_page() under page lock if still mapped.

Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
Fixes: fc127da085 ("truncate: handle file thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: fixed up call to truncate_cleanup_page()
in truncate_inode_pages_range().  Use hpage_nr_pages() in
unmap_mapping_page().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Jue Wang b767134ec3 mm/thp: fix page_address_in_vma() on file THP tails
commit 31657170deaf1d8d2f6a1955fbc6fa9d228be036 upstream.

Anon THP tails were already supported, but memory-failure may need to
use page_address_in_vma() on file THP tails, which its page->mapping
check did not permit: fix it.

hughd adds: no current usage is known to hit the issue, but this does
fix a subtle trap in a general helper: best fixed in stable sooner than
later.

Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com
Fixes: 800d8c63b2 ("shmem: add huge pages support")
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:53 -04:00
Hugh Dickins 41432a8a67 mm/thp: fix vma_address() if virtual address below file offset
[ Upstream commit 494334e43c16d63b878536a26505397fce6ff3a2 ]

Running certain tests with a DEBUG_VM kernel would crash within hours,
on the total_mapcount BUG() in split_huge_page_to_list(), while trying
to free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which,
on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

When that BUG() was changed to a WARN(), it would later crash on the
VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
mm/internal.h:vma_address(), used by rmap_walk_file() for
try_to_unmap().

vma_address() is usually correct, but there's a wraparound case when the
vm_start address is unusually low, but vm_pgoff not so low:
vma_address() chooses max(start, vma->vm_start), but that decides on the
wrong address, because start has become almost ULONG_MAX.
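
A tiny user-space model of that wraparound (variable names invented for
illustration; not kernel code):

  #include <stdio.h>

  int main(void)
  {
          unsigned long vm_start = 0x1000;   /* unusually low vm_start */
          unsigned long vm_pgoff = 0x100;    /* but vm_pgoff not so low */
          unsigned long page_pgoff = 0x10;   /* page sits below the range */

          /* vma_address(): vm_start + ((pgoff - vm_pgoff) << PAGE_SHIFT) */
          unsigned long start = vm_start + ((page_pgoff - vm_pgoff) << 12);
          unsigned long chosen = start > vm_start ? start : vm_start;

          printf("start  = %#lx (wrapped to near ULONG_MAX)\n", start);
          printf("chosen = %#lx (max() picks the wrapped value)\n", chosen);
          return 0;
  }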

Rewrite vma_address() to be more careful about vm_pgoff; move the
VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can
be safely used from page_mapped_in_vma() and page_address_in_vma() too.

Add vma_address_end() to apply similar care to end address calculation,
in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
though it raises a question of whether callers would do better to supply
pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.

An irritation is that their apparent generality breaks down on KSM
pages, which cannot be located by the page->index that page_to_pgoff()
uses: as commit 4b0ece6fa0 ("mm: migrate: fix remove_migration_pte()
for ksm pages") once discovered.  I dithered over the best thing to do
about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both
vma_address() and vma_address_end(); though the only place in danger of
using it on them was try_to_unmap_one().

Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
head page, instead of thp_size(): to make the right calculation on a
hugetlbfs page, whether or not THPs are configured.  try_to_unmap() is
used on hugetlbfs pages, but perhaps the wrong calculation never
mattered.

Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
Fixes: a8fa41ad2f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: fixed up conflicts on intervening thp_size().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Hugh Dickins 4b0a34e222 mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
[ Upstream commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 ]

Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
(!unmap_success): with dump_page() showing mapcount:1, but then its raw
struct page output showing _mapcount ffffffff i.e.  mapcount 0.

And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
all indicative of some mapcount difficulty in development here perhaps.
But the !CONFIG_DEBUG_VM path handles the failures correctly and
silently.

I believe the problem is that once a racing unmap has cleared pte or
pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
from try_to_unmap() before the racing task has reached decrementing
mapcount.

Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
TTU_SYNC to the options, and passing that from unmap_page().
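
The core of the change is small; roughly (a sketch, not the exact hunk):

  /* In try_to_unmap_one(), before walking the page tables: */
  if (flags & TTU_SYNC)
          pvmw.flags = PVMW_SYNC;    /* spin on, rather than skip, a
                                        racing pte/pmd clearer */

  /* And unmap_page() adds TTU_SYNC to the flags it passes down: */
  ttu_flags |= TTU_SYNC;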

When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
for both: the slight overhead added should rarely matter, except perhaps
if splitting sparsely-populated multiply-mapped shmem.  Once confident
that bugs are fixed, TTU_SYNC here can be removed, and the race
tolerated.

Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
Fixes: fec89c109f ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: upstream TTU_SYNC 0x10 takes the value which
5.11 commit 013339df116c ("mm/rmap: always do TTU_IGNORE_ACCESS") freed.
It is very tempting to backport that commit (as 5.10 already did) and
make no change here; but on reflection, good as that commit is, I'm
reluctant to include any possible side-effect of it in this series.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Hugh Dickins bd092a0f19 mm/thp: make is_huge_zero_pmd() safe and quicker
commit 3b77e8c8cde581dadab9a0f1543a347e24315f11 upstream.

Most callers of is_huge_zero_pmd() supply a pmd already verified
present; but a few (notably zap_huge_pmd()) do not - it might be a pmd
migration entry, in which the pfn is encoded differently from a present
pmd: which might pass the is_huge_zero_pmd() test (though not on x86,
since L1TF forced us to protect against that); or perhaps even crash in
pmd_page() applied to a swap-like entry.

Make it safe by adding pmd_present() check into is_huge_zero_pmd()
itself; and make it quicker by saving huge_zero_pfn, so that
is_huge_zero_pmd() will not need to do that pmd_page() lookup each time.
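
Roughly the resulting helper (a sketch, assuming huge_zero_pfn is cached
when the huge zero page is allocated):

  static inline bool is_huge_zero_pmd(pmd_t pmd)
  {
          /* pmd_present() first: a migration entry encodes its pfn
           * differently and must never match; the cached pfn avoids
           * the pmd_page() lookup on every call. */
          return pmd_present(pmd) &&
                 pmd_pfn(pmd) == READ_ONCE(huge_zero_pfn);
  }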

__split_huge_pmd_locked() checked pmd_trans_huge() before: that worked,
but is unnecessary now that is_huge_zero_pmd() checks present.

Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com
Fixes: e71769ae52 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Hugh Dickins 4c37d7f269 mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
[ Upstream commit 99fa8a48203d62b3743d866fc48ef6abaee682be ]

Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.

Here is v2 batch of long-standing THP bug fixes that I had not got
around to sending before, but prompted now by Wang Yugui's report
https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/

Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and
they have done no harm, but have *not* fixed that issue: something more
is needed and I have no idea of what.

This patch (of 7):

Stressing huge tmpfs page migration racing hole punch often crashed on
the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y
kernel; or shortly afterwards, on a bad dereference in
__split_huge_pmd_locked() when DEBUG_VM=n.  They forgot to allow for pmd
migration entries in the non-anonymous case.

Full disclosure: those particular experiments were on a kernel with more
relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the
vanilla kernel: it is conceivable that stricter locking happens to avoid
those cases, or makes them less likely; but __split_huge_pmd_locked()
already allowed for pmd migration entries when handling anonymous THPs,
so this commit brings the shmem and file THP handling into line.

And while there: use old_pmd rather than _pmd, as in the following
blocks; and make it clearer to the eye that the !vma_is_anonymous()
block is self-contained, making an early return after accounting for
unmapping.
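
A compressed sketch of the reworked !vma_is_anonymous() block (details
such as DAX and deposited-pgtable handling are elided; not the exact
diff):

  if (!vma_is_anonymous(vma)) {
          old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
          /* ... dax / deposited page table handling ... */
          if (unlikely(is_pmd_migration_entry(old_pmd))) {
                  /* shmem/file THP may be under migration: the entry
                   * is swap-like, not a present pmd */
                  swp_entry_t entry = pmd_to_swp_entry(old_pmd);

                  page = migration_entry_to_page(entry);
          } else {
                  page = pmd_page(old_pmd);
                  if (!PageDirty(page) && pmd_dirty(old_pmd))
                          set_page_dirty(page);
                  if (!PageReferenced(page) && pmd_young(old_pmd))
                          SetPageReferenced(page);
                  page_remove_rmap(page, true);
                  put_page(page);
          }
          add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
          return;    /* self-contained: early return after accounting */
  }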

Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com
Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com
Fixes: e71769ae52 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jue Wang <juew@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: this commit made intervening cleanups in
pmdp_huge_clear_flush() redundant: here it's rediffed to skip them.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Xu Yu 7ce4b73d34 mm, thp: use head page in __migration_entry_wait()
commit ffc90cbb2970ab88b66ea51dd580469eede57b67 upstream.

We noticed that a hung task happens in a corner-case but practical
scenario when CONFIG_PREEMPT_NONE is enabled, as follows.

Process 0                       Process 1                     Process 2..Inf
split_huge_page_to_list
    unmap_page
        split_huge_pmd_address
                                __migration_entry_wait(head)
                                                              __migration_entry_wait(tail)
    remap_page (roll back)
        remove_migration_ptes
            rmap_walk_anon
                cond_resched

Here __migration_entry_wait(tail) occurs in kernel space, e.g. via
copy_to_user in fstat, which immediately faults again without
rescheduling, and thus occupies the cpu fully.

When there are too many processes performing __migration_entry_wait on
a tail page, remap_page is never reached after cond_resched.

This makes __migration_entry_wait operate on the compound head page,
so it waits for remap_page to complete, whether the THP is split
successfully or rolled back.
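
The change itself is tiny; roughly (a sketch of __migration_entry_wait(),
not the exact hunk):

  page = migration_entry_to_page(entry);
  page = compound_head(page);    /* tail faults wait on the head's lock */

  if (!get_page_unless_zero(page))
          goto out;
  pte_unmap_unlock(ptep, ptl);
  put_and_wait_on_page_locked(page);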

Note that put_and_wait_on_page_locked helps to drop the page reference
acquired with get_page_unless_zero, as soon as the page is on the wait
queue, before actually waiting.  So splitting the THP is only prevented
for a brief interval.

Link: https://lkml.kernel.org/r/b9836c1dd522e903891760af9f0c86a2cce987eb.1623144009.git.xuyu@linux.alibaba.com
Fixes: ba98828088 ("thp: add option to setup migration entries during PMD split")
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Miaohe Lin 68ce37ebe0 mm/rmap: use page_not_mapped in try_to_unmap()
[ Upstream commit b7e188ec98b1644ff70a6d3624ea16aadc39f5e0 ]

page_mapcount_is_zero() calculates accurately how many mappings a hugepage
has in order to check against 0 only.  This is a waste of CPU time.  We
can do this via page_not_mapped() to save some possible atomic_read
cycles.  Remove the function page_mapcount_is_zero() as it's not used
anymore, and move page_not_mapped() above try_to_unmap() to avoid an
"identifier undeclared" compilation error.
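
The replacement predicate is trivial; roughly (a sketch):

  static int page_not_mapped(struct page *page)
  {
          return !page_mapped(page);    /* no full mapcount calculation */
  }

  /* ... and try_to_unmap()'s rmap_walk_control now uses it: */
  struct rmap_walk_control rwc = {
          .rmap_one = try_to_unmap_one,
          .arg = (void *)flags,
          .done = page_not_mapped,
          .anon_lock = page_lock_anon_vma_read,
  };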

Link: https://lkml.kernel.org/r/20210130084904.35307-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:52 -04:00
Miaohe Lin 432b61863a mm/rmap: remove unneeded semicolon in page_not_mapped()
[ Upstream commit e0af87ff7afcde2660be44302836d2d5618185af ]

Remove extra semicolon without any functional change intended.

Link: https://lkml.kernel.org/r/20210127093425.39640-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:51 -04:00
Andrey Zhizhikin db8ff65069 This is the 5.4.128 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDTLA4ACgkQONu9yGCS
 aT45xg/+IvxFaIOtutEBkFCJvEurRWSozjBKAfX9xtJQGSSKVyDvh7GZWfEXMxZc
 oNf8DWQKvaiZj2mRdgYp6Ilo27Ps6aN3vCo09z+U3mfGQLMbNpPYEvSq6Twl26NB
 8lL8b++0Jo7P+eOALHohBS125/E0etqhoc2HXDFp6pfksj6J7klxlyQ2NX9Ih8xm
 l7Cto5flCHM9g20/CNsqxXPWiuBKnzSvp9YH9HMDgjOV6YSktLGTHAJ8omjPm0V/
 pQVFOo4Kyx34exdA/IzrM/yV4iDThVtwL6+bNErWtl6LwiIcNK3esARYTNjbBBhK
 W156adxp6kl6LqMADr/y77WqvcH6H2PhpRnMj+6t21FpK7cTbXfqvxBfpOvE1Buh
 in95LJN1Iins1PTozBVHcUIpdESO5AN8/2aHq0LRLmVbaLlo6aj+sjdHNPvf7HwW
 8LDHtpGNao/spMuZmvvH+6i3iwuciINCRY9TVBDgkT5LhWhRHBl6+uSLEX/d+s3Z
 663Q6HPu+cfubR7UC8+QsMMtf7KD2yvQuadAz6n/Z41vvSYIUHPGsYtZUmsef3jP
 n4CTAmGavtyR5jaQNkuw8nnIn7cthONw94foFheBH0doxmkXPKcwqmWO9DH77n58
 unMT31ArVg9ObrO/YmLjEaV9X7VlfRf6yw7tey1RJXgrSD3nwgk=
 =9+GF
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmDTTjIACgkQ7G51OISz
 Hs1mJw/5AaIlIv+kMC0qP9aG5Y1uoUQ+fewr4Ofo9Kt8QlVcogdeM16URF7O2CDG
 mICWVXRqY+xWl/84XWJs11RRxfMnXp85myi1jyalOFDYojJp2qOkOyEi8zMg7wui
 N1kkPxEworm6ILrZRKNgT36ZfSEGAD2ogkp4wS/Fldwh2RNHtGUSw3xAPcqg0yDq
 +Ve3V06MDul/dyRup3MYtQ/Kxb5wCc+Lfwh+eXYZIu/DfXy3E4K3w8YB7NPazlTw
 dBlWU/S3Y0uyoq+DvfFQ+nL0GTe0ezGQB5n5n1eNRWg+mEn3IiWXkrBJfha2Swcl
 vv+Fy/5ro9elcn1oQF5uvwDoGHhq/KkR7TjRICY9zmVvJ79IcqZJ/Bdg5CEXM+xc
 EvZDAdXn26y2xKZJzKGQSe7QJurOuQpdW497h1DLGIhDs2sGDjhXuISPcNpjPJKG
 PdgpJ9kG4v2cYnW58UTdzQKNFg7jjFUiiQV/IIoiVlxhg7dVQnSKCgXHcngiZiuQ
 eUR9p7ahfEOlG2N5qTlJqom0HtiJ6I4a5TDQhXoEYlKfxr9711f2+cramhf2cle+
 +G64chvmKcv7Fyx5yIwSJrTJfi6vsuW5VMfXrvGIqjFUEjIGe9rQMhWH34A7s5tG
 RD0Tg90JJr0rNsNa+LZNPbRynbdPBbrVWdrVyXN+9HDs40oe8BY=
 =f7qZ
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.128' into 5.4-2.3.x-imx

This is the 5.4.128 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-06-23 15:07:27 +00:00
Andrew Morton 4f69c89306 mm/slub.c: include swab.h
commit 1b3865d016815cbd69a1879ca1c8a8901fda1072 upstream.

Fixes build with CONFIG_SLAB_FREELIST_HARDENED=y.

Hopefully.  But it's the right thing to do anyway.

Fixes: 1ad53d9fa3f61 ("slub: improve bit diffusion for freelist ptr obfuscation")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213417
Reported-by: <vannguye@cisco.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23 14:41:30 +02:00
Kees Cook 9ddeea35c4 mm/slub: fix redzoning for small allocations
commit 74c1d3e081533825f2611e46edea1fcdc0701985 upstream.

The redzone area for SLUB exists between s->object_size and s->inuse
(which is at least the word-aligned object_size).  If a cache were
created with an object_size smaller than sizeof(void *), the in-object
stored freelist pointer would overwrite the redzone (e.g.  with boot
param "slub_debug=ZF"):

  BUG test (Tainted: G    B            ): Right Redzone overwritten
  -----------------------------------------------------------------------------

  INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
  INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
  INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620

  Redzone  (____ptrval____): bb bb bb bb bb bb bb bb    ........
  Object   (____ptrval____): f6 f4 a5 40 1d e8          ...@..
  Redzone  (____ptrval____): 1a aa                      ..
  Padding  (____ptrval____): 00 00 00 00 00 00 00 00    ........

Store the freelist pointer out of line when object_size is smaller than
sizeof(void *) and redzoning is enabled.
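
Roughly where that lands in calculate_sizes() (a simplified sketch of the
condition; the real check also covers RCU-freed caches):

  /* Move the freelist pointer out of line whenever it cannot live
   * inside the object: poisoning, a constructor, or now a right
   * redzone with an object smaller than a pointer. */
  if ((flags & SLAB_POISON) || s->ctor ||
      ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *))) {
          s->offset = size;    /* freepointer stored after the object */
          size += sizeof(void *);
  }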

Additionally remove the "smaller than sizeof(void *)" check under
CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant:
SLAB and SLOB both handle small sizes.

(Note that no caches within this size range are known to exist in the
kernel currently.)

Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org
Fixes: 81819f0fc8 ("SLUB core")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Lin, Zhenpeng" <zplin@psu.edu>
Cc: Marco Elver <elver@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23 14:41:30 +02:00
Kees Cook c0837e021d mm/slub: clarify verification reporting
commit 8669dbab2ae56085c128894b181c2aa50f97e368 upstream.

Patch series "Actually fix freelist pointer vs redzoning", v4.

This fixes redzoning vs the freelist pointer (both for middle-position
and very small caches).  Both are "theoretical" fixes, in that I see no
evidence of such small-sized caches actually be used in the kernel, but
that's no reason to let the bugs continue to exist, especially since
people doing local development keep tripping over it.  :)

This patch (of 3):

Instead of repeating "Redzone" and "Poison", clarify which sides of
those zones got tripped.  Additionally fix column alignment in the
trailer.

Before:

  BUG test (Tainted: G    B            ): Redzone overwritten
  ...
  Redzone (____ptrval____): bb bb bb bb bb bb bb bb      ........
  Object (____ptrval____): f6 f4 a5 40 1d e8            ...@..
  Redzone (____ptrval____): 1a aa                        ..
  Padding (____ptrval____): 00 00 00 00 00 00 00 00      ........

After:

  BUG test (Tainted: G    B            ): Right Redzone overwritten
  ...
  Redzone  (____ptrval____): bb bb bb bb bb bb bb bb      ........
  Object   (____ptrval____): f6 f4 a5 40 1d e8            ...@..
  Redzone  (____ptrval____): 1a aa                        ..
  Padding  (____ptrval____): 00 00 00 00 00 00 00 00      ........

The earlier commits that slowly resulted in the "Before" reporting were:

  d86bd1bece ("mm/slub: support left redzone")
  ffc79d2880 ("slub: use print_hex_dump")
  2492268472 ("SLUB: change error reporting format to follow lockdep loosely")

Link: https://lkml.kernel.org/r/20210608183955.280836-1-keescook@chromium.org
Link: https://lkml.kernel.org/r/20210608183955.280836-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/cfdb11d7-fb8e-e578-c939-f7f5fb69a6bd@suse.cz/
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Marco Elver <elver@google.com>
Cc: "Lin, Zhenpeng" <zplin@psu.edu>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23 14:41:30 +02:00
yangerkun 566345aaab mm/memory-failure: make sure wait for page writeback in memory_failure
[ Upstream commit e8675d291ac007e1c636870db880f837a9ea112a ]

Our syzkaller triggered the "BUG_ON(!list_empty(&inode->i_wb_list))" in
clear_inode:

  kernel BUG at fs/inode.c:519!
  Internal error: Oops - BUG: 0 [#1] SMP
  Modules linked in:
  Process syz-executor.0 (pid: 249, stack limit = 0x00000000a12409d7)
  CPU: 1 PID: 249 Comm: syz-executor.0 Not tainted 4.19.95
  Hardware name: linux,dummy-virt (DT)
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : clear_inode+0x280/0x2a8
  lr : clear_inode+0x280/0x2a8
  Call trace:
    clear_inode+0x280/0x2a8
    ext4_clear_inode+0x38/0xe8
    ext4_free_inode+0x130/0xc68
    ext4_evict_inode+0xb20/0xcb8
    evict+0x1a8/0x3c0
    iput+0x344/0x460
    do_unlinkat+0x260/0x410
    __arm64_sys_unlinkat+0x6c/0xc0
    el0_svc_common+0xdc/0x3b0
    el0_svc_handler+0xf8/0x160
    el0_svc+0x10/0x218
  Kernel panic - not syncing: Fatal exception

A crash dump of this problem shows that someone called __munlock_pagevec
to clear page LRU without lock_page: do_mmap -> mmap_region -> do_munmap
-> munlock_vma_pages_range -> __munlock_pagevec.

As a result memory_failure will call identify_page_state without
wait_on_page_writeback.  And after truncate_error_page clears the mapping
of this page, end_page_writeback won't call sb_clear_inode_writeback to
clear inode->i_wb_list.  That will trigger the BUG_ON in clear_inode!

Fix it by also checking PageWriteback, to help determine whether we should
skip wait_on_page_writeback.
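
Roughly, the shortcut in memory_failure() that jumps straight to
identify_page_state() gains a PageWriteback() check (a sketch, not the
exact hunk):

  /* Don't bypass the wait_on_page_writeback() below for a page that
   * is still under writeback, even if it is no longer on the LRU. */
  if (!PageTransTail(p) && !PageLRU(p) && !PageWriteback(p))
          goto identify_page_state;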

Link: https://lkml.kernel.org/r/20210604084705.3729204-1-yangerkun@huawei.com
Fixes: 0bc1f8b068 ("hwpoison: fix the handling path of the victimized page frame that belong to non-LRU")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23 14:41:23 +02:00
Andrey Zhizhikin 276aedc8f1 This is the 5.4.125 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDB+Z8ACgkQONu9yGCS
 aT5qig//WVut449WUeYQLKD8rAB5CUVm2Xl3509Ts8W6LSzYGHiYv1SRVeH2y1lS
 QnfCnBciopl2UyYxqXGQwoRYdY1T2E/MWUmwGUk0/qlZYOzg5xQ368Shm0lvohJI
 DsywZrYqJDUCoeyXoWJYrq/3RiAvMK30teKDcn1A2HhhWdo0nsGLp1GUX396ptcV
 3xw2ZvCVwuikwxq5jlQKUEkH59TD/ZkCzvn9gfd86FY1R0ohApLJckhGIuT3wA1c
 Tfekgvfngx1HcEWIAzWFqZPoB8mOF5pn06yZhuPdMKa8UUq78ckN7kbchERj2wJD
 cDFSQQrMI3nL9sA8ryYV1YFl3fyGX5Epm4O465whzjKWoZ9HwN+iwl6Qv+kOmX41
 YUmpUplhsPN+I7+cX1jF7Ohw583uDbFPw6XbyZ0ArZr03JVVv4Vjrv5QA9fVHR06
 OP7+zEUlBtu/g3k0Bj5MU8UKem0shXavkPqukrtB+MhrXh2VngEXEVOvKMOFgA4b
 BnBEga4SrCR/wB+SucIV4fqzV0tq4HD/cPpy67OafrWoqhwlnBsMCQUd+puxkCnM
 y+eEoRwTzRSW+U9y8KdAERW8qSR/vCyKCUoaKxOV3Jj0v8xp0Y6VHKlKmb//w5Gn
 Lk7sNjD60Um3Au53A5pJvh8qNg+OsNc46sEmGGndE4Mrada93gE=
 =O2C+
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmDCIf8ACgkQ7G51OISz
 Hs04IA/9HPtgSX+5Uha8T9IWUKxKwK8BwXAnBnowBkt76X50PLR7/i1wD3WmNdMc
 nVe+bScX6gTjhbqICO2toDZ+lcqWsM00cHPnjZGGwnGDFIvlbxYAYZt/dPTHxgze
 YXDu7dxY5Cb3tAYBX1Ng165Rti9gJC8QNGOLXiCOUhDSTNMepe02wi6bKR3jN/hm
 jjl02Qo9BQI70a1w3zOFHH8ffQuUdOoTFji8hq2u+cJ1tP6FuftJyPvIAm+MDLNd
 83dg1P4eg73Qk+tp93OKrSG3pnCngxgveCB+U3SQnCd2b83asNVigjxoxkrZZJ7L
 9kxq4ifyAfH9TLQJ5lo2xOdQ1ra0+KTBwYKr2X1/N5mrXnmi9OCt54tXFnkPcJLN
 S0HAP9cCf+NtoACirUfNETeZJDaISvHiYT8XhbJ+y1mr+3pbN/4DQ3P/4u1ykoyF
 XBQwuAEw6ljz22HbZBuLrsB339CSwVuJbSaFrmeuUsX7AKIA/p45lr6L5JQssTyD
 a0NWKWFMJ7rV/f10u/B24kZwcSNghx1xMuX8hfBnyPtbR4ChnlnKnSSLsF+AJQEW
 7chnIejPa0UwQAkZmd/b1qaqZq5oIct3BFRZUfcledlej8HweLNJGyy8+T5ZSu5d
 G5ku8CU2hIgnKSBaqF7AaqkcDga2fjelGJetJjqdPwgIzMJmqnE=
 =b+bL
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.125' into 5.4-2.3.x-imx

This is the 5.4.125 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-06-10 14:30:21 +00:00
Matthew Wilcox (Oracle) 3b7f3cab1d mm/filemap: fix storing to a THP shadow entry
commit 198b62f83eef1d605d70eca32759c92cdcc14175 upstream

When a THP is removed from the page cache by reclaim, we replace it with a
shadow entry that occupies all slots of the XArray previously occupied by
the THP.  If the user then accesses that page again, we only allocate a
single page, but storing it into the shadow entry replaces all entries
with that one page.  That leads to bugs like

page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
------------[ cut here ]------------
kernel BUG at mm/filemap.c:2529!

https://bugzilla.kernel.org/show_bug.cgi?id=206569

This is hard to reproduce with mainline, but happens regularly with the
THP patchset (as so many more THPs are created).  This solution is taken
from the THP patchset.  It splits the shadow entry into order-0 pieces at
the time that we bring a new page into cache.

Fixes: 99cb0dbd47 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/20200903183029.14930-4-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:37:15 +02:00
Mina Almasry 14fd3da3e8 mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
[ Upstream commit d84cf06e3dd8c5c5b547b5d8931015fc536678e5 ]

The userfaultfd hugetlb tests cause a resv_huge_pages underflow.  This
happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on
an index for which we already have a page in the cache.  When this
happens, we allocate a second page, double consuming the reservation,
and then fail to insert the page into the cache and return -EEXIST.

To fix this, we first check if there is a page in the cache which
already consumed the reservation, and return -EEXIST immediately if so.
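
Sketch of the early check in hugetlb_mcopy_atomic_pte() (illustrative;
not the exact hunk):

  if (!*pagep) {
          /* If a page already sits in the cache for this index, its
           * reservation was already consumed: don't allocate again. */
          if (vm_shared &&
              hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
                  ret = -EEXIST;
                  goto out;
          }

          page = alloc_huge_page(dst_vma, dst_addr, 0);
          /* ... allocation failure handling and copy as before ... */
  }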

There is still a rare condition where we fail to copy the page contents
AND race with a call to hugetlb_no_page() for this index, and again we
will underflow resv_huge_pages.  That is fixed in a more complicated
patch not targeted for -stable.

Test:

  Hacked the code locally such that resv_huge_pages underflows produce a
  warning, then:

  ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
  ./tools/testing/selftests/vm/userfaultfd hugetlb 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success

Both tests succeed and produce no warnings.  After the test runs number
of free/resv hugepages is correct.

[mike.kravetz@oracle.com: changelog fixes]

Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com
Fixes: 8fb5debc5f ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:37:14 +02:00
Andrey Zhizhikin 2cbb55e591 This is the 5.4.120 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCkyEcACgkQONu9yGCS
 aT70Qg//Rv09McvLQ+8E0OilJ7TdT0UthXQFP+uPTu+/HPeHQkCO168cn1hbwD9K
 i0YfFYB7PqPe/wccHNsmWHSUYCzA9NnwExA84/jofjswkEMMc95x/bow5/xmLe/5
 ImkjODPVHuQWewgMfbSmNu7Br4wmQC5U/K4r7hp/Aa0FdTjcHMI6Zw40FGbJrWmq
 kiqhW9CeagKbxWrihQNLrSB4E5CdpNNkug/zVus2n9jlFT4tltNGSd7bPsxrp7LN
 EdTfayyPUVZeCoysTNA0WZgz47f+z47vAdIlDHzWCIOZcM1RnJXKA5kFXRf8Fnfa
 +hyvaHSDqYGdRgZxYMcXLL+/cS4foQ/8iQxZBCMomABM0MNUuoJ5tYR6GVetlRcR
 46ZC/5OAvNoKY2Kj4Ky4ROF7aMR3NkYCY6wHUVRcw8778bmuReeLJJPsWojAI+4F
 pWT08+7OUJYb3hRnGxxzKot6CPztdkpQXfXMy+wyNlNbRZ/ivs9/f/GhdblXy/6T
 j12LKIh1IOxpB/wi7GRfeABUuC4MU8xqx6FuPDrBgCTMfVig/wcwF27AUr//a0F5
 xrrzCrDFNAvuyD1WyYilaxWDHAe2o9ROT0JZ4VB3zu40w2VlTT77aqA174xfQa6b
 418Eykw3O11dmsY8AQPTt1HhkDCiewEe4K58CJcmCNEf/inFbvI=
 =kNQc
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmCk3VMACgkQ7G51OISz
 Hs0Y+A/9HqeZOFBPE3BlnEKkdxEPp+zoOj/57pWlS+Xr8ySalZXonAAC975voKjS
 6IFejLkN83q5yCrwYBJ/PCc5Xm9deBf+6AE3KyaNV7CD5ycSBy8o9Fx7IH81wFXt
 quiVNlpi7iGiWsm3ubsvDHNgHwQlvN6mbNp+RIqD4YllheUsTXeR05rrfr+TauXv
 Bv7LEolcyUJo0LTzqPsTlVFG9ec+0qWZb1A7LgOZJFWrS/XaYpoXA49KaNbIZIPj
 jb+sWzTsXhOMc2yB1A6P8T3jsx7NxK4Y2rRyCk0l5Hm//Jl7Kcx+vbe0yQPf2DOs
 c0X+0s8Vwk/Ry606XLcy0nIIGePyj++u43r6noB/cN/LnZUbJY3zbywUNvrY1thS
 YG2MtSiV9TzzKP4xApPF7G/4G6H4HOaC1cQbxc3+7sklJf7Coh3pKv775POIfnLS
 UrQ2ToQRNWQC4/l/O9h9BYiIPWSAAUd0uBgZzcdQWeWbdPXMdb02cwZ5YuPoZgSV
 aLsDN8naaQlsZ20wdQ0DaqKoIryHnD0XHNSMkEvZ0bjPxOOtICekixLJQxjwFBQL
 cx3stcQs5PT8l3PQqc2xnVNTqnL7yWGdMYdfNE+4qTLSiw2s5uXbb7BJwttj9Pit
 QKVSVUtp12ObX1kJzqhKa9GQEUPYREVBpN7NwJL2E5hVObfT6OE=
 =ruSa
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.120' into 5.4-2.3.x-imx

This is the 5.4.120 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-05-19 09:41:35 +00:00
Peter Xu f77aa56ad9 mm/hugetlb: fix F_SEAL_FUTURE_WRITE
commit 22247efd822e6d263f3c8bd327f3f769aea9b1d9 upstream.

Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2.

Hugh reported an issue with F_SEAL_FUTURE_WRITE not being applied
correctly to hugetlbfs, which I can easily verify using the memfd_test
program; it seems the test is hardly ever run with hugetlbfs pages (by
default it uses shmem).

Meanwhile I found another, probably even more severe, issue: hugetlb fork
won't wr-protect child cow pages, so the child can potentially write to
the parent's private pages.  Patch 2 addresses that.

After this series applied, "memfd_test hugetlbfs" should start to pass.

This patch (of 2):

F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
There is a test program for that and it fails constantly.

$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)

I think it's probably because no one is really running the hugetlbfs test.

Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap(), as we
do in shmem_mmap().  Generalize a helper for that.
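
The generalized helper looks roughly like this (a sketch, shared by
shmem_mmap() and hugetlbfs_file_mmap()):

  static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
  {
          if (seals & F_SEAL_FUTURE_WRITE) {
                  /* New shared writable mappings are refused outright. */
                  if ((vma->vm_flags & VM_SHARED) &&
                      (vma->vm_flags & VM_WRITE))
                          return -EPERM;

                  /* Read-only shared mappings must not be upgradable
                   * via mprotect(); private mappings stay COW-writable. */
                  if (vma->vm_flags & VM_SHARED)
                          vma->vm_flags &= ~(VM_MAYWRITE);
          }
          return 0;
  }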

Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com
Fixes: ab3948f58f ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19 10:08:29 +02:00
Axel Rasmussen b3f1731c6d userfaultfd: release page in error path to avoid BUG_ON
commit 7ed9d238c7dbb1fdb63ad96a6184985151b0171c upstream.

Consider the following sequence of events:

1. Userspace issues a UFFD ioctl, which ends up calling into
   shmem_mfill_atomic_pte(). We successfully account the blocks, we
   shmem_alloc_page(), but then the copy_from_user() fails. We return
   -ENOENT. We don't release the page we allocated.
2. Our caller detects this error code, tries the copy_from_user() after
   dropping the mmap_lock, and retries, calling back into
   shmem_mfill_atomic_pte().
3. Meanwhile, let's say another process filled up the tmpfs being used.
4. So shmem_mfill_atomic_pte() fails to account blocks this time, and
   immediately returns - without releasing the page.

This triggers a BUG_ON in our caller, which asserts that the page
should always be consumed, unless -ENOENT is returned.

To fix this, detect if we have such a "dangling" page when accounting
fails, and if so, release it before returning.
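
In shmem_mfill_atomic_pte(), roughly (a sketch of the error path, not the
exact hunk):

  if (!shmem_inode_acct_block(inode, 1)) {
          /* We may hold a page from an earlier -ENOENT retry and now
           * fail accounting: release it, keeping the caller's
           * "page is always consumed" invariant. */
          if (unlikely(*pagep)) {
                  put_page(*pagep);
                  *pagep = NULL;
          }
          return -ENOMEM;
  }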

Link: https://lkml.kernel.org/r/20210428230858.348400-1-axelrasmussen@google.com
Fixes: cb658a453b ("userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY")
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19 10:08:29 +02:00
Miaohe Lin 6aeba28d12 ksm: fix potential missing rmap_item for stable_node
[ Upstream commit c89a384e2551c692a9fe60d093fd7080f50afc51 ]

When removing an rmap_item from the stable tree, the STABLE_FLAG of the
rmap_item is cleared but its head is retained.  So the following scenario
might happen, for a ksm page with rmap_item1:

cmp_and_merge_page
  stable_node->head = &migrate_nodes;
  remove_rmap_item_from_tree, but head still equal to stable_node;
  try_to_merge_with_ksm_page failed;
  return;

For the same ksm page with rmap_item2, stable node migration succeeds this
time.  The stable_node->head no longer equals &migrate_nodes.  For the ksm
page with rmap_item1 again:

cmp_and_merge_page
 stable_node->head != &migrate_nodes && rmap_item->head == stable_node
 return;

We would miss the rmap_item for stable_node, which might result in a
failed rmap_walk_ksm().  Fix this by setting rmap_item->head to NULL when
the rmap_item is removed from the stable tree.
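
The fix itself is a single assignment in remove_rmap_item_from_tree()
(a sketch):

  if (rmap_item->address & STABLE_FLAG) {
          /* ... detach from the stable node as before ... */
          put_anon_vma(rmap_item->anon_vma);
          rmap_item->head = NULL;    /* no stale stable_node pointer */
          rmap_item->address &= PAGE_MASK;
  }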

Link: https://lkml.kernel.org/r/20210330140228.45635-5-linmiaohe@huawei.com
Fixes: 4146d2d673 ("ksm: make !merge_across_nodes migration safe")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:27 +02:00
Miaohe Lin dde73137ce mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
[ Upstream commit 34f5e9b9d1990d286199084efa752530ee3d8297 ]

If the zone device page does not belong to un-addressable device memory,
the variable entry will be uninitialized and ultimately lead to an
indeterminate pte entry.  Fix this unexpected case and warn about it.
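
Sketch of the guarded branch in migrate_vma_insert_page() (illustrative;
not the exact hunk):

  if (is_zone_device_page(page)) {
          if (is_device_private_page(page)) {
                  swp_entry_t swp_entry;

                  swp_entry = make_device_private_entry(page,
                                          vma->vm_flags & VM_WRITE);
                  entry = swp_entry_to_pte(swp_entry);
          } else {
                  /* Only un-addressable device memory is supported
                   * here: never leave entry uninitialized. */
                  pr_warn_once("Unsupported ZONE_DEVICE page type.\n");
                  goto abort;
          }
  } else {
          entry = mk_pte(page, vma->vm_page_prot);
          if (vma->vm_flags & VM_WRITE)
                  entry = pte_mkwrite(pte_mkdirty(entry));
  }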

Link: https://lkml.kernel.org/r/20210325131524.48181-4-linmiaohe@huawei.com
Fixes: df6ad69838 ("mm/device-public-memory: device memory cache coherent with CPU")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:27 +02:00