Commit Graph

2293 Commits

Author SHA1 Message Date
Andrey Zhizhikin 0615afea9f This is the 5.4.133 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmD1IXQACgkQONu9yGCS
 aT50sBAAtueT2WsCuD1psEN219cK0e7AuLrHXtAnkHFJIPWnzS0vyBy7/hmFwWtq
 sBSenqG8qufqtLVnMkeEAhu2/sk/5NHRPQJtK0k4hCzt8FQuiVQco1raOtetIJx+
 +wBfE4FAGDmiYGkcuzh61n1euvpBetXd9HFfWtSfQq7Q3nN+sfv0q1V9ZK0MUJ8v
 ipvSY1hTSDEQQJ7cU48DDJtZUGNxrfEFzi4CLI2YVphzoHEbowd5nxtHUL5cwDhx
 3sHYJoN+5RAkRinzGyviDlRpodNUUkLusBzs54xNIzgdzkckEKniKotZ2lUGsEu+
 QQgj0paNB95GLkY/Rgn6AL03AQdYBgGIjHQkSaYJ+UM9TlacqgFMiGugn28bj0o/
 1F4s6zCWG5tuhM5zNcnTsJmwSPA3eZ0uI6NCkjKC/RAyD5SC6JQqcf5zYCzygdT/
 PpeFRcZGoxyQqmOjW2e+tpNAbHuIeayExx/6/3rw3b/xaR9Ju9mYxNDiIpYZwdc6
 FIWOsHG+bEEZANiWv6Ju7DfOTKg8F7mbm4Zrd00euIWEsxuUZO/lAzxPR8pPzsn7
 2k46PDrhah25Y/tbSE5hdKrLqSorSjIg+7CxLAk7LWPmq13zzEd8y+e/Bk5rFJ4T
 7vPLUb23OYFdrVMOXd1UyKhcP4CKyOf7IvG4SsZwj9WfWoNEDNg=
 =2WCO
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmD25jEACgkQ7G51OISz
 Hs1QgxAAl7FiNoI3XJ2GbS0bQAxYXeUtcQ6lJmHRD/6IEikDr12bQ9XMvF1k4o+P
 15MzTOndSRSW0Yxi5QXRTGcBuqbC+HhnUVHggJpUJugUt9Q5TtO6ZxVX0dkbMhp3
 AXK55Zm465y4dX6ys59IWL138xMKd0pBIfchlb3oSaiQ9qBFmKDXMEYDGOxDC3e9
 VR1EO5PfpzEgrGONO+Xxu+2IDWRiWfKGyIaZCIWsRqlsrjdOFMKbTL2iBwMgrqmi
 D3kndN6kGvxqoHCe3P9chKqNfU+P3durBNomhhXyBZBRNT2XW19UVpk3VIhqa5Dw
 7DA6zHihFuZlI9XEZKAr4cokxS9IRFWZBayYE4diMu4+BA45mKIS+1BRyPDozgRG
 cyp6QaGI8IEzdI1oa6WW/CR/zkhQKyIj/lhwlx98XJlkGoDtfSgGMx6QdmtH8Pfk
 Gmgg3aHV/AQMRasSfKPDLGWD0f3nVzneHh9ceK9/j8gjY+T6msVYw/p4kIhXZQCE
 cZplDehOsFtJubB5lXxe1PZzHedM5p0mrYousngjVhHjbe/5h243fj3gBsTJK3zZ
 XP74VPZyqwBtEUrMEL2nPlsQSDfeZRnNSCiXNq1vJg+skqJTMicv2TUP0134ME8K
 yUjmJxd46diY/bOPBL737MuWMlpm9Bpg31qRe92jSHovKrbZVTk=
 =rRdV
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.133' into 5.4-2.3.x-imx

This is the 5.4.133 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-07-20 15:05:19 +00:00
Christophe Leroy a82471a14a powerpc/mm: Fix lockup on kernel exec fault
commit cd5d5e602f502895e47e18cd46804d6d7014e65c upstream.

The powerpc kernel is not prepared to handle exec faults from kernel.
Especially, the function is_exec_fault() will return 'false' when an
exec fault is taken by kernel, because the check is based on reading
current->thread.regs->trap which contains the trap from user.

For instance, when provoking a LKDTM EXEC_USERSPACE test,
current->thread.regs->trap is set to SYSCALL trap (0xc00), and
the fault taken by the kernel is not seen as an exec fault by
set_access_flags_filter().

Commit d7df2443cd ("powerpc/mm: Fix spurious segfaults on radix
with autonuma") made it clear and handled it properly. But later on
commit d3ca587404 ("powerpc/mm: Fix reporting of kernel execute
faults") removed that handling, introducing test based on error_code.
And here is the problem, because on the 603 all upper bits of SRR1
get cleared when the TLB instruction miss handler bails out to ISI.

Until commit cbd7e6ca0210 ("powerpc/fault: Avoid heavy
search_exception_tables() verification"), an exec fault from kernel
at a userspace address was indirectly caught by the lack of entry for
that address in the exception tables. But after that commit the
kernel mainly relies on KUAP or on core mm handling to catch wrong
user accesses. Here the access is not wrong, so mm handles it.
It is a minor fault because PAGE_EXEC is not set,
set_access_flags_filter() should set PAGE_EXEC and voila.
But as is_exec_fault() returns false as explained in the beginning,
set_access_flags_filter() bails out without setting PAGE_EXEC flag,
which leads to a forever minor exec fault.

As the kernel is not prepared to handle such exec faults, the thing to
do is to fire in bad_kernel_fault() for any exec fault taken by the
kernel, as it was prior to commit d3ca587404.

Fixes: d3ca587404 ("powerpc/mm: Fix reporting of kernel execute faults")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/024bb05105050f704743a0083fe3548702be5706.1625138205.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-19 08:53:14 +02:00
Andrey Zhizhikin 6602fc5788 This is the 5.4.119 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCeKsMACgkQONu9yGCS
 aT4cYhAA0qDTHscvm641m/Dv4U9w3gWh2Fs8oPz43+nJ1/8CTrT/gSWA7IRDDHiV
 Dys2canDVLNYTEx1TqwmHbN3R+nvQTpdz2wuJuSf7GKYQj0n3S99BEN6uxod+puu
 /M7apBH5npjZKv1DMRUrQ/AUGVUuBQqtN7Hl5hEL8ibI/bsZV8+dhJJ8c8uyJpam
 peiP5n2lCz5HZ/K5OyEy1jCmWQLIcRN59SmiARy/xk739igoCMUajkY1mV0WVyks
 SKnZEP7tY1mLLYzpW/ZVSkXurx+ZtL1zUctRt5dh5US4uzNt/sfm8oDzyzvGojd/
 iWtXefprJXbI9BGyNaBwwNmzjSXabSkoI75wExxsMQKFZpsq12pz97dwy/pZyU+c
 NlzbmDQg8+Cs9dKsDw6jUXHYSJ9fb4mk6GOF9u0LXgyq1f15/DzjdzkYLXZ3tTOK
 exVFs/CKz7Dg6npdO5kl7mg18AxmVH+OJftltF2+MbUohBs2vRDRr+O4cY8Wlc2Q
 AF85uAE3Mo/yL1pi6O7lMW4ic5yJvTRCX/iPsxyDU8LvxM1Kc7u9CzykX3M0WFLz
 TsKxfPQvoc6WGf8IWy4j1nXMzXQTHL/6CrfzOSTFngR8eqcsbgU0nkKpZNEtvnxN
 k30ID+Mcl4B6k6XTECNJUXjcwg+TR+XtKOjwXAIDaVqwoW759BY=
 =25pJ
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmCe4QMACgkQ7G51OISz
 Hs1JdQ//aC82eBW+TH85PNlVaYn7DL8cU6zE/mX4dofughg8AIg84v0v30rRGF3m
 c0bxBhwcah+4TIPzWSs8QAT7pEcvR3Gj5JZrEeIuyXBwKhRSXgZJKbV4XAE9Z23W
 1Etxwh28NfLbLEnX6qJnejQhHckt4MK9j4W5Gg2L7o6TrZwB7gLjNPfck6t7DnJc
 9Fo4TxwgP+BFu30s9h9eQqv/EIVJPRfiAVgvK4o1CCEHya8bfnJgJDgyxKbdyBg6
 IXYfYWGVe26VCR7PreEFJ3iheVVlI9VicFfhRzbsCdYQYYNqa22VfyMFrOO1nyMO
 SPAQnDRqlpvXIUHZ70puDX9QYji2RhVjf6fnDoQRPlkDaKumO3+zdk8bcCJFk7kF
 f7BPc8babvAn4WeG2hQOlPSKf+Mcc5mFmh08bsb8y9/w2mNPhg8UkCsaymggabkP
 GK/MvaBGxSoV8ft+EInyRLN3qlLrinYZDOOTj40Pd/d3HrJQCcShXzOI4SRd73MU
 +ISKHU9BrsRvVQ5ICjVTRfFMcFDJUt60leVM+QGm6MxQmgAuW/Scy9bNb9sDUyJd
 6TUI2stzAxDRrjz8JDIXFyUImC3juL1c4WnrpP5I6ZWIeXIpVlQAH1Wxg18Dj0DC
 1W5Fb2cicRRLfSPBgjppeipChm9ecT6Fl+3n7kBo5tvSdiiJEo4=
 =Wb4x
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.119' into 5.4-2.3.x-imx

This is the 5.4.119 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-05-14 20:43:43 +00:00
Jordan Niethe 73f9dccb29 powerpc/64s: Fix pte update for kernel memory on radix
[ Upstream commit b8b2f37cf632434456182e9002d63cbc4cccc50c ]

When adding a PTE a ptesync is needed to order the update of the PTE
with subsequent accesses otherwise a spurious fault may be raised.

radix__set_pte_at() does not do this for performance gains. For
non-kernel memory this is not an issue as any faults of this kind are
corrected by the page fault handler. For kernel memory these faults
are not handled. The current solution is that there is a ptesync in
flush_cache_vmap() which should be called when mapping from the
vmalloc region.

However, map_kernel_page() does not call flush_cache_vmap(). This is
troublesome in particular for code patching with Strict RWX on radix.
In do_patch_instruction() the page frame that contains the instruction
to be patched is mapped and then immediately patched. With no ordering
or synchronization between setting up the PTE and writing to the page
it is possible for faults.

As the code patching is done using __put_user_asm_goto() the resulting
fault is obscured - but using a normal store instead it can be seen:

  BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c
  Faulting instruction address: 0xc00000000008bd74
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: nop_module(PO+) [last unloaded: nop_module]
  CPU: 4 PID: 757 Comm: sh Tainted: P           O      5.10.0-rc5-01361-ge3c1b78c8440-dirty #43
  NIP:  c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810
  REGS: c000000016f634a0 TRAP: 0300   Tainted: P           O       (5.10.0-rc5-01361-ge3c1b78c8440-dirty)
  MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44002884  XER: 00000000
  CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1

This results in the kind of issue reported here:
  https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/

Chris Riedl suggested a reliable way to reproduce the issue:
  $ mount -t debugfs none /sys/kernel/debug
  $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) &

Turning ftrace on and off does a large amount of code patching which
in usually less then 5min will crash giving a trace like:

   ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000)
   ------------[ ftrace bug ]------------
   ftrace failed to modify
   [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390
    actual:   11:3b:47:4b
   Setting ftrace call site to call ftrace function
   ftrace record flags: 80000001
    (1)
    expected tramp: c00000000006c96c
   ------------[ cut here ]------------
   WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8
   Modules linked in: nop_module(PO-) [last unloaded: nop_module]
   CPU: 4 PID: 809 Comm: sh Tainted: P           O      5.10.0-rc5-01360-gf878ccaf250a #1
   NIP:  c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0
   REGS: c000000004c8b760 TRAP: 0700   Tainted: P           O       (5.10.0-rc5-01360-gf878ccaf250a)
   MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28008848  XER: 20040000
   CFAR: c0000000001a9c98 IRQMASK: 0
   GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022
   GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8
   GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118
   GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000
   GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008
   GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8
   GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020
   GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0
   NIP ftrace_bug+0x28c/0x2e8
   LR  ftrace_bug+0x288/0x2e8
   Call Trace:
     ftrace_bug+0x288/0x2e8 (unreliable)
     ftrace_modify_all_code+0x168/0x210
     arch_ftrace_update_code+0x18/0x30
     ftrace_run_update_code+0x44/0xc0
     ftrace_startup+0xf8/0x1c0
     register_ftrace_function+0x4c/0xc0
     function_trace_init+0x80/0xb0
     tracing_set_tracer+0x2a4/0x4f0
     tracing_set_trace_write+0xd4/0x130
     vfs_write+0xf0/0x330
     ksys_write+0x84/0x140
     system_call_exception+0x14c/0x230
     system_call_common+0xf0/0x27c

To fix this when updating kernel memory PTEs using ptesync.

Fixes: f1cb8f9beb ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Tidy up change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14 09:44:27 +02:00
Andrey Zhizhikin 6f99d03764 This is the 5.4.87 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl/1wNYACgkQONu9yGCS
 aT64cxAAwwt2H911zFagJCVDfLKXJ4da062n0YcJe3saGSg+mdEkSGYEDxjV6jjM
 jTzK1W5C49sQ9kzIF43YnYgdULwcXJ76G/uqFjFOlmbRzAKAYgs/3KXesa7S4cp+
 LT0fiR7uyViOw1zn4yBIeSnax8uRwT4vR1vV++ILC/7vL6hcnOBOPLxGzUKYlvJQ
 TD8ZQjeTXe5E7IhE+ztuhJQT+hZr1VERTjoktcfmlUps94uITeKdKYoCCZQ/zYIL
 IS7OgnAw5RNERHa1JUZruaGFvJORTu8wAfVtgD1VgRUZAe2ziWH6aCeDPaWaLzS5
 3U7Rc3Fyf0CRYrhe7mI1J864GIEUAe9V34sGQzaU/ap4SWpLvHbu12ePlb+nLNKF
 MZmGEd0eZuKKDSx9dlcx8hbfVg99YpI5oOeDvfCJpYx/uxNzzJhO5wkkZxweiN9s
 XTMUhhkTNkhgYdzn4Y8G9++LLAZpwOImSh3NkntoH+mSVlC+jVBbskz6PdywDjQR
 ROVpW26t5Ee6uDTrjci5cffbfje2y0r9km5/sbRWUz2YGsqYfAI3FtbH5isNUPOm
 Q6ucTd+xvmApfp9bn+XYLnbTQEGAD6mAgSmO11CIDsUJUvOTD/2cv861kATJqhXm
 01rHgohIG604vERppYC3WWFjh0cdevBvwSOpDi1LIdlgbEF6QY0=
 =q0Fm
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl/8SgIACgkQ7G51OISz
 Hs2VKRAAml9ZwPiiMF8+9Si2UgXrC76XSe9ugeFR8w3xRYfSeE7/xH/eA3pl8iay
 zO3Z+EuF0Qd6Wo+q+8hLO7Qp6PbWfp9dqWoUA2klptVT6cyvrW3lGT65XK8oYvIO
 7ElkQYY8MCx/NLpExbgQzydJbcb7u6Gpl3s+o6eVIXEA4vewlTU5RVpUGs0YxpWR
 lFh6w1LM1bE18HNK34saIsvKYQURBWMdpj+eN1P/Ts4XSTXBv7Xw9Uf4D4berzsd
 +Xtp4AT2zmyq1iU1QRtkNXY2XyBsnMJLfQAEkbpq+hHHY644hJKU7yWiSWgfvAac
 ylY/VZ6kkEvdXsCHC8pCc/MlPMc9T3ciuHIHDRUjSXDctPcxeOdkVuNah0JG1s2l
 UQYD328Sb55zaq4oSJnG9SQVSj299yOnrRmAKLJb12cFa4wxAwXbbaP1w2baf/Ck
 PQAivRb+zp7G17Uirih3yv/UjSK9f1OcyZywU2P1srL18elfjKdJTQAS11uoZ5NQ
 vGhwlQp5VGQmdML5Z2h9V6w7G+XcEaflR5PkM6QmiuAIRk7/0uESSB+gje6sE4eA
 hGbikUI9K3BV6ppx/OlRLgbEgq0f+V2xjwDRZL7qDLsiiOUp6eWyY9K+QgAglnFH
 2XoZTDsZ+7yNQ2nRQvwhHg2hEGyjJh1dDib2yMmslDujD/w2GZA=
 =Oh8o
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.87' into 5.4-2.3.x-imx

This is the 5.4.87 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-11 12:52:13 +00:00
Andrey Zhizhikin b5636ee381 This is the 5.4.73 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl+ahE8ACgkQONu9yGCS
 aT4j1A/9HzkKKoqZ2vXYQ1/uEnUqZech9ly1KxpNTBrSZYAtx3MaWY7tGDEx2BqD
 y6iw9x4MymhHEbpwLg6YmmdWuMQLNNYJGoyLiPJgWhkE4c7zHadhNz1DcPEI8F7z
 bSlUJ3Oebr8gzv0FvUmeVXw7Z2EuOqM1zGgTAZfnKY3DkYHbLnrzUJ4AiI8TNeba
 pPIhjfIJ1TvhF+s5ggf2m8OtSWLZ0doCWCPmCFe2WyERX2WYCzPgsm0yL7L7oXME
 ZqWpOcClBsiYekBNcZ4kxozhJtArCnv24n9VoXJ/YJIlWKvCA6uC8r527nGN/z08
 dfFelj1nDs7/VrCSP4+109EjxLQnSYGgIWP0g0OsC+9wOmrQsYJ1azP1eNjm+NuC
 hPa8uYVEZxwVyJuEfu4ZB4NMZBlD2qnHoskvBKbyZ8yaVnbvlMp552XMwsmJBpCs
 8wArzabrJEz396LUUIYG829D7NBDuRav1Miu+FTzlbn+xZ/Y/S8OmhoG2stWa4wV
 y5x0M0DWgrqiZ9rMkz9A03UNnCInQVTfIBoMl63xFitW4/0vLsln3+CjzlKm7H46
 rD/tKACUoCDjR5DN+JwQzmTdL9zBb4p1cXwWjWb6rON3BkXmO0JVAxzurxI9PfX0
 ZWDydZ3HNmrm0d3J12zf3kTX56PfPFAGWUsEc4Ntb5zdWXSQJsE=
 =fZ3T
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.73' into 5.4-2.3.x-imx

This is the 5.4.73 stable release

Conflicts:
- arch/arm/boot/dts/imx6sl.dtsi:
Commit [a1767c9019] in NXP tree is now covered with commit [5c4c2f437c]
from upstream.

- drivers/gpu/drm/mxsfb/mxsfb_drv.c:
Resolve merge hunk for patch [ed8b90d303] from upstream

- drivers/media/i2c/ov5640.c:
Patch [aa4bb8b883] in NXP tree is now covered by patches [79ec0578c7]
and [b2f8546056] from upstream. Changes from NXP patch [99aa4c8c18] are
covered in upstream version as well.

- drivers/net/ethernet/freescale/fec_main.c:
Fix merge fuzz for patch [9e70485b40] from upstream.

- drivers/usb/cdns3/gadget.c:
Keep NXP version of the file, upstream version is not compatible.

- drivers/usb/dwc3/core.c:
- drivers/usb/dwc3/core.h:
Fix merge fuzz of patch [08045050c6] together wth NXP patch [b30e41dc1e]

- sound/soc/fsl/fsl_sai.c:
- sound/soc/fsl/fsl_sai.h:
Commit [2ea70e51eb72a] in NXP tree is now covered with commit [1ad7f52fe6]
from upstream.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-11 10:09:27 +00:00
Christophe Leroy c7f66ad880 powerpc/mm: Fix verification of MMU_FTR_TYPE_44x
commit 17179aeb9d34cc81e1a4ae3f85e5b12b13a1f8d0 upstream.

MMU_FTR_TYPE_44x cannot be checked by cpu_has_feature()

Use mmu_has_feature() instead

Fixes: 23eb7f560a2a ("powerpc: Convert flush_icache_range & friends to C")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ceede82fadf37f3b8275e61fcf8cf29a3e2ec7fe.1602351011.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:51:39 +01:00
Christophe Leroy 3f72486cec powerpc/mm: sanity_check_fault() should work for all, not only BOOK3S
[ Upstream commit 7ceb40027e19567a0a066e3b380cc034cdd9a124 ]

The verification and message introduced by commit 374f3f5979
("powerpc/mm/hash: Handle user access of kernel address gracefully")
applies to all platforms, it should not be limited to BOOK3S.

Make the BOOK3S version of sanity_check_fault() the one for all,
and bail out earlier if not BOOK3S.

Fixes: 374f3f5979 ("powerpc/mm/hash: Handle user access of kernel address gracefully")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fe199d5af3578d3bf80035d203a94d742a7a28af.1607491748.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30 11:51:22 +01:00
Nicholas Piggin 557c184df3 powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm
[ Upstream commit a665eec0a22e11cdde708c1c256a465ebe768047 ]

Commit 0cef77c779 ("powerpc/64s/radix: flush remote CPUs out of
single-threaded mm_cpumask") added a mechanism to trim the mm_cpumask of
a process under certain conditions. One of the assumptions is that
mm_users would not be incremented via a reference outside the process
context with mmget_not_zero() then go on to kthread_use_mm() via that
reference.

That invariant was broken by io_uring code (see previous sparc64 fix),
but I'll point Fixes: to the original powerpc commit because we are
changing that assumption going forward, so this will make backports
match up.

Fix this by no longer relying on that assumption, but by having each CPU
check the mm is not being used, and clearing their own bit from the mask
only if it hasn't been switched-to by the time the IPI is processed.

This relies on commit 38cf307c1f20 ("mm: fix kthread_use_mm() vs TLB
invalidate") and ARCH_WANT_IRQS_OFF_ACTIVATE_MM to disable irqs over mm
switch sequences.

Fixes: 0cef77c779 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
Depends-on: 38cf307c1f20 ("mm: fix kthread_use_mm() vs TLB invalidate")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200914045219.3736466-5-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:57:49 +01:00
Scott Cheloha 935950e319 pseries/drmem: don't cache node id in drmem_lmb struct
[ Upstream commit e5e179aa3a39c818db8fbc2dce8d2cd24adaf657 ]

At memory hot-remove time we can retrieve an LMB's nid from its
corresponding memory_block.  There is no need to store the nid
in multiple locations.

Note that lmb_to_memblock() uses find_memory_block() to get the
corresponding memory_block.  As find_memory_block() runs in sub-linear
time this approach is negligibly slower than what we do at present.

In exchange for this lookup at hot-remove time we no longer need to
call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
spares us an O(n^2) initialization during boot.

On systems with many LMBs that initialization overhead is palpable and
disruptive.  For example, on a box with 249854 LMBs we're seeing
drmem_init() take upwards of 30 seconds to complete:

[   53.721639] drmem: initializing drmem v2
[   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
[   80.604377] Modules linked in:
[   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
[   80.604397] NIP:  c0000000000a4980 LR: c0000000000a4940 CTR: 0000000000000000
[   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
[   80.604412] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000248  XER: 0000000d
[   80.604431] CFAR: c0000000000a4a38 IRQMASK: 0
[   80.604431] GPR00: c0000000000a4940 c0002dbff8493ac0 c000000001904400 c0003cfffffede30
[   80.604431] GPR04: 0000000000000000 c000000000f4095a 000000000000002f 0000000010000000
[   80.604431] GPR08: c0000bf7ecdb7fb8 c0000bf7ecc2d3c8 0000000000000008 c00c0002fdfb2001
[   80.604431] GPR12: 0000000000000000 c00000001e8ec200
[   80.604477] NIP [c0000000000a4980] hot_add_scn_to_nid+0xa0/0x3e0
[   80.604486] LR [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0
[   80.604492] Call Trace:
[   80.604498] [c0002dbff8493ac0] [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
[   80.604509] [c0002dbff8493b20] [c000000000087c10] memory_add_physaddr_to_nid+0x20/0x60
[   80.604521] [c0002dbff8493b40] [c0000000010d4880] drmem_init+0x25c/0x2f0
[   80.604530] [c0002dbff8493c10] [c000000000010154] do_one_initcall+0x64/0x2c0
[   80.604540] [c0002dbff8493ce0] [c0000000010c4aa0] kernel_init_freeable+0x2d8/0x3a0
[   80.604550] [c0002dbff8493db0] [c000000000010824] kernel_init+0x2c/0x148
[   80.604560] [c0002dbff8493e20] [c00000000000b648] ret_from_kernel_thread+0x5c/0x74
[   80.604567] Instruction dump:
[   80.604574] 392918e8 e9490000 e90a000a e92a0000 80ea000c 1d080018 3908ffe8 7d094214
[   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac e9490000 7fbe5040
[   89.047390] drmem: 249854 LMB(s)

With a patched kernel on the same machine we're no longer seeing the
soft lockup.  drmem_init() now completes in negligible time, even when
the LMB count is large.

Fixes: b2d3b5ee66 ("powerpc/pseries: Track LMB nid instead of using device tree")
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200811015115.63677-1-cheloha@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:57:47 +01:00
Jason Liu 2f68e5475b Merge tag 'v5.4.70' into imx_5.4.y
* tag 'v5.4.70': (3051 commits)
  Linux 5.4.70
  netfilter: ctnetlink: add a range check for l3/l4 protonum
  ep_create_wakeup_source(): dentry name can change under you...
  ...

 Conflicts:
	arch/arm/mach-imx/pm-imx6.c
	arch/arm64/boot/dts/freescale/imx8mm-evk.dts
	arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts
	drivers/crypto/caam/caamalg.c
	drivers/gpu/drm/imx/dw_hdmi-imx.c
	drivers/gpu/drm/imx/imx-ldb.c
	drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
	drivers/mmc/host/sdhci-esdhc-imx.c
	drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
	drivers/net/ethernet/freescale/enetc/enetc.c
	drivers/net/ethernet/freescale/enetc/enetc_pf.c
	drivers/thermal/imx_thermal.c
	drivers/usb/cdns3/ep0.c
	drivers/xen/swiotlb-xen.c
	sound/soc/fsl/fsl_esai.c
	sound/soc/fsl/fsl_sai.c

Signed-off-by: Jason Liu <jason.hui.liu@nxp.com>
2020-10-08 17:46:51 +08:00
Alexey Kardashevskiy f211830829 powerpc/book3s64: Fix error handling in mm_iommu_do_alloc()
[ Upstream commit c4b78169e3667413184c9a20e11b5832288a109f ]

The last jump to free_exit in mm_iommu_do_alloc() happens after page
pointers in struct mm_iommu_table_group_mem_t were already converted to
physical addresses. Thus calling put_page() on these physical addresses
will likely crash.

This moves the loop which calculates the pageshift and converts page
struct pointers to physical addresses later after the point when
we cannot fail; thus eliminating the need to convert pointers back.

Fixes: eb9d7a62c3 ("powerpc/mm_iommu: Fix potential deadlock")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191223060351.26359-1-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:17:38 +02:00
Aneesh Kumar K.V 600cafd55b powerpc/book3s64/radix: Fix boot failure with large amount of guest memory
[ Upstream commit 103a8542cb35b5130f732d00b0419a594ba1b517 ]

If the hypervisor doesn't support hugepages, the kernel ends up allocating a large
number of page table pages. The early page table allocation was wrongly
setting the max memblock limit to ppc64_rma_size with radix translation
which resulted in boot failure as shown below.

Kernel panic - not syncing:
early_alloc_pgtable: Failed to allocate 16777216 bytes align=0x1000000 nid=-1 from=0x0000000000000000 max_addr=0xffffffffffffffff
 CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-24.9-default+ #2
 Call Trace:
 [c0000000016f3d00] [c0000000007c6470] dump_stack+0xc4/0x114 (unreliable)
 [c0000000016f3d40] [c00000000014c78c] panic+0x164/0x418
 [c0000000016f3dd0] [c000000000098890] early_alloc_pgtable+0xe0/0xec
 [c0000000016f3e60] [c0000000010a5440] radix__early_init_mmu+0x360/0x4b4
 [c0000000016f3ef0] [c000000001099bac] early_init_mmu+0x1c/0x3c
 [c0000000016f3f10] [c00000000109a320] early_setup+0x134/0x170

This was because the kernel was checking for the radix feature before we enable the
feature via mmu_features. This resulted in the kernel using hash restrictions on
radix.

Rework the early init code such that the kernel boot with memblock restrictions
as imposed by hash. At that point, the kernel still hasn't finalized the
translation the kernel will end up using.

We have three different ways of detecting radix.

1. dt_cpu_ftrs_scan -> used only in case of PowerNV
2. ibm,pa-features -> Used when we don't use cpu_dt_ftr_scan
3. CAS -> Where we negotiate with hypervisor about the supported translation.

We look at 1 or 2 early in the boot and after that, we look at the CAS vector to
finalize the translation the kernel will use. We also support a kernel command
line option (disable_radix) to switch to hash.

Update the memblock limit after mmu_early_init_devtree() if the kernel is going
to use radix translation. This forces some of the memblock allocations we do before
mmu_early_init_devtree() to be within the RMA limit.

Fixes: 2bfd65e45e ("powerpc/mm/radix: Add radix callbacks for early init routines")
Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200828100852.426575-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-23 12:40:39 +02:00
Michael Ellerman 9de20a6c4d powerpc: Allow 4224 bytes of stack expansion for the signal frame
commit 63dee5df43a31f3844efabc58972f0a206ca4534 upstream.

We have powerpc specific logic in our page fault handling to decide if
an access to an unmapped address below the stack pointer should expand
the stack VMA.

The code was originally added in 2004 "ported from 2.4". The rough
logic is that the stack is allowed to grow to 1MB with no extra
checking. Over 1MB the access must be within 2048 bytes of the stack
pointer, or be from a user instruction that updates the stack pointer.

The 2048 byte allowance below the stack pointer is there to cover the
288 byte "red zone" as well as the "about 1.5kB" needed by the signal
delivery code.

Unfortunately since then the signal frame has expanded, and is now
4224 bytes on 64-bit kernels with transactional memory enabled. This
means if a process has consumed more than 1MB of stack, and its stack
pointer lies less than 4224 bytes from the next page boundary, signal
delivery will fault when trying to expand the stack and the process
will see a SEGV.

The total size of the signal frame is the size of struct rt_sigframe
(which includes the red zone) plus __SIGNAL_FRAMESIZE (128 bytes on
64-bit).

The 2048 byte allowance was correct until 2008 as the signal frame
was:

struct rt_sigframe {
        struct ucontext    uc;                           /*     0  1440 */
        /* --- cacheline 11 boundary (1408 bytes) was 32 bytes ago --- */
        long unsigned int          _unused[2];           /*  1440    16 */
        unsigned int               tramp[6];             /*  1456    24 */
        struct siginfo *           pinfo;                /*  1480     8 */
        void *                     puc;                  /*  1488     8 */
        struct siginfo     info;                         /*  1496   128 */
        /* --- cacheline 12 boundary (1536 bytes) was 88 bytes ago --- */
        char                       abigap[288];          /*  1624   288 */

        /* size: 1920, cachelines: 15, members: 7 */
        /* padding: 8 */
};

1920 + 128 = 2048

Then in commit ce48b21007 ("powerpc: Add VSX context save/restore,
ptrace and signal support") (Jul 2008) the signal frame expanded to
2304 bytes:

struct rt_sigframe {
        struct ucontext    uc;                           /*     0  1696 */	<--
        /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
        long unsigned int          _unused[2];           /*  1696    16 */
        unsigned int               tramp[6];             /*  1712    24 */
        struct siginfo *           pinfo;                /*  1736     8 */
        void *                     puc;                  /*  1744     8 */
        struct siginfo     info;                         /*  1752   128 */
        /* --- cacheline 14 boundary (1792 bytes) was 88 bytes ago --- */
        char                       abigap[288];          /*  1880   288 */

        /* size: 2176, cachelines: 17, members: 7 */
        /* padding: 8 */
};

2176 + 128 = 2304

At this point we should have been exposed to the bug, though as far as
I know it was never reported. I no longer have a system old enough to
easily test on.

Then in 2010 commit 320b2b8de1 ("mm: keep a guard page below a
grow-down stack segment") caused our stack expansion code to never
trigger, as there was always a VMA found for a write up to PAGE_SIZE
below r1.

That meant the bug was hidden as we continued to expand the signal
frame in commit 2b0a576d15 ("powerpc: Add new transactional memory
state to the signal context") (Feb 2013):

struct rt_sigframe {
        struct ucontext    uc;                           /*     0  1696 */
        /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
        struct ucontext    uc_transact;                  /*  1696  1696 */	<--
        /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
        long unsigned int          _unused[2];           /*  3392    16 */
        unsigned int               tramp[6];             /*  3408    24 */
        struct siginfo *           pinfo;                /*  3432     8 */
        void *                     puc;                  /*  3440     8 */
        struct siginfo     info;                         /*  3448   128 */
        /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
        char                       abigap[288];          /*  3576   288 */

        /* size: 3872, cachelines: 31, members: 8 */
        /* padding: 8 */
        /* last cacheline: 32 bytes */
};

3872 + 128 = 4000

And commit 573ebfa660 ("powerpc: Increase stack redzone for 64-bit
userspace to 512 bytes") (Feb 2014):

struct rt_sigframe {
        struct ucontext    uc;                           /*     0  1696 */
        /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
        struct ucontext    uc_transact;                  /*  1696  1696 */
        /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
        long unsigned int          _unused[2];           /*  3392    16 */
        unsigned int               tramp[6];             /*  3408    24 */
        struct siginfo *           pinfo;                /*  3432     8 */
        void *                     puc;                  /*  3440     8 */
        struct siginfo     info;                         /*  3448   128 */
        /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
        char                       abigap[512];          /*  3576   512 */	<--

        /* size: 4096, cachelines: 32, members: 8 */
        /* padding: 8 */
};

4096 + 128 = 4224

Then finally in 2017, commit 1be7107fbe ("mm: larger stack guard
gap, between vmas") exposed us to the existing bug, because it changed
the stack VMA to be the correct/real size, meaning our stack expansion
code is now triggered.

Fix it by increasing the allowance to 4224 bytes.

Hard-coding 4224 is obviously unsafe against future expansions of the
signal frame in the same way as the existing code. We can't easily use
sizeof() because the signal frame structure is not in a header. We
will either fix that, or rip out all the custom stack expansion
checking logic entirely.

Fixes: ce48b21007 ("powerpc: Add VSX context save/restore, ptrace and signal support")
Cc: stable@vger.kernel.org # v2.6.27+
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-2-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21 13:05:24 +02:00
Christophe Leroy 2150c25b76 powerpc/ptdump: Fix build failure in hashpagetable.c
commit 7c466b0807960edc13e4b855be85ea765df9a6cd upstream.

H_SUCCESS is only defined when CONFIG_PPC_PSERIES is defined.

!= H_SUCCESS means != 0. Modify the test accordingly.

Fixes: 65e701b2d2 ("powerpc/ptdump: drop non vital #ifdefs")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/795158fc1d2b3dff3bf7347881947a887ea9391a.1592227105.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21 13:05:24 +02:00
Aneesh Kumar K.V 44ae76d01d powerpc/book3s64/pkeys: Use PVR check instead of cpu feature
[ Upstream commit d79e7a5f26f1d179cbb915a8bf2469b6d7431c29 ]

We are wrongly using CPU_FTRS_POWER8 to check for P8 support. Instead, we should
use PVR value. Now considering we are using CPU_FTRS_POWER8, that
implies we returned true for P9 with older firmware. Keep the same behavior
by checking for P9 PVR value.

Fixes: cf43d3b264 ("powerpc: Enable pkey subsystem")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 08:16:13 +02:00
Christophe Leroy ceff42e6c1 Revert "powerpc/kasan: Fix shadow pages allocation failure"
commit b506923ee44ae87fc9f4de16b53feb313623e146 upstream.

This reverts commit d2a91cef9bbdeb87b7449fdab1a6be6000930210.

This commit moved too much work in kasan_init(). The allocation
of shadow pages has to be moved for the reason explained in that
patch, but the allocation of page tables still need to be done
before switching to the final hash table.

First revert the incorrect commit, following patch redoes it
properly.

Fixes: d2a91cef9bbd ("powerpc/kasan: Fix shadow pages allocation failure")
Cc: stable@vger.kernel.org
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181
Link: https://lore.kernel.org/r/3667deb0911affbf999b99f87c31c77d5e870cd2.1593690707.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-11 15:33:39 +02:00
Aneesh Kumar K.V 93d1e96b98 powerpc/book3s64/pkeys: Fix pkey_access_permitted() for execute disable pkey
commit 192b6a780598976feb7321ff007754f8511a4129 upstream.

Even if the IAMR value denies execute access, the current code returns
true from pkey_access_permitted() for an execute permission check, if
the AMR read pkey bit is cleared.

This results in repeated page fault loop with a test like below:

  #define _GNU_SOURCE
  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <signal.h>
  #include <inttypes.h>

  #include <assert.h>
  #include <malloc.h>
  #include <unistd.h>
  #include <pthread.h>
  #include <sys/mman.h>

  #ifdef SYS_pkey_mprotect
  #undef SYS_pkey_mprotect
  #endif

  #ifdef SYS_pkey_alloc
  #undef SYS_pkey_alloc
  #endif

  #ifdef SYS_pkey_free
  #undef SYS_pkey_free
  #endif

  #undef PKEY_DISABLE_EXECUTE
  #define PKEY_DISABLE_EXECUTE	0x4

  #define SYS_pkey_mprotect	386
  #define SYS_pkey_alloc		384
  #define SYS_pkey_free		385

  #define PPC_INST_NOP		0x60000000
  #define PPC_INST_BLR		0x4e800020
  #define PROT_RWX		(PROT_READ | PROT_WRITE | PROT_EXEC)

  static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
  {
  	return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
  }

  static int sys_pkey_alloc(unsigned long flags, unsigned long access_rights)
  {
  	return syscall(SYS_pkey_alloc, flags, access_rights);
  }

  static int sys_pkey_free(int pkey)
  {
  	return syscall(SYS_pkey_free, pkey);
  }

  static void do_execute(void *region)
  {
  	/* jump to region */
  	asm volatile(
  		"mtctr	%0;"
  		"bctrl"
  		: : "r"(region) : "ctr", "lr");
  }

  static void do_protect(void *region)
  {
  	size_t pgsize;
  	int i, pkey;

  	pgsize = getpagesize();

  	pkey = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE);
  	assert (pkey > 0);

  	/* perform mprotect */
  	assert(!sys_pkey_mprotect(region, pgsize, PROT_RWX, pkey));
  	do_execute(region);

  	/* free pkey */
  	assert(!sys_pkey_free(pkey));

  }

  int main(int argc, char **argv)
  {
  	size_t pgsize, numinsns;
  	unsigned int *region;
  	int i;

  	/* allocate memory region to protect */
  	pgsize = getpagesize();
  	region = memalign(pgsize, pgsize);
  	assert(region != NULL);
  	assert(!mprotect(region, pgsize, PROT_RWX));

  	/* fill page with NOPs with a BLR at the end */
  	numinsns = pgsize / sizeof(region[0]);
  	for (i = 0; i < numinsns - 1; i++)
  		region[i] = PPC_INST_NOP;
  	region[i] = PPC_INST_BLR;

  	do_protect(region);

  	return EXIT_SUCCESS;
  }

The fix is to only check the IAMR for an execute check, the AMR value
is not relevant.

Fixes: f2407ef3ba ("powerpc: helper to validate key-access permissions of a pte")
Cc: stable@vger.kernel.org # v4.16+
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Add detail to change log, tweak wording & formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200712132047.1038594-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22 09:33:14 +02:00
Christophe Leroy 5578bf5ce0 powerpc/32s: Don't warn when mapping RO data ROX.
[ Upstream commit 4b19f96a81bceaf0bcf44d79c0855c61158065ec ]

Mapping RO data as ROX is not an issue since that data
cannot be modified to introduce an exploit.

PPC64 accepts to have RO data mapped ROX, as a trade off
between kernel size and strictness of protection.

On PPC32, kernel size is even more critical as amount of
memory is usually small.

Depending on the number of available IBATs, the last IBATs
might overflow the end of text. Only warn if it crosses
the end of RO data.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6499f8eeb2a36330e5c9fc1cee9a79374875bd54.1589866984.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:34 +02:00
Christophe Leroy 8e52a1eef1 powerpc/ptdump: Add _PAGE_COHERENT flag
[ Upstream commit 3af4786eb429b2df76cbd7ce3bae21467ac3e4fb ]

For platforms using shared.c (4xx, Book3e, Book3s/32), also handle the
_PAGE_COHERENT flag which corresponds to the M bit of the WIMG flags.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Make it more verbose, use "coherent" rather than "m"]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/324c3d860717e8e91fca3bb6c0f8b23e1644a404.1589866984.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:16 +02:00
Christophe Leroy 4719cb8a16 powerpc/kasan: Fix shadow pages allocation failure
commit d2a91cef9bbdeb87b7449fdab1a6be6000930210 upstream.

Doing kasan pages allocation in MMU_init is too early, kernel doesn't
have access yet to the entire memory space and memblock_alloc() fails
when the kernel is a bit big.

Do it from kasan_init() instead.

Fixes: 2edb16efc8 ("powerpc/32: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c24163ee5d5f8cdf52fefa45055ceb35435b8f15.1589866984.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22 09:31:23 +02:00
Christophe Leroy e418045e25 powerpc/mm: Fix conditions to perform MMU specific management by blocks on PPC32.
commit 4e3319c23a66dabfd6c35f4d2633d64d99b68096 upstream.

Setting init mem to NX shall depend on sinittext being mapped by
block, not on stext being mapped by block.

Setting text and rodata to RO shall depend on stext being mapped by
block, not on sinittext being mapped by block.

Fixes: 63b2bc6195 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d565fb8f51b18a3d98445a830b2f6548cb2da2a.1589866984.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22 09:31:13 +02:00
Jason Liu 5691e22711 Merge tag 'v5.4.47' into imx_5.4.y
* tag 'v5.4.47': (2193 commits)
  Linux 5.4.47
  KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
  KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
  ...

 Conflicts:
	arch/arm/boot/dts/imx6qdl.dtsi
	arch/arm/mach-imx/Kconfig
	arch/arm/mach-imx/common.h
	arch/arm/mach-imx/suspend-imx6.S
	arch/arm64/boot/dts/freescale/imx8qxp-mek.dts
	arch/powerpc/include/asm/cacheflush.h
	drivers/cpufreq/imx6q-cpufreq.c
	drivers/dma/imx-sdma.c
	drivers/edac/synopsys_edac.c
	drivers/firmware/imx/imx-scu.c
	drivers/net/ethernet/freescale/fec.h
	drivers/net/ethernet/freescale/fec_main.c
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	drivers/net/phy/phy_device.c
	drivers/perf/fsl_imx8_ddr_perf.c
	drivers/usb/cdns3/gadget.c
	drivers/usb/dwc3/gadget.c
	include/uapi/linux/dma-buf.h

Signed-off-by: Jason Liu <jason.hui.liu@nxp.com>
2020-06-19 17:32:49 +08:00
Christophe Leroy ac0175f9dd powerpc/ptdump: Properly handle non standard page size
commit b00ff6d8c1c3898b0f768cbb38ef722d25bd2f39 upstream.

In order to properly display information regardless of the page size,
it is necessary to take into account real page size.

Fixes: cabe8138b2 ("powerpc: dump as a single line areas mapping a single physical page.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a53b2a0ffd042a8d85464bf90d55bc5b970e00a1.1589866984.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-17 16:40:26 +02:00
Christophe Leroy c1f3e1d8d7 powerpc/kasan: Fix kasan_remap_early_shadow_ro()
[ Upstream commit af92bad615be75c6c0d1b1c5b48178360250a187 ]

At the moment kasan_remap_early_shadow_ro() does nothing, because
k_end is 0 and k_cur < 0 is always true.

Change the test to k_cur != k_end, as done in
kasan_init_shadow_page_tables()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Fixes: cbd18991e2 ("powerpc/mm: Fix an Oops in kasan_mmu_init()")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4e7b56865e01569058914c991143f5961b5d4719.1583507333.git.christophe.leroy@c-s.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:50:25 +02:00
Laurentiu Tudor 81b9336ab2 powerpc/fsl_booke: Avoid creating duplicate tlb1 entry
commit aa4113340ae6c2811e046f08c2bc21011d20a072 upstream.

In the current implementation, the call to loadcam_multi() is wrapped
between switch_to_as1() and restore_to_as0() calls so, when it tries
to create its own temporary AS=1 TLB1 entry, it ends up duplicating
the existing one created by switch_to_as1(). Add a check to skip
creating the temporary entry if already running in AS=1.

Fixes: d9e1831a42 ("powerpc/85xx: Load all early TLB entries at once")
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200123111914.2565-1-laurentiu.tudor@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17 10:50:22 +02:00
Michael Ellerman b408bfa79b powerpc/mm: Fix missing KUAP disable in flush_coherent_icache()
[ Upstream commit 59bee45b9712c759ea4d3dcc4eff1752f3a66558 ]

Stefan reported a strange kernel fault which turned out to be due to a
missing KUAP disable in flush_coherent_icache() called from
flush_icache_range().

The fault looks like:

  Kernel attempted to access user page (7fffc30d9c00) - exploit attempt? (uid: 1009)
  BUG: Unable to handle kernel data access on read at 0x7fffc30d9c00
  Faulting instruction address: 0xc00000000007232c
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 35 PID: 5886 Comm: sigtramp Not tainted 5.6.0-rc2-gcc-8.2.0-00003-gfc37a1632d40 #79
  NIP:  c00000000007232c LR: c00000000003b7fc CTR: 0000000000000000
  REGS: c000001e11093940 TRAP: 0300   Not tainted  (5.6.0-rc2-gcc-8.2.0-00003-gfc37a1632d40)
  MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28000884  XER: 00000000
  CFAR: c0000000000722fc DAR: 00007fffc30d9c00 DSISR: 08000000 IRQMASK: 0
  GPR00: c00000000003b7fc c000001e11093bd0 c0000000023ac200 00007fffc30d9c00
  GPR04: 00007fffc30d9c18 0000000000000000 c000001e11093bd4 0000000000000000
  GPR08: 0000000000000000 0000000000000001 0000000000000000 c000001e1104ed80
  GPR12: 0000000000000000 c000001fff6ab380 c0000000016be2d0 4000000000000000
  GPR16: c000000000000000 bfffffffffffffff 0000000000000000 0000000000000000
  GPR20: 00007fffc30d9c00 00007fffc30d8f58 00007fffc30d9c18 00007fffc30d9c20
  GPR24: 00007fffc30d9c18 0000000000000000 c000001e11093d90 c000001e1104ed80
  GPR28: c000001e11093e90 0000000000000000 c0000000023d9d18 00007fffc30d9c00
  NIP flush_icache_range+0x5c/0x80
  LR  handle_rt_signal64+0x95c/0xc2c
  Call Trace:
    0xc000001e11093d90 (unreliable)
    handle_rt_signal64+0x93c/0xc2c
    do_notify_resume+0x310/0x430
    ret_from_except_lite+0x70/0x74
  Instruction dump:
  409e002c 7c0802a6 3c62ff31 3863f6a0 f8010080 48195fed 60000000 48fe4c8d
  60000000 e8010080 7c0803a6 7c0004ac <7c00ffac> 7c0004ac 4c00012c 38210070

This path through handle_rt_signal64() to setup_trampoline() and
flush_icache_range() is only triggered by 64-bit processes that have
unmapped their VDSO, which is rare.

flush_icache_range() takes a range of addresses to flush. In
flush_coherent_icache() we implement an optimisation for CPUs where we
know we don't actually have to flush the whole range, we just need to
do a single icbi.

However we still execute the icbi on the user address of the start of
the range we're flushing. On CPUs that also implement KUAP (Power9)
that leads to the spurious fault above.

We should be able to pass any address, including a kernel address, to
the icbi on these CPUs, which would avoid any interaction with KUAP.
But I don't want to make that change in a bug fix, just in case it
surfaces some strange behaviour on some CPU.

So for now just disable KUAP around the icbi. Note the icbi is treated
as a load, so we allow read access, not write as you'd expect.

Fixes: 890274c2dc ("powerpc/64s: Implement KUAP for Radix MMU")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200303235708.26004-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-12 13:00:25 +01:00
Alastair D'Silva 42dac16dc4 powerpc: Convert flush_icache_range & friends to C
[ Upstream commit 23eb7f560a2a6a1b0dbaaaae8685da75314347e4 ]

Similar to commit 22e9c88d48
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts the following ASM symbols to C:
    flush_icache_range()
    __flush_dcache_icache()
    __flush_dcache_icache_phys()

This was done as we discovered a long-standing bug where the length of the
range was truncated due to using a 32 bit shift instead of a 64 bit one.

By converting these functions to C, it becomes easier to maintain.

flush_dcache_icache_phys() retains a critical assembler section as we must
ensure there are no memory accesses while the data MMU is disabled
(authored by Christophe Leroy). Since this has no external callers, it has
also been made static, allowing the compiler to inline it within
flush_dcache_icache_page().

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[mpe: Minor fixups, don't export __flush_dcache_icache()]
Link: https://lore.kernel.org/r/20191104023305.9581-5-alastair@au1.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-12 13:00:25 +01:00
Jason Liu 335d2828a9 This is the 5.4.24 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5hHjgACgkQONu9yGCS
 aT6CSBAA0c16mnDb59jgmW/sBj/p/MrlD/WJzLriqiKN5BUsPt9++I5mNj8mG+d2
 Glm4086e8L826zv8oKiZm23xk93on+78ExhVFVZvZNaEUpiRNYCGSuDq2NrHW0z+
 kpagkAFLfCUZFoKtmWo+bpl0YtF4dd/fg7+EjyL6qT1DBs8NVMwZx7i/v0xXv7Wc
 0vsGCLYoBLzcW1FB2d9cfAUPCBuGEzL/7TdifNOXRgI9owGsZndFJgXgIzoBUt/P
 tqB8RLjIupCiMEPtsEAZ/rgEQLPFkb3yrBvgjd1wDI8bHUIQU0clqThKVNvmNSmv
 UTBSNgPAhkP8nZG7X9xCkyfEsUefejBJy66da9n4XTGGrXf9ga0BL0nNrOGwOesr
 m+tNnBSFsbFCMqFopQnt4zZSnaf67AOk2mzxbEu4E+sStyW943aDO9MoRRFgaYGH
 pfie3qOKtKta2MuNTJA+q6F0W9H+V5MtMpwbyuy1/dp2eVln2wewBBMvXYdL1YOy
 E/Z87nsQgalsDynz9m/niv32J4JAxHptyOyROkktDLBSzL5RawNn+Op8X5EtmZOe
 sPkiYicqp9CLmMj13qWXJhtuyNdD4wk6FyyAy6cX9mF44+EZGOBkyNP+n8g789Kn
 sqFJ7sfTfOnwLBFciMA5PaMTGNWROyWXNkvvUzO+9t0CyFAnT2U=
 =abGA
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.24' into imx_5.4.y

Merge Linux stable release v5.4.24 into imx_5.4.y

* tag 'v5.4.24': (3306 commits)
  Linux 5.4.24
  blktrace: Protect q->blk_trace with RCU
  kvm: nVMX: VMWRITE checks unsupported field before read-only field
  ...

Signed-off-by: Jason Liu <jason.hui.liu@nxp.com>

 Conflicts:
	arch/arm/boot/dts/imx6sll-evk.dts
	arch/arm/boot/dts/imx7ulp.dtsi
	arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
	drivers/clk/imx/clk-composite-8m.c
	drivers/gpio/gpio-mxc.c
	drivers/irqchip/Kconfig
	drivers/mmc/host/sdhci-of-esdhc.c
	drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
	drivers/net/can/flexcan.c
	drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
	drivers/net/ethernet/mscc/ocelot.c
	drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	drivers/net/phy/realtek.c
	drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
	drivers/perf/fsl_imx8_ddr_perf.c
	drivers/tee/optee/shm_pool.c
	drivers/usb/cdns3/gadget.c
	kernel/sched/cpufreq.c
	net/core/xdp.c
	sound/soc/fsl/fsl_esai.c
	sound/soc/fsl/fsl_sai.c
	sound/soc/sof/core.c
	sound/soc/sof/imx/Kconfig
	sound/soc/sof/loader.c
2020-03-08 18:57:18 +08:00
Christophe Leroy 723a44f228 powerpc/hugetlb: Fix 512k hugepages on 8xx with 16k page size
commit f2b67ef90b0d5eca0f2255e02cf2f620bc0ddcdb upstream.

Commit 55c8fc3f49 ("powerpc/8xx: reintroduce 16K pages with HW
assistance") redefined pte_t as a struct of 4 pte_basic_t, because
in 16K pages mode there are four identical entries in the
page table. But the size of hugepage tables is calculated based
of the size of (void *). Therefore, we end up with page tables
of size 1k instead of 4k for 512k pages.

As 512k hugepage tables are the same size as standard page tables,
ie 4k, use the standard page tables instead of PGT_CACHE tables.

Fixes: 3fb69c6a1a ("powerpc/8xx: Enable 512k hugepage support with HW assistance")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/90ec56a2315be602494619ed0223bba3b0b8d619.1580997007.git.christophe.leroy@c-s.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28 17:22:18 +01:00
Christophe Leroy 7288d5338c powerpc/mm: Don't log user reads to 0xffffffff
[ Upstream commit 0f9aee0cb9da7db7d96f63cfa2dc5e4f1bffeb87 ]

Running vdsotest leaves many times the following log:

  [   79.629901] vdsotest[396]: User access of kernel address (ffffffff) - exploit attempt? (uid: 0)

A pointer set to (-1) is likely a programming error similar to
a NULL pointer and is not worth logging as an exploit attempt.

Don't log user accesses to 0xffffffff.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0728849e826ba16f1fbd6fa7f5c6cc87bd64e097.1577087627.git.christophe.leroy@c-s.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:36:58 +01:00
Christophe Leroy 17f37249d7 powerpc/ptdump: Fix W+X verification call in mark_rodata_ro()
commit e26ad936dd89d79f66c2b567f700e0c2a7103070 upstream.

ptdump_check_wx() also have to be called when pages are mapped
by blocks.

Fixes: 453d87f6a8 ("powerpc/mm: Warn if W+X pages found on boot")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/37517da8310f4457f28921a4edb88fb21d27b62a.1578989531.git.christophe.leroy@c-s.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-14 16:34:16 -05:00
Christophe Leroy 4135a03fdf powerpc/32s: Fix bad_kuap_fault()
commit 6ec20aa2e510b6297906c45f009aa08b2d97269a upstream.

At the moment, bad_kuap_fault() reports a fault only if a bad access
to userspace occurred while access to userspace was not granted.

But if a fault occurs for a write outside the allowed userspace
segment(s) that have been unlocked, bad_kuap_fault() fails to
detect it and the kernel loops forever in do_page_fault().

Fix it by checking that the accessed address is within the allowed
range.

Fixes: a68c31fc01 ("powerpc/32s: Implement Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f48244e9485ada0a304ed33ccbb8da271180c80d.1579866752.git.christophe.leroy@c-s.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:35:18 -08:00
Christophe Leroy 796085dbe3 powerpc/ptdump: Fix W+X verification
commit d80ae83f1f932ab7af47b54d0d3bef4f4dba489f upstream.

Verification cannot rely on simple bit checking because on some
platforms PAGE_RW is 0, checking that a page is not W means
checking that PAGE_RO is set instead of checking that PAGE_RW
is not set.

Use pte helpers instead of checking bits.

Fixes: 453d87f6a8 ("powerpc/mm: Warn if W+X pages found on boot")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d894839fdbb19070f0e1e4140363be4f2bb62fc.1578989540.git.christophe.leroy@c-s.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:35:17 -08:00
Aneesh Kumar K.V 2cabe61ab8 powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
commit 12e4d53f3f04e81f9e83d6fc10edc7314ab9f6b9 upstream.

Patch series "Fixup page directory freeing", v4.

This is a repost of patch series from Peter with the arch specific changes
except ppc64 dropped.  ppc64 changes are added here because we are redoing
the patch series on top of ppc64 changes.  This makes it easy to backport
these changes.  Only the first 2 patches need to be backported to stable.

The thing is, on anything SMP, freeing page directories should observe the
exact same order as normal page freeing:

 1) unhook page/directory
 2) TLB invalidate
 3) free page/directory

Without this, any concurrent page-table walk could end up with a
Use-after-Free.  This is esp.  trivial for anything that has software
page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware
caches partial page-walks (ie.  caches page directories).

Even on UP this might give issues since mmu_gather is preemptible these
days.  An interrupt or preempted task accessing user pages might stumble
into the free page if the hardware caches page directories.

This patch series fixes ppc64 and add generic MMU_GATHER changes to
support the conversion of other architectures.  I haven't added patches
w.r.t other architecture because they are yet to be acked.

This patch (of 9):

A followup patch is going to make sure we correctly invalidate page walk
cache before we free page table pages.  In order to keep things simple
enable RCU_TABLE_FREE even for !SMP so that we don't have to fixup the
!SMP case differently in the followup patch

!SMP case is right now broken for radix translation w.r.t page walk
cache flush.  We can get interrupted in between page table free and
that would imply we have page walk cache entries pointing to tables
which got freed already.  Michael said "both our platforms that run on
Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a
problem for anyone in practice, unless they've hacked their kernel to
build it !SMP."

Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:35:17 -08:00
Mike Rapoport 673e43cf49 powerpc: Ensure that swiotlb buffer is allocated from low memory
[ Upstream commit 8fabc623238e68b3ac63c0dd1657bf86c1fa33af ]

Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G.
If a system has more physical memory than this limit, the swiotlb
buffer is not addressable because it is allocated from memblock using
top-down mode.

Force memblock to bottom-up mode before calling swiotlb_init() to
ensure that the swiotlb buffer is DMA-able.

Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191204123524.22919-1-rppt@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12 12:21:28 +01:00
Aneesh Kumar K.V d7742abfe6 powerpc/pmem: Fix kernel crash due to wrong range value usage in flush_dcache_range
commit 6f4679b956741d2da6ad3ebb738cbe1264ac8781 upstream.

This patch fix the below kernel crash.

 BUG: Unable to handle kernel data access on read at 0xc000000380000000
 Faulting instruction address: 0xc00000000008b6f0
cpu 0x5: Vector: 300 (Data Access) at [c0000000d8587790]
    pc: c00000000008b6f0: arch_remove_memory+0x150/0x210
    lr: c00000000008b720: arch_remove_memory+0x180/0x210
    sp: c0000000d8587a20
   msr: 800000000280b033
   dar: c000000380000000
 dsisr: 40000000
  current = 0xc0000000d8558600
  paca    = 0xc00000000fff8f00   irqmask: 0x03   irq_happened: 0x01
    pid   = 1220, comm = ndctl
enter ? for help
 memunmap_pages+0x33c/0x410
 devm_action_release+0x30/0x50
 release_nodes+0x30c/0x3a0
 device_release_driver_internal+0x178/0x240
 unbind_store+0x74/0x190
 drv_attr_store+0x44/0x60
 sysfs_kf_write+0x74/0xa0
 kernfs_fop_write+0x1b0/0x260
 __vfs_write+0x3c/0x70
 vfs_write+0xe4/0x200
 ksys_write+0x7c/0x140
 system_call+0x5c/0x68

Fixes: 076265907cf9 ("powerpc: Chunk calls to flush_dcache_range in arch_*_memory")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191204052909.59145-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:25:53 +01:00
Michael Ellerman 2fb97a8e25 powerpc/mm: Mark get_slice_psize() & slice_addr_is_low() as notrace
commit 91a063c956084fb21cf2523bce6892514e3f1799 upstream.

These slice routines are called from the SLB miss handler, which can
lead to warnings from the IRQ code, because we have not reconciled the
IRQ state properly:

  WARNING: CPU: 72 PID: 30150 at arch/powerpc/kernel/irq.c:258 arch_local_irq_restore.part.0+0xcc/0x100
  Modules linked in:
  CPU: 72 PID: 30150 Comm: ftracetest Not tainted 5.5.0-rc2-gcc9x-g7e0165b2f1a9 #1
  NIP:  c00000000001d83c LR: c00000000029ab90 CTR: c00000000026cf90
  REGS: c0000007eee3b960 TRAP: 0700   Not tainted  (5.5.0-rc2-gcc9x-g7e0165b2f1a9)
  MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 22242844  XER: 20000000
  CFAR: c00000000001d780 IRQMASK: 0
  ...
  NIP arch_local_irq_restore.part.0+0xcc/0x100
  LR  trace_graph_entry+0x270/0x340
  Call Trace:
    trace_graph_entry+0x254/0x340 (unreliable)
    function_graph_enter+0xe4/0x1a0
    prepare_ftrace_return+0xa0/0x130
    ftrace_graph_caller+0x44/0x94	# (get_slice_psize())
    slb_allocate_user+0x7c/0x100
    do_slb_fault+0xf8/0x300
    instruction_access_slb_common+0x140/0x180

Fixes: 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191221121337.4894-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:20:04 +01:00
Alastair D'Silva 0e9d2e5f71 powerpc: Chunk calls to flush_dcache_range in arch_*_memory
commit 076265907cf9633bbef861c7c2a1c26a8209f283 upstream.

When presented with large amounts of memory being hotplugged
(in my test case, ~890GB), the call to flush_dcache_range takes
a while (~50 seconds), triggering RCU stalls.

This patch breaks up the call into 1GB chunks, calling
cond_resched() inbetween to allow the scheduler to run.

Fixes: fb5924fddf ("powerpc/mm: Flush cache on memory hot(un)plug")
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104023305.9581-6-alastair@au1.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:20:02 +01:00
David Hildenbrand e84c5b7617 mm/memory_hotplug: shrink zones when offlining memory
commit feee6b2989165631b17ac6d4ccdbf6759254e85a upstream.

We currently try to shrink a single zone when removing memory.  We use
the zone of the first page of the memory we are removing.  If that
memmap was never initialized (e.g., memory was never onlined), we will
read garbage and can trigger kernel BUGs (due to a stale pointer):

    BUG: unable to handle page fault for address: 000000000000353d
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 0 P4D 0
    Oops: 0002 [#1] SMP PTI
    CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
    Workqueue: kacpi_hotplug acpi_hotplug_work_fn
    RIP: 0010:clear_zone_contiguous+0x5/0x10
    Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
    RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
    RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
    RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
    R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
    FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     __remove_pages+0x4b/0x640
     arch_remove_memory+0x63/0x8d
     try_remove_memory+0xdb/0x130
     __remove_memory+0xa/0x11
     acpi_memory_device_remove+0x70/0x100
     acpi_bus_trim+0x55/0x90
     acpi_device_hotplug+0x227/0x3a0
     acpi_hotplug_work_fn+0x1a/0x30
     process_one_work+0x221/0x550
     worker_thread+0x50/0x3b0
     kthread+0x105/0x140
     ret_from_fork+0x3a/0x50
    Modules linked in:
    CR2: 000000000000353d

Instead, shrink the zones when offlining memory or when onlining failed.
Introduce and use remove_pfn_range_from_zone(() for that.  We now
properly shrink the zones, even if we have DIMMs whereby

 - Some memory blocks fall into no zone (never onlined)

 - Some memory blocks fall into multiple zones (offlined+re-onlined)

 - Multiple memory blocks that fall into different zones

Drop the zone parameter (with a potential dubious value) from
__remove_pages() and __remove_section().

Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e86b]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: <stable@vger.kernel.org>	[5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:56 +01:00
Aneesh Kumar K.V 99db9e0fec powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
[ Upstream commit 16f6b67cf03cb43db7104acb2ca877bdc2606c92 ]

With large memory (8TB and more) hotplug, we can get soft lockup
warnings as below. These were caused by a long loop without any
explicit cond_resched which is a problem for !PREEMPT kernels.

Avoid this using cond_resched() while inserting hash page table
entries. We already do similar cond_resched() in __add_pages(), see
commit f64ac5e6e3 ("mm, memory_hotplug: add scheduling point to
__add_pages").

  rcu:     3-....: (24002 ticks this GP) idle=13e/1/0x4000000000000002 softirq=722/722 fqs=12001
   (t=24003 jiffies g=4285 q=2002)
  NMI backtrace for cpu 3
  CPU: 3 PID: 3870 Comm: ndctl Not tainted 5.3.0-197.18-default+ #2
  Call Trace:
    dump_stack+0xb0/0xf4 (unreliable)
    nmi_cpu_backtrace+0x124/0x130
    nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
    arch_trigger_cpumask_backtrace+0x28/0x3c
    rcu_dump_cpu_stacks+0xf8/0x154
    rcu_sched_clock_irq+0x878/0xb40
    update_process_times+0x48/0x90
    tick_sched_handle.isra.16+0x4c/0x80
    tick_sched_timer+0x68/0xe0
    __hrtimer_run_queues+0x180/0x430
    hrtimer_interrupt+0x110/0x300
    timer_interrupt+0x108/0x2f0
    decrementer_common+0x114/0x120
  --- interrupt: 901 at arch_add_memory+0xc0/0x130
      LR = arch_add_memory+0x74/0x130
    memremap_pages+0x494/0x650
    devm_memremap_pages+0x3c/0xa0
    pmem_attach_disk+0x188/0x750
    nvdimm_bus_probe+0xac/0x2c0
    really_probe+0x148/0x570
    driver_probe_device+0x19c/0x1d0
    device_driver_attach+0xcc/0x100
    bind_store+0x134/0x1c0
    drv_attr_store+0x44/0x60
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1a0/0x270
    __vfs_write+0x3c/0x70
    vfs_write+0xd0/0x260
    ksys_write+0xdc/0x130
    system_call+0x5c/0x68

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191001084656.31277-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04 19:17:12 +01:00
Aneesh Kumar K.V 779c0ee4f4 powerpc/pseries: Don't fail hash page table insert for bolted mapping
[ Upstream commit 75838a3290cd4ebbd1f567f310ba04b6ef017ce4 ]

If the hypervisor returned H_PTEG_FULL for H_ENTER hcall, retry a hash page table
insert by removing a random entry from the group.

After some runtime, it is very well possible to find all the 8 hash page table
entry slot in the hpte group used for mapping. Don't fail a bolted entry insert
in that case. With Storage class memory a user can find this error easily since
a namespace enable/disable is equivalent to memory add/remove.

This results in failures as reported below:

$ ndctl create-namespace -r region1 -t pmem -m devdax -a 65536 -s 100M
libndctl: ndctl_dax_enable: dax1.3: failed to enable
  Error: namespace1.2: failed to enable

failed to create namespace: No such device or address

In kernel log we find the details as below:

Unable to create mapping for hot added memory 0xc000042006000000..0xc00004200d000000: -1
dax_pmem: probe of dax1.3 failed with error -14

This indicates that we failed to create a bolted hash table entry for direct-map
address backing the namespace.

We also observe failures such that not all namespaces will be enabled with
ndctl enable-namespace all command.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024093542.29777-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04 19:16:52 +01:00
Roy Pledge b558f49bc0 Add APIs to setup HugeTLB mappings for USDPAA 2019-11-25 16:32:12 +08:00
Linus Torvalds a3c0e7b1fe libnvdimm fixes v5.4-rc1
- Complete the reworks to interoperate with powerpc dynamic huge page sizes
 
 - Fix a crash due to missed accounting for the powerpc 'struct
   page'-memmap mapping granularity.
 
 - Fix badblock initialization for volatile (DRAM emulated) pmem ranges.
 
 - Stop triggering request_key() notifications to userspace when
   NVDIMM-security is disabled / not present.
 
 - Miscellaneous small fixups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdkAprAAoJEB7SkWpmfYgCjXoQAIwJE1VzNP1V+ARxfs1rTGVz
 pbNJiBnj4gxDaCkcKoatiadRkytUxeUNEcPslEKsfoNinXYqkpjMQoWm2VpILOMU
 nY+SvIudGRnuesq2/Y+CP8zrX6rV4eBDfHK05RN/Zp1IlW7pTDItUx8mJ7glmDwG
 PW0vkvK7yZ+dRFnpQ7QFjhA0Q3oudO5YcTVBDK5YYtDGlv69xfXqc9LW8SszJ1kU
 rhCIT1kdoL5of0TIgG5pTfmggPSQ9y1xPsKjllOHNa3m50eGOkkQLELOVzQb1frW
 cjAsPLjRDSzvdHHSLyu0Is04Q5JU2CucxHl2SXGHiOt5tigH8dk5XFxWt0Pc8EXx
 acYYiBqUXC3MomSYWeLK4BdO2cRTqcPPXgJYAqXblqr+/0ys+rFepjw+j8JkiLZa
 5UCC30l1GXEpw9u6gdCMqvvHN2gHvDB0BV82Sx8wTewJpeL18wCUJoKVuFmpsHko
 p1cCe7St1TzcK3eO+xfeW1rxNrcXUpKVYXVa/WOJW0vwErqAZ6YCdNuyJHocZzXn
 vNyIQmVDOlubsgBAI2ExxeZO6xc8UIwLhLg7XEJ0mg3k6UXA8HZxH2B2THJk1BSF
 RppodkYiMknh11sqgpGp+Hz5XSEg/jvmCdL/qRDGAwhsFhFaxDH37Kg4Qncj2/dg
 uDvDHXNCjbGpzCo3tyNx
 =Z6Fa
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

More libnvdimm updates from Dan Williams:

 - Complete the reworks to interoperate with powerpc dynamic huge page
   sizes

 - Fix a crash due to missed accounting for the powerpc 'struct
   page'-memmap mapping granularity

 - Fix badblock initialization for volatile (DRAM emulated) pmem ranges

 - Stop triggering request_key() notifications to userspace when
   NVDIMM-security is disabled / not present

 - Miscellaneous small fixups

* tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm/region: Enable MAP_SYNC for volatile regions
  libnvdimm: prevent nvdimm from requesting key when security is disabled
  libnvdimm/region: Initialize bad block for volatile namespaces
  libnvdimm/nfit_test: Fix acpi_handle redefinition
  libnvdimm/altmap: Track namespace boundaries in altmap
  libnvdimm: Fix endian conversion issues 
  libnvdimm/dax: Pick the right alignment default when creating dax devices
  powerpc/book3s64: Export has_transparent_hugepage() related functions.
2019-09-29 10:33:41 -07:00
Linus Torvalds a2953204b5 powerpc fixes for 5.4 #2
An assortment of fixes that were either missed by me, or didn't arrive quite in
 time for the first v5.4 pull.
 
 Most notable is a fix for an issue with tlbie (broadcast TLB invalidation) on
 Power9, when using the Radix MMU. The tlbie can race with an mtpid (move to PID
 register, essentially MMU context switch) on another thread of the core, which
 can cause stores to continue to go to a page after it's unmapped.
 
 A fix in our KVM code to add a missing barrier, the lack of which has been
 observed to cause missed IPIs and subsequently stuck CPUs in the host.
 
 A change to the way we initialise PCR (Processor Compatibility Register) to make
 it forward compatible with future CPUs.
 
 On some older PowerVM systems our H_BLOCK_REMOVE support could oops, fix it to
 detect such systems and fallback to the old invalidation method.
 
 A fix for an oops seen on some machines when using KASAN on 32-bit.
 
 A handful of other minor fixes, and two new selftests.
 
 Thanks to:
   Alistair Popple, Aneesh Kumar K.V, Christophe Leroy, Gustavo Romero, Joel
   Stanley, Jordan Niethe, Laurent Dufour, Michael Roth, Oliver O'Halloran.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl2PRGITHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgClRD/9jKIT6GjVpRMc+Dg9zHB5/Pir7gePk
 ztXKI+u15GrrXgjtWEZ1PaaXvtNIfs/IZHDQm5gyJjiBAKcGl2v+9ETaMzO5sjZ7
 GSe1F8VX/MwzRnET8Jph8w/b0cy0Q8xndkEOjcJqJ7+TF+SSWqmJEdmBfkU23jWD
 B3kW4W1x2xt/XGsX25l1HpUgpcJqzukCeYUSCdqUu2j+sXAEZmfgTRG8uD4HffzZ
 3As76TrBiJsDnkyH0qi2G1BuLXrQbAMdjTeSGi+cb0gTIunCr190gI4+Tjdu2/z7
 ywWR2ZUkueCNDcLsqXaqpZx50utPJ44//uY750sk72vixJJOVuOWM6+5HKVW83se
 /v0zkOcI9+ywNHe0vLfP3Jm/OMMHYxkIwz6kVu2NSR6sE79B9AZpBFU+Nynq7kKl
 +Hc6md/HATvR+NK6LtQKtEGydRhvxU5n3KBmjq3SQj+B/ZlU6IdgerfhUWrNvg0B
 zzHeT35X6UBpswonhkQLgqJuaWpkClK9wsUy85MuA7aub1EP8S6/X7paKoiOtAHK
 NjlXM2JYV5OKwhjGgdCiI94Bdune7yudKPdsXV3Gr8Iw7wf2bQk1p7VH+LcruyE9
 YJdXwCgN0PaoFUQh3AR4CqzzFwqDya8FQqdkFN3kqhRLVGAMq/PsV8/Tn+myTgQP
 rZnWnbfZh9BMjw==
 =dF42
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "An assortment of fixes that were either missed by me, or didn't arrive
  quite in time for the first v5.4 pull.

   - Most notable is a fix for an issue with tlbie (broadcast TLB
     invalidation) on Power9, when using the Radix MMU. The tlbie can
     race with an mtpid (move to PID register, essentially MMU context
     switch) on another thread of the core, which can cause stores to
     continue to go to a page after it's unmapped.

   - A fix in our KVM code to add a missing barrier, the lack of which
     has been observed to cause missed IPIs and subsequently stuck CPUs
     in the host.

   - A change to the way we initialise PCR (Processor Compatibility
     Register) to make it forward compatible with future CPUs.

   - On some older PowerVM systems our H_BLOCK_REMOVE support could
     oops, fix it to detect such systems and fallback to the old
     invalidation method.

   - A fix for an oops seen on some machines when using KASAN on 32-bit.

   - A handful of other minor fixes, and two new selftests.

  Thanks to: Alistair Popple, Aneesh Kumar K.V, Christophe Leroy,
  Gustavo Romero, Joel Stanley, Jordan Niethe, Laurent Dufour, Michael
  Roth, Oliver O'Halloran"

* tag 'powerpc-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/eeh: Fix eeh eeh_debugfs_break_device() with SRIOV devices
  powerpc/nvdimm: use H_SCM_QUERY hcall on H_OVERLAP error
  powerpc/nvdimm: Use HCALL error as the return value
  selftests/powerpc: Add test case for tlbie vs mtpidr ordering issue
  powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9
  powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag
  powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions
  powerpc/pseries: Call H_BLOCK_REMOVE when supported
  powerpc/pseries: Read TLB Block Invalidate Characteristics
  KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag
  powerpc/mm: Fix an Oops in kasan_mmu_init()
  powerpc/mm: Add a helper to select PAGE_KERNEL_RO or PAGE_READONLY
  powerpc/64s: Set reserved PCR bits
  powerpc: Fix definition of PCR bits to work with old binutils
  powerpc/book3s64/radix: Remove WARN_ON in destroy_context()
  powerpc/tm: Add tm-poison test
2019-09-28 13:43:00 -07:00
Mark Rutland b4ed71f557 mm: treewide: clarify pgtable_page_{ctor,dtor}() naming
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
people, and until recently arm64 used these erroneously/pointlessly for
other levels of page table.

To make it incredibly clear that these only apply to the PTE level, and to
align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
to pgtable_pte_page_{ctor,dtor}().

These changes were generated with the following shell script:

----
git grep -lw 'pgtable_page_.tor' | while read FILE; do
    sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
    sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
done
----

... with the documentation re-flowed to remain under 80 columns, and
whitespace fixed up in macros to keep backslashes aligned.

There should be no functional change as a result of this patch.

Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26 10:10:44 -07:00
Kefeng Wang 9ef258bad7 thp: update split_huge_page_pmd() comment
According to 78ddc53473 ("thp: rename split_huge_page_pmd() to
split_huge_pmd()"), update related comment.

Link: http://lkml.kernel.org/r/20190731033406.185285-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:10 -07:00
Matthew Wilcox (Oracle) d8c6546b1a mm: introduce compound_nr()
Replace 1 << compound_order(page) with compound_nr(page).  Minor
improvements in readability.

Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:08 -07:00
Matthew Wilcox (Oracle) 94ad933810 mm: introduce page_shift()
Replace PAGE_SHIFT + compound_order(page) with the new page_shift()
function.  Minor improvements in readability.

[akpm@linux-foundation.org: fix build in tce_page_is_contained()]
  Link: http://lkml.kernel.org/r/201907241853.yNQTrJWd%25lkp@intel.com
Link: http://lkml.kernel.org/r/20190721104612.19120-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:08 -07:00
Aneesh Kumar K.V cf387d9644 libnvdimm/altmap: Track namespace boundaries in altmap
With PFN_MODE_PMEM namespace, the memmap area is allocated from the device
area. Some architectures map the memmap area with large page size. On
architectures like ppc64, 16MB page for memap mapping can map 262144 pfns.
This maps a namespace size of 16G.

When populating memmap region with 16MB page from the device area,
make sure the allocated space is not used to map resources outside this
namespace. Such usage of device area will prevent a namespace destroy.

Add resource end pnf in altmap and use that to check if the memmap area
allocation can map pfn outside the namespace. On ppc64 in such case we fallback
to allocation from memory.

This fix kernel crash reported below:

[  132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0
[  133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000
[  133.464760] Faulting instruction address: 0xc00000000007580c
[  133.464766] Oops: Kernel access of bad area, sig: 11 [#1]
[  133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
.....
[  133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0
[  133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0
[  133.464910] Call Trace:
[  133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable)
[  133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240
[  133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590
[  133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160
[  133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0
[  133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50
[  133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400
[  133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250
[  133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110
[  133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70
[  133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0
[  133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250
[  133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70
[  133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250

djbw: Aneesh notes that this crash can likely be triggered in any kernel that
supports 'papr_scm', so flagging that commit for -stable consideration.

Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Cc: <stable@vger.kernel.org>
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Santosh Sivaraj <santosh@fossix.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-09-24 10:24:12 -07:00