mirror of
https://github.com/brain-hackers/linux-brain.git
synced 2024-06-09 23:36:23 +09:00
brain
32114 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Andrey Zhizhikin
|
913880358f |
This is the 5.4.149 stable release
Merge tag 'v5.4.149' into 5.4-2.3.x-imx This is the 5.4.149 stable release Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> |
||
Pavel Skripkin
|
b94def8a47 |
profiling: fix shift-out-of-bounds bugs
commit 2d186afd04d669fe9c48b994c41a7405a3c9f16d upstream.
Syzbot reported a shift-out-of-bounds bug in profile_init().
The problem was an incorrect prof_shift. Since the prof_shift value comes from
userspace, we need to clamp this value into the [0, BITS_PER_LONG - 1]
range.
A second possible shift-out-of-bounds was found by Tetsuo:
the sample_step local variable in read_profile() had "unsigned int" type,
but prof_shift allows a BITS_PER_LONG shift. So, to prevent a
possible shift-out-of-bounds, the sample_step type was changed to
"unsigned long".
Also, "unsigned short int" will be sufficient for storing
[0, BITS_PER_LONG] value, that's why there is no need for
"unsigned long" prof_shift.
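The clamping described above can be sketched in user-space C as follows. This is a minimal illustration under stated assumptions, not the kernel patch itself: `clamp_prof_shift` and `sample_step_for` are hypothetical names, and `BITS_PER_LONG` is defined locally from `sizeof(long)`.

```c
#include <limits.h>

/* Assumption: local stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG ((int)(sizeof(long) * CHAR_BIT))

/* Hypothetical helper: clamp a user-supplied shift into [0, BITS_PER_LONG - 1]. */
static unsigned short clamp_prof_shift(long requested)
{
    if (requested < 0)
        return 0;
    if (requested > BITS_PER_LONG - 1)
        return (unsigned short)(BITS_PER_LONG - 1);
    return (unsigned short)requested;
}

/* The step is an unsigned long, so any clamped shift is well defined. */
static unsigned long sample_step_for(unsigned short shift)
{
    return 1UL << shift;
}
```

An "unsigned short" is wide enough for the clamped shift, while the shifted result needs the full width of "unsigned long".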
Link: https://lkml.kernel.org/r/20210813140022.5011-1-paskripkin@gmail.com
Fixes:
|
||
Cyrill Gorcunov
|
5607b1bae1 |
prctl: allow to setup brk for et_dyn executables
commit e1fbbd073137a9d63279f6bf363151a938347640 upstream.
Keno Fischer reported that when a binary is loaded via ld-linux-x, the
prctl(PR_SET_MM_MAP) call doesn't allow setting up the brk value because it lies
before mm:end_data.
For example a test program shows
| # ~/t
|
| start_code 401000
| end_code 401a15
| start_stack 7ffce4577dd0
| start_data 403e10
| end_data 40408c
| start_brk b5b000
| sbrk(0) b5b000
and when executed via ld-linux
| # /lib64/ld-linux-x86-64.so.2 ~/t
|
| start_code 7fc25b0a4000
| end_code 7fc25b0c4524
| start_stack 7fffcc6b2400
| start_data 7fc25b0ce4c0
| end_data 7fc25b0cff98
| start_brk 55555710c000
| sbrk(0) 55555710c000
This of course prevents criu from restoring such programs. Looking into
how the kernel operates with brk/start_brk inside the brk() syscall, I don't see
any problem if we allow setting up brk/start_brk without checking for
end_data. Even if someone passes some weird address here on purpose, then
the worst possible result will be an unexpected unmapping of an existing vma
(own vma, since prctl works with the caller's memory), but the test for
RLIMIT_DATA is still valid and a user won't be able to gain more memory in
case of expanding VMAs via new values shipped with the prctl call.
Link: https://lkml.kernel.org/r/20210121221207.GB2174@grain
Fixes:
|
||
Andrey Zhizhikin
|
8b231e0e8e |
This is the 5.4.148 stable release
Merge tag 'v5.4.148' into 5.4-2.3.x-imx This is the 5.4.148 stable release Conflicts: - drivers/dma/imx-sdma.c: Following upstream patches are already applied to NXP tree: |
||
Masami Hiramatsu
|
b3435cd968 |
tracing/probes: Reject events which have the same name of existing one
[ Upstream commit 8e242060c6a4947e8ae7d29794af6a581db08841 ] Since kprobe_events and uprobe_events only check whether the other same-type probe event has the same name or not, if the user gives the same name of the existing tracepoint event (or the other type of probe events), it silently fails to create the tracefs entry (but registered.) as below. /sys/kernel/tracing # ls events/task/task_rename enable filter format hist id trigger /sys/kernel/tracing # echo p:task/task_rename vfs_read >> kprobe_events [ 113.048508] Could not create tracefs 'task_rename' directory /sys/kernel/tracing # cat kprobe_events p:task/task_rename vfs_read To fix this issue, check whether the existing events have the same name or not in trace_probe_register_event_call(). If exists, it rejects to register the new event. Link: https://lkml.kernel.org/r/162936876189.187130.17558311387542061930.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Baptiste Lepers
|
172749c879 |
events: Reuse value read using READ_ONCE instead of re-reading it
commit b89a05b21f46150ac10a962aa50109250b56b03b upstream.
In perf_event_addr_filters_apply, the task associated with
the event (event->ctx->task) is read using READ_ONCE at the beginning
of the function, checked, and then re-read from event->ctx->task,
voiding all guarantees of the checks. Reuse the value that was read by
READ_ONCE to ensure the consistency of the task struct throughout the
function.
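The read-once-and-reuse pattern can be sketched in user-space C as below. This is only an illustration: the `READ_ONCE` macro here is a simplified stand-in for the kernel's, and `struct task`, `struct ctx`, `task_dead`, and `filtered_pid` are hypothetical names that do not appear in the patch.

```c
#include <stddef.h>

/* Assumption: simplified user-space stand-in for the kernel's READ_ONCE(). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct task { int pid; };
struct ctx  { struct task *task; };

/* Hypothetical sentinel playing the role of a "dead task" marker. */
static struct task task_dead;

/*
 * Read ctx->task exactly once, validate the snapshot, and keep using the
 * snapshot. Re-reading ctx->task after the check could observe a different
 * value and bypass the validation.
 */
static int filtered_pid(struct ctx *ctx)
{
    struct task *task = READ_ONCE(ctx->task);

    if (task == NULL || task == &task_dead)
        return -1;

    return task->pid; /* uses the checked snapshot, not ctx->task */
}
```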
Fixes:
|
||
Vasily Averin
|
0569920e43 |
memcg: enable accounting for pids in nested pid namespaces
commit fab827dbee8c2e06ca4ba000fa6c48bcf9054aba upstream. Commit |
||
Liu Zixian
|
22b11dbbf9 |
mm/hugetlb: initialize hugetlb_usage in mm_init
commit 13db8c50477d83ad3e3b9b0ae247e5cd833a7ae4 upstream.
After fork, the child process will get incorrect (2x) hugetlb_usage. If
a process uses 5 2MB hugetlb pages in an anonymous mapping,
HugetlbPages: 10240 kB
and then forks, the child will show,
HugetlbPages: 20480 kB
The reason for double the amount is because hugetlb_usage will be copied
from the parent and then increased when we copy page tables from parent
to child. Child will have 2x actual usage.
Fix this by adding hugetlb_count_init in mm_init.
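The double counting can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not kernel code: `struct mm_model` and `fork_model` are hypothetical, and fork is reduced to a struct copy followed by a page-table copy that accounts every parent page in the child.

```c
/* Toy model: 'mm_model' and 'fork_model' are hypothetical, not kernel APIs. */
struct mm_model {
    long hugetlb_usage; /* counted in huge pages */
};

/* Fork modeled as: struct copy, optional counter reset, then page-table copy. */
static struct mm_model fork_model(const struct mm_model *parent, int reset_counter)
{
    struct mm_model child = *parent;      /* counter is copied with the struct */

    if (reset_counter)
        child.hugetlb_usage = 0;          /* the fix: start the child from zero */

    /* Copying page tables accounts each parent huge page in the child. */
    child.hugetlb_usage += parent->hugetlb_usage;
    return child;
}
```

With a parent holding 5 pages, the model without the reset gives the child 10 pages (the 2x figure from the message), and with the reset it gives 5.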
Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com
Fixes:
|
||
Zhen Lei
|
b6cee35839 |
workqueue: Fix possible memory leaks in wq_numa_init()
[ Upstream commit f728c4a9e8405caae69d4bc1232c54ff57b5d20f ] In error handling branch "if (WARN_ON(node == NUMA_NO_NODE))", the previously allocated memories are not released. Doing this before allocating memory eliminates memory leaks. tj: Note that the condition only occurs when the arch code is pretty broken and the WARN_ON might as well be BUG_ON(). Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Anthony Iliopoulos
|
be09cbd6a3 |
dma-debug: fix debugfs initialization order
[ Upstream commit 173735c346c412d9f084825ecb04f24ada0e2986 ]
Due to link order, dma_debug_init is called before debugfs has a chance
to initialize (via debugfs_init which also happens in the core initcall
stage), so the directories for dma-debug are never created.
Decouple dma_debug_fs_init from dma_debug_init and defer its init until
core_initcall_sync (after debugfs has been initialized) while letting
dma-debug initialization occur as soon as possible to catch any early
mappings, as suggested in [1].
[1] https://lore.kernel.org/linux-iommu/YIgGa6yF%2Fadg8OSN@kroah.com/
Fixes:
|
||
Andrey Zhizhikin
|
33478cc104 |
This is the 5.4.147 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmFDItIACgkQONu9yGCS aT7qHw/+MC8nqbYT/fMu3ZsTsNB8JZelGhrpXVAFc4E80oZDVEF3aNh8koTk13o/ sI+9LZD6vPJ7kxPBT45PPxDfwM7VlVlQTVWQ8lUjY+8a2Ml7xZSiz9I3bMX0MsvU ylXeAJ4BntXI4679ccSjWGwwcQa+PrHsPrpZqKLBI4+pjxps3eMGwK0yE5jcRd4Z WlhgYbE8QhmB4iWSA5CBr7IY5pVIlKpvovlvIi1TlU1C9LXU6O8OBPkhmbuFF4fG HY2ge6d+xZItcn9RFi+uH51PwPBpN9U2I+QakQ4iyMNgF1uqHzfWh30ZeINxHzw2 2Nn/nCmNeUwoOt6YSQGxlZUieqqvq5y4VeXo2nThqBdds8GPJ4AhvvxmM/NnP0EV lfI6RSxJcYULfl0XQB5sj3fJ2Fo7G7Mx+kg0q0018iTuOx9kNdQF/WXdLqHvBUIi VIaRFR2wKeWXCV7tmb5o8928clv0FWJRi19Gcq8597zWxGC8mlSOg48jVVMkE9rU IaYhIiIRL/Dkf34Hal2+mZbvbD5IjtViw9uYOzME7NuF4VVHD6RavsMOM8AuUc+t OtSRo6mID1rH+RrvFZhZRzcT3Fg5NVbmEhiVG7tH26ZEJE2Gs/CjexQgTHf25+VL /H8Mnjjw7gAmvqAJyG9s3eocnhuuWeYj5YWg3vAxVNoUqUT2uO4= =RNKE -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmFDnvkACgkQ7G51OISz Hs17yQ/8CoCEE3h3RZQ07Y4RSyePZU96IRGqd5Dhyli0BUgWahmaaoBrXYQPL2c1 IzBW2/e0vxCwMflVHxRMS6tXoTyj3dXuNG30SU+pkCDnsWyrWxtNALFgDt0a/wHa pszFKfnXKxrYOKKU2gSPxFu1+0Tm0zV6puwgvwcuKzUFx/wjIZxH+8+fLhTWYu2X 7OBuwCvDjbsQgnruKbyysI14DBu4rfYm/7hXLgRLCtYRdUYeuVrR1W5N8QEqLYr4 daZHCl2T4VF6qiiE5FdZc6yfW0H6KSK8A1B9PYZnSJyeYwLS1edXQFbZq4VBSjPh wIefx3xB0ZKuyPUHc1EYjxvxfYJCbWjFaOT8aT+qDxkpYgMRltur1fmE1fS0P9hp 3g2JrSl5EzzNqklZUuC51X4Gq2rGrO31FCXcOh/rAfbJ4sHeUlWU2pnTN7ilJt/W yW51Rl+mYPNTY964iCkMzVmNyAH9oIgJyR7a9P0gkIUwCqq780POtN2bMzkS/Om0 lauKR5NmuNVw//jfeQddrrdVILfErZ5tHplGNsk3VFMi2/ITNEbmm1CJGb2090F/ EK1lcWOj6vuETN9Lqc/dv2hX0UgDG8I1SVb1SXwRLCCZREiuoS+Z3EmMfLAkmV46 tXz+x34BTvMl3CEZMvy69ovTrJlh6537mFx2T7+FNBqeDLHT2Z4= =yJQM -----END PGP SIGNATURE----- Merge tag 'v5.4.147' into 5.4-2.3.x-imx This is the 5.4.147 stable release Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> |
||
Greg Kroah-Hartman
|
dc15f641c6 |
Revert "posix-cpu-timers: Force next expiration recalc after itimer reset"
This reverts commit
|
||
Daniel Borkmann
|
ae968e270f |
bpf: Fix pointer arithmetic mask tightening under state pruning
commit e042aa532c84d18ff13291d00620502ce7a38dda upstream. In 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic mask") we narrowed the offset mask for unprivileged pointer arithmetic in order to mitigate a corner case where in the speculative domain it is possible to advance, for example, the map value pointer by up to value_size-1 out-of-bounds in order to leak kernel memory via side-channel to user space. The verifier's state pruning for scalars leaves one corner case open where in the first verification path R_x holds an unknown scalar with an aux->alu_limit of e.g. 7, and in a second verification path that same register R_x, here denoted as R_x', holds an unknown scalar which has tighter bounds and would thus satisfy range_within(R_x, R_x') as well as tnum_in(R_x, R_x') for state pruning, yielding an aux->alu_limit of 3: Given the second path fits the register constraints for pruning, the final generated mask from aux->alu_limit will remain at 7. While technically not wrong for the non-speculative domain, it would however be possible to craft similar cases where the mask would be too wide as in 7fedb63a8307. One way to fix it is to detect the presence of unknown scalar map pointer arithmetic and force a deeper search on unknown scalars to ensure that we do not run into a masking mismatch. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [OP: adjusted context in include/linux/bpf_verifier.h for 5.4] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Lorenz Bauer
|
a0a4778fea |
bpf: verifier: Allocate idmap scratch in verifier env
commit c9e73e3d2b1eb1ea7ff068e05007eec3bd8ef1c9 upstream. func_states_equal makes a very short lived allocation for idmap, probably because it's too large to fit on the stack. However the function is called quite often, leading to a lot of alloc / free churn. Replace the temporary allocation with dedicated scratch space in struct bpf_verifier_env. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/bpf/20210429134656.122225-4-lmb@cloudflare.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [OP: adjusted context for 5.4] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Daniel Borkmann
|
f5893af270 |
bpf: Fix leakage due to insufficient speculative store bypass mitigation
commit 2039f26f3aca5b0e419b98f65dd36481337b86ee upstream. Spectre v4 gadgets make use of memory disambiguation, which is a set of techniques that execute memory access instructions, that is, loads and stores, out of program order; Intel's optimization manual, section 2.4.4.5: A load instruction micro-op may depend on a preceding store. Many microarchitectures block loads until all preceding store addresses are known. The memory disambiguator predicts which loads will not depend on any previous stores. When the disambiguator predicts that a load does not have such a dependency, the load takes its data from the L1 data cache. Eventually, the prediction is verified. If an actual conflict is detected, the load and all succeeding instructions are re-executed. |
||
Daniel Borkmann
|
e80c3533c3 |
bpf: Introduce BPF nospec instruction for mitigating Spectre v4
commit f5e81d1117501546b7be050c5fbafa6efd2c722c upstream. In case of JITs, each of the JIT backends compiles the BPF nospec instruction /either/ to a machine instruction which emits a speculation barrier /or/ to /no/ machine instruction in case the underlying architecture is not affected by Speculative Store Bypass or has different mitigations in place already. This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence' instruction for mitigation. In case of arm64, we rely on the firmware mitigation as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled, it works for all of the kernel code with no need to provide any additional instructions here (hence only comment in arm64 JIT). Other archs can follow as needed. The BPF nospec instruction is specifically targeting Spectre v4 since i) we don't use a serialization barrier for the Spectre v1 case, and ii) mitigation instructions for v1 and v4 might be different on some archs. The BPF nospec is required for a future commit, where the BPF verifier does annotate intermediate BPF programs with speculation barriers. Co-developed-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> [OP: - adjusted context for 5.4 - apply riscv changes to /arch/riscv/net/bpf_jit_comp.c] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Andrey Ignatov
|
e37eeaf950 |
bpf: Fix possible out of bound write in narrow load handling
[ Upstream commit d7af7e497f0308bc97809cc48b58e8e0f13887e1 ]
Fix a verifier bug found by smatch static checker in [0].
This problem has never been seen in prod to the best of my knowledge. Fixing it
still seems to be a good idea since it's hard to say for sure whether
it's possible or not to have a scenario where a combination of
convert_ctx_access() and a narrow load would lead to an out of bound
write.
When a narrow load is handled, one or two new instructions are added to the
insn_buf array, but previously the only check was
cnt >= ARRAY_SIZE(insn_buf)
so it is safe to add a new instruction to insn_buf[cnt++] only once. The
second attempt will lead to an out of bound write. And this is what can happen
if `shift` is set.
Fix it by making sure that if the BPF_RSH instruction has to be added in
addition to BPF_AND then there is enough space for two more instructions
in insn_buf.
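The idea of checking for room for every entry before writing any of them can be sketched as follows. This is a hedged user-space illustration only: `append_narrow_load`, `INSN_BUF_LEN`, and the use of plain `int` entries are assumptions made for the example and are not taken from the verifier.

```c
#include <stddef.h>

#define INSN_BUF_LEN 16 /* assumption: illustrative capacity only */

/*
 * Hypothetical helper: append an optional "shift" entry plus a mandatory
 * "mask" entry to buf, which already holds cnt entries. Returns the new
 * count, or -1 if the entries would not fit.
 */
static int append_narrow_load(int *buf, size_t len, size_t cnt, int shift, int mask)
{
    size_t needed = shift ? 2 : 1;

    /* Check room for every entry we are about to write, not just one. */
    if (cnt > len || needed > len - cnt)
        return -1;

    if (shift)
        buf[cnt++] = shift;
    buf[cnt++] = mask;
    return (int)cnt;
}
```

With one free slot left, a call that needs both entries is rejected up front instead of writing the second entry past the end of the array.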
The full report [0] is below:
kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c
12282
12283 insn->off = off & ~(size_default - 1);
12284 insn->code = BPF_LDX | BPF_MEM | size_code;
12285 }
12286
12287 target_size = 0;
12288 cnt = convert_ctx_access(type, insn, insn_buf, env->prog,
12289 &target_size);
12290 if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bounds check.
12291 (ctx_field_size && !target_size)) {
12292 verbose(env, "bpf verifier is misconfigured\n");
12293 return -EINVAL;
12294 }
12295
12296 if (is_narrower_load && size < target_size) {
12297 u8 shift = bpf_ctx_narrow_access_offset(
12298 off, size, size_default) * 8;
12299 if (ctx_field_size <= 4) {
12300 if (shift)
12301 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
^^^^^
increment beyond end of array
12302 insn->dst_reg,
12303 shift);
--> 12304 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
^^^^^
out of bounds write
12305 (1 << size * 8) - 1);
12306 } else {
12307 if (shift)
12308 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
12309 insn->dst_reg,
12310 shift);
12311 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
^^^^^^^^^^^^^^^
Same.
12312 (1ULL << size * 8) - 1);
12313 }
12314 }
12315
12316 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
12317 if (!new_prog)
12318 return -ENOMEM;
12319
12320 delta += cnt - 1;
12321
12322 /* keep walking new program and skip insns we just inserted */
12323 env->prog = new_prog;
12324 insn = new_prog->insnsi + i + delta;
12325 }
12326
12327 return 0;
12328 }
[0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/
v1->v2:
- clarify that problem was only seen by static checker but not in prod;
Fixes:
|
||
Lukasz Luba
|
156eaacba3 |
PM: EM: Increase energy calculation precision
[ Upstream commit 7fcc17d0cb12938d2b3507973a6f93fc9ed2c7a1 ]
The Energy Model (EM) provides useful information about device power in
each performance state to other subsystems such as the Energy Aware Scheduler
(EAS). The energy calculation in EAS does arithmetic operations based on
the EM em_cpu_energy(). The current implementation of that function uses
em_perf_state::cost as a pre-computed cost coefficient equal to:
cost = power * max_frequency / frequency.
The 'power' is expressed in milli-Watts (or in an abstract scale).
There are corner cases when the EAS energy calculation for two Performance
Domains (PDs) returns the same value. The EAS compares these values to
choose the smaller one. It might happen that these values are equal due to
rounding error. In such a scenario, we need better resolution, e.g. 1000
times better. To provide this possibility, increase the resolution in
em_perf_state::cost for 64-bit architectures. The cost of increasing
resolution on 32-bit is pretty high (64-bit division) and is not justified
since there are no new 32-bit big.LITTLE EAS systems expected which would
benefit from this higher resolution.
This patch avoids the rounding-to-milli-Watt errors, which might
occur in EAS energy estimation for each PD. The rounding error is common
for small tasks which have a small utilization value.
There are two places in the code where it makes a difference:
1. In the find_energy_efficient_cpu() where we are searching for
best_delta. We might suffer there when two PDs return the same result,
like in the example below.
Scenario:
Lightly utilized system, e.g. ~200 sum_util for PD0 and ~220 for PD1. There
are quite a few small tasks with ~10-15 util. These tasks would suffer from
the rounding error. These utilization values are typical when running games
on Android. One of our partners has reported 5..10mA less battery drain
when running with increased resolution.
Some details:
We have two PDs: PD0 (big) and PD1 (little)
Let's compare w/o patch set ('old') and w/ patch set ('new')
We are comparing energy w/ task and w/o task placed in the PDs
a) 'old' w/o patch set, PD0
task_util = 13
cost = 480
sum_util_w/o_task = 215
sum_util_w_task = 228
scale_cpu = 1024
energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100
energy_w_task = 480 * 228 / 1024 = 106.87 => 106
energy_diff = 106 - 100 = 6
(this is equal to 'old' PD1's energy_diff in 'c)')
b) 'new' w/ patch set, PD0
task_util = 13
cost = 480 * 1000 = 480000
sum_util_w/o_task = 215
sum_util_w_task = 228
energy_w/o_task = 480000 * 215 / 1024 = 100781
energy_w_task = 480000 * 228 / 1024 = 106875
energy_diff = 106875 - 100781 = 6094
(this is not equal to 'new' PD1's energy_diff in 'd)')
c) 'old' w/o patch set, PD1
task_util = 13
cost = 160
sum_util_w/o_task = 283
sum_util_w_task = 296
scale_cpu = 355
energy_w/o_task = 160 * 283 / 355 = 127.55 => 127
energy_w_task = 160 * 296 / 355 = 133.41 => 133
energy_diff = 133 - 127 = 6
(this is equal to 'old' PD0's energy_diff in 'a)')
d) 'new' w/ patch set, PD1
task_util = 13
cost = 160 * 1000 = 160000
sum_util_w/o_task = 283
sum_util_w_task = 296
scale_cpu = 355
energy_w/o_task = 160000 * 283 / 355 = 127549
energy_w_task = 160000 * 296 / 355 = 133408
energy_diff = 133408 - 127549 = 5859
(this is not equal to 'new' PD0's energy_diff in 'b)')
2. Difference in the 6% energy margin filter at the end of
find_energy_efficient_cpu(). With this patch the margin comparison also
has better resolution, so it's possible to have better task placement
thanks to that.
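The arithmetic in cases a) to d) can be checked with a small user-space sketch. The helper names `em_energy` and `em_energy_diff` are hypothetical; the only assumption carried over from the message is that the energy estimate is `cost * sum_util / scale_cpu` with truncating integer division.

```c
#include <stdint.h>

/* Hypothetical helper: energy estimate with integer (truncating) division. */
static uint64_t em_energy(uint64_t cost, uint64_t sum_util, uint64_t scale_cpu)
{
    return cost * sum_util / scale_cpu;
}

/* Energy difference caused by adding a task of task_util to a domain. */
static uint64_t em_energy_diff(uint64_t cost, uint64_t sum_util,
                               uint64_t task_util, uint64_t scale_cpu)
{
    return em_energy(cost, sum_util + task_util, scale_cpu) -
           em_energy(cost, sum_util, scale_cpu);
}
```

With the original costs (480 for PD0, 160 for PD1) and a task of util 13, both domains yield a difference of 6, so they cannot be told apart. With the costs multiplied by 1000, the differences become 6094 and 5859, and PD1 is the cheaper placement.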
Fixes:
|
||
Waiman Long
|
9fdac650c4 |
cgroup/cpuset: Fix a partition bug with hotplug
[ Upstream commit 15d428e6fe77fffc3f4fff923336036f5496ef17 ]
In cpuset_hotplug_workfn(), the detection of whether the cpu list
has been changed is done by comparing the effective cpus of the top
cpuset with the cpu_active_mask. However, in the rare case that just
all the CPUs in the subparts_cpus are offlined, the detection fails
and the partition states are not updated correctly. Fix it by forcing
the cpus_updated flag to true in this particular case.
Fixes:
|
||
He Fengqing
|
004778bf39 |
bpf: Fix potential memleak and UAF in the verifier.
[ Upstream commit 75f0fc7b48ad45a2e5736bcf8de26c8872fe8695 ]
In bpf_patch_insn_data(), we first use the bpf_patch_insn_single() to
insert new instructions, then use adjust_insn_aux_data() to adjust
insn_aux_data. If the old env->prog does not have enough room for the newly inserted
instructions, we use bpf_prog_realloc to construct new_prog and free the
old env->prog.
There are two errors here. First, if adjust_insn_aux_data() returns
ENOMEM, we should free the new_prog. Second, if adjust_insn_aux_data()
returns ENOMEM, bpf_patch_insn_data() will return NULL, and env->prog has
been freed in bpf_prog_realloc, but we will use it in bpf_check().
So in this patch, we make adjust_insn_aux_data() never fail. In
bpf_patch_insn_data(), we first pre-malloc memory for the new
insn_aux_data, then call bpf_patch_insn_single() to insert new
instructions, at last call adjust_insn_aux_data() to adjust
insn_aux_data.
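The general pattern behind this fix, allocating everything that can fail before modifying any state, can be sketched in user-space C. This is a simplified model under stated assumptions and not the verifier code: `struct prog_model` and `patch_model` are hypothetical, and both arrays are reallocated here purely for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model: 'prog_model' and its fields are hypothetical, not kernel types. */
struct prog_model {
    int *insns;      /* instructions */
    int *aux;        /* one aux entry per instruction */
    size_t n_insns;
};

/*
 * Insert 'extra' zeroed slots at position 'pos'. Both new buffers are
 * allocated before anything is modified, so a failed allocation leaves
 * 'p' fully intact and the later copy steps cannot fail.
 */
static int patch_model(struct prog_model *p, size_t pos, size_t extra)
{
    size_t n = p->n_insns + extra;
    size_t tail = p->n_insns - pos;
    int *insns, *aux;

    if (pos > p->n_insns)
        return -1;

    insns = calloc(n, sizeof(*insns));
    aux = calloc(n, sizeof(*aux));
    if (!insns || !aux) {
        free(insns);
        free(aux);
        return -1;              /* nothing changed, nothing leaked */
    }

    memcpy(insns, p->insns, pos * sizeof(*insns));
    memcpy(insns + pos + extra, p->insns + pos, tail * sizeof(*insns));
    memcpy(aux, p->aux, pos * sizeof(*aux));
    memcpy(aux + pos + extra, p->aux + pos, tail * sizeof(*aux));

    free(p->insns);
    free(p->aux);
    p->insns = insns;
    p->aux = aux;
    p->n_insns = n;
    return 0;
}
```

Because the only fallible steps run first, the caller never ends up holding a freed program or a half-updated aux array.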
Fixes:
|
||
Zhen Lei
|
57c8e2ea47 |
genirq/timings: Fix error return code in irq_timings_test_irqs()
[ Upstream commit 290fdc4b7ef14e33d0e30058042b0e9bfd02b89b ]
Return a negative error code from the error handling case instead of 0, as
done elsewhere in this function.
Fixes:
|
||
Quentin Perret
|
449884aeb3 |
sched: Fix UCLAMP_FLAG_IDLE setting
[ Upstream commit ca4984a7dd863f3e1c0df775ae3e744bff24c303 ]
The UCLAMP_FLAG_IDLE flag is set on a runqueue when dequeueing the last
uclamp active task (that is, when buckets.tasks reaches 0 for all
buckets) to maintain the last uclamp.max and prevent blocked util from
suddenly becoming visible.
However, there is an asymmetry in how the flag is set and cleared which
can lead to having the flag set whilst there are active tasks on the rq.
Specifically, the flag is cleared in the uclamp_rq_inc() path, which is
called at enqueue time, but set in uclamp_rq_dec_id() which is called
both when dequeueing a task _and_ in the update_uclamp_active() path. As
a result, when both uclamp_rq_{dec,inc}_id() are called from
update_uclamp_active(), the flag ends up being set but not cleared,
hence leaving the runqueue in a broken state.
Fix this by clearing the flag in update_uclamp_active() as well.
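The asymmetry can be reproduced with a toy state machine. The sketch below is an illustration under assumptions, not scheduler code: `struct rq_model` and its helpers are hypothetical, a single counter stands in for all buckets, and `idle_flag` stands in for UCLAMP_FLAG_IDLE.

```c
/* Toy model: names echo the commit message, but this is not kernel code. */
struct rq_model {
    int active_tasks;
    int idle_flag;      /* stands in for UCLAMP_FLAG_IDLE */
};

static void rq_dec_id(struct rq_model *rq)
{
    if (--rq->active_tasks == 0)
        rq->idle_flag = 1;          /* set when the last active task leaves */
}

static void rq_inc_id(struct rq_model *rq)
{
    rq->active_tasks++;             /* note: does not touch idle_flag */
}

/* Enqueue path: increments and clears the flag. */
static void rq_inc(struct rq_model *rq)
{
    rq_inc_id(rq);
    rq->idle_flag = 0;
}

/* Update path: dec followed by inc; 'fixed' selects the repaired behaviour. */
static void update_active(struct rq_model *rq, int fixed)
{
    rq_dec_id(rq);
    rq_inc_id(rq);
    if (fixed)
        rq->idle_flag = 0;
}
```

Running the unfixed update path on a runqueue with a single task leaves the flag set while one task is still active, which is the broken state the message describes.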
Fixes:
|
||
Thomas Gleixner
|
cc608af36e |
hrtimer: Ensure timerfd notification for HIGHRES=n
[ Upstream commit 8c3b5e6ec0fee18bc2ce38d1dfe913413205f908 ] If high resolution timers are disabled, the timerfd notification about a clock-was-set event does not happen for all cases which use clock_was_set_delayed() because that's a NOP for HIGHRES=n, which is wrong. Make clock_was_set_delayed() unconditionally available to fix that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.196661266@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Thomas Gleixner
|
a845787830 |
hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns()
[ Upstream commit 627ef5ae2df8eeccb20d5af0e4cfa4df9e61ed28 ] If __hrtimer_start_range_ns() is invoked with an already armed hrtimer then the timer has to be canceled first and then added back. If the timer is the first expiring timer then on removal the clockevent device is reprogrammed to the next expiring timer to avoid that the pending expiry fires needlessly. If the new expiry time ends up being the first expiry again then the clock event device has to be reprogrammed again. Avoid this by checking whether the timer is the first to expire and in that case, keep the timer on the current CPU and delay the reprogramming up to the point where the timer has been enqueued again. Reported-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135157.873137732@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Frederic Weisbecker
|
c322a963d5 |
posix-cpu-timers: Force next expiration recalc after itimer reset
[ Upstream commit 406dd42bd1ba0c01babf9cde169bb319e52f6147 ] When an itimer deactivates a previously armed expiration, it simply doesn't do anything. As a result the process wide cputime counter keeps running and the tick dependency stays set until it reaches the old ghost expiration value. This can be reproduced with the following snippet: void trigger_process_counter(void) { struct itimerval n = {}; n.it_value.tv_sec = 100; setitimer(ITIMER_VIRTUAL, &n, NULL); n.it_value.tv_sec = 0; setitimer(ITIMER_VIRTUAL, &n, NULL); } Fix this with resetting the relevant base expiration. This is similar to disarming a timer. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-4-frederic@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Sergey Senozhatsky
|
28996dbb8a |
rcu/tree: Handle VM stoppage in stall detection
[ Upstream commit ccfc9dd6914feaa9a81f10f9cce56eb0f7712264 ] The soft watchdog timer function checks if a virtual machine was suspended and hence what looks like a lockup in fact is a false positive. This is what kvm_check_and_clear_guest_paused() does: it tests guest PVCLOCK_GUEST_STOPPED (which is set by the host) and if it's set then we need to touch all watchdogs and bail out. Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED check works fine. There is, however, one more watchdog that runs from IRQ, so watchdog timer fn races with it, and that watchdog is not aware of PVCLOCK_GUEST_STOPPED - RCU stall detector. apic_timer_interrupt() smp_apic_timer_interrupt() hrtimer_interrupt() __hrtimer_run_queues() tick_sched_timer() tick_sched_handle() update_process_times() rcu_sched_clock_irq() This triggers RCU stalls on our devices during VM resume. If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU before watchdog_timer_fn()->kvm_check_and_clear_guest_paused() then there is nothing on this VCPU that touches watchdogs and RCU reads stale gp stall timestamp and new jiffies value, which makes it think that RCU has stalled. Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and don't report RCU stalls when we resume the VM. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Dietmar Eggemann
|
b7c560ae51 |
sched/deadline: Fix missing clock update in migrate_task_rq_dl()
[ Upstream commit b4da13aa28d4fd0071247b7b41c579ee8a86c81a ] A missing clock update is causing the following warning: rq->clock_update_flags < RQCF_ACT_SKIP WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453 sub_running_bw.isra.0+0x190/0x1a0 ... CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G W 5.14.0-rc1 #1 Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP: 1.06.20210526) 2021/05/26 ... Call trace: sub_running_bw.isra.0+0x190/0x1a0 migrate_task_rq_dl+0xf8/0x1e0 set_task_cpu+0xa8/0x1f0 try_to_wake_up+0x150/0x3d4 wake_up_q+0x64/0xc0 __up_write+0xd0/0x1c0 up_write+0x4c/0x2b0 cppc_set_perf+0x120/0x2d0 cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq] __cpufreq_driver_target+0x74/0x140 sugov_work+0x64/0x80 kthread_worker_fn+0xe0/0x230 kthread+0x138/0x140 ret_from_fork+0x10/0x18 The task causing this is the `cppc_fie` DL task introduced by commit 1eb5dde674f5 ("cpufreq: CPPC: Add support for frequency invariance"). With CONFIG_ACPI_CPPC_CPUFREQ_FIE=y and schedutil cpufreq governor on slow-switching system (like on this Ampere Altra WIWYNN Mt. Jade Arm Server): DL task `curr=sugov:112` lets `p=cppc_fie` migrate and since the latter is in `non_contending` state, migrate_task_rq_dl() calls sub_running_bw()->__sub_running_bw()->cpufreq_update_util()-> rq_clock()->assert_clock_updated() on p. Fix this by updating the clock for a non_contending task in migrate_task_rq_dl() before calling sub_running_bw(). Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20210804135925.3734605-1-dietmar.eggemann@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Quentin Perret
|
bba2b82d1b |
sched/deadline: Fix reset_on_fork reporting of DL tasks
[ Upstream commit f95091536f78971b269ec321b057b8d630b0ad8a ] It is possible for sched_getattr() to incorrectly report the state of the reset_on_fork flag when called on a deadline task. Indeed, if the flag was set on a deadline task using sched_setattr() with flags (SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_KEEP_PARAMS), then p->sched_reset_on_fork will be set, but __setscheduler() will bail out early, which means that the dl_se->flags will not get updated by __setscheduler_params()->__setparam_dl(). Consequently, if sched_getattr() is then called on the task, __getparam_dl() will override kattr.sched_flags with the now out-of-date copy in dl_se->flags and report the stale value to userspace. To fix this, make sure to only copy the flags that are relevant to sched_deadline to and from the dl_se->flags field. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210727101103.2729607-2-qperret@google.com Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Peter Zijlstra
|
a5e42516a6 |
locking/mutex: Fix HANDOFF condition
[ Upstream commit 048661a1f963e9517630f080687d48af79ed784c ] Yanfei reported that setting HANDOFF should not depend on recomputing @first, only on @first state. Which would then give: if (ww_ctx || !first) first = __mutex_waiter_is_first(lock, &waiter); if (first) __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF); But because 'ww_ctx || !first' is basically 'always' and the test for first is relatively cheap, omit that first branch entirely. Reported-by: Yanfei Xu <yanfei.xu@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com> Link: https://lore.kernel.org/r/20210630154114.896786297@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Andrey Zhizhikin
|
e4dba8435a |
This is the 5.4.145 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmE9pNwACgkQONu9yGCS aT5g2Q//ZEiIQBvw6uZEA3z1Y6tuKFyIxxwOu24EbCTvB1oXXQX/XXQQPWori1Ny OpjuQwXJr2LW+/wEKvUEj8mTrpFD+LsZmXRLBHCw9EqqD5RURDqUZt+xh+4xtV/S 2EgF0nCO9+84wo628Lc1C2LBJZZEo/kD7LnGeln+BXwRS1FQvGfD+5KIpOR2YzqI hrCtVfO5ZQpv59PrAQkfwnfITk9BM1cwA7LCD75WEN59405ZV3mzFyTdvz8s0iGt akxmwajLNGjQ/ro567tjpsWiK7EF26mNRTMZqu1jK6h/KjU9sQ4DzCqB+p5TPh/9 mj/Rzq1lSjLodsR0OznKBqFIVaqXyTU+0cMItjos9MBsG/4GOj8ixbXdFRG99WmK bNsYucotSrE9ApYwsmqYaNcHcGeLUIsYCDFCQp3++oeF59+FA7Pp7B4bI/zcYRwY aqbfTkMzo8/e4OF0B2LCx+8r0xol1SoLwBfcP3hb7rlKp9OkSYsKrJ/29CUuINe1 YC5HdrPf2HP36jlVCll5rQa+ERaxtNSCozgwxHG/x2yeOmiVqxdE+vUUmyRidah8 DvYklCM7upUDi1ujbOwbor9R1jQSXkWMFK76EBB3GJPgguFNyczFXm8xBzfRLQvw H6YjIfnxNt+DLPn5uXIEhU7ISTkUno9i1BEd2NoeT1UiYTlk2bw= =/lic -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmE+VUgACgkQ7G51OISz Hs0D+A//c+QInKUmZ5jcHXXqaiD0JYt5+arGOIBAgIjRcWDL3yJpuuP5nwZRnuRf N0SyJjZIpb/WyS27zEaHwOg7xKkhCqagix/J9qdNDP94W0xoGKdXsxotnixXXjMo tY9RUqwA45i412x6RA4sXHBmyhgPwSuoLnHk+qVF4xOBD2rshVXzv6zQAO51Lg4J 3/a2agXlrreCGtHmnKwn47pW+mkHEaquCmcXaz/6PNbWDHwZpmmtCRlsvMW1PyMe 7TZ/mQ3fofSX3bxuzzYWmddcGTNtwe/IsmEFZmhC3Vl4BGtqE2YKj3439zca1572 6ttLDy9K3pAg/rf0NhIe7nIlogmJyunP6U8CVnuAwH0esixWxyXSSXUKTOXhKYFG jgv/+adhqjSZQhm1p1HY8KItsJE8o5m87b2UKMe1UKGExAJ66qJsgZtuufbEDA5r S+qnQEzE4lfaYmCH6WBtn6a1fodONw/KmsJwk5UhNlv2BVS8LQpcpADVRmq/mCDj FpCe7uFJVaDnxpUc1Trv6D6YJpL6n1KIUQcVyExtmBtpuKK/KTLs+NP5hpUZ9xtV ftAEiZkplJP+qACdCeZt5QukiHBIfds32sUZ4PDp3ANbMmY6UpTWGQyZq+7PNL4d +OzCCDzlXRdZioua6z8oTaAkabo2pcK5ao7GXSor35mU6JZvMVg= =Z0Kz -----END PGP SIGNATURE----- Merge tag 'v5.4.145' into 5.4-2.3.x-imx This is the 5.4.145 stable release Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> |
||
Peter Zijlstra
|
56c77c1b52 |
kthread: Fix PF_KTHREAD vs to_kthread() race
commit 3a7956e25e1d7b3c148569e78895e1f3178122a9 upstream. The kthread_is_per_cpu() construct relies on only being called on PF_KTHREAD tasks (per the WARN in to_kthread). This gives rise to the following usage pattern: if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p)) However, as reported by syzkaller, this is broken. The scenario is: CPU0 CPU1 (running p) (p->flags & PF_KTHREAD) // true begin_new_exec() me->flags &= ~(PF_KTHREAD|...); kthread_is_per_cpu(p) to_kthread(p) WARN(!(p->flags & PF_KTHREAD) <-- *SPLAT* Introduce __to_kthread() that omits the WARN and is sure to check both values. Use this to remove the problematic pattern for kthread_is_per_cpu() and fix a number of other kthread_*() functions that have similar issues but are currently not used in ways that would expose the problem. Notably kthread_func() is only ever called on 'current', while kthread_probe_data() is only used for PF_WQ_WORKER, which implies the task is from kthread_create*(). Fixes: ac687e6e8c26 ("kthread: Extract KTHREAD_IS_PER_CPU") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Link: https://lkml.kernel.org/r/YH6WJc825C4P0FCK@hirez.programming.kicks-ass.net Signed-off-by: Patrick Schaaf <bof@bof.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
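The "check both values, never warn" accessor described above might look roughly like the following userspace model. It is a sketch under stated assumptions: `struct task_model`, its `data` field, `PF_KTHREAD_MODEL` and both function names are hypothetical, and a single-threaded model cannot reproduce the race itself. It only shows that the caller no longer needs a separate flag test before the lookup.

```c
#include <stddef.h>

#define PF_KTHREAD_MODEL 0x1u /* illustrative bit, not the kernel's value */

struct kthread_model {
    int is_per_cpu;
};

struct task_model {
    unsigned int flags;
    void *data; /* hypothetical pointer to the kthread bookkeeping */
};

/* Return the kthread bookkeeping only when the pointer is set AND the
 * flag is still set; otherwise return NULL instead of warning. */
static struct kthread_model *to_kthread_model(const struct task_model *p)
{
    void *kthread = p->data;

    if (kthread && !(p->flags & PF_KTHREAD_MODEL))
        kthread = NULL;
    return kthread;
}

/* Callers use the NULL result instead of testing the flag themselves. */
static int kthread_is_per_cpu_model(const struct task_model *p)
{
    struct kthread_model *k = to_kthread_model(p);

    return k ? k->is_per_cpu : 0;
}
```

Because the flag test lives inside the accessor, a task whose flag was cleared between the caller's check and the lookup simply yields NULL.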
||
Andrey Zhizhikin
|
79c30f58eb |
This is the 5.4.144 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEx2AAACgkQONu9yGCS aT7csg//ZhXXfRkPNMhpkkMjcV7F825mLAPs1vsluIEIZ0oInOpegu8SyDENOfui HyFLZ/2Stewa0mn7kNS1caAUXLpFvZ087sIz/SipzupFjLTUHFsNcMYrd19R1M4h UK/owAJeoq/pgR4kUck4o/r+47lo8CMqkscbEdKSvwxYUeANIcbGVB5Sf2UaJr5S lqBZeliWY/jYGvLWBoSc7mvUwWRbkKLnQu2JkfvGKM4ODOzpbh8TUhq8NxEL7ZFn mZxtNmWPvG2PHHvNP89pwKnKQx70ySKrlQdDv10gL6nIHhKuqwLxBo28Q+KcKMYr vfoOFS5Vk35jA7Xt8LhNF+lQtDTbN+2YLeDtoAq+aWMmEW/RUYXSU/3thh+WFuO5 uZZAbrh4r3bew+PLFpEtnVjxkpMsU9EC33KuIZXIGlDEkFlEneJ9pMQYH7XIwQnV 5sSSOnbyzkajxv9Kpu6XEg3kKyJf+gk/AB/psgfMR0v/jQ4PXVk9+cZDZxKFcxjj wGywDkgIb+/sPrABWici/yXjIup0OSG1fK9/Ki9uLgNzxXZ0h+4e3DcXNMxs1B/p GpBPP773qIff2lEDhAI+SbP8pHj5Mnc1j77WUQTU9vsIJcftYm4i0G+POpXnynzx gzbjJjOhTBL57OciLQlmL2s5ZZUPgPvu5VoHsRfwOu/bbarRADE= =RA6W -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmEx8s8ACgkQ7G51OISz Hs1JZA/8Cj7g56QymgMuXHEB1PecU7pLpO5egRK3X6xHxJwksD7Xp2LfpaRxjzGw XNQsp+4mbJX4oHiZPjD/RsFOdVuNU3ff3mliSmoH2Tdepa2TuKFt7T8V3GE7FN6K ns52rvIzbhF762nL1Vs+LE0YBq1w6rTvL7eenNxMo9pwUxJv95X91v7BpRQjTAY5 /ngvj8tRKN10dSokwrCpzk47Sj/jhSoLlckJL7+iOopQdhOo/HTfWj1aPCaZC/AX q2EUg/L2GB1Ij342lDNEZSWn2xAvuAT6+45R8p3GxyG6TMihwiKGXQM922MJDZAV T3Chxgu//OlB/spPMAuFgfBNqaX1z+zxv3Dc1EvEbSNPhn6PwEZ2ck9hYkuPmvI3 78dkyqj3x3AR5VKvc/CpnqSokXBjV7B1TOxJlHKvJ77lvWuDwujir+chmULjahA8 bVPpbBC9BfF/nX0cYsjQuDNyddqTpt3cv1Cp9w5gXhs/Nj5MsNDRyZxaVHlGaI/W h3N6rAU2cNDDtI4Zqr8Lo5IgBLMVUPuj9ZUNUJBKq3YX5CooEmjCtBKZch55Ou5h 6xmcaMgrFre3FHKfvVhJ5ACK/DoPuWLvr4Af4Q0v6kgif81Is4LQQ+EEXRLMryi2 fTv2X5r2GLoMHlUH0WgBW8pY0NuoCJiZuxCY5T1c61pCbDCpRss= =yagG -----END PGP SIGNATURE----- Merge tag 'v5.4.144' into 5.4-2.3.x-imx This is the 5.4.144 stable release Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> |
||
Richard Guy Briggs
|
0634c0f919 |
audit: move put_tree() to avoid trim_trees refcount underflow and UAF
commit 67d69e9d1a6c889d98951c1d74b19332ce0565af upstream. AUDIT_TRIM is expected to be idempotent, but multiple executions resulted in a refcount underflow and use-after-free. git bisect fingered commit fb041bb7c0a9 ("locking/refcount: Consolidate implementations of refcount_t") but this patch with its more thorough checking that wasn't in the x86 assembly code merely exposed a previously existing tree refcount imbalance in the case of tree trimming code that was refactored with prune_one() to remove a tree introduced in commit |
||
Andrii Nakryiko
|
38adbf21f3 |
bpf: Fix cast to pointer from integer of different size warning
commit 2dedd7d2165565bafa89718eaadfc5d1a7865f66 upstream. Fix "warning: cast to pointer from integer of different size" when casting u64 addr to void *. Fixes: a23740ec43ba ("bpf: Track contents of read-only maps as scalars") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191011172053.2980619-1-andriin@fb.com Cc: Rafael David Tinoco <rafaeldtinoco@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
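The warning fixed above appears when a 64-bit integer is cast directly to a pointer on a target whose pointers are narrower. A minimal sketch of the usual remedy follows; the helper name `u64_to_ptr` is hypothetical, and the intermediate type `uintptr_t` is this sketch's choice (the kernel patch may use a different pointer-sized integer type).

```c
#include <stdint.h>

/* Convert a 64-bit integer address to a pointer by first converting it
 * to the pointer-sized integer type. This avoids the "cast to pointer
 * from integer of different size" warning on 32-bit targets. */
static void *u64_to_ptr(uint64_t addr)
{
    return (void *)(uintptr_t)addr;
}
```

On 64-bit targets the intermediate cast is a no-op; on 32-bit targets it makes the truncation explicit.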
||
Andrii Nakryiko
|
812ee47ad7 |
bpf: Track contents of read-only maps as scalars
commit a23740ec43ba022dbfd139d0fe3eff193216272b upstream. Maps that are read-only both from BPF program side and user space side have their contents constant, so verifier can track referenced values precisely and use that knowledge for dead code elimination, branch pruning, etc. This patch teaches BPF verifier how to do this. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191009201458.2679171-2-andriin@fb.com Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Andrey Zhizhikin
|
49ef8ef4cd |
Linux 5.4.143
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmEnj2AACgkQ3qZv95d3 LNxNEQ//auFOSmgsMtI8LDmKlP/f22+FmICk8+IHeBMRBMDY0WGEEdsRZgcf4R7M hgyBn8ISmU5W0idpoxzVTiNxDJ0YVbVSIX12lZO6OHnwcv6hNW7iOW5TaGjd8EN+ fkh8MtAToBQrp4fFb1QkC11pYNMPiuvDNB2nW+F3ixfYLyC1EF4g2/qVUKy7s6rZ dbqDfuI3Q7R2opsIkpmsV7ClKGbJzsP7oo0H5EOQMpmOowhg3oJy8oYqMMTgij1T bJU8kujNElsK+/nbpVzJPrpprQH9eGP+hB5ZAv6s/FuJ6RmkoAczYQnX3HL6TfCS ymoyJ01gsmDic9RnG6qei5LkCwf5Td2SKjRZdqGWKTluWD1ZAwzUX8Ww6K+t5uWk PQPyCfU2wk2D3JjJWt0vTxl/GZGAkYbZpy5ISZFJhK7/j9/oTSrPWra7/BRu4K2I 2PK7XGjNyQxSguQqmG064Q0nYEOU03pR2H8tyG3iH0nBBd9p54D0Bg0D73I2h0az PoGhBo71m9SYCPP1zSXl+xLFyWGZZDUYCaU9KPlwkYCCcRUSQbfCKwrYHfEcHZgL 4QtYlpUi+/C0Ga7gAK9ierqCKSNTOpoVna618j97uqCYVIU8estLBqX4mMAQquVF R8+cy6L/aTBVw4Zwd0Jmt85GwBHlHahUGEq87+Qpw/laqjkBFcg= =SIVq -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmEorqsACgkQ7G51OISz Hs3MNQ/9GOK/4DrXYH4RTeGJG2+tTk8VMOrmoDxQOkGXggLvpw06gP1Q9Dz8qzee hkILNVr4xC1qxzBzCIAOwzSM0DOQeGJrjtlDPOHxFB6akmwfZ9mU7W4k3YaBHz2c +pp4YM0YalmVoJSDBVyFrN67q8gorK39yPgi3BC2tUAz9OSJfPsmdKwqe8ICe7Lp hq7R8VfeZdcmYW2vRF5v2yOlzg9vlEd2JfXVL+LJ3R9Eo2Ytlam3gaeObJJqtjDw MObvvSLeG2QjZ38tvrWjudfR1Z0hDFy/E1E1AI4y9STLHHQj0eM+3dzEP1mugVVk bvLeas2Raf8IA0tJfNQIgz54DcrCT7vHGblKkqESg6kJ/fVc+7CxaLf72DdU0KrN 0IzRHu3TX8LCj1BQax6eDWlysa3k837WkRMLK55NKPhqkDCb3IhpI3YB+rVgQUoa DpjmZ+sTDPypasZf2rGTdGosZBccsRdNbohmOuR202rKOeMnndGJRQrAwdYym/WO C9pKosNwBCFk4I97chihvamN/vH0MEJkICwR858I6+uCAMPISNqYcSbRYC3YXoyy VU0JBrp4hM5IlKvxKoRuT/Pplt54xhoWREXOS1ua+BOhcWox85QYt+d7uLxt02iZ 4u9IEXwy0QM+qxZaWw+9+IP4DfOgVa3BAsrVzRfAoHrbwMte7xY= =W2Mz -----END PGP SIGNATURE----- Merge tag 'v5.4.143' into 5.4-2.3.x-imx Linux 5.4.143 Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> |
||
Steven Rostedt (VMware)
|
20c2f141b1 |
tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name
[ Upstream commit 5acce0bff2a0420ce87d4591daeb867f47d552c2 ]
The following commands:
# echo 'read_max u64 size;' > synthetic_events
# echo 'hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)' > events/syscalls/sys_enter_read/trigger
Causes:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 4 PID: 1763 Comm: bash Not tainted 5.14.0-rc2-test+ #155
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
RIP: 0010:strcmp+0xc/0x20
Code: 75 f7 31 c0 0f b6 0c 06 88 0c 02 48 83 c0 01 84 c9 75 f1 4c 89 c0
c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07
3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
RSP: 0018:ffffb5fdc0963ca8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffffb3a4e040 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9714c0d0b640 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000022986b7cde R09: ffffffffb3a4dff8
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9714c50603c8
R13: 0000000000000000 R14: ffff97143fdf9e48 R15: ffff9714c01a2210
FS: 00007f1fa6785740(0000) GS:ffff9714da400000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000002d863004 CR4: 00000000001706e0
Call Trace:
__find_event_file+0x4e/0x80
action_create+0x6b7/0xeb0
? kstrdup+0x44/0x60
event_hist_trigger_func+0x1a07/0x2130
trigger_process_regex+0xbd/0x110
event_trigger_write+0x71/0xd0
vfs_write+0xe9/0x310
ksys_write+0x68/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f1fa6879e87
The problem was the "trace(read_max,count)" where the "count" should be
"$count" as "onmax()" only handles variables (although it really should be
able to figure out that "count" is a field of sys_enter_read). But there's
a path that does not find the variable and ends up passing a NULL for the
event, which ends up getting passed to "strcmp()".
Add a check for NULL to return an error on the command with:
# cat error_log
hist:syscalls:sys_enter_read: error: Couldn't create or find variable
Command: hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)
^
Link: https://lkml.kernel.org/r/20210808003011.4037f8d0@oasis.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes:
|
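The class of bug described above (a NULL name reaching `strcmp()`) can be illustrated with a small lookup helper. This is only a sketch: `struct event_model` and `find_event_model` are hypothetical names, and where exactly the kernel patch places its check is not shown here.

```c
#include <stddef.h>
#include <string.h>

struct event_model {
    const char *name;
};

/* Look up an event by name. A NULL name is rejected up front so that it
 * never reaches strcmp(), which must not be called with NULL. */
static const struct event_model *find_event_model(const struct event_model *tbl,
                                                  size_t n, const char *name)
{
    if (!name)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].name, name) == 0)
            return &tbl[i];
    }
    return NULL;
}
```

The caller can then turn the NULL result into an error message instead of crashing.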
||
Ilya Leoshkevich
|
1fe038030c |
bpf: Clear zext_dst of dead insns
[ Upstream commit 45c709f8c71b525b51988e782febe84ce933e7e0 ] "access skb fields ok" verifier test fails on s390 with the "verifier bug. zext_dst is set, but no reg is defined" message. The first insns of the test prog are ... 0: 61 01 00 00 00 00 00 00 ldxw %r0,[%r1+0] 8: 35 00 00 01 00 00 00 00 jge %r0,0,1 10: 61 01 00 08 00 00 00 00 ldxw %r0,[%r1+8] ... and the 3rd one is dead (this does not look intentional to me, but this is a separate topic). sanitize_dead_code() converts dead insns into "ja -1", but keeps zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such an insn, it sees this discrepancy and bails. This problem can be seen only with JITs whose bpf_jit_needs_zext() returns true. Fix by clearing dead insns' zext_dst. The commits that contributed to this problem are: 1. |
||
Andrey Zhizhikin
|
eb3365561f |
This is the 5.4.142 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEcr3MACgkQONu9yGCS aT6eag//f6COc7PQMKCJU7hcw0Xe4pIPmUUj+EpkwztzfX45dCzWxbhHxRiOqtKa ReSUXZ8mJLzYJgyHRr6FfsUqENWzKGqHby15yZ2h0rEyJns/V054NiBjz1aWoQZ4 axpF1SYaLfLglfobslLc/3+JbyTfxBK/+m6XnZRJXqMMFnJ+hljJJxXuCryEQnU6 KtAlrS2ITpbEyAECAE02oErxGDGCnTDzGpQvlSeJWqJVlisrsGIvGowjFliy6ONf YDjsejKlNUlQwnplXErefuXf7uhT/36sN0DnxCy5yXJ8SJwnzja3eYDz1yG9apG0 ZR7KM3dN3L8viuRx2GEOubh8EMbirErD9DrhaPyaNhEPKHI2cHxHdG2prj5WkBzZ OjXcW32FDWzw6/kfnHEOBl0OrmhsEIY1/pP8jegape8lDrj/szN0ViJe0rzElba1 6pb0D/ASFPYtYwR1O2/qZiPqqzHQEAFfDyDMKEKzogbNAHUbfvaE6g3qYafwQgS6 o+g/BBxtrGNaIWtMtQ75aeoqFA4mkE9MrLJ1SzEFpw/PvHCHtFItCyEcUwaNvEz8 OdwceDSIkT4Bn0GzEwuxKxcFyZ1R3rIABPIUGbid8Q3w6ZgM/vr2BPR3vkUY0zl+ g9DPae9S4K8A+kYGwyeYzZ0dPC6otb8h01RtiGJyQgyiPAeTutk= =QJcB -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmEc1fYACgkQ7G51OISz Hs0LrQ//cvoxuCT2kOcr+wqobXo3v9d/xEypsLwVaKrg0k8kjLGrqDUDqo7yFm4o WydvPqL4xd0m7H3+PMSojwydq5o4Jx7x3+pGOt19xrdunGSj8R2eKD/4sboTWM3M w1p1cjladgX2rcbdpuh7wVVB6VFG49QVZJpum/nSvfkt4n9rhza4rT9/2wVcfJ1m QwI1F66lbGm8zQ8bAHJyflZz6QbJWbAlbEZwKM7SWU+hR5LW9wyOXdFgW6p5r2LM 9FCnLA2K2wVuXUZcUKXXBDpctIeoUS8dc3MaXFKRMgjdCwMvwWp5GYUuGQIkSsRB xE25Sq1R22Jx2mnq4V9EVAUEN7KCVHQKVprrlDp1aPDP/xndlXDMXHf5oLW/12O7 O1p2XehdxcTA/KZEETTLdiMx23Gku0NjzjKuZGTc0Op+iwF7VjHVvSOgrw+TtTMn 05wwH472TFwnpXo9HzT1ugh8RnmeF2qD3fNIzwaQIAgDSuMG8Xdu8V34UiMK3Gpc /kXoAzPQgpAlzyAd2cP4IuPYs8PRNFskeZmH6rbrhwYGCZ6IhN/dJnNAD6LWKn6d 99UxBoIPKVgi191/Gx+L6J1+Bi8Ulov8qMNVBp8Bv8RDLdGhozIxu7GzNxvklEVZ s4mvZ7mVgtsXGxgbQWnRkKVkJMkQ3P9j2B0b0Cjsmz+1OOw/9xw= =be34 -----END PGP SIGNATURE----- Merge tag 'v5.4.142' into 5.4-2.3.x-imx This is the 5.4.142 stable release Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> |
||
Ben Dai
|
0c8dea3fd5 |
genirq/timings: Prevent potential array overflow in __irq_timings_store()
commit b9cc7d8a4656a6e815852c27ab50365009cb69c1 upstream.
When the interrupt interval is greater than 2 ^ PREDICTION_BUFFER_SIZE *
PREDICTION_FACTOR us and less than 1s, the calculated index will be greater
than the length of irqs->ema_time[]. Check the calculated index before
using it to prevent array overflow.
Fixes:
|
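The bounds check described above can be sketched in userspace as follows. The constants `BUF_SIZE` and `FACTOR`, the struct, and the function names are assumptions made for illustration. The sketch also stores the raw interval rather than an exponential moving average, and resetting `count` on rejection is an assumed reaction, not something the commit text states.

```c
#include <stdint.h>

/* Assumed constants for this sketch. */
#define BUF_SIZE 16
#define FACTOR   4

struct irq_stats_model {
    uint64_t ema_time[BUF_SIZE];
    int count;
};

/* floor(log2(v)) for v > 0. */
static int ilog2_model(uint64_t v)
{
    int r = -1;

    while (v) {
        v >>= 1;
        r++;
    }
    return r;
}

/* Returns 1 if the interval was stored, 0 if its index was out of range. */
static int store_interval_model(struct irq_stats_model *s, uint64_t interval_us)
{
    uint64_t scaled = interval_us / FACTOR;
    int index = scaled ? ilog2_model(scaled) : 0;

    /* Check the computed index before using it to address the array. */
    if (index > BUF_SIZE - 1) {
        s->count = 0; /* assumed reaction: drop the accumulated state */
        return 0;
    }

    s->ema_time[index] = interval_us;
    s->count++;
    return 1;
}
```

With these assumed constants, an interval of 2^16 * 4 = 262144 us is the first one whose index falls outside the array.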
||
Bixuan Cui
|
4dfe809271 |
genirq/msi: Ensure deactivation on teardown
commit dbbc93576e03fbe24b365fab0e901eb442237a8a upstream.
msi_domain_alloc_irqs() invokes irq_domain_activate_irq(), but
msi_domain_free_irqs() does not enforce deactivation before tearing down
the interrupts.
This happens when PCI/MSI interrupts are set up and never used before being
torn down again, e.g. in error handling paths. The only place which cleans
that up is the error handling path in msi_domain_alloc_irqs().
Move the cleanup from msi_domain_alloc_irqs() into msi_domain_free_irqs()
to cure that.
Fixes:
|
||
Thomas Gleixner
|
eda32c2188 |
genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
commit 826da771291fc25a428e871f9e7fb465e390f852 upstream.
X86 IO/APIC and MSI interrupts (when used without interrupts remapping)
require that the affinity setup on startup is done before the interrupt is
enabled for the first time as the non-remapped operation mode cannot safely
migrate enabled interrupts from arbitrary contexts. Provide a new irq chip
flag which allows affected hardware to request this.
This has to be opt-in because there have been reports in the past that some
interrupt chips cannot handle affinity setting before startup.
Fixes:
|
||
Andrey Zhizhikin
|
915e71b823 |
This is the 5.4.141 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEY9Z4ACgkQONu9yGCS aT6Zcw/9Hw/LCgboD9SnkaHWx1vWwXilz4uqTp6+qfCUZmtjWnvORA/wUzAqhcwu 1pO3aveUVLS4FyGQbn4vdMs5Zv7ZkDHchJ+FnloLAPnIUI8xce1YxKkwT8ysvvOF jFf2vPOr7PqCxTlGZ3MitJPhl4mkVghlwq6rk/EwFuqG8JRUiwL2jeHVX24VywJI E+XKOMJwXHdB+VWmmP5yvqDcPcQwYVAB01BBMWYEHFa4WTrRwICsQz6Y1319YAFT Sb0m2g/Pwv3YEzdgtCCTrB8FOr2Bjum/KoAZd+v9tQuFkZSwp14XhAlCWlYBSHUE AWoVc9w6KYNt4cCdem7P8oZWqAnkR9pLTwlg1XWZCQCTzqIp8YfsjTMONyKf/94O fHVTHTh7x0f2taGhAmzo52dIRBUBA8bGQ8F/t1XUc51RWn9CEBc230ich8BQtH1t hm3X9eB1CM9FRg4imIf9RW5X8+gkTSYNfrnpBh7IEKXIPiiKpDLyO3MN4IuFLBzN fRPhQacTxCq8QgO6eiS8M5cafglLL3U7GsE/qyiVQjVeoGNnLZINamdfmDLsgfGH f67pTwVNKNcoT1XJf1MqgDM2IoAcZcQqLtNhedva9cC74SiHv+nf28eDWwJJ5u10 fdNJowrZdF828BCuee6UYUsOXJ6r8I6UybLuzpZzpvBvvMiYZH0= =edfG -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmEaTrMACgkQ7G51OISz Hs0cJw//SsJNcVX8LJxpcWj0NS4+HNz7tNZGJZ1MP4C1LjoMNolqJrHQccs9sbEW bfDSI1LnWd49Haoua+KMn5hysHzdxa9w/dhAkbNUopt1MFP7WKzpWjp93eOpydfc 4smXDOByGyklyXeXf7xWwJo4BVXBC+OutMhXFMrTzagIIsi0K7p9ToedSVEDOIgn Sa22tOiDsolryAqSFHmn3mTc0ju1SPS2+wT1oxFuDzOWQJGHPXeaFl1sgIU8F7q5 HixGaxMU9ofYI9L7pr4q8Gka3dSRfDpWGxG0sp9rjgI/AHX2sCNQbd7ueRPLJaEn A2AGDBVNVUK+YuTnBoHoHTK6GSR2gV656AK/EsSTPedant6id6I3PrffANWfDv0Y TtdxC68r2DQisUiOqFn7gCt0td456P+pASfJNuYy6yISA1Wsy1zjBULjc+YPqfVd IOIJV9PtE3VqRfdRsZtXA5G7QBzlTGtDs9EKPg3EzSgAn50aDt3cYrIx7XwoNe/7 TkMQED/PR3Msr5qU60abkeqsMMaM37cV8ZFerOB4we/6MmD6aPbThzLza7JEvvjg IP4GlgcxzmBH6aFLJlGZa2Vw6ge4NzUgiRRjrNz3lZN+6R1zdZKj5ECEVtOrLpdV L1XLfT58wsKi3CFWfGWu4I3B6FONwaolg1D7tALH2fmK6D6q+TU= =dQtZ -----END PGP SIGNATURE----- Merge tag 'v5.4.141' into 5.4-2.3.x-imx This is the 5.4.141 stable release Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> |
||
Andrey Zhizhikin
|
49dc55b9cb |
This is the 5.4.140 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEVBCYACgkQONu9yGCS aT53WxAAqljdZCHORMxU9rnAHSGNHMtGH3UA7TXDU3SKOYSDRW4FOxI3XUJzJLeW jWB/ZXRSeNmSpwFVmUNYhMkHP3VTXDp73xx2y8DI8U20ykiTeyO6Ed+zW8GluWBP uvvdtjV511wspCUiGKOnD88z9FKvfb5OQKxRb03XrwxQqo3JvWSB5QZhWaBP0UnW j6YWAQm/luvsjx0V4sW36mDj3FWihtlyFyh4Psa7yOdlu6whgLZdGMeSCqsGAcGx 6SdshcXrMpJqU9op70a2WHbo8YYaEyLZ4bOK5FmXPfKokh7HmqHEXi7HuW2UcDmr hi3bR455LqQchw3a7OtiGaEF4liUnJw+EIQx1kaA330EvjlIUwayxdyTitZ/z+5c x9i3NS6bLFUL0FPl79tM5oyd7cR4ZSyrqIAVmE8Z+npCuk3XcKWgxfTvuPemgoBk 89Lbpe+C/zWBkStZFmK8OHAv9iBhP/jR2TmRtRhgHJQkV5qCiXCHejb3g8jur99F q4a9AmvN2ignkejh0darNXk2VdfTBfWIVrXjhcncsHSHGcV4xbc1uDyqQad0aug5 iRtmvkmYG0SruHFi3mF9KhKP1IjD0vI2uah6GeX0FLb8zQIuddNpkXSZMS/MZV0c pZicz6qB4JYT3AiiFEmfDtt1FGMwf1weZBmrfHE1OH1FWiZYC/w= =5ku+ -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmEaTq4ACgkQ7G51OISz Hs2sKg//UmCY1/aMvA+3Tq2VmyyYN9Rp0NZdocQWTpw3yMEIla7JpxSqWWQi9/6U cawfBRYwoY1OnpQL/heyAptuV7/kZdaJMEpFd//DvdnDabnaxKMnTnRkyh+VdIw0 vC9Bk/oHDK+ZTcNhbBqZVscmOJ3ox20t/ST/u4SeAq8dYew78AfAV4D1GjfN48Id 18qDzCg+TX9CXxXUGTyX4V9G+MnBnfjeUcb1U2bsHqQ8uUCLtFVm5zc42u6GrD3x VDnh2WTnnhryc/fefitjUILVKvYRfDVTagERRKB9VldlXBVz0LxXcnmGfRMkeFR9 zuL/9j4lOtCWaSoqpkXUpvpYgW35TJN+4EVeO8sUCqztzCyNAW9M4Qrf0OvC5aTE pi/v8b6BzuqJczPMBggk2SdetCqYvgJbeMS2nBZsgkAZk1zplUOEosSHWtToGFxo g2rPnHlxhBabTuQAXSQeV5wHs7h+cUhd7TSpWWpcEGRLP4qwXEfgw0ktLkGLxg1q 9xQc/utISWlbv1bjqNPjbc8Vi3nX20PqWTVc+o3QFRGRU/9xqCYoKJ5wdZWe/8zR mRw55Rz460m8W28IFHiGFDpNB236wAcqgisiaEsHYGkpS1WYvaIdKXDwckCXFE2C 6xbaMkfWU0z20MfbBxuhlv+Pipv+jrD3qtDQb60y57GP6BiRbOM= =3HAp -----END PGP SIGNATURE----- Merge tag 'v5.4.140' into 5.4-2.3.x-imx This is the 5.4.140 stable release Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> |
||
Andrey Zhizhikin
|
bbdb668ff5 |
This is the 5.4.139 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEPgf0ACgkQONu9yGCS aT6RQA//cKA0KmfDEwpNktHuGWMnhbjuf+WSsjqqoKRYCCdbBhc/HMTL05Xjbvpg VCrYchavp8lwvSd8d0cFMA4jcE1zjut+JzG08W1aIV0DJDflbCLlM8jzl/3Ft6c8 CTWHRNEyBUw1ynaUVV/L+Vlox9GTk4SYY92pXX6Ciar0sJHLeXDw9VK/NUQG51d7 ctfvro0D8JM0+HHG/CZM+wkmpMW5nUNCnBubsb3fp5Tpi2rMCyxVVyj+NwT+mYO5 jCOl/DTMJBLFBqG53cwP/sEqTvLrqhCF3ZRPBi5hmLm5+NfvWz3Orlalfn0nFU0n n+7fKUH/LghuduXnxSMwAtbZUhP6rGqDwOnMJtqEiGJQloNC1f/ER1VNFvOG/bm0 +SQBB6iR56Z+cnqKpyz41JdOsUk4Y2dDRA5bh1h5bw4ctfXDBgQ/OqXWHIboLlQg 7BNlq1tQoUSu8IHhJtZJLtpdSLs6jtZ4nPtAeMjLDElYJIKtzCKhSkGnyWA8V5i/ V07zDlYBFryyvBJcJEgNHLaZt7wh0MEDYinOlnxOzapG8JYabItmioFABGzOCXu4 2QXCWEuIdMk+J79yQIGGUNSRKWwTyPoxBRbkbAHU0hXHI6R9V6V0/3Rp8hlcoPZd MSU77GD306j/+ekM04gNZrI0ploywEbqxcDoM2XBSXcZTrFxtdg= =FS2D -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmEaTqUACgkQ7G51OISz Hs3zOA//REX8dd54cRbpgOAS+yYDcRuy+SfUTpKb+BFFWBEF599fezX7NKV+0ubB mvp3//bpd35NsmSV5rtD0+CcD4NYbLu2fO5zG+50y0HcxKRrgveifquitemjGKhg nMITs9F5ZvDCteMciW6zb6xZNfM6ehEpIGWtMuACgOm89AubGfrk9ZrCd4Wk/Uaw 6pvPpDjzElOfJ8un7F1vwmcbbY/ApyvDdzYnZf6fKzNNfx4dkDt2uEjlLUlw0Wee Xahd610tfG8YYTCecbEvqtgy9RSy3TI19sMKM86GD3IJCo+LVmJ8g475A8FkglCF 8xhogK8Px2LqxwVOX4wtm/o7hP6wtzwjScapAVN6TPx2t++Ab7WzpMJEpiy4gGFh u7BVoS9SbVjQU4tlonXEncGm1Bj3qw0UmdW3H9VkCnlIQUYMR88b3KeZhR4cZUDv SPhvY5JEn94F80B6Bbm926eOBeRAwtqRezW5er3kzCZA3m0RlOMicwf2UsLlQd2a cifoD3af5d3KJZzMVUX3uO8G4ArT7qJBtS4CeKA1U7TUbrPT9DecGILKdV61/E2L +Fg05QYe+Xyh0cI/K6nsdrnLkVCFq7uAT/6TyW6RNl4SJHf0Wli9qsdkDJIN6QQw q2hy8GOJvDk3sFr7W3C51uFnuBHU0uYC2cQp5M3e4zG1UCfRADg= =Aw5O -----END PGP SIGNATURE----- Merge tag 'v5.4.139' into 5.4-2.3.x-imx This is the 5.4.139 stable release Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> |
||
Masami Hiramatsu
|
396f29ea0c |
tracing: Reject string operand in the histogram expression
commit a9d10ca4986571bffc19778742d508cc8dd13e02 upstream.
Since a string cannot be the target of an addition or subtraction
operation, it must be rejected. Without this fix, the string type is
silently converted to digits.
Link: https://lkml.kernel.org/r/162742654278.290973.1523000673366456634.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes:
|
||
Thomas Gleixner
|
42ac2c6348 |
timers: Move clearing of base::timer_running under base:: Lock
commit bb7262b295472eb6858b5c49893954794027cd84 upstream.
syzbot reported KCSAN data races vs. timer_base::timer_running being set to
NULL without holding base::lock in expire_timers().
This looks innocent and most reads are clearly not problematic, but
Frederic identified an issue which is:
int data = 0;
void timer_func(struct timer_list *t)
{
data = 1;
}
CPU 0                                             CPU 1
------------------------------                    --------------------------
base = lock_timer_base(timer, &flags);            raw_spin_unlock(&base->lock);
if (base->running_timer != timer)                 call_timer_fn(timer, fn, baseclk);
  ret = detach_if_pending(timer, base, true);     base->running_timer = NULL;
raw_spin_unlock_irqrestore(&base->lock, flags);   raw_spin_lock(&base->lock);
x = data;
If the timer has previously executed on CPU 1, then CPU 0 can observe
base->running_timer == NULL and return, assuming the timer has completed.
Without the lock, however, it is not guaranteed on all architectures that
the callback's store to data is visible to CPU 0 at that point. The comment
for del_timer_sync() makes that guarantee. Moving the assignment under
base->lock prevents this.
For non-RT kernel it's performance wise completely irrelevant whether the
store happens before or after taking the lock. For an RT kernel moving the
store under the lock requires an extra unlock/lock pair in the case that
there is a waiter for the timer, but that's not the end of the world.
Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com
Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com
Fixes:
|
||
Steven Rostedt (VMware)
|
7da261e6bb |
tracing / histogram: Give calculation hist_fields a size
commit 2c05caa7ba8803209769b9e4fe02c38d77ae88d0 upstream.
When working on my user space applications, I found a bug in the synthetic
event code where the automated synthetic event field was not matching the
event field calculation it was attached to. Looking deeper into it, it was
because the calculation hist_field was not given a size.
The synthetic event fields are matched to their hist_fields either by
having the field have an identical string type, or if that does not match,
then the size and signed values are used to match the fields.
The problem arose when I tried to match a calculation where the fields
were "unsigned int". My tool created a synthetic event of type "u32". But
it failed to match. The string was:
diff=field1-field2:onmatch(event).trace(synth,$diff)
Adding debugging into the kernel, I found that the size of "diff" was 0.
And since it was given "unsigned int" as a type, the histogram fallback
code used size and signed. The signed matched, but the size of u32 (4) did
not match zero, and the event failed to be created.
This can be worse if the field you want to match is not one of the
acceptable fields for a synthetic event. As event fields can have any type
that is supported in Linux, this can cause an issue. For example, if a
type is an enum. Then there's no way to use that with any calculations.
Have the calculation field simply take on the size of what it is
calculating.
Link: https://lkml.kernel.org/r/20210730171951.59c7743f@oasis.local.home
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes:
|
||
Daniel Borkmann
|
fd568de580 |
bpf: Fix leakage under speculation on mispredicted branches
commit 9183671af6dbf60a1219371d4ed73e23f43b49db upstream
The verifier only enumerates valid control-flow paths and skips paths that
are unreachable in the non-speculative domain. And so it can miss issues
under speculative execution on mispredicted branches.
For example, a type confusion has been demonstrated with the following
crafted program:
// r0 = pointer to a map array entry
// r6 = pointer to readable stack slot
// r9 = scalar controlled by attacker
1: r0 = *(u64 *)(r0) // cache miss
2: if r0 != 0x0 goto line 4
3: r6 = r9
4: if r0 != 0x1 goto line 6
5: r9 = *(u8 *)(r6)
6: // leak r9
Since line 3 runs iff r0 == 0 and line 5 runs iff r0 == 1, the verifier
concludes that the pointer dereference on line 5 is safe. But: if the
attacker trains both the branches to fall-through, such that the following
is speculatively executed ...
r6 = r9
r9 = *(u8 *)(r6)
// leak r9
... then the program will dereference an attacker-controlled value and could
leak its content under speculative execution via side-channel. This requires
mistraining the branch predictor, which can be rather tricky, because the
branches are mutually exclusive. However such training can be done at
congruent addresses in user space using different branches that are not
mutually exclusive. That is, by training branches in user space ...
A: if r0 != 0x0 goto line C
B: ...
C: if r0 != 0x0 goto line D
D: ...
... such that addresses A and C collide to the same CPU branch prediction
entries in the PHT (pattern history table) as those of the BPF program's
lines 2 and 4, respectively. A non-privileged attacker could simply brute
force such collisions in the PHT until observing the attack succeeding.
Alternative methods to mistrain the branch predictor are also possible that
avoid brute forcing the collisions in the PHT. A reliable attack has been
demonstrated, for example, using the following crafted program:
// r0 = pointer to a [control] map array entry
// r7 = *(u64 *)(r0 + 0), training/attack phase
// r8 = *(u64 *)(r0 + 8), oob address
// [...]
// r0 = pointer to a [data] map array entry
1: if r7 == 0x3 goto line 3
2: r8 = r0
// crafted sequence of conditional jumps to separate the conditional
// branch in line 193 from the current execution flow
3: if r0 != 0x0 goto line 5
4: if r0 == 0x0 goto exit
5: if r0 != 0x0 goto line 7
6: if r0 == 0x0 goto exit
[...]
187: if r0 != 0x0 goto line 189
188: if r0 == 0x0 goto exit
// load any slowly-loaded value (due to cache miss in phase 3) ...
189: r3 = *(u64 *)(r0 + 0x1200)
// ... and turn it into known zero for verifier, while preserving slowly-
// loaded dependency when executing:
190: r3 &= 1
191: r3 &= 2
// speculatively bypassed phase dependency
192: r7 += r3
193: if r7 == 0x3 goto exit
194: r4 = *(u8 *)(r8 + 0)
// leak r4
As can be seen, in training phase (phase != 0x3), the condition in line 1
turns into false and therefore r8 with the oob address is overridden with
the valid map value address, which in line 194 we can read out without
issues. However, in attack phase, line 2 is skipped, and due to the cache
miss in line 189 where the map value is (zeroed and later) added to the
phase register, the condition in line 193 takes the fall-through path due
to prior branch predictor training, where under speculation, it'll load the
byte at oob address r8 (unknown scalar type at that point) which could then
be leaked via side-channel.
One way to mitigate these is to 'branch off' an unreachable path, meaning,
the current verification path keeps following the is_branch_taken() path
and we push the other branch to the verification stack. Given this is
unreachable from the non-speculative domain, this branch's vstate is
explicitly marked as speculative. This is needed for two reasons: i) if
this path is solely seen from speculative execution, then we later on still
want the dead code elimination to kick in in order to sanitize these
instructions with jmp-1s, and ii) to ensure that paths walked in the
non-speculative domain are not pruned from earlier walks of paths walked in
the speculative domain. Additionally, for robustness, we mark the registers
which have been part of the conditional as unknown in the speculative path
given there should be no assumptions made on their content.
The fix in here mitigates type confusion attacks described earlier due to
i) all code paths in the BPF program being explored and ii) existing
verifier logic already ensuring that given memory access instruction
references one specific data structure.
An alternative to this fix that has also been looked at in this scope was to
mark aux->alu_state at the jump instruction with a BPF_JMP_TAKEN state as
well as direction encoding (always-goto, always-fallthrough, unknown), such
that mixing of different always-* directions themselves as well as mixing of
always-* with unknown directions would cause a program rejection by the
verifier, e.g. programs with constructs like 'if ([...]) { x = 0; } else
{ x = 1; }' with subsequent 'if (x == 1) { [...] }'. For unprivileged, this
would result in only single direction always-* taken paths, and unknown taken
paths being allowed, such that the former could be patched from a conditional
jump to an unconditional jump (ja). Compared to this approach here, it would
have two downsides: i) valid programs that otherwise are not performing any
pointer arithmetic, etc, would potentially be rejected/broken, and ii) we are
required to turn off path pruning for unprivileged, where both can be avoided
in this work through pushing the invalid branch to the verification stack.
The issue was originally discovered by Adam and Ofek, and later independently
discovered and reported as a result of Benedict and Piotr's research work.
Fixes:
|
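The first crafted program in the commit above can be modelled in a few lines of C to show why a verifier that only follows architecturally valid paths accepts it. This is a toy model, not BPF and not the verifier: `struct result`, `run_model` and the `mispredict` switch are hypothetical, and "mispredict" simply forces both conditional jumps to fall through.

```c
#include <stdint.h>

struct result {
    int line3_ran;       /* r6 = r9 executed */
    int line5_ran;       /* r9 = *(u8 *)(r6) executed */
    uint64_t deref_addr; /* address that line 5 would dereference */
};

/* mispredict = 0: follow the branches as the architecture would.
 * mispredict = 1: model both conditional jumps falling through,
 * as in the speculative scenario described above. */
static struct result run_model(uint64_t r0, uint64_t r6, uint64_t r9,
                               int mispredict)
{
    struct result res = { 0, 0, 0 };

    if (mispredict || r0 == 0) {   /* lines 2-3 */
        r6 = r9;
        res.line3_ran = 1;
    }
    if (mispredict || r0 == 1) {   /* lines 4-5 */
        res.deref_addr = r6;
        res.line5_ran = 1;
    }
    return res;
}
```

Architecturally, lines 3 and 5 never run in the same execution, so line 5 only ever sees the safe stack pointer. In the mispredicted run it sees the attacker-controlled scalar instead.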
||
Daniel Borkmann
|
d2f790327f |
bpf: Do not mark insn as seen under speculative path verification
commit fe9a5ca7e370e613a9a75a13008a3845ea759d6e upstream ... in such circumstances, we do not want to mark the instruction as seen given the goal is still to jmp-1 rewrite/sanitize dead code, if it is not reachable from the non-speculative path verification. We do however want to verify it for safety regardless. With the patch as-is all the insns that have been marked as seen before the patch will also be marked as seen after the patch (just with a potentially different non-zero count). An upcoming patch will also verify paths that are unreachable in the non-speculative domain, hence this extension is needed. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de> Reviewed-by: Piotr Krysiuk <piotras@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> [OP: - env->pass_cnt is not used in 5.4, so adjust sanitize_mark_insn_seen() to assign "true" instead - drop sanitize_insn_aux_data() comment changes, as the function is not present in 5.4] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |