Commit Graph

32045 Commits

Andrey Zhizhikin e9646ca701 This is the 5.4.132 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDu+p0ACgkQONu9yGCS
 aT5SOw/9F58e4gz7PSTn4A9oCTNodRPe9B9rzf3y1Ol0k7T1aeQoWsPFOkZpNSOJ
 tdOGEXnwYnLpMC7nuFshWv1uKGAL/weHADyGV6J37AntYFjpEFhJhSH7pGGhDk7V
 EeIl98luBynPXOKNnDvcrQweeRaHKOInQBT8JJzwwsZbF2oqfOqdU0A787BiRu+3
 zoi/mV0upDB443ji/JY0xj+o4jlbsuD0WxEqgkcD2YHL+QvU5Wr0mGys7m5gG9x7
 TpKpMic0ILrF1vt/znLL5rOlX497prTvZ74ZXV/DYizeYxqtl/UG3CZjo1uf2yqk
 pAXA57paz6DY2Ct+3QbJBeuer27bTz6SCClSS1om9AcUk6oNSdULmMdTGvQb0SLU
 wx1Cy8b2ei04SVl96+McKKZ6ln47LJediGn0qIdwC6O/XHHrLq4u5PkSnQxRU4pA
 GH1tP5oYy4GzL9RbBeiDJQETFiXwkexSEWVyuSc6BhqQXao9yVzmLQbL1zgjH/zO
 m/tckZ3vEg+ll8j4QJCisHRyqYhwfru4PsJQH9Q7q6CtIuGOsd0Z/OUcLuF6knXg
 jDOrDIykE/PnkQ2Dc2RhdONP1ud5j3oBnHvNHs6FDghRKjaixMQzg3g/RNtnAaTj
 +7Xsfbi6ntpZSDOaY7YNgt+ZH3l4YRnUL/xBA6qIygayz374nzI=
 =LU0G
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmD25ikACgkQ7G51OISz
 Hs1dcw/8DKef1hGC5O2WKfpInTYtgnClkyD5/yOnGAPMvMRDGybA3dRejpIEefNM
 Qol1XICjb0wdBDV0I+n+fnGbBgaX3g0N/pn16pdbbSPBBe1L+d97gZNZznDGHYZu
 033qtbxii8e0QTxTvO7nx4L80ZZsyPLchpPxowS/vd1Ezti+pTIU4y43MCc2jYLL
 KqUBDz72TkPLhgVZdDJ1z9gb+OoJ+sJPaeBrO57hpY/os9SxlMPeY56YrD3Hyfy5
 IHZw3bTCDiIXpHBaJG8fvuudaM5M8V3dbD6oXnEPo1Gzb1Y7WR4Z7q28g7arSYjP
 fMPd243mCXd1V7LpmapxXvFsnbdsA7oauTho50dwmEvxQf9jEgX6thBWAFsrItaS
 crHdOppS7Lc3FK8cTMxZd6ZyZpaU6sF183tMOteuhtwmF/uoy1LBHqLnAvtfWYrl
 InGcImgABRkiYBRyODlgC4UNLd49Svon/8HcbBZlmeGIkosXjo5r1itnipgnF/TB
 /NkHRkixYTBCnJZyx+9Lihqw+HMnHVfjOnIBjbXjzX9ITH/tiMn4y87E+x9vRQqr
 Td5AKJwiSXSWZBQoX+XNLqXRwjZKHVQe45J4gzL9dhCzi9bwK99BvBPWr8+JyI7w
 83YQfkhPju47+KFrEN6DUBxdYrROsJLsjgdTl38IlCi4SKoQSkE=
 =Li9I
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.132' into 5.4-2.3.x-imx

This is the 5.4.132 stable release

Conflicts (manual resolve):
- drivers/gpu/drm/rockchip/cdn-dp-core.c:
Fix merge hiccup when integrating upstream commit 450c25b8a4
("drm/rockchip: cdn-dp-core: add missing clk_disable_unprepare() on
error in cdn_dp_grf_write()")

- drivers/perf/fsl_imx8_ddr_perf.c:
Port upstream commit 3fea9b708a ("drivers/perf: fix the missed
ida_simple_remove() in ddr_perf_probe()") manually to NXP version.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-07-20 15:04:13 +00:00
Paul E. McKenney 61f6c18fff rcu: Invoke rcu_spawn_core_kthreads() from rcu_spawn_gp_kthread()
[ Upstream commit 8e4b1d2bc198e34b48fc7cc3a3c5a2fcb269e271 ]

Currently, rcu_spawn_core_kthreads() is invoked via an early_initcall(),
which works, except that rcu_spawn_gp_kthread() is also invoked via an
early_initcall() and rcu_spawn_core_kthreads() relies on adjustments to
kthread_prio that are carried out by rcu_spawn_gp_kthread().  There is
no guarantee of ordering among early_initcall() handlers, and thus no
guarantee that kthread_prio will be properly checked and range-limited
at the time that rcu_spawn_core_kthreads() needs it.

In most cases, this bug is harmless.  After all, the only reason that
rcu_spawn_gp_kthread() adjusts the value of kthread_prio is if the user
specified a nonsensical value for this boot parameter, which experience
indicates is rare.

Nevertheless, a bug is a bug.  This commit therefore causes the
rcu_spawn_core_kthreads() function to be invoked directly from
rcu_spawn_gp_kthread() after any needed adjustments to kthread_prio have
been carried out.
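
A minimal sketch of the resulting flow, assuming the function names from
this message (illustrative, not the literal patch):

	static int __init rcu_spawn_gp_kthread(void)
	{
		/* ... sanity-check and range-limit kthread_prio ... */
		/* ... create the RCU grace-period kthread ... */
		rcu_spawn_core_kthreads();	/* kthread_prio is final here */
		return 0;
	}
	early_initcall(rcu_spawn_gp_kthread);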

Fixes: 48d07c04b4 ("rcu: Enable elimination of Tree-RCU softirq processing")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:36 +02:00
Qais Yousef 97f32c7f33 sched/uclamp: Fix uclamp_tg_restrict()
[ Upstream commit 0213b7083e81f4acd69db32cb72eb4e5f220329a ]

Now that cpu.uclamp.min acts as a protection, we need to make sure that
the uclamp request of the task is within the allowed range of the
cgroup, that is, it is clamp()'ed correctly between
tg->uclamp[UCLAMP_MIN] and tg->uclamp[UCLAMP_MAX].

As reported by Xuewen [1], we can have corner cases where there's an
inversion between the uclamp requested by the task (p) and the uclamp
values of the task group it's attached to (tg). The following table
demonstrates 2 corner cases:

	           |  p  |  tg  |  effective
	-----------+-----+------+-----------
	CASE 1
	-----------+-----+------+-----------
	uclamp_min | 60% | 0%   |  60%
	-----------+-----+------+-----------
	uclamp_max | 80% | 50%  |  50%
	-----------+-----+------+-----------
	CASE 2
	-----------+-----+------+-----------
	uclamp_min | 0%  | 30%  |  30%
	-----------+-----+------+-----------
	uclamp_max | 20% | 50%  |  20%
	-----------+-----+------+-----------

With this fix we get:

	           |  p  |  tg  |  effective
	-----------+-----+------+-----------
	CASE 1
	-----------+-----+------+-----------
	uclamp_min | 60% | 0%   |  50%
	-----------+-----+------+-----------
	uclamp_max | 80% | 50%  |  50%
	-----------+-----+------+-----------
	CASE 2
	-----------+-----+------+-----------
	uclamp_min | 0%  | 30%  |  30%
	-----------+-----+------+-----------
	uclamp_max | 20% | 50%  |  30%
	-----------+-----+------+-----------
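
The tables above reduce to a single clamp: the task's request is
restricted to the cgroup's [min, max] range. A hedged sketch of the
semantics, using the notation of the tables (illustrative, not the
exact kernel code):

	/* same rule for both the UCLAMP_MIN and UCLAMP_MAX requests */
	effective = clamp(p->uclamp[clamp_id], tg->uclamp[UCLAMP_MIN],
			  tg->uclamp[UCLAMP_MAX]);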

Additionally, uclamp_update_active_tasks() must now unconditionally
update both UCLAMP_MIN and UCLAMP_MAX, because changing the tg's
UCLAMP_MAX, for instance, can impact the effective UCLAMP_MIN of the
tasks.

	           |  p  |  tg  |  effective
	-----------+-----+------+-----------
	old
	-----------+-----+------+-----------
	uclamp_min | 60% | 0%   |  50%
	-----------+-----+------+-----------
	uclamp_max | 80% | 50%  |  50%
	-----------+-----+------+-----------
	*new*
	-----------+-----+------+-----------
	uclamp_min | 60% | 0%   | *60%*
	-----------+-----+------+-----------
	uclamp_max | 80% |*70%* | *70%*
	-----------+-----+------+-----------

[1] https://lore.kernel.org/lkml/CAB8ipk_a6VFNjiEnHRHkUMBKbA+qzPQvhtNjJ_YNzQhqV_o8Zw@mail.gmail.com/

Fixes: 0c18f2ecfcc2 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
Reported-by: Xuewen Yan <xuewen.yan94@gmail.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210617165155.3774110-1-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:24 +02:00
Vincent Donnefort a3ddf1fb37 sched/rt: Fix Deadline utilization tracking during policy change
[ Upstream commit d7d607096ae6d378b4e92d49946d22739c047d4c ]

DL keeps track of the utilization on a per-rq basis with the structure
avg_dl. This utilization is updated during task_tick_dl(),
put_prev_task_dl() and set_next_task_dl(). However, when the current
running task changes its policy, set_next_task_dl(), which would usually
take care of updating the utilization when the rq starts running DL
tasks, will not see such a change, leaving the avg_dl structure outdated.
When that very same task is dequeued later, put_prev_task_dl() will
then update the utilization based on a wrong last_update_time, leading to
a huge spike in the DL utilization signal.

The signal would eventually recover from this issue after a few ms, even
if no DL tasks are run, since avg_dl is also updated in
__update_blocked_others(). But as the CPU capacity depends partly on
avg_dl, this issue nonetheless has a significant impact on the scheduler.

Fix this issue by ensuring a load update when a running task changes
its policy to DL.
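
A hedged sketch of the kind of update described above (hook and helper
names follow the scheduler code, but treat this as illustrative):

	static void switched_to_dl(struct rq *rq, struct task_struct *p)
	{
		/*
		 * If the task is running, refresh avg_dl now so its running
		 * time is accounted with a correct last_update_time.
		 */
		if (task_current(rq, p))
			update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0);
		/* ... */
	}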

Fixes: 3727e0e ("sched/dl: Add dl_rq utilization tracking")
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/1624271872-211872-3-git-send-email-vincent.donnefort@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:24 +02:00
Vincent Donnefort 3fb53be07f sched/rt: Fix RT utilization tracking during policy change
[ Upstream commit fecfcbc288e9f4923f40fd23ca78a6acdc7fdf6c ]

RT keeps track of the utilization on a per-rq basis with the structure
avg_rt. This utilization is updated during task_tick_rt(),
put_prev_task_rt() and set_next_task_rt(). However, when the current
running task changes its policy, set_next_task_rt(), which would usually
take care of updating the utilization when the rq starts running RT tasks,
will not see such a change, leaving the avg_rt structure outdated. When
that very same task is dequeued later, put_prev_task_rt() will then
update the utilization based on a wrong last_update_time, leading to a
huge spike in the RT utilization signal.

The signal would eventually recover from this issue after a few ms, even if
no RT tasks are run, since avg_rt is also updated in __update_blocked_others().
But as the CPU capacity depends partly on avg_rt, this issue nonetheless
has a significant impact on the scheduler.

Fix this issue by ensuring a load update when a running task changes
its policy to RT.

Fixes: 371bf427 ("sched/rt: Add rt_rq utilization tracking")
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/1624271872-211872-2-git-send-email-vincent.donnefort@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:24 +02:00
Qais Yousef 8e5ffc1039 sched/uclamp: Fix locking around cpu_util_update_eff()
[ Upstream commit 93b73858701fd01de26a4a874eb95f9b7156fd4b ]

cpu_cgroup_css_online() calls cpu_util_update_eff() without holding the
uclamp_mutex or rcu_read_lock() like other call sites, which is
a mistake.

The uclamp_mutex is required to protect against concurrent reads and
writes that could update the cgroup hierarchy.

The rcu_read_lock() is required to traverse the cgroup data structures
in cpu_util_update_eff().

Surround the caller with the required locks and add some asserts to
better document the dependency in cpu_util_update_eff().
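
A sketch of the locking shape described above, as applied around the
cpu_cgroup_css_online() call site (simplified):

	mutex_lock(&uclamp_mutex);
	rcu_read_lock();
	cpu_util_update_eff(css);
	rcu_read_unlock();
	mutex_unlock(&uclamp_mutex);

with a lockdep_assert_held(&uclamp_mutex) inside cpu_util_update_eff()
to document the dependency.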

Fixes: 7226017ad37a ("sched/uclamp: Fix a bug in propagating uclamp value in new cgroups")
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210510145032.1934078-3-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:20 +02:00
Qais Yousef 0b199ce65b sched/uclamp: Fix wrong implementation of cpu.uclamp.min
[ Upstream commit 0c18f2ecfcc274a4bcc1d122f79ebd4001c3b445 ]

cpu.uclamp.min is a protection as described in cgroup-v2 Resource
Distribution Model

	Documentation/admin-guide/cgroup-v2.rst

which means we try our best to preserve the minimum performance point of
tasks in this group. See full description of cpu.uclamp.min in the
cgroup-v2.rst.

But the current implementation makes it a limit, which is not what was
intended.

For example:

	tg->cpu.uclamp.min = 20%

	p0->uclamp[UCLAMP_MIN] = 0
	p1->uclamp[UCLAMP_MIN] = 50%

	Previous Behavior (limit):

		p0->effective_uclamp = 0
		p1->effective_uclamp = 20%

	New Behavior (Protection):

		p0->effective_uclamp = 20%
		p1->effective_uclamp = 50%

This is in line with how protections should work.
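
In other words, for cpu.uclamp.min the group value raises, but never
lowers, the task's own request. A one-line sketch, using the notation of
the example above (illustrative):

	/* protection: the larger of the task request and the group floor */
	p->effective_uclamp = max(p->uclamp[UCLAMP_MIN], tg->cpu_uclamp_min);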

With this change the cgroup and per-task behaviors are the same, as
expected.

Additionally, we remove the confusing relationship between cgroups and
the !user_defined flag.

We don't want for example RT tasks that are boosted by default to max to
change their boost value when they attach to a cgroup. If a cgroup wants
to limit the max performance point of tasks attached to it, then
cpu.uclamp.max must be set accordingly.

Or, to set a different boost value per cgroup, then
sysctl_sched_util_clamp_min_rt_default must be used to NOT boost to max,
and the right cpu.uclamp.min must be set for each group so the RT tasks
obtain the desired boost value when attached to that group.

As it stands, the dependency on the !user_defined flag adds an extra
layer of complexity that is not required now that cpu.uclamp.min behaves
properly as a protection.

The propagation model of effective cpu.uclamp.min in child cgroups as
implemented by cpu_util_update_eff() is still correct. The parent
protection sets an upper limit of what the child cgroups will
effectively get.

Fixes: 3eac870a32 (sched/uclamp: Use TG's clamps to restrict TASK's clamps)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:20 +02:00
Petr Mladek a3aab894d9 kthread_worker: fix return value when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
[ Upstream commit d71ba1649fa3c464c51ec7163e4b817345bff2c7 ]

kthread_mod_delayed_work() might race with
kthread_cancel_delayed_work_sync() or another kthread_mod_delayed_work()
call.  The function lets the other operation win when it sees the
work->canceling counter set, and it returns @false.

But it should return @true as it is done by the related workqueue API, see
mod_delayed_work_on().

The reason is that the return value might be used for reference counting.
It has to distinguish the case when the number of queued works has changed
or stayed the same.

The change is safe.  kthread_mod_delayed_work() return value is not
checked anywhere at the moment.
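
A hypothetical caller using the return value for reference counting
(illustrative; obj and its kref are made up for the example):

	/*
	 * %false: a new work was queued, take a reference.
	 * %true:  an already-pending work was modified, count unchanged.
	 */
	if (!kthread_mod_delayed_work(worker, &obj->dwork, delay))
		kref_get(&obj->ref);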

Link: https://lore.kernel.org/r/20210521163526.GA17916@redhat.com
Link: https://lkml.kernel.org/r/20210610133051.15337-4-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: <jenhaochen@google.com>
Cc: Martin Liu <liumartin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:19 +02:00
Paul E. McKenney dba9cda5aa clocksource: Retry clock read if long delays detected
[ Upstream commit db3a34e17433de2390eb80d436970edcebd0ca3e ]

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there are no shortage of things that can delay
interrupts-disabled regions of code ranging from SMI handlers to vCPU
preemption.  It would be good to have some indication as to why the clock
was marked unstable.

Therefore, re-read the watchdog clock on either side of the read from the
clock under test.  If the watchdog clock shows an excessive time delta
between its pair of reads, the reads are retried.

The maximum number of retries is specified by a new kernel boot parameter
clocksource.max_cswd_read_retries, which defaults to three, that is, up to
four reads, one initial and up to three retries.  If more than one retry
was required, a message is printed on the console (the occasional single
retry is expected behavior, especially in guest OSes).  If the maximum
number of retries is exceeded, the clock under test will be marked
unstable.  However, the probability of this happening due to various sorts
of delays is quite small.  In addition, the reason (clock-read delays) for
the unstable marking will be apparent.
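
A sketch of the retry loop described above (simplified from the change;
treat the details as illustrative):

	static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
	{
		unsigned int nretries;
		u64 wd_end, wd_delta;
		int64_t wd_delay;

		for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) {
			local_irq_disable();
			*wdnow = watchdog->read(watchdog);
			*csnow = cs->read(cs);
			wd_end = watchdog->read(watchdog);
			local_irq_enable();

			wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
			wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, watchdog->shift);
			if (wd_delay <= WATCHDOG_MAX_SKEW) {
				if (nretries > 1)
					pr_warn("timekeeping watchdog: %d reads retried\n", nretries);
				return true;	/* reads were not delayed */
			}
		}
		return false;	/* excessive delays: mark the clock unstable */
	}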

Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-1-paulmck@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:18 +02:00
Boqun Feng 2ef6cd6e48 locking/lockdep: Avoid finding a wrong lock dep path in check_irq_usage()
[ Upstream commit 7b1f8c6179769af6ffa055e1169610b51d71edd5 ]

In step #3 of check_irq_usage(), we search backwards to find a lock
whose usage conflicts with the usage of @target_entry1 on safe/unsafe.
However, we should only take the irq-unsafe usage of @target_entry1
into consideration, because there can be a case where a lock is
hardirq-unsafe but softirq-safe, and in check_irq_usage() we find it
because its hardirq-unsafe usage could result in a hardirq-safe-unsafe
deadlock. But since we currently don't filter out the other usage bits,
we may find a lock dependency path softirq-unsafe -> softirq-safe, which
in fact doesn't cause a deadlock. This may produce misleading lockdep
splats.

Fix this by only keeping LOCKF_ENABLED_IRQ_ALL bits when we try the
backwards search.
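
The core of the fix is a one-line mask applied before the backwards
search (sketch; surrounding code omitted):

	/*
	 * Keep only the irq-unsafe usage bits of @target_entry1, so a
	 * softirq-unsafe -> softirq-safe path is not mistaken for a
	 * hardirq deadlock.
	 */
	usage_mask &= LOCKF_ENABLED_IRQ_ALL;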

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210618170110.3699115-4-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:15 +02:00
Boqun Feng 1b45a85262 locking/lockdep: Fix the dep path printing for backwards BFS
[ Upstream commit 69c7a5fb2482636f525f016c8333fdb9111ecb9d ]

We use the same code to print the backwards lock dependency path as the
forwards lock dependency path, and this could result in incorrect
printing, because for a backwards lock_list, ->trace is not the call
trace where the lock of ->class is acquired.

Fix this by introducing a separate function for printing the backwards
dependency path. Also add a few comments about the printing while we are
at it.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210618170110.3699115-2-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:15 +02:00
Odin Ugedal 432188f626 sched/fair: Fix ascii art by replacing tabs
[ Upstream commit 08f7c2f4d0e9f4283f5796b8168044c034a1bfcb ]

When using something other than 8 spaces per tab, this ascii art
makes no sense, and the reader might end up wondering what this
advanced equation "is".

Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210518125202.78658-4-odin@uged.al
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:53:12 +02:00
Steven Rostedt (VMware) c65755f595 tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
commit 9913d5745bd720c4266805c8d29952a3702e4eca upstream.

All internal use cases of tracepoint_probe_register() are expected never
to call it with the same function and data. If one does, it is considered
a bug, as that means the accounting of tracepoint handling is corrupted.
If the function and data for a tracepoint are already registered when
tracepoint_probe_register() is called, it will call WARN_ON_ONCE() and
return -EEXIST.

The BPF system call can end up calling tracepoint_probe_register() with
the same data, which now means that this can trigger the warning because
of a user space process. As WARN_ON_ONCE() should not be called because
user space called a system call with bad data, there needs to be a way to
register a tracepoint without triggering a warning.

Enter tracepoint_probe_register_may_exist(), which can be called, but will
not cause a WARN_ON() if the probe already exists. It will still error out
with -EEXIST, which will then be sent to the user space that performed the
BPF system call.

This keeps the previous testing for issues with other users of the
tracepoint code, while letting BPF call it with duplicated data and not
warn about it.
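
The BPF attach path then becomes, roughly (a call-site fragment matching
the description above; treat the exact context as illustrative):

	/*
	 * A duplicate (probe, data) pair is user-triggerable here, so use
	 * the _may_exist variant: no WARN_ON_ONCE(), just -EEXIST back to
	 * the user space caller.
	 */
	return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func, prog);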

Link: https://lore.kernel.org/lkml/20210626135845.4080-1-penguin-kernel@I-love.SAKURA.ne.jp/
Link: https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153

Cc: stable@vger.kernel.org
Fixes: c4f6699dfc ("bpf: introduce BPF_RAW_TRACEPOINT")
Reported-by: syzbot <syzbot+721aa903751db87aa244@syzkaller.appspotmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot+721aa903751db87aa244@syzkaller.appspotmail.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14 16:53:08 +02:00
Steven Rostedt (VMware) acf8494ba5 tracing/histograms: Fix parsing of "sym-offset" modifier
commit 26c563731056c3ee66f91106c3078a8c36bb7a9e upstream.

With the addition of simple mathematical operations (plus and minus), the
parsing of the "sym-offset" modifier broke, as it took the '-' part of the
"sym-offset" as a minus, and tried to break it up into a mathematical
operation of "field.sym - offset", in which case it failed to parse
(unless the event had a field called "offset").

Both .sym and .sym-offset modifiers should not be entered into
mathematical calculations anyway. If ".sym-offset" is found in the
modifier, then simply make it not an operation that can be calculated on.
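
A hedged sketch of the parser check (names and shape illustrative, not
the exact patch):

	/*
	 * The '-' found by the expression parser may be the one inside
	 * the ".sym-offset" modifier; if so, this is a plain field
	 * reference, not "field.sym - offset".
	 */
	if (strstr(str, ".sym-offset"))
		return FIELD_OP_NONE;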

Link: https://lkml.kernel.org/r/20210707110821.188ae255@oasis.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 100719dcef ("tracing: Add simple expression support to hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14 16:53:07 +02:00
Andrey Zhizhikin 3aef3ef268 Linux 5.4.129
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmDcbxkACgkQ3qZv95d3
 LNxZMBAArNPLhVYdEDDFosb6Y/5RGjjZ/79OGHH0p5YiTo8D+wBHi+wXRl5Jp0PA
 3YVVU8lDTbeDm7E7uWeduWjFwEpsPBL8395scbhC6VR3PfnyunjarVXZgi6EHnMl
 p6HjXXtQ1jTrdDSziGDIhZVQT5FGb2/MMx9m69mfi5BTLjGfWy8chHFbC2GZszlp
 Znu9syjisUBbc4I4XHFgXw0hoQSSig6SUTZCrdTpIW/PZ0swfl8ZPxREh0CZNMpw
 Y2orRt+oHlkWPw1/sSkoTE1PRvXwNWFXyw5caOu846jAfhKtxO54SsqJqhM7VLHZ
 pdH4eb6q7AFyt0A62HkIqa5oabs5Vk9G24b8m5ggc2F/UTkHqgwUcMCud0d3DYL0
 Q7OEAmThQzHHKJ+CeNRJLsiKqVBNHmeS24B+ELldlAiX22vLr9pUsIb342Au1ZjR
 S3BTnneAbYGBv4qUoV2yUF9wQ/LxsFMSl/vmjCBOxg7c3LbKYChUwskYnvd6EwWj
 ObCyLU6FK9HWXSBSp/X+irlF1CLla+HuOC+Aej2U5a8DtmHId4LHMeq/XOxZ9s/8
 QUoX4rh5P+TJ8PIiTqXKrQo5rnR79MiYssIhUozKTdt9ZoMtXzI4mVLXN/yzAVD9
 v4aWYx8m2x17Wq+ptaLMSTSed4m3c25uEl4MucLBmKQV8ClAxW8=
 =Sijo
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmDchOoACgkQ7G51OISz
 Hs3s9Q/+OfeIcTzOY6qDX53PSxK5iF3xZEiPR5HOSVu1cY4+1tyt0bkYdQJ2N//d
 NH6YCuOsVI5chRxKN8b6I01U6xxQ0VNvj5RYmuC0AqhhzJBpfhh6FrrGuVxP+dCj
 aa2twfRZO+Y0yzrIYdSvYdRhJlEhMbKGNyHu871tyQQTlp0eQKy9lBhcCuVSCMso
 p64lRlK35GDi6TaWQ7R0SYOziWWSHByf4p3h4ZlpyhB7w6yMNjNrts4qSoZb2uOY
 I453WroS7tWV3qkOZ9FeqjHyg4mjBg87/wSMZ0DbBANinlQZW5YTOjE/WyPossL3
 C4jTP8NBLsng+ZfhAaMRfbvxGj5gbY0ghSOmsuglAOQDd+tlTfB23pkgHFlj6aie
 FUplJnTLd8VXPWkoNVKH6GfG2rZAUtQKM4EWDPt2KK3Dch9W3YH1bHL9EyxR0G7f
 aDBHM9egTyPOYNcfeVDj0KEJTrOxkmCcYz83D+eEDGKHi//Gcb+k9Gsg6q8X6Yxa
 ejFx5RBM5SW57LA70ZK3+HWhqfqcILYJT7Lxoue7bH+15vSGnTkmJKS80OUrP8w8
 y9gQ9WNsmjQUHvtKPAIZUZYlUS05FhtQ+faC8XyrLnfZj+ZCGBZor/KsNe0SP/UN
 bAYudfCn0YkjGfTNcljOe4oAnwmnVKAjGlQkHOFw+llmM0czEBk=
 =vpHo
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.129' into 5.4-2.3.x-imx

Linux 5.4.129

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-06-30 14:51:20 +00:00
Hugh Dickins 61168eafe0 mm, futex: fix shared futex pgoff on shmem huge page
[ Upstream commit fe19bd3dae3d15d2fbfdb3de8839a6ea0fe94264 ]

If more than one futex is placed on a shmem huge page, it can happen
that waking the second wakes the first instead, and leaves the second
waiting: the key's shared.pgoff is wrong.

When 3.11 commit 13d60f4b6a ("futex: Take hugepages into account when
generating futex_key") was added, the only shared huge pages came from hugetlbfs,
and the code added to deal with its exceptional page->index was put into
hugetlb source.  Then that was missed when 4.8 added shmem huge pages.

page_to_pgoff() is what others use for this nowadays: except that, as
currently written, it gives the right answer on a hugetlbfs head page, but
nonsense on hugetlbfs tail pages.  Fix that by calling the hugetlbfs-specific
hugetlb_basepage_index() on PageHuge tails as well as on head.

Yes, it's unconventional to declare hugetlb_basepage_index() there in
pagemap.h, rather than in hugetlb.h; but I do not expect anything but
page_to_pgoff() ever to need it.
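
The fixed helper then looks roughly like this (sketch of the behavior
described above):

	static inline pgoff_t page_to_pgoff(struct page *page)
	{
		/* hugetlbfs: head and tail pages get the same treatment */
		if (unlikely(PageHuge(page)))
			return hugetlb_basepage_index(page);
		return page_to_index(page);
	}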

[akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]

Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com
Fixes: 800d8c63b2 ("shmem: add huge pages support")
Reported-by: Neel Natu <neelnatu@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Zhang Yi <wetpzy@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: leave redundant #include <linux/hugetlb.h>
in kernel/futex.c, to avoid conflict over the header files included.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:55 -04:00
Petr Mladek 42f11f0fe9 kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
commit 5fa54346caf67b4b1b10b1f390316ae466da4d53 upstream.

The system might hang with the following backtrace:

	schedule+0x80/0x100
	schedule_timeout+0x48/0x138
	wait_for_common+0xa4/0x134
	wait_for_completion+0x1c/0x2c
	kthread_flush_work+0x114/0x1cc
	kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
	kthread_cancel_delayed_work_sync+0x18/0x2c
	xxxx_pm_notify+0xb0/0xd8
	blocking_notifier_call_chain_robust+0x80/0x194
	pm_notifier_call_chain_robust+0x28/0x4c
	suspend_prepare+0x40/0x260
	enter_state+0x80/0x3f4
	pm_suspend+0x60/0xdc
	state_store+0x108/0x144
	kobj_attr_store+0x38/0x88
	sysfs_kf_write+0x64/0xc0
	kernfs_fop_write_iter+0x108/0x1d0
	vfs_write+0x2f4/0x368
	ksys_write+0x7c/0xec

It is caused by the following race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync():

CPU0				CPU1

Context: Thread A		Context: Thread B

kthread_mod_delayed_work()
  spin_lock()
  __kthread_cancel_work()
     spin_unlock()
     del_timer_sync()
				kthread_cancel_delayed_work_sync()
				  spin_lock()
				  __kthread_cancel_work()
				    spin_unlock()
				    del_timer_sync()
				    spin_lock()

				  work->canceling++
				  spin_unlock
     spin_lock()
   queue_delayed_work()
     // dwork is put into the worker->delayed_work_list

   spin_unlock()

				  kthread_flush_work()
     // flush_work is put at the tail of the dwork

				    wait_for_completion()

Context: IRQ

  kthread_delayed_work_timer_fn()
    spin_lock()
    list_del_init(&work->node);
    spin_unlock()

BANG: flush_work is no longer linked and will never be processed.

The problem is that kthread_mod_delayed_work() checks the work->canceling
flag before canceling the timer.

A simple solution is to (re)check work->canceling after
__kthread_cancel_work().  But then it is not clear what should be
returned when __kthread_cancel_work() removed the work from the queue
(list) and it can't queue it again with the new @delay.

The return value might be used for reference counting.  The caller has
to know whether a new work has been queued or an existing one was
replaced.

The proper solution is that kthread_mod_delayed_work() will remove the
work from the queue (list) _only_ when work->canceling is not set.  The
flag must be checked after the timer is stopped and the remaining
operations can be done under worker->lock.

Note that kthread_mod_delayed_work() could remove the timer and then
bail out.  It is fine.  The other canceling caller needs to cancel the
timer as well.  The important thing is that the queue (list)
manipulation is done atomically under worker->lock.
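
A sketch of the resulting kthread_mod_delayed_work() flow (simplified;
see the actual patch for the full error handling):

	raw_spin_lock_irqsave(&worker->lock, flags);

	/* Stop the timer first; list manipulation stays under worker->lock. */
	kthread_cancel_delayed_work_timer(work, &flags);

	/* Another caller is canceling: let it win, do not requeue. */
	if (work->canceling)
		goto out;

	ret = __kthread_cancel_work(work);
	/* ... queue the work again with the new @delay ... */
out:
	raw_spin_unlock_irqrestore(&worker->lock, flags);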

Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
Fixes: 9a6b06c8d9 ("kthread: allow to modify delayed kthread work")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reported-by: Martin Liu <liumartin@google.com>
Cc: <jenhaochen@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:51 -04:00
Petr Mladek 06ab015d18 kthread_worker: split code for canceling the delayed work timer
commit 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 upstream.

Patch series "kthread_worker: Fix race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync()".

This patchset fixes the race between kthread_mod_delayed_work() and
kthread_cancel_delayed_work_sync() including proper return value
handling.

This patch (of 2):

Simple code refactoring as a preparation step for fixing a race between
kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().

It does not modify the existing behavior.

Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: <jenhaochen@google.com>
Cc: Martin Liu <liumartin@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:51 -04:00
Mimi Zohar e2dc07ca4e module: limit enabling module.sig_enforce
[ Upstream commit 0c18f29aae7ce3dadd26d8ee3505d07cc982df75 ]

Irrespective of whether CONFIG_MODULE_SIG is configured, specifying
"module.sig_enforce=1" on the boot command line sets "sig_enforce".
Only allow "sig_enforce" to be set when CONFIG_MODULE_SIG is configured.

This patch makes the presence of /sys/module/module/parameters/sig_enforce
dependent on CONFIG_MODULE_SIG=y.
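
Roughly, the parameter definition moves under the config option (a
sketch of the shape of the change, not the literal diff):

	#ifdef CONFIG_MODULE_SIG
	static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE);
	module_param(sig_enforce, bool_enable_only, 0644);
	#endif	/* without CONFIG_MODULE_SIG there is no sysfs knob */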

Fixes: fda784e50a ("module: export module signature enforcement status")
Reported-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Mimi Zohar <zohar@linux.ibm.com>
Tested-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30 08:47:42 -04:00
Andrey Zhizhikin db8ff65069 This is the 5.4.128 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDTLA4ACgkQONu9yGCS
 aT45xg/+IvxFaIOtutEBkFCJvEurRWSozjBKAfX9xtJQGSSKVyDvh7GZWfEXMxZc
 oNf8DWQKvaiZj2mRdgYp6Ilo27Ps6aN3vCo09z+U3mfGQLMbNpPYEvSq6Twl26NB
 8lL8b++0Jo7P+eOALHohBS125/E0etqhoc2HXDFp6pfksj6J7klxlyQ2NX9Ih8xm
 l7Cto5flCHM9g20/CNsqxXPWiuBKnzSvp9YH9HMDgjOV6YSktLGTHAJ8omjPm0V/
 pQVFOo4Kyx34exdA/IzrM/yV4iDThVtwL6+bNErWtl6LwiIcNK3esARYTNjbBBhK
 W156adxp6kl6LqMADr/y77WqvcH6H2PhpRnMj+6t21FpK7cTbXfqvxBfpOvE1Buh
 in95LJN1Iins1PTozBVHcUIpdESO5AN8/2aHq0LRLmVbaLlo6aj+sjdHNPvf7HwW
 8LDHtpGNao/spMuZmvvH+6i3iwuciINCRY9TVBDgkT5LhWhRHBl6+uSLEX/d+s3Z
 663Q6HPu+cfubR7UC8+QsMMtf7KD2yvQuadAz6n/Z41vvSYIUHPGsYtZUmsef3jP
 n4CTAmGavtyR5jaQNkuw8nnIn7cthONw94foFheBH0doxmkXPKcwqmWO9DH77n58
 unMT31ArVg9ObrO/YmLjEaV9X7VlfRf6yw7tey1RJXgrSD3nwgk=
 =9+GF
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmDTTjIACgkQ7G51OISz
 Hs1mJw/5AaIlIv+kMC0qP9aG5Y1uoUQ+fewr4Ofo9Kt8QlVcogdeM16URF7O2CDG
 mICWVXRqY+xWl/84XWJs11RRxfMnXp85myi1jyalOFDYojJp2qOkOyEi8zMg7wui
 N1kkPxEworm6ILrZRKNgT36ZfSEGAD2ogkp4wS/Fldwh2RNHtGUSw3xAPcqg0yDq
 +Ve3V06MDul/dyRup3MYtQ/Kxb5wCc+Lfwh+eXYZIu/DfXy3E4K3w8YB7NPazlTw
 dBlWU/S3Y0uyoq+DvfFQ+nL0GTe0ezGQB5n5n1eNRWg+mEn3IiWXkrBJfha2Swcl
 vv+Fy/5ro9elcn1oQF5uvwDoGHhq/KkR7TjRICY9zmVvJ79IcqZJ/Bdg5CEXM+xc
 EvZDAdXn26y2xKZJzKGQSe7QJurOuQpdW497h1DLGIhDs2sGDjhXuISPcNpjPJKG
 PdgpJ9kG4v2cYnW58UTdzQKNFg7jjFUiiQV/IIoiVlxhg7dVQnSKCgXHcngiZiuQ
 eUR9p7ahfEOlG2N5qTlJqom0HtiJ6I4a5TDQhXoEYlKfxr9711f2+cramhf2cle+
 +G64chvmKcv7Fyx5yIwSJrTJfi6vsuW5VMfXrvGIqjFUEjIGe9rQMhWH34A7s5tG
 RD0Tg90JJr0rNsNa+LZNPbRynbdPBbrVWdrVyXN+9HDs40oe8BY=
 =f7qZ
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.128' into 5.4-2.3.x-imx

This is the 5.4.128 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-06-23 15:07:27 +00:00
Steven Rostedt (VMware) c7660ab812 tracing: Do no increment trace_clock_global() by one
commit 89529d8b8f8daf92d9979382b8d2eb39966846ea upstream.

trace_clock_global() tries to make sure the events between CPUs are
somewhat in order. A global value is used and updated by the latest read
of a clock. If one CPU is ahead by a little, and is read by another CPU, a
lock is taken, and if the timestamp of the other CPU is behind, it will
simply use the other CPU's timestamp.

The lock is also only taken with a "trylock" due to tracing, and strange
recursions can happen. The lock is not taken at all in NMI context.

In the case where the lock cannot be taken, the non-synced
timestamp is returned. But it will not be less than the saved global
timestamp.

The problem arises because when the time goes "backwards" the time
returned is the saved timestamp plus 1. If the lock is not taken, and the
plus one to the timestamp is returned, there's a small race that can cause
the time to go backwards!

	CPU0				CPU1
	----				----
				trace_clock_global() {
				    ts = clock() [ 1000 ]
				    trylock(clock_lock) [ success ]
				    global_ts = ts; [ 1000 ]

				    <interrupted by NMI>
 trace_clock_global() {
    ts = clock() [ 999 ]
    if (ts < global_ts)
	ts = global_ts + 1 [ 1001 ]

    trylock(clock_lock) [ fail ]

    return ts [ 1001]
 }
				    unlock(clock_lock);
				    return ts; [ 1000 ]
				}

 trace_clock_global() {
    ts = clock() [ 1000 ]
    if (ts < global_ts) [ false 1000 == 1000 ]

    trylock(clock_lock) [ success ]
    global_ts = ts; [ 1000 ]
    unlock(clock_lock)

    return ts; [ 1000 ]
 }

The above case shows two reads of trace_clock_global() on the same CPU, but
the second read returns one less than the first read. That is, time went
backwards, and this is not allowed by trace_clock_global().

This was triggered by heavy tracing and the ring buffer checker that tests
for the clock going backwards:

 Ring buffer clock went backwards: 20613921464 -> 20613921463
 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 0 at kernel/trace/ring_buffer.c:3412 check_buffer+0x1b9/0x1c0
 Modules linked in:
 [..]
 [CPU: 2]TIME DOES NOT MATCH expected:20620711698 actual:20620711697 delta:6790234 before:20613921463 after:20613921463
   [20613915818] PAGE TIME STAMP
   [20613915818] delta:0
   [20613915819] delta:1
   [20613916035] delta:216
   [20613916465] delta:430
   [20613916575] delta:110
   [20613916749] delta:174
   [20613917248] delta:499
   [20613917333] delta:85
   [20613917775] delta:442
   [20613917921] delta:146
   [20613918321] delta:400
   [20613918568] delta:247
   [20613918768] delta:200
   [20613919306] delta:538
   [20613919353] delta:47
   [20613919980] delta:627
   [20613920296] delta:316
   [20613920571] delta:275
   [20613920862] delta:291
   [20613921152] delta:290
   [20613921464] delta:312
   [20613921464] delta:0 TIME EXTEND
   [20613921464] delta:0

This happened more than once, and always with an off-by-one result. It also
started happening after commit aafe104aa9096 was added.
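
The fix is to stop handing out prev_time + 1 on the unsynced path
(sketch):

	/*
	 * Returning the saved global timestamp itself, rather than one
	 * more than it, keeps the unsynced path monotonic.
	 */
	if ((s64)(now - prev_time) < 0)
		now = prev_time;	/* was: prev_time + 1 */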

Cc: stable@vger.kernel.org
Fixes: aafe104aa9096 ("tracing: Restructure trace_clock_global() to never block")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23 14:41:28 +02:00
Steven Rostedt (VMware) 79894a5d75 tracing: Do not stop recording comms if the trace file is being read
commit 4fdd595e4f9a1ff6d93ec702eaecae451cfc6591 upstream.

A while ago, when the "trace" file was opened, tracing was stopped, and
code was added to stop recording the comms to saved_cmdlines, for mapping
of the pids to the task name.

Code has since been added that only records the comm if a trace event
occurred, and there's no reason not to record it when the trace file is opened.

Cc: stable@vger.kernel.org
Fixes: 7ffbd48d5c ("tracing: Cache comms only after an event occurred")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23 14:41:28 +02:00
Steven Rostedt (VMware) 4ab1152bb7 tracing: Do not stop recording cmdlines when tracing is off
commit 85550c83da421fb12dc1816c45012e1e638d2b38 upstream.

The saved_cmdlines is used to map pids to the task name, such that the
output of the tracing does not just show pids, but also gives a human
readable name for the task.

If the name is not mapped, the output looks like this:

    <...>-1316          [005] ...2   132.044039: ...

Instead of this:

    gnome-shell-1316    [005] ...2   132.044039: ...

The names are updated when tracing is running, but are skipped if tracing
is stopped. Unfortunately, this stops the recording of the names whenever
the top level tracer is stopped, even when other tracers are active.

The recording of a name only happens when a new event is written into a
ring buffer, so there is no need to test if tracing is on or not. If
tracing is off, then no event is written and there is no need for the
test at all.

Remove the check, as it hides the names of tasks for events in the
instance buffers.

Cc: stable@vger.kernel.org
Fixes: 7ffbd48d5c ("tracing: Cache comms only after an event occurred")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23 14:41:28 +02:00
Andrey Zhizhikin 5134d8a627 This is the 5.4.126 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDJy8EACgkQONu9yGCS
 aT4iaBAAxDY3xDhlTee0ajZwowSVBgjGeVuVFkoKFHNFHu190lXikaiaULlkcfot
 MPPjUJMH4jM+i7eUC4LWKbxgzphMyAQszAZSd7MWNzgalOids0D77VqZXwEaZIqI
 wam6/vd9lIeJNf/H8kinm+SDozuv0MkECSHCquTaivvCTYTMK9qKwUPk+E8h8t2o
 dJ4kJJY24mDRp6F+jnY0nR1CaHXSltxlMG8Viy8HJLsLKe7cRGhkXzCdLnf1m5ea
 ppYiaLdsD1prlrNDnXWhd8zzRYh4AEnTpEMZuFt4U0oQ7L7D0mCZHw/13SyP21z8
 dzE54J4d368RZ0tT5XgiMmxd+4eBYo6Xmlj3+DmADUUHeSddXuJcLLlzDUWgHak2
 5z+eqS8yh1lU2NFcMCqcgia1XBx+R1n+Ibt8IF/nyh8/PhdQ22wTtl+lfvR/G1pb
 w6GIoFB/jvuQ+x/RoWcmaeaqu0TKlIbtqwKOgcTNGtNTh7H7V5sDHr9VtHK5hKVE
 6Jdyq+lmIgYIV2wCV3vmfLLlr0NIEDmV7NET/q0VgyWhp+kNSBVYivFeKFGMQba3
 ES5LseCUwTzpCv7uFO48gQK+0v3lqmkgWjU1tFDGZKCh9Dlj6eNrwntYyIJPk2hx
 Dh5u3nrOeRuVYRP5W6ejWOTDDA12+diwtSMBM47vtHMYr+gBUXk=
 =JTCj
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmDKAVwACgkQ7G51OISz
 Hs372A//cLcSXeASJjtYGAEqf9Yj0ZMHrY7NSb52u6/oz95VP3zVlxVAsyxhHcaN
 jXlcne5V5oTx86YI0QNs/wGEO/ErwhRFlN6t8NX0bmBBOj9PrvdguJlA4HSVk4xs
 wdBDgLtWwAKfv508KXfsX58le7oGszXst3sb9ovTlWwb56Ux4E9nIvWwvdao2Z3v
 hfs8uzpkhmgEHdRvrFstpxJH7ewmpG24Pr6extkyk4gfXAXVfa7/nOsRceVbMZ8K
 ln9Yam6MaDB4HH29enXVhK+b9rL7UfM4wwW5JSPlXhH2REfYze+r2HYorN2Bm8mR
 snClc+NlrpHyE8oFeydu3ZX1caXMnM07EaG6l/l33Tot8PczaTOeSZOezHFEGFAY
 juOE0YgqxemlNCCJUKNXfdkkPhd5wfrlzJjbNv+84hahUUAT5ET4HmuSUcTzvPkv
 tl61nmvsRzx635YNs5P7Z0Q5sWwxWKAMoTRM8+0DA1Uov+Hevuy2EHtg01YIsMAc
 E4RloWALKgbJ/yYG/hlx/xG6JAu8dMkFoaG7d6xCpNu/5oX3QnnP5EgapOQio/fr
 HZKO96hGdUTbwNkgbtnN6djGIcEsgnCc8JYg6h75ROalB+HHVszynKW97AwaAWTQ
 MV6oT1eYAJ1vjrP+XqH0ibkV10tC9Vrxs/hEeO6r7vvlVS0zytE=
 =jJ3r
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.126' into 5.4-2.3.x-imx

This is the 5.4.126 stable release

Conflicts:
- drivers/usb/cdns3/gadget.c:
Skip upstream commit f0509160f25e3 ("usb: cdns3: Fix runtime PM
imbalance on error") as the implementation is not present in the
NXP tree to apply it.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-06-16 13:49:01 +00:00
Liangyan d63f00ec90 tracing: Correct the length check which causes memory corruption
commit 3e08a9f9760f4a70d633c328a76408e62d6f80a3 upstream.

We've suffered from severe kernel crashes due to memory corruption on
our production environment, like,

Call Trace:
[1640542.554277] general protection fault: 0000 [#1] SMP PTI
[1640542.554856] CPU: 17 PID: 26996 Comm: python Kdump: loaded Tainted:G
[1640542.556629] RIP: 0010:kmem_cache_alloc+0x90/0x190
[1640542.559074] RSP: 0018:ffffb16faa597df8 EFLAGS: 00010286
[1640542.559587] RAX: 0000000000000000 RBX: 0000000000400200 RCX:
0000000006e931bf
[1640542.560323] RDX: 0000000006e931be RSI: 0000000000400200 RDI:
ffff9a45ff004300
[1640542.560996] RBP: 0000000000400200 R08: 0000000000023420 R09:
0000000000000000
[1640542.561670] R10: 0000000000000000 R11: 0000000000000000 R12:
ffffffff9a20608d
[1640542.562366] R13: ffff9a45ff004300 R14: ffff9a45ff004300 R15:
696c662f65636976
[1640542.563128] FS:  00007f45d7c6f740(0000) GS:ffff9a45ff840000(0000)
knlGS:0000000000000000
[1640542.563937] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1640542.564557] CR2: 00007f45d71311a0 CR3: 000000189d63e004 CR4:
00000000003606e0
[1640542.565279] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[1640542.566069] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[1640542.566742] Call Trace:
[1640542.567009]  anon_vma_clone+0x5d/0x170
[1640542.567417]  __split_vma+0x91/0x1a0
[1640542.567777]  do_munmap+0x2c6/0x320
[1640542.568128]  vm_munmap+0x54/0x70
[1640542.569990]  __x64_sys_munmap+0x22/0x30
[1640542.572005]  do_syscall_64+0x5b/0x1b0
[1640542.573724]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1640542.575642] RIP: 0033:0x7f45d6e61e27

James Wang has reproduced it stably on the latest 4.19 LTS.
After some debugging, we finally proved that it's due to an out-of-bounds
access to the ftrace buffer, found using a debug tool as follows:
[   86.775200] BUG: Out-of-bounds write at addr 0xffff88aefe8b7000
[   86.780806]  no_context+0xdf/0x3c0
[   86.784327]  __do_page_fault+0x252/0x470
[   86.788367]  do_page_fault+0x32/0x140
[   86.792145]  page_fault+0x1e/0x30
[   86.795576]  strncpy_from_unsafe+0x66/0xb0
[   86.799789]  fetch_memory_string+0x25/0x40
[   86.804002]  fetch_deref_string+0x51/0x60
[   86.808134]  kprobe_trace_func+0x32d/0x3a0
[   86.812347]  kprobe_dispatcher+0x45/0x50
[   86.816385]  kprobe_ftrace_handler+0x90/0xf0
[   86.820779]  ftrace_ops_assist_func+0xa1/0x140
[   86.825340]  0xffffffffc00750bf
[   86.828603]  do_sys_open+0x5/0x1f0
[   86.832124]  do_syscall_64+0x5b/0x1b0
[   86.835900]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

commit b220c049d519 ("tracing: Check length before giving out
the filter buffer") added a length check to protect against the trace
data overflow introduced in 0fc1b09ff1, but that fix can't prevent the
overflow entirely: the length check should also take
sizeof(entry->array[0]) into account, since array[0] is filled with the
length of the trace data and occupies additional space, risking overflow.
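
Conceptually, the corrected check reserves room for that length word as
well (names illustrative, not the exact patch):

	/* the bounce buffer must hold entry->array[0] plus the trace data */
	if (len > max_len - sizeof(entry->array[0]))
		goto use_ring_buffer;	/* too big for the filter buffer */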

Link: https://lkml.kernel.org/r/20210607125734.1770447-1-liangyan.peng@linux.alibaba.com

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Xunlei Pang <xlpang@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: b220c049d519 ("tracing: Check length before giving out the filter buffer")
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
Reviewed-by: yinbinbin <yinbinbin@alibabacloud.com>
Reviewed-by: Wetp Zhang <wetp.zy@linux.alibaba.com>
Tested-by: James Wang <jnwang@linux.alibaba.com>
Signed-off-by: Liangyan <liangyan.peng@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 11:59:46 +02:00
Steven Rostedt (VMware) 7e4e824b10 ftrace: Do not blindly read the ip address in ftrace_bug()
commit 6c14133d2d3f768e0a35128faac8aa6ed4815051 upstream.

It was reported that a bug on arm64 caused a bad ip address to be used for
updating into a nop in ftrace_init(), but the error path (rightfully)
returned -EINVAL and not -EFAULT, as the bug caused more than one error to
occur. But because -EINVAL was returned, the ftrace_bug() tried to report
what was at the location of the ip address, and read it directly. This
caused the machine to panic, as the ip was not pointing to a valid memory
address.

Instead, read the ip address with copy_from_kernel_nofault() to safely
access the memory, and if it faults, report that the address faulted,
otherwise report what was in that location.
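
A sketch of the safe read described above (close to the actual helper,
but simplified):

	char ins[MCOUNT_INSN_SIZE];
	int i;

	/* the ip may be bogus: never dereference it directly */
	if (copy_from_kernel_nofault(ins, (void *)ip, MCOUNT_INSN_SIZE)) {
		printk(KERN_CONT "[FAULT] %px\n", (void *)ip);
		return;
	}
	for (i = 0; i < MCOUNT_INSN_SIZE; i++)
		printk(KERN_CONT "%s%02x", i ? ":" : "", ins[i]);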

Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.com/

Cc: stable@vger.kernel.org
Fixes: 05736a427f ("ftrace: warn on failure to disable mcount callers")
Reported-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 11:59:45 +02:00
Vincent Guittot 71c751cbb9 sched/fair: Make sure to update tg contrib for blocked load
commit 02da26ad5ed6ea8680e5d01f20661439611ed776 upstream.

During the update of fair blocked load (__update_blocked_fair()), we
update the contribution of the cfs in tg->load_avg if cfs_rq's pelt
has decayed.  Nevertheless, the pelt values of a cfs_rq could have
been recently updated while propagating the change of a child. In this
case, the cfs_rq's pelt will not decay because it has already been
updated, and we don't update tg->load_avg.

__update_blocked_fair
  ...
  for_each_leaf_cfs_rq_safe: child cfs_rq
    update cfs_rq_load_avg() for child cfs_rq
    ...
    update_load_avg(cfs_rq_of(se), se, 0)
      ...
      update cfs_rq_load_avg() for parent cfs_rq
		-propagation of child's load makes parent cfs_rq->load_sum
		 become null
        -UPDATE_TG is not set so it doesn't update parent
		 cfs_rq->tg_load_avg_contrib
  ..
  for_each_leaf_cfs_rq_safe: parent cfs_rq
    update cfs_rq_load_avg() for parent cfs_rq
      - nothing to do because parent cfs_rq has already been updated
		recently so cfs_rq->tg_load_avg_contrib is not updated
    ...
    parent cfs_rq is decayed
      list_del_leaf_cfs_rq parent cfs_rq
	  - but it still contributes to tg->load_avg

We must set the UPDATE_TG flag when propagating pending load to the parent.
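
That is, in __update_blocked_fair() (one-line sketch of the described
change):

	/* refresh the parent's tg_load_avg_contrib even if already decayed */
	update_load_avg(cfs_rq_of(se), se, UPDATE_TG);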

Fixes: 039ae8bcf7 ("sched/fair: Fix O(nr_cgroups) in the load balancing path")
Reported-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Odin Ugedal <odin@uged.al>
Link: https://lkml.kernel.org/r/20210527122916.27683-3-vincent.guittot@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 11:59:44 +02:00
Marco Elver 26ab08df86 perf: Fix data race between pin_count increment/decrement
commit 6c605f8371159432ec61cbb1488dcf7ad24ad19a upstream.

KCSAN reports a data race between increment and decrement of pin_count:

  write to 0xffff888237c2d4e0 of 4 bytes by task 15740 on cpu 1:
   find_get_context		kernel/events/core.c:4617
   __do_sys_perf_event_open	kernel/events/core.c:12097 [inline]
   __se_sys_perf_event_open	kernel/events/core.c:11933
   ...
  read to 0xffff888237c2d4e0 of 4 bytes by task 15743 on cpu 0:
   perf_unpin_context		kernel/events/core.c:1525 [inline]
   __do_sys_perf_event_open	kernel/events/core.c:12328 [inline]
   __se_sys_perf_event_open	kernel/events/core.c:11933
   ...

Because neither read-modify-write here is atomic, this can lead to one
of the operations being lost, resulting in an inconsistent pin_count.
Fix it by adding the missing locking in the CPU-event case.
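
Sketch of the added locking in find_get_context()'s CPU-event path
(simplified):

	raw_spin_lock_irq(&ctx->lock);
	++ctx->pin_count;	/* now a read-modify-write under ctx->lock */
	raw_spin_unlock_irq(&ctx->lock);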

Fixes: fe4b04fa31 ("perf: Cure task_oncpu_function_call() races")
Reported-by: syzbot+142c9018f5962db69c7e@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210527104711.2671610-1-elver@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 11:59:44 +02:00
Alexander Kuznetsov 4d14a82ef1 cgroup1: don't allow '\n' in renaming
commit b7e24eb1caa5f8da20d405d262dba67943aedc42 upstream.

cgroup_mkdir() has a restriction on newline usage in names:
$ mkdir $'/sys/fs/cgroup/cpu/test\ntest2'
mkdir: cannot create directory
'/sys/fs/cgroup/cpu/test\ntest2': Invalid argument

But in cgroup1_rename() such a check is missing.
This allows us to make /proc/<pid>/cgroup unparsable:
$ mkdir /sys/fs/cgroup/cpu/test
$ mv /sys/fs/cgroup/cpu/test $'/sys/fs/cgroup/cpu/test\ntest2'
$ echo $$ > $'/sys/fs/cgroup/cpu/test\ntest2'
$ cat /proc/self/cgroup
11:pids:/
10:freezer:/
9:hugetlb:/
8:cpuset:/
7:blkio:/user.slice
6:memory:/user.slice
5:net_cls,net_prio:/
4:perf_event:/
3:devices:/user.slice
2:cpu,cpuacct:/test
test2
1:name=systemd:/
0::/
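
The fix mirrors the mkdir check in cgroup1_rename() (sketch):

	/* reject newlines in renames, as cgroup_mkdir() already does */
	if (strchr(new_name_str, '\n'))
		return -EINVAL;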

Signed-off-by: Alexander Kuznetsov <wwfq@yandex-team.ru>
Reported-by: Andrey Krasichkov <buglloc@yandex-team.ru>
Acked-by: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 11:59:40 +02:00
Sergey Senozhatsky f9e7a38d14 wq: handle VM suspension in stall detection
[ Upstream commit 940d71c6462e8151c78f28e4919aa8882ff2054e ]

If VCPU is suspended (VM suspend) in wq_watchdog_timer_fn() then
once this VCPU resumes it will see the new jiffies value, while it
may take a while before IRQ detects PVCLOCK_GUEST_STOPPED on this
VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
There is a small chance of misreported WQ stalls in the meantime,
because the new jiffies value is time_after() the old 'ts + thresh'.

wq_watchdog_timer_fn()
{
	for_each_pool(pool, pi) {
		if (time_after(jiffies, ts + thresh)) {
			pr_emerg("BUG: workqueue lockup - pool");
		}
	}
}

Save jiffies at the beginning of this function and use that value
for stall detection. If the VM gets suspended then we continue using the
"old" jiffies value and old WQ touch timestamps. If an IRQ at some
point restarts the stall detection cycle (pvclock_touch_watchdogs())
then old jiffies will always be before new 'ts + thresh'.
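
Sketch of the change in wq_watchdog_timer_fn() (simplified):

	unsigned long now = jiffies;	/* sample once, before any VM pause */
	...
	for_each_pool(pool, pi) {
		if (time_after(now, ts + thresh))	/* was: jiffies */
			pr_emerg("BUG: workqueue lockup - pool ...");
	}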

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-16 11:59:35 +02:00
Shakeel Butt 92215c1f24 cgroup: disable controllers at parse time
[ Upstream commit 45e1ba40837ac2f6f4d4716bddb8d44bd7e4a251 ]

This patch effectively reverts the commit a3e72739b7 ("cgroup: fix
too early usage of static_branch_disable()"). The commit 6041186a32
("init: initialize jump labels before command line option parsing") has
moved the jump_label_init() before parse_args() which has made the
commit a3e72739b7 unnecessary. On the other hand there are
consequences to disabling the controllers later, as some subsystems check
the controllers when making decisions. One such incident
is reported [1] regarding the memory controller and its impact on memory
reclaim code.

[1] https://lore.kernel.org/linux-mm/921e53f3-4b13-aab8-4a9e-e83ff15371e4@nec.com

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: NOMURA JUNICHI(野村 淳一) <junichi.nomura@nec.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-16 11:59:35 +02:00
Andrey Zhizhikin e3ec85be6f This is the 5.4.123 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCw0KEACgkQONu9yGCS
 aT4TxxAAvW/xsJKApUuCTrrdY/YMj96bFT6dcF/17N36Tb+lNRL+Ig+dYlfammlD
 sp7enkerXrIU+27ngnsJfznJLDXOCUPw77p9H90Kzbo7k3wi7Pt2nqFRyuOwYLRN
 CLo3gmWNWBReCjYmUK5uzUlGeUnQOtD7pcEMexOVxgMmRsJ1uKU2UGKHCIa8VxS1
 /xf8mQJv0lIbm8GLSYzCsP3zYafzc0cY0ST6CvCCX+71ryfKXGTtH0wAUvdvWsTs
 yvpc7UhFcnW9t2e77w9DmyQRf/nDfsu/ueXqITkJot4fYshA0w9owsNxKdnq/DKV
 nfIVK/agISYeT44uWOqb9TBUdEz2nWA31eIrXD6/Tf4ZUVw2WQy/UD8l5gUt+G7s
 Ag4nKSEZP4BAsZbnwnYPUm/qsfLm5l2+Spj4IK6mA7TZKMvjgLiHJIxihYzfeuof
 q/E9NeH8ejfrO6tnMyTUZzmPRjMImoeAvg+NaldpDlSyDE+DG+l9iIjO66Lg0uif
 dupHtAXFEjgfqHIRfvq4lW6lRKmUwhUkw0mdGgOSbgjoTktJNdrAVoQgsOIvhYBJ
 fuhs9PTGT5isekDEotdsj8wuXlVKQ2rMKl98Z1djZXA73INobftgbtFxaVGIhLj/
 7LK6pbnaClT5bHJBi+AvVExOZolPsCxgxaXsESCV4nxppEOjRxA=
 =+GH7
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmCz5/oACgkQ7G51OISz
 Hs3hTg/9H2jbISAaXYNpwhHnBJa4eyinrWYBsZ5r1OlUjc00N6uge8VMzEDAjOO8
 Ep8cDwCKJ3nz+h/xwN8D5ERqTCjliTEB88FPxgW5iebn4uTYNLz10rPCVbHu7M6g
 bK8eNnBCQKFgaRILlG7G6C/5Radm56MoaNV5izfrWjr8xJ3LJEusSkAetl820p3Z
 FYFLZMPvbe1C5J3wPF85a9WdUHu1lY89vNmac30y9+W4V0ogDHKUX/9gIZTKVNLK
 G6MkjaOkKscB++sBnLeB/4UT4H/2lKmIY8Q/Lf4jUjm5Pd8DLR9SXxNzyTvaenH7
 LkX7/vzxmjrurDyhrd2qk9pMhjs9VvutXuQFHWHMZgDfo3PLpc/TBaWfTCdj/VYR
 pETYsgiGVJKWw8kDBSlNgooMVuozqj2mAXkYK2ghvjM+975A+rZNobsWCRnU/rER
 xHiII8br11jFXb8b2EZ4BDCfkQgubsuQmiXKbxVjD11HY6Z7zErAb9UGNuCjKlz3
 NxvXtpvjpNG+c95OW0WwgRM9iNCkEwN7GsjhHai9Rd6+olVflzc2r4gVn6sSqVhZ
 CD2ukwRUEtVxJ3xbd2vZnfp735aB/+UHBm8jQYxIORCNLoIVs9tIGlkeIK92W2A7
 oU0PcIwI40DxOwc4a6Ybppx0ePUl6CCxlkn2m3HX+y73CcinS1g=
 =mDdc
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.123' into 5.4-2.3.x-imx

This is the 5.4.123 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-05-30 19:31:03 +00:00
Daniel Borkmann 3173c7c807 bpf: No need to simulate speculative domain for immediates
commit a7036191277f9fa68d92f2071ddc38c09b1e5ee5 upstream.

In 801c6058d14a ("bpf: Fix leakage of uninitialized bpf stack under
speculation") we replaced masking logic with direct loads of immediates
if the register is a known constant. Given in this case we do not apply
any masking, there is also no reason for the operation to be truncated
under the speculative domain.

Therefore, there is also zero reason for the verifier to branch off and
simulate this case; it only needs to do so for unknown but bounded scalars.
As a side-effect, this also enables a few test cases that were previously
rejected due to simulation under zero truncation.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-28 13:10:26 +02:00
Daniel Borkmann 2b3cc41d50 bpf: Fix mask direction swap upon off reg sign change
commit bb01a1bba579b4b1c5566af24d95f1767859771e upstream.

Masking direction as indicated via mask_to_left is considered to be
calculated once and then used to derive pointer limits. Thus, this
needs to be placed into bpf_sanitize_info instead so we can pass it
to sanitize_ptr_alu() call after the pointer move. Piotr noticed a
corner case where the off reg causes masking direction change which
then results in an incorrect final aux->alu_limit.

Fixes: 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic mask")
Reported-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-28 13:10:26 +02:00
Daniel Borkmann 2768f99622 bpf: Wrap aux data inside bpf_sanitize_info container
commit 3d0220f6861d713213b015b582e9f21e5b28d2e0 upstream.

Add a container structure struct bpf_sanitize_info which holds
the current aux info, and update call-sites to sanitize_ptr_alu()
to pass it in. This is needed for passing in additional state
later on.
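
The container is small (sketch; the mask_to_left member is the state
added by the mask-direction fix above):

	struct bpf_sanitize_info {
		struct bpf_insn_aux_data aux;
		bool mask_to_left;
	};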

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-28 13:10:26 +02:00
Andrey Zhizhikin d34d22f869 This is the 5.4.122 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCuHYEACgkQONu9yGCS
 aT6VkxAAr0kISsNHDXB1tDLsOiPsl+hvFQS7DNw8LRpBN8deqiOMrK2OGnXygAWE
 q6554BmS7sxJ9oSu4fL+fJpuSTdMPsDEGh5qFFXqdb0wjxw9QaHK1SPbLO0QRMmt
 OPH9tHtNKY9Udiu1ZXj/HHWkUf41VjBwIUpa1riP1ht7WCPPFAF0yeUjEMDB3ZNe
 m8CkSDWS6NqFQxQcYBWLTVufVSVyu+MkJS0t50KDQEZFv/12pSRllkJ3M/RdBxNV
 hvQTIFJr/3jPmk9Q5Vt0ZG2mKCtObYcboDxs5tfKVd03uErMqcchFMpL7DGXBBFx
 S77URkYra6nJvJJB533SiWYR3zKcihnl8eMmV4NqCTgR+pjf2G7MMEqxJhADvhGu
 wg5IGqMJID2p7nlkPZtod4pap3VY1zkotKdeTjUm6URnf5G9JkgdvqTUsCPQPuEm
 WIlEqziZSZxy3bj8mm88116+TyDDb7b9Hu0rz3qYYDOBon2r0uZ+SyfeSD76csnS
 ncEr2XVSlV12g2WQP/zB+ypLQ8YDJpYcyhAdNS2VQIFgjSxODBUEb76zYYqNHTQC
 PUrztFbbwJ/iH/SXQjzuRsRB3x4XwNCmRGwTMXTaZYot8ui9Ka4gDY+mZ8T0uTS8
 68sFCzb+M+zQf1i72s6Vp9dz5msymSbDQbIf79fJ3lbDA4oVXnA=
 =vILd
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmCuZ9cACgkQ7G51OISz
 Hs1wVBAAgGPvST+l20L4AS/D3NtPX8kwnk3JAZJ0YJjSPHfCkSnGmrjNc7PNmbxy
 uYXOlbUfscJU03Erski3VbSlqLHK7XVr3d4nIVY3ZISjL7iHa6xaRa5SONrwQHvT
 xUW9OcGgwxLdbBluLi8RmqZH0xf7Ds6uuQRZT5YN35/st7Qj5sKxKQc6IyTHceYb
 NklevN9xuQGC5PfPeWInRw1y0viSeANRwPlINwLif7vsAVgHoR4ju3Zie/mCIrnI
 6GMgMSbJWxWUZ7y2BEgTWRrNatA/eS6KG3Q0vjSzo42XKTTyl673Gq7GqSPqNzyY
 yjN5MJyi4G1jjOdkFWLZCRjXRssq6BhWJMbaXKYVm/vAN+oqecoaA0xBzUr80z6w
 ZEqM2j+f7hUwtjIiduDxdApjeaPXnMDINvdPMK2qLgnHDIJobaTLQODdvRRSeK5e
 LdxLcwnHno08EBeXUO2e2ttTzVolM0p4gr88AsR3Sdnu/HqJxND4bQiot9Nwywo+
 1d247qUjQjx4d3MspJ5Wn3zOE/sEnH4NgNepkUM/lJMwku5P7Dz9WGstPwSbAF4D
 /19k/c97jUi3cnJIsFBbwEx+bQFR6CAnVIZm/SvsOpOwasF6TcguFjEfdDO2DiV+
 VSClDYusmRUSexvRX0kiigZ9JcyBH4y4XnmAe52QypwI/uupvAU=
 =Lxcg
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.122' into 5.4-2.3.x-imx

This is the 5.4.122 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-05-26 15:23:00 +00:00
Zqiang 845c2b9d99 locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal
[ Upstream commit 3a010c493271f04578b133de977e0e5dd2848cea ]

When an interruptible mutex locker is interrupted by a signal without
acquiring the lock, it is removed from the wait queue. If the mutex
isn't contended enough to have another waiter put into the wait queue
again, the stale WAITER bit forces every subsequent locker into the
slowpath to acquire the lock. So if the wait queue is empty, the
WAITER bit needs to be cleared.
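
The removal helper then looks roughly like this (a sketch along the
lines of the upstream fix; helper names assumed):

	static void
	__mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter)
	{
		list_del(&waiter->list);
		/* Last waiter gone: clear the flags so future lockers
		 * can use the fastpath again.
		 */
		if (likely(list_empty(&lock->wait_list)))
			__mutex_clear_flag(lock, MUTEX_FLAGS);
	}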

Fixes: 040a0a3710 ("mutex: Add support for wound/wait style locks")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210517034005.30828-1-qiang.zhang@windriver.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26 12:05:15 +02:00
Oleg Nesterov 670d34d543 ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
[ Upstream commit dbb5afad100a828c97e012c6106566d99f041db6 ]

Suppose we have 2 threads, the group-leader L and a sub-thread T, both
parked in ptrace_stop(). The debugger tries to resume both threads and
does

	ptrace(PTRACE_CONT, T);
	ptrace(PTRACE_CONT, L);

If the sub-thread T execs in between, the 2nd PTRACE_CONT does not
resume the old leader L, it resumes the post-exec thread T which is
actually now stopped in PTRACE_EVENT_EXEC. In this case the
PTRACE_EVENT_EXEC event is lost, and the tracer can't know that the
tracee changed its pid.

This patch makes ptrace() fail in this case until the debugger does
wait() and consumes the PTRACE_EVENT_EXEC which reports old_pid. This
affects all ptrace requests except the "asynchronous"
PTRACE_INTERRUPT/KILL.

The patch doesn't add the new PTRACE_ option to not complicate the API,
and I _hope_ this won't cause any noticeable regression:

	- If the debugger uses PTRACE_O_TRACEEXEC and the thread did an exec
	  and the tracer does a ptrace request without having consumed
	  the exec event, it's 100% sure that the thread the ptracer
	  thinks it is targeting does not exist anymore, or isn't the
	  same as the one it thinks it is targeting.

	- To some degree this patch adds nothing new. In the scenario
	  above ptrace(L) can fail with -ESRCH if it is called after the
	  execing sub-thread wakes the leader up and before it "steals"
	  the leader's pid.

Test-case:

	#include <stdio.h>
	#include <unistd.h>
	#include <signal.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>
	#include <errno.h>
	#include <pthread.h>
	#include <assert.h>

	void *tf(void *arg)
	{
		execve("/usr/bin/true", NULL, NULL);
		assert(0);

		return NULL;
	}

	int main(void)
	{
		int leader = fork();
		if (!leader) {
			kill(getpid(), SIGSTOP);

			pthread_t th;
			pthread_create(&th, NULL, tf, NULL);
			for (;;)
				pause();

			return 0;
		}

		waitpid(leader, NULL, WSTOPPED);

		ptrace(PTRACE_SEIZE, leader, 0,
				PTRACE_O_TRACECLONE | PTRACE_O_TRACEEXEC);
		waitpid(leader, NULL, 0);

		ptrace(PTRACE_CONT, leader, 0,0);
		waitpid(leader, NULL, 0);

		int status, thread = waitpid(-1, &status, 0);
		assert(thread > 0 && thread != leader);
		assert(status == 0x80137f);

		ptrace(PTRACE_CONT, thread, 0,0);
		/*
		 * waitid() because waitpid(leader, &status, WNOWAIT) does not
		 * report status. Why ????
		 *
		 * Why WEXITED? because we have another kernel problem connected
		 * to mt-exec.
		 */
		siginfo_t info;
		assert(waitid(P_PID, leader, &info, WSTOPPED|WEXITED|WNOWAIT) == 0);
		assert(info.si_pid == leader && info.si_status == 0x0405);

		/* OK, it sleeps in ptrace(PTRACE_EVENT_EXEC == 0x04) */
		assert(ptrace(PTRACE_CONT, leader, 0,0) == -1);
		assert(errno == ESRCH);

		assert(leader == waitpid(leader, &status, WNOHANG));
		assert(status == 0x04057f);

		assert(ptrace(PTRACE_CONT, leader, 0,0) == 0);

		return 0;
	}

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Simon Marchi <simon.marchi@efficios.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pedro Alves <palves@redhat.com>
Acked-by: Simon Marchi <simon.marchi@efficios.com>
Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26 12:05:15 +02:00
Andrey Zhizhikin 2cbb55e591 This is the 5.4.120 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCkyEcACgkQONu9yGCS
 aT70Qg//Rv09McvLQ+8E0OilJ7TdT0UthXQFP+uPTu+/HPeHQkCO168cn1hbwD9K
 i0YfFYB7PqPe/wccHNsmWHSUYCzA9NnwExA84/jofjswkEMMc95x/bow5/xmLe/5
 ImkjODPVHuQWewgMfbSmNu7Br4wmQC5U/K4r7hp/Aa0FdTjcHMI6Zw40FGbJrWmq
 kiqhW9CeagKbxWrihQNLrSB4E5CdpNNkug/zVus2n9jlFT4tltNGSd7bPsxrp7LN
 EdTfayyPUVZeCoysTNA0WZgz47f+z47vAdIlDHzWCIOZcM1RnJXKA5kFXRf8Fnfa
 +hyvaHSDqYGdRgZxYMcXLL+/cS4foQ/8iQxZBCMomABM0MNUuoJ5tYR6GVetlRcR
 46ZC/5OAvNoKY2Kj4Ky4ROF7aMR3NkYCY6wHUVRcw8778bmuReeLJJPsWojAI+4F
 pWT08+7OUJYb3hRnGxxzKot6CPztdkpQXfXMy+wyNlNbRZ/ivs9/f/GhdblXy/6T
 j12LKIh1IOxpB/wi7GRfeABUuC4MU8xqx6FuPDrBgCTMfVig/wcwF27AUr//a0F5
 xrrzCrDFNAvuyD1WyYilaxWDHAe2o9ROT0JZ4VB3zu40w2VlTT77aqA174xfQa6b
 418Eykw3O11dmsY8AQPTt1HhkDCiewEe4K58CJcmCNEf/inFbvI=
 =kNQc
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmCk3VMACgkQ7G51OISz
 Hs0Y+A/9HqeZOFBPE3BlnEKkdxEPp+zoOj/57pWlS+Xr8ySalZXonAAC975voKjS
 6IFejLkN83q5yCrwYBJ/PCc5Xm9deBf+6AE3KyaNV7CD5ycSBy8o9Fx7IH81wFXt
 quiVNlpi7iGiWsm3ubsvDHNgHwQlvN6mbNp+RIqD4YllheUsTXeR05rrfr+TauXv
 Bv7LEolcyUJo0LTzqPsTlVFG9ec+0qWZb1A7LgOZJFWrS/XaYpoXA49KaNbIZIPj
 jb+sWzTsXhOMc2yB1A6P8T3jsx7NxK4Y2rRyCk0l5Hm//Jl7Kcx+vbe0yQPf2DOs
 c0X+0s8Vwk/Ry606XLcy0nIIGePyj++u43r6noB/cN/LnZUbJY3zbywUNvrY1thS
 YG2MtSiV9TzzKP4xApPF7G/4G6H4HOaC1cQbxc3+7sklJf7Coh3pKv775POIfnLS
 UrQ2ToQRNWQC4/l/O9h9BYiIPWSAAUd0uBgZzcdQWeWbdPXMdb02cwZ5YuPoZgSV
 aLsDN8naaQlsZ20wdQ0DaqKoIryHnD0XHNSMkEvZ0bjPxOOtICekixLJQxjwFBQL
 cx3stcQs5PT8l3PQqc2xnVNTqnL7yWGdMYdfNE+4qTLSiw2s5uXbb7BJwttj9Pit
 QKVSVUtp12ObX1kJzqhKa9GQEUPYREVBpN7NwJL2E5hVObfT6OE=
 =ruSa
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.120' into 5.4-2.3.x-imx

This is the 5.4.120 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-05-19 09:41:35 +00:00
Jia-Ju Bai a8cfa7aff1 kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
[ Upstream commit 31d82c2c787d5cf65fedd35ebbc0c1bd95c1a679 ]

When vzalloc() returns NULL for sha_regions, kexec_calculate_store_digests()
returns without assigning an error code. To fix this bug, ret is
assigned -ENOMEM in this case.
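
The shape of the fix is roughly (a sketch; names assumed from
kernel/kexec_file.c):

	sha_regions = vzalloc(sha_region_sz);
	if (!sha_regions) {
		ret = -ENOMEM;
		goto out_free_desc;
	}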

Link: https://lkml.kernel.org/r/20210309083904.24321-1-baijiaju1990@gmail.com
Fixes: a43cac0d9d ("kexec: split kexec_file syscall code to kexec_file.c")
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:28 +02:00
Odin Ugedal 043ebbccdd sched/fair: Fix unfairness caused by missing load decay
[ Upstream commit 0258bdfaff5bd13c4d2383150b7097aecd6b6d82 ]

This fixes an issue where old load on a cfs_rq is not properly decayed,
resulting in strange behavior where fairness can decrease drastically.
Real workloads with equally weighted control groups have ended up
getting a respective 99% and 1%(!!) of cpu time.

When an idle task is attached to a cfs_rq by attaching a pid to a cgroup,
the old load of the task is attached to the new cfs_rq and sched_entity by
attach_entity_cfs_rq. If the task is then moved to another cpu (and
therefore cfs_rq) before being enqueued/woken up, the load will be moved
to cfs_rq->removed from the sched_entity. Such a move will happen when
enforcing a cpuset on the task (e.g. via a cgroup) that forces it to move.

The load will however not be removed from the task_group itself, making
it look like there is a constant load on that cfs_rq. This causes the
vruntime of tasks on other sibling cfs_rq's to increase faster than they
are supposed to, causing severe fairness issues. If no other task is
started on the given cfs_rq (and due to the cpuset, none would be),
this load would never be properly removed. With this patch the load
will be properly removed inside update_blocked_averages. This also
applies to tasks moved to the fair scheduling class and moved to another
cpu, and this path will also fix that. For fork, the entity is queued
right away, so this problem does not affect that.

This applies to cases where the new process is the first in the cfs_rq,
an issue introduced in 3d30544f02 ("sched/fair: Apply more PELT fixes"),
and to cases where there has previously been load on the cgroup but the
cgroup was removed from the leaf list due to having a null PELT load,
introduced in 039ae8bcf7 ("sched/fair: Fix O(nr_cgroups) in the load
balancing path").

For a simple cgroup hierarchy (as seen below) with two equally weighted
groups, that in theory should get 50/50 of cpu time each, it often leads
to a load of 60/40 or 70/30.

parent/
  cg-1/
    cpu.weight: 100
    cpuset.cpus: 1
  cg-2/
    cpu.weight: 100
    cpuset.cpus: 1

If the hierarchy is deeper (as seen below), while keeping cg-1 and cg-2
equally weighted, they should still get a 50/50 balance of cpu time.
This however sometimes results in a balance of 10/90 or 1/99(!!) between
the task groups.

$ ps u -C stress
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       18568  1.1  0.0   3684   100 pts/12   R+   13:36   0:00 stress --cpu 1
root       18580 99.3  0.0   3684   100 pts/12   R+   13:36   0:09 stress --cpu 1

parent/
  cg-1/
    cpu.weight: 100
    sub-group/
      cpu.weight: 1
      cpuset.cpus: 1
  cg-2/
    cpu.weight: 100
    sub-group/
      cpu.weight: 10000
      cpuset.cpus: 1

This can be reproduced by attaching an idle process to a cgroup and
moving it to a given cpuset before it wakes up. The issue is evident in
many (if not most) container runtimes, and has been reproduced
with both crun and runc (and therefore docker and all its "derivatives"),
and with both cgroup v1 and v2.

Fixes: 3d30544f02 ("sched/fair: Apply more PELT fixes")
Fixes: 039ae8bcf7 ("sched/fair: Fix O(nr_cgroups) in the load balancing path")
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210501141950.23622-2-odin@uged.al
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:28 +02:00
Quentin Perret 687f523c13 sched: Fix out-of-bound access in uclamp
[ Upstream commit 6d2f8909a5fabb73fe2a63918117943986c39b6c ]

Util-clamp places tasks in different buckets based on their clamp values
for performance reasons. However, the size of buckets is currently
computed using a rounding division, which can lead to an off-by-one
error in some configurations.

For instance, with 20 buckets, the bucket size will be 1024/20=51. A
task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly,
the correct indexes are in the range [0,19], hence leading to an
out-of-bounds memory access.

Clamp the bucket id to fix the issue.
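
That is, roughly (a sketch, assuming the upstream helper and constant
names):

	static inline unsigned int uclamp_bucket_id(unsigned int clamp_value)
	{
		/* With e.g. 20 buckets, 1024 / UCLAMP_BUCKET_DELTA can
		 * yield 20, one past the last valid index; clamp it.
		 */
		return min_t(unsigned int,
			     clamp_value / UCLAMP_BUCKET_DELTA,
			     UCLAMP_BUCKETS - 1);
	}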

Fixes: 69842cba9a ("sched/uclamp: Add CPU's clamp buckets refcounting")
Suggested-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20210430151412.160913-1-qperret@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:28 +02:00
Andrey Zhizhikin 6602fc5788 This is the 5.4.119 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCeKsMACgkQONu9yGCS
 aT4cYhAA0qDTHscvm641m/Dv4U9w3gWh2Fs8oPz43+nJ1/8CTrT/gSWA7IRDDHiV
 Dys2canDVLNYTEx1TqwmHbN3R+nvQTpdz2wuJuSf7GKYQj0n3S99BEN6uxod+puu
 /M7apBH5npjZKv1DMRUrQ/AUGVUuBQqtN7Hl5hEL8ibI/bsZV8+dhJJ8c8uyJpam
 peiP5n2lCz5HZ/K5OyEy1jCmWQLIcRN59SmiARy/xk739igoCMUajkY1mV0WVyks
 SKnZEP7tY1mLLYzpW/ZVSkXurx+ZtL1zUctRt5dh5US4uzNt/sfm8oDzyzvGojd/
 iWtXefprJXbI9BGyNaBwwNmzjSXabSkoI75wExxsMQKFZpsq12pz97dwy/pZyU+c
 NlzbmDQg8+Cs9dKsDw6jUXHYSJ9fb4mk6GOF9u0LXgyq1f15/DzjdzkYLXZ3tTOK
 exVFs/CKz7Dg6npdO5kl7mg18AxmVH+OJftltF2+MbUohBs2vRDRr+O4cY8Wlc2Q
 AF85uAE3Mo/yL1pi6O7lMW4ic5yJvTRCX/iPsxyDU8LvxM1Kc7u9CzykX3M0WFLz
 TsKxfPQvoc6WGf8IWy4j1nXMzXQTHL/6CrfzOSTFngR8eqcsbgU0nkKpZNEtvnxN
 k30ID+Mcl4B6k6XTECNJUXjcwg+TR+XtKOjwXAIDaVqwoW759BY=
 =25pJ
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmCe4QMACgkQ7G51OISz
 Hs1JdQ//aC82eBW+TH85PNlVaYn7DL8cU6zE/mX4dofughg8AIg84v0v30rRGF3m
 c0bxBhwcah+4TIPzWSs8QAT7pEcvR3Gj5JZrEeIuyXBwKhRSXgZJKbV4XAE9Z23W
 1Etxwh28NfLbLEnX6qJnejQhHckt4MK9j4W5Gg2L7o6TrZwB7gLjNPfck6t7DnJc
 9Fo4TxwgP+BFu30s9h9eQqv/EIVJPRfiAVgvK4o1CCEHya8bfnJgJDgyxKbdyBg6
 IXYfYWGVe26VCR7PreEFJ3iheVVlI9VicFfhRzbsCdYQYYNqa22VfyMFrOO1nyMO
 SPAQnDRqlpvXIUHZ70puDX9QYji2RhVjf6fnDoQRPlkDaKumO3+zdk8bcCJFk7kF
 f7BPc8babvAn4WeG2hQOlPSKf+Mcc5mFmh08bsb8y9/w2mNPhg8UkCsaymggabkP
 GK/MvaBGxSoV8ft+EInyRLN3qlLrinYZDOOTj40Pd/d3HrJQCcShXzOI4SRd73MU
 +ISKHU9BrsRvVQ5ICjVTRfFMcFDJUt60leVM+QGm6MxQmgAuW/Scy9bNb9sDUyJd
 6TUI2stzAxDRrjz8JDIXFyUImC3juL1c4WnrpP5I6ZWIeXIpVlQAH1Wxg18Dj0DC
 1W5Fb2cicRRLfSPBgjppeipChm9ecT6Fl+3n7kBo5tvSdiiJEo4=
 =Wb4x
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.119' into 5.4-2.3.x-imx

This is the 5.4.119 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-05-14 20:43:43 +00:00
Arnd Bergmann 32e046965f smp: Fix smp_call_function_single_async prototype
commit 1139aeb1c521eb4a050920ce6c64c36c4f2a3ab7 upstream.

As of commit 966a967116 ("smp: Avoid using two cache lines for struct
call_single_data"), the smp code prefers 32-byte aligned call_single_data
objects for performance reasons, but the block layer includes an instance
of this structure in the main 'struct request' that is more sensitive
to size than to performance here, see 4ccafe0320 ("block: unalign
call_single_data in struct request").

The result is a violation of the calling conventions that clang correctly
points out:

block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
                smp_call_function_single_async(cpu, &rq->csd);

It does seem that the usage of the call_single_data without cache line
alignment should still be allowed by the smp code, so just change the
function prototype so it accepts both, but leave the default alignment
unchanged for the other users. This seems better to me than adding
a local hack to shut up an otherwise correct warning in the caller.
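
The resulting prototype is roughly (a sketch; the exact upstream
signature is assumed):

	/* Was: int smp_call_function_single_async(int cpu, call_single_data_t *csd);
	 * call_single_data_t is the 32-byte aligned typedef. Taking the
	 * underlying struct lets both aligned and unaligned instances
	 * (such as the one embedded in struct request) be passed.
	 */
	int smp_call_function_single_async(int cpu, struct __call_single_data *csd);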

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lkml.kernel.org/r/20210505211300.3174456-1-arnd@kernel.org
[nc: Fix conflicts]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14 09:44:33 +02:00
Waiman Long 279749d0d4 sched/debug: Fix cgroup_path[] serialization
[ Upstream commit ad789f84c9a145f8a18744c0387cec22ec51651e ]

The handling of a sysrq key can be activated by echoing the key to
/proc/sysrq-trigger or via the magic key sequence typed into a terminal
that is connected to the system in some way (serial, USB or other
means). In the former case, the handling is done in a user context. In
the latter case, it is likely to be in an interrupt context.

Currently in print_cpu() of kernel/sched/debug.c, sched_debug_lock is
taken with interrupts disabled for the whole duration of the calls to
print_*_stats() and print_rq(), which could last for quite some time if
the information dump happens on the serial console.

If the system has many cpus and the sched_debug_lock is somehow busy
(e.g. parallel sysrq-t), the system may hit a hard lockup panic
depending on the actual serial console implementation of the
system.

The purpose of sched_debug_lock is to serialize the use of the global
cgroup_path[] buffer in print_cpu(). The rest of the printk calls don't
need serialization from sched_debug_lock.

Calling printk() with interrupt disabled can still be problematic if
multiple instances are running. Allocating a stack buffer of PATH_MAX
bytes is not feasible because of the limited size of the kernel stack.

The solution implemented in this patch is to allow only one caller at a
time to use the full-size group_path[], while other simultaneous callers
will have to use shorter stack buffers with the possibility of path
name truncation. A "..." suffix will be printed if truncation may have
happened. The cgroup path name is provided for informational purposes
only, so occasional path name truncation should not be a big problem.
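
The printing path then takes roughly this form (a sketch of the
pattern; not the exact upstream macro):

	if (spin_trylock(&sched_debug_lock)) {
		/* Sole user of the full-size global buffer. */
		task_group_path(tg, group_path, sizeof(group_path));
		SEQ_printf(m, fmt, group_path);
		spin_unlock(&sched_debug_lock);
	} else {
		/* Contended: fall back to a short stack buffer and
		 * overwrite its tail with "..." to mark possible
		 * truncation.
		 */
		char buf[128];
		char *bufend = buf + sizeof(buf) - 3;

		task_group_path(tg, buf, bufend - buf);
		strcpy(bufend - 1, "...");
		SEQ_printf(m, fmt, buf);
	}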

Fixes: efe25c2c7b ("sched: Reinstate group names in /proc/sched_debug")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210415195426.6677-1-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14 09:44:26 +02:00
Andrey Zhizhikin ba4e63325c This is the 5.4.118 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCacuMACgkQONu9yGCS
 aT7jXQ/+N95y28rkW+9aG33bMKwodiGO3pax1ZT59SwVICDQQQhK6zXmsVtWP3hv
 oaDqbfN+ap/Ms0dARSxhq4NxtGc1RX8Jv+0XJ0nJ10JkJqAizNwglhtfA4NDAeB1
 w0M4b6vYYpotjReo86ZB8SC870eKUIocJKiayksIvgOTJewvq+4qDqn3h6VKdV3s
 p9Gxjz/8l2koGfUix+lPvPRx2c7juw49Nje0fWQzfHYUwtOYn8s7e6NZxtIJtYtq
 F80lqdXjGAXkUCf1omW+6TifSUPfmx1aPgOPBiP8WBlNwJ8hvsq6s+2MGdC+0PkZ
 4UPTllSe/Q2g1xbO67yFHNYFYE4PKojZ8NKvJXcp5nvBDNpbiefaRROM7PbkQQmm
 p1Bayy39Hlsmxb6/d/9HOANOZZeCaF1PchaLviwfkrq64U/Yg2csFHl/uX71fJoT
 RchzeLRWPCqN91Bm5tgUeBGibqNsfkZNzfbiOEGN7MzZNsU3BZm0KbKpqnXzSvgG
 6guZD1m4cjmyT7BzRsSremecIn9n8TmxT/lutAGtUi8TWodWBc3kvtxe3/xBILQ1
 MOWhBIhO9/2HAjJ+h/GIFGOrwhGtFmA5x1gGXOSE+Kkxx1jUiPE9zvPFQrgYrdAQ
 yL25fPyfNO5MTUC2rEF7s0hW5dWbcL7H8r8ZbXSh2oaUokn+a00=
 =FHFi
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmCaq/0ACgkQ7G51OISz
 Hs10mA//ci+tIUNzwXtWW5LAW5gpapDK3vVLwT4e1hrjeHFlHhIAVfRX2CcpLSoj
 dgpQOFKFKJ3q30GzkJSG0eniurqYNBz2T7N0bozPcKIvvbUVF7JnZECfizC24+kd
 GcjSUx15vwnHhYbg7cwXb8kkVjzjL/pfGSdlKsUZ5q2dcbIxWH8XVt/DKGaQyq+G
 zRH46Wc4TLQim4W5nsiNiSZw9L2kj+GTrFFJttG/+/K7OMMXrjiRdpUM90ufUqJm
 OPRn52myz7R9+SygQ2++5pSXIws2vAp6xd6CSQ+uXF7kxiDGCs+W/+v6TDr/8a4K
 haONfcWqcnc4IXLkr7m6nCQCqowBRzK6gXYlocWbsfJDBoQw8dzQQbe2EY8wWxwe
 qjLA9/Y/cR9YKinSdilex9Wixt1S0tsWZVWY4wvsztH24nM1l/jLAxw8ebaoTxv6
 5thEsBmZCCJLP7gvqPh0bXyIGki03yZGpa/7D7cU8zNLQTOpY3rHfWzK518mRzrG
 LDIM9YfW3k/4evN8t5pJEbO+R9oeWyvQV3Ka48fttOYlyy5AMqxlZSEFwypjTz2N
 hE0x1gBV2l9k6kxErkQs/IWNeOO4xr53TNgepcMis0ADsTLv3J0kGv0tC4Ifvdwf
 Sl3+TXovEeCcUYSCFCUviPpdSziGQh+w9XxYeDfC7zbNKVHQoqk=
 =zVUy
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.118' into 5.4-2.3.x-imx

This is the 5.4.118 stable release

Conflicts (manual resolve):
- drivers/mmc/core/core.c:
- drivers/mmc/core/host.c:
Fix merge fuzz for upstream commit 909a01b951 ("mmc: core: Fix hanging
on I/O during system suspend for removable cards")

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-05-11 16:08:17 +00:00
Steven Rostedt (VMware) c64da3294a tracing: Restructure trace_clock_global() to never block
commit aafe104aa9096827a429bc1358f8260ee565b7cc upstream.

It was reported that a fix to the ring buffer recursion detection would
cause a hung machine when performing suspend / resume testing. The
following backtrace was extracted from debugging that case:

Call Trace:
 trace_clock_global+0x91/0xa0
 __rb_reserve_next+0x237/0x460
 ring_buffer_lock_reserve+0x12a/0x3f0
 trace_buffer_lock_reserve+0x10/0x50
 __trace_graph_return+0x1f/0x80
 trace_graph_return+0xb7/0xf0
 ? trace_clock_global+0x91/0xa0
 ftrace_return_to_handler+0x8b/0xf0
 ? pv_hash+0xa0/0xa0
 return_to_handler+0x15/0x30
 ? ftrace_graph_caller+0xa0/0xa0
 ? trace_clock_global+0x91/0xa0
 ? __rb_reserve_next+0x237/0x460
 ? ring_buffer_lock_reserve+0x12a/0x3f0
 ? trace_event_buffer_lock_reserve+0x3c/0x120
 ? trace_event_buffer_reserve+0x6b/0xc0
 ? trace_event_raw_event_device_pm_callback_start+0x125/0x2d0
 ? dpm_run_callback+0x3b/0xc0
 ? pm_ops_is_empty+0x50/0x50
 ? platform_get_irq_byname_optional+0x90/0x90
 ? trace_device_pm_callback_start+0x82/0xd0
 ? dpm_run_callback+0x49/0xc0

With the following RIP:

RIP: 0010:native_queued_spin_lock_slowpath+0x69/0x200

Since the fix to the recursion detection would allow a single recursion
to happen while tracing, this led to trace_clock_global() taking a spin
lock and then trying to take it again:

ring_buffer_lock_reserve() {
  trace_clock_global() {
    arch_spin_lock() {
      queued_spin_lock_slowpath() {
        /* lock taken */
        (something else gets traced by function graph tracer)
          ring_buffer_lock_reserve() {
            trace_clock_global() {
              arch_spin_lock() {
                queued_spin_lock_slowpath() {
                /* DEAD LOCK! */

Tracing should *never* block, as it can lead to strange lockups like the
above.

Restructure the trace_clock_global() code so that, instead of simply
taking a lock to update the recorded "prev_time", it simply uses it: for
two events happening on two different CPUs that call this at the same
time, it really doesn't matter which one goes first. Use a trylock to
grab the lock for updating prev_time, and if it fails, simply try again
the next time. If the lock could not be taken, that means something
else is already updating it.
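
The updated path looks roughly like this (a sketch; field names assumed
from kernel/trace/trace_clock.c):

	/* Tracing can cause strange recursion, always use a trylock. */
	if (arch_spin_trylock(&trace_clock_struct.lock)) {
		/* Keep the clock monotonic: never return a time before
		 * the recorded prev_time.
		 */
		prev_time = READ_ONCE(trace_clock_struct.prev_time);
		if ((s64)(now - prev_time) < 0)
			now = prev_time;

		trace_clock_struct.prev_time = now;
		arch_spin_unlock(&trace_clock_struct.lock);
	}
	/* On trylock failure, someone else is updating prev_time. */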

Link: https://lkml.kernel.org/r/20210430121758.650b6e8a@gandalf.local.home

Cc: stable@vger.kernel.org
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Fixes: b02414c8f045 ("ring-buffer: Fix recursion protection transitions between interrupt context") # started showing the problem
Fixes: 14131f2f98 ("tracing: implement trace_clock_*() APIs") # where the bug happened
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212761
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:04:18 +02:00
Steven Rostedt (VMware) 0834094c9a tracing: Map all PIDs to command lines
commit 785e3c0a3a870e72dc530856136ab4c8dd207128 upstream.

The default max PID is set by PID_MAX_DEFAULT, and the tracing
infrastructure uses this number to map PIDs to the comm names of the
tasks, so that the output of the trace can show names for the PIDs
recorded in the ring buffer. This mapping is also exported to user
space via the "saved_cmdlines" file in the tracefs directory.

But currently the mapping expects the PIDs to be less than
PID_MAX_DEFAULT, which is the default maximum and not the real maximum.
Recently, systemd has started increasing the maximum value of a PID on
the system, and when a traced task has a PID higher than
PID_MAX_DEFAULT, its comm is not recorded. This causes the entire trace
to have "<...>" as the comm name, which is pretty useless.

Instead, keep the mapping array at PID_MAX_DEFAULT in size, but instead
of just mapping the index to the comm, map a mask of the PID
(PID_MAX_DEFAULT - 1) to the comm, and find the full PID from the
map_cmdline_to_pid array (that already exists).
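
The lookup then becomes roughly (a sketch; array and helper names
assumed from kernel/trace/trace.c):

	tpid = pid & (PID_MAX_DEFAULT - 1);
	map = savedcmd->map_pid_to_cmdline[tpid];
	if (map != NO_CMDLINE_MAP) {
		/* Verify the slot still belongs to this full PID. */
		tpid = savedcmd->map_cmdline_to_pid[map];
		if (tpid == pid) {
			strlcpy(comm, get_saved_cmdlines(map), TASK_COMM_LEN);
			return;
		}
	}
	strcpy(comm, "<...>");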

This bug goes back to the beginning of ftrace, but hasn't been an issue
until user space started increasing the maximum value of PIDs.

Link: https://lkml.kernel.org/r/20210427113207.3c601884@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: bc0c38d139 ("ftrace: latency tracer infrastructure")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:04:18 +02:00
Masahiro Yamada 79c95130a5 kbuild: update config_data.gz only when the content of .config is changed
commit 46b41d5dd8019b264717978c39c43313a524d033 upstream.

If the timestamp of the .config file is updated, config_data.gz is
regenerated, then vmlinux is re-linked. This occurs even if the content
of the .config has not changed at all.

This issue was mitigated by commit 67424f61f8 ("kconfig: do not write
.config if the content is the same"); Kconfig does not update the
.config when it ends up with the identical configuration.

The issue remains when the .config is created by *_defconfig with
some config fragment(s) applied on top.

This is typical for powerpc and mips, where several *_defconfig targets
are constructed by using merge_config.sh.

One workaround is to keep a copy of the .config. The filechk rule
updates the copy, kernel/config_data, by checking the content instead
of the timestamp.

With this commit, the second run with the same configuration avoids
the needless rebuilds.

  $ make ARCH=mips defconfig all
   [ snip ]
  $ make ARCH=mips defconfig all
  *** Default configuration is based on target '32r2el_defconfig'
  Using ./arch/mips/configs/generic_defconfig as base
  Merging arch/mips/configs/generic/32r2.config
  Merging arch/mips/configs/generic/el.config
  Merging ./arch/mips/configs/generic/board-boston.config
  Merging ./arch/mips/configs/generic/board-ni169445.config
  Merging ./arch/mips/configs/generic/board-ocelot.config
  Merging ./arch/mips/configs/generic/board-ranchu.config
  Merging ./arch/mips/configs/generic/board-sead-3.config
  Merging ./arch/mips/configs/generic/board-xilfpga.config
  #
  # configuration written to .config
  #
    SYNC    include/config/auto.conf
    CALL    scripts/checksyscalls.sh
    CALL    scripts/atomic/check-atomics.sh
    CHK     include/generated/compile.h
    CHK     include/generated/autoksyms.h

Reported-by: Elliot Berman <eberman@codeaurora.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:04:16 +02:00
Thomas Gleixner 8d2be04dbb Revert 337f13046f ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
commit 4fbf5d6837bf81fd7a27d771358f4ee6c4f243f8 upstream.

The FUTEX_WAIT operation has historically had a relative timeout, which
means that the clock id is irrelevant, as relative timeouts on
CLOCK_REALTIME are not subject to wall clock changes and are therefore
mapped by the kernel to CLOCK_MONOTONIC for simplicity.

If a caller sets FUTEX_CLOCK_REALTIME for FUTEX_WAIT, the timeout is
still treated as relative vs. CLOCK_MONOTONIC, but the wait then arms
that timeout based on CLOCK_REALTIME, which is broken and has obviously
never been used or even tested.

Reject any attempt to use FUTEX_CLOCK_REALTIME with FUTEX_WAIT again.

The desired functionality can be achieved with FUTEX_WAIT_BITSET and a
FUTEX_BITSET_MATCH_ANY argument.
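
The check at syscall entry then takes roughly this shape (a sketch,
restoring the pre-337f13046f behaviour; flag names assumed):

	if (op & FUTEX_CLOCK_REALTIME) {
		flags |= FLAGS_CLOCKRT;
		/* FUTEX_WAIT uses a relative timeout and is rejected
		 * here again; use FUTEX_WAIT_BITSET for absolute
		 * CLOCK_REALTIME waits.
		 */
		if (cmd != FUTEX_WAIT_BITSET &&
		    cmd != FUTEX_WAIT_REQUEUE_PI)
			return -ENOSYS;
	}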

Fixes: 337f13046f ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210422194704.834797921@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:04:16 +02:00