linux-brain/arch
Michael Ellerman 5c7c5c49d9 powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()
commit 7c6986ade69e3c81bac831645bc72109cd798a80 upstream.

In raise_backtrace_ipi() we iterate through the cpumask of CPUs, sending
each an IPI asking them to do a backtrace, but we don't wait for the
backtrace to happen.

We then iterate through the CPU mask again, and if any CPU hasn't done
the backtrace and cleared itself from the mask, we print a trace on its
behalf, noting that the trace may be "stale".

This works well enough when a CPU is not responding, because in that
case it doesn't receive the IPI and the sending CPU is left to print the
trace. But when all CPUs are responding we are left with a race between
the sending and receiving CPUs, if the sending CPU wins the race then it
will erroneously print a trace.

This leads to spurious "stale" traces from the sending CPU, which can
then be interleaved messily with the receiving CPU, note the CPU
numbers, eg:

  [ 1658.929157][    C7] rcu: Stack dump where RCU GP kthread last ran:
  [ 1658.929223][    C7] Sending NMI from CPU 7 to CPUs 1:
  [ 1658.929303][    C1] NMI backtrace for cpu 1
  [ 1658.929303][    C7] CPU 1 didn't respond to backtrace IPI, inspecting paca.
  [ 1658.929362][    C1] CPU: 1 PID: 325 Comm: kworker/1:1H Tainted: G        W   E     5.13.0-rc2+ #46
  [ 1658.929405][    C7] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 325 (kworker/1:1H)
  [ 1658.929465][    C1] Workqueue: events_highpri test_work_fn [test_lockup]
  [ 1658.929549][    C7] Back trace of paca->saved_r1 (0xc0000000057fb400) (possibly stale):
  [ 1658.929592][    C1] NIP:  c00000000002cf50 LR: c008000000820178 CTR: c00000000002cfa0

To fix it, change the logic so that the sending CPU waits 5s for the
receiving CPU to print its trace. If the receiving CPU prints its trace
successfully then the sending CPU just continues, avoiding any spurious
"stale" trace.

This has the added benefit of allowing all CPUs to print their traces in
order and avoids any interleaving of their output.

Fixes: 5cc05910f2 ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")
Cc: stable@vger.kernel.org # v4.18+
Reported-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210625140408.3351173-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14 16:53:08 +02:00
..
alpha alpha: fix annotation of io{read,write}{16,32}be() 2020-08-26 10:40:58 +02:00
arc ARCv2: save ABI registers across signal handling 2021-06-23 14:41:29 +02:00
arm ARM: dts: at91: sama5d4: fix pinctrl muxing 2021-07-14 16:53:02 +02:00
arm64 arm64: link with -z norelro for LLD or aarch64-elf 2021-06-30 08:47:44 -04:00
c6x mm: consolidate pgtable_cache_init() and pgd_cache_init() 2019-09-24 15:54:09 -07:00
csky csky: change a Kconfig symbol name to fix e1000 build error 2021-04-28 13:19:16 +02:00
h8300 h8300: fix PREEMPTION build, TI_PRE_COUNT undefined 2021-02-17 10:35:18 +01:00
hexagon hexagon: define ioremap_uc 2020-05-10 10:31:31 +02:00
ia64 tweewide: Fix most Shebang lines 2021-05-22 11:38:30 +02:00
m68k m68k: mvme147,mvme16x: Don't wipe PCC timer config bits 2021-05-14 09:44:19 +02:00
microblaze microblaze: Prevent the overflow of the start 2020-02-24 08:37:02 +01:00
mips MIPS: generic: Update node names to avoid unit addresses 2021-06-30 08:47:44 -04:00
nds32 nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff 2021-04-14 08:24:10 +02:00
nios2 nios2 update for v5.4-rc1 2019-09-27 13:02:19 -07:00
openrisc openrisc: Define memory barrier mb 2021-06-03 08:59:11 +02:00
parisc parisc: avoid a warning on u8 cast for cmpxchg on u8 pointers 2021-04-14 08:24:11 +02:00
powerpc powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi() 2021-07-14 16:53:08 +02:00
riscv riscv: Use -mno-relax when using lld linker 2021-06-18 09:58:58 +02:00
s390 s390/stack: fix possible register corruption with stack switch helper 2021-07-11 12:52:07 +02:00
sh sh: dma: fix kconfig dependency for G2_DMA 2021-01-27 11:47:52 +01:00
sparc sparc64: Fix opcode filtering in handling of no fault loads 2021-03-30 14:35:22 +02:00
um um: Disable CONFIG_GCOV with MODULES 2021-05-22 11:38:28 +02:00
unicore32 mm: treewide: clarify pgtable_page_{ctor,dtor}() naming 2019-09-26 10:10:44 -07:00
x86 KVM: SVM: Call SEV Guest Decommission if ASID binding fails 2021-07-11 12:52:07 +02:00
xtensa xtensa: move coprocessor_flush to the .text section 2021-04-07 14:47:42 +02:00
.gitignore
Kconfig Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS" 2020-12-30 11:51:47 +01:00