linux-brain/kernel/rcu
Sergey Senozhatsky 28996dbb8a rcu/tree: Handle VM stoppage in stall detection
[ Upstream commit ccfc9dd6914feaa9a81f10f9cce56eb0f7712264 ]

The soft watchdog timer function checks if a virtual machine
was suspended and hence what looks like a lockup in fact
is a false positive.

This is what kvm_check_and_clear_guest_paused() does: it
tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
and if it's set then we need to touch all watchdogs and bail
out.

Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
check works fine.

There is, however, one more watchdog that runs from IRQ, so
watchdog timer fn races with it, and that watchdog is not aware
of PVCLOCK_GUEST_STOPPED - RCU stall detector.

apic_timer_interrupt()
 smp_apic_timer_interrupt()
  hrtimer_interrupt()
   __hrtimer_run_queues()
    tick_sched_timer()
     tick_sched_handle()
      update_process_times()
       rcu_sched_clock_irq()

This triggers RCU stalls on our devices during VM resume.

If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
then there is nothing on this VCPU that touches watchdogs and
RCU reads stale gp stall timestamp and new jiffies value, which
makes it think that RCU has stalled.

Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
don't report RCU stalls when we resume the VM.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15 09:47:26 +02:00
..
Kconfig rcu: Use CONFIG_PREEMPTION 2019-07-31 19:03:35 +02:00
Kconfig.debug rcu: Add support for consolidated-RCU reader checking 2019-08-09 11:00:35 -07:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
rcu.h srcu: Fix broken node geometry after early ssp init 2021-07-20 16:10:41 +02:00
rcu_segcblist.c rcu/nocb: Add bypass callback queueing 2019-08-13 14:37:32 -07:00
rcu_segcblist.h rcu/nocb: Add bypass callback queueing 2019-08-13 14:37:32 -07:00
rcuperf.c rcuperf: Make rcuperf kernel test more robust for !expedited mode 2019-08-01 14:30:22 -07:00
rcutorture.c rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed 2019-08-13 14:38:24 -07:00
srcutiny.c srcu: Remove cleanup_srcu_struct_quiesced() 2019-03-26 14:39:24 -07:00
srcutree.c srcu: Fix broken node geometry after early ssp init 2021-07-20 16:10:41 +02:00
sync.c rcu/sync: Simplify the state machine 2019-05-28 09:05:23 -07:00
tiny.c rcu: rcu_qs -- Use raise_softirq_irqoff to not save irqs twice 2019-03-26 14:37:49 -07:00
tree.c srcu: Fix broken node geometry after early ssp init 2021-07-20 16:10:41 +02:00
tree.h rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed 2019-08-13 14:38:24 -07:00
tree_exp.h rcu: Allow only one expedited GP to run concurrently with wakeups 2020-03-05 16:43:50 +01:00
tree_plugin.h rcu/nocb: Perform deferred wake up before last idle's need_resched() check 2021-03-04 10:26:47 +01:00
tree_stall.h rcu/tree: Handle VM stoppage in stall detection 2021-09-15 09:47:26 +02:00
update.c Merge branches 'consolidate.2019.08.01b', 'fixes.2019.08.12a', 'lists.2019.08.13a' and 'torture.2019.08.01b' into HEAD 2019-08-13 14:30:30 -07:00