linux-brain/arch/x86
Xiaochen Shen e3f5c2e990 x86/resctrl: Fix a deadlock due to inaccurate reference
commit 334b0f4e9b1b4a1d475f803419d202f6c5e4d18e upstream.

There is a race condition which results in a deadlock when rmdir and
mkdir execute concurrently:

$ ls /sys/fs/resctrl/c1/mon_groups/m1/
cpus  cpus_list  mon_data  tasks

Thread 1: rmdir /sys/fs/resctrl/c1
Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1

3 locks held by mkdir/48649:
 #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
 #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170
 #2:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70

4 locks held by rmdir/48652:
 #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
 #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0
 #2:  (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120
 #3:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70

Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
kernfs_remove() removes all kernfs nodes under directory "c1"
recursively, then waits for sub kernfs node "mon_groups" to drop active
reference.

Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
directory. The wrapper kernfs_iop_mkdir() takes an active reference to
the "mon_groups" directory but the code drops the active reference to
the parent directory "c1" instead.

As a result, Thread 1 is blocked on waiting for active reference to drop
and never release rdtgroup_mutex, while Thread 2 is also blocked on
trying to get rdtgroup_mutex.

Thread 1 (rdtgroup_rmdir)   Thread 2 (rdtgroup_mkdir)
(rmdir /sys/fs/resctrl/c1)  (mkdir /sys/fs/resctrl/c1/mon_groups/m1)
-------------------------   -------------------------
                            kernfs_iop_mkdir
                              /*
                               * kn: "m1", parent_kn: "mon_groups",
                               * prgrp_kn: parent_kn->parent: "c1",
                               *
                               * "mon_groups", parent_kn->active++: 1
                               */
                              kernfs_get_active(parent_kn)
kernfs_iop_rmdir
  /* "c1", kn->active++ */
  kernfs_get_active(kn)

  rdtgroup_kn_lock_live
    atomic_inc(&rdtgrp->waitcount)
    /* "c1", kn->active-- */
    kernfs_break_active_protection(kn)
    mutex_lock

  rdtgroup_rmdir_ctrl
    free_all_child_rdtgrp
      sentry->flags = RDT_DELETED

    rdtgroup_ctrl_remove
      rdtgrp->flags = RDT_DELETED
      kernfs_get(kn)
      kernfs_remove(rdtgrp->kn)
        __kernfs_remove
          /* "mon_groups", sub_kn */
          atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active)
          kernfs_drain(sub_kn)
            /*
             * sub_kn->active == KN_DEACTIVATED_BIAS + 1,
             * waiting on sub_kn->active to drop, but it
             * never drops in Thread 2 which is blocked
             * on getting rdtgroup_mutex.
             */
Thread 1 hangs here ---->
            wait_event(sub_kn->active == KN_DEACTIVATED_BIAS)
            ...
                              rdtgroup_mkdir
                                rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
                                  mkdir_rdt_prepare(parent_kn, prgrp_kn)
                                    rdtgroup_kn_lock_live(prgrp_kn)
                                      atomic_inc(&rdtgrp->waitcount)
                                      /*
                                       * "c1", prgrp_kn->active--
                                       *
                                       * The active reference on "c1" is
                                       * dropped, but not matching the
                                       * actual active reference taken
                                       * on "mon_groups", thus causing
                                       * Thread 1 to wait forever while
                                       * holding rdtgroup_mutex.
                                       */
                                      kernfs_break_active_protection(
                                                               prgrp_kn)
                                      /*
                                       * Trying to get rdtgroup_mutex
                                       * which is held by Thread 1.
                                       */
Thread 2 hangs here ---->             mutex_lock
                                      ...

The problem is that the creation of a subdirectory in the "mon_groups"
directory incorrectly releases the active protection of its parent
directory instead of itself before it starts waiting for rdtgroup_mutex.
This is triggered by the rdtgroup_mkdir() flow calling
rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the
parent control group ("c1") as argument. It should be called with kernfs
node "mon_groups" instead. What is currently missing is that the
kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp.

Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then
it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
instead. And then it operates on the same rdtgroup structure but handles
the active reference of kernfs node "mon_groups" to prevent deadlock.
The same changes are also made to the "mon_data" directories.

This results in some unused function parameters that will be cleaned up
in follow-up patch as the focus here is on the fix only in support of
backporting efforts.

Backporting notes:

Since upstream commit fa7d949337 ("x86/resctrl: Rename and move rdt
files to a separate directory"), the file
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c has been renamed and moved to
arch/x86/kernel/cpu/resctrl/rdtgroup.c.
Apply the change against file arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
for older stable trees.

Fixes: c7d9aac613 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-05 14:18:11 +00:00
..
boot x86/efistub: Disable paging at mixed mode entry 2020-01-23 08:20:32 +01:00
configs x86/unwind: Rename unwinder config options to 'CONFIG_UNWINDER_*' 2017-12-25 14:26:13 +01:00
crypto crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest() 2019-05-21 18:50:15 +02:00
entry x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations 2019-08-06 19:05:29 +02:00
events perf/x86/intel: Fix PT PMI handling 2020-01-12 12:11:58 +01:00
hyperv x86/hyperv: mark hyperv_init as __init function 2019-12-01 09:14:37 +01:00
ia32 x86/ia32: Fix ia32_restore_sigcontext() AC leak 2019-05-31 06:47:31 -07:00
include x86/crash: Add a forward declaration of struct kimage 2019-12-31 12:37:40 +01:00
kernel x86/resctrl: Fix a deadlock due to inaccurate reference 2020-02-05 14:18:11 +00:00
kvm KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332) 2019-12-17 20:38:58 +01:00
lib x86/insn: Add some Intel instructions to the opcode map 2019-12-31 12:37:45 +01:00
math-emu x86: math-emu: Hide clang warnings for 16-bit overflow 2019-08-06 19:05:23 +02:00
mm x86/mm: Remove unused variable 'cpu' 2020-01-27 14:46:24 +01:00
net bpf: get rid of pure_initcall dependency to enable jits 2019-08-25 10:50:02 +02:00
oprofile x86/oprofile: Fix bogus GCC-8 warning in nmi_setup() 2018-02-28 10:19:41 +01:00
pci x86/PCI: Avoid AMD FCH XHCI USB PME# from D0 defect 2019-12-17 20:38:48 +01:00
platform x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage 2020-01-12 12:11:50 +01:00
power PM / hibernate: Check the success of generating md5 digest before hibernation 2019-11-24 08:23:02 +01:00
purgatory License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ras License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
realmode x86/build: Specify elf_i386 linker emulation explicitly for i386 objects 2019-04-05 22:31:39 +02:00
tools x86/insn: Fix awk regexp warnings 2019-12-01 09:14:21 +01:00
um um: Drop own definition of PTRACE_SYSEMU/_SINGLESTEP 2018-11-21 09:24:06 +01:00
video
xen kprobes/x86/xen: blacklist non-attachable xen interrupt functions 2019-12-05 15:37:26 +01:00
.gitignore
Kbuild Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-07 09:25:15 -07:00
Kconfig x86/olpc: Fix build error with CONFIG_MFD_CS5535=m 2019-11-24 08:23:16 +01:00
Kconfig.cpu License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.debug x86, perf: Fix the dependency of the x86 insn decoder selftest 2020-01-27 14:46:45 +01:00
Makefile x86/retpolines: Fix up backport of a9d57ef15c 2019-10-05 12:48:05 +02:00
Makefile_32.cpu License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Makefile.um License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00