linux-brain/arch
Dave Hansen 2d57051507 x86/apic: Add extra serialization for non-serializing MSRs
commit 25a068b8e9a4eb193d755d58efcb3c98928636e0 upstream.

Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain
MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for
MFENCE; LFENCE.

Short summary: we have special MSRs that have weaker ordering than all
the rest. Add fencing consistent with current SDM recommendations.

This is not known to cause any issues in practice, only in theory.

Longer story below:

The reason the kernel uses a different semantic is that the SDM changed
(roughly in late 2017). The SDM changed because folks at Intel were
auditing all of the recommended fences in the SDM and realized that the
x2apic fences were insufficient.

Why was the pain MFENCE judged insufficient?

WRMSR itself is normally a serializing instruction. No fences are needed
because the instruction itself serializes everything.

But, there are explicit exceptions for this serializing behavior written
into the WRMSR instruction documentation for two classes of MSRs:
IA32_TSC_DEADLINE and the X2APIC MSRs.

Back to x2apic: WRMSR is *not* serializing in this specific case.
But why is MFENCE insufficient? MFENCE makes writes visible, but
only affects load/store instructions. WRMSR is unfortunately not a
load/store instruction and is unaffected by MFENCE. This means that a
non-serializing WRMSR could be reordered by the CPU to execute before
the writes made visible by the MFENCE have even occurred in the first
place.

This means that an x2apic IPI could theoretically be triggered before
there is any (visible) data to process.

Does this affect anything in practice? I honestly don't know. It seems
quite possible that by the time an interrupt gets to consume the (not
yet) MFENCE'd data, it has become visible, mostly by accident.

To be safe, add the SDM-recommended fences for all x2apic WRMSRs.

This also leaves open the question of the _other_ weakly-ordered WRMSR:
MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as
the x2APIC MSRs, it seems substantially less likely to be a problem in
practice. While writes to the in-memory Local Vector Table (LVT) might
theoretically be reordered with respect to a weakly-ordered WRMSR like
TSC_DEADLINE, the SDM has this to say:

  In x2APIC mode, the WRMSR instruction is used to write to the LVT
  entry. The processor ensures the ordering of this write and any
  subsequent WRMSR to the deadline; no fencing is required.

But, that might still leave xAPIC exposed. The safest thing to do for
now is to add the extra, recommended LFENCE.

 [ bp: Massage commit message, fix typos, drop accidentally added
   newline to tools/arch/x86/include/asm/barrier.h. ]

Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10 09:25:32 +01:00
..
alpha alpha: fix annotation of io{read,write}{16,32}be() 2020-08-26 10:40:58 +02:00
arc arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC 2021-01-19 18:26:15 +01:00
arm ARM: footbridge: fix dc21285 PCI configuration accessors 2021-02-10 09:25:31 +01:00
arm64 arm64: dts: ls1046a: fix dcfg address range 2021-02-10 09:25:27 +01:00
c6x mm: consolidate pgtable_cache_init() and pgd_cache_init() 2019-09-24 15:54:09 -07:00
csky csky: Fixup abiv2 syscall_trace break a4 & a5 2020-06-17 16:40:21 +02:00
h8300 mm: consolidate pgtable_cache_init() and pgd_cache_init() 2019-09-24 15:54:09 -07:00
hexagon hexagon: define ioremap_uc 2020-05-10 10:31:31 +02:00
ia64 ia64: fix build error with !COREDUMP 2020-11-05 11:43:33 +01:00
m68k m68k: q40: Fix info-leak in rtc_ioctl 2020-10-01 13:17:12 +02:00
microblaze microblaze: Prevent the overflow of the start 2020-02-24 08:37:02 +01:00
mips MIPS: relocatable: fix possible boot hangup with KASLR enabled 2021-01-19 18:26:12 +01:00
nds32 asm-generic/nds32: don't redefine cacheflush primitives 2020-01-17 19:48:43 +01:00
nios2 nios2 update for v5.4-rc1 2019-09-27 13:02:19 -07:00
openrisc openrisc: Fix issue with get_user for 64-bit values 2020-11-01 12:01:06 +01:00
parisc kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables 2020-09-03 11:27:10 +02:00
powerpc powerpc: Fix alignment bug within the init sections 2021-01-27 11:47:47 +01:00
riscv riscv: defconfig: enable gpio support for HiFive Unleashed 2021-01-27 11:47:45 +01:00
s390 s390/kexec_file: fix diag308 subcode when loading crash kernel 2020-12-30 11:51:34 +01:00
sh sh: dma: fix kconfig dependency for G2_DMA 2021-01-27 11:47:52 +01:00
sparc sparc: fix handling of page table constructor failure 2020-12-30 11:51:26 +01:00
um um: virtio: free vu_dev only with the contained struct device 2021-02-10 09:25:27 +01:00
unicore32 mm: treewide: clarify pgtable_page_{ctor,dtor}() naming 2019-09-26 10:10:44 -07:00
x86 x86/apic: Add extra serialization for non-serializing MSRs 2021-02-10 09:25:32 +01:00
xtensa xtensa: uaccess: Add missing __user to strncpy_from_user() prototype 2020-12-02 08:49:49 +01:00
.gitignore
Kconfig Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS" 2020-12-30 11:51:47 +01:00