linux-brain/arch
Mike Rapoport 08f33350ed x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
commit 34b1999da935a33be6239226bfa6cd4f704c5c88 upstream.

Jiri Olsa reported a fault when running:

  # cat /proc/kallsyms | grep ksys_read
  ffffffff8136d580 T ksys_read
  # objdump -d --start-address=0xffffffff8136d580 --stop-address=0xffffffff8136d590 /proc/kcore

  /proc/kcore:     file format elf64-x86-64

  Segmentation fault

  general protection fault, probably for non-canonical address 0xf887ffcbff000: 0000 [#1] SMP PTI
  CPU: 12 PID: 1079 Comm: objdump Not tainted 5.14.0-rc5qemu+ #508
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
  RIP: 0010:kern_addr_valid
  Call Trace:
   read_kcore
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? trace_hardirqs_on
   ? rcu_read_lock_sched_held
   ? lock_acquire
   ? lock_acquire
   ? rcu_read_lock_sched_held
   ? lock_acquire
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? lock_release
   ? _raw_spin_unlock
   ? __handle_mm_fault
   ? rcu_read_lock_sched_held
   ? lock_acquire
   ? rcu_read_lock_sched_held
   ? lock_release
   proc_reg_read
   ? vfs_read
   vfs_read
   ksys_read
   do_syscall_64
   entry_SYSCALL_64_after_hwframe

The fault happens because kern_addr_valid() dereferences existent but not
present PMD in the high kernel mappings.

Such PMDs are created when free_kernel_image_pages() frees regions larger
than 2Mb. In this case, a part of the freed memory is mapped with PMDs and
the set_memory_np_noalias() -> ... -> __change_page_attr() sequence will
mark the PMD as not present rather than wipe it completely.

Have kern_addr_valid() check whether higher level page table entries are
present before trying to dereference them to fix this issue and to avoid
similar issues in the future.

Stable backporting note:
------------------------

Note that the stable marking is for all active stable branches because
there could be cases where pagetable entries exist but are not valid -
see 9a14aefc1d ("x86: cpa, fix lookup_address"), for example. So make
sure to be on the safe side here and use pXY_present() accessors rather
than pXY_none() which could #GP when accessing pages in the direct map.

Also see:

  c40a56a781 ("x86/mm/init: Remove freed kernel image areas from alias mapping")

for more info.

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: <stable@vger.kernel.org>	# 4.4+
Link: https://lkml.kernel.org/r/20210819132717.19358-1-rppt@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:26:40 +02:00
..
alpha alpha: Send stop IPI to send to online CPUs 2021-08-12 13:21:05 +02:00
arc ARC: wireup clone3 syscall 2021-09-12 08:56:40 +02:00
arm ARM: tegra: tamonten: Fix UART pad setting 2021-09-22 12:26:32 +02:00
arm64 arm64/sve: Use correct size when reinitialising SVE state 2021-09-22 12:26:39 +02:00
c6x mm: consolidate pgtable_cache_init() and pgd_cache_init() 2019-09-24 15:54:09 -07:00
csky csky: change a Kconfig symbol name to fix e1000 build error 2021-04-28 13:19:16 +02:00
h8300 h8300: fix PREEMPTION build, TI_PRE_COUNT undefined 2021-02-17 10:35:18 +01:00
hexagon hexagon: use common DISCARDS macro 2021-07-20 16:10:50 +02:00
ia64 ia64: mca_drv: fix incorrect array size calculation 2021-07-14 16:53:19 +02:00
m68k m68knommu: only set CONFIG_ISA_DMA_API for ColdFire sub-arch 2021-09-22 12:26:34 +02:00
microblaze microblaze: Prevent the overflow of the start 2020-02-24 08:37:02 +01:00
mips MIPS: Malta: fix alignment of the devicetree buffer 2021-09-22 12:26:26 +02:00
nds32 nds32: fix up stack guard gap 2021-07-28 13:31:01 +02:00
nios2 nios2 update for v5.4-rc1 2019-09-27 13:02:19 -07:00
openrisc openrisc: don't printk() unconditionally 2021-09-22 12:26:24 +02:00
parisc parisc: fix crash with signals and alloca 2021-09-22 12:26:37 +02:00
powerpc KVM: PPC: Fix clearing never mapped TCEs in realmode 2021-09-22 12:26:25 +02:00
riscv bpf: Introduce BPF nospec instruction for mitigating Spectre v4 2021-09-15 09:47:38 +02:00
s390 s390/pv: fix the forcing of the swiotlb 2021-09-22 12:26:37 +02:00
sh sh: dma: fix kconfig dependency for G2_DMA 2021-01-27 11:47:52 +01:00
sparc bpf: Introduce BPF nospec instruction for mitigating Spectre v4 2021-09-15 09:47:38 +02:00
um um: fix error return code in winch_tramp() 2021-07-20 16:10:49 +02:00
unicore32 mm: treewide: clarify pgtable_page_{ctor,dtor}() naming 2019-09-26 10:10:44 -07:00
x86 x86/mm: Fix kern_addr_valid() to cope with existing but not present entries 2021-09-22 12:26:40 +02:00
xtensa xtensa: ISS: don't panic in rs_init 2021-09-22 12:26:30 +02:00
.gitignore
Kconfig Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS" 2020-12-30 11:51:47 +01:00