linux-brain/mm
Hugh Dickins 1972bb9d92 kaiser: vmstat show NR_KAISERTABLE as nr_overhead
The kaiser update made an interesting choice, never to free any shadow
page tables.  Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing.  Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).

But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).

Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).

In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too).  But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).

["nr_overhead" should of course say "nr_kaisertable", if it needs
to stay; but for the moment we are being coy, preferring that when
Joe Blow notices a new line in his /proc/vmstat, he does not get
too curious about what this "kaiser" stuff might be.]

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:46:33 +01:00
..
kasan kasan: respect /proc/sys/kernel/traceoff_on_warning 2017-06-17 06:41:51 +02:00
Kconfig Allow KASAN and HOTPLUG_MEMORY to co-exist when doing build testing 2016-10-27 16:23:01 -07:00
Kconfig.debug PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO 2016-09-13 02:35:27 +02:00
Makefile Disable the __builtin_return_address() warning globally after all 2016-10-12 10:23:41 -07:00
backing-dev.c block: fix double-free in the failure path of cgwb_bdi_init() 2017-02-26 11:10:52 +01:00
balloon_compaction.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
bootmem.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma.c mm/cma.c: check the max limit for cma allocation 2016-11-11 08:12:37 -08:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
compaction.c mm, compaction: fix NR_ISOLATED_* stats for pfn based migration 2017-01-12 11:39:32 +01:00
debug.c mm: clarify why we avoid page_mapcount() for slab pages in dump_page() 2016-10-07 18:46:29 -07:00
debug_page_ref.c mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
dmapool.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
early_ioremap.c mm/early_ioremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
fadvise.c mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED 2016-06-09 14:23:11 -07:00
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c mm: do not access page->mapping directly on page_endio 2017-03-12 06:41:43 +01:00
frame_vector.c mm: replace get_vaddr_frames() write/force parameters with gup_flags 2016-10-19 08:11:24 -07:00
frontswap.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
gup.c mm: larger stack guard gap, between vmas 2017-06-24 07:11:18 +02:00
highmem.c mm/highmem: make nr_free_highpages() handles all highmem zones by itself 2016-05-19 19:12:14 -07:00
huge_memory.c thp: fix MADV_DONTNEED vs. numa balancing race 2017-12-14 09:28:16 +01:00
hugetlb.c mm, hugetlbfs: introduce ->split() to vm_operations_struct 2017-12-05 11:24:32 +01:00
hugetlb_cgroup.c mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size 2016-05-20 17:58:30 -07:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2017-01-06 10:40:13 +01:00
internal.h mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 08:49:29 -07:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
khugepaged.c mm: khugepaged: fix radix tree node leak in shmem collapse error path 2017-01-12 11:39:32 +01:00
kmemcheck.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c mm: kmemleak: scan .data.ro_after_init 2016-11-11 08:12:37 -08:00
ksm.c ksm: prevent crash after write_protect_page fails 2017-06-07 12:07:49 +02:00
list_lru.c mm/list_lru.c: fix list_lru_count_node() to be race free 2017-07-21 07:42:21 +02:00
maccess.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
madvise.c mm/madvise.c: fix madvise() infinite loop under special circumstances 2017-12-05 11:24:32 +01:00
memblock.c mm/memblock.c: reversed logic in memblock_discard() 2017-08-30 10:21:47 +02:00
memcontrol.c mm/cgroup: avoid panic when init with low memory 2017-10-08 10:26:10 +02:00
memory-failure.c mm/memory-failure.c: use compound_head() flags for huge pages 2017-06-24 07:11:16 +02:00
memory.c mm/memory.c: fix mem_cgroup_oom_disable() call missing 2017-09-13 14:13:36 -07:00
memory_hotplug.c mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next 2017-10-21 17:21:36 +02:00
mempolicy.c mm/mempolicy: fix use after free when calling get_mempolicy 2017-08-24 17:12:20 -07:00
mempool.c Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" 2016-07-28 16:07:41 -07:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c Sanitize 'move_pages()' permission checks 2017-08-24 17:12:21 -07:00
mincore.c mm, swap: use offset of swap entry as key of swap cache 2016-10-07 18:46:28 -07:00
mlock.c mlock: fix mlock count can not decrease in race condition 2017-06-07 12:07:49 +02:00
mm_init.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
mmap.c mm, hugetlbfs: introduce ->split() to vm_operations_struct 2017-12-05 11:24:32 +01:00
mmu_context.c mm/mmu_context, sched/core: Fix mmu_context.h assumption 2016-04-28 11:44:19 +02:00
mmu_notifier.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
mmzone.c mm, page_alloc: inline the fast path of the zonelist iterator 2016-05-19 19:12:14 -07:00
mprotect.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 08:49:29 -07:00
mremap.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 08:49:29 -07:00
msync.c mm/msync: use offset_in_page macro 2015-11-05 19:34:48 -08:00
nobootmem.c mm: discard memblock data later 2017-08-24 17:12:19 -07:00
nommu.c ptrace: Don't allow accessing an undumpable mm 2017-01-06 10:40:13 +01:00
oom_kill.c mm, oom_reaper: gather each vma to prevent leaking TLB entry 2017-12-09 22:01:47 +01:00
page-writeback.c mm: don't use radix tree writeback tags for pages in swap cache 2016-10-07 18:46:28 -07:00
page_alloc.c mm: fix remote numa hits statistics 2017-12-09 22:01:51 +01:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm/page_ext: support extra space allocation by page_ext user 2016-10-07 18:46:27 -07:00
page_idle.c mm, vmscan: move lru_lock to the node 2016-07-28 16:07:41 -07:00
page_io.c mm/page_io.c: replace some BUG_ON()s with VM_BUG_ON_PAGE() 2016-10-07 18:46:29 -07:00
page_isolation.c mm/page_isolation: fix typo: "paes" -> "pages" 2016-10-07 18:46:29 -07:00
page_owner.c mm/page_owner: don't define fields on struct page_ext by hard-coding 2016-10-07 18:46:27 -07:00
page_poison.c mm: check the return value of lookup_page_ext for all call sites 2016-06-03 15:06:22 -07:00
pagewalk.c mm/pagewalk.c: report holes in hugetlb ranges 2017-11-24 08:33:42 +01:00
percpu-km.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages 2017-03-26 13:05:58 +02:00
pgtable-generic.c mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range 2016-03-17 15:09:34 -07:00
process_vm_access.c mm: remove write/force parameters from __get_user_pages_unlocked() 2016-10-18 14:13:37 -07:00
quicklist.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
readahead.c mm: silently skip readahead for DAX inodes 2016-08-26 17:39:35 -07:00
rmap.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 08:49:29 -07:00
shmem.c mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled 2017-08-30 10:21:46 +02:00
slab.c slub: move synchronize_sched out of slab_mutex on shrink 2017-03-22 12:43:38 +01:00
slab.h slub: move synchronize_sched out of slab_mutex on shrink 2017-03-22 12:43:38 +01:00
slab_common.c slub: do not merge cache if slub_debug contains a never-merge flag 2017-10-21 17:21:36 +02:00
slob.c slub: move synchronize_sched out of slab_mutex on shrink 2017-03-22 12:43:38 +01:00
slub.c mm/slub.c: trace free objects at KERN_INFO 2017-06-07 12:07:49 +02:00
sparse-vmemmap.c treewide: replace obsolete _refok by __ref 2016-08-02 17:31:41 -04:00
sparse.c mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next 2017-10-21 17:21:36 +02:00
swap.c thp: reduce usage of huge zero page's atomic counter 2016-10-07 18:46:28 -07:00
swap_cgroup.c mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() 2017-07-05 14:40:17 +02:00
swap_state.c mm, swap: use offset of swap entry as key of swap cache 2016-10-07 18:46:28 -07:00
swapfile.c mm: support anonymous stable page 2017-01-19 20:17:59 +01:00
truncate.c fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
usercopy.c mm: usercopy: Check for module addresses 2016-09-20 16:07:39 -07:00
userfaultfd.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
util.c Merge branch 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-22 09:39:10 -07:00
vmacache.c mm: unrig VMA cache hit ratio 2016-10-07 18:46:27 -07:00
vmalloc.c mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings 2017-07-05 14:40:28 +02:00
vmpressure.c mm: vmpressure: fix sending wrong events on underflow 2017-03-12 06:41:43 +01:00
vmscan.c mm, vmscan: consider eligible zones in get_scan_count 2017-03-12 06:41:44 +01:00
vmstat.c kaiser: vmstat show NR_KAISERTABLE as nr_overhead 2018-01-05 15:46:33 +01:00
workingset.c mm: workingset: fix premature shadow node shrinking with cgroups 2017-04-08 09:30:36 +02:00
z3fold.c mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup 2016-06-03 16:02:55 -07:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c zsmalloc: calling zs_map_object() from irq is a bug 2017-12-14 09:28:23 +01:00
zswap.c zswap: disable changing params if init fails 2017-02-09 08:08:27 +01:00