mm: document ZONE_DEVICE memory-model implications

Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
of 'devm_memremap_pages()'.

[dan.j.williams@intel.com: update ZONE_DEVICE memory model documentation]
  Link: http://lkml.kernel.org/r/156109575458.1409767.1885676287099277666.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/156092354985.979959.15763234410543451710.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:
Dan Williams 2019-07-18 15:58:29 -07:00 committed by Linus Torvalds
parent ba72b4c8cf
commit a0653406a3
1 changed files with 40 additions and 0 deletions

View File

@ -181,3 +181,43 @@ that is eventually passed to vmemmap_populate() through a long chain
of function calls. The vmemmap_populate() implementation may use the
`vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to
allocate memory map on the persistent memory device.
ZONE_DEVICE
===========
The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
`struct page` `mem_map` services for device driver identified physical
address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact
that the page objects for these address ranges are never marked online,
and that a reference must be taken against the device, not just the page
to keep the memory pinned for active use. `ZONE_DEVICE`, via
:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
:c:func:`get_user_pages` service for the given range of pfns. Since the
page reference count never drops below 1 the page is never tracked as
free memory and the page's `struct list_head lru` space is repurposed
for back referencing to the host device / driver that mapped the memory.
While `SPARSEMEM` presents memory as a collection of sections,
optionally collected into memory blocks, `ZONE_DEVICE` users have a need
for smaller granularity of populating the `mem_map`. Given that
`ZONE_DEVICE` memory is never marked online it is subsequently never
subject to its memory ranges being exposed through the sysfs memory
hotplug api on memory block boundaries. The implementation relies on
this lack of user-api constraint to allow sub-section sized memory
ranges to be specified to :c:func:`arch_add_memory`, the top-half of
memory hotplug. Sub-section support allows for 2MB as the cross-arch
common alignment granularity for :c:func:`devm_memremap_pages`.
The users of `ZONE_DEVICE` are:
* pmem: Map platform persistent memory to be used as a direct-I/O target
via DAX mappings.
* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
event callbacks to allow a device-driver to coordinate memory management
events related to device-memory, typically GPU memory. See
Documentation/vm/hmm.rst.
* p2pdma: Create `struct page` objects to allow peer devices in a
PCI/-E topology to coordinate direct-DMA operations between themselves,
i.e. bypass host memory.