docs: ext4.rst: document case-insensitive directories

Introduces the case-insensitive features on ext4 for system administrators. Explain the minimum of design decisions that are important for sysadmins wanting to enable this feature.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-09 23:36:23 +09:00 · 2019-04-25 14:13:27 -04:00 · 2019-04-25 14:13:27 -04:00 · 0a790fe438
commit 0a790fe438
parent b886ee3e77
1 changed file with 38 additions and 0 deletions
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -91,10 +91,48 @@ Currently Available
 * large block (up to pagesize) support
 * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  the ordering)
 * Case-insensitive file name lookups
 [1] Filesystems with a block size of 1k may see a limit imposed by the
 directory hash tree having a maximum depth of two.
 Case-insensitive file name lookups
 ======================================================
 The case-insensitive file name lookup feature is supported on a
 per-directory basis, allowing the user to mix case-insensitive and
 case-sensitive directories in the same filesystem.  It is enabled by
 flipping the +F inode attribute of an empty directory.  The
 case-insensitive string match operation is only defined when we know how
 text is encoded in a byte sequence.  For that reason, in order to enable
 case-insensitive directories, the filesystem must have the
 casefold feature, which stores the filesystem-wide encoding
 model used.  By default, the charset adopted is the latest version of
 Unicode (12.1.0, by the time of this writing), encoded in the UTF-8
 form.  The comparison algorithm is implemented by normalizing the
 strings to the Canonical decomposition form, as defined by Unicode,
 followed by a byte-per-byte comparison.
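
The comparison described above can be approximated in userspace. The following is a minimal sketch, not the kernel implementation: the helper names `casefold_key` and `names_match` are hypothetical, the case-folding step is an assumption added to make the match case-insensitive, and Python's `unicodedata` module follows whichever Unicode version the interpreter ships with, which may differ from 12.1.0.

```python
import unicodedata


def casefold_key(name: str) -> bytes:
    """Build a comparison key: canonical decomposition (NFD), then case folding.

    Hypothetical helper; it only approximates what the text describes.
    """
    decomposed = unicodedata.normalize("NFD", name)
    folded = decomposed.casefold()
    # Normalize again because case folding can change the character sequence.
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: str, b: str) -> bool:
    """Compare two names byte-per-byte after normalization."""
    return casefold_key(a) == casefold_key(b)
```

With this sketch, `names_match("Makefile", "MAKEFILE")` is true, and so is a comparison between a precomposed and a decomposed spelling of the same accented name.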
 The case-awareness is name-preserving on the disk, meaning that the file
 name provided by userspace is a byte-per-byte match to what is actually
 written to the disk.  The Unicode normalization format used by the
 kernel is thus an internal representation, exposed neither to
 userspace nor to the disk, with the important exception of disk hashes,
 used on large case-insensitive directories with the DX feature.  On DX
 directories, the hash must be calculated using the casefolded version of
 the filename, meaning that the normalization format used actually has an
 impact on where the directory entry is stored.
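
To illustrate why the normalization format affects placement while the stored name stays untouched, here is a toy model under stated assumptions: `zlib.crc32` is only a placeholder for ext4's real directory hash, and `CasefoldDirectory`, `folded_bytes`, and `dx_hash_sketch` are hypothetical names that do not correspond to kernel structures.

```python
import unicodedata
import zlib


def folded_bytes(name: str) -> bytes:
    """Casefolded, NFD-normalized form used only for hashing and matching."""
    return unicodedata.normalize("NFD", name).casefold().encode("utf-8")


def dx_hash_sketch(name: str) -> int:
    """Placeholder hash: crc32 stands in for the on-disk directory hash."""
    return zlib.crc32(folded_bytes(name))


class CasefoldDirectory:
    """Toy directory: buckets are keyed by the hash of the folded name."""

    def __init__(self):
        self.buckets = {}

    def create(self, name: str) -> None:
        # The name is stored exactly as given (name-preserving).
        self.buckets.setdefault(dx_hash_sketch(name), []).append(name)

    def lookup(self, name: str):
        wanted = folded_bytes(name)
        for stored in self.buckets.get(dx_hash_sketch(name), []):
            if folded_bytes(stored) == wanted:
                return stored
        return None
```

Creating `ReadMe.TXT` and looking up `readme.txt` returns the stored spelling `ReadMe.TXT`, because both names hash to the same bucket.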
 When we change from viewing filenames as opaque byte sequences to seeing
 them as encoded strings we need to address what happens when a program
 tries to create a file with an invalid name.  The Unicode subsystem
 within the kernel leaves the decision of what to do in this case to the
 filesystem, which selects its preferred behavior by enabling or disabling
 the strict mode.  When Ext4 encounters one of those strings and the
 filesystem does not require strict mode, it falls back to considering the
 entire string as an opaque byte sequence, which still allows the user to
 operate on that file, but the case-insensitive lookups won't work.
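
The fallback for invalid names can be sketched as follows. This is an illustration only: `lookup_key` is a hypothetical helper, rejecting the name in strict mode is an assumption the text above does not spell out, and the sketch detects only malformed UTF-8 byte sequences, which may be narrower than what the kernel treats as invalid.

```python
import unicodedata


def lookup_key(raw: bytes, strict: bool = False) -> bytes:
    """Return the key used to match a file name given as raw bytes."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            # Assumption: strict mode refuses the invalid name.
            raise ValueError("invalid UTF-8 name rejected in strict mode")
        # Non-strict: treat the whole name as an opaque byte sequence.
        return raw
    return unicodedata.normalize("NFD", text).casefold().encode("utf-8")
```

Valid names such as `b"README"` and `b"readme"` yield the same key, while `b"\xffABC"` and `b"\xffabc"` stay distinct because they are compared as opaque bytes.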
 Options
 =======