linux-brain/fs/fuse/dir.c
David Howells a528d35e8b statx: Add a system call to make enhanced file info available
Add a system call to make extended file information available, including
file creation and some attribute flags where available through the
underlying filesystem.

The getattr inode operation is altered to take two additional arguments: a
u32 request_mask and an unsigned int flags that indicate the
synchronisation mode.  This change is propagated to the vfs_getattr*()
function.

Functions like vfs_stat() are now inline wrappers around new functions
vfs_statx() and vfs_statx_fd() to reduce stack usage.

========
OVERVIEW
========

The idea was initially proposed as a set of xattrs that could be retrieved
with getxattr(), but the general preference proved to be for a new syscall
with an extended stat structure.

A number of requests were gathered for features to be included.  The
following have been included:

 (1) Make the fields a consistent size on all arches and make them large.

 (2) Spare space, request flags and information flags are provided for
     future expansion.

 (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
     __s64).

 (4) Creation time: The SMB protocol carries the creation time, which could
     be exported by Samba, which will in turn help CIFS make use of
     FS-Cache as that can be used for coherency data (stx_btime).

     This is also specified in NFSv4 as a recommended attribute and could
     be exported by NFSD [Steve French].

 (5) Lightweight stat: Ask for just those details of interest, and allow a
     netfs (such as NFS) to approximate anything not of interest, possibly
     without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
     Dilger] (AT_STATX_DONT_SYNC).

 (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
     its cached attributes are up to date [Trond Myklebust]
     (AT_STATX_FORCE_SYNC).

And the following have been left out for future extension:

 (7) Data version number: Could be used by userspace NFS servers [Aneesh
     Kumar].

     Can also be used to modify fill_post_wcc() in NFSD which retrieves
     i_version directly, but has just called vfs_getattr().  It could get
     it from the kstat struct if it used vfs_xgetattr() instead.

     (There's disagreement on the exact semantics of a single field, since
     not all filesystems do this the same way).

 (8) BSD stat compatibility: Including more fields from the BSD stat such
     as creation time (st_btime) and inode generation number (st_gen)
     [Jeremy Allison, Bernd Schubert].

 (9) Inode generation number: Useful for FUSE and userspace NFS servers
     [Bernd Schubert].

     (This was asked for but later deemed unnecessary with the
     open-by-handle capability available and caused disagreement as to
     whether it's a security hole or not).

(10) Extra coherency data may be useful in making backups [Andreas Dilger].

     (No particular data were offered, but things like last backup
     timestamp, the data version number and the DOS archive bit would come
     into this category).

(11) Allow the filesystem to indicate what it can/cannot provide: A
     filesystem can now say it doesn't support a standard stat feature if
     that isn't available, so if, for instance, inode numbers or UIDs don't
     exist or are fabricated locally...

     (This requires a separate system call - I have an fsinfo() call idea
     for this).

(12) Store a 16-byte volume ID in the superblock that can be returned in
     struct xstat [Steve French].

     (Deferred to fsinfo).

(13) Include granularity fields in the time data to indicate the
     granularity of each of the times (NFSv4 time_delta) [Steve French].

     (Deferred to fsinfo).

(14) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
     Note that the Linux IOC flags are a mess and filesystems such as Ext4
     define flags that aren't in linux/fs.h, so translation in the kernel
     may be a necessity (or, possibly, we provide the filesystem type too).

     (Some attributes are made available in stx_attributes, but the general
     feeling was that the IOC flags were to ext[234]-specific and shouldn't
     be exposed through statx this way).

(15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
     Michael Kerrisk].

     (Deferred, probably to fsinfo.  Finding out if there's an ACL or
     seclabal might require extra filesystem operations).

(16) Femtosecond-resolution timestamps [Dave Chinner].

     (A __reserved field has been left in the statx_timestamp struct for
     this - if there proves to be a need).

(17) A set multiple attributes syscall to go with this.

===============
NEW SYSTEM CALL
===============

The new system call is:

	int ret = statx(int dfd,
			const char *filename,
			unsigned int flags,
			unsigned int mask,
			struct statx *buffer);

The dfd, filename and flags parameters indicate the file to query, in a
similar way to fstatat().  There is no equivalent of lstat() as that can be
emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags.  There is
also no equivalent of fstat() as that can be emulated by passing a NULL
filename to statx() with the fd of interest in dfd.

Whether or not statx() synchronises the attributes with the backing store
can be controlled by OR'ing a value into the flags argument (this typically
only affects network filesystems):

 (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
     respect.

 (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
     its attributes with the server - which might require data writeback to
     occur to get the timestamps correct.

 (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
     network filesystem.  The resulting values should be considered
     approximate.

mask is a bitmask indicating the fields in struct statx that are of
interest to the caller.  The user should set this to STATX_BASIC_STATS to
get the basic set returned by stat().  It should be noted that asking for
more information may entail extra I/O operations.

buffer points to the destination for the data.  This must be 256 bytes in
size.

======================
MAIN ATTRIBUTES RECORD
======================

The following structures are defined in which to return the main attribute
set:

	struct statx_timestamp {
		__s64	tv_sec;
		__s32	tv_nsec;
		__s32	__reserved;
	};

	struct statx {
		__u32	stx_mask;
		__u32	stx_blksize;
		__u64	stx_attributes;
		__u32	stx_nlink;
		__u32	stx_uid;
		__u32	stx_gid;
		__u16	stx_mode;
		__u16	__spare0[1];
		__u64	stx_ino;
		__u64	stx_size;
		__u64	stx_blocks;
		__u64	__spare1[1];
		struct statx_timestamp	stx_atime;
		struct statx_timestamp	stx_btime;
		struct statx_timestamp	stx_ctime;
		struct statx_timestamp	stx_mtime;
		__u32	stx_rdev_major;
		__u32	stx_rdev_minor;
		__u32	stx_dev_major;
		__u32	stx_dev_minor;
		__u64	__spare2[14];
	};

The defined bits in request_mask and stx_mask are:

	STATX_TYPE		Want/got stx_mode & S_IFMT
	STATX_MODE		Want/got stx_mode & ~S_IFMT
	STATX_NLINK		Want/got stx_nlink
	STATX_UID		Want/got stx_uid
	STATX_GID		Want/got stx_gid
	STATX_ATIME		Want/got stx_atime{,_ns}
	STATX_MTIME		Want/got stx_mtime{,_ns}
	STATX_CTIME		Want/got stx_ctime{,_ns}
	STATX_INO		Want/got stx_ino
	STATX_SIZE		Want/got stx_size
	STATX_BLOCKS		Want/got stx_blocks
	STATX_BASIC_STATS	[The stuff in the normal stat struct]
	STATX_BTIME		Want/got stx_btime{,_ns}
	STATX_ALL		[All currently available stuff]

stx_btime is the file creation time, stx_mask is a bitmask indicating the
data provided and __spares*[] are where as-yet undefined fields can be
placed.

Time fields are structures with separate seconds and nanoseconds fields
plus a reserved field in case we want to add even finer resolution.  Note
that times will be negative if before 1970; in such a case, the nanosecond
fields will also be negative if not zero.

The bits defined in the stx_attributes field convey information about a
file, how it is accessed, where it is and what it does.  The following
attributes map to FS_*_FL flags and are the same numerical value:

	STATX_ATTR_COMPRESSED		File is compressed by the fs
	STATX_ATTR_IMMUTABLE		File is marked immutable
	STATX_ATTR_APPEND		File is append-only
	STATX_ATTR_NODUMP		File is not to be dumped
	STATX_ATTR_ENCRYPTED		File requires key to decrypt in fs

Within the kernel, the supported flags are listed by:

	KSTAT_ATTR_FS_IOC_FLAGS

[Are any other IOC flags of sufficient general interest to be exposed
through this interface?]

New flags include:

	STATX_ATTR_AUTOMOUNT		Object is an automount trigger

These are for the use of GUI tools that might want to mark files specially,
depending on what they are.

Fields in struct statx come in a number of classes:

 (0) stx_dev_*, stx_blksize.

     These are local system information and are always available.

 (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino,
     stx_size, stx_blocks.

     These will be returned whether the caller asks for them or not.  The
     corresponding bits in stx_mask will be set to indicate whether they
     actually have valid values.

     If the caller didn't ask for them, then they may be approximated.  For
     example, NFS won't waste any time updating them from the server,
     unless as a byproduct of updating something requested.

     If the values don't actually exist for the underlying object (such as
     UID or GID on a DOS file), then the bit won't be set in the stx_mask,
     even if the caller asked for the value.  In such a case, the returned
     value will be a fabrication.

     Note that there are instances where the type might not be valid, for
     instance Windows reparse points.

 (2) stx_rdev_*.

     This will be set only if stx_mode indicates we're looking at a
     blockdev or a chardev, otherwise will be 0.

 (3) stx_btime.

     Similar to (1), except this will be set to 0 if it doesn't exist.

=======
TESTING
=======

The following test program can be used to test the statx system call:

	samples/statx/test-statx.c

Just compile and run, passing it paths to the files you want to examine.
The file is built automatically if CONFIG_SAMPLES is enabled.

Here's some example output.  Firstly, an NFS directory that crosses to
another FSID.  Note that the AUTOMOUNT attribute is set because transiting
this directory will cause d_automount to be invoked by the VFS.

	[root@andromeda ~]# /tmp/test-statx -A /warthog/data
	statx(/warthog/data) = 0
	results=7ff
	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
	Device: 00:26           Inode: 1703937     Links: 125
	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
	Access: 2016-11-24 09:02:12.219699527+0000
	Modify: 2016-11-17 10:44:36.225653653+0000
	Change: 2016-11-17 10:44:36.225653653+0000
	Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

Secondly, the result of automounting on that directory.

	[root@andromeda ~]# /tmp/test-statx /warthog/data
	statx(/warthog/data) = 0
	results=7ff
	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
	Device: 00:27           Inode: 2           Links: 125
	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
	Access: 2016-11-24 09:02:12.219699527+0000
	Modify: 2016-11-17 10:44:36.225653653+0000
	Change: 2016-11-17 10:44:36.225653653+0000

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-02 20:51:15 -05:00

1853 lines
45 KiB
C

/*
FUSE: Filesystem in Userspace
Copyright (C) 2001-2008 Miklos Szeredi <miklos@szeredi.hu>
This program can be distributed under the terms of the GNU GPL.
See the file COPYING.
*/
#include "fuse_i.h"
#include <linux/pagemap.h>
#include <linux/file.h>
#include <linux/sched.h>
#include <linux/namei.h>
#include <linux/slab.h>
#include <linux/xattr.h>
#include <linux/posix_acl.h>
static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
{
struct fuse_conn *fc = get_fuse_conn(dir);
struct fuse_inode *fi = get_fuse_inode(dir);
if (!fc->do_readdirplus)
return false;
if (!fc->readdirplus_auto)
return true;
if (test_and_clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state))
return true;
if (ctx->pos == 0)
return true;
return false;
}
static void fuse_advise_use_readdirplus(struct inode *dir)
{
struct fuse_inode *fi = get_fuse_inode(dir);
set_bit(FUSE_I_ADVISE_RDPLUS, &fi->state);
}
union fuse_dentry {
u64 time;
struct rcu_head rcu;
};
static inline void fuse_dentry_settime(struct dentry *entry, u64 time)
{
((union fuse_dentry *) entry->d_fsdata)->time = time;
}
static inline u64 fuse_dentry_time(struct dentry *entry)
{
return ((union fuse_dentry *) entry->d_fsdata)->time;
}
/*
* FUSE caches dentries and attributes with separate timeout. The
* time in jiffies until the dentry/attributes are valid is stored in
* dentry->d_fsdata and fuse_inode->i_time respectively.
*/
/*
* Calculate the time in jiffies until a dentry/attributes are valid
*/
static u64 time_to_jiffies(u64 sec, u32 nsec)
{
if (sec || nsec) {
struct timespec64 ts = {
sec,
min_t(u32, nsec, NSEC_PER_SEC - 1)
};
return get_jiffies_64() + timespec64_to_jiffies(&ts);
} else
return 0;
}
/*
* Set dentry and possibly attribute timeouts from the lookup/mk*
* replies
*/
static void fuse_change_entry_timeout(struct dentry *entry,
struct fuse_entry_out *o)
{
fuse_dentry_settime(entry,
time_to_jiffies(o->entry_valid, o->entry_valid_nsec));
}
static u64 attr_timeout(struct fuse_attr_out *o)
{
return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
}
static u64 entry_attr_timeout(struct fuse_entry_out *o)
{
return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
}
/*
* Mark the attributes as stale, so that at the next call to
* ->getattr() they will be fetched from userspace
*/
void fuse_invalidate_attr(struct inode *inode)
{
get_fuse_inode(inode)->i_time = 0;
}
/**
* Mark the attributes as stale due to an atime change. Avoid the invalidate if
* atime is not used.
*/
void fuse_invalidate_atime(struct inode *inode)
{
if (!IS_RDONLY(inode))
fuse_invalidate_attr(inode);
}
/*
* Just mark the entry as stale, so that a next attempt to look it up
* will result in a new lookup call to userspace
*
* This is called when a dentry is about to become negative and the
* timeout is unknown (unlink, rmdir, rename and in some cases
* lookup)
*/
void fuse_invalidate_entry_cache(struct dentry *entry)
{
fuse_dentry_settime(entry, 0);
}
/*
* Same as fuse_invalidate_entry_cache(), but also try to remove the
* dentry from the hash
*/
static void fuse_invalidate_entry(struct dentry *entry)
{
d_invalidate(entry);
fuse_invalidate_entry_cache(entry);
}
static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
u64 nodeid, const struct qstr *name,
struct fuse_entry_out *outarg)
{
memset(outarg, 0, sizeof(struct fuse_entry_out));
args->in.h.opcode = FUSE_LOOKUP;
args->in.h.nodeid = nodeid;
args->in.numargs = 1;
args->in.args[0].size = name->len + 1;
args->in.args[0].value = name->name;
args->out.numargs = 1;
args->out.args[0].size = sizeof(struct fuse_entry_out);
args->out.args[0].value = outarg;
}
u64 fuse_get_attr_version(struct fuse_conn *fc)
{
u64 curr_version;
/*
* The spin lock isn't actually needed on 64bit archs, but we
* don't yet care too much about such optimizations.
*/
spin_lock(&fc->lock);
curr_version = fc->attr_version;
spin_unlock(&fc->lock);
return curr_version;
}
/*
* Check whether the dentry is still valid
*
* If the entry validity timeout has expired and the dentry is
* positive, try to redo the lookup. If the lookup results in a
* different inode, then let the VFS invalidate the dentry and redo
* the lookup once more. If the lookup results in the same inode,
* then refresh the attributes, timeouts and mark the dentry valid.
*/
static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
{
struct inode *inode;
struct dentry *parent;
struct fuse_conn *fc;
struct fuse_inode *fi;
int ret;
inode = d_inode_rcu(entry);
if (inode && is_bad_inode(inode))
goto invalid;
else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
(flags & LOOKUP_REVAL)) {
struct fuse_entry_out outarg;
FUSE_ARGS(args);
struct fuse_forget_link *forget;
u64 attr_version;
/* For negative dentries, always do a fresh lookup */
if (!inode)
goto invalid;
ret = -ECHILD;
if (flags & LOOKUP_RCU)
goto out;
fc = get_fuse_conn(inode);
forget = fuse_alloc_forget();
ret = -ENOMEM;
if (!forget)
goto out;
attr_version = fuse_get_attr_version(fc);
parent = dget_parent(entry);
fuse_lookup_init(fc, &args, get_node_id(d_inode(parent)),
&entry->d_name, &outarg);
ret = fuse_simple_request(fc, &args);
dput(parent);
/* Zero nodeid is same as -ENOENT */
if (!ret && !outarg.nodeid)
ret = -ENOENT;
if (!ret) {
fi = get_fuse_inode(inode);
if (outarg.nodeid != get_node_id(inode)) {
fuse_queue_forget(fc, forget, outarg.nodeid, 1);
goto invalid;
}
spin_lock(&fc->lock);
fi->nlookup++;
spin_unlock(&fc->lock);
}
kfree(forget);
if (ret == -ENOMEM)
goto out;
if (ret || (outarg.attr.mode ^ inode->i_mode) & S_IFMT)
goto invalid;
forget_all_cached_acls(inode);
fuse_change_attributes(inode, &outarg.attr,
entry_attr_timeout(&outarg),
attr_version);
fuse_change_entry_timeout(entry, &outarg);
} else if (inode) {
fi = get_fuse_inode(inode);
if (flags & LOOKUP_RCU) {
if (test_bit(FUSE_I_INIT_RDPLUS, &fi->state))
return -ECHILD;
} else if (test_and_clear_bit(FUSE_I_INIT_RDPLUS, &fi->state)) {
parent = dget_parent(entry);
fuse_advise_use_readdirplus(d_inode(parent));
dput(parent);
}
}
ret = 1;
out:
return ret;
invalid:
ret = 0;
goto out;
}
static int invalid_nodeid(u64 nodeid)
{
return !nodeid || nodeid == FUSE_ROOT_ID;
}
static int fuse_dentry_init(struct dentry *dentry)
{
dentry->d_fsdata = kzalloc(sizeof(union fuse_dentry), GFP_KERNEL);
return dentry->d_fsdata ? 0 : -ENOMEM;
}
static void fuse_dentry_release(struct dentry *dentry)
{
union fuse_dentry *fd = dentry->d_fsdata;
kfree_rcu(fd, rcu);
}
const struct dentry_operations fuse_dentry_operations = {
.d_revalidate = fuse_dentry_revalidate,
.d_init = fuse_dentry_init,
.d_release = fuse_dentry_release,
};
const struct dentry_operations fuse_root_dentry_operations = {
.d_init = fuse_dentry_init,
.d_release = fuse_dentry_release,
};
int fuse_valid_type(int m)
{
return S_ISREG(m) || S_ISDIR(m) || S_ISLNK(m) || S_ISCHR(m) ||
S_ISBLK(m) || S_ISFIFO(m) || S_ISSOCK(m);
}
int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name,
struct fuse_entry_out *outarg, struct inode **inode)
{
struct fuse_conn *fc = get_fuse_conn_super(sb);
FUSE_ARGS(args);
struct fuse_forget_link *forget;
u64 attr_version;
int err;
*inode = NULL;
err = -ENAMETOOLONG;
if (name->len > FUSE_NAME_MAX)
goto out;
forget = fuse_alloc_forget();
err = -ENOMEM;
if (!forget)
goto out;
attr_version = fuse_get_attr_version(fc);
fuse_lookup_init(fc, &args, nodeid, name, outarg);
err = fuse_simple_request(fc, &args);
/* Zero nodeid is same as -ENOENT, but with valid timeout */
if (err || !outarg->nodeid)
goto out_put_forget;
err = -EIO;
if (!outarg->nodeid)
goto out_put_forget;
if (!fuse_valid_type(outarg->attr.mode))
goto out_put_forget;
*inode = fuse_iget(sb, outarg->nodeid, outarg->generation,
&outarg->attr, entry_attr_timeout(outarg),
attr_version);
err = -ENOMEM;
if (!*inode) {
fuse_queue_forget(fc, forget, outarg->nodeid, 1);
goto out;
}
err = 0;
out_put_forget:
kfree(forget);
out:
return err;
}
static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
unsigned int flags)
{
int err;
struct fuse_entry_out outarg;
struct inode *inode;
struct dentry *newent;
bool outarg_valid = true;
fuse_lock_inode(dir);
err = fuse_lookup_name(dir->i_sb, get_node_id(dir), &entry->d_name,
&outarg, &inode);
fuse_unlock_inode(dir);
if (err == -ENOENT) {
outarg_valid = false;
err = 0;
}
if (err)
goto out_err;
err = -EIO;
if (inode && get_node_id(inode) == FUSE_ROOT_ID)
goto out_iput;
newent = d_splice_alias(inode, entry);
err = PTR_ERR(newent);
if (IS_ERR(newent))
goto out_err;
entry = newent ? newent : entry;
if (outarg_valid)
fuse_change_entry_timeout(entry, &outarg);
else
fuse_invalidate_entry_cache(entry);
fuse_advise_use_readdirplus(dir);
return newent;
out_iput:
iput(inode);
out_err:
return ERR_PTR(err);
}
/*
* Atomic create+open operation
*
* If the filesystem doesn't support this, then fall back to separate
* 'mknod' + 'open' requests.
*/
static int fuse_create_open(struct inode *dir, struct dentry *entry,
struct file *file, unsigned flags,
umode_t mode, int *opened)
{
int err;
struct inode *inode;
struct fuse_conn *fc = get_fuse_conn(dir);
FUSE_ARGS(args);
struct fuse_forget_link *forget;
struct fuse_create_in inarg;
struct fuse_open_out outopen;
struct fuse_entry_out outentry;
struct fuse_file *ff;
/* Userspace expects S_IFREG in create mode */
BUG_ON((mode & S_IFMT) != S_IFREG);
forget = fuse_alloc_forget();
err = -ENOMEM;
if (!forget)
goto out_err;
err = -ENOMEM;
ff = fuse_file_alloc(fc);
if (!ff)
goto out_put_forget_req;
if (!fc->dont_mask)
mode &= ~current_umask();
flags &= ~O_NOCTTY;
memset(&inarg, 0, sizeof(inarg));
memset(&outentry, 0, sizeof(outentry));
inarg.flags = flags;
inarg.mode = mode;
inarg.umask = current_umask();
args.in.h.opcode = FUSE_CREATE;
args.in.h.nodeid = get_node_id(dir);
args.in.numargs = 2;
args.in.args[0].size = sizeof(inarg);
args.in.args[0].value = &inarg;
args.in.args[1].size = entry->d_name.len + 1;
args.in.args[1].value = entry->d_name.name;
args.out.numargs = 2;
args.out.args[0].size = sizeof(outentry);
args.out.args[0].value = &outentry;
args.out.args[1].size = sizeof(outopen);
args.out.args[1].value = &outopen;
err = fuse_simple_request(fc, &args);
if (err)
goto out_free_ff;
err = -EIO;
if (!S_ISREG(outentry.attr.mode) || invalid_nodeid(outentry.nodeid))
goto out_free_ff;
ff->fh = outopen.fh;
ff->nodeid = outentry.nodeid;
ff->open_flags = outopen.open_flags;
inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
&outentry.attr, entry_attr_timeout(&outentry), 0);
if (!inode) {
flags &= ~(O_CREAT | O_EXCL | O_TRUNC);
fuse_sync_release(ff, flags);
fuse_queue_forget(fc, forget, outentry.nodeid, 1);
err = -ENOMEM;
goto out_err;
}
kfree(forget);
d_instantiate(entry, inode);
fuse_change_entry_timeout(entry, &outentry);
fuse_invalidate_attr(dir);
err = finish_open(file, entry, generic_file_open, opened);
if (err) {
fuse_sync_release(ff, flags);
} else {
file->private_data = fuse_file_get(ff);
fuse_finish_open(inode, file);
}
return err;
out_free_ff:
fuse_file_free(ff);
out_put_forget_req:
kfree(forget);
out_err:
return err;
}
static int fuse_mknod(struct inode *, struct dentry *, umode_t, dev_t);
static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
struct file *file, unsigned flags,
umode_t mode, int *opened)
{
int err;
struct fuse_conn *fc = get_fuse_conn(dir);
struct dentry *res = NULL;
if (d_in_lookup(entry)) {
res = fuse_lookup(dir, entry, 0);
if (IS_ERR(res))
return PTR_ERR(res);
if (res)
entry = res;
}
if (!(flags & O_CREAT) || d_really_is_positive(entry))
goto no_open;
/* Only creates */
*opened |= FILE_CREATED;
if (fc->no_create)
goto mknod;
err = fuse_create_open(dir, entry, file, flags, mode, opened);
if (err == -ENOSYS) {
fc->no_create = 1;
goto mknod;
}
out_dput:
dput(res);
return err;
mknod:
err = fuse_mknod(dir, entry, mode, 0);
if (err)
goto out_dput;
no_open:
return finish_no_open(file, res);
}
/*
* Code shared between mknod, mkdir, symlink and link
*/
static int create_new_entry(struct fuse_conn *fc, struct fuse_args *args,
struct inode *dir, struct dentry *entry,
umode_t mode)
{
struct fuse_entry_out outarg;
struct inode *inode;
int err;
struct fuse_forget_link *forget;
forget = fuse_alloc_forget();
if (!forget)
return -ENOMEM;
memset(&outarg, 0, sizeof(outarg));
args->in.h.nodeid = get_node_id(dir);
args->out.numargs = 1;
args->out.args[0].size = sizeof(outarg);
args->out.args[0].value = &outarg;
err = fuse_simple_request(fc, args);
if (err)
goto out_put_forget_req;
err = -EIO;
if (invalid_nodeid(outarg.nodeid))
goto out_put_forget_req;
if ((outarg.attr.mode ^ mode) & S_IFMT)
goto out_put_forget_req;
inode = fuse_iget(dir->i_sb, outarg.nodeid, outarg.generation,
&outarg.attr, entry_attr_timeout(&outarg), 0);
if (!inode) {
fuse_queue_forget(fc, forget, outarg.nodeid, 1);
return -ENOMEM;
}
kfree(forget);
err = d_instantiate_no_diralias(entry, inode);
if (err)
return err;
fuse_change_entry_timeout(entry, &outarg);
fuse_invalidate_attr(dir);
return 0;
out_put_forget_req:
kfree(forget);
return err;
}
static int fuse_mknod(struct inode *dir, struct dentry *entry, umode_t mode,
dev_t rdev)
{
struct fuse_mknod_in inarg;
struct fuse_conn *fc = get_fuse_conn(dir);
FUSE_ARGS(args);
if (!fc->dont_mask)
mode &= ~current_umask();
memset(&inarg, 0, sizeof(inarg));
inarg.mode = mode;
inarg.rdev = new_encode_dev(rdev);
inarg.umask = current_umask();
args.in.h.opcode = FUSE_MKNOD;
args.in.numargs = 2;
args.in.args[0].size = sizeof(inarg);
args.in.args[0].value = &inarg;
args.in.args[1].size = entry->d_name.len + 1;
args.in.args[1].value = entry->d_name.name;
return create_new_entry(fc, &args, dir, entry, mode);
}
static int fuse_create(struct inode *dir, struct dentry *entry, umode_t mode,
bool excl)
{
return fuse_mknod(dir, entry, mode, 0);
}
static int fuse_mkdir(struct inode *dir, struct dentry *entry, umode_t mode)
{
struct fuse_mkdir_in inarg;
struct fuse_conn *fc = get_fuse_conn(dir);
FUSE_ARGS(args);
if (!fc->dont_mask)
mode &= ~current_umask();
memset(&inarg, 0, sizeof(inarg));
inarg.mode = mode;
inarg.umask = current_umask();
args.in.h.opcode = FUSE_MKDIR;
args.in.numargs = 2;
args.in.args[0].size = sizeof(inarg);
args.in.args[0].value = &inarg;
args.in.args[1].size = entry->d_name.len + 1;
args.in.args[1].value = entry->d_name.name;
return create_new_entry(fc, &args, dir, entry, S_IFDIR);
}
static int fuse_symlink(struct inode *dir, struct dentry *entry,
const char *link)
{
struct fuse_conn *fc = get_fuse_conn(dir);
unsigned len = strlen(link) + 1;
FUSE_ARGS(args);
args.in.h.opcode = FUSE_SYMLINK;
args.in.numargs = 2;
args.in.args[0].size = entry->d_name.len + 1;
args.in.args[0].value = entry->d_name.name;
args.in.args[1].size = len;
args.in.args[1].value = link;
return create_new_entry(fc, &args, dir, entry, S_IFLNK);
}
void fuse_update_ctime(struct inode *inode)
{
if (!IS_NOCMTIME(inode)) {
inode->i_ctime = current_time(inode);
mark_inode_dirty_sync(inode);
}
}
static int fuse_unlink(struct inode *dir, struct dentry *entry)
{
int err;
struct fuse_conn *fc = get_fuse_conn(dir);
FUSE_ARGS(args);
args.in.h.opcode = FUSE_UNLINK;
args.in.h.nodeid = get_node_id(dir);
args.in.numargs = 1;
args.in.args[0].size = entry->d_name.len + 1;
args.in.args[0].value = entry->d_name.name;
err = fuse_simple_request(fc, &args);
if (!err) {
struct inode *inode = d_inode(entry);
struct fuse_inode *fi = get_fuse_inode(inode);
spin_lock(&fc->lock);
fi->attr_version = ++fc->attr_version;
/*
* If i_nlink == 0 then unlink doesn't make sense, yet this can
* happen if userspace filesystem is careless. It would be
* difficult to enforce correct nlink usage so just ignore this
* condition here
*/
if (inode->i_nlink > 0)
drop_nlink(inode);
spin_unlock(&fc->lock);
fuse_invalidate_attr(inode);
fuse_invalidate_attr(dir);
fuse_invalidate_entry_cache(entry);
fuse_update_ctime(inode);
} else if (err == -EINTR)
fuse_invalidate_entry(entry);
return err;
}
static int fuse_rmdir(struct inode *dir, struct dentry *entry)
{
int err;
struct fuse_conn *fc = get_fuse_conn(dir);
FUSE_ARGS(args);
args.in.h.opcode = FUSE_RMDIR;
args.in.h.nodeid = get_node_id(dir);
args.in.numargs = 1;
args.in.args[0].size = entry->d_name.len + 1;
args.in.args[0].value = entry->d_name.name;
err = fuse_simple_request(fc, &args);
if (!err) {
clear_nlink(d_inode(entry));
fuse_invalidate_attr(dir);
fuse_invalidate_entry_cache(entry);
} else if (err == -EINTR)
fuse_invalidate_entry(entry);
return err;
}
static int fuse_rename_common(struct inode *olddir, struct dentry *oldent,
struct inode *newdir, struct dentry *newent,
unsigned int flags, int opcode, size_t argsize)
{
int err;
struct fuse_rename2_in inarg;
struct fuse_conn *fc = get_fuse_conn(olddir);
FUSE_ARGS(args);
memset(&inarg, 0, argsize);
inarg.newdir = get_node_id(newdir);
inarg.flags = flags;
args.in.h.opcode = opcode;
args.in.h.nodeid = get_node_id(olddir);
args.in.numargs = 3;
args.in.args[0].size = argsize;
args.in.args[0].value = &inarg;
args.in.args[1].size = oldent->d_name.len + 1;
args.in.args[1].value = oldent->d_name.name;
args.in.args[2].size = newent->d_name.len + 1;
args.in.args[2].value = newent->d_name.name;
err = fuse_simple_request(fc, &args);
if (!err) {
/* ctime changes */
fuse_invalidate_attr(d_inode(oldent));
fuse_update_ctime(d_inode(oldent));
if (flags & RENAME_EXCHANGE) {
fuse_invalidate_attr(d_inode(newent));
fuse_update_ctime(d_inode(newent));
}
fuse_invalidate_attr(olddir);
if (olddir != newdir)
fuse_invalidate_attr(newdir);
/* newent will end up negative */
if (!(flags & RENAME_EXCHANGE) && d_really_is_positive(newent)) {
fuse_invalidate_attr(d_inode(newent));
fuse_invalidate_entry_cache(newent);
fuse_update_ctime(d_inode(newent));
}
} else if (err == -EINTR) {
/* If request was interrupted, DEITY only knows if the
rename actually took place. If the invalidation
fails (e.g. some process has CWD under the renamed
directory), then there can be inconsistency between
the dcache and the real filesystem. Tough luck. */
fuse_invalidate_entry(oldent);
if (d_really_is_positive(newent))
fuse_invalidate_entry(newent);
}
return err;
}
static int fuse_rename2(struct inode *olddir, struct dentry *oldent,
struct inode *newdir, struct dentry *newent,
unsigned int flags)
{
struct fuse_conn *fc = get_fuse_conn(olddir);
int err;
if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
return -EINVAL;
if (flags) {
if (fc->no_rename2 || fc->minor < 23)
return -EINVAL;
err = fuse_rename_common(olddir, oldent, newdir, newent, flags,
FUSE_RENAME2,
sizeof(struct fuse_rename2_in));
if (err == -ENOSYS) {
fc->no_rename2 = 1;
err = -EINVAL;
}
} else {
err = fuse_rename_common(olddir, oldent, newdir, newent, 0,
FUSE_RENAME,
sizeof(struct fuse_rename_in));
}
return err;
}
static int fuse_link(struct dentry *entry, struct inode *newdir,
struct dentry *newent)
{
int err;
struct fuse_link_in inarg;
struct inode *inode = d_inode(entry);
struct fuse_conn *fc = get_fuse_conn(inode);
FUSE_ARGS(args);
memset(&inarg, 0, sizeof(inarg));
inarg.oldnodeid = get_node_id(inode);
args.in.h.opcode = FUSE_LINK;
args.in.numargs = 2;
args.in.args[0].size = sizeof(inarg);
args.in.args[0].value = &inarg;
args.in.args[1].size = newent->d_name.len + 1;
args.in.args[1].value = newent->d_name.name;
err = create_new_entry(fc, &args, newdir, newent, inode->i_mode);
/* Contrary to "normal" filesystems it can happen that link
makes two "logical" inodes point to the same "physical"
inode. We invalidate the attributes of the old one, so it
will reflect changes in the backing inode (link count,
etc.)
*/
if (!err) {
struct fuse_inode *fi = get_fuse_inode(inode);
spin_lock(&fc->lock);
fi->attr_version = ++fc->attr_version;
inc_nlink(inode);
spin_unlock(&fc->lock);
fuse_invalidate_attr(inode);
fuse_update_ctime(inode);
} else if (err == -EINTR) {
fuse_invalidate_attr(inode);
}
return err;
}
static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
struct kstat *stat)
{
unsigned int blkbits;
struct fuse_conn *fc = get_fuse_conn(inode);
/* see the comment in fuse_change_attributes() */
if (fc->writeback_cache && S_ISREG(inode->i_mode)) {
attr->size = i_size_read(inode);
attr->mtime = inode->i_mtime.tv_sec;
attr->mtimensec = inode->i_mtime.tv_nsec;
attr->ctime = inode->i_ctime.tv_sec;
attr->ctimensec = inode->i_ctime.tv_nsec;
}
stat->dev = inode->i_sb->s_dev;
stat->ino = attr->ino;
stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
stat->nlink = attr->nlink;
stat->uid = make_kuid(&init_user_ns, attr->uid);
stat->gid = make_kgid(&init_user_ns, attr->gid);
stat->rdev = inode->i_rdev;
stat->atime.tv_sec = attr->atime;
stat->atime.tv_nsec = attr->atimensec;
stat->mtime.tv_sec = attr->mtime;
stat->mtime.tv_nsec = attr->mtimensec;
stat->ctime.tv_sec = attr->ctime;
stat->ctime.tv_nsec = attr->ctimensec;
stat->size = attr->size;
stat->blocks = attr->blocks;
if (attr->blksize != 0)
blkbits = ilog2(attr->blksize);
else
blkbits = inode->i_sb->s_blocksize_bits;
stat->blksize = 1 << blkbits;
}
static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
struct file *file)
{
int err;
struct fuse_getattr_in inarg;
struct fuse_attr_out outarg;
struct fuse_conn *fc = get_fuse_conn(inode);
FUSE_ARGS(args);
u64 attr_version;
attr_version = fuse_get_attr_version(fc);
memset(&inarg, 0, sizeof(inarg));
memset(&outarg, 0, sizeof(outarg));
/* Directories have separate file-handle space */
if (file && S_ISREG(inode->i_mode)) {
struct fuse_file *ff = file->private_data;
inarg.getattr_flags |= FUSE_GETATTR_FH;
inarg.fh = ff->fh;
}
args.in.h.opcode = FUSE_GETATTR;
args.in.h.nodeid = get_node_id(inode);
args.in.numargs = 1;
args.in.args[0].size = sizeof(inarg);
args.in.args[0].value = &inarg;
args.out.numargs = 1;
args.out.args[0].size = sizeof(outarg);
args.out.args[0].value = &outarg;
err = fuse_simple_request(fc, &args);
if (!err) {
if ((inode->i_mode ^ outarg.attr.mode) & S_IFMT) {
make_bad_inode(inode);
err = -EIO;
} else {
fuse_change_attributes(inode, &outarg.attr,
attr_timeout(&outarg),
attr_version);
if (stat)
fuse_fillattr(inode, &outarg.attr, stat);
}
}
return err;
}
int fuse_update_attributes(struct inode *inode, struct kstat *stat,
struct file *file, bool *refreshed)
{
struct fuse_inode *fi = get_fuse_inode(inode);
int err;
bool r;
if (time_before64(fi->i_time, get_jiffies_64())) {
r = true;
forget_all_cached_acls(inode);
err = fuse_do_getattr(inode, stat, file);
} else {
r = false;
err = 0;
if (stat) {
generic_fillattr(inode, stat);
stat->mode = fi->orig_i_mode;
stat->ino = fi->orig_ino;
}
}
if (refreshed != NULL)
*refreshed = r;
return err;
}
int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
u64 child_nodeid, struct qstr *name)
{
int err = -ENOTDIR;
struct inode *parent;
struct dentry *dir;
struct dentry *entry;
parent = ilookup5(sb, parent_nodeid, fuse_inode_eq, &parent_nodeid);
if (!parent)
return -ENOENT;
inode_lock(parent);
if (!S_ISDIR(parent->i_mode))
goto unlock;
err = -ENOENT;
dir = d_find_alias(parent);
if (!dir)
goto unlock;
name->hash = full_name_hash(dir, name->name, name->len);
entry = d_lookup(dir, name);
dput(dir);
if (!entry)
goto unlock;
fuse_invalidate_attr(parent);
fuse_invalidate_entry(entry);
if (child_nodeid != 0 && d_really_is_positive(entry)) {
inode_lock(d_inode(entry));
if (get_node_id(d_inode(entry)) != child_nodeid) {
err = -ENOENT;
goto badentry;
}
if (d_mountpoint(entry)) {
err = -EBUSY;
goto badentry;
}
if (d_is_dir(entry)) {
shrink_dcache_parent(entry);
if (!simple_empty(entry)) {
err = -ENOTEMPTY;
goto badentry;
}
d_inode(entry)->i_flags |= S_DEAD;
}
dont_mount(entry);
clear_nlink(d_inode(entry));
err = 0;
badentry:
inode_unlock(d_inode(entry));
if (!err)
d_delete(entry);
} else {
err = 0;
}
dput(entry);
unlock:
inode_unlock(parent);
iput(parent);
return err;
}
/*
* Calling into a user-controlled filesystem gives the filesystem
* daemon ptrace-like capabilities over the current process. This
* means, that the filesystem daemon is able to record the exact
* filesystem operations performed, and can also control the behavior
* of the requester process in otherwise impossible ways. For example
* it can delay the operation for arbitrary length of time allowing
* DoS against the requester.
*
* For this reason only those processes can call into the filesystem,
* for which the owner of the mount has ptrace privilege. This
* excludes processes started by other users, suid or sgid processes.
*/
int fuse_allow_current_process(struct fuse_conn *fc)
{
const struct cred *cred;
if (fc->allow_other)
return 1;
cred = current_cred();
if (uid_eq(cred->euid, fc->user_id) &&
uid_eq(cred->suid, fc->user_id) &&
uid_eq(cred->uid, fc->user_id) &&
gid_eq(cred->egid, fc->group_id) &&
gid_eq(cred->sgid, fc->group_id) &&
gid_eq(cred->gid, fc->group_id))
return 1;
return 0;
}
static int fuse_access(struct inode *inode, int mask)
{
struct fuse_conn *fc = get_fuse_conn(inode);
FUSE_ARGS(args);
struct fuse_access_in inarg;
int err;
BUG_ON(mask & MAY_NOT_BLOCK);
if (fc->no_access)
return 0;
memset(&inarg, 0, sizeof(inarg));
inarg.mask = mask & (MAY_READ | MAY_WRITE | MAY_EXEC);
args.in.h.opcode = FUSE_ACCESS;
args.in.h.nodeid = get_node_id(inode);
args.in.numargs = 1;
args.in.args[0].size = sizeof(inarg);
args.in.args[0].value = &inarg;
err = fuse_simple_request(fc, &args);
if (err == -ENOSYS) {
fc->no_access = 1;
err = 0;
}
return err;
}
static int fuse_perm_getattr(struct inode *inode, int mask)
{
if (mask & MAY_NOT_BLOCK)
return -ECHILD;
forget_all_cached_acls(inode);
return fuse_do_getattr(inode, NULL, NULL);
}
/*
* Check permission. The two basic access models of FUSE are:
*
* 1) Local access checking ('default_permissions' mount option) based
* on file mode. This is the plain old disk filesystem permission
* modell.
*
* 2) "Remote" access checking, where server is responsible for
* checking permission in each inode operation. An exception to this
* is if ->permission() was invoked from sys_access() in which case an
* access request is sent. Execute permission is still checked
* locally based on file mode.
*/
static int fuse_permission(struct inode *inode, int mask)
{
struct fuse_conn *fc = get_fuse_conn(inode);
bool refreshed = false;
int err = 0;
if (!fuse_allow_current_process(fc))
return -EACCES;
/*
* If attributes are needed, refresh them before proceeding
*/
if (fc->default_permissions ||
((mask & MAY_EXEC) && S_ISREG(inode->i_mode))) {
struct fuse_inode *fi = get_fuse_inode(inode);
if (time_before64(fi->i_time, get_jiffies_64())) {
refreshed = true;
err = fuse_perm_getattr(inode, mask);
if (err)
return err;
}
}
if (fc->default_permissions) {
err = generic_permission(inode, mask);
/* If permission is denied, try to refresh file
attributes. This is also needed, because the root
node will at first have no permissions */
if (err == -EACCES && !refreshed) {
err = fuse_perm_getattr(inode, mask);
if (!err)
err = generic_permission(inode, mask);
}
/* Note: the opposite of the above test does not
exist. So if permissions are revoked this won't be
noticed immediately, only after the attribute
timeout has expired */
} else if (mask & (MAY_ACCESS | MAY_CHDIR)) {
err = fuse_access(inode, mask);
} else if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode)) {
if (!(inode->i_mode & S_IXUGO)) {
if (refreshed)
return -EACCES;
err = fuse_perm_getattr(inode, mask);
if (!err && !(inode->i_mode & S_IXUGO))
return -EACCES;
}
}
return err;
}
static int parse_dirfile(char *buf, size_t nbytes, struct file *file,
struct dir_context *ctx)
{
while (nbytes >= FUSE_NAME_OFFSET) {
struct fuse_dirent *dirent = (struct fuse_dirent *) buf;
size_t reclen = FUSE_DIRENT_SIZE(dirent);
if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX)
return -EIO;
if (reclen > nbytes)
break;
if (memchr(dirent->name, '/', dirent->namelen) != NULL)
return -EIO;
if (!dir_emit(ctx, dirent->name, dirent->namelen,
dirent->ino, dirent->type))
break;
buf += reclen;
nbytes -= reclen;
ctx->pos = dirent->off;
}
return 0;
}
static int fuse_direntplus_link(struct file *file,
struct fuse_direntplus *direntplus,
u64 attr_version)
{
struct fuse_entry_out *o = &direntplus->entry_out;
struct fuse_dirent *dirent = &direntplus->dirent;
struct dentry *parent = file->f_path.dentry;
struct qstr name = QSTR_INIT(dirent->name, dirent->namelen);
struct dentry *dentry;
struct dentry *alias;
struct inode *dir = d_inode(parent);
struct fuse_conn *fc;
struct inode *inode;
DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
if (!o->nodeid) {
/*
* Unlike in the case of fuse_lookup, zero nodeid does not mean
* ENOENT. Instead, it only means the userspace filesystem did
* not want to return attributes/handle for this entry.
*
* So do nothing.
*/
return 0;
}
if (name.name[0] == '.') {
/*
* We could potentially refresh the attributes of the directory
* and its parent?
*/
if (name.len == 1)
return 0;
if (name.name[1] == '.' && name.len == 2)
return 0;
}
if (invalid_nodeid(o->nodeid))
return -EIO;
if (!fuse_valid_type(o->attr.mode))
return -EIO;
fc = get_fuse_conn(dir);
name.hash = full_name_hash(parent, name.name, name.len);
dentry = d_lookup(parent, &name);
if (!dentry) {
retry:
dentry = d_alloc_parallel(parent, &name, &wq);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
}
if (!d_in_lookup(dentry)) {
struct fuse_inode *fi;
inode = d_inode(dentry);
if (!inode ||
get_node_id(inode) != o->nodeid ||
((o->attr.mode ^ inode->i_mode) & S_IFMT)) {
d_invalidate(dentry);
dput(dentry);
goto retry;
}
if (is_bad_inode(inode)) {
dput(dentry);
return -EIO;
}
fi = get_fuse_inode(inode);
spin_lock(&fc->lock);
fi->nlookup++;
spin_unlock(&fc->lock);
forget_all_cached_acls(inode);
fuse_change_attributes(inode, &o->attr,
entry_attr_timeout(o),
attr_version);
/*
* The other branch comes via fuse_iget()
* which bumps nlookup inside
*/
} else {
inode = fuse_iget(dir->i_sb, o->nodeid, o->generation,
&o->attr, entry_attr_timeout(o),
attr_version);
if (!inode)
inode = ERR_PTR(-ENOMEM);
alias = d_splice_alias(inode, dentry);
d_lookup_done(dentry);
if (alias) {
dput(dentry);
dentry = alias;
}
if (IS_ERR(dentry))
return PTR_ERR(dentry);
}
if (fc->readdirplus_auto)
set_bit(FUSE_I_INIT_RDPLUS, &get_fuse_inode(inode)->state);
fuse_change_entry_timeout(dentry, o);
dput(dentry);
return 0;
}
static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file,
struct dir_context *ctx, u64 attr_version)
{
struct fuse_direntplus *direntplus;
struct fuse_dirent *dirent;
size_t reclen;
int over = 0;
int ret;
while (nbytes >= FUSE_NAME_OFFSET_DIRENTPLUS) {
direntplus = (struct fuse_direntplus *) buf;
dirent = &direntplus->dirent;
reclen = FUSE_DIRENTPLUS_SIZE(direntplus);
if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX)
return -EIO;
if (reclen > nbytes)
break;
if (memchr(dirent->name, '/', dirent->namelen) != NULL)
return -EIO;
if (!over) {
/* We fill entries into dstbuf only as much as
it can hold. But we still continue iterating
over remaining entries to link them. If not,
we need to send a FORGET for each of those
which we did not link.
*/
over = !dir_emit(ctx, dirent->name, dirent->namelen,
dirent->ino, dirent->type);
ctx->pos = dirent->off;
}
buf += reclen;
nbytes -= reclen;
ret = fuse_direntplus_link(file, direntplus, attr_version);
if (ret)
fuse_force_forget(file, direntplus->entry_out.nodeid);
}
return 0;
}
static int fuse_readdir(struct file *file, struct dir_context *ctx)
{
int plus, err;
size_t nbytes;
struct page *page;
struct inode *inode = file_inode(file);
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_req *req;
u64 attr_version = 0;
if (is_bad_inode(inode))
return -EIO;
req = fuse_get_req(fc, 1);
if (IS_ERR(req))
return PTR_ERR(req);
page = alloc_page(GFP_KERNEL);
if (!page) {
fuse_put_request(fc, req);
return -ENOMEM;
}
plus = fuse_use_readdirplus(inode, ctx);
req->out.argpages = 1;
req->num_pages = 1;
req->pages[0] = page;
req->page_descs[0].length = PAGE_SIZE;
if (plus) {
attr_version = fuse_get_attr_version(fc);
fuse_read_fill(req, file, ctx->pos, PAGE_SIZE,
FUSE_READDIRPLUS);
} else {
fuse_read_fill(req, file, ctx->pos, PAGE_SIZE,
FUSE_READDIR);
}
fuse_lock_inode(inode);
fuse_request_send(fc, req);
fuse_unlock_inode(inode);
nbytes = req->out.args[0].size;
err = req->out.h.error;
fuse_put_request(fc, req);
if (!err) {
if (plus) {
err = parse_dirplusfile(page_address(page), nbytes,
file, ctx,
attr_version);
} else {
err = parse_dirfile(page_address(page), nbytes, file,
ctx);
}
}
__free_page(page);
fuse_invalidate_atime(inode);
return err;
}
static const char *fuse_get_link(struct dentry *dentry,
struct inode *inode,
struct delayed_call *done)
{
struct fuse_conn *fc = get_fuse_conn(inode);
FUSE_ARGS(args);
char *link;
ssize_t ret;
if (!dentry)
return ERR_PTR(-ECHILD);
link = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!link)
return ERR_PTR(-ENOMEM);
args.in.h.opcode = FUSE_READLINK;
args.in.h.nodeid = get_node_id(inode);
args.out.argvar = 1;
args.out.numargs = 1;
args.out.args[0].size = PAGE_SIZE - 1;
args.out.args[0].value = link;
ret = fuse_simple_request(fc, &args);
if (ret < 0) {
kfree(link);
link = ERR_PTR(ret);
} else {
link[ret] = '\0';
set_delayed_call(done, kfree_link, link);
}
fuse_invalidate_atime(inode);
return link;
}
static int fuse_dir_open(struct inode *inode, struct file *file)
{
return fuse_open_common(inode, file, true);
}
static int fuse_dir_release(struct inode *inode, struct file *file)
{
fuse_release_common(file, FUSE_RELEASEDIR);
return 0;
}
static int fuse_dir_fsync(struct file *file, loff_t start, loff_t end,
int datasync)
{
return fuse_fsync_common(file, start, end, datasync, 1);
}
static long fuse_dir_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
struct fuse_conn *fc = get_fuse_conn(file->f_mapping->host);
/* FUSE_IOCTL_DIR only supported for API version >= 7.18 */
if (fc->minor < 18)
return -ENOTTY;
return fuse_ioctl_common(file, cmd, arg, FUSE_IOCTL_DIR);
}
static long fuse_dir_compat_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
struct fuse_conn *fc = get_fuse_conn(file->f_mapping->host);
if (fc->minor < 18)
return -ENOTTY;
return fuse_ioctl_common(file, cmd, arg,
FUSE_IOCTL_COMPAT | FUSE_IOCTL_DIR);
}
static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
{
/* Always update if mtime is explicitly set */
if (ivalid & ATTR_MTIME_SET)
return true;
/* Or if kernel i_mtime is the official one */
if (trust_local_mtime)
return true;
/* If it's an open(O_TRUNC) or an ftruncate(), don't update */
if ((ivalid & ATTR_SIZE) && (ivalid & (ATTR_OPEN | ATTR_FILE)))
return false;
/* In all other cases update */
return true;
}
static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
bool trust_local_cmtime)
{
unsigned ivalid = iattr->ia_valid;
if (ivalid & ATTR_MODE)
arg->valid |= FATTR_MODE, arg->mode = iattr->ia_mode;
if (ivalid & ATTR_UID)
arg->valid |= FATTR_UID, arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
if (ivalid & ATTR_GID)
arg->valid |= FATTR_GID, arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
if (ivalid & ATTR_SIZE)
arg->valid |= FATTR_SIZE, arg->size = iattr->ia_size;
if (ivalid & ATTR_ATIME) {
arg->valid |= FATTR_ATIME;
arg->atime = iattr->ia_atime.tv_sec;
arg->atimensec = iattr->ia_atime.tv_nsec;
if (!(ivalid & ATTR_ATIME_SET))
arg->valid |= FATTR_ATIME_NOW;
}
if ((ivalid & ATTR_MTIME) && update_mtime(ivalid, trust_local_cmtime)) {
arg->valid |= FATTR_MTIME;
arg->mtime = iattr->ia_mtime.tv_sec;
arg->mtimensec = iattr->ia_mtime.tv_nsec;
if (!(ivalid & ATTR_MTIME_SET) && !trust_local_cmtime)
arg->valid |= FATTR_MTIME_NOW;
}
if ((ivalid & ATTR_CTIME) && trust_local_cmtime) {
arg->valid |= FATTR_CTIME;
arg->ctime = iattr->ia_ctime.tv_sec;
arg->ctimensec = iattr->ia_ctime.tv_nsec;
}
}
/*
* Prevent concurrent writepages on inode
*
* This is done by adding a negative bias to the inode write counter
* and waiting for all pending writes to finish.
*/
void fuse_set_nowrite(struct inode *inode)
{
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_inode *fi = get_fuse_inode(inode);
BUG_ON(!inode_is_locked(inode));
spin_lock(&fc->lock);
BUG_ON(fi->writectr < 0);
fi->writectr += FUSE_NOWRITE;
spin_unlock(&fc->lock);
wait_event(fi->page_waitq, fi->writectr == FUSE_NOWRITE);
}
/*
* Allow writepages on inode
*
* Remove the bias from the writecounter and send any queued
* writepages.
*/
static void __fuse_release_nowrite(struct inode *inode)
{
struct fuse_inode *fi = get_fuse_inode(inode);
BUG_ON(fi->writectr != FUSE_NOWRITE);
fi->writectr = 0;
fuse_flush_writepages(inode);
}
void fuse_release_nowrite(struct inode *inode)
{
struct fuse_conn *fc = get_fuse_conn(inode);
spin_lock(&fc->lock);
__fuse_release_nowrite(inode);
spin_unlock(&fc->lock);
}
static void fuse_setattr_fill(struct fuse_conn *fc, struct fuse_args *args,
struct inode *inode,
struct fuse_setattr_in *inarg_p,
struct fuse_attr_out *outarg_p)
{
args->in.h.opcode = FUSE_SETATTR;
args->in.h.nodeid = get_node_id(inode);
args->in.numargs = 1;
args->in.args[0].size = sizeof(*inarg_p);
args->in.args[0].value = inarg_p;
args->out.numargs = 1;
args->out.args[0].size = sizeof(*outarg_p);
args->out.args[0].value = outarg_p;
}
/*
* Flush inode->i_mtime to the server
*/
int fuse_flush_times(struct inode *inode, struct fuse_file *ff)
{
struct fuse_conn *fc = get_fuse_conn(inode);
FUSE_ARGS(args);
struct fuse_setattr_in inarg;
struct fuse_attr_out outarg;
memset(&inarg, 0, sizeof(inarg));
memset(&outarg, 0, sizeof(outarg));
inarg.valid = FATTR_MTIME;
inarg.mtime = inode->i_mtime.tv_sec;
inarg.mtimensec = inode->i_mtime.tv_nsec;
if (fc->minor >= 23) {
inarg.valid |= FATTR_CTIME;
inarg.ctime = inode->i_ctime.tv_sec;
inarg.ctimensec = inode->i_ctime.tv_nsec;
}
if (ff) {
inarg.valid |= FATTR_FH;
inarg.fh = ff->fh;
}
fuse_setattr_fill(fc, &args, inode, &inarg, &outarg);
return fuse_simple_request(fc, &args);
}
/*
* Set attributes, and at the same time refresh them.
*
* Truncation is slightly complicated, because the 'truncate' request
* may fail, in which case we don't want to touch the mapping.
* vmtruncate() doesn't allow for this case, so do the rlimit checking
* and the actual truncation by hand.
*/
int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
struct file *file)
{
struct inode *inode = d_inode(dentry);
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_inode *fi = get_fuse_inode(inode);
FUSE_ARGS(args);
struct fuse_setattr_in inarg;
struct fuse_attr_out outarg;
bool is_truncate = false;
bool is_wb = fc->writeback_cache;
loff_t oldsize;
int err;
bool trust_local_cmtime = is_wb && S_ISREG(inode->i_mode);
if (!fc->default_permissions)
attr->ia_valid |= ATTR_FORCE;
err = setattr_prepare(dentry, attr);
if (err)
return err;
if (attr->ia_valid & ATTR_OPEN) {
if (fc->atomic_o_trunc)
return 0;
file = NULL;
}
if (attr->ia_valid & ATTR_SIZE)
is_truncate = true;
if (is_truncate) {
fuse_set_nowrite(inode);
set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
if (trust_local_cmtime && attr->ia_size != inode->i_size)
attr->ia_valid |= ATTR_MTIME | ATTR_CTIME;
}
memset(&inarg, 0, sizeof(inarg));
memset(&outarg, 0, sizeof(outarg));
iattr_to_fattr(attr, &inarg, trust_local_cmtime);
if (file) {
struct fuse_file *ff = file->private_data;
inarg.valid |= FATTR_FH;
inarg.fh = ff->fh;
}
if (attr->ia_valid & ATTR_SIZE) {
/* For mandatory locking in truncate */
inarg.valid |= FATTR_LOCKOWNER;
inarg.lock_owner = fuse_lock_owner_id(fc, current->files);
}
fuse_setattr_fill(fc, &args, inode, &inarg, &outarg);
err = fuse_simple_request(fc, &args);
if (err) {
if (err == -EINTR)
fuse_invalidate_attr(inode);
goto error;
}
if ((inode->i_mode ^ outarg.attr.mode) & S_IFMT) {
make_bad_inode(inode);
err = -EIO;
goto error;
}
spin_lock(&fc->lock);
/* the kernel maintains i_mtime locally */
if (trust_local_cmtime) {
if (attr->ia_valid & ATTR_MTIME)
inode->i_mtime = attr->ia_mtime;
if (attr->ia_valid & ATTR_CTIME)
inode->i_ctime = attr->ia_ctime;
/* FIXME: clear I_DIRTY_SYNC? */
}
fuse_change_attributes_common(inode, &outarg.attr,
attr_timeout(&outarg));
oldsize = inode->i_size;
/* see the comment in fuse_change_attributes() */
if (!is_wb || is_truncate || !S_ISREG(inode->i_mode))
i_size_write(inode, outarg.attr.size);
if (is_truncate) {
/* NOTE: this may release/reacquire fc->lock */
__fuse_release_nowrite(inode);
}
spin_unlock(&fc->lock);
/*
* Only call invalidate_inode_pages2() after removing
* FUSE_NOWRITE, otherwise fuse_launder_page() would deadlock.
*/
if ((is_truncate || !is_wb) &&
S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) {
truncate_pagecache(inode, outarg.attr.size);
invalidate_inode_pages2(inode->i_mapping);
}
clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
return 0;
error:
if (is_truncate)
fuse_release_nowrite(inode);
clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
return err;
}
static int fuse_setattr(struct dentry *entry, struct iattr *attr)
{
struct inode *inode = d_inode(entry);
struct fuse_conn *fc = get_fuse_conn(inode);
struct file *file = (attr->ia_valid & ATTR_FILE) ? attr->ia_file : NULL;
int ret;
if (!fuse_allow_current_process(get_fuse_conn(inode)))
return -EACCES;
if (attr->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID)) {
attr->ia_valid &= ~(ATTR_KILL_SUID | ATTR_KILL_SGID |
ATTR_MODE);
/*
* The only sane way to reliably kill suid/sgid is to do it in
* the userspace filesystem
*
* This should be done on write(), truncate() and chown().
*/
if (!fc->handle_killpriv) {
/*
* ia_mode calculation may have used stale i_mode.
* Refresh and recalculate.
*/
ret = fuse_do_getattr(inode, NULL, file);
if (ret)
return ret;
attr->ia_mode = inode->i_mode;
if (inode->i_mode & S_ISUID) {
attr->ia_valid |= ATTR_MODE;
attr->ia_mode &= ~S_ISUID;
}
if ((inode->i_mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
attr->ia_valid |= ATTR_MODE;
attr->ia_mode &= ~S_ISGID;
}
}
}
if (!attr->ia_valid)
return 0;
ret = fuse_do_setattr(entry, attr, file);
if (!ret) {
/*
* If filesystem supports acls it may have updated acl xattrs in
* the filesystem, so forget cached acls for the inode.
*/
if (fc->posix_acl)
forget_all_cached_acls(inode);
/* Directory mode changed, may need to revalidate access */
if (d_is_dir(entry) && (attr->ia_valid & ATTR_MODE))
fuse_invalidate_entry_cache(entry);
}
return ret;
}
static int fuse_getattr(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int flags)
{
struct inode *inode = d_inode(path->dentry);
struct fuse_conn *fc = get_fuse_conn(inode);
if (!fuse_allow_current_process(fc))
return -EACCES;
return fuse_update_attributes(inode, stat, NULL, NULL);
}
static const struct inode_operations fuse_dir_inode_operations = {
.lookup = fuse_lookup,
.mkdir = fuse_mkdir,
.symlink = fuse_symlink,
.unlink = fuse_unlink,
.rmdir = fuse_rmdir,
.rename = fuse_rename2,
.link = fuse_link,
.setattr = fuse_setattr,
.create = fuse_create,
.atomic_open = fuse_atomic_open,
.mknod = fuse_mknod,
.permission = fuse_permission,
.getattr = fuse_getattr,
.listxattr = fuse_listxattr,
.get_acl = fuse_get_acl,
.set_acl = fuse_set_acl,
};
static const struct file_operations fuse_dir_operations = {
.llseek = generic_file_llseek,
.read = generic_read_dir,
.iterate_shared = fuse_readdir,
.open = fuse_dir_open,
.release = fuse_dir_release,
.fsync = fuse_dir_fsync,
.unlocked_ioctl = fuse_dir_ioctl,
.compat_ioctl = fuse_dir_compat_ioctl,
};
static const struct inode_operations fuse_common_inode_operations = {
.setattr = fuse_setattr,
.permission = fuse_permission,
.getattr = fuse_getattr,
.listxattr = fuse_listxattr,
.get_acl = fuse_get_acl,
.set_acl = fuse_set_acl,
};
static const struct inode_operations fuse_symlink_inode_operations = {
.setattr = fuse_setattr,
.get_link = fuse_get_link,
.getattr = fuse_getattr,
.listxattr = fuse_listxattr,
};
void fuse_init_common(struct inode *inode)
{
inode->i_op = &fuse_common_inode_operations;
}
void fuse_init_dir(struct inode *inode)
{
inode->i_op = &fuse_dir_inode_operations;
inode->i_fop = &fuse_dir_operations;
}
void fuse_init_symlink(struct inode *inode)
{
inode->i_op = &fuse_symlink_inode_operations;
}