Commit Graph

1069 Commits

Author SHA1 Message Date
Miklos Szeredi e904621ae0 fuse: fix use after free in fuse_read_interrupt()
[ Upstream commit e1e71c168813564be0f6ea3d6740a059ca42d177 ]

There is a potential race between fuse_read_interrupt() and
fuse_request_end().

TASK1
  in fuse_read_interrupt(): delete req->intr_entry (while holding
  fiq->lock)

TASK2
  in fuse_request_end(): req->intr_entry is empty -> skip fiq->lock
  wake up TASK3

TASK3
  request is freed

TASK1
  in fuse_read_interrupt(): dereference req->in.h.unique ***BAM***

Fix by always grabbing fiq->lock if the request was ever interrupted
(FR_INTERRUPTED set) thereby serializing with concurrent
fuse_read_interrupt() calls.

FR_INTERRUPTED is set before the request is queued on fiq->interrupts.
Dequeing the request is done with list_del_init() but FR_INTERRUPTED is not
cleared in this case.

Reported-by: lijiazi <lijiazi@xiaomi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22 12:26:43 +02:00
Miklos Szeredi 85b0726d5b fuse: flush extending writes
commit 59bda8ecee2ffc6a602b7bf2b9e43ca669cdbdcd upstream.

Callers of fuse_writeback_range() assume that the file is ready for
modification by the server in the supplied byte range after the call
returns.

If there's a write that extends the file beyond the end of the supplied
range, then the file needs to be extended to at least the end of the range,
but currently that's not done.

There are at least two cases where this can cause problems:

 - copy_file_range() will return short count if the file is not extended
   up to end of the source range.

 - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
   hence the region may not be fully allocated.

Fix by flushing writes from the start of the range up to the end of the
file.  This could be optimized if the writes are non-extending, etc, but
it's probably not worth the trouble.

Fixes: a2bc923629 ("fuse: fix copy_file_range() in the writeback case")
Fixes: 6b1bdb56b17c ("fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)")
Cc: <stable@vger.kernel.org>  # v5.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15 09:47:41 +02:00
Miklos Szeredi 8a98ced6e1 fuse: truncate pagecache on atomic_o_trunc
commit 76224355db7570cbe6b6f75c8929a1558828dd55 upstream.

fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic
O_TRUNC.  This can deadlock with fuse_wait_on_page_writeback() in
fuse_launder_page() triggered by invalidate_inode_pages2().

Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a
truncate_pagecache() call.  This makes sense regardless of FOPEN_KEEP_CACHE
or fc->writeback cache, so do it unconditionally.

Reported-by: Xie Yongji <xieyongji@bytedance.com>
Reported-and-tested-by: syzbot+bea44a5189836d956894@syzkaller.appspotmail.com
Fixes: e4648309b8 ("fuse: truncate pending writes on O_TRUNC")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15 09:47:41 +02:00
Miklos Szeredi a883c38f1c fuse: reject internal errno
commit 49221cf86d18bb66fe95d3338cb33bd4b9880ca5 upstream.

Don't allow userspace to report errors that could be kernel-internal.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Fixes: 334f485df8 ("[PATCH] FUSE - device functions")
Cc: <stable@vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14 16:53:09 +02:00
Miklos Szeredi 059dd690bf fuse: check connected before queueing on fpq->io
commit 80ef08670d4c28a06a3de954bd350368780bcfef upstream.

A request could end up on the fpq->io list after fuse_abort_conn() has
reset fpq->connected and aborted requests on that list:

Thread-1			  Thread-2
========			  ========
->fuse_simple_request()           ->shutdown
  ->__fuse_request_send()
    ->queue_request()		->fuse_abort_conn()
->fuse_dev_do_read()                ->acquire(fpq->lock)
  ->wait_for(fpq->lock) 	  ->set err to all req's in fpq->io
				  ->release(fpq->lock)
  ->acquire(fpq->lock)
  ->add req to fpq->io

After the userspace copy is done the request will be ended, but
req->out.h.error will remain uninitialized.  Also the copy might block
despite being already aborted.

Fix both issues by not allowing the request to be queued on the fpq->io
list after fuse_abort_conn() has processed this list.

Reported-by: Pradeep P V K <pragalla@codeaurora.org>
Fixes: fd22d62ed0 ("fuse: no fc->lock for iqueue parts")
Cc: <stable@vger.kernel.org> # v4.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14 16:53:09 +02:00
Miklos Szeredi e72bec9226 fuse: ignore PG_workingset after stealing
commit b89ecd60d38ec042d63bdb376c722a16f92bcb88 upstream.

Fix the "fuse: trying to steal weird page" warning.

Description from Johannes Weiner:

  "Think of it as similar to PG_active. It's just another usage/heat
   indicator of file and anon pages on the reclaim LRU that, unlike
   PG_active, persists across deactivation and even reclaim (we store it in
   the page cache / swapper cache tree until the page refaults).

   So if fuse accepts pages that can legally have PG_active set,
   PG_workingset is fine too."

Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Fixes: 1899ad18c6 ("mm: workingset: tell cache transitions from workingset thrashing")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14 16:53:08 +02:00
Miklos Szeredi d61f2d9381 cuse: prevent clone
[ Upstream commit 8217673d07256b22881127bf50dce874d0e51653 ]

For cloned connections cuse_channel_release() will be called more than
once, resulting in use after free.

Prevent device cloning for CUSE, which does not make sense at this point,
and highly unlikely to be used in real life.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:08:22 +02:00
Vivek Goyal be8db260f4 fuse: fix write deadlock
commit 4f06dd92b5d0a6f8eec6a34b8d6ef3e1f4ac1e10 upstream.

There are two modes for write(2) and friends in fuse:

a) write through (update page cache, send sync WRITE request to userspace)

b) buffered write (update page cache, async writeout later)

The write through method kept all the page cache pages locked that were
used for the request.  Keeping more than one page locked is deadlock prone
and Qian Cai demonstrated this with trinity fuzzing.

The reason for keeping the pages locked is that concurrent mapped reads
shouldn't try to pull possibly stale data into the page cache.

For full page writes, the easy way to fix this is to make the cached page
be the authoritative source by marking the page PG_uptodate immediately.
After this the page can be safely unlocked, since mapped/cached reads will
take the written data from the cache.

Concurrent mapped writes will now cause data in the original WRITE request
to be updated; this however doesn't cause any data inconsistency and this
scenario should be exceedingly rare anyway.

If the WRITE request returns with an error in the above case, currently the
page is not marked uptodate; this means that a concurrent read will always
read consistent data.  After this patch the page is uptodate between
writing to the cache and receiving the error: there's window where a cached
read will read the wrong data.  While theoretically this could be a
regression, it is unlikely to be one in practice, since this is normal for
buffered writes.

In case of a partial page write to an already uptodate page the locking is
also unnecessary, with the above caveats.

Partial write of a not uptodate page still needs to be handled.  One way
would be to read the complete page before doing the write.  This is not
possible, since it might break filesystems that don't expect any READ
requests when the file was opened O_WRONLY.

The other solution is to serialize the synchronous write with reads from
the partial pages.  The easiest way to do this is to keep the partial pages
locked.  The problem is that a write() may involve two such pages (one head
and one tail).  This patch fixes it by only locking the partial tail page.
If there's a partial head page as well, then split that off as a separate
WRITE request.

Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw/
Fixes: ea9b9907b8 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org> # v2.6.26
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:04:16 +02:00
Luis Henriques 310efc95c7 virtiofs: fix memory leak in virtio_fs_probe()
commit c79c5e0178922a9e092ec8fed026750f39dcaef4 upstream.

When accidentally passing twice the same tag to qemu, kmemleak ended up
reporting a memory leak in virtiofs.  Also, looking at the log I saw the
following error (that's when I realised the duplicated tag):

  virtiofs: probe of virtio5 failed with error -17

Here's the kmemleak log for reference:

unreferenced object 0xffff888103d47800 (size 1024):
  comm "systemd-udevd", pid 118, jiffies 4294893780 (age 18.340s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff 80 90 02 a0 ff ff ff ff  ................
  backtrace:
    [<000000000ebb87c1>] virtio_fs_probe+0x171/0x7ae [virtiofs]
    [<00000000f8aca419>] virtio_dev_probe+0x15f/0x210
    [<000000004d6baf3c>] really_probe+0xea/0x430
    [<00000000a6ceeac8>] device_driver_attach+0xa8/0xb0
    [<00000000196f47a7>] __driver_attach+0x98/0x140
    [<000000000b20601d>] bus_for_each_dev+0x7b/0xc0
    [<00000000399c7b7f>] bus_add_driver+0x11b/0x1f0
    [<0000000032b09ba7>] driver_register+0x8f/0xe0
    [<00000000cdd55998>] 0xffffffffa002c013
    [<000000000ea196a2>] do_one_initcall+0x64/0x2e0
    [<0000000008f727ce>] do_init_module+0x5c/0x260
    [<000000003cdedab6>] __do_sys_finit_module+0xb5/0x120
    [<00000000ad2f48c6>] do_syscall_64+0x33/0x40
    [<00000000809526b5>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Fixes: a62a8ef9d9 ("virtio-fs: add virtiofs filesystem")
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:04:14 +02:00
Amir Goldstein 187ae04636 fuse: fix live lock in fuse_iget()
commit 775c5033a0d164622d9d10dd0f0a5531639ed3ed upstream.

Commit 5d069dbe8aaf ("fuse: fix bad inode") replaced make_bad_inode()
in fuse_iget() with a private implementation fuse_make_bad().

The private implementation fails to remove the bad inode from inode
cache, so the retry loop with iget5_locked() finds the same bad inode
and marks it bad forever.

kmsg snip:

[ ] rcu: INFO: rcu_sched self-detected stall on CPU
...
[ ]  ? bit_wait_io+0x50/0x50
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ? find_inode.isra.32+0x60/0xb0
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ilookup5_nowait+0x65/0x90
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ilookup5.part.36+0x2e/0x80
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ? fuse_inode_eq+0x20/0x20
[ ]  iget5_locked+0x21/0x80
[ ]  ? fuse_inode_eq+0x20/0x20
[ ]  fuse_iget+0x96/0x1b0

Fixes: 5d069dbe8aaf ("fuse: fix bad inode")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-20 10:39:47 +01:00
Miklos Szeredi 732251cabe fuse: fix bad inode
[ Upstream commit 5d069dbe8aaf2a197142558b6fb2978189ba3454 ]

Jan Kara's analysis of the syzbot report (edited):

  The reproducer opens a directory on FUSE filesystem, it then attaches
  dnotify mark to the open directory.  After that a fuse_do_getattr() call
  finds that attributes returned by the server are inconsistent, and calls
  make_bad_inode() which, among other things does:

          inode->i_mode = S_IFREG;

  This then confuses dnotify which doesn't tear down its structures
  properly and eventually crashes.

Avoid calling make_bad_inode() on a live inode: switch to a private flag on
the fuse inode.  Also add the test to ops which the bad_inode_ops would
have caught.

This bug goes back to the initial merge of fuse in 2.6.14...

Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-09 13:44:54 +01:00
Miklos Szeredi 860448e73b fuse: fix page dereference after free
commit d78092e4937de9ce55edcb4ee4c5e3c707be0190 upstream.

After unlock_request() pages from the ap->pages[] array may be put (e.g. by
aborting the connection) and the pages can be freed.

Prevent use after free by grabbing a reference to the page before calling
unlock_request().

The original patch was created by Pradeep P V K.

Reported-by: Pradeep P V K <ppvk@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01 12:01:05 +01:00
Al Viro 89fd103fbb fuse: fix the ->direct_IO() treatment of iov_iter
[ Upstream commit 933a3752babcf6513117d5773d2b70782d6ad149 ]

the callers rely upon having any iov_iter_truncate() done inside
->direct_IO() countered by iov_iter_reexpand().

Reported-by: Qian Cai <cai@redhat.com>
Tested-by: Qian Cai <cai@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-07 08:01:26 +02:00
Miklos Szeredi 89e6cf1c0a fuse: update attr_version counter on fuse_notify_inval_inode()
[ Upstream commit 5ddd9ced9aef6cfa76af27d384c17c9e2d610ce8 ]

A GETATTR request can race with FUSE_NOTIFY_INVAL_INODE, resulting in the
attribute cache being updated with stale information after the
invalidation.

Fix this by bumping the attribute version in fuse_reverse_inval_inode().

Reported-by: Krzysztof Rusek <rusek@9livesdata.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:18:02 +02:00
Miklos Szeredi e431e923c8 fuse: don't check refcount after stealing page
[ Upstream commit 32f98877c57bee6bc27f443a96f49678a2cd6a50 ]

page_count() is unstable.  Unless there has been an RCU grace period
between when the page was removed from the page cache and now, a
speculative reference may exist from the page cache.

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:18:01 +02:00
Miklos Szeredi 8676732c33 fuse: fix weird page warning
commit a5005c3cda6eeb6b95645e6cc32f58dafeffc976 upstream.

When PageWaiters was added, updating this check was missed.

Reported-by: Nikolaus Rath <Nikolaus@rath.org>
Reported-by: Hugh Dickins <hughd@google.com>
Fixes: 6290602709 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-29 10:18:28 +02:00
Chirantan Ekbote 4dd2ad6867 fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS
commit 31070f6ccec09f3bd4f1e28cd1e592fa4f3ba0b6 upstream.

The ioctl encoding for this parameter is a long but the documentation says
it should be an int and the kernel drivers expect it to be an int.  If the
fuse driver treats this as a long it might end up scribbling over the stack
of a userspace process that only allocated enough space for an int.

This was previously discussed in [1] and a patch for fuse was proposed in
[2].  From what I can tell the patch in [2] was nacked in favor of adding
new, "fixed" ioctls and using those from userspace.  However there is still
no "fixed" version of these ioctls and the fact is that it's sometimes
infeasible to change all userspace to use the new one.

Handling the ioctls specially in the fuse driver seems like the most
pragmatic way for fuse servers to support them without causing crashes in
userspace applications that call them.

[1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/
[2]: https://sourceforge.net/p/fuse/mailman/message/31771759/

Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Fixes: 59efec7b90 ("fuse: implement ioctl support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22 09:33:12 +02:00
Miklos Szeredi e8f32a9f5a fuse: use ->reconfigure() instead of ->remount_fs()
commit 0189a2d367f49729622fdafaef5da73161591859 upstream.

s_op->remount_fs() is only called from legacy_reconfigure(), which is not
used after being converted to the new API.

Convert to using ->reconfigure().  This restores the previous behavior of
syncing the filesystem and rejecting MS_MANDLOCK on remount.

Fixes: c30da2e981 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22 09:33:12 +02:00
Miklos Szeredi f96ce4be46 fuse: ignore 'data' argument of mount(..., MS_REMOUNT)
commit e8b20a474cf2c42698d1942f939ff2128819f151 upstream.

The command

  mount -o remount -o unknownoption /mnt/fuse

succeeds on kernel versions prior to v5.4 and fails on kernel version at or
after.  This is because fuse_parse_param() rejects any unrecognised options
in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.

This causes a regression in case the fuse filesystem is in fstab, since
remount sends all options found there to the kernel; even ones that are
meant for the initial mount and are consumed by the userspace fuse server.

Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
the conversion to the new API.

Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Fixes: c30da2e981 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22 09:33:12 +02:00
Vasily Averin 408ef501b8 fuse: don't ignore errors from fuse_writepages_fill()
[ Upstream commit 7779b047a57f6824a43d0e1f70de2741b7426b9d ]

fuse_writepages() ignores some errors taken from fuse_writepages_fill() I
believe it is a bug: if .writepages is called with WB_SYNC_ALL it should
either guarantee that all data was successfully saved or return error.

Fixes: 26d614df1d ("fuse: Implement writepages callback")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22 09:33:03 +02:00
Miklos Szeredi d658c127fc fuse: copy_file_range should truncate cache
[ Upstream commit 9b46418c40fe910e6537618f9932a8be78a3dd6c ]

After the copy operation completes the cache is not up-to-date.  Truncate
all pages in the interval that has successfully been copied.

Truncating completely copied dirty pages is okay, since the data has been
overwritten anyway.  Truncating partially copied dirty pages is not okay;
add a comment for now.

Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:30 +02:00
Miklos Szeredi c9ddb8dd12 fuse: fix copy_file_range cache issues
[ Upstream commit 2c4656dfd994538176db30ce09c02cc0dfc361ae ]

a) Dirty cache needs to be written back not just in the writeback_cache
case, since the dirty pages may come from memory maps.

b) The fuse_writeback_range() helper takes an inclusive interval, so the
end position needs to be pos+len-1 instead of pos+len.

Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:30 +02:00
Vivek Goyal 2b5e027657 virtiofs: schedule blocking async replies in separate worker
[ Upstream commit bb737bbe48bea9854455cb61ea1dc06e92ce586c ]

In virtiofs (unlike in regular fuse) processing of async replies is
serialized.  This can result in a deadlock in rare corner cases when
there's a circular dependency between the completion of two or more async
replies.

Such a deadlock can be reproduced with xfstests:generic/503 if TEST_DIR ==
SCRATCH_MNT (which is a misconfiguration):

 - Process A is waiting for page lock in worker thread context and blocked
   (virtio_fs_requests_done_work()).
 - Process B is holding page lock and waiting for pending writes to
   finish (fuse_wait_on_page_writeback()).
 - Write requests are waiting in virtqueue and can't complete because
   worker thread is blocked on page lock (process A).

Fix this by creating a unique work_struct for each async reply that can
block (O_DIRECT read).

Fixes: a62a8ef9d9 ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:21 +02:00
Miklos Szeredi 63050b3dc0 fuse: fix stack use after return
commit 3e8cb8b2eaeb22f540f1cbc00cbb594047b7ba89 upstream.

Normal, synchronous requests will have their args allocated on the stack.
After the FR_FINISHED bit is set by receiving the reply from the userspace
fuse server, the originating task may return and reuse the stack frame,
resulting in an Oops if the args structure is dereferenced.

Fix by setting a flag in the request itself upon initializing, indicating
whether it has an asynchronous ->end() callback.

Reported-by: Kyle Sanderson <kyle.leet@gmail.com>
Reported-by: Michael Stapelberg <michael+lkml@stapelberg.ch>
Fixes: 2b319d1f6f ("fuse: don't dereference req->args on finished request")
Cc: <stable@vger.kernel.org> # v5.4
Tested-by: Michael Stapelberg <michael+lkml@stapelberg.ch>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:17:52 +01:00
Miklos Szeredi 399ca7ee91 fuse: don't overflow LLONG_MAX with end offset
[ Upstream commit 2f1398291bf35fe027914ae7a9610d8e601fbfde ]

Handle the special case of fuse_readpages() wanting to read the last page
of a hugest file possible and overflowing the end offset in the process.

This is basically to unbreak xfstests:generic/525 and prevent filesystems
from doing bad things with an overflowing offset.

Reported-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:37:03 +01:00
Miklos Szeredi 07fbef9a6e fix up iter on short count in fuse_direct_io()
commit f658adeea45e430a24c7a157c3d5448925ac2038 upstream.

fuse_direct_io() can end up advancing the iterator by more than the amount
of data read or written.  This case is handled by the generic code if going
through ->direct_IO(), but not in the FOPEN_DIRECT_IO case.

Fix by reverting the extra bytes from the iterator in case of error or a
short count.

To test: install lxcfs, then the following testcase
  int fd = open("/var/lib/lxcfs/proc/uptime", O_RDONLY);
  sendfile(1, fd, NULL, 16777216);
  sendfile(1, fd, NULL, 16777216);
will spew WARN_ON() in iov_iter_pipe().

Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 3c3db095b6 ("fuse: use iov_iter based generic splice helpers")
Cc: <stable@vger.kernel.org> # v5.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:35:43 -08:00
Miklos Szeredi 7e7f29200f fuse: fix fuse_send_readpages() in the syncronous read case
commit 7df1e988c723a066754090b22d047c3225342152 upstream.

Buffered read in fuse normally goes via:

 -> generic_file_buffered_read()
   -> fuse_readpages()
     -> fuse_send_readpages()
       ->fuse_simple_request() [called since v5.4]

In the case of a read request, fuse_simple_request() will return a
non-negative bytecount on success or a negative error value.  A positive
bytecount was taken to be an error and the PG_error flag set on the page.
This resulted in generic_file_buffered_read() falling back to ->readpage(),
which would repeat the read request and succeed.  Because of the repeated
read succeeding the bug was not detected with regression tests or other use
cases.

The FTP module in GVFS however fails the second read due to the
non-seekable nature of FTP downloads.

Fix by checking and ignoring positive return value from
fuse_simple_request().

Reported-by: Ondrej Holy <oholy@redhat.com>
Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441
Fixes: 134831e36b ("fuse: convert readpages to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 08:22:32 +01:00
Miklos Szeredi cbc5b45148 fuse: verify attributes
commit eb59bd17d2fa6e5e84fba61a5ebdea984222e6d5 upstream.

If a filesystem returns negative inode sizes, future reads on the file were
causing the cpu to spin on truncate_pagecache.

Create a helper to validate the attributes.  This now does two things:

 - check the file mode
 - check if the file size fits in i_size without overflowing

Reported-by: Arijit Banerjee <arijit@rubrik.com>
Fixes: d8a5ba4545 ("[PATCH] FUSE - core")
Cc: <stable@vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:31 +01:00
Miklos Szeredi 8aa5c23ef8 fuse: verify write return
commit 8aab336b14c115c6bf1d4baeb9247e41ed9ce6de upstream.

Make sure filesystem is not returning a bogus number of bytes written.

Fixes: ea9b9907b8 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org> # v2.6.26
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:31 +01:00
Miklos Szeredi ba916a1310 fuse: verify nlink
commit c634da718db9b2fac201df2ae1b1b095344ce5eb upstream.

When adding a new hard link, make sure that i_nlink doesn't overflow.

Fixes: ac45d61357 ("fuse: fix nlink after unlink")
Cc: <stable@vger.kernel.org> # v3.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:30 +01:00
Miklos Szeredi a266e9072a fuse: fix leak of fuse_io_priv
commit f1ebdeffc6f325e30e0ddb9f7a70f1370fa4b851 upstream.

exit_aio() is sometimes stuck in wait_for_completion() after aio is issued
with direct IO and the task receives a signal.

The reason is failure to call ->ki_complete() due to a leaked reference to
fuse_io_priv.  This happens in fuse_async_req_send() if
fuse_simple_background() returns an error (e.g. -EINTR).

In this case the error value is propagated via io->err, so return success
to not confuse callers.

This issue is tracked as a virtio-fs issue:
https://gitlab.com/virtio-fs/qemu/issues/14

Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Fixes: 45ac96ed7c ("fuse: convert direct_io to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:30 +01:00
Vasily Averin 091d1a7267 fuse: redundant get_fuse_inode() calls in fuse_writepages_fill()
Currently fuse_writepages_fill() calls get_fuse_inode() few times with
the same argument.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 14:26:37 +02:00
Miklos Szeredi e4648309b8 fuse: truncate pending writes on O_TRUNC
Make sure cached writes are not reordered around open(..., O_TRUNC), with
the obvious wrong results.

Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 14:26:37 +02:00
Miklos Szeredi b24e7598db fuse: flush dirty data/metadata before non-truncate setattr
If writeback cache is enabled, then writes might get reordered with
chmod/chown/utimes.  The problem with this is that performing the write in
the fuse daemon might itself change some of these attributes.  In such case
the following sequence of operations will result in file ending up with the
wrong mode, for example:

  int fd = open ("suid", O_WRONLY|O_CREAT|O_EXCL);
  write (fd, "1", 1);
  fchown (fd, 0, 0);
  fchmod (fd, 04755);
  close (fd);

This patch fixes this by flushing pending writes before performing
chown/chmod/utimes.

Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 14:26:37 +02:00
zhengbin 80da5a809d virtiofs: Remove set but not used variable 'fc'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/fuse/virtio_fs.c: In function virtio_fs_wake_pending_and_unlock:
fs/fuse/virtio_fs.c:983:20: warning: variable fc set but not used [-Wunused-but-set-variable]

It is not used since commit 7ee1e2e631 ("virtiofs: No need to check
fpq->connected state")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 10:25:17 +02:00
Vivek Goyal a9bfd9dd34 virtiofs: Retry request submission from worker context
If regular request queue gets full, currently we sleep for a bit and
retrying submission in submitter's context. This assumes submitter is not
holding any spin lock. But this assumption is not true for background
requests. For background requests, we are called with fc->bg_lock held.

This can lead to deadlock where one thread is trying submission with
fc->bg_lock held while request completion thread has called
fuse_request_end() which tries to acquire fc->bg_lock and gets blocked. As
request completion thread gets blocked, it does not make further progress
and that means queue does not get empty and submitter can't submit more
requests.

To solve this issue, retry submission with the help of a worker, instead of
retrying in submitter's context. We already do this for hiprio/forget
requests.

Reported-by: Chirantan Ekbote <chirantan@chromium.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:08 +02:00
Vivek Goyal c17ea00961 virtiofs: Count pending forgets as in_flight forgets
If virtqueue is full, we put forget requests on a list and these forgets
are dispatched later using a worker. As of now we don't count these forgets
in fsvq->in_flight variable. This means when queue is being drained, we
have to have special logic to first drain these pending requests and then
wait for fsvq->in_flight to go to zero.

By counting pending forgets in fsvq->in_flight, we can get rid of special
logic and just wait for in_flight to go to zero. Worker thread will kick
and drain all the forgets anyway, leading in_flight to zero.

I also need similar logic for normal request queue in next patch where I am
about to defer request submission in the worker context if queue is full.

This simplifies the code a bit.

Also add two helper functions to inc/dec in_flight. Decrement in_flight
helper will later used to call completion when in_flight reaches zero.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Vivek Goyal 5dbe190f34 virtiofs: Set FR_SENT flag only after request has been sent
FR_SENT flag should be set when request has been sent successfully sent
over virtqueue. This is used by interrupt logic to figure out if interrupt
request should be sent or not.

Also add it to fqp->processing list after sending it successfully.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Vivek Goyal 7ee1e2e631 virtiofs: No need to check fpq->connected state
In virtiofs we keep per queue connected state in virtio_fs_vq->connected
and use that to end request if queue is not connected. And virtiofs does
not even touch fpq->connected state.

We probably need to merge these two at some point of time. For now,
simplify the code a bit and do not worry about checking state of
fpq->connected.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Vivek Goyal 51fecdd255 virtiofs: Do not end request in submission context
Submission context can hold some locks which end request code tries to hold
again and deadlock can occur. For example, fc->bg_lock. If a background
request is being submitted, it might hold fc->bg_lock and if we could not
submit request (because device went away) and tried to end request, then
deadlock happens. During testing, I also got a warning from deadlock
detection code.

So put requests on a list and end requests from a worker thread.

I got following warning from deadlock detector.

[  603.137138] WARNING: possible recursive locking detected
[  603.137142] --------------------------------------------
[  603.137144] blogbench/2036 is trying to acquire lock:
[  603.137149] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse]
[  603.140701]
[  603.140701] but task is already holding lock:
[  603.140703] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse]
[  603.140713]
[  603.140713] other info that might help us debug this:
[  603.140714]  Possible unsafe locking scenario:
[  603.140714]
[  603.140715]        CPU0
[  603.140716]        ----
[  603.140716]   lock(&(&fc->bg_lock)->rlock);
[  603.140718]   lock(&(&fc->bg_lock)->rlock);
[  603.140719]
[  603.140719]  *** DEADLOCK ***

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Miklos Szeredi 6c26f71759 fuse: don't advise readdirplus for negative lookup
If the FUSE_READDIRPLUS_AUTO feature is enabled, then lookups on a
directory before/during readdir are used as an indication that READDIRPLUS
should be used instead of READDIR.  However if the lookup turns out to be
negative, then selecting READDIRPLUS makes no sense.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Miklos Szeredi 2b319d1f6f fuse: don't dereference req->args on finished request
Move the check for async request after check for the request being already
finished and done with.

Reported-by: syzbot+ae0bb7aae3de6b4594e2@syzkaller.appspotmail.com
Fixes: d49937749f ("fuse: stop copying args to fuse_req")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 09:11:40 +02:00
Miklos Szeredi 3f22c74671 virtio-fs: don't show mount options
Virtio-fs does not accept any mount options, so it's confusing and wrong to
show any in /proc/mounts.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com> 
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-15 16:11:41 +02:00
Vivek Goyal 112e72373d virtio-fs: Change module name to virtiofs.ko
We have been calling it virtio_fs and even file name is virtio_fs.c. Module
name is virtio_fs.ko but when registering file system user is supposed to
specify filesystem type as "virtiofs".

Masayoshi Mizuma reported that he specified filesytem type as "virtio_fs"
and got this warning on console.

  ------------[ cut here ]------------
  request_module fs-virtio_fs succeeded, but still no fs?
  WARNING: CPU: 1 PID: 1234 at fs/filesystems.c:274 get_fs_type+0x12c/0x138
  Modules linked in: ... virtio_fs fuse virtio_net net_failover ...
  CPU: 1 PID: 1234 Comm: mount Not tainted 5.4.0-rc1 #1

So looks like kernel could find the module virtio_fs.ko but could not find
filesystem type after that.

It probably is better to rename module name to virtiofs.ko so that above
warning goes away in case user ends up specifying wrong fs name.

Reported-by: Masayoshi Mizuma <msys.mizuma@gmail.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-14 10:20:33 +02:00
Linus Torvalds 8f744bdee4 add virtio-fs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXYx2zAAKCRDh3BK/laaZ
 PFpHAQD2G+F8a9e41jFTJg5YpNKMD8/Pl4T6v9chIO9qPXF2IAEAji0P1JterRfv
 ixiBhv54hSwYbk527nxNWE9tP5gAHAQ=
 =WCHy
 -----END PGP SIGNATURE-----

Merge tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse virtio-fs support from Miklos Szeredi:
 "Virtio-fs allows exporting directory trees on the host and mounting
  them in guest(s).

  This isn't actually a new filesystem, but a glue layer between the
  fuse filesystem and a virtio based back-end.

  It's similar in functionality to the existing virtio-9p solution, but
  significantly faster in benchmarks and has better POSIX compliance.
  Further permformance improvements can be achieved by sharing the page
  cache between host and guest, allowing for faster I/O and reduced
  memory use.

  Kata Containers have been including the out-of-tree virtio-fs (with
  the shared page cache patches as well) since version 1.7 as an
  experimental feature. They have been active in development and plan to
  switch from virtio-9p to virtio-fs as their default solution. There
  has been interest from other sources as well.

  The userspace infrastructure is slated to be merged into qemu once the
  kernel part hits mainline.

  This was developed by Vivek Goyal, Dave Gilbert and Stefan Hajnoczi"

* tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  virtio-fs: add virtiofs filesystem
  virtio-fs: add Documentation/filesystems/virtiofs.rst
  fuse: reserve values for mapping protocol
2019-09-27 15:54:24 -07:00
YueHaibing 5addcd5dbd fuse: Make fuse_args_to_req static
Fix sparse warning:

fs/fuse/dev.c:468:6: warning: symbol 'fuse_args_to_req' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 68583165f9 ("fuse: add pages to fuse_args")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:02 +02:00
zhengbin 9ad09b1976 fuse: fix memleak in cuse_channel_open
If cuse_send_init fails, need to fuse_conn_put cc->fc.

cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1)
                 ->fuse_dev_alloc->fuse_conn_get
                 ->fuse_dev_free->fuse_conn_put

Fixes: cc080e9e9b ("fuse: introduce per-instance fuse_dev structure")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00
Tejun Heo e5854b1cdf fuse: fix beyond-end-of-page access in fuse_parse_cache()
With DEBUG_PAGEALLOC on, the following triggers.

  BUG: unable to handle page fault for address: ffff88859367c000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 3001067 P4D 3001067 PUD 406d3a8067 PMD 406d30c067 PTE 800ffffa6c983060
  Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
  CPU: 38 PID: 3110657 Comm: python2.7
  RIP: 0010:fuse_readdir+0x88f/0xe7a [fuse]
  Code: 49 8b 4d 08 49 39 4e 60 0f 84 44 04 00 00 48 8b 43 08 43 8d 1c 3c 4d 01 7e 68 49 89 dc 48 03 5c 24 38 49 89 46 60 8b 44 24 30 <8b> 4b 10 44 29 e0 48 89 ca 48 83 c1 1f 48 83 e1 f8 83 f8 17 49 89
  RSP: 0018:ffffc90035edbde0 EFLAGS: 00010286
  RAX: 0000000000001000 RBX: ffff88859367bff0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffff88859367bfed RDI: 0000000000920907
  RBP: ffffc90035edbe90 R08: 000000000000014b R09: 0000000000000004
  R10: ffff88859367b000 R11: 0000000000000000 R12: 0000000000000ff0
  R13: ffffc90035edbee0 R14: ffff889fb8546180 R15: 0000000000000020
  FS:  00007f80b5f4a740(0000) GS:ffff889fffa00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff88859367c000 CR3: 0000001c170c2001 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   iterate_dir+0x122/0x180
   __x64_sys_getdents+0xa6/0x140
   do_syscall_64+0x42/0x100
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

It's in fuse_parse_cache().  %rbx (ffff88859367bff0) is fuse_dirent
pointer - addr + offset.  FUSE_DIRENT_SIZE() is trying to dereference
namelen off of it but that derefs into the next page which is disabled
by pagealloc debug causing a PF.

This is caused by dirent->namelen being accessed before ensuring that
there's enough bytes in the page for the dirent.  Fix it by pushing
down reclen calculation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5d7bc7e868 ("fuse: allow using readdir cache")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00
Arnd Bergmann 0ed4059302 fuse: unexport fuse_put_request
This function has been made static, which now causes a compile-time
warning:

WARNING: "fuse_put_request" [vmlinux] is a static EXPORT_SYMBOL_GPL

Remove the unneeded export.

Fixes: 66abc3599c ("fuse: unexport request ops")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00
Khazhismel Kumykov dc69e98c24 fuse: kmemcg account fs data
account per-file, dentry, and inode data

blockdev/superblock and temporary per-request data was left alone, as
this usually isn't accounted

Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00