linux-brain/drivers/scsi/scsi_sysfs.c

1623 lines
42 KiB
C
Raw Permalink Normal View History

// SPDX-License-Identifier: GPL-2.0-only
/*
* scsi_sysfs.c
*
* SCSI sysfs interface routines.
*
* Created to pull SCSI mid layer sysfs routines into one file.
*/
#include <linux/module.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 17:04:11 +09:00
#include <linux/slab.h>
#include <linux/init.h>
#include <linux/blkdev.h>
#include <linux/device.h>
[SCSI] implement runtime Power Management This patch (as1398b) adds runtime PM support to the SCSI layer. Only the machanism is provided; use of it is up to the various high-level drivers, and the patch doesn't change any of them. Except for sg -- the patch expicitly prevents a device from being runtime-suspended while its sg device file is open. The implementation is simplistic. In general, hosts and targets are automatically suspended when all their children are asleep, but for them the runtime-suspend code doesn't actually do anything. (A host's runtime PM status is propagated up the device tree, though, so a runtime-PM-aware lower-level driver could power down the host adapter hardware at the appropriate times.) There are comments indicating where a transport class might be notified or some other hooks added. LUNs are runtime-suspended by calling the drivers' existing suspend handlers (and likewise for runtime-resume). Somewhat arbitrarily, the implementation delays for 100 ms before suspending an eligible LUN. This is because there typically are occasions during bootup when the same device file is opened and closed several times in quick succession. The way this all works is that the SCSI core increments a device's PM-usage count when it is registered. If a high-level driver does nothing then the device will not be eligible for runtime-suspend because of the elevated usage count. If a high-level driver wants to use runtime PM then it can call scsi_autopm_put_device() in its probe routine to decrement the usage count and scsi_autopm_get_device() in its remove routine to restore the original count. Hosts, targets, and LUNs are not suspended while they are being probed or removed, or while the error handler is running. In fact, a fairly large part of the patch consists of code to make sure that things aren't suspended at such times. [jejb: fix up compile issues in PM config variations] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-06-17 23:41:42 +09:00
#include <linux/pm_runtime.h>
#include <scsi/scsi.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>
#include <scsi/scsi_tcq.h>
#include <scsi/scsi_dh.h>
#include <scsi/scsi_transport.h>
#include <scsi/scsi_driver.h>
#include <scsi/scsi_devinfo.h>
#include "scsi_priv.h"
#include "scsi_logging.h"
static struct device_type scsi_dev_type;
static const struct {
enum scsi_device_state value;
char *name;
} sdev_states[] = {
{ SDEV_CREATED, "created" },
{ SDEV_RUNNING, "running" },
{ SDEV_CANCEL, "cancel" },
{ SDEV_DEL, "deleted" },
{ SDEV_QUIESCE, "quiesce" },
{ SDEV_OFFLINE, "offline" },
{ SDEV_TRANSPORT_OFFLINE, "transport-offline" },
{ SDEV_BLOCK, "blocked" },
{ SDEV_CREATED_BLOCK, "created-blocked" },
};
const char *scsi_device_state_name(enum scsi_device_state state)
{
int i;
char *name = NULL;
for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
if (sdev_states[i].value == state) {
name = sdev_states[i].name;
break;
}
}
return name;
}
static const struct {
enum scsi_host_state value;
char *name;
} shost_states[] = {
{ SHOST_CREATED, "created" },
{ SHOST_RUNNING, "running" },
{ SHOST_CANCEL, "cancel" },
{ SHOST_DEL, "deleted" },
{ SHOST_RECOVERY, "recovery" },
{ SHOST_CANCEL_RECOVERY, "cancel/recovery" },
{ SHOST_DEL_RECOVERY, "deleted/recovery", },
};
const char *scsi_host_state_name(enum scsi_host_state state)
{
int i;
char *name = NULL;
for (i = 0; i < ARRAY_SIZE(shost_states); i++) {
if (shost_states[i].value == state) {
name = shost_states[i].name;
break;
}
}
return name;
}
#ifdef CONFIG_SCSI_DH
static const struct {
unsigned char value;
char *name;
} sdev_access_states[] = {
{ SCSI_ACCESS_STATE_OPTIMAL, "active/optimized" },
{ SCSI_ACCESS_STATE_ACTIVE, "active/non-optimized" },
{ SCSI_ACCESS_STATE_STANDBY, "standby" },
{ SCSI_ACCESS_STATE_UNAVAILABLE, "unavailable" },
{ SCSI_ACCESS_STATE_LBA, "lba-dependent" },
{ SCSI_ACCESS_STATE_OFFLINE, "offline" },
{ SCSI_ACCESS_STATE_TRANSITIONING, "transitioning" },
};
static const char *scsi_access_state_name(unsigned char state)
{
int i;
char *name = NULL;
for (i = 0; i < ARRAY_SIZE(sdev_access_states); i++) {
if (sdev_access_states[i].value == state) {
name = sdev_access_states[i].name;
break;
}
}
return name;
}
#endif
static int check_set(unsigned long long *val, char *src)
{
char *last;
if (strcmp(src, "-") == 0) {
*val = SCAN_WILD_CARD;
} else {
/*
* Doesn't check for int overflow
*/
*val = simple_strtoull(src, &last, 0);
if (*last != '\0')
return 1;
}
return 0;
}
static int scsi_scan(struct Scsi_Host *shost, const char *str)
{
char s1[15], s2[15], s3[17], junk;
unsigned long long channel, id, lun;
int res;
res = sscanf(str, "%10s %10s %16s %c", s1, s2, s3, &junk);
if (res != 3)
return -EINVAL;
if (check_set(&channel, s1))
return -EINVAL;
if (check_set(&id, s2))
return -EINVAL;
if (check_set(&lun, s3))
return -EINVAL;
if (shost->transportt->user_scan)
res = shost->transportt->user_scan(shost, channel, id, lun);
else
res = scsi_scan_host_selected(shost, channel, id, lun,
SCSI_SCAN_MANUAL);
return res;
}
/*
* shost_show_function: macro to create an attr function that can be used to
* show a non-bit field.
*/
#define shost_show_function(name, field, format_string) \
static ssize_t \
show_##name (struct device *dev, struct device_attribute *attr, \
char *buf) \
{ \
struct Scsi_Host *shost = class_to_shost(dev); \
return snprintf (buf, 20, format_string, shost->field); \
}
/*
* shost_rd_attr: macro to create a function and attribute variable for a
* read only field.
*/
#define shost_rd_attr2(name, field, format_string) \
shost_show_function(name, field, format_string) \
static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL);
#define shost_rd_attr(field, format_string) \
shost_rd_attr2(field, field, format_string)
/*
* Create the actual show/store functions and data structures.
*/
static ssize_t
store_scan(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct Scsi_Host *shost = class_to_shost(dev);
int res;
res = scsi_scan(shost, buf);
if (res == 0)
res = count;
return res;
};
static DEVICE_ATTR(scan, S_IWUSR, NULL, store_scan);
static ssize_t
store_shost_state(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
int i;
struct Scsi_Host *shost = class_to_shost(dev);
enum scsi_host_state state = 0;
for (i = 0; i < ARRAY_SIZE(shost_states); i++) {
const int len = strlen(shost_states[i].name);
if (strncmp(shost_states[i].name, buf, len) == 0 &&
buf[len] == '\n') {
state = shost_states[i].value;
break;
}
}
if (!state)
return -EINVAL;
if (scsi_host_set_state(shost, state))
return -EINVAL;
return count;
}
static ssize_t
show_shost_state(struct device *dev, struct device_attribute *attr, char *buf)
{
struct Scsi_Host *shost = class_to_shost(dev);
const char *name = scsi_host_state_name(shost->shost_state);
if (!name)
return -EINVAL;
return snprintf(buf, 20, "%s\n", name);
}
/* DEVICE_ATTR(state) clashes with dev_attr_state for sdev */
static struct device_attribute dev_attr_hstate =
__ATTR(state, S_IRUGO | S_IWUSR, show_shost_state, store_shost_state);
static ssize_t
show_shost_mode(unsigned int mode, char *buf)
{
ssize_t len = 0;
if (mode & MODE_INITIATOR)
len = sprintf(buf, "%s", "Initiator");
if (mode & MODE_TARGET)
len += sprintf(buf + len, "%s%s", len ? ", " : "", "Target");
len += sprintf(buf + len, "\n");
return len;
}
static ssize_t
show_shost_supported_mode(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct Scsi_Host *shost = class_to_shost(dev);
unsigned int supported_mode = shost->hostt->supported_mode;
if (supported_mode == MODE_UNKNOWN)
/* by default this should be initiator */
supported_mode = MODE_INITIATOR;
return show_shost_mode(supported_mode, buf);
}
static DEVICE_ATTR(supported_mode, S_IRUGO | S_IWUSR, show_shost_supported_mode, NULL);
static ssize_t
show_shost_active_mode(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct Scsi_Host *shost = class_to_shost(dev);
if (shost->active_mode == MODE_UNKNOWN)
return snprintf(buf, 20, "unknown\n");
else
return show_shost_mode(shost->active_mode, buf);
}
static DEVICE_ATTR(active_mode, S_IRUGO | S_IWUSR, show_shost_active_mode, NULL);
[SCSI] prevent stack buffer overflow in host_reset store_host_reset() has tried to re-invent the wheel to compare sysfs strings. Unfortunately it did so poorly and never bothered to check the input from userspace before overwriting stack with it, so something simple as: echo "WoopsieWoopsie" > /sys/devices/pseudo_0/adapter0/host0/scsi_host/host0/host_reset would result in: [ 316.310101] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81f5bac7 [ 316.310101] [ 316.320051] Pid: 6655, comm: sh Tainted: G W 3.7.0-rc5-next-20121114-sasha-00016-g5c9d68d-dirty #129 [ 316.320051] Call Trace: [ 316.340058] pps pps0: PPS event at 1352918752.620355751 [ 316.340062] pps pps0: capture assert seq #303 [ 316.320051] [<ffffffff83b3856b>] panic+0xcd/0x1f4 [ 316.320051] [<ffffffff81f5bac7>] ? store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff8110b996>] __stack_chk_fail+0x16/0x20 [ 316.320051] [<ffffffff81f5bac7>] store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff81e55bb3>] dev_attr_store+0x13/0x30 [ 316.320051] [<ffffffff812f7db1>] sysfs_write_file+0x101/0x170 [ 316.320051] [<ffffffff8127acc8>] vfs_write+0xb8/0x180 [ 316.320051] [<ffffffff8127ae80>] sys_write+0x50/0xa0 [ 316.320051] [<ffffffff83c03418>] tracesys+0xe1/0xe6 Fix this by uninventing whatever was going on there and just use sysfs_streq. Bug introduced by 29443691 ("[SCSI] scsi: Added support for adapter and firmware reset"). [jejb: added necessary const to prevent compile warnings] Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: <stable@vger.kernel.org> #3.2+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-16 05:51:46 +09:00
static int check_reset_type(const char *str)
{
[SCSI] prevent stack buffer overflow in host_reset store_host_reset() has tried to re-invent the wheel to compare sysfs strings. Unfortunately it did so poorly and never bothered to check the input from userspace before overwriting stack with it, so something simple as: echo "WoopsieWoopsie" > /sys/devices/pseudo_0/adapter0/host0/scsi_host/host0/host_reset would result in: [ 316.310101] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81f5bac7 [ 316.310101] [ 316.320051] Pid: 6655, comm: sh Tainted: G W 3.7.0-rc5-next-20121114-sasha-00016-g5c9d68d-dirty #129 [ 316.320051] Call Trace: [ 316.340058] pps pps0: PPS event at 1352918752.620355751 [ 316.340062] pps pps0: capture assert seq #303 [ 316.320051] [<ffffffff83b3856b>] panic+0xcd/0x1f4 [ 316.320051] [<ffffffff81f5bac7>] ? store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff8110b996>] __stack_chk_fail+0x16/0x20 [ 316.320051] [<ffffffff81f5bac7>] store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff81e55bb3>] dev_attr_store+0x13/0x30 [ 316.320051] [<ffffffff812f7db1>] sysfs_write_file+0x101/0x170 [ 316.320051] [<ffffffff8127acc8>] vfs_write+0xb8/0x180 [ 316.320051] [<ffffffff8127ae80>] sys_write+0x50/0xa0 [ 316.320051] [<ffffffff83c03418>] tracesys+0xe1/0xe6 Fix this by uninventing whatever was going on there and just use sysfs_streq. Bug introduced by 29443691 ("[SCSI] scsi: Added support for adapter and firmware reset"). [jejb: added necessary const to prevent compile warnings] Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: <stable@vger.kernel.org> #3.2+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-16 05:51:46 +09:00
if (sysfs_streq(str, "adapter"))
return SCSI_ADAPTER_RESET;
[SCSI] prevent stack buffer overflow in host_reset store_host_reset() has tried to re-invent the wheel to compare sysfs strings. Unfortunately it did so poorly and never bothered to check the input from userspace before overwriting stack with it, so something simple as: echo "WoopsieWoopsie" > /sys/devices/pseudo_0/adapter0/host0/scsi_host/host0/host_reset would result in: [ 316.310101] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81f5bac7 [ 316.310101] [ 316.320051] Pid: 6655, comm: sh Tainted: G W 3.7.0-rc5-next-20121114-sasha-00016-g5c9d68d-dirty #129 [ 316.320051] Call Trace: [ 316.340058] pps pps0: PPS event at 1352918752.620355751 [ 316.340062] pps pps0: capture assert seq #303 [ 316.320051] [<ffffffff83b3856b>] panic+0xcd/0x1f4 [ 316.320051] [<ffffffff81f5bac7>] ? store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff8110b996>] __stack_chk_fail+0x16/0x20 [ 316.320051] [<ffffffff81f5bac7>] store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff81e55bb3>] dev_attr_store+0x13/0x30 [ 316.320051] [<ffffffff812f7db1>] sysfs_write_file+0x101/0x170 [ 316.320051] [<ffffffff8127acc8>] vfs_write+0xb8/0x180 [ 316.320051] [<ffffffff8127ae80>] sys_write+0x50/0xa0 [ 316.320051] [<ffffffff83c03418>] tracesys+0xe1/0xe6 Fix this by uninventing whatever was going on there and just use sysfs_streq. Bug introduced by 29443691 ("[SCSI] scsi: Added support for adapter and firmware reset"). [jejb: added necessary const to prevent compile warnings] Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: <stable@vger.kernel.org> #3.2+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-16 05:51:46 +09:00
else if (sysfs_streq(str, "firmware"))
return SCSI_FIRMWARE_RESET;
else
return 0;
}
static ssize_t
store_host_reset(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct Scsi_Host *shost = class_to_shost(dev);
struct scsi_host_template *sht = shost->hostt;
int ret = -EINVAL;
int type;
[SCSI] prevent stack buffer overflow in host_reset store_host_reset() has tried to re-invent the wheel to compare sysfs strings. Unfortunately it did so poorly and never bothered to check the input from userspace before overwriting stack with it, so something simple as: echo "WoopsieWoopsie" > /sys/devices/pseudo_0/adapter0/host0/scsi_host/host0/host_reset would result in: [ 316.310101] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81f5bac7 [ 316.310101] [ 316.320051] Pid: 6655, comm: sh Tainted: G W 3.7.0-rc5-next-20121114-sasha-00016-g5c9d68d-dirty #129 [ 316.320051] Call Trace: [ 316.340058] pps pps0: PPS event at 1352918752.620355751 [ 316.340062] pps pps0: capture assert seq #303 [ 316.320051] [<ffffffff83b3856b>] panic+0xcd/0x1f4 [ 316.320051] [<ffffffff81f5bac7>] ? store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff8110b996>] __stack_chk_fail+0x16/0x20 [ 316.320051] [<ffffffff81f5bac7>] store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff81e55bb3>] dev_attr_store+0x13/0x30 [ 316.320051] [<ffffffff812f7db1>] sysfs_write_file+0x101/0x170 [ 316.320051] [<ffffffff8127acc8>] vfs_write+0xb8/0x180 [ 316.320051] [<ffffffff8127ae80>] sys_write+0x50/0xa0 [ 316.320051] [<ffffffff83c03418>] tracesys+0xe1/0xe6 Fix this by uninventing whatever was going on there and just use sysfs_streq. Bug introduced by 29443691 ("[SCSI] scsi: Added support for adapter and firmware reset"). [jejb: added necessary const to prevent compile warnings] Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: <stable@vger.kernel.org> #3.2+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-16 05:51:46 +09:00
type = check_reset_type(buf);
if (!type)
goto exit_store_host_reset;
if (sht->host_reset)
ret = sht->host_reset(shost, type);
else
ret = -EOPNOTSUPP;
exit_store_host_reset:
if (ret == 0)
ret = count;
return ret;
}
static DEVICE_ATTR(host_reset, S_IWUSR, NULL, store_host_reset);
static ssize_t
show_shost_eh_deadline(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct Scsi_Host *shost = class_to_shost(dev);
if (shost->eh_deadline == -1)
return snprintf(buf, strlen("off") + 2, "off\n");
return sprintf(buf, "%u\n", shost->eh_deadline / HZ);
}
static ssize_t
store_shost_eh_deadline(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct Scsi_Host *shost = class_to_shost(dev);
int ret = -EINVAL;
unsigned long deadline, flags;
if (shost->transportt &&
(shost->transportt->eh_strategy_handler ||
!shost->hostt->eh_host_reset_handler))
return ret;
if (!strncmp(buf, "off", strlen("off")))
deadline = -1;
else {
ret = kstrtoul(buf, 10, &deadline);
if (ret)
return ret;
if (deadline * HZ > UINT_MAX)
return -EINVAL;
}
spin_lock_irqsave(shost->host_lock, flags);
if (scsi_host_in_recovery(shost))
ret = -EBUSY;
else {
if (deadline == -1)
shost->eh_deadline = -1;
else
shost->eh_deadline = deadline * HZ;
ret = count;
}
spin_unlock_irqrestore(shost->host_lock, flags);
return ret;
}
static DEVICE_ATTR(eh_deadline, S_IRUGO | S_IWUSR, show_shost_eh_deadline, store_shost_eh_deadline);
shost_rd_attr(unique_id, "%u\n");
shost_rd_attr(cmd_per_lun, "%hd\n");
shost_rd_attr(can_queue, "%hd\n");
shost_rd_attr(sg_tablesize, "%hu\n");
shost_rd_attr(sg_prot_tablesize, "%hu\n");
shost_rd_attr(unchecked_isa_dma, "%d\n");
shost_rd_attr(prot_capabilities, "%u\n");
shost_rd_attr(prot_guard_type, "%hd\n");
shost_rd_attr2(proc_name, hostt->proc_name, "%s\n");
static ssize_t
show_host_busy(struct device *dev, struct device_attribute *attr, char *buf)
{
struct Scsi_Host *shost = class_to_shost(dev);
return snprintf(buf, 20, "%d\n", scsi_host_busy(shost));
}
static DEVICE_ATTR(host_busy, S_IRUGO, show_host_busy, NULL);
static ssize_t
show_use_blk_mq(struct device *dev, struct device_attribute *attr, char *buf)
{
return sprintf(buf, "1\n");
}
static DEVICE_ATTR(use_blk_mq, S_IRUGO, show_use_blk_mq, NULL);
static struct attribute *scsi_sysfs_shost_attrs[] = {
scsi: add support for a blk-mq based I/O path. This patch adds support for an alternate I/O path in the scsi midlayer which uses the blk-mq infrastructure instead of the legacy request code. Use of blk-mq is fully transparent to drivers, although for now a host template field is provided to opt out of blk-mq usage in case any unforseen incompatibilities arise. In general replacing the legacy request code with blk-mq is a simple and mostly mechanical transformation. The biggest exception is the new code that deals with the fact the I/O submissions in blk-mq must happen from process context, which slightly complicates the I/O completion handler. The second biggest differences is that blk-mq is build around the concept of preallocated requests that also include driver specific data, which in SCSI context means the scsi_cmnd structure. This completely avoids dynamic memory allocations for the fast path through I/O submission. Due the preallocated requests the MQ code path exclusively uses the host-wide shared tag allocator instead of a per-LUN one. This only affects drivers actually using the block layer provided tag allocator instead of their own. Unlike the old path blk-mq always provides a tag, although drivers don't have to use it. For now the blk-mq path is disable by defauly and must be enabled using the "use_blk_mq" module parameter. Once the remaining work in the block layer to make blk-mq more suitable for slow devices is complete I hope to make it the default and eventually even remove the old code path. Based on the earlier scsi-mq prototype by Nicholas Bellinger. Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and various sugestions and code contributions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-01-17 20:06:53 +09:00
&dev_attr_use_blk_mq.attr,
&dev_attr_unique_id.attr,
&dev_attr_host_busy.attr,
&dev_attr_cmd_per_lun.attr,
&dev_attr_can_queue.attr,
&dev_attr_sg_tablesize.attr,
&dev_attr_sg_prot_tablesize.attr,
&dev_attr_unchecked_isa_dma.attr,
&dev_attr_proc_name.attr,
&dev_attr_scan.attr,
&dev_attr_hstate.attr,
&dev_attr_supported_mode.attr,
&dev_attr_active_mode.attr,
&dev_attr_prot_capabilities.attr,
&dev_attr_prot_guard_type.attr,
&dev_attr_host_reset.attr,
&dev_attr_eh_deadline.attr,
NULL
};
static struct attribute_group scsi_shost_attr_group = {
.attrs = scsi_sysfs_shost_attrs,
};
const struct attribute_group *scsi_sysfs_shost_attr_groups[] = {
&scsi_shost_attr_group,
NULL
};
static void scsi_device_cls_release(struct device *class_dev)
{
struct scsi_device *sdev;
sdev = class_to_sdev(class_dev);
put_device(&sdev->sdev_gendev);
}
2006-11-22 23:55:48 +09:00
static void scsi_device_dev_release_usercontext(struct work_struct *work)
{
struct scsi_device *sdev;
struct device *parent;
struct list_head *this, *tmp;
struct scsi_vpd *vpd_pg80 = NULL, *vpd_pg83 = NULL;
unsigned long flags;
2006-11-22 23:55:48 +09:00
sdev = container_of(work, struct scsi_device, ew.work);
scsi_dh_release_device(sdev);
2006-11-22 23:55:48 +09:00
parent = sdev->sdev_gendev.parent;
spin_lock_irqsave(sdev->host->host_lock, flags);
list_del(&sdev->siblings);
list_del(&sdev->same_target_siblings);
list_del(&sdev->starved_entry);
spin_unlock_irqrestore(sdev->host->host_lock, flags);
cancel_work_sync(&sdev->event_work);
list_for_each_safe(this, tmp, &sdev->event_list) {
struct scsi_event *evt;
evt = list_entry(this, struct scsi_event, node);
list_del(&evt->node);
kfree(evt);
}
blk_put_queue(sdev->request_queue);
/* NULL queue means the device can't be used */
sdev->request_queue = NULL;
mutex_lock(&sdev->inquiry_mutex);
rcu_swap_protected(sdev->vpd_pg80, vpd_pg80,
lockdep_is_held(&sdev->inquiry_mutex));
rcu_swap_protected(sdev->vpd_pg83, vpd_pg83,
lockdep_is_held(&sdev->inquiry_mutex));
mutex_unlock(&sdev->inquiry_mutex);
if (vpd_pg83)
kfree_rcu(vpd_pg83, rcu);
if (vpd_pg80)
kfree_rcu(vpd_pg80, rcu);
kfree(sdev->inquiry);
kfree(sdev);
if (parent)
put_device(parent);
}
static void scsi_device_dev_release(struct device *dev)
{
struct scsi_device *sdp = to_scsi_device(dev);
2006-11-22 23:55:48 +09:00
execute_in_process_context(scsi_device_dev_release_usercontext,
&sdp->ew);
}
static struct class sdev_class = {
.name = "scsi_device",
.dev_release = scsi_device_cls_release,
};
/* all probing is done in the individual ->probe routines */
static int scsi_bus_match(struct device *dev, struct device_driver *gendrv)
{
struct scsi_device *sdp;
if (dev->type != &scsi_dev_type)
return 0;
sdp = to_scsi_device(dev);
if (sdp->no_uld_attach)
return 0;
return (sdp->inq_periph_qual == SCSI_INQ_PQ_CON)? 1: 0;
}
static int scsi_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
[SCSI] modalias for scsi devices The following patch adds support for sysfs/uevent modalias attribute for scsi devices (like disks, tapes, cdroms etc), based on whatever current sd.c, sr.c, st.c and osst.c drivers supports. The modalias format is like this: scsi:type-0x04 (for TYPE_WORM, handled by sr.c now). Several comments. o This hexadecimal type value is because all TYPE_XXX constants in include/scsi/scsi.h are given in hex, but __stringify() will not convert them to decimal (so it will NOT be scsi:type-4). Since it does not really matter in which format it is, while both modalias in module and modalias attribute match each other, I descided to go for that 0x%02x format (and added a comment in include/scsi/scsi.h to keep them that way), instead of changing them all to decimal. o There was no .uevent routine for SCSI bus. It might be a good idea to add some more ueven environment variables in there. o osst.c driver handles tapes too, like st.c, but only SOME tapes. With this setup, hotplug scripts (or whatever is used by the user) will try to load both st and osst modules for all SCSI tapes found, because both modules have scsi:type-0x01 alias). It is not harmful, but one extra module is no good either. It is possible to solve this, by exporting more info in modalias attribute, including vendor and device identification strings, so that modalias becomes something like scsi:type-0x12:vendor-Adaptec LTD:device-OnStream Tape Drive and having that, match for all 3 attributes, not only device type. But oh well, vendor and device strings may be large, and they do contain spaces and whatnot. So I left them for now, awaiting for comments first. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-10-27 21:02:37 +09:00
{
struct scsi_device *sdev;
if (dev->type != &scsi_dev_type)
return 0;
sdev = to_scsi_device(dev);
[SCSI] modalias for scsi devices The following patch adds support for sysfs/uevent modalias attribute for scsi devices (like disks, tapes, cdroms etc), based on whatever current sd.c, sr.c, st.c and osst.c drivers supports. The modalias format is like this: scsi:type-0x04 (for TYPE_WORM, handled by sr.c now). Several comments. o This hexadecimal type value is because all TYPE_XXX constants in include/scsi/scsi.h are given in hex, but __stringify() will not convert them to decimal (so it will NOT be scsi:type-4). Since it does not really matter in which format it is, while both modalias in module and modalias attribute match each other, I descided to go for that 0x%02x format (and added a comment in include/scsi/scsi.h to keep them that way), instead of changing them all to decimal. o There was no .uevent routine for SCSI bus. It might be a good idea to add some more ueven environment variables in there. o osst.c driver handles tapes too, like st.c, but only SOME tapes. With this setup, hotplug scripts (or whatever is used by the user) will try to load both st and osst modules for all SCSI tapes found, because both modules have scsi:type-0x01 alias). It is not harmful, but one extra module is no good either. It is possible to solve this, by exporting more info in modalias attribute, including vendor and device identification strings, so that modalias becomes something like scsi:type-0x12:vendor-Adaptec LTD:device-OnStream Tape Drive and having that, match for all 3 attributes, not only device type. But oh well, vendor and device strings may be large, and they do contain spaces and whatnot. So I left them for now, awaiting for comments first. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-10-27 21:02:37 +09:00
add_uevent_var(env, "MODALIAS=" SCSI_DEVICE_MODALIAS_FMT, sdev->type);
[SCSI] modalias for scsi devices The following patch adds support for sysfs/uevent modalias attribute for scsi devices (like disks, tapes, cdroms etc), based on whatever current sd.c, sr.c, st.c and osst.c drivers supports. The modalias format is like this: scsi:type-0x04 (for TYPE_WORM, handled by sr.c now). Several comments. o This hexadecimal type value is because all TYPE_XXX constants in include/scsi/scsi.h are given in hex, but __stringify() will not convert them to decimal (so it will NOT be scsi:type-4). Since it does not really matter in which format it is, while both modalias in module and modalias attribute match each other, I descided to go for that 0x%02x format (and added a comment in include/scsi/scsi.h to keep them that way), instead of changing them all to decimal. o There was no .uevent routine for SCSI bus. It might be a good idea to add some more ueven environment variables in there. o osst.c driver handles tapes too, like st.c, but only SOME tapes. With this setup, hotplug scripts (or whatever is used by the user) will try to load both st and osst modules for all SCSI tapes found, because both modules have scsi:type-0x01 alias). It is not harmful, but one extra module is no good either. It is possible to solve this, by exporting more info in modalias attribute, including vendor and device identification strings, so that modalias becomes something like scsi:type-0x12:vendor-Adaptec LTD:device-OnStream Tape Drive and having that, match for all 3 attributes, not only device type. But oh well, vendor and device strings may be large, and they do contain spaces and whatnot. So I left them for now, awaiting for comments first. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-10-27 21:02:37 +09:00
return 0;
}
struct bus_type scsi_bus_type = {
.name = "scsi",
.match = scsi_bus_match,
[SCSI] modalias for scsi devices The following patch adds support for sysfs/uevent modalias attribute for scsi devices (like disks, tapes, cdroms etc), based on whatever current sd.c, sr.c, st.c and osst.c drivers supports. The modalias format is like this: scsi:type-0x04 (for TYPE_WORM, handled by sr.c now). Several comments. o This hexadecimal type value is because all TYPE_XXX constants in include/scsi/scsi.h are given in hex, but __stringify() will not convert them to decimal (so it will NOT be scsi:type-4). Since it does not really matter in which format it is, while both modalias in module and modalias attribute match each other, I descided to go for that 0x%02x format (and added a comment in include/scsi/scsi.h to keep them that way), instead of changing them all to decimal. o There was no .uevent routine for SCSI bus. It might be a good idea to add some more ueven environment variables in there. o osst.c driver handles tapes too, like st.c, but only SOME tapes. With this setup, hotplug scripts (or whatever is used by the user) will try to load both st and osst modules for all SCSI tapes found, because both modules have scsi:type-0x01 alias). It is not harmful, but one extra module is no good either. It is possible to solve this, by exporting more info in modalias attribute, including vendor and device identification strings, so that modalias becomes something like scsi:type-0x12:vendor-Adaptec LTD:device-OnStream Tape Drive and having that, match for all 3 attributes, not only device type. But oh well, vendor and device strings may be large, and they do contain spaces and whatnot. So I left them for now, awaiting for comments first. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-10-27 21:02:37 +09:00
.uevent = scsi_bus_uevent,
#ifdef CONFIG_PM
.pm = &scsi_bus_pm_ops,
#endif
};
EXPORT_SYMBOL_GPL(scsi_bus_type);
int scsi_sysfs_register(void)
{
int error;
error = bus_register(&scsi_bus_type);
if (!error) {
error = class_register(&sdev_class);
if (error)
bus_unregister(&scsi_bus_type);
}
return error;
}
void scsi_sysfs_unregister(void)
{
class_unregister(&sdev_class);
bus_unregister(&scsi_bus_type);
}
/*
* sdev_show_function: macro to create an attr function that can be used to
* show a non-bit field.
*/
#define sdev_show_function(field, format_string) \
static ssize_t \
sdev_show_##field (struct device *dev, struct device_attribute *attr, \
char *buf) \
{ \
struct scsi_device *sdev; \
sdev = to_scsi_device(dev); \
return snprintf (buf, 20, format_string, sdev->field); \
} \
/*
* sdev_rd_attr: macro to create a function and attribute variable for a
* read only field.
*/
#define sdev_rd_attr(field, format_string) \
sdev_show_function(field, format_string) \
static DEVICE_ATTR(field, S_IRUGO, sdev_show_##field, NULL);
/*
* sdev_rw_attr: create a function and attribute variable for a
* read/write field.
*/
#define sdev_rw_attr(field, format_string) \
sdev_show_function(field, format_string) \
\
static ssize_t \
sdev_store_##field (struct device *dev, struct device_attribute *attr, \
const char *buf, size_t count) \
{ \
struct scsi_device *sdev; \
sdev = to_scsi_device(dev); \
sscanf (buf, format_string, &sdev->field); \
return count; \
} \
static DEVICE_ATTR(field, S_IRUGO | S_IWUSR, sdev_show_##field, sdev_store_##field);
/* Currently we don't export bit fields, but we might in future,
* so leave this code in */
#if 0
/*
* sdev_rd_attr: create a function and attribute variable for a
* read/write bit field.
*/
#define sdev_rw_attr_bit(field) \
sdev_show_function(field, "%d\n") \
\
static ssize_t \
sdev_store_##field (struct device *dev, struct device_attribute *attr, \
const char *buf, size_t count) \
{ \
int ret; \
struct scsi_device *sdev; \
ret = scsi_sdev_check_buf_bit(buf); \
if (ret >= 0) { \
sdev = to_scsi_device(dev); \
sdev->field = ret; \
ret = count; \
} \
return ret; \
} \
static DEVICE_ATTR(field, S_IRUGO | S_IWUSR, sdev_show_##field, sdev_store_##field);
/*
* scsi_sdev_check_buf_bit: return 0 if buf is "0", return 1 if buf is "1",
* else return -EINVAL.
*/
static int scsi_sdev_check_buf_bit(const char *buf)
{
if ((buf[1] == '\0') || ((buf[1] == '\n') && (buf[2] == '\0'))) {
if (buf[0] == '1')
return 1;
else if (buf[0] == '0')
return 0;
else
return -EINVAL;
} else
return -EINVAL;
}
#endif
/*
* Create the actual show/store functions and data structures.
*/
sdev_rd_attr (type, "%d\n");
sdev_rd_attr (scsi_level, "%d\n");
sdev_rd_attr (vendor, "%.8s\n");
sdev_rd_attr (model, "%.16s\n");
sdev_rd_attr (rev, "%.4s\n");
static ssize_t
sdev_show_device_busy(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct scsi_device *sdev = to_scsi_device(dev);
return snprintf(buf, 20, "%d\n", atomic_read(&sdev->device_busy));
}
static DEVICE_ATTR(device_busy, S_IRUGO, sdev_show_device_busy, NULL);
static ssize_t
sdev_show_device_blocked(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct scsi_device *sdev = to_scsi_device(dev);
return snprintf(buf, 20, "%d\n", atomic_read(&sdev->device_blocked));
}
static DEVICE_ATTR(device_blocked, S_IRUGO, sdev_show_device_blocked, NULL);
/*
* TODO: can we make these symlinks to the block layer ones?
*/
static ssize_t
sdev_show_timeout (struct device *dev, struct device_attribute *attr, char *buf)
{
struct scsi_device *sdev;
sdev = to_scsi_device(dev);
return snprintf(buf, 20, "%d\n", sdev->request_queue->rq_timeout / HZ);
}
static ssize_t
sdev_store_timeout (struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct scsi_device *sdev;
int timeout;
sdev = to_scsi_device(dev);
sscanf (buf, "%d\n", &timeout);
blk_queue_rq_timeout(sdev->request_queue, timeout * HZ);
return count;
}
static DEVICE_ATTR(timeout, S_IRUGO | S_IWUSR, sdev_show_timeout, sdev_store_timeout);
static ssize_t
sdev_show_eh_timeout(struct device *dev, struct device_attribute *attr, char *buf)
{
struct scsi_device *sdev;
sdev = to_scsi_device(dev);
return snprintf(buf, 20, "%u\n", sdev->eh_timeout / HZ);
}
static ssize_t
sdev_store_eh_timeout(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct scsi_device *sdev;
unsigned int eh_timeout;
int err;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
sdev = to_scsi_device(dev);
err = kstrtouint(buf, 10, &eh_timeout);
if (err)
return err;
sdev->eh_timeout = eh_timeout * HZ;
return count;
}
static DEVICE_ATTR(eh_timeout, S_IRUGO | S_IWUSR, sdev_show_eh_timeout, sdev_store_eh_timeout);
static ssize_t
store_rescan_field (struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
scsi_rescan_device(dev);
return count;
}
static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
static ssize_t
sdev_store_delete(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
scsi: core: Avoid that SCSI device removal through sysfs triggers a deadlock A long time ago the unfortunate decision was taken to add a self-deletion attribute to the sysfs SCSI device directory. That decision was unfortunate because self-deletion is really tricky. We can't drop that attribute because widely used user space software depends on it, namely the rescan-scsi-bus.sh script. Hence this patch that avoids that writing into that attribute triggers a deadlock. See also commit 7973cbd9fbd9 ("[PATCH] add sysfs attributes to scan and delete scsi_devices"). This patch avoids that self-removal triggers the following deadlock: ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc2-dbg+ #5 Not tainted ------------------------------------------------------ modprobe/6539 is trying to acquire lock: 000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90 but task is already holding lock: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&shost->scan_mutex){+.+.}: __mutex_lock+0xfe/0xc70 mutex_lock_nested+0x1b/0x20 scsi_remove_device+0x26/0x40 [scsi_mod] sdev_store_delete+0x27/0x30 [scsi_mod] dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x190/0x230 __vfs_write+0xd2/0x3b0 vfs_write+0x101/0x270 ksys_write+0xab/0x120 __x64_sys_write+0x43/0x50 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (kn->count#202){++++}: lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&shost->scan_mutex); lock(kn->count#202); lock(&shost->scan_mutex); lock(kn->count#202); *** DEADLOCK *** 2 locks held by modprobe/6539: #0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360 #1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] stack backtrace: CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_circular_bug.isra.34+0x213/0x221 __lock_acquire+0x1a7e/0x1b50 lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html. Fixes: ac0ece9174ac ("scsi: use device_remove_file_self() instead of device_schedule_callback()") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-03 02:51:41 +09:00
struct kernfs_node *kn;
scsi: core: try to get module before removing device We have a test case like block/001 in blktests, which will create a scsi device by loading scsi_debug module and then try to delete the device by sysfs interface. At the same time, it may remove the scsi_debug module. And getting a invalid paging request BUG_ON as following: [ 34.625854] BUG: unable to handle page fault for address: ffffffffa0016bb8 [ 34.629189] Oops: 0000 [#1] SMP PTI [ 34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G W 5.4.0-rc3+ #473 [ 34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0 [ 34.643555] CR2: ffffffffa0016bb8 CR3: 000000012cd88000 CR4: 00000000000006e0 [ 34.644545] Call Trace: [ 34.644907] scsi_host_dev_release+0x6b/0x1f0 [ 34.645511] device_release+0x74/0x110 [ 34.646046] kobject_put+0x116/0x390 [ 34.646559] put_device+0x17/0x30 [ 34.647041] scsi_target_dev_release+0x2b/0x40 [ 34.647652] device_release+0x74/0x110 [ 34.648186] kobject_put+0x116/0x390 [ 34.648691] put_device+0x17/0x30 [ 34.649157] scsi_device_dev_release_usercontext+0x2e8/0x360 [ 34.649953] execute_in_process_context+0x29/0x80 [ 34.650603] scsi_device_dev_release+0x20/0x30 [ 34.651221] device_release+0x74/0x110 [ 34.651732] kobject_put+0x116/0x390 [ 34.652230] sysfs_unbreak_active_protection+0x3f/0x50 [ 34.652935] sdev_store_delete.cold.4+0x71/0x8f [ 34.653579] dev_attr_store+0x1b/0x40 [ 34.654103] sysfs_kf_write+0x3d/0x60 [ 34.654603] kernfs_fop_write+0x174/0x250 [ 34.655165] __vfs_write+0x1f/0x60 [ 34.655639] vfs_write+0xc7/0x280 [ 34.656117] ksys_write+0x6d/0x140 [ 34.656591] __x64_sys_write+0x1e/0x30 [ 34.657114] do_syscall_64+0xb1/0x400 [ 34.657627] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 34.658335] RIP: 0033:0x7f156f337130 During deleting scsi target, the scsi_debug module have been removed. Then, sdebug_driver_template belonged to the module cannot be accessd, resulting in scsi_proc_hostdir_rm() BUG_ON. To fix the bug, we add scsi_device_get() in sdev_store_delete() to try to increase refcount of module, avoiding the module been removed. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191015130556.18061-1-yuyufen@huawei.com Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-15 22:05:56 +09:00
struct scsi_device *sdev = to_scsi_device(dev);
/*
* We need to try to get module, avoiding the module been removed
* during delete.
*/
if (scsi_device_get(sdev))
return -ENODEV;
scsi: core: Avoid that SCSI device removal through sysfs triggers a deadlock A long time ago the unfortunate decision was taken to add a self-deletion attribute to the sysfs SCSI device directory. That decision was unfortunate because self-deletion is really tricky. We can't drop that attribute because widely used user space software depends on it, namely the rescan-scsi-bus.sh script. Hence this patch that avoids that writing into that attribute triggers a deadlock. See also commit 7973cbd9fbd9 ("[PATCH] add sysfs attributes to scan and delete scsi_devices"). This patch avoids that self-removal triggers the following deadlock: ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc2-dbg+ #5 Not tainted ------------------------------------------------------ modprobe/6539 is trying to acquire lock: 000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90 but task is already holding lock: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&shost->scan_mutex){+.+.}: __mutex_lock+0xfe/0xc70 mutex_lock_nested+0x1b/0x20 scsi_remove_device+0x26/0x40 [scsi_mod] sdev_store_delete+0x27/0x30 [scsi_mod] dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x190/0x230 __vfs_write+0xd2/0x3b0 vfs_write+0x101/0x270 ksys_write+0xab/0x120 __x64_sys_write+0x43/0x50 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (kn->count#202){++++}: lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&shost->scan_mutex); lock(kn->count#202); lock(&shost->scan_mutex); lock(kn->count#202); *** DEADLOCK *** 2 locks held by modprobe/6539: #0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360 #1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] stack backtrace: CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_circular_bug.isra.34+0x213/0x221 __lock_acquire+0x1a7e/0x1b50 lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html. Fixes: ac0ece9174ac ("scsi: use device_remove_file_self() instead of device_schedule_callback()") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-03 02:51:41 +09:00
kn = sysfs_break_active_protection(&dev->kobj, &attr->attr);
WARN_ON_ONCE(!kn);
/*
* Concurrent writes into the "delete" sysfs attribute may trigger
* concurrent calls to device_remove_file() and scsi_remove_device().
* device_remove_file() handles concurrent removal calls by
* serializing these and by ignoring the second and later removal
* attempts. Concurrent calls of scsi_remove_device() are
* serialized. The second and later calls of scsi_remove_device() are
* ignored because the first call of that function changes the device
* state into SDEV_DEL.
*/
device_remove_file(dev, attr);
scsi: core: try to get module before removing device We have a test case like block/001 in blktests, which will create a scsi device by loading scsi_debug module and then try to delete the device by sysfs interface. At the same time, it may remove the scsi_debug module. And getting a invalid paging request BUG_ON as following: [ 34.625854] BUG: unable to handle page fault for address: ffffffffa0016bb8 [ 34.629189] Oops: 0000 [#1] SMP PTI [ 34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G W 5.4.0-rc3+ #473 [ 34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0 [ 34.643555] CR2: ffffffffa0016bb8 CR3: 000000012cd88000 CR4: 00000000000006e0 [ 34.644545] Call Trace: [ 34.644907] scsi_host_dev_release+0x6b/0x1f0 [ 34.645511] device_release+0x74/0x110 [ 34.646046] kobject_put+0x116/0x390 [ 34.646559] put_device+0x17/0x30 [ 34.647041] scsi_target_dev_release+0x2b/0x40 [ 34.647652] device_release+0x74/0x110 [ 34.648186] kobject_put+0x116/0x390 [ 34.648691] put_device+0x17/0x30 [ 34.649157] scsi_device_dev_release_usercontext+0x2e8/0x360 [ 34.649953] execute_in_process_context+0x29/0x80 [ 34.650603] scsi_device_dev_release+0x20/0x30 [ 34.651221] device_release+0x74/0x110 [ 34.651732] kobject_put+0x116/0x390 [ 34.652230] sysfs_unbreak_active_protection+0x3f/0x50 [ 34.652935] sdev_store_delete.cold.4+0x71/0x8f [ 34.653579] dev_attr_store+0x1b/0x40 [ 34.654103] sysfs_kf_write+0x3d/0x60 [ 34.654603] kernfs_fop_write+0x174/0x250 [ 34.655165] __vfs_write+0x1f/0x60 [ 34.655639] vfs_write+0xc7/0x280 [ 34.656117] ksys_write+0x6d/0x140 [ 34.656591] __x64_sys_write+0x1e/0x30 [ 34.657114] do_syscall_64+0xb1/0x400 [ 34.657627] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 34.658335] RIP: 0033:0x7f156f337130 During deleting scsi target, the scsi_debug module have been removed. Then, sdebug_driver_template belonged to the module cannot be accessd, resulting in scsi_proc_hostdir_rm() BUG_ON. To fix the bug, we add scsi_device_get() in sdev_store_delete() to try to increase refcount of module, avoiding the module been removed. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191015130556.18061-1-yuyufen@huawei.com Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-15 22:05:56 +09:00
scsi_remove_device(sdev);
scsi: core: Avoid that SCSI device removal through sysfs triggers a deadlock A long time ago the unfortunate decision was taken to add a self-deletion attribute to the sysfs SCSI device directory. That decision was unfortunate because self-deletion is really tricky. We can't drop that attribute because widely used user space software depends on it, namely the rescan-scsi-bus.sh script. Hence this patch that avoids that writing into that attribute triggers a deadlock. See also commit 7973cbd9fbd9 ("[PATCH] add sysfs attributes to scan and delete scsi_devices"). This patch avoids that self-removal triggers the following deadlock: ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc2-dbg+ #5 Not tainted ------------------------------------------------------ modprobe/6539 is trying to acquire lock: 000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90 but task is already holding lock: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&shost->scan_mutex){+.+.}: __mutex_lock+0xfe/0xc70 mutex_lock_nested+0x1b/0x20 scsi_remove_device+0x26/0x40 [scsi_mod] sdev_store_delete+0x27/0x30 [scsi_mod] dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x190/0x230 __vfs_write+0xd2/0x3b0 vfs_write+0x101/0x270 ksys_write+0xab/0x120 __x64_sys_write+0x43/0x50 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (kn->count#202){++++}: lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&shost->scan_mutex); lock(kn->count#202); lock(&shost->scan_mutex); lock(kn->count#202); *** DEADLOCK *** 2 locks held by modprobe/6539: #0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360 #1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] stack backtrace: CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_circular_bug.isra.34+0x213/0x221 __lock_acquire+0x1a7e/0x1b50 lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html. Fixes: ac0ece9174ac ("scsi: use device_remove_file_self() instead of device_schedule_callback()") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-03 02:51:41 +09:00
if (kn)
sysfs_unbreak_active_protection(kn);
scsi: core: try to get module before removing device We have a test case like block/001 in blktests, which will create a scsi device by loading scsi_debug module and then try to delete the device by sysfs interface. At the same time, it may remove the scsi_debug module. And getting a invalid paging request BUG_ON as following: [ 34.625854] BUG: unable to handle page fault for address: ffffffffa0016bb8 [ 34.629189] Oops: 0000 [#1] SMP PTI [ 34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G W 5.4.0-rc3+ #473 [ 34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0 [ 34.643555] CR2: ffffffffa0016bb8 CR3: 000000012cd88000 CR4: 00000000000006e0 [ 34.644545] Call Trace: [ 34.644907] scsi_host_dev_release+0x6b/0x1f0 [ 34.645511] device_release+0x74/0x110 [ 34.646046] kobject_put+0x116/0x390 [ 34.646559] put_device+0x17/0x30 [ 34.647041] scsi_target_dev_release+0x2b/0x40 [ 34.647652] device_release+0x74/0x110 [ 34.648186] kobject_put+0x116/0x390 [ 34.648691] put_device+0x17/0x30 [ 34.649157] scsi_device_dev_release_usercontext+0x2e8/0x360 [ 34.649953] execute_in_process_context+0x29/0x80 [ 34.650603] scsi_device_dev_release+0x20/0x30 [ 34.651221] device_release+0x74/0x110 [ 34.651732] kobject_put+0x116/0x390 [ 34.652230] sysfs_unbreak_active_protection+0x3f/0x50 [ 34.652935] sdev_store_delete.cold.4+0x71/0x8f [ 34.653579] dev_attr_store+0x1b/0x40 [ 34.654103] sysfs_kf_write+0x3d/0x60 [ 34.654603] kernfs_fop_write+0x174/0x250 [ 34.655165] __vfs_write+0x1f/0x60 [ 34.655639] vfs_write+0xc7/0x280 [ 34.656117] ksys_write+0x6d/0x140 [ 34.656591] __x64_sys_write+0x1e/0x30 [ 34.657114] do_syscall_64+0xb1/0x400 [ 34.657627] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 34.658335] RIP: 0033:0x7f156f337130 During deleting scsi target, the scsi_debug module have been removed. Then, sdebug_driver_template belonged to the module cannot be accessd, resulting in scsi_proc_hostdir_rm() BUG_ON. To fix the bug, we add scsi_device_get() in sdev_store_delete() to try to increase refcount of module, avoiding the module been removed. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191015130556.18061-1-yuyufen@huawei.com Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-15 22:05:56 +09:00
scsi_device_put(sdev);
return count;
};
static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
static ssize_t
store_state_field(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
int i, ret;
struct scsi_device *sdev = to_scsi_device(dev);
enum scsi_device_state state = 0;
for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
const int len = strlen(sdev_states[i].name);
if (strncmp(sdev_states[i].name, buf, len) == 0 &&
buf[len] == '\n') {
state = sdev_states[i].value;
break;
}
}
scsi: Restrict user space SCSI device state changes to "running" and "offline" The ability to modify the SCSI device state was introduced by commit 638127e579a4 ("[PATCH] Fix error handler offline behaviour"; v2.6.12). That same commit introduced the following device states: { SDEV_CREATED, "created" }, { SDEV_RUNNING, "running" }, { SDEV_CANCEL, "cancel" }, { SDEV_DEL, "deleted" }, { SDEV_QUIESCE, "quiesce" }, { SDEV_OFFLINE, "offline" }, The SDEV_BLOCK state was introduced later to avoid that an FC cable pull would immediately result in an I/O error (commit 1094e682310e; "[PATCH] suspending I/Os to a device"; v2.6.12). That same patch introduced the ability to set the SDEV_BLOCK state from user space. I'm not sure whether that ability was introduced on purpose or accidentally. Since there is agreement that only writing "running" or "offline" into the SCSI sysfs device state attribute makes sense, restrict sysfs writes to these values. This patch makes sure that SDEV_BLOCK is only used for its original purpose, namely to allow transport drivers and LLDs to block further .queuecommand() calls while transport layer or adapter recovery is in progress. Note: a web search for "/sys/class/scsi_device" AND "device/state" revealed several storage configuration guides. The instructions I found in these guides tell users to write the value "running" or "offline" in the SCSI device state sysfs attribute and no other values. [mkp: typo] Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: James Smart <james.smart@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 00:18:18 +09:00
switch (state) {
case SDEV_RUNNING:
case SDEV_OFFLINE:
break;
default:
return -EINVAL;
scsi: Restrict user space SCSI device state changes to "running" and "offline" The ability to modify the SCSI device state was introduced by commit 638127e579a4 ("[PATCH] Fix error handler offline behaviour"; v2.6.12). That same commit introduced the following device states: { SDEV_CREATED, "created" }, { SDEV_RUNNING, "running" }, { SDEV_CANCEL, "cancel" }, { SDEV_DEL, "deleted" }, { SDEV_QUIESCE, "quiesce" }, { SDEV_OFFLINE, "offline" }, The SDEV_BLOCK state was introduced later to avoid that an FC cable pull would immediately result in an I/O error (commit 1094e682310e; "[PATCH] suspending I/Os to a device"; v2.6.12). That same patch introduced the ability to set the SDEV_BLOCK state from user space. I'm not sure whether that ability was introduced on purpose or accidentally. Since there is agreement that only writing "running" or "offline" into the SCSI sysfs device state attribute makes sense, restrict sysfs writes to these values. This patch makes sure that SDEV_BLOCK is only used for its original purpose, namely to allow transport drivers and LLDs to block further .queuecommand() calls while transport layer or adapter recovery is in progress. Note: a web search for "/sys/class/scsi_device" AND "device/state" revealed several storage configuration guides. The instructions I found in these guides tell users to write the value "running" or "offline" in the SCSI device state sysfs attribute and no other values. [mkp: typo] Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: James Smart <james.smart@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 00:18:18 +09:00
}
mutex_lock(&sdev->state_mutex);
ret = scsi_device_set_state(sdev, state);
/*
* If the device state changes to SDEV_RUNNING, we need to
scsi: core: Fix hang of freezing queue between blocking and running device commit 02c6dcd543f8f051973ee18bfbc4dc3bd595c558 upstream. We found a hang, the steps to reproduce are as follows: 1. blocking device via scsi_device_set_state() 2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10 3. echo none > /sys/block/sda/queue/scheduler 4. echo "running" >/sys/block/sda/device/state Step 3 and 4 should complete after step 4, but they hang. CPU#0 CPU#1 CPU#2 --------------- ---------------- ---------------- Step 1: blocking device Step 2: dd xxxx ^^^^^^ get request q_usage_counter++ Step 3: switching scheculer elv_iosched_store elevator_switch blk_mq_freeze_queue blk_freeze_queue > blk_freeze_queue_start ^^^^^^ mq_freeze_depth++ > blk_mq_run_hw_queues ^^^^^^ can't run queue when dev blocked > blk_mq_freeze_queue_wait ^^^^^^ Hang here!!! wait q_usage_counter==0 Step 4: running device store_state_field scsi_rescan_device scsi_attach_vpd scsi_vpd_inquiry __scsi_execute blk_get_request blk_mq_alloc_request blk_queue_enter ^^^^^^ Hang here!!! wait mq_freeze_depth==0 blk_mq_run_hw_queues ^^^^^^ dispatch IO, q_usage_counter will reduce to zero blk_mq_unfreeze_queue ^^^^^ mq_freeze_depth-- To fix this, we need to run queue before rescanning device when the device state changes to SDEV_RUNNING. Link: https://lore.kernel.org/r/20210824025921.3277629-1-lijinlin3@huawei.com Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after offlinining device") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Li Jinlin <lijinlin3@huawei.com> Signed-off-by: Qiu Laibin <qiulaibin@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-24 11:59:21 +09:00
* run the queue to avoid I/O hang, and rescan the device
* to revalidate it. Running the queue first is necessary
* because another thread may be waiting inside
* blk_mq_freeze_queue_wait() and because that call may be
* waiting for pending I/O to finish.
*/
if (ret == 0 && state == SDEV_RUNNING) {
blk_mq_run_hw_queues(sdev->request_queue, true);
scsi: core: Fix hang of freezing queue between blocking and running device commit 02c6dcd543f8f051973ee18bfbc4dc3bd595c558 upstream. We found a hang, the steps to reproduce are as follows: 1. blocking device via scsi_device_set_state() 2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10 3. echo none > /sys/block/sda/queue/scheduler 4. echo "running" >/sys/block/sda/device/state Step 3 and 4 should complete after step 4, but they hang. CPU#0 CPU#1 CPU#2 --------------- ---------------- ---------------- Step 1: blocking device Step 2: dd xxxx ^^^^^^ get request q_usage_counter++ Step 3: switching scheculer elv_iosched_store elevator_switch blk_mq_freeze_queue blk_freeze_queue > blk_freeze_queue_start ^^^^^^ mq_freeze_depth++ > blk_mq_run_hw_queues ^^^^^^ can't run queue when dev blocked > blk_mq_freeze_queue_wait ^^^^^^ Hang here!!! wait q_usage_counter==0 Step 4: running device store_state_field scsi_rescan_device scsi_attach_vpd scsi_vpd_inquiry __scsi_execute blk_get_request blk_mq_alloc_request blk_queue_enter ^^^^^^ Hang here!!! wait mq_freeze_depth==0 blk_mq_run_hw_queues ^^^^^^ dispatch IO, q_usage_counter will reduce to zero blk_mq_unfreeze_queue ^^^^^ mq_freeze_depth-- To fix this, we need to run queue before rescanning device when the device state changes to SDEV_RUNNING. Link: https://lore.kernel.org/r/20210824025921.3277629-1-lijinlin3@huawei.com Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after offlinining device") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Li Jinlin <lijinlin3@huawei.com> Signed-off-by: Qiu Laibin <qiulaibin@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-24 11:59:21 +09:00
scsi_rescan_device(dev);
}
mutex_unlock(&sdev->state_mutex);
return ret == 0 ? count : -EINVAL;
}
static ssize_t
show_state_field(struct device *dev, struct device_attribute *attr, char *buf)
{
struct scsi_device *sdev = to_scsi_device(dev);
const char *name = scsi_device_state_name(sdev->sdev_state);
if (!name)
return -EINVAL;
return snprintf(buf, 20, "%s\n", name);
}
static DEVICE_ATTR(state, S_IRUGO | S_IWUSR, show_state_field, store_state_field);
static ssize_t
show_queue_type_field(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct scsi_device *sdev = to_scsi_device(dev);
const char *name = "none";
if (sdev->simple_tags)
name = "simple";
return snprintf(buf, 20, "%s\n", name);
}
static ssize_t
store_queue_type_field(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct scsi_device *sdev = to_scsi_device(dev);
if (!sdev->tagged_supported)
return -EINVAL;
sdev_printk(KERN_INFO, sdev,
"ignoring write to deprecated queue_type attribute");
return count;
}
static DEVICE_ATTR(queue_type, S_IRUGO | S_IWUSR, show_queue_type_field,
store_queue_type_field);
#define sdev_vpd_pg_attr(_page) \
static ssize_t \
show_vpd_##_page(struct file *filp, struct kobject *kobj, \
struct bin_attribute *bin_attr, \
char *buf, loff_t off, size_t count) \
{ \
struct device *dev = container_of(kobj, struct device, kobj); \
struct scsi_device *sdev = to_scsi_device(dev); \
struct scsi_vpd *vpd_page; \
int ret = -EINVAL; \
\
rcu_read_lock(); \
vpd_page = rcu_dereference(sdev->vpd_##_page); \
if (vpd_page) \
ret = memory_read_from_buffer(buf, count, &off, \
vpd_page->data, vpd_page->len); \
rcu_read_unlock(); \
return ret; \
} \
static struct bin_attribute dev_attr_vpd_##_page = { \
.attr = {.name = __stringify(vpd_##_page), .mode = S_IRUGO }, \
.size = 0, \
.read = show_vpd_##_page, \
};
sdev_vpd_pg_attr(pg83);
sdev_vpd_pg_attr(pg80);
static ssize_t show_inquiry(struct file *filep, struct kobject *kobj,
struct bin_attribute *bin_attr,
char *buf, loff_t off, size_t count)
{
struct device *dev = container_of(kobj, struct device, kobj);
struct scsi_device *sdev = to_scsi_device(dev);
if (!sdev->inquiry)
return -EINVAL;
return memory_read_from_buffer(buf, count, &off, sdev->inquiry,
sdev->inquiry_len);
}
static struct bin_attribute dev_attr_inquiry = {
.attr = {
.name = "inquiry",
.mode = S_IRUGO,
},
.size = 0,
.read = show_inquiry,
};
static ssize_t
show_iostat_counterbits(struct device *dev, struct device_attribute *attr,
char *buf)
{
return snprintf(buf, 20, "%d\n", (int)sizeof(atomic_t) * 8);
}
static DEVICE_ATTR(iocounterbits, S_IRUGO, show_iostat_counterbits, NULL);
#define show_sdev_iostat(field) \
static ssize_t \
show_iostat_##field(struct device *dev, struct device_attribute *attr, \
char *buf) \
{ \
struct scsi_device *sdev = to_scsi_device(dev); \
unsigned long long count = atomic_read(&sdev->field); \
return snprintf(buf, 20, "0x%llx\n", count); \
} \
static DEVICE_ATTR(field, S_IRUGO, show_iostat_##field, NULL)
show_sdev_iostat(iorequest_cnt);
show_sdev_iostat(iodone_cnt);
show_sdev_iostat(ioerr_cnt);
[SCSI] modalias for scsi devices The following patch adds support for sysfs/uevent modalias attribute for scsi devices (like disks, tapes, cdroms etc), based on whatever current sd.c, sr.c, st.c and osst.c drivers supports. The modalias format is like this: scsi:type-0x04 (for TYPE_WORM, handled by sr.c now). Several comments. o This hexadecimal type value is because all TYPE_XXX constants in include/scsi/scsi.h are given in hex, but __stringify() will not convert them to decimal (so it will NOT be scsi:type-4). Since it does not really matter in which format it is, while both modalias in module and modalias attribute match each other, I descided to go for that 0x%02x format (and added a comment in include/scsi/scsi.h to keep them that way), instead of changing them all to decimal. o There was no .uevent routine for SCSI bus. It might be a good idea to add some more ueven environment variables in there. o osst.c driver handles tapes too, like st.c, but only SOME tapes. With this setup, hotplug scripts (or whatever is used by the user) will try to load both st and osst modules for all SCSI tapes found, because both modules have scsi:type-0x01 alias). It is not harmful, but one extra module is no good either. It is possible to solve this, by exporting more info in modalias attribute, including vendor and device identification strings, so that modalias becomes something like scsi:type-0x12:vendor-Adaptec LTD:device-OnStream Tape Drive and having that, match for all 3 attributes, not only device type. But oh well, vendor and device strings may be large, and they do contain spaces and whatnot. So I left them for now, awaiting for comments first. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-10-27 21:02:37 +09:00
static ssize_t
sdev_show_modalias(struct device *dev, struct device_attribute *attr, char *buf)
{
struct scsi_device *sdev;
sdev = to_scsi_device(dev);
return snprintf (buf, 20, SCSI_DEVICE_MODALIAS_FMT "\n", sdev->type);
}
static DEVICE_ATTR(modalias, S_IRUGO, sdev_show_modalias, NULL);
#define DECLARE_EVT_SHOW(name, Cap_name) \
static ssize_t \
sdev_show_evt_##name(struct device *dev, struct device_attribute *attr, \
char *buf) \
{ \
struct scsi_device *sdev = to_scsi_device(dev); \
int val = test_bit(SDEV_EVT_##Cap_name, sdev->supported_events);\
return snprintf(buf, 20, "%d\n", val); \
}
#define DECLARE_EVT_STORE(name, Cap_name) \
static ssize_t \
sdev_store_evt_##name(struct device *dev, struct device_attribute *attr,\
const char *buf, size_t count) \
{ \
struct scsi_device *sdev = to_scsi_device(dev); \
int val = simple_strtoul(buf, NULL, 0); \
if (val == 0) \
clear_bit(SDEV_EVT_##Cap_name, sdev->supported_events); \
else if (val == 1) \
set_bit(SDEV_EVT_##Cap_name, sdev->supported_events); \
else \
return -EINVAL; \
return count; \
}
#define DECLARE_EVT(name, Cap_name) \
DECLARE_EVT_SHOW(name, Cap_name) \
DECLARE_EVT_STORE(name, Cap_name) \
static DEVICE_ATTR(evt_##name, S_IRUGO, sdev_show_evt_##name, \
sdev_store_evt_##name);
#define REF_EVT(name) &dev_attr_evt_##name.attr
DECLARE_EVT(media_change, MEDIA_CHANGE)
DECLARE_EVT(inquiry_change_reported, INQUIRY_CHANGE_REPORTED)
DECLARE_EVT(capacity_change_reported, CAPACITY_CHANGE_REPORTED)
DECLARE_EVT(soft_threshold_reached, SOFT_THRESHOLD_REACHED_REPORTED)
DECLARE_EVT(mode_parameter_change_reported, MODE_PARAMETER_CHANGE_REPORTED)
DECLARE_EVT(lun_change_reported, LUN_CHANGE_REPORTED)
static ssize_t
sdev_store_queue_depth(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
int depth, retval;
struct scsi_device *sdev = to_scsi_device(dev);
struct scsi_host_template *sht = sdev->host->hostt;
if (!sht->change_queue_depth)
return -EINVAL;
depth = simple_strtoul(buf, NULL, 0);
if (depth < 1 || depth > sdev->host->can_queue)
return -EINVAL;
retval = sht->change_queue_depth(sdev, depth);
if (retval < 0)
return retval;
[SCSI] add queue_depth ramp up code Current FC HBA queue_depth ramp up code depends on last queue full time. The sdev already has last_queue_full_time field to track last queue full time but stored value is truncated by last four bits. So this patch updates last_queue_full_time without truncating last 4 bits to store full value and then updates its only current usages in scsi_track_queue_full to ignore last four bits to keep current usages same while also use this field in added ramp up code. Adds scsi_handle_queue_ramp_up to ramp up queue_depth on successful completion of IO. The scsi_handle_queue_ramp_up will do ramp up on all luns of a target, just same as ramp down done on all luns on a target. The ramp up is skipped in case the change_queue_depth is not supported by LLD or already reached to added max_queue_depth. Updates added max_queue_depth on every new update to default queue_depth value. The ramp up is also skipped if lapsed time since either last queue ramp up or down is less than LLD specified queue_ramp_up_period. Adds queue_ramp_up_period to sysfs but only if change_queue_depth is supported since ramp up and queue_ramp_up_period is needed only in case change_queue_depth is supported first. Initializes queue_ramp_up_period to 120HZ jiffies as initial default value, it is same as used in existing lpfc and qla2xxx. -v2 Combined all ramp code into this single patch. -v3 Moves max_queue_depth initialization after slave_configure is called from after slave_alloc calling done. Also adjusted max_queue_depth check to skip ramp up if current queue_depth is >= max_queue_depth. -v4 Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f to store or show its value. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com> Tested-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-23 07:46:33 +09:00
sdev->max_queue_depth = sdev->queue_depth;
return count;
}
sdev_show_function(queue_depth, "%d\n");
static DEVICE_ATTR(queue_depth, S_IRUGO | S_IWUSR, sdev_show_queue_depth,
sdev_store_queue_depth);
static ssize_t
sdev_show_wwid(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct scsi_device *sdev = to_scsi_device(dev);
ssize_t count;
count = scsi_vpd_lun_id(sdev, buf, PAGE_SIZE);
if (count > 0) {
buf[count] = '\n';
count++;
}
return count;
}
static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
#define BLIST_FLAG_NAME(name) \
[const_ilog2((__force __u64)BLIST_##name)] = #name
static const char *const sdev_bflags_name[] = {
#include "scsi_devinfo_tbl.c"
};
#undef BLIST_FLAG_NAME
static ssize_t
sdev_show_blacklist(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct scsi_device *sdev = to_scsi_device(dev);
int i;
ssize_t len = 0;
for (i = 0; i < sizeof(sdev->sdev_bflags) * BITS_PER_BYTE; i++) {
const char *name = NULL;
if (!(sdev->sdev_bflags & (__force blist_flags_t)BIT(i)))
continue;
if (i < ARRAY_SIZE(sdev_bflags_name) && sdev_bflags_name[i])
name = sdev_bflags_name[i];
if (name)
len += snprintf(buf + len, PAGE_SIZE - len,
"%s%s", len ? " " : "", name);
else
len += snprintf(buf + len, PAGE_SIZE - len,
"%sINVALID_BIT(%d)", len ? " " : "", i);
}
if (len)
len += snprintf(buf + len, PAGE_SIZE - len, "\n");
return len;
}
static DEVICE_ATTR(blacklist, S_IRUGO, sdev_show_blacklist, NULL);
#ifdef CONFIG_SCSI_DH
static ssize_t
sdev_show_dh_state(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct scsi_device *sdev = to_scsi_device(dev);
if (!sdev->handler)
return snprintf(buf, 20, "detached\n");
return snprintf(buf, 20, "%s\n", sdev->handler->name);
}
static ssize_t
sdev_store_dh_state(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct scsi_device *sdev = to_scsi_device(dev);
int err = -EINVAL;
if (sdev->sdev_state == SDEV_CANCEL ||
sdev->sdev_state == SDEV_DEL)
return -ENODEV;
if (!sdev->handler) {
/*
* Attach to a device handler
*/
err = scsi_dh_attach(sdev->request_queue, buf);
} else if (!strncmp(buf, "activate", 8)) {
/*
* Activate a device handler
*/
if (sdev->handler->activate)
err = sdev->handler->activate(sdev, NULL, NULL);
else
err = 0;
} else if (!strncmp(buf, "detach", 6)) {
/*
* Detach from a device handler
*/
sdev_printk(KERN_WARNING, sdev,
"can't detach handler %s.\n",
sdev->handler->name);
err = -EINVAL;
}
return err < 0 ? err : count;
}
static DEVICE_ATTR(dh_state, S_IRUGO | S_IWUSR, sdev_show_dh_state,
sdev_store_dh_state);
static ssize_t
sdev_show_access_state(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct scsi_device *sdev = to_scsi_device(dev);
unsigned char access_state;
const char *access_state_name;
if (!sdev->handler)
return -EINVAL;
access_state = (sdev->access_state & SCSI_ACCESS_STATE_MASK);
access_state_name = scsi_access_state_name(access_state);
return sprintf(buf, "%s\n",
access_state_name ? access_state_name : "unknown");
}
static DEVICE_ATTR(access_state, S_IRUGO, sdev_show_access_state, NULL);
static ssize_t
sdev_show_preferred_path(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct scsi_device *sdev = to_scsi_device(dev);
if (!sdev->handler)
return -EINVAL;
if (sdev->access_state & SCSI_ACCESS_STATE_PREFERRED)
return sprintf(buf, "1\n");
else
return sprintf(buf, "0\n");
}
static DEVICE_ATTR(preferred_path, S_IRUGO, sdev_show_preferred_path, NULL);
#endif
[SCSI] add queue_depth ramp up code Current FC HBA queue_depth ramp up code depends on last queue full time. The sdev already has last_queue_full_time field to track last queue full time but stored value is truncated by last four bits. So this patch updates last_queue_full_time without truncating last 4 bits to store full value and then updates its only current usages in scsi_track_queue_full to ignore last four bits to keep current usages same while also use this field in added ramp up code. Adds scsi_handle_queue_ramp_up to ramp up queue_depth on successful completion of IO. The scsi_handle_queue_ramp_up will do ramp up on all luns of a target, just same as ramp down done on all luns on a target. The ramp up is skipped in case the change_queue_depth is not supported by LLD or already reached to added max_queue_depth. Updates added max_queue_depth on every new update to default queue_depth value. The ramp up is also skipped if lapsed time since either last queue ramp up or down is less than LLD specified queue_ramp_up_period. Adds queue_ramp_up_period to sysfs but only if change_queue_depth is supported since ramp up and queue_ramp_up_period is needed only in case change_queue_depth is supported first. Initializes queue_ramp_up_period to 120HZ jiffies as initial default value, it is same as used in existing lpfc and qla2xxx. -v2 Combined all ramp code into this single patch. -v3 Moves max_queue_depth initialization after slave_configure is called from after slave_alloc calling done. Also adjusted max_queue_depth check to skip ramp up if current queue_depth is >= max_queue_depth. -v4 Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f to store or show its value. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com> Tested-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-23 07:46:33 +09:00
static ssize_t
sdev_show_queue_ramp_up_period(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct scsi_device *sdev;
sdev = to_scsi_device(dev);
return snprintf(buf, 20, "%u\n",
jiffies_to_msecs(sdev->queue_ramp_up_period));
}
static ssize_t
sdev_store_queue_ramp_up_period(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
struct scsi_device *sdev = to_scsi_device(dev);
unsigned int period;
[SCSI] add queue_depth ramp up code Current FC HBA queue_depth ramp up code depends on last queue full time. The sdev already has last_queue_full_time field to track last queue full time but stored value is truncated by last four bits. So this patch updates last_queue_full_time without truncating last 4 bits to store full value and then updates its only current usages in scsi_track_queue_full to ignore last four bits to keep current usages same while also use this field in added ramp up code. Adds scsi_handle_queue_ramp_up to ramp up queue_depth on successful completion of IO. The scsi_handle_queue_ramp_up will do ramp up on all luns of a target, just same as ramp down done on all luns on a target. The ramp up is skipped in case the change_queue_depth is not supported by LLD or already reached to added max_queue_depth. Updates added max_queue_depth on every new update to default queue_depth value. The ramp up is also skipped if lapsed time since either last queue ramp up or down is less than LLD specified queue_ramp_up_period. Adds queue_ramp_up_period to sysfs but only if change_queue_depth is supported since ramp up and queue_ramp_up_period is needed only in case change_queue_depth is supported first. Initializes queue_ramp_up_period to 120HZ jiffies as initial default value, it is same as used in existing lpfc and qla2xxx. -v2 Combined all ramp code into this single patch. -v3 Moves max_queue_depth initialization after slave_configure is called from after slave_alloc calling done. Also adjusted max_queue_depth check to skip ramp up if current queue_depth is >= max_queue_depth. -v4 Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f to store or show its value. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com> Tested-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-23 07:46:33 +09:00
if (kstrtouint(buf, 10, &period))
[SCSI] add queue_depth ramp up code Current FC HBA queue_depth ramp up code depends on last queue full time. The sdev already has last_queue_full_time field to track last queue full time but stored value is truncated by last four bits. So this patch updates last_queue_full_time without truncating last 4 bits to store full value and then updates its only current usages in scsi_track_queue_full to ignore last four bits to keep current usages same while also use this field in added ramp up code. Adds scsi_handle_queue_ramp_up to ramp up queue_depth on successful completion of IO. The scsi_handle_queue_ramp_up will do ramp up on all luns of a target, just same as ramp down done on all luns on a target. The ramp up is skipped in case the change_queue_depth is not supported by LLD or already reached to added max_queue_depth. Updates added max_queue_depth on every new update to default queue_depth value. The ramp up is also skipped if lapsed time since either last queue ramp up or down is less than LLD specified queue_ramp_up_period. Adds queue_ramp_up_period to sysfs but only if change_queue_depth is supported since ramp up and queue_ramp_up_period is needed only in case change_queue_depth is supported first. Initializes queue_ramp_up_period to 120HZ jiffies as initial default value, it is same as used in existing lpfc and qla2xxx. -v2 Combined all ramp code into this single patch. -v3 Moves max_queue_depth initialization after slave_configure is called from after slave_alloc calling done. Also adjusted max_queue_depth check to skip ramp up if current queue_depth is >= max_queue_depth. -v4 Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f to store or show its value. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com> Tested-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-23 07:46:33 +09:00
return -EINVAL;
sdev->queue_ramp_up_period = msecs_to_jiffies(period);
return count;
[SCSI] add queue_depth ramp up code Current FC HBA queue_depth ramp up code depends on last queue full time. The sdev already has last_queue_full_time field to track last queue full time but stored value is truncated by last four bits. So this patch updates last_queue_full_time without truncating last 4 bits to store full value and then updates its only current usages in scsi_track_queue_full to ignore last four bits to keep current usages same while also use this field in added ramp up code. Adds scsi_handle_queue_ramp_up to ramp up queue_depth on successful completion of IO. The scsi_handle_queue_ramp_up will do ramp up on all luns of a target, just same as ramp down done on all luns on a target. The ramp up is skipped in case the change_queue_depth is not supported by LLD or already reached to added max_queue_depth. Updates added max_queue_depth on every new update to default queue_depth value. The ramp up is also skipped if lapsed time since either last queue ramp up or down is less than LLD specified queue_ramp_up_period. Adds queue_ramp_up_period to sysfs but only if change_queue_depth is supported since ramp up and queue_ramp_up_period is needed only in case change_queue_depth is supported first. Initializes queue_ramp_up_period to 120HZ jiffies as initial default value, it is same as used in existing lpfc and qla2xxx. -v2 Combined all ramp code into this single patch. -v3 Moves max_queue_depth initialization after slave_configure is called from after slave_alloc calling done. Also adjusted max_queue_depth check to skip ramp up if current queue_depth is >= max_queue_depth. -v4 Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f to store or show its value. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com> Tested-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-23 07:46:33 +09:00
}
static DEVICE_ATTR(queue_ramp_up_period, S_IRUGO | S_IWUSR,
sdev_show_queue_ramp_up_period,
sdev_store_queue_ramp_up_period);
[SCSI] add queue_depth ramp up code Current FC HBA queue_depth ramp up code depends on last queue full time. The sdev already has last_queue_full_time field to track last queue full time but stored value is truncated by last four bits. So this patch updates last_queue_full_time without truncating last 4 bits to store full value and then updates its only current usages in scsi_track_queue_full to ignore last four bits to keep current usages same while also use this field in added ramp up code. Adds scsi_handle_queue_ramp_up to ramp up queue_depth on successful completion of IO. The scsi_handle_queue_ramp_up will do ramp up on all luns of a target, just same as ramp down done on all luns on a target. The ramp up is skipped in case the change_queue_depth is not supported by LLD or already reached to added max_queue_depth. Updates added max_queue_depth on every new update to default queue_depth value. The ramp up is also skipped if lapsed time since either last queue ramp up or down is less than LLD specified queue_ramp_up_period. Adds queue_ramp_up_period to sysfs but only if change_queue_depth is supported since ramp up and queue_ramp_up_period is needed only in case change_queue_depth is supported first. Initializes queue_ramp_up_period to 120HZ jiffies as initial default value, it is same as used in existing lpfc and qla2xxx. -v2 Combined all ramp code into this single patch. -v3 Moves max_queue_depth initialization after slave_configure is called from after slave_alloc calling done. Also adjusted max_queue_depth check to skip ramp up if current queue_depth is >= max_queue_depth. -v4 Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f to store or show its value. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com> Tested-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-23 07:46:33 +09:00
static umode_t scsi_sdev_attr_is_visible(struct kobject *kobj,
struct attribute *attr, int i)
{
struct device *dev = container_of(kobj, struct device, kobj);
struct scsi_device *sdev = to_scsi_device(dev);
if (attr == &dev_attr_queue_depth.attr &&
!sdev->host->hostt->change_queue_depth)
return S_IRUGO;
if (attr == &dev_attr_queue_ramp_up_period.attr &&
!sdev->host->hostt->change_queue_depth)
return 0;
#ifdef CONFIG_SCSI_DH
if (attr == &dev_attr_access_state.attr &&
!sdev->handler)
return 0;
if (attr == &dev_attr_preferred_path.attr &&
!sdev->handler)
return 0;
#endif
return attr->mode;
}
static umode_t scsi_sdev_bin_attr_is_visible(struct kobject *kobj,
struct bin_attribute *attr, int i)
{
struct device *dev = container_of(kobj, struct device, kobj);
struct scsi_device *sdev = to_scsi_device(dev);
if (attr == &dev_attr_vpd_pg80 && !sdev->vpd_pg80)
return 0;
if (attr == &dev_attr_vpd_pg83 && !sdev->vpd_pg83)
return 0;
return S_IRUGO;
}
/* Default template for device attributes. May NOT be modified */
static struct attribute *scsi_sdev_attrs[] = {
&dev_attr_device_blocked.attr,
&dev_attr_type.attr,
&dev_attr_scsi_level.attr,
&dev_attr_device_busy.attr,
&dev_attr_vendor.attr,
&dev_attr_model.attr,
&dev_attr_rev.attr,
&dev_attr_rescan.attr,
&dev_attr_delete.attr,
&dev_attr_state.attr,
&dev_attr_timeout.attr,
&dev_attr_eh_timeout.attr,
&dev_attr_iocounterbits.attr,
&dev_attr_iorequest_cnt.attr,
&dev_attr_iodone_cnt.attr,
&dev_attr_ioerr_cnt.attr,
&dev_attr_modalias.attr,
&dev_attr_queue_depth.attr,
&dev_attr_queue_type.attr,
&dev_attr_wwid.attr,
&dev_attr_blacklist.attr,
#ifdef CONFIG_SCSI_DH
&dev_attr_dh_state.attr,
&dev_attr_access_state.attr,
&dev_attr_preferred_path.attr,
#endif
&dev_attr_queue_ramp_up_period.attr,
REF_EVT(media_change),
REF_EVT(inquiry_change_reported),
REF_EVT(capacity_change_reported),
REF_EVT(soft_threshold_reached),
REF_EVT(mode_parameter_change_reported),
REF_EVT(lun_change_reported),
NULL
};
static struct bin_attribute *scsi_sdev_bin_attrs[] = {
&dev_attr_vpd_pg83,
&dev_attr_vpd_pg80,
&dev_attr_inquiry,
NULL
};
static struct attribute_group scsi_sdev_attr_group = {
.attrs = scsi_sdev_attrs,
.bin_attrs = scsi_sdev_bin_attrs,
.is_visible = scsi_sdev_attr_is_visible,
.is_bin_visible = scsi_sdev_bin_attr_is_visible,
};
static const struct attribute_group *scsi_sdev_attr_groups[] = {
&scsi_sdev_attr_group,
NULL
};
static int scsi_target_add(struct scsi_target *starget)
{
int error;
if (starget->state != STARGET_CREATED)
return 0;
error = device_add(&starget->dev);
if (error) {
dev_err(&starget->dev, "target device_add failed, error %d\n", error);
return error;
}
transport_add_device(&starget->dev);
starget->state = STARGET_RUNNING;
[SCSI] implement runtime Power Management This patch (as1398b) adds runtime PM support to the SCSI layer. Only the machanism is provided; use of it is up to the various high-level drivers, and the patch doesn't change any of them. Except for sg -- the patch expicitly prevents a device from being runtime-suspended while its sg device file is open. The implementation is simplistic. In general, hosts and targets are automatically suspended when all their children are asleep, but for them the runtime-suspend code doesn't actually do anything. (A host's runtime PM status is propagated up the device tree, though, so a runtime-PM-aware lower-level driver could power down the host adapter hardware at the appropriate times.) There are comments indicating where a transport class might be notified or some other hooks added. LUNs are runtime-suspended by calling the drivers' existing suspend handlers (and likewise for runtime-resume). Somewhat arbitrarily, the implementation delays for 100 ms before suspending an eligible LUN. This is because there typically are occasions during bootup when the same device file is opened and closed several times in quick succession. The way this all works is that the SCSI core increments a device's PM-usage count when it is registered. If a high-level driver does nothing then the device will not be eligible for runtime-suspend because of the elevated usage count. If a high-level driver wants to use runtime PM then it can call scsi_autopm_put_device() in its probe routine to decrement the usage count and scsi_autopm_get_device() in its remove routine to restore the original count. Hosts, targets, and LUNs are not suspended while they are being probed or removed, or while the error handler is running. In fact, a fairly large part of the patch consists of code to make sure that things aren't suspended at such times. [jejb: fix up compile issues in PM config variations] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-06-17 23:41:42 +09:00
pm_runtime_set_active(&starget->dev);
pm_runtime_enable(&starget->dev);
device_enable_async_suspend(&starget->dev);
return 0;
}
/**
* scsi_sysfs_add_sdev - add scsi device to sysfs
* @sdev: scsi_device to add
*
* Return value:
* 0 on Success / non-zero on Failure
**/
int scsi_sysfs_add_sdev(struct scsi_device *sdev)
{
int error, i;
struct request_queue *rq = sdev->request_queue;
struct scsi_target *starget = sdev->sdev_target;
error = scsi_target_add(starget);
if (error)
return error;
transport_configure_device(&starget->dev);
[SCSI] implement runtime Power Management This patch (as1398b) adds runtime PM support to the SCSI layer. Only the machanism is provided; use of it is up to the various high-level drivers, and the patch doesn't change any of them. Except for sg -- the patch expicitly prevents a device from being runtime-suspended while its sg device file is open. The implementation is simplistic. In general, hosts and targets are automatically suspended when all their children are asleep, but for them the runtime-suspend code doesn't actually do anything. (A host's runtime PM status is propagated up the device tree, though, so a runtime-PM-aware lower-level driver could power down the host adapter hardware at the appropriate times.) There are comments indicating where a transport class might be notified or some other hooks added. LUNs are runtime-suspended by calling the drivers' existing suspend handlers (and likewise for runtime-resume). Somewhat arbitrarily, the implementation delays for 100 ms before suspending an eligible LUN. This is because there typically are occasions during bootup when the same device file is opened and closed several times in quick succession. The way this all works is that the SCSI core increments a device's PM-usage count when it is registered. If a high-level driver does nothing then the device will not be eligible for runtime-suspend because of the elevated usage count. If a high-level driver wants to use runtime PM then it can call scsi_autopm_put_device() in its probe routine to decrement the usage count and scsi_autopm_get_device() in its remove routine to restore the original count. Hosts, targets, and LUNs are not suspended while they are being probed or removed, or while the error handler is running. In fact, a fairly large part of the patch consists of code to make sure that things aren't suspended at such times. [jejb: fix up compile issues in PM config variations] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-06-17 23:41:42 +09:00
device_enable_async_suspend(&sdev->sdev_gendev);
[SCSI] implement runtime Power Management This patch (as1398b) adds runtime PM support to the SCSI layer. Only the machanism is provided; use of it is up to the various high-level drivers, and the patch doesn't change any of them. Except for sg -- the patch expicitly prevents a device from being runtime-suspended while its sg device file is open. The implementation is simplistic. In general, hosts and targets are automatically suspended when all their children are asleep, but for them the runtime-suspend code doesn't actually do anything. (A host's runtime PM status is propagated up the device tree, though, so a runtime-PM-aware lower-level driver could power down the host adapter hardware at the appropriate times.) There are comments indicating where a transport class might be notified or some other hooks added. LUNs are runtime-suspended by calling the drivers' existing suspend handlers (and likewise for runtime-resume). Somewhat arbitrarily, the implementation delays for 100 ms before suspending an eligible LUN. This is because there typically are occasions during bootup when the same device file is opened and closed several times in quick succession. The way this all works is that the SCSI core increments a device's PM-usage count when it is registered. If a high-level driver does nothing then the device will not be eligible for runtime-suspend because of the elevated usage count. If a high-level driver wants to use runtime PM then it can call scsi_autopm_put_device() in its probe routine to decrement the usage count and scsi_autopm_get_device() in its remove routine to restore the original count. Hosts, targets, and LUNs are not suspended while they are being probed or removed, or while the error handler is running. In fact, a fairly large part of the patch consists of code to make sure that things aren't suspended at such times. [jejb: fix up compile issues in PM config variations] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-06-17 23:41:42 +09:00
scsi_autopm_get_target(starget);
pm_runtime_set_active(&sdev->sdev_gendev);
pm_runtime_forbid(&sdev->sdev_gendev);
pm_runtime_enable(&sdev->sdev_gendev);
scsi_autopm_put_target(starget);
scsi_autopm_get_device(sdev);
scsi_dh_add_device(sdev);
error = device_add(&sdev->sdev_gendev);
if (error) {
sdev_printk(KERN_INFO, sdev,
"failed to add device: %d\n", error);
return error;
}
device_enable_async_suspend(&sdev->sdev_dev);
error = device_add(&sdev->sdev_dev);
if (error) {
sdev_printk(KERN_INFO, sdev,
"failed to add class device: %d\n", error);
device_del(&sdev->sdev_gendev);
return error;
}
transport_add_device(&sdev->sdev_gendev);
sdev->is_visible = 1;
error = bsg_scsi_register_queue(rq, &sdev->sdev_gendev);
if (error)
/* we're treating error on bsg register as non-fatal,
* so pretend nothing went wrong */
sdev_printk(KERN_INFO, sdev,
"Failed to register bsg queue, errno=%d\n", error);
/* add additional host specific attributes */
if (sdev->host->hostt->sdev_attrs) {
for (i = 0; sdev->host->hostt->sdev_attrs[i]; i++) {
error = device_create_file(&sdev->sdev_gendev,
sdev->host->hostt->sdev_attrs[i]);
if (error)
return error;
}
}
if (sdev->host->hostt->sdev_groups) {
error = sysfs_create_groups(&sdev->sdev_gendev.kobj,
sdev->host->hostt->sdev_groups);
if (error)
return error;
}
scsi_autopm_put_device(sdev);
return error;
}
void __scsi_remove_device(struct scsi_device *sdev)
{
struct device *dev = &sdev->sdev_gendev;
int res;
scsi_sysfs: protect against double execution of __scsi_remove_device() On some host errors storvsc module tries to remove sdev by scheduling a job which does the following: sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun); if (sdev) { scsi_remove_device(sdev); scsi_device_put(sdev); } While this code seems correct the following crash is observed: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC RIP: 0010:[<ffffffff81169979>] [<ffffffff81169979>] bdi_destroy+0x39/0x220 ... [<ffffffff814aecdc>] ? _raw_spin_unlock_irq+0x2c/0x40 [<ffffffff8127b7db>] blk_cleanup_queue+0x17b/0x270 [<ffffffffa00b54c4>] __scsi_remove_device+0x54/0xd0 [scsi_mod] [<ffffffffa00b556b>] scsi_remove_device+0x2b/0x40 [scsi_mod] [<ffffffffa00ec47d>] storvsc_remove_lun+0x3d/0x60 [hv_storvsc] [<ffffffff81080791>] process_one_work+0x1b1/0x530 ... The problem comes with the fact that many such jobs (for the same device) are being scheduled simultaneously. While scsi_remove_device() uses shost->scan_mutex and scsi_device_lookup() will fail for a device in SDEV_DEL state there is no protection against someone who did scsi_device_lookup() before we actually entered __scsi_remove_device(). So the whole scenario looks like that: two callers do simultaneous (or preemption happens) calls to scsi_device_lookup() ant these calls succeed for both of them, after that they try doing scsi_remove_device(). shost->scan_mutex only serializes their calls to __scsi_remove_device() and we end up doing the cleanup path twice. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-19 22:02:19 +09:00
/*
* This cleanup path is not reentrant and while it is impossible
* to get a new reference with scsi_device_get() someone can still
* hold a previously acquired one.
*/
if (sdev->sdev_state == SDEV_DEL)
return;
if (sdev->is_visible) {
/*
* If scsi_internal_target_block() is running concurrently,
* wait until it has finished before changing the device state.
*/
mutex_lock(&sdev->state_mutex);
/*
* If blocked, we go straight to DEL and restart the queue so
* any commands issued during driver shutdown (like sync
* cache) are errored immediately.
*/
res = scsi_device_set_state(sdev, SDEV_CANCEL);
if (res != 0) {
res = scsi_device_set_state(sdev, SDEV_DEL);
if (res == 0)
scsi_start_queue(sdev);
}
mutex_unlock(&sdev->state_mutex);
if (res != 0)
return;
if (sdev->host->hostt->sdev_groups)
sysfs_remove_groups(&sdev->sdev_gendev.kobj,
sdev->host->hostt->sdev_groups);
bsg_unregister_queue(sdev->request_queue);
device_unregister(&sdev->sdev_dev);
transport_remove_device(dev);
device_del(dev);
} else
put_device(&sdev->sdev_dev);
/*
* Stop accepting new requests and wait until all queuecommand() and
* scsi_run_queue() invocations have finished before tearing down the
* device.
*/
mutex_lock(&sdev->state_mutex);
scsi_device_set_state(sdev, SDEV_DEL);
mutex_unlock(&sdev->state_mutex);
blk_cleanup_queue(sdev->request_queue);
cancel_work_sync(&sdev->requeue_work);
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
/*
* Paired with the kref_get() in scsi_sysfs_initialize(). We have
* remoed sysfs visibility from the device, so make the target
* invisible if this was the last device underneath it.
*/
scsi_target_reap(scsi_target(sdev));
put_device(dev);
}
/**
* scsi_remove_device - unregister a device from the scsi bus
* @sdev: scsi_device to unregister
**/
void scsi_remove_device(struct scsi_device *sdev)
{
struct Scsi_Host *shost = sdev->host;
mutex_lock(&shost->scan_mutex);
__scsi_remove_device(sdev);
mutex_unlock(&shost->scan_mutex);
}
EXPORT_SYMBOL(scsi_remove_device);
static void __scsi_remove_target(struct scsi_target *starget)
{
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
unsigned long flags;
struct scsi_device *sdev;
spin_lock_irqsave(shost->host_lock, flags);
restart:
list_for_each_entry(sdev, &shost->__devices, siblings) {
/*
* We cannot call scsi_device_get() here, as
* we might've been called from rmmod() causing
* scsi_device_get() to fail the module_is_live()
* check.
*/
if (sdev->channel != starget->channel ||
sdev->id != starget->id)
continue;
if (sdev->sdev_state == SDEV_DEL ||
sdev->sdev_state == SDEV_CANCEL ||
!get_device(&sdev->sdev_gendev))
continue;
spin_unlock_irqrestore(shost->host_lock, flags);
scsi_remove_device(sdev);
put_device(&sdev->sdev_gendev);
spin_lock_irqsave(shost->host_lock, flags);
goto restart;
}
spin_unlock_irqrestore(shost->host_lock, flags);
}
/**
* scsi_remove_target - try to remove a target and all its devices
* @dev: generic starget or parent of generic stargets to be removed
*
* Note: This is slightly racy. It is possible that if the user
* requests the addition of another device then the target won't be
* removed.
*/
void scsi_remove_target(struct device *dev)
{
struct Scsi_Host *shost = dev_to_shost(dev->parent);
struct scsi_target *starget;
unsigned long flags;
restart:
spin_lock_irqsave(shost->host_lock, flags);
list_for_each_entry(starget, &shost->__targets, siblings) {
scsi: fix soft lockup in scsi_remove_target() on module removal This softlockup is currently happening: [ 444.088002] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:29] [ 444.088002] Modules linked in: lpfc(-) qla2x00tgt(O) qla2xxx_scst(O) scst_vdisk(O) scsi_transport_fc libcrc32c scst(O) dlm configfs nfsd lockd grace nfs_acl auth_rpcgss sunrpc ed d snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device dm_mod iTCO_wdt snd_hda_codec_realtek snd_hda_codec_generic gpio_ich iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hda _core snd_hwdep tg3 snd_pcm snd_timer libphy lpc_ich parport_pc ptp acpi_cpufreq snd pps_core fjes parport i2c_i801 ehci_pci tpm_tis tpm sr_mod cdrom soundcore floppy hwmon sg 8250_ fintek pcspkr i915 drm_kms_helper uhci_hcd ehci_hcd drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit usbcore button video usb_common fan ata_generic ata_piix libata th ermal [ 444.088002] CPU: 1 PID: 29 Comm: kworker/1:1 Tainted: G O 4.4.0-rc5-2.g1e923a3-default #1 [ 444.088002] Hardware name: FUJITSU SIEMENS ESPRIMO E /D2164-A1, BIOS 5.00 R1.10.2164.A1 05/08/2006 [ 444.088002] Workqueue: fc_wq_4 fc_rport_final_delete [scsi_transport_fc] [ 444.088002] task: f6266ec0 ti: f6268000 task.ti: f6268000 [ 444.088002] EIP: 0060:[<c07e7044>] EFLAGS: 00000286 CPU: 1 [ 444.088002] EIP is at _raw_spin_unlock_irqrestore+0x14/0x20 [ 444.088002] EAX: 00000286 EBX: f20d3800 ECX: 00000002 EDX: 00000286 [ 444.088002] ESI: f50ba800 EDI: f2146848 EBP: f6269ec8 ESP: f6269ec8 [ 444.088002] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 444.088002] CR0: 8005003b CR2: 08f96600 CR3: 363ae000 CR4: 000006d0 [ 444.088002] Stack: [ 444.088002] f6269eec c066b0f7 00000286 f2146848 f50ba808 f50ba800 f50ba800 f2146a90 [ 444.088002] f2146848 f6269f08 f8f0a4ed f3141000 f2146800 f2146a90 f619fa00 00000040 [ 444.088002] f6269f40 c026cb25 00000001 166c6392 00000061 f6757140 f6136340 00000004 [ 444.088002] Call Trace: [ 444.088002] [<c066b0f7>] scsi_remove_target+0x167/0x1c0 [ 444.088002] [<f8f0a4ed>] fc_rport_final_delete+0x9d/0x1e0 [scsi_transport_fc] [ 444.088002] [<c026cb25>] process_one_work+0x155/0x3e0 [ 444.088002] [<c026cde7>] worker_thread+0x37/0x490 [ 444.088002] [<c027214b>] kthread+0x9b/0xb0 [ 444.088002] [<c07e72c1>] ret_from_kernel_thread+0x21/0x40 What appears to be happening is that something has pinned the target so it can't go into STARGET_DEL via final release and the loop in scsi_remove_target spins endlessly until that happens. The fix for this soft lockup is to not keep looping over a device that we've called remove on but which hasn't gone into DEL state. This patch will retain a simplistic memory of the last target and not keep looping over it. Reported-by: Sebastian Herbszt <herbszt@gmx.de> Tested-by: Sebastian Herbszt <herbszt@gmx.de> Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38 Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2016-02-11 01:03:26 +09:00
if (starget->state == STARGET_DEL ||
scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state The addition of the STARGET_REMOVE state had the side effect of introducing a race condition that can cause a crash. scsi_target_reap_ref_release() checks the starget->state to see if it still in STARGET_CREATED, and if so, skips calling transport_remove_device() and device_del(), because the starget->state is only set to STARGET_RUNNING after scsi_target_add() has called device_add() and transport_add_device(). However, if an rport loss occurs while a target is being scanned, it can happen that scsi_remove_target() will be called while the starget is still in the STARGET_CREATED state. In this case, the starget->state will be set to STARGET_REMOVE, and as a result, scsi_target_reap_ref_release() will take the wrong path. The end result is a panic: [ 1255.356653] Oops: 0000 [#1] SMP [ 1255.360154] Modules linked in: x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel ghash_clmulni_i [ 1255.393234] CPU: 5 PID: 149 Comm: kworker/u96:4 Tainted: G W 4.11.0+ #8 [ 1255.401879] Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013 [ 1255.410327] Workqueue: scsi_wq_6 fc_scsi_scan_rport [scsi_transport_fc] [ 1255.417720] task: ffff88060ca8c8c0 task.stack: ffffc900048a8000 [ 1255.424331] RIP: 0010:kernfs_find_ns+0x13/0xc0 [ 1255.429287] RSP: 0018:ffffc900048abbf0 EFLAGS: 00010246 [ 1255.435123] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 1255.443083] RDX: 0000000000000000 RSI: ffffffff8188d659 RDI: 0000000000000000 [ 1255.451043] RBP: ffffc900048abc10 R08: 0000000000000000 R09: 0000012433fe0025 [ 1255.459005] R10: 0000000025e5a4b5 R11: 0000000025e5a4b5 R12: ffffffff8188d659 [ 1255.466972] R13: 0000000000000000 R14: ffff8805f55e5088 R15: 0000000000000000 [ 1255.474931] FS: 0000000000000000(0000) GS:ffff880616b40000(0000) knlGS:0000000000000000 [ 1255.483959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1255.490370] CR2: 0000000000000068 CR3: 0000000001c09000 CR4: 00000000000406e0 [ 1255.498332] Call Trace: [ 1255.501058] kernfs_find_and_get_ns+0x31/0x60 [ 1255.505916] sysfs_unmerge_group+0x1d/0x60 [ 1255.510498] dpm_sysfs_remove+0x22/0x60 [ 1255.514783] device_del+0xf4/0x2e0 [ 1255.518577] ? device_remove_file+0x19/0x20 [ 1255.523241] attribute_container_class_device_del+0x1a/0x20 [ 1255.529457] transport_remove_classdev+0x4e/0x60 [ 1255.534607] ? transport_add_class_device+0x40/0x40 [ 1255.540046] attribute_container_device_trigger+0xb0/0xc0 [ 1255.546069] transport_remove_device+0x15/0x20 [ 1255.551025] scsi_target_reap_ref_release+0x25/0x40 [ 1255.556467] scsi_target_reap+0x2e/0x40 [ 1255.560744] __scsi_scan_target+0xaa/0x5b0 [ 1255.565312] scsi_scan_target+0xec/0x100 [ 1255.569689] fc_scsi_scan_rport+0xb1/0xc0 [scsi_transport_fc] [ 1255.576099] process_one_work+0x14b/0x390 [ 1255.580569] worker_thread+0x4b/0x390 [ 1255.584651] kthread+0x109/0x140 [ 1255.588251] ? rescuer_thread+0x330/0x330 [ 1255.592730] ? kthread_park+0x60/0x60 [ 1255.596815] ret_from_fork+0x29/0x40 [ 1255.600801] Code: 24 08 48 83 42 40 01 5b 41 5c 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 [ 1255.621876] RIP: kernfs_find_ns+0x13/0xc0 RSP: ffffc900048abbf0 [ 1255.628479] CR2: 0000000000000068 [ 1255.632756] ---[ end trace 34a69ba0477d036f ]--- Fix this by adding another scsi_target state STARGET_CREATED_REMOVE to distinguish this case. Fixes: f05795d3d771 ("scsi: Add intermediate STARGET_REMOVE state to scsi_target_state") Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Cc: <stable@vger.kernel.org> Reviewed-by: Laurence Oberman <loberman@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-28 03:55:58 +09:00
starget->state == STARGET_REMOVE ||
starget->state == STARGET_CREATED_REMOVE)
continue;
if (starget->dev.parent == dev || &starget->dev == dev) {
kref_get(&starget->reap_ref);
scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state The addition of the STARGET_REMOVE state had the side effect of introducing a race condition that can cause a crash. scsi_target_reap_ref_release() checks the starget->state to see if it still in STARGET_CREATED, and if so, skips calling transport_remove_device() and device_del(), because the starget->state is only set to STARGET_RUNNING after scsi_target_add() has called device_add() and transport_add_device(). However, if an rport loss occurs while a target is being scanned, it can happen that scsi_remove_target() will be called while the starget is still in the STARGET_CREATED state. In this case, the starget->state will be set to STARGET_REMOVE, and as a result, scsi_target_reap_ref_release() will take the wrong path. The end result is a panic: [ 1255.356653] Oops: 0000 [#1] SMP [ 1255.360154] Modules linked in: x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel ghash_clmulni_i [ 1255.393234] CPU: 5 PID: 149 Comm: kworker/u96:4 Tainted: G W 4.11.0+ #8 [ 1255.401879] Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013 [ 1255.410327] Workqueue: scsi_wq_6 fc_scsi_scan_rport [scsi_transport_fc] [ 1255.417720] task: ffff88060ca8c8c0 task.stack: ffffc900048a8000 [ 1255.424331] RIP: 0010:kernfs_find_ns+0x13/0xc0 [ 1255.429287] RSP: 0018:ffffc900048abbf0 EFLAGS: 00010246 [ 1255.435123] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 1255.443083] RDX: 0000000000000000 RSI: ffffffff8188d659 RDI: 0000000000000000 [ 1255.451043] RBP: ffffc900048abc10 R08: 0000000000000000 R09: 0000012433fe0025 [ 1255.459005] R10: 0000000025e5a4b5 R11: 0000000025e5a4b5 R12: ffffffff8188d659 [ 1255.466972] R13: 0000000000000000 R14: ffff8805f55e5088 R15: 0000000000000000 [ 1255.474931] FS: 0000000000000000(0000) GS:ffff880616b40000(0000) knlGS:0000000000000000 [ 1255.483959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1255.490370] CR2: 0000000000000068 CR3: 0000000001c09000 CR4: 00000000000406e0 [ 1255.498332] Call Trace: [ 1255.501058] kernfs_find_and_get_ns+0x31/0x60 [ 1255.505916] sysfs_unmerge_group+0x1d/0x60 [ 1255.510498] dpm_sysfs_remove+0x22/0x60 [ 1255.514783] device_del+0xf4/0x2e0 [ 1255.518577] ? device_remove_file+0x19/0x20 [ 1255.523241] attribute_container_class_device_del+0x1a/0x20 [ 1255.529457] transport_remove_classdev+0x4e/0x60 [ 1255.534607] ? transport_add_class_device+0x40/0x40 [ 1255.540046] attribute_container_device_trigger+0xb0/0xc0 [ 1255.546069] transport_remove_device+0x15/0x20 [ 1255.551025] scsi_target_reap_ref_release+0x25/0x40 [ 1255.556467] scsi_target_reap+0x2e/0x40 [ 1255.560744] __scsi_scan_target+0xaa/0x5b0 [ 1255.565312] scsi_scan_target+0xec/0x100 [ 1255.569689] fc_scsi_scan_rport+0xb1/0xc0 [scsi_transport_fc] [ 1255.576099] process_one_work+0x14b/0x390 [ 1255.580569] worker_thread+0x4b/0x390 [ 1255.584651] kthread+0x109/0x140 [ 1255.588251] ? rescuer_thread+0x330/0x330 [ 1255.592730] ? kthread_park+0x60/0x60 [ 1255.596815] ret_from_fork+0x29/0x40 [ 1255.600801] Code: 24 08 48 83 42 40 01 5b 41 5c 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 [ 1255.621876] RIP: kernfs_find_ns+0x13/0xc0 RSP: ffffc900048abbf0 [ 1255.628479] CR2: 0000000000000068 [ 1255.632756] ---[ end trace 34a69ba0477d036f ]--- Fix this by adding another scsi_target state STARGET_CREATED_REMOVE to distinguish this case. Fixes: f05795d3d771 ("scsi: Add intermediate STARGET_REMOVE state to scsi_target_state") Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Cc: <stable@vger.kernel.org> Reviewed-by: Laurence Oberman <loberman@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-28 03:55:58 +09:00
if (starget->state == STARGET_CREATED)
starget->state = STARGET_CREATED_REMOVE;
else
starget->state = STARGET_REMOVE;
spin_unlock_irqrestore(shost->host_lock, flags);
__scsi_remove_target(starget);
scsi_target_reap(starget);
goto restart;
}
}
spin_unlock_irqrestore(shost->host_lock, flags);
}
EXPORT_SYMBOL(scsi_remove_target);
int scsi_register_driver(struct device_driver *drv)
{
drv->bus = &scsi_bus_type;
return driver_register(drv);
}
EXPORT_SYMBOL(scsi_register_driver);
int scsi_register_interface(struct class_interface *intf)
{
intf->class = &sdev_class;
return class_interface_register(intf);
}
EXPORT_SYMBOL(scsi_register_interface);
/**
* scsi_sysfs_add_host - add scsi host to subsystem
* @shost: scsi host struct to add to subsystem
**/
int scsi_sysfs_add_host(struct Scsi_Host *shost)
{
int error, i;
/* add host specific attributes */
if (shost->hostt->shost_attrs) {
for (i = 0; shost->hostt->shost_attrs[i]; i++) {
error = device_create_file(&shost->shost_dev,
shost->hostt->shost_attrs[i]);
if (error)
return error;
}
}
transport_register_device(&shost->shost_gendev);
transport_configure_device(&shost->shost_gendev);
return 0;
}
static struct device_type scsi_dev_type = {
.name = "scsi_device",
.release = scsi_device_dev_release,
.groups = scsi_sdev_attr_groups,
};
void scsi_sysfs_device_initialize(struct scsi_device *sdev)
{
unsigned long flags;
struct Scsi_Host *shost = sdev->host;
struct scsi_target *starget = sdev->sdev_target;
device_initialize(&sdev->sdev_gendev);
sdev->sdev_gendev.bus = &scsi_bus_type;
sdev->sdev_gendev.type = &scsi_dev_type;
dev_set_name(&sdev->sdev_gendev, "%d:%d:%d:%llu",
sdev->host->host_no, sdev->channel, sdev->id, sdev->lun);
device_initialize(&sdev->sdev_dev);
sdev->sdev_dev.parent = get_device(&sdev->sdev_gendev);
sdev->sdev_dev.class = &sdev_class;
dev_set_name(&sdev->sdev_dev, "%d:%d:%d:%llu",
sdev->host->host_no, sdev->channel, sdev->id, sdev->lun);
scsi: don't store LUN bits in CDB[1] for USB mass-storage devices The SCSI specification requires that the second Command Data Byte should contain the LUN value in its high-order bits if the recipient device reports SCSI level 2 or below. Nevertheless, some USB mass-storage devices use those bits for other purposes in vendor-specific commands. Currently Linux has no way to send such commands, because the SCSI stack always overwrites the LUN bits. Testing shows that Windows 7 and XP do not store the LUN bits in the CDB when sending commands to a USB device. This doesn't matter if the device uses the Bulk-Only or UAS transports (which virtually all modern USB mass-storage devices do), as these have a separate mechanism for sending the LUN value. Therefore this patch introduces a flag in the Scsi_Host structure to inform the SCSI midlayer that a transport does not require the LUN bits to be stored in the CDB, and it makes usb-storage set this flag for all devices using the Bulk-Only transport. (UAS is handled by a separate driver, but it doesn't really matter because no SCSI-2 or lower device is at all likely to use UAS.) The patch also cleans up the code responsible for storing the LUN value by adding a bitflag to the scsi_device structure. The test for whether to stick the LUN value in the CDB can be made when the device is probed, and stored for future use rather than being made over and over in the fast path. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Tiziano Bacocco <tiziano.bacocco@gmail.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-03 00:35:50 +09:00
/*
* Get a default scsi_level from the target (derived from sibling
* devices). This is the best we can do for guessing how to set
* sdev->lun_in_cdb for the initial INQUIRY command. For LUN 0 the
* setting doesn't matter, because all the bits are zero anyway.
* But it does matter for higher LUNs.
*/
sdev->scsi_level = starget->scsi_level;
scsi: don't store LUN bits in CDB[1] for USB mass-storage devices The SCSI specification requires that the second Command Data Byte should contain the LUN value in its high-order bits if the recipient device reports SCSI level 2 or below. Nevertheless, some USB mass-storage devices use those bits for other purposes in vendor-specific commands. Currently Linux has no way to send such commands, because the SCSI stack always overwrites the LUN bits. Testing shows that Windows 7 and XP do not store the LUN bits in the CDB when sending commands to a USB device. This doesn't matter if the device uses the Bulk-Only or UAS transports (which virtually all modern USB mass-storage devices do), as these have a separate mechanism for sending the LUN value. Therefore this patch introduces a flag in the Scsi_Host structure to inform the SCSI midlayer that a transport does not require the LUN bits to be stored in the CDB, and it makes usb-storage set this flag for all devices using the Bulk-Only transport. (UAS is handled by a separate driver, but it doesn't really matter because no SCSI-2 or lower device is at all likely to use UAS.) The patch also cleans up the code responsible for storing the LUN value by adding a bitflag to the scsi_device structure. The test for whether to stick the LUN value in the CDB can be made when the device is probed, and stored for future use rather than being made over and over in the fast path. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Tiziano Bacocco <tiziano.bacocco@gmail.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-03 00:35:50 +09:00
if (sdev->scsi_level <= SCSI_2 &&
sdev->scsi_level != SCSI_UNKNOWN &&
!shost->no_scsi2_lun_in_cdb)
sdev->lun_in_cdb = 1;
transport_setup_device(&sdev->sdev_gendev);
spin_lock_irqsave(shost->host_lock, flags);
list_add_tail(&sdev->same_target_siblings, &starget->devices);
list_add_tail(&sdev->siblings, &shost->__devices);
spin_unlock_irqrestore(shost->host_lock, flags);
/*
* device can now only be removed via __scsi_remove_device() so hold
* the target. Target will be held in CREATED state until something
* beneath it becomes visible (in which case it moves to RUNNING)
*/
kref_get(&starget->reap_ref);
}
int scsi_is_sdev_device(const struct device *dev)
{
return dev->type == &scsi_dev_type;
}
EXPORT_SYMBOL(scsi_is_sdev_device);
/* A blank transport template that is used in drivers that don't
* yet implement Transport Attributes */
struct scsi_transport_template blank_transport_template = { { { {NULL, }, }, }, };