Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:

 - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
   was initially a fork of CFQ, but subsequently changed to implement
   fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
   to be used on desktop type single drives, providing good fairness.
   From Paolo.

 - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
   using a scalable token based algorithm that throttles IO based on
   live completion IO stats, similarly to blk-wbt. From Omar.

 - A series from Jan, moving users to separately allocated backing
   devices. This continues the work of separating backing device
   lifetimes, solving various problems with hot removal.

 - A series of updates for lightnvm, mostly from Javier. Includes a
   'pblk' target that exposes an open channel SSD as a physical block
   device.

 - A series of fixes and improvements for nbd from Josef.

 - A series from Omar, removing queue sharing between devices on mostly
   legacy drivers. This helps us clean up other bits, if we know that a
   queue only has a single device backing. This has been overdue for
   more than a decade.

 - Fixes for the blk-stats, and improvements to unify the stats and user
   windows. This both improves blk-wbt, and enables other users to
   register a need to receive IO stats for a device. From Omar.

 - blk-throttle improvements from Shaohua. This provides a scalable
   framework for implementing scalable prioritization - particularly for
   blk-mq, but applicable to any type of block device. The interface is
   marked experimental for now.

 - Bucketized IO stats for IO polling from Stephen Bates. This improves
   efficiency of polled workloads in the presence of mixed block size
   IO.

 - A few fixes for opal, from Scott.

 - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
   From a variety of folks, mostly Sagi and James Smart.

 - A series from Bart, improving our exposed info and capabilities from
   the blk-mq debugfs support.

 - A series from Christoph, cleaning up how we handle WRITE_ZEROES.

 - A series from Christoph, cleaning up the block layer handling of how
   we track errors in a request. On top of being a nice cleanup, it also
   shrinks the size of struct request a bit.

 - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
   never used by platforms, and the latter has outlived its usefulness.

 - Various little bug fixes and cleanups from a wide variety of folks.

* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
  block: hide badblocks attribute by default
  blk-mq: unify hctx delay_work and run_work
  block: add kblock_mod_delayed_work_on()
  blk-mq: unify hctx delayed_run_work and run_work
  nbd: fix use after free on module unload
  MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
  blk-mq-sched: alloate reserved tags out of normal pool
  mtip32xx: use runtime tag to initialize command header
  scsi: Implement blk_mq_ops.show_rq()
  blk-mq: Add blk_mq_ops.show_rq()
  blk-mq: Show operation, cmd_flags and rq_flags names
  blk-mq: Make blk_flags_show() callers append a newline character
  blk-mq: Move the "state" debugfs attribute one level down
  blk-mq: Unregister debugfs attributes earlier
  blk-mq: Only unregister hctxs for which registration succeeded
  blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
  blk-mq: Let blk_mq_debugfs_register() look up the queue name
  blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
  ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
  ide-pm: always pass 0 error to __blk_end_request_all
  ..
Linus Torvalds 2017-05-01 10:39:57 -07:00
commit 694752922b
255 changed files with 24703 additions and 6212 deletions


@@ -213,14 +213,8 @@ What: /sys/block/<disk>/queue/discard_zeroes_data
Date: May 2011
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Devices that support discard functionality may return
stale or random data when a previously discarded block
is read back. This can cause problems if the filesystem
expects discarded blocks to be explicitly cleared. If a
device reports that it deterministically returns zeroes
when a discarded area is read the discard_zeroes_data
parameter will be set to one. Otherwise it will be 0 and
the result of reading a discarded area is undefined.
Will always return 0. Don't rely on any specific behavior
for discards, and don't read this file.
What: /sys/block/<disk>/queue/write_same_max_bytes
Date: January 2012


@@ -1,5 +1,7 @@
00-INDEX
- This file
bfq-iosched.txt
- BFQ IO scheduler and its tunables
biodoc.txt
- Notes on the Generic Block Layer Rewrite in Linux 2.5
biovecs.txt


@@ -0,0 +1,531 @@
BFQ (Budget Fair Queueing)
==========================
BFQ is a proportional-share I/O scheduler, with some extra
low-latency capabilities. In addition to cgroups support (blkio or io
controllers), BFQ's main features are:
- BFQ guarantees a high system and application responsiveness, and a
low latency for time-sensitive applications, such as audio or video
players;
- BFQ distributes bandwidth, and not just time, among processes or
groups (switching back to time distribution when needed to keep
throughput high).
On average CPUs, the current version of BFQ can handle devices
performing at most ~30K IOPS; at most ~50 KIOPS on faster CPUs. As a
reference, 30-50 KIOPS correspond to very high bandwidths with
sequential I/O (e.g., 8-12 GB/s if I/O requests are 256 KB large), and
to 120-200 MB/s with 4KB random I/O. BFQ has not yet been tested on
multi-queue devices.
The table of contents follows. Impatient readers can jump straight to Section 3.
CONTENTS
1. When may BFQ be useful?
1-1 Personal systems
1-2 Server systems
2. How does BFQ work?
3. What are BFQ's tunables?
4. BFQ group scheduling
4-1 Service guarantees provided
4-2 Interface
1. When may BFQ be useful?
==========================
BFQ provides the following benefits on personal and server systems.
1-1 Personal systems
--------------------
Low latency for interactive applications
Regardless of the actual background workload, BFQ guarantees that, for
interactive tasks, the storage device is virtually as responsive as if
it was idle. For example, even if one or more of the following
background workloads are being executed:
- one or more large files are being read, written or copied,
- a tree of source files is being compiled,
- one or more virtual machines are performing I/O,
- a software update is in progress,
- indexing daemons are scanning filesystems and updating their
databases,
starting an application or loading a file from within an application
takes about the same time as if the storage device was idle. As a
comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
applications experience high latencies, or even become unresponsive
until the background workload terminates (also on SSDs).
Low latency for soft real-time applications
Soft real-time applications, such as audio and video
players/streamers, also enjoy low latency and a low drop rate,
regardless of the background I/O workload. As a consequence, these
applications suffer from virtually no glitches due to the background workload.
Higher speed for code-development tasks
If some additional workload happens to be executed in parallel, then
BFQ executes the I/O-related components of typical code-development
tasks (compilation, checkout, merge, ...) much more quickly than CFQ,
NOOP or DEADLINE.
High throughput
On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
up to 150% higher throughput than DEADLINE and NOOP, with all the
sequential workloads considered in our tests. With random workloads,
and with all the workloads on flash-based devices, BFQ achieves,
instead, about the same throughput as the other schedulers.
Strong fairness, bandwidth and delay guarantees
BFQ distributes the device throughput, and not just the device time,
among I/O-bound applications in proportion to their weights, with any
workload and regardless of the device parameters. From these bandwidth
guarantees, it is possible to compute tight per-I/O-request delay
guarantees by a simple formula. If not configured for strict service
guarantees, BFQ switches to time-based resource sharing (only) for
applications that would otherwise cause a throughput loss.
1-2 Server systems
------------------
Most benefits for server systems follow from the same service
properties as above. In particular, regardless of whether additional,
possibly heavy workloads are being served, BFQ guarantees:
. audio and video-streaming with zero or very low jitter and drop
rate;
. fast retrieval of WEB pages and embedded objects;
. real-time recording of data in live-dumping applications (e.g.,
packet logging);
. responsiveness in local and remote access to a server.
2. How does BFQ work?
=====================
BFQ is a proportional-share I/O scheduler, whose general structure,
plus a lot of code, are borrowed from CFQ.
- Each process doing I/O on a device is associated with a weight and a
(bfq_)queue.
- BFQ grants exclusive access to the device, for a while, to one queue
(process) at a time, and implements this service model by
associating every queue with a budget, measured in number of
sectors.
- After a queue is granted access to the device, the budget of the
queue is decremented, on each request dispatch, by the size of the
request.
- The in-service queue is expired, i.e., its service is suspended,
only if one of the following events occurs: 1) the queue finishes
its budget, 2) the queue empties, 3) a "budget timeout" fires.
- The budget timeout prevents processes doing random I/O from
holding the device for too long and dramatically reducing
throughput.
- Actually, as in CFQ, a queue associated with a process issuing
sync requests may not be expired immediately when it empties. Instead,
BFQ may idle the device for a short time interval,
giving the process the chance to keep being served if it issues
a new request in time. Device idling typically boosts the
throughput on rotational devices, if processes do synchronous
and sequential I/O. In addition, under BFQ, device idling is
also instrumental in guaranteeing the desired throughput
fraction to processes issuing sync requests (see the description
of the slice_idle tunable in this document, or [1, 2], for more
details).
- With respect to idling for service guarantees, if several
processes are competing for the device at the same time, but
all processes (and groups, after the following commit) have
the same weight, then BFQ guarantees the expected throughput
distribution without ever idling the device. Throughput is
thus as high as possible in this common scenario.
- If low-latency mode is enabled (default configuration), BFQ
executes some special heuristics to detect interactive and soft
real-time applications (e.g., video or audio players/streamers),
and to reduce their latency. The most important action taken to
achieve this goal is to give to the queues associated with these
applications more than their fair share of the device
throughput. For brevity, we call just "weight-raising" the whole
sets of actions taken by BFQ to privilege these queues. In
particular, BFQ provides a milder form of weight-raising for
interactive applications, and a stronger form for soft real-time
applications.
- BFQ automatically deactivates idling for queues born in a burst of
queue creations. In fact, these queues are usually associated with
the processes of applications and services that benefit mostly
from a high throughput. Examples are systemd during boot, or git
grep.
- As in CFQ, BFQ merges queues performing interleaved I/O, i.e.,
performing random I/O that becomes mostly sequential if
merged. Differently from CFQ, BFQ achieves this goal with a more
reactive mechanism, called Early Queue Merge (EQM). EQM is so
responsive in detecting interleaved I/O (cooperating processes),
that it enables BFQ to achieve a high throughput, by queue
merging, even for queues for which CFQ needs a different
mechanism, preemption, to get a high throughput. As such, EQM is a
unified mechanism to achieve a high throughput with interleaved
I/O.
- Queues are scheduled according to a variant of WF2Q+, named
B-WF2Q+, and implemented using an augmented rb-tree to preserve an
O(log N) overall complexity; see [2] for more details, and the
sketch after this list for a minimal illustration. B-WF2Q+ is
also ready for hierarchical scheduling. However, for a cleaner
logical breakdown, the code that enables and completes
hierarchical support is provided in the next commit, which focuses
exactly on this feature.
- B-WF2Q+ guarantees a tight deviation with respect to an ideal,
perfectly fair, and smooth service. In particular, B-WF2Q+
guarantees that each queue receives a fraction of the device
throughput proportional to its weight, even if the throughput
fluctuates, and regardless of: the device parameters, the current
workload and the budgets assigned to the queue.
- This last property, budget-independence (although probably
counterintuitive at first), is definitely beneficial, for
the following reasons:
- First, with any proportional-share scheduler, the maximum
deviation with respect to an ideal service is proportional to
the maximum budget (slice) assigned to queues. As a consequence,
BFQ can keep this deviation tight not only because of the
accurate service of B-WF2Q+, but also because BFQ *does not*
need to assign a larger budget to a queue to let the queue
receive a higher fraction of the device throughput.
- Second, BFQ is free to choose, for every process (queue), the
budget that best fits the needs of the process, or best
leverages the I/O pattern of the process. In particular, BFQ
updates queue budgets with a simple feedback-loop algorithm that
allows a high throughput to be achieved, while still providing
tight latency guarantees to time-sensitive applications. When
the in-service queue expires, this algorithm computes the next
budget of the queue so as to:
- Let large budgets be eventually assigned to the queues
associated with I/O-bound applications performing sequential
I/O: in fact, the longer these applications are served once
they get access to the device, the higher the throughput is.
- Let small budgets be eventually assigned to the queues
associated with time-sensitive applications (which typically
perform sporadic and short I/O), because, the smaller the
budget assigned to a queue waiting for service is, the sooner
B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).
- If several processes are competing for the device at the same time,
but all processes and groups have the same weight, then BFQ
guarantees the expected throughput distribution without ever idling
the device. It uses preemption instead. Throughput is then much
higher in this common scenario.
- ioprio classes are served in strict priority order, i.e.,
lower-priority queues are not served as long as there are
higher-priority queues. Among queues in the same class, the
bandwidth is distributed in proportion to the weight of each
queue. A very thin extra bandwidth is however guaranteed to
the Idle class, to prevent it from starving.
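For illustration only, here is a minimal, self-contained sketch of the
B-WF2Q+ timestamping idea described in the list above. This is not BFQ's
actual code: the real implementation keeps entities in an augmented
rb-tree and uses fixed-point arithmetic, and the toy_* names below are
made up for the example.

/*
 * Toy model of B-WF2Q+ timestamping: every backlogged entity has a
 * virtual start time S and a virtual finish time F = S + budget/weight.
 * The scheduler serves, among the eligible entities (S <= V, the system
 * virtual time), the one with the smallest finish time.
 */
struct toy_entity {
	unsigned long long start;	/* S: virtual start time */
	unsigned long long finish;	/* F: virtual finish time */
	int budget;			/* service to receive, in sectors */
	int weight;			/* share of the device throughput */
	int backlogged;			/* entity has pending I/O */
};

/* (Re)stamp an entity when it becomes backlogged or gets a new budget. */
void toy_update_finish(struct toy_entity *e, unsigned long long vtime)
{
	if (e->start < vtime)		/* an entity cannot start in the past */
		e->start = vtime;
	e->finish = e->start + e->budget / e->weight;
}

/* Pick the eligible entity (S <= V) with the smallest finish time. */
struct toy_entity *toy_pick_next(struct toy_entity *ents, int n,
				 unsigned long long vtime)
{
	struct toy_entity *best = (void *)0;
	int i;

	for (i = 0; i < n; i++) {
		struct toy_entity *e = &ents[i];

		if (!e->backlogged || e->start > vtime)
			continue;
		if (!best || e->finish < best->finish)
			best = e;
	}
	return best;
}

The linear scan above is used purely for brevity; BFQ obtains the same
selection in O(log N) through the augmented rb-tree mentioned above.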
3. What are BFQ's tunables?
===========================
The tunables back_seek_max, back_seek_penalty, fifo_expire_async and
fifo_expire_sync below are the same as in CFQ. Their description is
just copied from that for CFQ. Some considerations in the description
of slice_idle are copied from CFQ too.
per-process ioprio and weight
-----------------------------
Unless the cgroups interface is used (see "4. BFQ group scheduling"),
weights can be assigned to processes only indirectly, through I/O
priorities, and according to the relation:
weight = (IOPRIO_BE_NR - ioprio) * 10.
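As a quick worked example (IOPRIO_BE_NR is 8 in current kernels): a
process left at the default best-effort ioprio 4 gets weight
(8 - 4) * 10 = 40, while a process set to ioprio 0 gets weight
(8 - 0) * 10 = 80, i.e., twice the share of the device throughput.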
Beware that, if low-latency is set, then BFQ automatically raises the
weight of the queues associated with interactive and soft real-time
applications. Unset this tunable if you need/want to control weights.
slice_idle
----------
This parameter specifies how long BFQ should idle for the next I/O
request when certain sync BFQ queues become empty. By default
slice_idle is a non-zero value. Idling has a double purpose: boosting
throughput and making sure that the desired throughput distribution is
respected (see the description of how BFQ works, and, if needed, the
papers referred there).
As for throughput, idling can be very helpful on highly seeky media
like single spindle SATA/SAS disks where we can cut down on overall
number of seeks and see improved throughput.
Setting slice_idle to 0 will remove all the idling on queues and one
should see an overall improved throughput on faster storage devices
like multiple SATA/SAS disks in hardware RAID configuration.
So depending on storage and workload, it might be useful to set
slice_idle=0. In general for SATA/SAS disks and software RAID of
SATA/SAS disks keeping slice_idle enabled should be useful. For any
configurations where there are multiple spindles behind single LUN
(Host based hardware RAID controller or for storage arrays), setting
slice_idle=0 might end up in better throughput and acceptable
latencies.
Idling is however necessary to have service guarantees enforced in
case of differentiated weights or differentiated I/O-request lengths.
To see why, suppose that a given BFQ queue A must get several I/O
requests served for each request served for another queue B. Idling
ensures that, if A makes a new I/O request slightly after becoming
empty, then no request of B is dispatched in the middle, and thus A
does not lose the possibility to get more than one request dispatched
before the next request of B is dispatched. Note that idling
guarantees the desired differentiated treatment of queues only in
terms of I/O-request dispatches. To guarantee that the actual service
order then corresponds to the dispatch order, the strict_guarantees
tunable must be set too.
Idling has an important flipside: apart from the above cases, where it
also benefits throughput, idling can severely reduce throughput. One
important case is a random workload. Because of this issue, BFQ tends
to avoid idling as much as possible when idling does not also benefit
throughput. As a consequence of this behavior, and
of further issues described for the strict_guarantees tunable,
short-term service guarantees may be occasionally violated. And, in
some cases, these guarantees may be more important than guaranteeing
maximum throughput. For example, in video playing/streaming, a very
low drop rate may be more important than maximum throughput. In these
cases, consider setting the strict_guarantees parameter.
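For illustration only, the following minimal user-space sketch disables
idling by writing 0 to slice_idle. The sysfs path is an assumption based
on the usual <device>/queue/iosched/<tunable> layout (with BFQ selected
as the scheduler for that device); adjust the device name for your system.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	/* hypothetical default path; pass a different one as argv[1] */
	const char *path = argc > 1 ? argv[1] :
			"/sys/block/sda/queue/iosched/slice_idle";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror("fopen");
		return EXIT_FAILURE;
	}
	fprintf(f, "0\n");	/* 0 disables idling; the default is non-zero */
	fclose(f);
	return EXIT_SUCCESS;
}

Run as root; writing back the previous value re-enables idling.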
strict_guarantees
-----------------
If this parameter is set (default: unset), then BFQ
- always performs idling when the in-service queue becomes empty;
- forces the device to serve one I/O request at a time, by dispatching a
new request only if there is no outstanding request.
In the presence of differentiated weights or I/O-request sizes, both
the above conditions are needed to guarantee that every BFQ queue
receives its allotted share of the bandwidth. The first condition is
needed for the reasons explained in the description of the slice_idle
tunable. The second condition is needed because all modern storage
devices reorder internally-queued requests, which may trivially break
the service guarantees enforced by the I/O scheduler.
Setting strict_guarantees may evidently affect throughput.
back_seek_max
-------------
This specifies, in Kbytes, the maximum "distance" for backward seeking.
The distance is the amount of space from the current head location to the
sectors that lie behind the head.
This parameter allows the scheduler to anticipate requests in the "backward"
direction and consider them as being the "next" if they are within this
distance from the current head location.
back_seek_penalty
-----------------
This parameter is used to compute the cost of backward seeking. If the
backward distance of a request is just 1/back_seek_penalty of the distance
of a "front" request, then the seek costs of the two requests are considered
equivalent, and the scheduler does not bias toward either of them (otherwise
the scheduler biases toward the front request). The default value of
back_seek_penalty is 2.
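For example, with the default back_seek_penalty of 2, a request lying
1 MB behind the current head position is treated as costing the same as
a request lying 2 MB ahead of it, so the scheduler prefers neither.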
fifo_expire_async
-----------------
This parameter is used to set the timeout of asynchronous requests. Default
value of this is 248ms.
fifo_expire_sync
----------------
This parameter is used to set the timeout of synchronous requests. Default
value of this is 124ms. To favor synchronous requests over asynchronous
ones, this value should be decreased relative to fifo_expire_async.
low_latency
-----------
This parameter is used to enable/disable BFQ's low latency mode. By
default, low latency mode is enabled. If enabled, interactive and soft
real-time applications are privileged and experience a lower latency,
as explained in more detail in the description of how BFQ works.
DO NOT enable this mode if you need full control on bandwidth
distribution. In fact, if it is enabled, then BFQ automatically
increases the bandwidth share of privileged applications, as the main
means to guarantee a lower latency to them.
timeout_sync
------------
Maximum amount of device time that can be given to a task (queue) once
it has been selected for service. On devices with costly seeks,
increasing this time usually increases maximum throughput. On the
opposite end, increasing this time coarsens the granularity of the
short-term bandwidth and latency guarantees, especially if the
following parameter is set to zero.
max_budget
----------
Maximum amount of service, measured in sectors, that can be provided
to a BFQ queue once it is set in service (of course within the limits
of the above timeout). As explained in the description of
the algorithm, larger values increase the throughput in proportion to
the percentage of sequential I/O requests issued. The price of larger
values is that they coarsen the granularity of short-term bandwidth
and latency guarantees.
The default value is 0, which enables auto-tuning: BFQ sets max_budget
to the maximum number of sectors that can be served during
timeout_sync, according to the estimated peak rate.
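As a rough, purely illustrative calculation (the numbers below are made
up, not defaults): if the estimated peak rate were about 400000 sectors/s
(~200 MB/s with 512-byte sectors) and timeout_sync were 125 ms, auto-tuning
would set max_budget to roughly 400000 * 0.125 = 50000 sectors.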
weights
-------
Read-only parameter, used to show the weights of the currently active
BFQ queues.
wr_ tunables
------------
BFQ exports a few parameters to control/tune the behavior of
low-latency heuristics.
wr_coeff
Factor by which the weight of a weight-raised queue is multiplied. If
the queue is deemed soft real-time, then the weight is further
multiplied by an additional, constant factor.
wr_max_time
Maximum duration of a weight-raising period for an interactive task
(ms). If set to zero (default value), then this value is computed
automatically, as a function of the peak rate of the device. In any
case, when the value of this parameter is read, it always reports the
current duration, regardless of whether it has been set manually or
computed automatically.
wr_max_softrt_rate
Maximum service rate below which a queue is deemed to be associated
with a soft real-time application, and is then weight-raised
accordingly (sectors/sec).
wr_min_idle_time
Minimum idle period after which interactive weight-raising may be
reactivated for a queue (in ms).
wr_rt_max_time
Maximum weight-raising duration for soft real-time queues (in ms). The
start time from which this duration is considered is automatically
moved forward if the queue is detected to be still soft real-time
before the current soft real-time weight-raising period finishes.
wr_min_inter_arr_async
Minimum period between I/O request arrivals after which weight-raising
may be reactivated for an already busy async queue (in ms).
4. Group scheduling with BFQ
============================
BFQ supports both cgroups-v1 and cgroups-v2 io controllers, namely
blkio and io. In particular, BFQ supports weight-based proportional
share. To activate cgroups support, set BFQ_GROUP_IOSCHED.
4-1 Service guarantees provided
-------------------------------
With BFQ, proportional share means true proportional share of the
device bandwidth, according to group weights. For example, a group
with weight 200 gets twice the bandwidth, and not just twice the time,
of a group with weight 100.
BFQ supports hierarchies (group trees) of any depth. Bandwidth is
distributed among groups and processes in the expected way: for each
group, the children of the group share the whole bandwidth of the
group in proportion to their weights. In particular, this implies
that, for each leaf group, every process of the group receives the
same share of the whole group bandwidth, unless the ioprio of the
process is modified.
The resource-sharing guarantee for a group may partially or totally
switch from bandwidth to time, if providing bandwidth guarantees to
the group lowers the throughput too much. This switch occurs on a
per-process basis: if a process of a leaf group causes throughput loss
if served in such a way to receive its share of the bandwidth, then
BFQ switches back to just time-based proportional share for that
process.
4-2 Interface
-------------
To get proportional sharing of bandwidth with BFQ for a given device,
BFQ must of course be the active scheduler for that device.
Within each group directory, the names of the files associated with
BFQ-specific cgroup parameters and stats begin with the "bfq."
prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for
BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
parameter to set the weight of a group with BFQ is blkio.bfq.weight
or io.bfq.weight.
Parameters to set
-----------------
For each group, there is only the following parameter to set.
weight (namely blkio.bfq.weight or io.bfq.weight): the weight of the
group inside its parent. Available values: 1..10000 (default 100). The
linear mapping between ioprio and weights, described at the beginning
of the tunable section, is still valid, but all weights higher than
IOPRIO_BE_NR*10 are mapped to ioprio 0.
Recall that, if low-latency is set, then BFQ automatically raises the
weight of the queues associated with interactive and soft real-time
applications. Unset this tunable if you need/want to control weights.
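For illustration only, here is a minimal user-space sketch that gives a
cgroup twice the default BFQ weight on cgroups-v2. The mount point and the
group name ("mygroup") are assumptions; on cgroups-v1 the file would be
blkio.bfq.weight instead.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/* hypothetical group path; adjust for your cgroup hierarchy */
	const char *path = "/sys/fs/cgroup/mygroup/io.bfq.weight";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror("fopen");
		return EXIT_FAILURE;
	}
	fprintf(f, "200\n");	/* default is 100; valid range is 1..10000 */
	fclose(f);
	return EXIT_SUCCESS;
}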
[1] P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
Scheduler", Proceedings of the First Workshop on Mobile System
Technologies (MST-2015), May 2015.
http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf
[2] P. Valente and M. Andreolini, "Improving Application
Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
the 5th Annual International Systems and Storage Conference
(SYSTOR '12), June 2012.
Slightly extended version:
http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-
results.pdf


@@ -0,0 +1,14 @@
Kyber I/O scheduler tunables
===========================
The only two tunables for the Kyber scheduler are the target latencies for
reads and synchronous writes. Kyber will throttle requests in order to meet
these target latencies.
read_lat_nsec
-------------
Target latency for reads (in nanoseconds).
write_lat_nsec
--------------
Target latency for synchronous writes (in nanoseconds).
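For illustration only, a minimal user-space sketch that prints Kyber's
current target latencies for one device. The sysfs paths assume the usual
<device>/queue/iosched/<tunable> layout and a device named sda with Kyber
selected; adjust them for your system.

#include <stdio.h>

int main(void)
{
	const char *files[] = {
		"/sys/block/sda/queue/iosched/read_lat_nsec",
		"/sys/block/sda/queue/iosched/write_lat_nsec",
	};
	char buf[64];
	int i;

	for (i = 0; i < 2; i++) {
		FILE *f = fopen(files[i], "r");

		if (!f) {
			perror(files[i]);
			continue;
		}
		if (fgets(buf, sizeof(buf), f))
			printf("%s: %s", files[i], buf);  /* value in nanoseconds */
		fclose(f);
	}
	return 0;
}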


@@ -43,11 +43,6 @@ large discards are issued, setting this value lower will make Linux issue
smaller discards and potentially help reduce latencies induced by large
discard operations.
discard_zeroes_data (RO)
------------------------
When read, this file will show if the discarded block are zeroed by the
device or not. If its value is '1' the blocks are zeroed otherwise not.
hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.
@@ -192,5 +187,11 @@ scaling back writes. Writing a value of '0' to this file disables the
feature. Writing a value of '-1' to this file resets the value to the
default setting.
throttle_sample_time (RW)
-------------------------
This is the time window over which blk-throttle samples data, in milliseconds.
blk-throttle makes decisions based on these samples. A shorter window gives
cgroups smoother throughput, but costs more CPU overhead. This file exists
only when CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
Jens Axboe <jens.axboe@oracle.com>, February 2009


@@ -1,84 +0,0 @@
This document describes m[g]flash support in linux.
Contents
1. Overview
2. Reserved area configuration
3. Example of mflash platform driver registration
1. Overview
Mflash and gflash are embedded flash drive. The only difference is mflash is
MCP(Multi Chip Package) device. These two device operate exactly same way.
So the rest mflash repersents mflash and gflash altogether.
Internally, mflash has nand flash and other hardware logics and supports
2 different operation (ATA, IO) modes. ATA mode doesn't need any new
driver and currently works well under standard IDE subsystem. Actually it's
one chip SSD. IO mode is ATA-like custom mode for the host that doesn't have
IDE interface.
Following are brief descriptions about IO mode.
A. IO mode based on ATA protocol and uses some custom command. (read confirm,
write confirm)
B. IO mode uses SRAM bus interface.
C. IO mode supports 4kB boot area, so host can boot from mflash.
2. Reserved area configuration
If host boot from mflash, usually needs raw area for boot loader image. All of
the mflash's block device operation will be taken this value as start offset.
Note that boot loader's size of reserved area and kernel configuration value
must be same.
3. Example of mflash platform driver registration
Working mflash is very straight forward. Adding platform device stuff to board
configuration file is all. Here is some pseudo example.
static struct mg_drv_data mflash_drv_data = {
/* If you want to polling driver set to 1 */
.use_polling = 0,
/* device attribution */
.dev_attr = MG_BOOT_DEV
};
static struct resource mg_mflash_rsc[] = {
/* Base address of mflash */
[0] = {
.start = 0x08000000,
.end = 0x08000000 + SZ_64K - 1,
.flags = IORESOURCE_MEM
},
/* mflash interrupt pin */
[1] = {
.start = IRQ_GPIO(84),
.end = IRQ_GPIO(84),
.flags = IORESOURCE_IRQ
},
/* mflash reset pin */
[2] = {
.start = 43,
.end = 43,
.name = MG_RST_PIN,
.flags = IORESOURCE_IO
},
/* mflash reset-out pin
* If you use mflash as storage device (i.e. other than MG_BOOT_DEV),
* should assign this */
[3] = {
.start = 51,
.end = 51,
.name = MG_RSTOUT_PIN,
.flags = IORESOURCE_IO
}
};
static struct platform_device mflash_dev = {
.name = MG_DEV_NAME,
.id = -1,
.dev = {
.platform_data = &mflash_drv_data,
},
.num_resources = ARRAY_SIZE(mg_mflash_rsc),
.resource = mg_mflash_rsc
};
platform_device_register(&mflash_dev);


@@ -0,0 +1,21 @@
pblk: Physical Block Device Target
==================================
pblk implements a fully associative, host-based FTL that exposes a traditional
block I/O interface. Its primary responsibilities are:
- Map logical addresses onto physical addresses (4KB granularity) in a
logical-to-physical (L2P) table.
- Maintain the integrity and consistency of the L2P table as well as its
recovery from normal tear down and power outage.
- Deal with controller- and media-specific constraints.
- Handle I/O errors.
- Implement garbage collection.
- Maintain consistency across the I/O stack during synchronization points.
For more information please refer to:
http://lightnvm.io
which maintains updated FAQs, manual pages, technical documentation, tools,
contacts, etc.


@@ -2544,6 +2544,14 @@ F: block/
F: kernel/trace/blktrace.c
F: lib/sbitmap.c
BFQ I/O SCHEDULER
M: Paolo Valente <paolo.valente@linaro.org>
M: Jens Axboe <axboe@kernel.dk>
L: linux-block@vger.kernel.org
S: Maintained
F: block/bfq-*
F: Documentation/block/bfq-iosched.txt
BLOCK2MTD DRIVER
M: Joern Engel <joern@lazybastard.org>
L: linux-mtd@lists.infradead.org


@@ -115,6 +115,18 @@ config BLK_DEV_THROTTLING
See Documentation/cgroups/blkio-controller.txt for more information.
config BLK_DEV_THROTTLING_LOW
bool "Block throttling .low limit interface support (EXPERIMENTAL)"
depends on BLK_DEV_THROTTLING
default n
---help---
Add .low limit interface for block throttling. The low limit is a best
effort limit to prioritize cgroups. Depending on the setting, the limit
can be used to protect cgroups in terms of bandwidth/iops and better
utilize disk resource.
Note, this is an experimental interface and could be changed someday.
config BLK_CMDLINE_PARSER
bool "Block device command line partition parser"
default n


@@ -40,6 +40,7 @@ config CFQ_GROUP_IOSCHED
Enable group IO scheduling in CFQ.
choice
prompt "Default I/O scheduler"
default DEFAULT_CFQ
help
@@ -69,6 +70,35 @@ config MQ_IOSCHED_DEADLINE
---help---
MQ version of the deadline IO scheduler.
config MQ_IOSCHED_KYBER
tristate "Kyber I/O scheduler"
default y
---help---
The Kyber I/O scheduler is a low-overhead scheduler suitable for
multiqueue and other fast devices. Given target latencies for reads and
synchronous writes, it will self-tune queue depths to achieve that
goal.
config IOSCHED_BFQ
tristate "BFQ I/O scheduler"
default n
---help---
BFQ I/O scheduler for BLK-MQ. BFQ distributes the bandwidth
of the device among all processes according to their weights,
regardless of the device parameters and with any workload. It
also guarantees a low latency to interactive and soft
real-time applications. Details in
Documentation/block/bfq-iosched.txt
config BFQ_GROUP_IOSCHED
bool "BFQ hierarchical scheduling support"
depends on IOSCHED_BFQ && BLK_CGROUP
default n
---help---
Enable hierarchical scheduling in BFQ, using the blkio
(cgroups-v1) or io (cgroups-v2) controller.
endmenu
endif


@@ -20,6 +20,9 @@ obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
obj-$(CONFIG_MQ_IOSCHED_DEADLINE) += mq-deadline.o
obj-$(CONFIG_MQ_IOSCHED_KYBER) += kyber-iosched.o
bfq-y := bfq-iosched.o bfq-wf2q.o bfq-cgroup.o
obj-$(CONFIG_IOSCHED_BFQ) += bfq.o
obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o

block/bfq-cgroup.c (new file, 1139 lines; diff suppressed because it is too large)

block/bfq-iosched.c (new file, 5047 lines; diff suppressed because it is too large)

block/bfq-iosched.h (new file, 941 lines)

@@ -0,0 +1,941 @@
/*
* Header file for the BFQ I/O scheduler: data structures and
* prototypes of interface functions among BFQ components.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*/
#ifndef _BFQ_H
#define _BFQ_H
#include <linux/blktrace_api.h>
#include <linux/hrtimer.h>
#include <linux/blk-cgroup.h>
#define BFQ_IOPRIO_CLASSES 3
#define BFQ_CL_IDLE_TIMEOUT (HZ/5)
#define BFQ_MIN_WEIGHT 1
#define BFQ_MAX_WEIGHT 1000
#define BFQ_WEIGHT_CONVERSION_COEFF 10
#define BFQ_DEFAULT_QUEUE_IOPRIO 4
#define BFQ_WEIGHT_LEGACY_DFL 100
#define BFQ_DEFAULT_GRP_IOPRIO 0
#define BFQ_DEFAULT_GRP_CLASS IOPRIO_CLASS_BE
/*
* Soft real-time applications are extremely more latency sensitive
* than interactive ones. Over-raise the weight of the former to
* privilege them against the latter.
*/
#define BFQ_SOFTRT_WEIGHT_FACTOR 100
struct bfq_entity;
/**
* struct bfq_service_tree - per ioprio_class service tree.
*
* Each service tree represents a B-WF2Q+ scheduler on its own. Each
* ioprio_class has its own independent scheduler, and so its own
* bfq_service_tree. All the fields are protected by the queue lock
* of the containing bfqd.
*/
struct bfq_service_tree {
/* tree for active entities (i.e., those backlogged) */
struct rb_root active;
/* tree for idle entities (i.e., not backlogged, with V <= F_i)*/
struct rb_root idle;
/* idle entity with minimum F_i */
struct bfq_entity *first_idle;
/* idle entity with maximum F_i */
struct bfq_entity *last_idle;
/* scheduler virtual time */
u64 vtime;
/* scheduler weight sum; active and idle entities contribute to it */
unsigned long wsum;
};
/**
* struct bfq_sched_data - multi-class scheduler.
*
* bfq_sched_data is the basic scheduler queue. It supports three
* ioprio_classes, and can be used either as a toplevel queue or as an
* intermediate queue on a hierarchical setup. @next_in_service
* points to the active entity of the sched_data service trees that
* will be scheduled next. It is used to reduce the number of steps
* needed for each hierarchical-schedule update.
*
* The supported ioprio_classes are the same as in CFQ, in descending
* priority order, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE.
* Requests from higher priority queues are served before all the
* requests from lower priority queues; among requests of the same
* queue requests are served according to B-WF2Q+.
* All the fields are protected by the queue lock of the containing bfqd.
*/
struct bfq_sched_data {
/* entity in service */
struct bfq_entity *in_service_entity;
/* head-of-line entity (see comments above) */
struct bfq_entity *next_in_service;
/* array of service trees, one per ioprio_class */
struct bfq_service_tree service_tree[BFQ_IOPRIO_CLASSES];
/* last time CLASS_IDLE was served */
unsigned long bfq_class_idle_last_service;
};
/**
* struct bfq_weight_counter - counter of the number of all active entities
* with a given weight.
*/
struct bfq_weight_counter {
unsigned int weight; /* weight of the entities this counter refers to */
unsigned int num_active; /* nr of active entities with this weight */
/*
* Weights tree member (see bfq_data's @queue_weights_tree and
* @group_weights_tree)
*/
struct rb_node weights_node;
};
/**
* struct bfq_entity - schedulable entity.
*
* A bfq_entity is used to represent either a bfq_queue (leaf node in the
* cgroup hierarchy) or a bfq_group into the upper level scheduler. Each
* entity belongs to the sched_data of the parent group in the cgroup
* hierarchy. Non-leaf entities have also their own sched_data, stored
* in @my_sched_data.
*
* Each entity stores independently its priority values; this would
* allow different weights on different devices, but this
* functionality is not exported to userspace by now. Priorities and
* weights are updated lazily, first storing the new values into the
* new_* fields, then setting the @prio_changed flag. As soon as
* there is a transition in the entity state that allows the priority
* update to take place the effective and the requested priority
* values are synchronized.
*
* Unless cgroups are used, the weight value is calculated from the
* ioprio to export the same interface as CFQ. When dealing with
* ``well-behaved'' queues (i.e., queues that do not spend too much
* time to consume their budget and have true sequential behavior, and
* when there are no external factors breaking anticipation) the
* relative weights at each level of the cgroups hierarchy should be
* guaranteed. All the fields are protected by the queue lock of the
* containing bfqd.
*/
struct bfq_entity {
/* service_tree member */
struct rb_node rb_node;
/* pointer to the weight counter associated with this entity */
struct bfq_weight_counter *weight_counter;
/*
* Flag, true if the entity is on a tree (either the active or
* the idle one of its service_tree) or is in service.
*/
bool on_st;
/* B-WF2Q+ start and finish timestamps [sectors/weight] */
u64 start, finish;
/* tree the entity is enqueued into; %NULL if not on a tree */
struct rb_root *tree;
/*
* minimum start time of the (active) subtree rooted at this
* entity; used for O(log N) lookups into active trees
*/
u64 min_start;
/* amount of service received during the last service slot */
int service;
/* budget, used also to calculate F_i: F_i = S_i + @budget / @weight */
int budget;
/* weight of the queue */
int weight;
/* next weight if a change is in progress */
int new_weight;
/* original weight, used to implement weight boosting */
int orig_weight;
/* parent entity, for hierarchical scheduling */
struct bfq_entity *parent;
/*
* For non-leaf nodes in the hierarchy, the associated
* scheduler queue, %NULL on leaf nodes.
*/
struct bfq_sched_data *my_sched_data;
/* the scheduler queue this entity belongs to */
struct bfq_sched_data *sched_data;
/* flag, set to request a weight, ioprio or ioprio_class change */
int prio_changed;
};
struct bfq_group;
/**
* struct bfq_ttime - per process thinktime stats.
*/
struct bfq_ttime {
/* completion time of the last request */
u64 last_end_request;
/* total process thinktime */
u64 ttime_total;
/* number of thinktime samples */
unsigned long ttime_samples;
/* average process thinktime */
u64 ttime_mean;
};
/**
* struct bfq_queue - leaf schedulable entity.
*
* A bfq_queue is a leaf request queue; it can be associated with an
* io_context or more, if it is async or shared between cooperating
* processes. @cgroup holds a reference to the cgroup, to be sure that it
* does not disappear while a bfqq still references it (mostly to avoid
* races between request issuing and task migration followed by cgroup
* destruction).
* All the fields are protected by the queue lock of the containing bfqd.
*/
struct bfq_queue {
/* reference counter */
int ref;
/* parent bfq_data */
struct bfq_data *bfqd;
/* current ioprio and ioprio class */
unsigned short ioprio, ioprio_class;
/* next ioprio and ioprio class if a change is in progress */
unsigned short new_ioprio, new_ioprio_class;
/*
* Shared bfq_queue if queue is cooperating with one or more
* other queues.
*/
struct bfq_queue *new_bfqq;
/* request-position tree member (see bfq_group's @rq_pos_tree) */
struct rb_node pos_node;
/* request-position tree root (see bfq_group's @rq_pos_tree) */
struct rb_root *pos_root;
/* sorted list of pending requests */
struct rb_root sort_list;
/* if fifo isn't expired, next request to serve */
struct request *next_rq;
/* number of sync and async requests queued */
int queued[2];
/* number of requests currently allocated */
int allocated;
/* number of pending metadata requests */
int meta_pending;
/* fifo list of requests in sort_list */
struct list_head fifo;
/* entity representing this queue in the scheduler */
struct bfq_entity entity;
/* maximum budget allowed from the feedback mechanism */
int max_budget;
/* budget expiration (in jiffies) */
unsigned long budget_timeout;
/* number of requests on the dispatch list or inside driver */
int dispatched;
/* status flags */
unsigned long flags;
/* node for active/idle bfqq list inside parent bfqd */
struct list_head bfqq_list;
/* associated @bfq_ttime struct */
struct bfq_ttime ttime;
/* bit vector: a 1 for each seeky requests in history */
u32 seek_history;
/* node for the device's burst list */
struct hlist_node burst_list_node;
/* position of the last request enqueued */
sector_t last_request_pos;
/* Number of consecutive pairs of request completion and
* arrival, such that the queue becomes idle after the
* completion, but the next request arrives within an idle
* time slice; used only if the queue's IO_bound flag has been
* cleared.
*/
unsigned int requests_within_timer;
/* pid of the process owning the queue, used for logging purposes */
pid_t pid;
/*
* Pointer to the bfq_io_cq owning the bfq_queue, set to %NULL
* if the queue is shared.
*/
struct bfq_io_cq *bic;
/* current maximum weight-raising time for this queue */
unsigned long wr_cur_max_time;
/*
* Minimum time instant such that, only if a new request is
* enqueued after this time instant in an idle @bfq_queue with
* no outstanding requests, then the task associated with the
* queue is deemed as soft real-time (see the comments on
* the function bfq_bfqq_softrt_next_start())
*/
unsigned long soft_rt_next_start;
/*
* Start time of the current weight-raising period if
* the @bfq-queue is being weight-raised, otherwise
* finish time of the last weight-raising period.
*/
unsigned long last_wr_start_finish;
/* factor by which the weight of this queue is multiplied */
unsigned int wr_coeff;
/*
* Time of the last transition of the @bfq_queue from idle to
* backlogged.
*/
unsigned long last_idle_bklogged;
/*
* Cumulative service received from the @bfq_queue since the
* last transition from idle to backlogged.
*/
unsigned long service_from_backlogged;
/*
* Value of wr start time when switching to soft rt
*/
unsigned long wr_start_at_switch_to_srt;
unsigned long split_time; /* time of last split */
};
/**
* struct bfq_io_cq - per (request_queue, io_context) structure.
*/
struct bfq_io_cq {
/* associated io_cq structure */
struct io_cq icq; /* must be the first member */
/* array of two process queues, the sync and the async */
struct bfq_queue *bfqq[2];
/* per (request_queue, blkcg) ioprio */
int ioprio;
#ifdef CONFIG_BFQ_GROUP_IOSCHED
uint64_t blkcg_serial_nr; /* the current blkcg serial */
#endif
/*
* Snapshot of the idle window before merging; taken to
* remember this value while the queue is merged, so as to be
* able to restore it in case of split.
*/
bool saved_idle_window;
/*
* Same purpose as the previous two fields for the I/O bound
* classification of a queue.
*/
bool saved_IO_bound;
/*
* Same purpose as the previous fields for the value of the
* field keeping the queue's belonging to a large burst
*/
bool saved_in_large_burst;
/*
* True if the queue belonged to a burst list before its merge
* with another cooperating queue.
*/
bool was_in_burst_list;
/*
* Similar to previous fields: save wr information.
*/
unsigned long saved_wr_coeff;
unsigned long saved_last_wr_start_finish;
unsigned long saved_wr_start_at_switch_to_srt;
unsigned int saved_wr_cur_max_time;
struct bfq_ttime saved_ttime;
};
enum bfq_device_speed {
BFQ_BFQD_FAST,
BFQ_BFQD_SLOW,
};
/**
* struct bfq_data - per-device data structure.
*
* All the fields are protected by @lock.
*/
struct bfq_data {
/* device request queue */
struct request_queue *queue;
/* dispatch queue */
struct list_head dispatch;
/* root bfq_group for the device */
struct bfq_group *root_group;
/*
* rbtree of weight counters of @bfq_queues, sorted by
* weight. Used to keep track of whether all @bfq_queues have
* the same weight. The tree contains one counter for each
* distinct weight associated to some active and not
* weight-raised @bfq_queue (see the comments to the functions
* bfq_weights_tree_[add|remove] for further details).
*/
struct rb_root queue_weights_tree;
/*
* rbtree of non-queue @bfq_entity weight counters, sorted by
* weight. Used to keep track of whether all @bfq_groups have
* the same weight. The tree contains one counter for each
* distinct weight associated to some active @bfq_group (see
* the comments to the functions bfq_weights_tree_[add|remove]
* for further details).
*/
struct rb_root group_weights_tree;
/*
* Number of bfq_queues containing requests (including the
* queue in service, even if it is idling).
*/
int busy_queues;
/* number of weight-raised busy @bfq_queues */
int wr_busy_queues;
/* number of queued requests */
int queued;
/* number of requests dispatched and waiting for completion */
int rq_in_driver;
/*
* Maximum number of requests in driver in the last
* @hw_tag_samples completed requests.
*/
int max_rq_in_driver;
/* number of samples used to calculate hw_tag */
int hw_tag_samples;
/* flag set to one if the driver is showing a queueing behavior */
int hw_tag;
/* number of budgets assigned */
int budgets_assigned;
/*
* Timer set when idling (waiting) for the next request from
* the queue in service.
*/
struct hrtimer idle_slice_timer;
/* bfq_queue in service */
struct bfq_queue *in_service_queue;
/* on-disk position of the last served request */
sector_t last_position;
/* time of last request completion (ns) */
u64 last_completion;
/* time of first rq dispatch in current observation interval (ns) */
u64 first_dispatch;
/* time of last rq dispatch in current observation interval (ns) */
u64 last_dispatch;
/* beginning of the last budget */
ktime_t last_budget_start;
/* beginning of the last idle slice */
ktime_t last_idling_start;
/* number of samples in current observation interval */
int peak_rate_samples;
/* num of samples of seq dispatches in current observation interval */
u32 sequential_samples;
/* total num of sectors transferred in current observation interval */
u64 tot_sectors_dispatched;
/* max rq size seen during current observation interval (sectors) */
u32 last_rq_max_size;
/* time elapsed from first dispatch in current observ. interval (us) */
u64 delta_from_first;
/*
* Current estimate of the device peak rate, measured in
* [BFQ_RATE_SHIFT * sectors/usec]. The left-shift by
* BFQ_RATE_SHIFT is performed to increase precision in
* fixed-point calculations.
*/
u32 peak_rate;
/* maximum budget allotted to a bfq_queue before rescheduling */
int bfq_max_budget;
/* list of all the bfq_queues active on the device */
struct list_head active_list;
/* list of all the bfq_queues idle on the device */
struct list_head idle_list;
/*
* Timeout for async/sync requests; when it fires, requests
* are served in fifo order.
*/
u64 bfq_fifo_expire[2];
/* weight of backward seeks wrt forward ones */
unsigned int bfq_back_penalty;
/* maximum allowed backward seek */
unsigned int bfq_back_max;
/* maximum idling time */
u32 bfq_slice_idle;
/* user-configured max budget value (0 for auto-tuning) */
int bfq_user_max_budget;
/*
* Timeout for bfq_queues to consume their budget; used to
* prevent seeky queues from imposing long latencies to
* sequential or quasi-sequential ones (this also implies that
* seeky queues cannot receive guarantees in the service
* domain; after a timeout they are charged for the time they
* have been in service, to preserve fairness among them, but
* without service-domain guarantees).
*/
unsigned int bfq_timeout;
/*
* Number of consecutive requests that must be issued within
* the idle time slice to set again idling to a queue which
* was marked as non-I/O-bound (see the definition of the
* IO_bound flag for further details).
*/
unsigned int bfq_requests_within_timer;
/*
* Force device idling whenever needed to provide accurate
* service guarantees, without caring about throughput
* issues. CAVEAT: this may even increase latencies, in case
* of useless idling for processes that did stop doing I/O.
*/
bool strict_guarantees;
/*
* Last time at which a queue entered the current burst of
* queues being activated shortly after each other; for more
* details about this and the following parameters related to
* a burst of activations, see the comments on the function
* bfq_handle_burst.
*/
unsigned long last_ins_in_burst;
/*
* Reference time interval used to decide whether a queue has
* been activated shortly after @last_ins_in_burst.
*/
unsigned long bfq_burst_interval;
/* number of queues in the current burst of queue activations */
int burst_size;
/* common parent entity for the queues in the burst */
struct bfq_entity *burst_parent_entity;
/* Maximum burst size above which the current queue-activation
* burst is deemed as 'large'.
*/
unsigned long bfq_large_burst_thresh;
/* true if a large queue-activation burst is in progress */
bool large_burst;
/*
* Head of the burst list (as for the above fields, more
* details in the comments on the function bfq_handle_burst).
*/
struct hlist_head burst_list;
/* if set to true, low-latency heuristics are enabled */
bool low_latency;
/*
* Maximum factor by which the weight of a weight-raised queue
* is multiplied.
*/
unsigned int bfq_wr_coeff;
/* maximum duration of a weight-raising period (jiffies) */
unsigned int bfq_wr_max_time;
/* Maximum weight-raising duration for soft real-time processes */
unsigned int bfq_wr_rt_max_time;
/*
* Minimum idle period after which weight-raising may be
* reactivated for a queue (in jiffies).
*/
unsigned int bfq_wr_min_idle_time;
/*
* Minimum period between request arrivals after which
* weight-raising may be reactivated for an already busy async
* queue (in jiffies).
*/
unsigned long bfq_wr_min_inter_arr_async;
/* Max service-rate for a soft real-time queue, in sectors/sec */
unsigned int bfq_wr_max_softrt_rate;
/*
* Cached value of the product R*T, used for computing the
* maximum duration of weight raising automatically.
*/
u64 RT_prod;
/* device-speed class for the low-latency heuristic */
enum bfq_device_speed device_speed;
/* fallback dummy bfqq for extreme OOM conditions */
struct bfq_queue oom_bfqq;
spinlock_t lock;
/*
* bic associated with the task issuing current bio for
* merging. This and the next field are used as a support to
* be able to perform the bic lookup, needed by bio-merge
* functions, before the scheduler lock is taken, and thus
* avoid taking the request-queue lock while the scheduler
* lock is being held.
*/
struct bfq_io_cq *bio_bic;
/* bfqq associated with the task issuing current bio for merging */
struct bfq_queue *bio_bfqq;
};
enum bfqq_state_flags {
BFQQF_just_created = 0, /* queue just allocated */
BFQQF_busy, /* has requests or is in service */
BFQQF_wait_request, /* waiting for a request */
BFQQF_non_blocking_wait_rq, /*
* waiting for a request
* without idling the device
*/
BFQQF_fifo_expire, /* FIFO checked in this slice */
BFQQF_idle_window, /* slice idling enabled */
BFQQF_sync, /* synchronous queue */
BFQQF_IO_bound, /*
* bfqq has timed-out at least once
* having consumed at most 2/10 of
* its budget
*/
BFQQF_in_large_burst, /*
* bfqq activated in a large burst,
* see comments to bfq_handle_burst.
*/
BFQQF_softrt_update, /*
* may need softrt-next-start
* update
*/
BFQQF_coop, /* bfqq is shared */
BFQQF_split_coop /* shared bfqq will be split */
};
#define BFQ_BFQQ_FNS(name) \
void bfq_mark_bfqq_##name(struct bfq_queue *bfqq); \
void bfq_clear_bfqq_##name(struct bfq_queue *bfqq); \
int bfq_bfqq_##name(const struct bfq_queue *bfqq);
BFQ_BFQQ_FNS(just_created);
BFQ_BFQQ_FNS(busy);
BFQ_BFQQ_FNS(wait_request);
BFQ_BFQQ_FNS(non_blocking_wait_rq);
BFQ_BFQQ_FNS(fifo_expire);
BFQ_BFQQ_FNS(idle_window);
BFQ_BFQQ_FNS(sync);
BFQ_BFQQ_FNS(IO_bound);
BFQ_BFQQ_FNS(in_large_burst);
BFQ_BFQQ_FNS(coop);
BFQ_BFQQ_FNS(split_coop);
BFQ_BFQQ_FNS(softrt_update);
#undef BFQ_BFQQ_FNS
/* Expiration reasons. */
enum bfqq_expiration {
BFQQE_TOO_IDLE = 0, /*
* queue has been idling for
* too long
*/
BFQQE_BUDGET_TIMEOUT, /* budget took too long to be used */
BFQQE_BUDGET_EXHAUSTED, /* budget consumed */
BFQQE_NO_MORE_REQUESTS, /* the queue has no more requests */
BFQQE_PREEMPTED /* preemption in progress */
};
struct bfqg_stats {
#ifdef CONFIG_BFQ_GROUP_IOSCHED
/* number of ios merged */
struct blkg_rwstat merged;
/* total time spent on device in ns, may not be accurate w/ queueing */
struct blkg_rwstat service_time;
/* total time spent waiting in scheduler queue in ns */
struct blkg_rwstat wait_time;
/* number of IOs queued up */
struct blkg_rwstat queued;
/* total disk time and nr sectors dispatched by this group */
struct blkg_stat time;
/* sum of number of ios queued across all samples */
struct blkg_stat avg_queue_size_sum;
/* count of samples taken for average */
struct blkg_stat avg_queue_size_samples;
/* how many times this group has been removed from service tree */
struct blkg_stat dequeue;
/* total time spent waiting for it to be assigned a timeslice. */
struct blkg_stat group_wait_time;
/* time spent idling for this blkcg_gq */
struct blkg_stat idle_time;
/* total time with empty current active q with other requests queued */
struct blkg_stat empty_time;
/* fields after this shouldn't be cleared on stat reset */
uint64_t start_group_wait_time;
uint64_t start_idle_time;
uint64_t start_empty_time;
uint16_t flags;
#endif /* CONFIG_BFQ_GROUP_IOSCHED */
};
#ifdef CONFIG_BFQ_GROUP_IOSCHED
/*
* struct bfq_group_data - per-blkcg storage for the blkio subsystem.
*
* @ps: @blkcg_policy_storage that this structure inherits
* @weight: weight of the bfq_group
*/
struct bfq_group_data {
/* must be the first member */
struct blkcg_policy_data pd;
unsigned int weight;
};
/**
* struct bfq_group - per (device, cgroup) data structure.
* @entity: schedulable entity to insert into the parent group sched_data.
* @sched_data: own sched_data, to contain child entities (they may be
* both bfq_queues and bfq_groups).
* @bfqd: the bfq_data for the device this group acts upon.
* @async_bfqq: array of async queues for all the tasks belonging to
* the group, one queue per ioprio value per ioprio_class,
* except for the idle class that has only one queue.
* @async_idle_bfqq: async queue for the idle class (ioprio is ignored).
* @my_entity: pointer to @entity, %NULL for the toplevel group; used
* to avoid too many special cases during group creation/
* migration.
* @stats: stats for this bfqg.
* @active_entities: number of active entities belonging to the group;
* unused for the root group. Used to know whether there
* are groups with more than one active @bfq_entity
* (see the comments to the function
* bfq_bfqq_may_idle()).
* @rq_pos_tree: rbtree sorted by next_request position, used when
* determining if two or more queues have interleaving
* requests (see bfq_find_close_cooperator()).
*
* Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
* there is a set of bfq_groups, each one collecting the lower-level
* entities belonging to the group that are acting on the same device.
*
* Locking works as follows:
* o @bfqd is protected by the queue lock, RCU is used to access it
* from the readers.
* o All the other fields are protected by the @bfqd queue lock.
*/
struct bfq_group {
/* must be the first member */
struct blkg_policy_data pd;
struct bfq_entity entity;
struct bfq_sched_data sched_data;
void *bfqd;
struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
struct bfq_queue *async_idle_bfqq;
struct bfq_entity *my_entity;
int active_entities;
struct rb_root rq_pos_tree;
struct bfqg_stats stats;
};
#else
struct bfq_group {
struct bfq_sched_data sched_data;
struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
struct bfq_queue *async_idle_bfqq;
struct rb_root rq_pos_tree;
};
#endif
struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity);
/* --------------- main algorithm interface ----------------- */
#define BFQ_SERVICE_TREE_INIT ((struct bfq_service_tree) \
{ RB_ROOT, RB_ROOT, NULL, NULL, 0, 0 })
extern const int bfq_timeout;
struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
void bfq_requeue_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
struct rb_root *root);
void bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_entity *entity,
struct rb_root *root);
void bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool compensate, enum bfqq_expiration reason);
void bfq_put_queue(struct bfq_queue *bfqq);
void bfq_end_wr_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
void bfq_schedule_dispatch(struct bfq_data *bfqd);
void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
/* ------------ end of main algorithm interface -------------- */
/* ---------------- cgroups-support interface ---------------- */
void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
unsigned int op);
void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op);
void bfqg_stats_update_io_merged(struct bfq_group *bfqg, unsigned int op);
void bfqg_stats_update_completion(struct bfq_group *bfqg, uint64_t start_time,
uint64_t io_start_time, unsigned int op);
void bfqg_stats_update_dequeue(struct bfq_group *bfqg);
void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg);
void bfqg_stats_update_idle_time(struct bfq_group *bfqg);
void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg);
void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg);
void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
struct bfq_group *bfqg);
void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg);
void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio);
void bfq_end_wr_async(struct bfq_data *bfqd);
struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
struct blkcg *blkcg);
struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg);
struct bfq_group *bfqq_group(struct bfq_queue *bfqq);
struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node);
void bfqg_put(struct bfq_group *bfqg);
#ifdef CONFIG_BFQ_GROUP_IOSCHED
extern struct cftype bfq_blkcg_legacy_files[];
extern struct cftype bfq_blkg_files[];
extern struct blkcg_policy blkcg_policy_bfq;
#endif
/* ------------- end of cgroups-support interface ------------- */
/* - interface of the internal hierarchical B-WF2Q+ scheduler - */
#ifdef CONFIG_BFQ_GROUP_IOSCHED
/* both next loops stop at one of the child entities of the root group */
#define for_each_entity(entity) \
for (; entity ; entity = entity->parent)
/*
* For each iteration, compute parent in advance, so as to be safe if
* entity is deallocated during the iteration. Such a deallocation may
* happen as a consequence of a bfq_put_queue that frees the bfq_queue
* containing entity.
*/
#define for_each_entity_safe(entity, parent) \
for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
#else /* CONFIG_BFQ_GROUP_IOSCHED */
/*
* Next two macros are fake loops when cgroups support is not
* enabled. In fact, in such a case, there is only one level to go up
* (to reach the root group).
*/
#define for_each_entity(entity) \
for (; entity ; entity = NULL)
#define for_each_entity_safe(entity, parent) \
for (parent = NULL; entity ; entity = parent)
#endif /* CONFIG_BFQ_GROUP_IOSCHED */
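As a hedged illustration of the walkers above (bfq_entity_depth() is a hypothetical helper, not part of this patch): starting from a queue's embedded entity, for_each_entity() follows the parent pointers upward, which is the whole group chain with CONFIG_BFQ_GROUP_IOSCHED and a single level without it.

/* Hypothetical helper: count the scheduling levels above a bfq_queue. */
static int bfq_entity_depth(struct bfq_queue *bfqq)
{
        struct bfq_entity *entity = &bfqq->entity;
        int depth = 0;

        for_each_entity(entity)
                depth++;

        return depth;
}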
struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq);
struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity);
struct bfq_service_tree *bfq_entity_service_tree(struct bfq_entity *entity);
struct bfq_entity *bfq_entity_of(struct rb_node *node);
unsigned short bfq_ioprio_to_weight(int ioprio);
void bfq_put_idle_entity(struct bfq_service_tree *st,
struct bfq_entity *entity);
struct bfq_service_tree *
__bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
struct bfq_entity *entity);
void bfq_bfqq_served(struct bfq_queue *bfqq, int served);
void bfq_bfqq_charge_time(struct bfq_data *bfqd, struct bfq_queue *bfqq,
unsigned long time_ms);
bool __bfq_deactivate_entity(struct bfq_entity *entity,
bool ins_into_idle_tree);
bool next_queue_may_preempt(struct bfq_data *bfqd);
struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd);
void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd);
void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool ins_into_idle_tree, bool expiration);
void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_requeue_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool expiration);
void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq);
/* --------------- end of interface of B-WF2Q+ ---------------- */
/* Logging facilities. */
#ifdef CONFIG_BFQ_GROUP_IOSCHED
struct bfq_group *bfqq_group(struct bfq_queue *bfqq);
#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) do { \
char __pbuf[128]; \
\
blkg_path(bfqg_to_blkg(bfqq_group(bfqq)), __pbuf, sizeof(__pbuf)); \
blk_add_trace_msg((bfqd)->queue, "bfq%d%c %s " fmt, (bfqq)->pid, \
bfq_bfqq_sync((bfqq)) ? 'S' : 'A', \
__pbuf, ##args); \
} while (0)
#define bfq_log_bfqg(bfqd, bfqg, fmt, args...) do { \
char __pbuf[128]; \
\
blkg_path(bfqg_to_blkg(bfqg), __pbuf, sizeof(__pbuf)); \
blk_add_trace_msg((bfqd)->queue, "%s " fmt, __pbuf, ##args); \
} while (0)
#else /* CONFIG_BFQ_GROUP_IOSCHED */
#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) \
blk_add_trace_msg((bfqd)->queue, "bfq%d%c " fmt, (bfqq)->pid, \
bfq_bfqq_sync((bfqq)) ? 'S' : 'A', \
##args)
#define bfq_log_bfqg(bfqd, bfqg, fmt, args...) do {} while (0)
#endif /* CONFIG_BFQ_GROUP_IOSCHED */
#define bfq_log(bfqd, fmt, args...) \
blk_add_trace_msg((bfqd)->queue, "bfq " fmt, ##args)
#endif /* _BFQ_H */
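For completeness, a sketch of how the logging macros above are meant to be invoked (the message text is made up and bfqd/bfqq are assumed to be in scope); both forms end up in blk_add_trace_msg(), so the output is visible through blktrace:

bfq_log(bfqd, "schedule dispatch");
bfq_log_bfqq(bfqd, bfqq, "expired, reason %d", BFQQE_BUDGET_TIMEOUT);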

block/bfq-wf2q.c (1616 lines): diff suppressed because it is too large.

block/bio.c

@ -30,6 +30,7 @@
#include <linux/cgroup.h>
#include <trace/events/block.h>
#include "blk.h"
/*
* Test patch to inline a certain number of bi_io_vec's inside the bio
@ -427,7 +428,8 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
* RETURNS:
* Pointer to new bio on success, NULL on failure.
*/
struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
struct bio_set *bs)
{
gfp_t saved_gfp = gfp_mask;
unsigned front_pad;
@ -1824,6 +1826,11 @@ static inline bool bio_remaining_done(struct bio *bio)
* bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
* way to end I/O on a bio. No one should call bi_end_io() directly on a
* bio unless they own it and thus know that it has an end_io function.
*
* bio_endio() can be called several times on a bio that has been chained
* using bio_chain(). The ->bi_end_io() function will only be called the
* last time. At this point the BLK_TA_COMPLETE tracing event will be
* generated if BIO_TRACE_COMPLETION is set.
**/
void bio_endio(struct bio *bio)
{
@ -1844,6 +1851,13 @@ void bio_endio(struct bio *bio)
goto again;
}
if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
trace_block_bio_complete(bdev_get_queue(bio->bi_bdev),
bio, bio->bi_error);
bio_clear_flag(bio, BIO_TRACE_COMPLETION);
}
blk_throtl_bio_endio(bio);
if (bio->bi_end_io)
bio->bi_end_io(bio);
}
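The chaining rule documented above is easiest to see from the submitter's side. A hedged sketch, not taken from this patch (sector, operation and payload setup are omitted): bio_chain() defers the parent's ->bi_end_io until the chained child has completed as well.

static void my_submit_chained(struct block_device *bdev, bio_end_io_t *done)
{
        struct bio *parent = bio_alloc(GFP_NOIO, 1);
        struct bio *child = bio_alloc(GFP_NOIO, 1);

        parent->bi_bdev = bdev;
        child->bi_bdev = bdev;
        parent->bi_end_io = done;       /* runs once, after both bios end */

        bio_chain(child, parent);
        submit_bio(child);
        submit_bio(parent);
}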
@ -1882,6 +1896,9 @@ struct bio *bio_split(struct bio *bio, int sectors,
bio_advance(bio, split->bi_iter.bi_size);
if (bio_flagged(bio, BIO_TRACE_COMPLETION))
bio_set_flag(bio, BIO_TRACE_COMPLETION);
return split;
}
EXPORT_SYMBOL(bio_split);

block/blk-cgroup.c

@ -772,6 +772,27 @@ struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
}
EXPORT_SYMBOL_GPL(blkg_rwstat_recursive_sum);
/* Performs queue bypass and policy enabled checks then looks up blkg. */
static struct blkcg_gq *blkg_lookup_check(struct blkcg *blkcg,
const struct blkcg_policy *pol,
struct request_queue *q)
{
WARN_ON_ONCE(!rcu_read_lock_held());
lockdep_assert_held(q->queue_lock);
if (!blkcg_policy_enabled(q, pol))
return ERR_PTR(-EOPNOTSUPP);
/*
* This could be the first entry point of blkcg implementation and
* we shouldn't allow anything to go through for a bypassing queue.
*/
if (unlikely(blk_queue_bypass(q)))
return ERR_PTR(blk_queue_dying(q) ? -ENODEV : -EBUSY);
return __blkg_lookup(blkcg, q, true /* update_hint */);
}
/**
* blkg_conf_prep - parse and prepare for per-blkg config update
* @blkcg: target block cgroup
@ -789,6 +810,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
__acquires(rcu) __acquires(disk->queue->queue_lock)
{
struct gendisk *disk;
struct request_queue *q;
struct blkcg_gq *blkg;
struct module *owner;
unsigned int major, minor;
@ -807,44 +829,95 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
if (!disk)
return -ENODEV;
if (part) {
owner = disk->fops->owner;
put_disk(disk);
module_put(owner);
return -ENODEV;
ret = -ENODEV;
goto fail;
}
q = disk->queue;
rcu_read_lock();
spin_lock_irq(disk->queue->queue_lock);
if (blkcg_policy_enabled(disk->queue, pol))
blkg = blkg_lookup_create(blkcg, disk->queue);
else
blkg = ERR_PTR(-EOPNOTSUPP);
spin_lock_irq(q->queue_lock);
blkg = blkg_lookup_check(blkcg, pol, q);
if (IS_ERR(blkg)) {
ret = PTR_ERR(blkg);
rcu_read_unlock();
spin_unlock_irq(disk->queue->queue_lock);
owner = disk->fops->owner;
put_disk(disk);
module_put(owner);
/*
* If queue was bypassing, we should retry. Do so after a
* short msleep(). It isn't strictly necessary but queue
* can be bypassing for some time and it's always nice to
* avoid busy looping.
*/
if (ret == -EBUSY) {
msleep(10);
ret = restart_syscall();
}
return ret;
goto fail_unlock;
}
if (blkg)
goto success;
/*
* Create blkgs walking down from blkcg_root to @blkcg, so that all
* non-root blkgs have access to their parents.
*/
while (true) {
struct blkcg *pos = blkcg;
struct blkcg *parent;
struct blkcg_gq *new_blkg;
parent = blkcg_parent(blkcg);
while (parent && !__blkg_lookup(parent, q, false)) {
pos = parent;
parent = blkcg_parent(parent);
}
/* Drop locks to do new blkg allocation with GFP_KERNEL. */
spin_unlock_irq(q->queue_lock);
rcu_read_unlock();
new_blkg = blkg_alloc(pos, q, GFP_KERNEL);
if (unlikely(!new_blkg)) {
ret = -ENOMEM;
goto fail;
}
rcu_read_lock();
spin_lock_irq(q->queue_lock);
blkg = blkg_lookup_check(pos, pol, q);
if (IS_ERR(blkg)) {
ret = PTR_ERR(blkg);
goto fail_unlock;
}
if (blkg) {
blkg_free(new_blkg);
} else {
blkg = blkg_create(pos, q, new_blkg);
if (unlikely(IS_ERR(blkg))) {
ret = PTR_ERR(blkg);
goto fail_unlock;
}
}
if (pos == blkcg)
goto success;
}
success:
ctx->disk = disk;
ctx->blkg = blkg;
ctx->body = body;
return 0;
fail_unlock:
spin_unlock_irq(q->queue_lock);
rcu_read_unlock();
fail:
owner = disk->fops->owner;
put_disk(disk);
module_put(owner);
/*
* If queue was bypassing, we should retry. Do so after a
* short msleep(). It isn't strictly necessary but queue
* can be bypassing for some time and it's always nice to
* avoid busy looping.
*/
if (ret == -EBUSY) {
msleep(10);
ret = restart_syscall();
}
return ret;
}
EXPORT_SYMBOL_GPL(blkg_conf_prep);

block/blk-core.c

@ -268,10 +268,8 @@ void blk_sync_queue(struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
int i;
queue_for_each_hw_ctx(q, hctx, i) {
cancel_work_sync(&hctx->run_work);
cancel_delayed_work_sync(&hctx->delay_work);
}
queue_for_each_hw_ctx(q, hctx, i)
cancel_delayed_work_sync(&hctx->run_work);
} else {
cancel_delayed_work_sync(&q->delay_work);
}
@ -500,6 +498,13 @@ void blk_set_queue_dying(struct request_queue *q)
queue_flag_set(QUEUE_FLAG_DYING, q);
spin_unlock_irq(q->queue_lock);
/*
* When queue DYING flag is set, we need to block new req
* entering queue, so we call blk_freeze_queue_start() to
* prevent I/O from crossing blk_queue_enter().
*/
blk_freeze_queue_start(q);
if (q->mq_ops)
blk_mq_wake_waiters(q);
else {
@ -556,9 +561,13 @@ void blk_cleanup_queue(struct request_queue *q)
* prevent that q->request_fn() gets invoked after draining finished.
*/
blk_freeze_queue(q);
spin_lock_irq(lock);
if (!q->mq_ops)
if (!q->mq_ops) {
spin_lock_irq(lock);
__blk_drain_queue(q, true);
} else {
blk_mq_debugfs_unregister_mq(q);
spin_lock_irq(lock);
}
queue_flag_set(QUEUE_FLAG_DEAD, q);
spin_unlock_irq(lock);
@ -669,6 +678,15 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
if (nowait)
return -EBUSY;
/*
* This read pairs with the barrier in blk_freeze_queue_start():
* we need to order reading the __PERCPU_REF_DEAD flag of
* .q_usage_counter and reading .mq_freeze_depth or
* queue dying flag, otherwise the following wait may
* never return if the two reads are reordered.
*/
smp_rmb();
ret = wait_event_interruptible(q->mq_freeze_wq,
!atomic_read(&q->mq_freeze_depth) ||
blk_queue_dying(q));
@ -720,6 +738,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
if (!q->backing_dev_info)
goto fail_split;
q->stats = blk_alloc_queue_stats();
if (!q->stats)
goto fail_stats;
q->backing_dev_info->ra_pages =
(VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
@ -776,6 +798,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
fail_ref:
percpu_ref_exit(&q->q_usage_counter);
fail_bdi:
blk_free_queue_stats(q->stats);
fail_stats:
bdi_put(q->backing_dev_info);
fail_split:
bioset_free(q->bio_split);
@ -889,7 +913,6 @@ int blk_init_allocated_queue(struct request_queue *q)
q->exit_rq_fn(q, q->fq->flush_rq);
out_free_flush_queue:
blk_free_flush_queue(q->fq);
wbt_exit(q);
return -ENOMEM;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
@ -1128,7 +1151,6 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
blk_rq_init(q, rq);
blk_rq_set_rl(rq, rl);
blk_rq_set_prio(rq, ioc);
rq->cmd_flags = op;
rq->rq_flags = rq_flags;
@ -1608,17 +1630,23 @@ unsigned int blk_plug_queued_count(struct request_queue *q)
return ret;
}
void init_request_from_bio(struct request *req, struct bio *bio)
void blk_init_request_from_bio(struct request *req, struct bio *bio)
{
struct io_context *ioc = rq_ioc(bio);
if (bio->bi_opf & REQ_RAHEAD)
req->cmd_flags |= REQ_FAILFAST_MASK;
req->errors = 0;
req->__sector = bio->bi_iter.bi_sector;
if (ioprio_valid(bio_prio(bio)))
req->ioprio = bio_prio(bio);
else if (ioc)
req->ioprio = ioc->ioprio;
else
req->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
blk_rq_bio_prep(req->q, req, bio);
}
EXPORT_SYMBOL_GPL(blk_init_request_from_bio);
static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
{
@ -1709,7 +1737,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
* We don't worry about that case for efficiency. It won't happen
* often, and the elevators are able to handle it.
*/
init_request_from_bio(req, bio);
blk_init_request_from_bio(req, bio);
if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags))
req->cpu = raw_smp_processor_id();
@ -1936,7 +1964,13 @@ generic_make_request_checks(struct bio *bio)
if (!blkcg_bio_issue_check(q, bio))
return false;
trace_block_bio_queue(q, bio);
if (!bio_flagged(bio, BIO_TRACE_COMPLETION)) {
trace_block_bio_queue(q, bio);
/* Now that enqueuing has been traced, we need to trace
* completion as well.
*/
bio_set_flag(bio, BIO_TRACE_COMPLETION);
}
return true;
not_supported:
@ -2478,7 +2512,7 @@ void blk_start_request(struct request *req)
blk_dequeue_request(req);
if (test_bit(QUEUE_FLAG_STATS, &req->q->queue_flags)) {
blk_stat_set_issue_time(&req->issue_stat);
blk_stat_set_issue(&req->issue_stat, blk_rq_sectors(req));
req->rq_flags |= RQF_STATS;
wbt_issue(req->q->rq_wb, &req->issue_stat);
}
@ -2540,22 +2574,11 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
{
int total_bytes;
trace_block_rq_complete(req->q, req, nr_bytes);
trace_block_rq_complete(req, error, nr_bytes);
if (!req->bio)
return false;
/*
* For fs requests, rq is just carrier of independent bio's
* and each partial completion should be handled separately.
* Reset per-request error on each partial completion.
*
* TODO: tj: This is too subtle. It would be better to let
* low level drivers do what they see fit.
*/
if (!blk_rq_is_passthrough(req))
req->errors = 0;
if (error && !blk_rq_is_passthrough(req) &&
!(req->rq_flags & RQF_QUIET)) {
char *error_type;
@ -2601,6 +2624,8 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
if (bio_bytes == bio->bi_iter.bi_size)
req->bio = bio->bi_next;
/* Completion has already been traced */
bio_clear_flag(bio, BIO_TRACE_COMPLETION);
req_bio_endio(req, bio, bio_bytes, error);
total_bytes += bio_bytes;
@ -2699,7 +2724,7 @@ void blk_finish_request(struct request *req, int error)
struct request_queue *q = req->q;
if (req->rq_flags & RQF_STATS)
blk_stat_add(&q->rq_stats[rq_data_dir(req)], req);
blk_stat_add(req);
if (req->rq_flags & RQF_QUEUED)
blk_queue_end_tag(q, req);
@ -2776,7 +2801,7 @@ static bool blk_end_bidi_request(struct request *rq, int error,
* %false - we are done with this request
* %true - still buffers pending for this request
**/
bool __blk_end_bidi_request(struct request *rq, int error,
static bool __blk_end_bidi_request(struct request *rq, int error,
unsigned int nr_bytes, unsigned int bidi_bytes)
{
if (blk_update_bidi_request(rq, error, nr_bytes, bidi_bytes))
@ -2828,43 +2853,6 @@ void blk_end_request_all(struct request *rq, int error)
}
EXPORT_SYMBOL(blk_end_request_all);
/**
* blk_end_request_cur - Helper function to finish the current request chunk.
* @rq: the request to finish the current chunk for
* @error: %0 for success, < %0 for error
*
* Description:
* Complete the current consecutively mapped chunk from @rq.
*
* Return:
* %false - we are done with this request
* %true - still buffers pending for this request
*/
bool blk_end_request_cur(struct request *rq, int error)
{
return blk_end_request(rq, error, blk_rq_cur_bytes(rq));
}
EXPORT_SYMBOL(blk_end_request_cur);
/**
* blk_end_request_err - Finish a request till the next failure boundary.
* @rq: the request to finish till the next failure boundary for
* @error: must be negative errno
*
* Description:
* Complete @rq till the next failure boundary.
*
* Return:
* %false - we are done with this request
* %true - still buffers pending for this request
*/
bool blk_end_request_err(struct request *rq, int error)
{
WARN_ON(error >= 0);
return blk_end_request(rq, error, blk_rq_err_bytes(rq));
}
EXPORT_SYMBOL_GPL(blk_end_request_err);
/**
* __blk_end_request - Helper function for drivers to complete the request.
* @rq: the request being processed
@ -2924,26 +2912,6 @@ bool __blk_end_request_cur(struct request *rq, int error)
}
EXPORT_SYMBOL(__blk_end_request_cur);
/**
* __blk_end_request_err - Finish a request till the next failure boundary.
* @rq: the request to finish till the next failure boundary for
* @error: must be negative errno
*
* Description:
* Complete @rq till the next failure boundary. Must be called
* with queue lock held.
*
* Return:
* %false - we are done with this request
* %true - still buffers pending for this request
*/
bool __blk_end_request_err(struct request *rq, int error)
{
WARN_ON(error >= 0);
return __blk_end_request(rq, error, blk_rq_err_bytes(rq));
}
EXPORT_SYMBOL_GPL(__blk_end_request_err);
void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio)
{
@ -3106,6 +3074,13 @@ int kblockd_schedule_work_on(int cpu, struct work_struct *work)
}
EXPORT_SYMBOL(kblockd_schedule_work_on);
int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
unsigned long delay)
{
return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
}
EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
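A hedged sketch of the intended use (the helper name is made up): because this wraps mod_delayed_work_on(), a pending delayed run of the same work item is re-armed with the new delay instead of being queued a second time.

static void my_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx,
                                  unsigned long msecs)
{
        /* A pending run of the same work is re-armed, not double-queued. */
        kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &hctx->run_work,
                                    msecs_to_jiffies(msecs));
}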
int kblockd_schedule_delayed_work(struct delayed_work *dwork,
unsigned long delay)
{

block/blk-exec.c

@ -69,8 +69,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
if (unlikely(blk_queue_dying(q))) {
rq->rq_flags |= RQF_QUIET;
rq->errors = -ENXIO;
__blk_end_request_all(rq, rq->errors);
__blk_end_request_all(rq, -ENXIO);
spin_unlock_irq(q->queue_lock);
return;
}
@ -92,11 +91,10 @@ EXPORT_SYMBOL_GPL(blk_execute_rq_nowait);
* Insert a fully prepared request at the back of the I/O scheduler queue
* for execution and wait for completion.
*/
int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
void blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
struct request *rq, int at_head)
{
DECLARE_COMPLETION_ONSTACK(wait);
int err = 0;
unsigned long hang_check;
rq->end_io_data = &wait;
@ -108,10 +106,5 @@ int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2)));
else
wait_for_completion_io(&wait);
if (rq->errors)
err = -EIO;
return err;
}
EXPORT_SYMBOL(blk_execute_rq);

block/blk-flush.c

@ -447,7 +447,7 @@ void blk_insert_flush(struct request *rq)
if (q->mq_ops)
blk_mq_end_request(rq, 0);
else
__blk_end_bidi_request(rq, 0, 0, 0);
__blk_end_request(rq, 0, 0);
return;
}
@ -497,8 +497,7 @@ void blk_insert_flush(struct request *rq)
* Description:
* Issue a flush for the block device in question. Caller can supply
* room for storing the error offset in case of a flush error, if they
* wish to. If WAIT flag is not passed then caller may check only what
* request was pushed in some internal queue for later handling.
* wish to.
*/
int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
sector_t *error_sector)

block/blk-integrity.c

@ -389,7 +389,7 @@ static int blk_integrity_nop_fn(struct blk_integrity_iter *iter)
return 0;
}
static struct blk_integrity_profile nop_profile = {
static const struct blk_integrity_profile nop_profile = {
.name = "nop",
.generate_fn = blk_integrity_nop_fn,
.verify_fn = blk_integrity_nop_fn,
@ -412,12 +412,13 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
bi->flags = BLK_INTEGRITY_VERIFY | BLK_INTEGRITY_GENERATE |
template->flags;
bi->interval_exp = ilog2(queue_logical_block_size(disk->queue));
bi->interval_exp = template->interval_exp ? :
ilog2(queue_logical_block_size(disk->queue));
bi->profile = template->profile ? template->profile : &nop_profile;
bi->tuple_size = template->tuple_size;
bi->tag_size = template->tag_size;
blk_integrity_revalidate(disk);
disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
}
EXPORT_SYMBOL(blk_integrity_register);
@ -430,26 +431,11 @@ EXPORT_SYMBOL(blk_integrity_register);
*/
void blk_integrity_unregister(struct gendisk *disk)
{
blk_integrity_revalidate(disk);
disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES;
memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
}
EXPORT_SYMBOL(blk_integrity_unregister);
void blk_integrity_revalidate(struct gendisk *disk)
{
struct blk_integrity *bi = &disk->queue->integrity;
if (!(disk->flags & GENHD_FL_UP))
return;
if (bi->profile)
disk->queue->backing_dev_info->capabilities |=
BDI_CAP_STABLE_WRITES;
else
disk->queue->backing_dev_info->capabilities &=
~BDI_CAP_STABLE_WRITES;
}
void blk_integrity_add(struct gendisk *disk)
{
if (kobject_init_and_add(&disk->integrity_kobj, &integrity_ktype,

block/blk-lib.c

@ -37,17 +37,12 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
return -ENXIO;
if (flags & BLKDEV_DISCARD_SECURE) {
if (flags & BLKDEV_DISCARD_ZERO)
return -EOPNOTSUPP;
if (!blk_queue_secure_erase(q))
return -EOPNOTSUPP;
op = REQ_OP_SECURE_ERASE;
} else {
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
if ((flags & BLKDEV_DISCARD_ZERO) &&
!q->limits.discard_zeroes_data)
return -EOPNOTSUPP;
op = REQ_OP_DISCARD;
}
@ -109,7 +104,7 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
* @sector: start sector
* @nr_sects: number of sectors to discard
* @gfp_mask: memory allocation flags (for bio_alloc)
* @flags: BLKDEV_IFL_* flags to control behaviour
* @flags: BLKDEV_DISCARD_* flags to control behaviour
*
* Description:
* Issue a discard request for the sectors in question.
@ -126,7 +121,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
&bio);
if (!ret && bio) {
ret = submit_bio_wait(bio);
if (ret == -EOPNOTSUPP && !(flags & BLKDEV_DISCARD_ZERO))
if (ret == -EOPNOTSUPP)
ret = 0;
bio_put(bio);
}
@ -226,20 +221,9 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
}
EXPORT_SYMBOL(blkdev_issue_write_same);
/**
* __blkdev_issue_write_zeroes - generate number of bios with WRITE ZEROES
* @bdev: blockdev to issue
* @sector: start sector
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
* @biop: pointer to anchor bio
*
* Description:
* Generate and issue number of bios(REQ_OP_WRITE_ZEROES) with zerofiled pages.
*/
static int __blkdev_issue_write_zeroes(struct block_device *bdev,
sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
struct bio **biop)
struct bio **biop, unsigned flags)
{
struct bio *bio = *biop;
unsigned int max_write_zeroes_sectors;
@ -258,7 +242,9 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
bio = next_bio(bio, 0, gfp_mask);
bio->bi_iter.bi_sector = sector;
bio->bi_bdev = bdev;
bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0);
bio->bi_opf = REQ_OP_WRITE_ZEROES;
if (flags & BLKDEV_ZERO_NOUNMAP)
bio->bi_opf |= REQ_NOUNMAP;
if (nr_sects > max_write_zeroes_sectors) {
bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
@ -282,14 +268,27 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
* @biop: pointer to anchor bio
* @discard: discard flag
* @flags: controls detailed behavior
*
* Description:
* Generate and issue number of bios with zerofiled pages.
* Zero-fill a block range, either using hardware offload or by explicitly
* writing zeroes to the device.
*
* Note that this function may fail with -EOPNOTSUPP if the driver signals
* zeroing offload support, but the device fails to process the command (for
* some devices there is no non-destructive way to verify whether this
* operation is actually supported). In this case the caller should
* retry the call to blkdev_issue_zeroout() and the fallback path will be used.
*
* If a device is using logical block provisioning, the underlying space will
* not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
*
* If %flags contains BLKDEV_ZERO_NOFALLBACK, the function will return
* -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.
*/
int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
bool discard)
unsigned flags)
{
int ret;
int bi_size = 0;
@ -302,8 +301,8 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
return -EINVAL;
ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
biop);
if (ret == 0 || (ret && ret != -EOPNOTSUPP))
biop, flags);
if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK))
goto out;
ret = 0;
@ -337,40 +336,23 @@ EXPORT_SYMBOL(__blkdev_issue_zeroout);
* @sector: start sector
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
* @discard: whether to discard the block range
* @flags: controls detailed behavior
*
* Description:
* Zero-fill a block range. If the discard flag is set and the block
* device guarantees that subsequent READ operations to the block range
* in question will return zeroes, the blocks will be discarded. Should
* the discard request fail, if the discard flag is not set, or if
* discard_zeroes_data is not supported, this function will resort to
* zeroing the blocks manually, thus provisioning (allocating,
* anchoring) them. If the block device supports WRITE ZEROES or WRITE SAME
* command(s), blkdev_issue_zeroout() will use it to optimize the process of
* clearing the block range. Otherwise the zeroing will be performed
* using regular WRITE calls.
* Zero-fill a block range, either using hardware offload or by explicitly
* writing zeroes to the device. See __blkdev_issue_zeroout() for the
* valid values for %flags.
*/
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, bool discard)
sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
{
int ret;
struct bio *bio = NULL;
struct blk_plug plug;
if (discard) {
if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
BLKDEV_DISCARD_ZERO))
return 0;
}
if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
ZERO_PAGE(0)))
return 0;
blk_start_plug(&plug);
ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
&bio, discard);
&bio, flags);
if (ret == 0 && bio) {
ret = submit_bio_wait(bio);
bio_put(bio);
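A hedged sketch of the reworked caller interface (my_zero_range() is made up; the flag names are the ones introduced here): the old bool discard argument is replaced by BLKDEV_ZERO_* flags that choose between keeping blocks provisioned and refusing the zero-page fallback.

static int my_zero_range(struct block_device *bdev, sector_t sector,
                         sector_t nr_sects, bool offload_only)
{
        unsigned flags = BLKDEV_ZERO_NOUNMAP;    /* keep blocks provisioned */

        if (offload_only)
                flags |= BLKDEV_ZERO_NOFALLBACK; /* no zero-page writes */

        return blkdev_issue_zeroout(bdev, sector, nr_sects, GFP_KERNEL,
                                    flags);
}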

block/blk-merge.c

@ -54,6 +54,20 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
return bio_split(bio, split_sectors, GFP_NOIO, bs);
}
static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
struct bio *bio, struct bio_set *bs, unsigned *nsegs)
{
*nsegs = 1;
if (!q->limits.max_write_zeroes_sectors)
return NULL;
if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
return NULL;
return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
}
static struct bio *blk_bio_write_same_split(struct request_queue *q,
struct bio *bio,
struct bio_set *bs,
@ -200,8 +214,7 @@ void blk_queue_split(struct request_queue *q, struct bio **bio,
split = blk_bio_discard_split(q, *bio, bs, &nsegs);
break;
case REQ_OP_WRITE_ZEROES:
split = NULL;
nsegs = (*bio)->bi_phys_segments;
split = blk_bio_write_zeroes_split(q, *bio, bs, &nsegs);
break;
case REQ_OP_WRITE_SAME:
split = blk_bio_write_same_split(q, *bio, bs, &nsegs);

block/blk-mq-debugfs.c

@ -43,11 +43,157 @@ static int blk_mq_debugfs_seq_open(struct inode *inode, struct file *file,
return ret;
}
static int blk_flags_show(struct seq_file *m, const unsigned long flags,
const char *const *flag_name, int flag_name_count)
{
bool sep = false;
int i;
for (i = 0; i < sizeof(flags) * BITS_PER_BYTE; i++) {
if (!(flags & BIT(i)))
continue;
if (sep)
seq_puts(m, " ");
sep = true;
if (i < flag_name_count && flag_name[i])
seq_puts(m, flag_name[i]);
else
seq_printf(m, "%d", i);
}
return 0;
}
static const char *const blk_queue_flag_name[] = {
[QUEUE_FLAG_QUEUED] = "QUEUED",
[QUEUE_FLAG_STOPPED] = "STOPPED",
[QUEUE_FLAG_SYNCFULL] = "SYNCFULL",
[QUEUE_FLAG_ASYNCFULL] = "ASYNCFULL",
[QUEUE_FLAG_DYING] = "DYING",
[QUEUE_FLAG_BYPASS] = "BYPASS",
[QUEUE_FLAG_BIDI] = "BIDI",
[QUEUE_FLAG_NOMERGES] = "NOMERGES",
[QUEUE_FLAG_SAME_COMP] = "SAME_COMP",
[QUEUE_FLAG_FAIL_IO] = "FAIL_IO",
[QUEUE_FLAG_STACKABLE] = "STACKABLE",
[QUEUE_FLAG_NONROT] = "NONROT",
[QUEUE_FLAG_IO_STAT] = "IO_STAT",
[QUEUE_FLAG_DISCARD] = "DISCARD",
[QUEUE_FLAG_NOXMERGES] = "NOXMERGES",
[QUEUE_FLAG_ADD_RANDOM] = "ADD_RANDOM",
[QUEUE_FLAG_SECERASE] = "SECERASE",
[QUEUE_FLAG_SAME_FORCE] = "SAME_FORCE",
[QUEUE_FLAG_DEAD] = "DEAD",
[QUEUE_FLAG_INIT_DONE] = "INIT_DONE",
[QUEUE_FLAG_NO_SG_MERGE] = "NO_SG_MERGE",
[QUEUE_FLAG_POLL] = "POLL",
[QUEUE_FLAG_WC] = "WC",
[QUEUE_FLAG_FUA] = "FUA",
[QUEUE_FLAG_FLUSH_NQ] = "FLUSH_NQ",
[QUEUE_FLAG_DAX] = "DAX",
[QUEUE_FLAG_STATS] = "STATS",
[QUEUE_FLAG_POLL_STATS] = "POLL_STATS",
[QUEUE_FLAG_REGISTERED] = "REGISTERED",
};
static int blk_queue_flags_show(struct seq_file *m, void *v)
{
struct request_queue *q = m->private;
blk_flags_show(m, q->queue_flags, blk_queue_flag_name,
ARRAY_SIZE(blk_queue_flag_name));
seq_puts(m, "\n");
return 0;
}
static ssize_t blk_queue_flags_store(struct file *file, const char __user *ubuf,
size_t len, loff_t *offp)
{
struct request_queue *q = file_inode(file)->i_private;
char op[16] = { }, *s;
len = min(len, sizeof(op) - 1);
if (copy_from_user(op, ubuf, len))
return -EFAULT;
s = op;
strsep(&s, " \t\n"); /* strip trailing whitespace */
if (strcmp(op, "run") == 0) {
blk_mq_run_hw_queues(q, true);
} else if (strcmp(op, "start") == 0) {
blk_mq_start_stopped_hw_queues(q, true);
} else {
pr_err("%s: unsupported operation %s. Use either 'run' or 'start'\n",
__func__, op);
return -EINVAL;
}
return len;
}
static int blk_queue_flags_open(struct inode *inode, struct file *file)
{
return single_open(file, blk_queue_flags_show, inode->i_private);
}
static const struct file_operations blk_queue_flags_fops = {
.open = blk_queue_flags_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
.write = blk_queue_flags_store,
};
static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
{
if (stat->nr_samples) {
seq_printf(m, "samples=%d, mean=%lld, min=%llu, max=%llu",
stat->nr_samples, stat->mean, stat->min, stat->max);
} else {
seq_puts(m, "samples=0");
}
}
static int queue_poll_stat_show(struct seq_file *m, void *v)
{
struct request_queue *q = m->private;
int bucket;
for (bucket = 0; bucket < BLK_MQ_POLL_STATS_BKTS/2; bucket++) {
seq_printf(m, "read (%d Bytes): ", 1 << (9+bucket));
print_stat(m, &q->poll_stat[2*bucket]);
seq_puts(m, "\n");
seq_printf(m, "write (%d Bytes): ", 1 << (9+bucket));
print_stat(m, &q->poll_stat[2*bucket+1]);
seq_puts(m, "\n");
}
return 0;
}
static int queue_poll_stat_open(struct inode *inode, struct file *file)
{
return single_open(file, queue_poll_stat_show, inode->i_private);
}
static const struct file_operations queue_poll_stat_fops = {
.open = queue_poll_stat_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static const char *const hctx_state_name[] = {
[BLK_MQ_S_STOPPED] = "STOPPED",
[BLK_MQ_S_TAG_ACTIVE] = "TAG_ACTIVE",
[BLK_MQ_S_SCHED_RESTART] = "SCHED_RESTART",
[BLK_MQ_S_TAG_WAITING] = "TAG_WAITING",
};
static int hctx_state_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
seq_printf(m, "0x%lx\n", hctx->state);
blk_flags_show(m, hctx->state, hctx_state_name,
ARRAY_SIZE(hctx_state_name));
seq_puts(m, "\n");
return 0;
}
@ -63,11 +209,35 @@ static const struct file_operations hctx_state_fops = {
.release = single_release,
};
static const char *const alloc_policy_name[] = {
[BLK_TAG_ALLOC_FIFO] = "fifo",
[BLK_TAG_ALLOC_RR] = "rr",
};
static const char *const hctx_flag_name[] = {
[ilog2(BLK_MQ_F_SHOULD_MERGE)] = "SHOULD_MERGE",
[ilog2(BLK_MQ_F_TAG_SHARED)] = "TAG_SHARED",
[ilog2(BLK_MQ_F_SG_MERGE)] = "SG_MERGE",
[ilog2(BLK_MQ_F_BLOCKING)] = "BLOCKING",
[ilog2(BLK_MQ_F_NO_SCHED)] = "NO_SCHED",
};
static int hctx_flags_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
const int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(hctx->flags);
seq_printf(m, "0x%lx\n", hctx->flags);
seq_puts(m, "alloc_policy=");
if (alloc_policy < ARRAY_SIZE(alloc_policy_name) &&
alloc_policy_name[alloc_policy])
seq_puts(m, alloc_policy_name[alloc_policy]);
else
seq_printf(m, "%d", alloc_policy);
seq_puts(m, " ");
blk_flags_show(m,
hctx->flags ^ BLK_ALLOC_POLICY_TO_MQ_FLAG(alloc_policy),
hctx_flag_name, ARRAY_SIZE(hctx_flag_name));
seq_puts(m, "\n");
return 0;
}
@ -83,13 +253,83 @@ static const struct file_operations hctx_flags_fops = {
.release = single_release,
};
static const char *const op_name[] = {
[REQ_OP_READ] = "READ",
[REQ_OP_WRITE] = "WRITE",
[REQ_OP_FLUSH] = "FLUSH",
[REQ_OP_DISCARD] = "DISCARD",
[REQ_OP_ZONE_REPORT] = "ZONE_REPORT",
[REQ_OP_SECURE_ERASE] = "SECURE_ERASE",
[REQ_OP_ZONE_RESET] = "ZONE_RESET",
[REQ_OP_WRITE_SAME] = "WRITE_SAME",
[REQ_OP_WRITE_ZEROES] = "WRITE_ZEROES",
[REQ_OP_SCSI_IN] = "SCSI_IN",
[REQ_OP_SCSI_OUT] = "SCSI_OUT",
[REQ_OP_DRV_IN] = "DRV_IN",
[REQ_OP_DRV_OUT] = "DRV_OUT",
};
static const char *const cmd_flag_name[] = {
[__REQ_FAILFAST_DEV] = "FAILFAST_DEV",
[__REQ_FAILFAST_TRANSPORT] = "FAILFAST_TRANSPORT",
[__REQ_FAILFAST_DRIVER] = "FAILFAST_DRIVER",
[__REQ_SYNC] = "SYNC",
[__REQ_META] = "META",
[__REQ_PRIO] = "PRIO",
[__REQ_NOMERGE] = "NOMERGE",
[__REQ_IDLE] = "IDLE",
[__REQ_INTEGRITY] = "INTEGRITY",
[__REQ_FUA] = "FUA",
[__REQ_PREFLUSH] = "PREFLUSH",
[__REQ_RAHEAD] = "RAHEAD",
[__REQ_BACKGROUND] = "BACKGROUND",
[__REQ_NR_BITS] = "NR_BITS",
};
static const char *const rqf_name[] = {
[ilog2((__force u32)RQF_SORTED)] = "SORTED",
[ilog2((__force u32)RQF_STARTED)] = "STARTED",
[ilog2((__force u32)RQF_QUEUED)] = "QUEUED",
[ilog2((__force u32)RQF_SOFTBARRIER)] = "SOFTBARRIER",
[ilog2((__force u32)RQF_FLUSH_SEQ)] = "FLUSH_SEQ",
[ilog2((__force u32)RQF_MIXED_MERGE)] = "MIXED_MERGE",
[ilog2((__force u32)RQF_MQ_INFLIGHT)] = "MQ_INFLIGHT",
[ilog2((__force u32)RQF_DONTPREP)] = "DONTPREP",
[ilog2((__force u32)RQF_PREEMPT)] = "PREEMPT",
[ilog2((__force u32)RQF_COPY_USER)] = "COPY_USER",
[ilog2((__force u32)RQF_FAILED)] = "FAILED",
[ilog2((__force u32)RQF_QUIET)] = "QUIET",
[ilog2((__force u32)RQF_ELVPRIV)] = "ELVPRIV",
[ilog2((__force u32)RQF_IO_STAT)] = "IO_STAT",
[ilog2((__force u32)RQF_ALLOCED)] = "ALLOCED",
[ilog2((__force u32)RQF_PM)] = "PM",
[ilog2((__force u32)RQF_HASHED)] = "HASHED",
[ilog2((__force u32)RQF_STATS)] = "STATS",
[ilog2((__force u32)RQF_SPECIAL_PAYLOAD)] = "SPECIAL_PAYLOAD",
};
static int blk_mq_debugfs_rq_show(struct seq_file *m, void *v)
{
struct request *rq = list_entry_rq(v);
const struct blk_mq_ops *const mq_ops = rq->q->mq_ops;
const unsigned int op = rq->cmd_flags & REQ_OP_MASK;
seq_printf(m, "%p {.cmd_flags=0x%x, .rq_flags=0x%x, .tag=%d, .internal_tag=%d}\n",
rq, rq->cmd_flags, (__force unsigned int)rq->rq_flags,
rq->tag, rq->internal_tag);
seq_printf(m, "%p {.op=", rq);
if (op < ARRAY_SIZE(op_name) && op_name[op])
seq_printf(m, "%s", op_name[op]);
else
seq_printf(m, "%d", op);
seq_puts(m, ", .cmd_flags=");
blk_flags_show(m, rq->cmd_flags & ~REQ_OP_MASK, cmd_flag_name,
ARRAY_SIZE(cmd_flag_name));
seq_puts(m, ", .rq_flags=");
blk_flags_show(m, (__force unsigned int)rq->rq_flags, rqf_name,
ARRAY_SIZE(rqf_name));
seq_printf(m, ", .tag=%d, .internal_tag=%d", rq->tag,
rq->internal_tag);
if (mq_ops->show_rq)
mq_ops->show_rq(m, rq);
seq_puts(m, "}\n");
return 0;
}
@ -322,60 +562,6 @@ static const struct file_operations hctx_io_poll_fops = {
.release = single_release,
};
static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
{
seq_printf(m, "samples=%d, mean=%lld, min=%llu, max=%llu",
stat->nr_samples, stat->mean, stat->min, stat->max);
}
static int hctx_stats_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
struct blk_rq_stat stat[2];
blk_stat_init(&stat[BLK_STAT_READ]);
blk_stat_init(&stat[BLK_STAT_WRITE]);
blk_hctx_stat_get(hctx, stat);
seq_puts(m, "read: ");
print_stat(m, &stat[BLK_STAT_READ]);
seq_puts(m, "\n");
seq_puts(m, "write: ");
print_stat(m, &stat[BLK_STAT_WRITE]);
seq_puts(m, "\n");
return 0;
}
static int hctx_stats_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_stats_show, inode->i_private);
}
static ssize_t hctx_stats_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct seq_file *m = file->private_data;
struct blk_mq_hw_ctx *hctx = m->private;
struct blk_mq_ctx *ctx;
int i;
hctx_for_each_ctx(hctx, ctx, i) {
blk_stat_init(&ctx->stat[BLK_STAT_READ]);
blk_stat_init(&ctx->stat[BLK_STAT_WRITE]);
}
return count;
}
static const struct file_operations hctx_stats_fops = {
.open = hctx_stats_open,
.read = seq_read,
.write = hctx_stats_write,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_dispatched_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
@ -636,6 +822,12 @@ static const struct file_operations ctx_completed_fops = {
.release = single_release,
};
static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
{"poll_stat", 0400, &queue_poll_stat_fops},
{"state", 0600, &blk_queue_flags_fops},
{},
};
static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"state", 0400, &hctx_state_fops},
{"flags", 0400, &hctx_flags_fops},
@ -646,7 +838,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"sched_tags", 0400, &hctx_sched_tags_fops},
{"sched_tags_bitmap", 0400, &hctx_sched_tags_bitmap_fops},
{"io_poll", 0600, &hctx_io_poll_fops},
{"stats", 0600, &hctx_stats_fops},
{"dispatched", 0600, &hctx_dispatched_fops},
{"queued", 0600, &hctx_queued_fops},
{"run", 0600, &hctx_run_fops},
@ -662,16 +853,17 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
{},
};
int blk_mq_debugfs_register(struct request_queue *q, const char *name)
int blk_mq_debugfs_register(struct request_queue *q)
{
if (!blk_debugfs_root)
return -ENOENT;
q->debugfs_dir = debugfs_create_dir(name, blk_debugfs_root);
q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
blk_debugfs_root);
if (!q->debugfs_dir)
goto err;
if (blk_mq_debugfs_register_hctxs(q))
if (blk_mq_debugfs_register_mq(q))
goto err;
return 0;
@ -741,7 +933,7 @@ static int blk_mq_debugfs_register_hctx(struct request_queue *q,
return 0;
}
int blk_mq_debugfs_register_hctxs(struct request_queue *q)
int blk_mq_debugfs_register_mq(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
int i;
@ -753,6 +945,9 @@ int blk_mq_debugfs_register_hctxs(struct request_queue *q)
if (!q->mq_debugfs_dir)
goto err;
if (!debugfs_create_files(q->mq_debugfs_dir, q, blk_mq_debugfs_queue_attrs))
goto err;
queue_for_each_hw_ctx(q, hctx, i) {
if (blk_mq_debugfs_register_hctx(q, hctx))
goto err;
@ -761,11 +956,11 @@ int blk_mq_debugfs_register_hctxs(struct request_queue *q)
return 0;
err:
blk_mq_debugfs_unregister_hctxs(q);
blk_mq_debugfs_unregister_mq(q);
return -ENOMEM;
}
void blk_mq_debugfs_unregister_hctxs(struct request_queue *q)
void blk_mq_debugfs_unregister_mq(struct request_queue *q)
{
debugfs_remove_recursive(q->mq_debugfs_dir);
q->mq_debugfs_dir = NULL;

block/blk-mq-pci.c

@ -23,7 +23,7 @@
* @pdev: PCI device associated with @set.
*
* This function assumes the PCI device @pdev has at least as many available
* interrupt vetors as @set has queues. It will then queuery the vector
* interrupt vectors as @set has queues. It will then query the vector
* corresponding to each queue for it's affinity mask and built queue mapping
* that maps a queue to the CPUs that have irq affinity for the corresponding
* vector.

block/blk-mq-sched.c

@ -30,43 +30,6 @@ void blk_mq_sched_free_hctx_data(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(blk_mq_sched_free_hctx_data);
int blk_mq_sched_init_hctx_data(struct request_queue *q, size_t size,
int (*init)(struct blk_mq_hw_ctx *),
void (*exit)(struct blk_mq_hw_ctx *))
{
struct blk_mq_hw_ctx *hctx;
int ret;
int i;
queue_for_each_hw_ctx(q, hctx, i) {
hctx->sched_data = kmalloc_node(size, GFP_KERNEL, hctx->numa_node);
if (!hctx->sched_data) {
ret = -ENOMEM;
goto error;
}
if (init) {
ret = init(hctx);
if (ret) {
/*
* We don't want to give exit() a partially
* initialized sched_data. init() must clean up
* if it fails.
*/
kfree(hctx->sched_data);
hctx->sched_data = NULL;
goto error;
}
}
}
return 0;
error:
blk_mq_sched_free_hctx_data(q, exit);
return ret;
}
EXPORT_SYMBOL_GPL(blk_mq_sched_init_hctx_data);
static void __blk_mq_sched_assign_ioc(struct request_queue *q,
struct request *rq,
struct bio *bio,
@ -119,7 +82,11 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
if (likely(!data->hctx))
data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
if (e) {
/*
* For a reserved tag, allocate a normal request since we might
* have driver dependencies on the value of the internal tag.
*/
if (e && !(data->flags & BLK_MQ_REQ_RESERVED)) {
data->flags |= BLK_MQ_REQ_INTERNAL;
/*
@ -227,22 +194,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
}
}
void blk_mq_sched_move_to_dispatch(struct blk_mq_hw_ctx *hctx,
struct list_head *rq_list,
struct request *(*get_rq)(struct blk_mq_hw_ctx *))
{
do {
struct request *rq;
rq = get_rq(hctx);
if (!rq)
break;
list_add_tail(&rq->queuelist, rq_list);
} while (1);
}
EXPORT_SYMBOL_GPL(blk_mq_sched_move_to_dispatch);
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
struct request **merged_request)
{
@ -508,11 +459,24 @@ int blk_mq_sched_init_hctx(struct request_queue *q, struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
{
struct elevator_queue *e = q->elevator;
int ret;
if (!e)
return 0;
return blk_mq_sched_alloc_tags(q, hctx, hctx_idx);
ret = blk_mq_sched_alloc_tags(q, hctx, hctx_idx);
if (ret)
return ret;
if (e->type->ops.mq.init_hctx) {
ret = e->type->ops.mq.init_hctx(hctx, hctx_idx);
if (ret) {
blk_mq_sched_free_tags(q->tag_set, hctx, hctx_idx);
return ret;
}
}
return 0;
}
void blk_mq_sched_exit_hctx(struct request_queue *q, struct blk_mq_hw_ctx *hctx,
@ -523,12 +487,18 @@ void blk_mq_sched_exit_hctx(struct request_queue *q, struct blk_mq_hw_ctx *hctx,
if (!e)
return;
if (e->type->ops.mq.exit_hctx && hctx->sched_data) {
e->type->ops.mq.exit_hctx(hctx, hctx_idx);
hctx->sched_data = NULL;
}
blk_mq_sched_free_tags(q->tag_set, hctx, hctx_idx);
}
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
{
struct blk_mq_hw_ctx *hctx;
struct elevator_queue *eq;
unsigned int i;
int ret;
@ -553,6 +523,18 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
if (ret)
goto err;
if (e->ops.mq.init_hctx) {
queue_for_each_hw_ctx(q, hctx, i) {
ret = e->ops.mq.init_hctx(hctx, i);
if (ret) {
eq = q->elevator;
blk_mq_exit_sched(q, eq);
kobject_put(&eq->kobj);
return ret;
}
}
}
return 0;
err:
@ -563,6 +545,17 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
{
struct blk_mq_hw_ctx *hctx;
unsigned int i;
if (e->type->ops.mq.exit_hctx) {
queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->sched_data) {
e->type->ops.mq.exit_hctx(hctx, i);
hctx->sched_data = NULL;
}
}
}
if (e->type->ops.mq.exit_sched)
e->type->ops.mq.exit_sched(e);
blk_mq_sched_tags_teardown(q);

block/blk-mq-sched.h

@ -4,10 +4,6 @@
#include "blk-mq.h"
#include "blk-mq-tag.h"
int blk_mq_sched_init_hctx_data(struct request_queue *q, size_t size,
int (*init)(struct blk_mq_hw_ctx *),
void (*exit)(struct blk_mq_hw_ctx *));
void blk_mq_sched_free_hctx_data(struct request_queue *q,
void (*exit)(struct blk_mq_hw_ctx *));
@ -28,9 +24,6 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
struct list_head *list, bool run_queue_async);
void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_move_to_dispatch(struct blk_mq_hw_ctx *hctx,
struct list_head *rq_list,
struct request *(*get_rq)(struct blk_mq_hw_ctx *));
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e);
void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
@ -86,17 +79,12 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
return true;
}
static inline void
blk_mq_sched_completed_request(struct blk_mq_hw_ctx *hctx, struct request *rq)
static inline void blk_mq_sched_completed_request(struct request *rq)
{
struct elevator_queue *e = hctx->queue->elevator;
struct elevator_queue *e = rq->q->elevator;
if (e && e->type->ops.mq.completed_request)
e->type->ops.mq.completed_request(hctx, rq);
BUG_ON(rq->internal_tag == -1);
blk_mq_put_tag(hctx, hctx->sched_tags, rq->mq_ctx, rq->internal_tag);
e->type->ops.mq.completed_request(rq);
}
static inline void blk_mq_sched_started_request(struct request *rq)

block/blk-mq-sysfs.c

@ -253,10 +253,12 @@ static void __blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
int i;
lockdep_assert_held(&q->sysfs_lock);
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_unregister_hctx(hctx);
blk_mq_debugfs_unregister_hctxs(q);
blk_mq_debugfs_unregister_mq(q);
kobject_uevent(&q->mq_kobj, KOBJ_REMOVE);
kobject_del(&q->mq_kobj);
@ -267,9 +269,9 @@ static void __blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
{
blk_mq_disable_hotplug();
mutex_lock(&q->sysfs_lock);
__blk_mq_unregister_dev(dev, q);
blk_mq_enable_hotplug();
mutex_unlock(&q->sysfs_lock);
}
void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx)
@ -302,12 +304,13 @@ void blk_mq_sysfs_init(struct request_queue *q)
}
}
int blk_mq_register_dev(struct device *dev, struct request_queue *q)
int __blk_mq_register_dev(struct device *dev, struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
int ret, i;
blk_mq_disable_hotplug();
WARN_ON_ONCE(!q->kobj.parent);
lockdep_assert_held(&q->sysfs_lock);
ret = kobject_add(&q->mq_kobj, kobject_get(&dev->kobj), "%s", "mq");
if (ret < 0)
@ -315,20 +318,38 @@ int blk_mq_register_dev(struct device *dev, struct request_queue *q)
kobject_uevent(&q->mq_kobj, KOBJ_ADD);
blk_mq_debugfs_register(q, kobject_name(&dev->kobj));
blk_mq_debugfs_register(q);
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_register_hctx(hctx);
if (ret)
break;
goto unreg;
}
if (ret)
__blk_mq_unregister_dev(dev, q);
else
q->mq_sysfs_init_done = true;
q->mq_sysfs_init_done = true;
out:
blk_mq_enable_hotplug();
return ret;
unreg:
while (--i >= 0)
blk_mq_unregister_hctx(q->queue_hw_ctx[i]);
blk_mq_debugfs_unregister_mq(q);
kobject_uevent(&q->mq_kobj, KOBJ_REMOVE);
kobject_del(&q->mq_kobj);
kobject_put(&dev->kobj);
return ret;
}
int blk_mq_register_dev(struct device *dev, struct request_queue *q)
{
int ret;
mutex_lock(&q->sysfs_lock);
ret = __blk_mq_register_dev(dev, q);
mutex_unlock(&q->sysfs_lock);
return ret;
}
@ -339,13 +360,17 @@ void blk_mq_sysfs_unregister(struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
int i;
mutex_lock(&q->sysfs_lock);
if (!q->mq_sysfs_init_done)
return;
goto unlock;
blk_mq_debugfs_unregister_hctxs(q);
blk_mq_debugfs_unregister_mq(q);
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_unregister_hctx(hctx);
unlock:
mutex_unlock(&q->sysfs_lock);
}
int blk_mq_sysfs_register(struct request_queue *q)
@ -353,10 +378,11 @@ int blk_mq_sysfs_register(struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
int i, ret = 0;
mutex_lock(&q->sysfs_lock);
if (!q->mq_sysfs_init_done)
return ret;
goto unlock;
blk_mq_debugfs_register_hctxs(q);
blk_mq_debugfs_register_mq(q);
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_register_hctx(hctx);
@ -364,5 +390,8 @@ int blk_mq_sysfs_register(struct request_queue *q)
break;
}
unlock:
mutex_unlock(&q->sysfs_lock);
return ret;
}

block/blk-mq-tag.c

@ -96,7 +96,10 @@ static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
if (!(data->flags & BLK_MQ_REQ_INTERNAL) &&
!hctx_may_queue(data->hctx, bt))
return -1;
return __sbitmap_queue_get(bt);
if (data->shallow_depth)
return __sbitmap_queue_get_shallow(bt, data->shallow_depth);
else
return __sbitmap_queue_get(bt);
}
unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)

block/blk-mq.c

@ -39,6 +39,26 @@
static DEFINE_MUTEX(all_q_mutex);
static LIST_HEAD(all_q_list);
static void blk_mq_poll_stats_start(struct request_queue *q);
static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
static int blk_mq_poll_stats_bkt(const struct request *rq)
{
int ddir, bytes, bucket;
ddir = rq_data_dir(rq);
bytes = blk_rq_bytes(rq);
bucket = ddir + 2*(ilog2(bytes) - 9);
if (bucket < 0)
return -1;
else if (bucket >= BLK_MQ_POLL_STATS_BKTS)
return ddir + BLK_MQ_POLL_STATS_BKTS - 2;
return bucket;
}
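A worked example of the bucket math above, as a hedged stand-alone userspace sketch (ilog2 is open-coded and the bucket count is a parameter rather than assuming the kernel's BLK_MQ_POLL_STATS_BKTS value): reads and writes interleave, so a 4 KiB read maps to 0 + 2*(12 - 9) = 6 and a 512-byte write to 1 + 2*(9 - 9) = 1.

#include <stdio.h>

/* Mirrors the mapping above: ddir is 0 for reads, 1 for writes. */
static int poll_stats_bkt(int ddir, unsigned int bytes, int nr_buckets)
{
        int bucket = ddir + 2 * ((31 - __builtin_clz(bytes)) - 9); /* ilog2 */

        if (bucket < 0)
                return -1;
        if (bucket >= nr_buckets)
                return ddir + nr_buckets - 2;
        return bucket;
}

int main(void)
{
        printf("4KiB read  -> bucket %d\n", poll_stats_bkt(0, 4096, 8));
        printf("512B write -> bucket %d\n", poll_stats_bkt(1, 512, 8));
        return 0;
}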
/*
* Check if any of the ctx's have pending work in this hardware queue
*/
@ -65,7 +85,7 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
}
void blk_mq_freeze_queue_start(struct request_queue *q)
void blk_freeze_queue_start(struct request_queue *q)
{
int freeze_depth;
@ -75,7 +95,7 @@ void blk_mq_freeze_queue_start(struct request_queue *q)
blk_mq_run_hw_queues(q, false);
}
}
EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
void blk_mq_freeze_queue_wait(struct request_queue *q)
{
@ -105,7 +125,7 @@ void blk_freeze_queue(struct request_queue *q)
* no blk_unfreeze_queue(), and blk_freeze_queue() is not
* exported to drivers as the only user for unfreeze is blk_mq.
*/
blk_mq_freeze_queue_start(q);
blk_freeze_queue_start(q);
blk_mq_freeze_queue_wait(q);
}
@ -210,7 +230,6 @@ void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
#endif
rq->special = NULL;
/* tag was already set */
rq->errors = 0;
rq->extra_len = 0;
INIT_LIST_HEAD(&rq->timeout_list);
@ -347,7 +366,7 @@ void __blk_mq_finish_request(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
if (rq->tag != -1)
blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
if (sched_tag != -1)
blk_mq_sched_completed_request(hctx, rq);
blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
blk_mq_sched_restart(hctx);
blk_queue_exit(q);
}
@ -365,6 +384,7 @@ void blk_mq_finish_request(struct request *rq)
{
blk_mq_finish_hctx_request(blk_mq_map_queue(rq->q, rq->mq_ctx->cpu), rq);
}
EXPORT_SYMBOL_GPL(blk_mq_finish_request);
void blk_mq_free_request(struct request *rq)
{
@ -402,12 +422,19 @@ static void __blk_mq_complete_request_remote(void *data)
rq->q->softirq_done_fn(rq);
}
static void blk_mq_ipi_complete_request(struct request *rq)
static void __blk_mq_complete_request(struct request *rq)
{
struct blk_mq_ctx *ctx = rq->mq_ctx;
bool shared = false;
int cpu;
if (rq->internal_tag != -1)
blk_mq_sched_completed_request(rq);
if (rq->rq_flags & RQF_STATS) {
blk_mq_poll_stats_start(rq->q);
blk_stat_add(rq);
}
if (!test_bit(QUEUE_FLAG_SAME_COMP, &rq->q->queue_flags)) {
rq->q->softirq_done_fn(rq);
return;
@ -428,33 +455,6 @@ static void blk_mq_ipi_complete_request(struct request *rq)
put_cpu();
}
static void blk_mq_stat_add(struct request *rq)
{
if (rq->rq_flags & RQF_STATS) {
/*
* We could rq->mq_ctx here, but there's less of a risk
* of races if we have the completion event add the stats
* to the local software queue.
*/
struct blk_mq_ctx *ctx;
ctx = __blk_mq_get_ctx(rq->q, raw_smp_processor_id());
blk_stat_add(&ctx->stat[rq_data_dir(rq)], rq);
}
}
static void __blk_mq_complete_request(struct request *rq)
{
struct request_queue *q = rq->q;
blk_mq_stat_add(rq);
if (!q->softirq_done_fn)
blk_mq_end_request(rq, rq->errors);
else
blk_mq_ipi_complete_request(rq);
}
/**
* blk_mq_complete_request - end I/O on a request
* @rq: the request being processed
@ -463,16 +463,14 @@ static void __blk_mq_complete_request(struct request *rq)
* Ends all I/O on a request. It does not handle partial completions.
* The actual completion happens out-of-order, through a IPI handler.
**/
void blk_mq_complete_request(struct request *rq, int error)
void blk_mq_complete_request(struct request *rq)
{
struct request_queue *q = rq->q;
if (unlikely(blk_should_fake_timeout(q)))
return;
if (!blk_mark_rq_complete(rq)) {
rq->errors = error;
if (!blk_mark_rq_complete(rq))
__blk_mq_complete_request(rq);
}
}
EXPORT_SYMBOL(blk_mq_complete_request);
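With the error argument gone here (and rq->errors on its way out in this series), drivers are expected to keep the completion status in their own per-request data. A hedged sketch of that pattern; the mydrv_* names are made up:

struct mydrv_cmd {
        int status;                      /* replaces the old rq->errors */
};

static void mydrv_irq_complete(struct mydrv_cmd *cmd, int error)
{
        struct request *rq = blk_mq_rq_from_pdu(cmd);

        cmd->status = error;
        blk_mq_complete_request(rq);     /* no error argument any more */
}

/* Wired up as mq_ops->complete, i.e. the queue's softirq_done_fn. */
static void mydrv_softirq_done(struct request *rq)
{
        struct mydrv_cmd *cmd = blk_mq_rq_to_pdu(rq);

        blk_mq_end_request(rq, cmd->status);
}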
@ -491,7 +489,7 @@ void blk_mq_start_request(struct request *rq)
trace_block_rq_issue(q, rq);
if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
blk_stat_set_issue_time(&rq->issue_stat);
blk_stat_set_issue(&rq->issue_stat, blk_rq_sectors(rq));
rq->rq_flags |= RQF_STATS;
wbt_issue(q->rq_wb, &rq->issue_stat);
}
@ -526,6 +524,15 @@ void blk_mq_start_request(struct request *rq)
}
EXPORT_SYMBOL(blk_mq_start_request);
/*
* When we reach here because queue is busy, REQ_ATOM_COMPLETE
* flag isn't set yet, so there may be race with timeout handler,
* but given rq->deadline is just set in .queue_rq() under
* this situation, the race won't be possible in reality because
* rq->timeout should be set as big enough to cover the window
* between blk_mq_start_request() called from .queue_rq() and
* clearing REQ_ATOM_STARTED here.
*/
static void __blk_mq_requeue_request(struct request *rq)
{
struct request_queue *q = rq->q;
@ -633,8 +640,7 @@ void blk_mq_abort_requeue_list(struct request_queue *q)
rq = list_first_entry(&rq_list, struct request, queuelist);
list_del_init(&rq->queuelist);
rq->errors = -EIO;
blk_mq_end_request(rq, rq->errors);
blk_mq_end_request(rq, -EIO);
}
}
EXPORT_SYMBOL(blk_mq_abort_requeue_list);
@ -666,7 +672,7 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
* just be ignored. This can happen due to the bitflag ordering.
* Timeout first checks if STARTED is set, and if it is, assumes
* the request is active. But if we race with completion, then
* we both flags will get cleared. So check here again, and ignore
* both flags will get cleared. So check here again, and ignore
* a timeout event with a request that isn't active.
*/
if (!test_bit(REQ_ATOM_STARTED, &req->atomic_flags))
@ -699,6 +705,19 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
return;
/*
* The rq being checked may have been freed and reallocated
* already; we avoid this race by checking rq->deadline and the
* REQ_ATOM_COMPLETE flag together:
*
* - if rq->deadline is observed as the new value set by the
*   reuse, the rq won't be timed out because its new deadline
*   has not expired yet.
* - if rq->deadline is observed as the previous value, the
*   REQ_ATOM_COMPLETE flag won't yet be observed as cleared by
*   the reuse path, because there is a barrier between setting
*   rq->deadline and clearing the flag in blk_mq_start_request(),
*   so this rq won't be timed out either.
*/
if (time_after_eq(jiffies, rq->deadline)) {
if (!blk_mark_rq_complete(rq))
blk_mq_rq_timed_out(rq, reserved);
@ -727,7 +746,7 @@ static void blk_mq_timeout_work(struct work_struct *work)
* percpu_ref_tryget directly, because we need to be able to
* obtain a reference even in the short window between the queue
* starting to freeze, by dropping the first reference in
* blk_mq_freeze_queue_start, and the moment the last request is
* blk_freeze_queue_start, and the moment the last request is
* consumed, marked by the instant q_usage_counter reaches
* zero.
*/
@ -845,6 +864,8 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
.flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
};
might_sleep_if(wait);
if (rq->tag != -1)
goto done;
@ -964,19 +985,11 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
{
struct blk_mq_hw_ctx *hctx;
struct request *rq;
LIST_HEAD(driver_list);
struct list_head *dptr;
int errors, queued, ret = BLK_MQ_RQ_QUEUE_OK;
if (list_empty(list))
return false;
/*
* Start off with dptr being NULL, so we start the first request
* immediately, even if we have more pending.
*/
dptr = NULL;
/*
* Now process all the entries, sending them to the driver.
*/
@ -993,23 +1006,21 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
* The initial allocation attempt failed, so we need to
* rerun the hardware queue when a tag is freed.
*/
if (blk_mq_dispatch_wait_add(hctx)) {
/*
* It's possible that a tag was freed in the
* window between the allocation failure and
* adding the hardware queue to the wait queue.
*/
if (!blk_mq_get_driver_tag(rq, &hctx, false))
break;
} else {
if (!blk_mq_dispatch_wait_add(hctx))
break;
/*
* It's possible that a tag was freed in the window
* between the allocation failure and adding the
* hardware queue to the wait queue.
*/
if (!blk_mq_get_driver_tag(rq, &hctx, false))
break;
}
}
list_del_init(&rq->queuelist);
bd.rq = rq;
bd.list = dptr;
/*
* Flag last if we have no more requests, or if we have more
@ -1038,20 +1049,12 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
pr_err("blk-mq: bad return on queue: %d\n", ret);
case BLK_MQ_RQ_QUEUE_ERROR:
errors++;
rq->errors = -EIO;
blk_mq_end_request(rq, rq->errors);
blk_mq_end_request(rq, -EIO);
break;
}
if (ret == BLK_MQ_RQ_QUEUE_BUSY)
break;
/*
* We've done the first request. If we have more than 1
* left in the list, set dptr to defer issue.
*/
if (!dptr && list->next != list->prev)
dptr = &driver_list;
} while (!list_empty(list));
hctx->dispatched[queued_to_index(queued)]++;
@ -1062,8 +1065,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
*/
if (!list_empty(list)) {
/*
* If we got a driver tag for the next request already,
* free it again.
* If an I/O scheduler has been configured and we got a driver
* tag for the next request already, free it again.
*/
rq = list_first_entry(list, struct request, queuelist);
blk_mq_put_driver_tag(rq);
@ -1073,16 +1076,24 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
spin_unlock(&hctx->lock);
/*
* the queue is expected stopped with BLK_MQ_RQ_QUEUE_BUSY, but
* it's possible the queue is stopped and restarted again
* before this. Queue restart will dispatch requests. And since
* requests in rq_list aren't added into hctx->dispatch yet,
* the requests in rq_list might get lost.
* If SCHED_RESTART was set by the caller of this function and
* it is no longer set that means that it was cleared by another
* thread and hence that a queue rerun is needed.
*
* blk_mq_run_hw_queue() already checks the STOPPED bit
* If TAG_WAITING is set that means that an I/O scheduler has
* been configured and another thread is waiting for a driver
* tag. To guarantee fairness, do not rerun this hardware queue
* but let the other thread grab the driver tag.
*
* If RESTART or TAG_WAITING is set, then let completion restart
* the queue instead of potentially looping here.
* If no I/O scheduler has been configured it is possible that
* the hardware queue got stopped and restarted before requests
* were pushed back onto the dispatch list. Rerun the queue to
* avoid starvation. Notes:
* - blk_mq_run_hw_queue() checks whether or not a queue has
* been stopped before rerunning a queue.
* - Some but not all block drivers stop a queue before
* returning BLK_MQ_RQ_QUEUE_BUSY. Two exceptions are scsi-mq
* and dm-rq.
*/
if (!blk_mq_sched_needs_restart(hctx) &&
!test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state))
@ -1104,6 +1115,8 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
blk_mq_sched_dispatch_requests(hctx);
rcu_read_unlock();
} else {
might_sleep();
srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
blk_mq_sched_dispatch_requests(hctx);
srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
@ -1153,13 +1166,9 @@ static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
put_cpu();
}
if (msecs == 0)
kblockd_schedule_work_on(blk_mq_hctx_next_cpu(hctx),
&hctx->run_work);
else
kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
&hctx->delayed_run_work,
msecs_to_jiffies(msecs));
kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
&hctx->run_work,
msecs_to_jiffies(msecs));
}
void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
@ -1172,6 +1181,7 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
{
__blk_mq_delay_run_hw_queue(hctx, async, 0);
}
EXPORT_SYMBOL(blk_mq_run_hw_queue);
void blk_mq_run_hw_queues(struct request_queue *q, bool async)
{
@ -1210,8 +1220,7 @@ EXPORT_SYMBOL(blk_mq_queue_stopped);
void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
{
cancel_work(&hctx->run_work);
cancel_delayed_work(&hctx->delay_work);
cancel_delayed_work_sync(&hctx->run_work);
set_bit(BLK_MQ_S_STOPPED, &hctx->state);
}
EXPORT_SYMBOL(blk_mq_stop_hw_queue);
@ -1268,38 +1277,40 @@ static void blk_mq_run_work_fn(struct work_struct *work)
{
struct blk_mq_hw_ctx *hctx;
hctx = container_of(work, struct blk_mq_hw_ctx, run_work);
hctx = container_of(work, struct blk_mq_hw_ctx, run_work.work);
/*
* If we are stopped, don't run the queue. The exception is if
* BLK_MQ_S_START_ON_RUN is set. For that case, we auto-clear
* the STOPPED bit and run it.
*/
if (test_bit(BLK_MQ_S_STOPPED, &hctx->state)) {
if (!test_bit(BLK_MQ_S_START_ON_RUN, &hctx->state))
return;
clear_bit(BLK_MQ_S_START_ON_RUN, &hctx->state);
clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
}
__blk_mq_run_hw_queue(hctx);
}
static void blk_mq_delayed_run_work_fn(struct work_struct *work)
{
struct blk_mq_hw_ctx *hctx;
hctx = container_of(work, struct blk_mq_hw_ctx, delayed_run_work.work);
__blk_mq_run_hw_queue(hctx);
}
static void blk_mq_delay_work_fn(struct work_struct *work)
{
struct blk_mq_hw_ctx *hctx;
hctx = container_of(work, struct blk_mq_hw_ctx, delay_work.work);
if (test_and_clear_bit(BLK_MQ_S_STOPPED, &hctx->state))
__blk_mq_run_hw_queue(hctx);
}
void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
{
if (unlikely(!blk_mq_hw_queue_mapped(hctx)))
return;
/*
* Stop the hw queue, then modify currently delayed work.
* This should prevent us from running the queue prematurely.
* Mark the queue as auto-clearing STOPPED when it runs.
*/
blk_mq_stop_hw_queue(hctx);
kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
&hctx->delay_work, msecs_to_jiffies(msecs));
set_bit(BLK_MQ_S_START_ON_RUN, &hctx->state);
kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
&hctx->run_work,
msecs_to_jiffies(msecs));
}
EXPORT_SYMBOL(blk_mq_delay_queue);
@ -1408,7 +1419,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
static void blk_mq_bio_to_request(struct request *rq, struct bio *bio)
{
init_request_from_bio(rq, bio);
blk_init_request_from_bio(rq, bio);
blk_account_io_start(rq, true);
}
@ -1453,14 +1464,13 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
}
static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
bool may_sleep)
{
struct request_queue *q = rq->q;
struct blk_mq_queue_data bd = {
.rq = rq,
.list = NULL,
.last = 1
.last = true,
};
struct blk_mq_hw_ctx *hctx;
blk_qc_t new_cookie;
@ -1485,31 +1495,42 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
return;
}
__blk_mq_requeue_request(rq);
if (ret == BLK_MQ_RQ_QUEUE_ERROR) {
*cookie = BLK_QC_T_NONE;
rq->errors = -EIO;
blk_mq_end_request(rq, rq->errors);
blk_mq_end_request(rq, -EIO);
return;
}
__blk_mq_requeue_request(rq);
insert:
blk_mq_sched_insert_request(rq, false, true, false, may_sleep);
}
/*
* Multiple hardware queue variant. This will not use per-process plugs,
* but will attempt to bypass the hctx queueing if we can go straight to
* hardware for SYNC IO.
*/
static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
struct request *rq, blk_qc_t *cookie)
{
if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
rcu_read_lock();
__blk_mq_try_issue_directly(rq, cookie, false);
rcu_read_unlock();
} else {
unsigned int srcu_idx;
might_sleep();
srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
__blk_mq_try_issue_directly(rq, cookie, true);
srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
}
}
static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
{
const int is_sync = op_is_sync(bio->bi_opf);
const int is_flush_fua = op_is_flush(bio->bi_opf);
struct blk_mq_alloc_data data = { .flags = 0 };
struct request *rq;
unsigned int request_count = 0, srcu_idx;
unsigned int request_count = 0;
struct blk_plug *plug;
struct request *same_queue_rq = NULL;
blk_qc_t cookie;
@ -1545,147 +1566,21 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
cookie = request_to_qc_t(data.hctx, rq);
if (unlikely(is_flush_fua)) {
if (q->elevator)
goto elv_insert;
blk_mq_bio_to_request(rq, bio);
blk_insert_flush(rq);
goto run_queue;
}
plug = current->plug;
/*
* If the driver supports defer issued based on 'last', then
* queue it up like normal since we can potentially save some
* CPU this way.
*/
if (((plug && !blk_queue_nomerges(q)) || is_sync) &&
!(data.hctx->flags & BLK_MQ_F_DEFER_ISSUE)) {
struct request *old_rq = NULL;
blk_mq_bio_to_request(rq, bio);
/*
* We do limited plugging. If the bio can be merged, do that.
* Otherwise the existing request in the plug list will be
* issued. So the plug list will have one request at most
*/
if (plug) {
/*
* The plug list might get flushed before this. If that
* happens, same_queue_rq is invalid and plug list is
* empty
*/
if (same_queue_rq && !list_empty(&plug->mq_list)) {
old_rq = same_queue_rq;
list_del_init(&old_rq->queuelist);
}
list_add_tail(&rq->queuelist, &plug->mq_list);
} else /* is_sync */
old_rq = rq;
if (unlikely(is_flush_fua)) {
blk_mq_put_ctx(data.ctx);
if (!old_rq)
goto done;
if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
rcu_read_lock();
blk_mq_try_issue_directly(old_rq, &cookie, false);
rcu_read_unlock();
blk_mq_bio_to_request(rq, bio);
if (q->elevator) {
blk_mq_sched_insert_request(rq, false, true, true,
true);
} else {
srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
blk_mq_try_issue_directly(old_rq, &cookie, true);
srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
blk_insert_flush(rq);
blk_mq_run_hw_queue(data.hctx, true);
}
goto done;
}
if (q->elevator) {
elv_insert:
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_sched_insert_request(rq, false, true,
!is_sync || is_flush_fua, true);
goto done;
}
if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
/*
* For a SYNC request, send it to the hardware immediately. For
* an ASYNC request, just ensure that we run it later on. The
* latter allows for merging opportunities and more efficient
* dispatching.
*/
run_queue:
blk_mq_run_hw_queue(data.hctx, !is_sync || is_flush_fua);
}
blk_mq_put_ctx(data.ctx);
done:
return cookie;
}
/*
* Single hardware queue variant. This will attempt to use any per-process
* plug for merging and IO deferral.
*/
static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
{
const int is_sync = op_is_sync(bio->bi_opf);
const int is_flush_fua = op_is_flush(bio->bi_opf);
struct blk_plug *plug;
unsigned int request_count = 0;
struct blk_mq_alloc_data data = { .flags = 0 };
struct request *rq;
blk_qc_t cookie;
unsigned int wb_acct;
blk_queue_bounce(q, &bio);
if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
bio_io_error(bio);
return BLK_QC_T_NONE;
}
blk_queue_split(q, &bio, q->bio_split);
if (!is_flush_fua && !blk_queue_nomerges(q)) {
if (blk_attempt_plug_merge(q, bio, &request_count, NULL))
return BLK_QC_T_NONE;
} else
request_count = blk_plug_queued_count(q);
if (blk_mq_sched_bio_merge(q, bio))
return BLK_QC_T_NONE;
wb_acct = wbt_wait(q->rq_wb, bio, NULL);
trace_block_getrq(q, bio, bio->bi_opf);
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
return BLK_QC_T_NONE;
}
wbt_track(&rq->issue_stat, wb_acct);
cookie = request_to_qc_t(data.hctx, rq);
if (unlikely(is_flush_fua)) {
if (q->elevator)
goto elv_insert;
blk_mq_bio_to_request(rq, bio);
blk_insert_flush(rq);
goto run_queue;
}
/*
* A task plug currently exists. Since this is completely lockless,
* utilize that to temporarily store requests until the task is
* either done or scheduled away.
*/
plug = current->plug;
if (plug) {
} else if (plug && q->nr_hw_queues == 1) {
struct request *last = NULL;
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
/*
@ -1694,13 +1589,14 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
*/
if (list_empty(&plug->mq_list))
request_count = 0;
else if (blk_queue_nomerges(q))
request_count = blk_plug_queued_count(q);
if (!request_count)
trace_block_plug(q);
else
last = list_entry_rq(plug->mq_list.prev);
blk_mq_put_ctx(data.ctx);
if (request_count >= BLK_MAX_REQUEST_COUNT || (last &&
blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
blk_flush_plug_list(plug, false);
@ -1708,30 +1604,41 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
}
list_add_tail(&rq->queuelist, &plug->mq_list);
return cookie;
}
} else if (plug && !blk_queue_nomerges(q)) {
blk_mq_bio_to_request(rq, bio);
if (q->elevator) {
elv_insert:
/*
* We do limited plugging. If the bio can be merged, do that.
* Otherwise the existing request in the plug list will be
* issued. So the plug list will have one request at most
* The plug list might get flushed before this. If that happens,
* the plug list is empty, and same_queue_rq is invalid.
*/
if (list_empty(&plug->mq_list))
same_queue_rq = NULL;
if (same_queue_rq)
list_del_init(&same_queue_rq->queuelist);
list_add_tail(&rq->queuelist, &plug->mq_list);
blk_mq_put_ctx(data.ctx);
if (same_queue_rq)
blk_mq_try_issue_directly(data.hctx, same_queue_rq,
&cookie);
} else if (q->nr_hw_queues > 1 && is_sync) {
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_sched_insert_request(rq, false, true,
!is_sync || is_flush_fua, true);
goto done;
}
if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
/*
* For a SYNC request, send it to the hardware immediately. For
* an ASYNC request, just ensure that we run it later on. The
* latter allows for merging opportunities and more efficient
* dispatching.
*/
run_queue:
blk_mq_run_hw_queue(data.hctx, !is_sync || is_flush_fua);
}
blk_mq_try_issue_directly(data.hctx, rq, &cookie);
} else if (q->elevator) {
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_sched_insert_request(rq, false, true, true, true);
} else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
blk_mq_put_ctx(data.ctx);
blk_mq_run_hw_queue(data.hctx, true);
} else
blk_mq_put_ctx(data.ctx);
blk_mq_put_ctx(data.ctx);
done:
return cookie;
}
@ -1988,9 +1895,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
if (node == NUMA_NO_NODE)
node = hctx->numa_node = set->numa_node;
INIT_WORK(&hctx->run_work, blk_mq_run_work_fn);
INIT_DELAYED_WORK(&hctx->delayed_run_work, blk_mq_delayed_run_work_fn);
INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn);
spin_lock_init(&hctx->lock);
INIT_LIST_HEAD(&hctx->dispatch);
hctx->queue = q;
@ -2067,8 +1972,6 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
spin_lock_init(&__ctx->lock);
INIT_LIST_HEAD(&__ctx->rq_list);
__ctx->queue = q;
blk_stat_init(&__ctx->stat[BLK_STAT_READ]);
blk_stat_init(&__ctx->stat[BLK_STAT_WRITE]);
/* If the cpu isn't online, the cpu is mapped to first hctx */
if (!cpu_online(i))
@ -2215,6 +2118,8 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set, bool shared)
{
struct request_queue *q;
lockdep_assert_held(&set->tag_list_lock);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
blk_mq_freeze_queue(q);
queue_set_hctx_shared(q, shared);
@ -2227,7 +2132,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
struct blk_mq_tag_set *set = q->tag_set;
mutex_lock(&set->tag_list_lock);
list_del_init(&q->tag_set_list);
list_del_rcu(&q->tag_set_list);
INIT_LIST_HEAD(&q->tag_set_list);
if (list_is_singular(&set->tag_list)) {
/* just transitioned to unshared */
set->flags &= ~BLK_MQ_F_TAG_SHARED;
@ -2235,6 +2141,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
blk_mq_update_tag_set_depth(set, false);
}
mutex_unlock(&set->tag_list_lock);
synchronize_rcu();
}
static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
@ -2252,7 +2160,7 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
}
if (set->flags & BLK_MQ_F_TAG_SHARED)
queue_set_hctx_shared(q, true);
list_add_tail(&q->tag_set_list, &set->tag_list);
list_add_tail_rcu(&q->tag_set_list, &set->tag_list);
mutex_unlock(&set->tag_list_lock);
}
@ -2364,6 +2272,12 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
/* mark the queue as mq asap */
q->mq_ops = set->ops;
q->poll_cb = blk_stat_alloc_callback(blk_mq_poll_stats_fn,
blk_mq_poll_stats_bkt,
BLK_MQ_POLL_STATS_BKTS, q);
if (!q->poll_cb)
goto err_exit;
q->queue_ctx = alloc_percpu(struct blk_mq_ctx);
if (!q->queue_ctx)
goto err_exit;
@ -2398,10 +2312,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
INIT_LIST_HEAD(&q->requeue_list);
spin_lock_init(&q->requeue_lock);
if (q->nr_hw_queues > 1)
blk_queue_make_request(q, blk_mq_make_request);
else
blk_queue_make_request(q, blk_sq_make_request);
blk_queue_make_request(q, blk_mq_make_request);
/*
* Do this after blk_queue_make_request() overrides it...
@ -2456,8 +2367,6 @@ void blk_mq_free_queue(struct request_queue *q)
list_del_init(&q->all_q_node);
mutex_unlock(&all_q_mutex);
wbt_exit(q);
blk_mq_del_queue_tag_set(q);
blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
@ -2502,7 +2411,7 @@ static void blk_mq_queue_reinit_work(void)
* take place in parallel.
*/
list_for_each_entry(q, &all_q_list, all_q_node)
blk_mq_freeze_queue_start(q);
blk_freeze_queue_start(q);
list_for_each_entry(q, &all_q_list, all_q_node)
blk_mq_freeze_queue_wait(q);
@ -2743,6 +2652,8 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
{
struct request_queue *q;
lockdep_assert_held(&set->tag_list_lock);
if (nr_hw_queues > nr_cpu_ids)
nr_hw_queues = nr_cpu_ids;
if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues)
@ -2755,16 +2666,6 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
blk_mq_update_queue_map(set);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
blk_mq_realloc_hw_ctxs(set, q);
/*
* Manually set the make_request_fn as blk_queue_make_request
* resets a lot of the queue settings.
*/
if (q->nr_hw_queues > 1)
q->make_request_fn = blk_mq_make_request;
else
q->make_request_fn = blk_sq_make_request;
blk_mq_queue_reinit(q, cpu_online_mask);
}
@ -2773,39 +2674,69 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
}
EXPORT_SYMBOL_GPL(blk_mq_update_nr_hw_queues);
/* Enable polling stats and return whether they were already enabled. */
static bool blk_poll_stats_enable(struct request_queue *q)
{
if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags) ||
test_and_set_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
return true;
blk_stat_add_callback(q, q->poll_cb);
return false;
}
static void blk_mq_poll_stats_start(struct request_queue *q)
{
/*
* We don't arm the callback if polling stats are not enabled or the
* callback is already active.
*/
if (!test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags) ||
blk_stat_is_active(q->poll_cb))
return;
blk_stat_activate_msecs(q->poll_cb, 100);
}
static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
{
struct request_queue *q = cb->data;
int bucket;
for (bucket = 0; bucket < BLK_MQ_POLL_STATS_BKTS; bucket++) {
if (cb->stat[bucket].nr_samples)
q->poll_stat[bucket] = cb->stat[bucket];
}
}
static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
struct blk_rq_stat stat[2];
unsigned long ret = 0;
int bucket;
/*
* If stats collection isn't on, don't sleep but turn it on for
* future users
*/
if (!blk_stat_enable(q))
if (!blk_poll_stats_enable(q))
return 0;
/*
* We don't have to do this once per IO, should optimize this
* to just use the current window of stats until it changes
*/
memset(&stat, 0, sizeof(stat));
blk_hctx_stat_get(hctx, stat);
/*
* As an optimistic guess, use half of the mean service time
* for this type of request. We can (and should) make this smarter.
* For instance, if the completion latencies are tight, we can
* get closer than just half the mean. This is especially
* important on devices where the completion latencies are longer
* than ~10 usec.
* than ~10 usec. We do use the stats for the relevant IO size
* if available which does lead to better estimates.
*/
if (req_op(rq) == REQ_OP_READ && stat[BLK_STAT_READ].nr_samples)
ret = (stat[BLK_STAT_READ].mean + 1) / 2;
else if (req_op(rq) == REQ_OP_WRITE && stat[BLK_STAT_WRITE].nr_samples)
ret = (stat[BLK_STAT_WRITE].mean + 1) / 2;
bucket = blk_mq_poll_stats_bkt(rq);
if (bucket < 0)
return ret;
if (q->poll_stat[bucket].nr_samples)
ret = (q->poll_stat[bucket].mean + 1) / 2;
return ret;
}
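
To make the half-of-the-mean heuristic concrete with assumed numbers: if the poll_stat bucket matching this request reports a mean completion latency of 9000 ns, blk_mq_poll_nsecs() returns (9000 + 1) / 2 = 4500 ns, so hybrid polling sleeps for roughly half the expected completion time before it starts spinning. If the matching bucket has no samples yet, the function falls back to returning 0, i.e. no sleep estimate.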


@ -20,7 +20,6 @@ struct blk_mq_ctx {
/* incremented at completion time */
unsigned long ____cacheline_aligned_in_smp rq_completed[2];
struct blk_rq_stat stat[2];
struct request_queue *queue;
struct kobject kobj;
@ -79,6 +78,7 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
*/
extern void blk_mq_sysfs_init(struct request_queue *q);
extern void blk_mq_sysfs_deinit(struct request_queue *q);
extern int __blk_mq_register_dev(struct device *dev, struct request_queue *q);
extern int blk_mq_sysfs_register(struct request_queue *q);
extern void blk_mq_sysfs_unregister(struct request_queue *q);
extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
@ -87,13 +87,12 @@ extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
* debugfs helpers
*/
#ifdef CONFIG_BLK_DEBUG_FS
int blk_mq_debugfs_register(struct request_queue *q, const char *name);
int blk_mq_debugfs_register(struct request_queue *q);
void blk_mq_debugfs_unregister(struct request_queue *q);
int blk_mq_debugfs_register_hctxs(struct request_queue *q);
void blk_mq_debugfs_unregister_hctxs(struct request_queue *q);
int blk_mq_debugfs_register_mq(struct request_queue *q);
void blk_mq_debugfs_unregister_mq(struct request_queue *q);
#else
static inline int blk_mq_debugfs_register(struct request_queue *q,
const char *name)
static inline int blk_mq_debugfs_register(struct request_queue *q)
{
return 0;
}
@ -102,12 +101,12 @@ static inline void blk_mq_debugfs_unregister(struct request_queue *q)
{
}
static inline int blk_mq_debugfs_register_hctxs(struct request_queue *q)
static inline int blk_mq_debugfs_register_mq(struct request_queue *q)
{
return 0;
}
static inline void blk_mq_debugfs_unregister_hctxs(struct request_queue *q)
static inline void blk_mq_debugfs_unregister_mq(struct request_queue *q)
{
}
#endif
@ -142,6 +141,7 @@ struct blk_mq_alloc_data {
/* input parameter */
struct request_queue *q;
unsigned int flags;
unsigned int shallow_depth;
/* input & output parameter */
struct blk_mq_ctx *ctx;


@ -103,7 +103,6 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->discard_granularity = 0;
lim->discard_alignment = 0;
lim->discard_misaligned = 0;
lim->discard_zeroes_data = 0;
lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
lim->alignment_offset = 0;
@ -127,7 +126,6 @@ void blk_set_stacking_limits(struct queue_limits *lim)
blk_set_default_limits(lim);
/* Inherit limits from component devices */
lim->discard_zeroes_data = 1;
lim->max_segments = USHRT_MAX;
lim->max_discard_segments = 1;
lim->max_hw_sectors = UINT_MAX;
@ -609,7 +607,6 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
t->cluster &= b->cluster;
t->discard_zeroes_data &= b->discard_zeroes_data;
/* Physical block size a multiple of the logical block size? */
if (t->physical_block_size & (t->logical_block_size - 1)) {


@ -4,10 +4,27 @@
* Copyright (C) 2016 Jens Axboe
*/
#include <linux/kernel.h>
#include <linux/rculist.h>
#include <linux/blk-mq.h>
#include "blk-stat.h"
#include "blk-mq.h"
#include "blk.h"
#define BLK_RQ_STAT_BATCH 64
struct blk_queue_stats {
struct list_head callbacks;
spinlock_t lock;
bool enable_accounting;
};
static void blk_stat_init(struct blk_rq_stat *stat)
{
stat->min = -1ULL;
stat->max = stat->nr_samples = stat->mean = 0;
stat->batch = stat->nr_batch = 0;
}
static void blk_stat_flush_batch(struct blk_rq_stat *stat)
{
@ -48,166 +65,10 @@ static void blk_stat_sum(struct blk_rq_stat *dst, struct blk_rq_stat *src)
dst->nr_samples += src->nr_samples;
}
static void blk_mq_stat_get(struct request_queue *q, struct blk_rq_stat *dst)
static void __blk_stat_add(struct blk_rq_stat *stat, u64 value)
{
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
uint64_t latest = 0;
int i, j, nr;
blk_stat_init(&dst[BLK_STAT_READ]);
blk_stat_init(&dst[BLK_STAT_WRITE]);
nr = 0;
do {
uint64_t newest = 0;
queue_for_each_hw_ctx(q, hctx, i) {
hctx_for_each_ctx(hctx, ctx, j) {
blk_stat_flush_batch(&ctx->stat[BLK_STAT_READ]);
blk_stat_flush_batch(&ctx->stat[BLK_STAT_WRITE]);
if (!ctx->stat[BLK_STAT_READ].nr_samples &&
!ctx->stat[BLK_STAT_WRITE].nr_samples)
continue;
if (ctx->stat[BLK_STAT_READ].time > newest)
newest = ctx->stat[BLK_STAT_READ].time;
if (ctx->stat[BLK_STAT_WRITE].time > newest)
newest = ctx->stat[BLK_STAT_WRITE].time;
}
}
/*
* No samples
*/
if (!newest)
break;
if (newest > latest)
latest = newest;
queue_for_each_hw_ctx(q, hctx, i) {
hctx_for_each_ctx(hctx, ctx, j) {
if (ctx->stat[BLK_STAT_READ].time == newest) {
blk_stat_sum(&dst[BLK_STAT_READ],
&ctx->stat[BLK_STAT_READ]);
nr++;
}
if (ctx->stat[BLK_STAT_WRITE].time == newest) {
blk_stat_sum(&dst[BLK_STAT_WRITE],
&ctx->stat[BLK_STAT_WRITE]);
nr++;
}
}
}
/*
* If we race on finding an entry, just loop back again.
* Should be very rare.
*/
} while (!nr);
dst[BLK_STAT_READ].time = dst[BLK_STAT_WRITE].time = latest;
}
void blk_queue_stat_get(struct request_queue *q, struct blk_rq_stat *dst)
{
if (q->mq_ops)
blk_mq_stat_get(q, dst);
else {
blk_stat_flush_batch(&q->rq_stats[BLK_STAT_READ]);
blk_stat_flush_batch(&q->rq_stats[BLK_STAT_WRITE]);
memcpy(&dst[BLK_STAT_READ], &q->rq_stats[BLK_STAT_READ],
sizeof(struct blk_rq_stat));
memcpy(&dst[BLK_STAT_WRITE], &q->rq_stats[BLK_STAT_WRITE],
sizeof(struct blk_rq_stat));
}
}
void blk_hctx_stat_get(struct blk_mq_hw_ctx *hctx, struct blk_rq_stat *dst)
{
struct blk_mq_ctx *ctx;
unsigned int i, nr;
nr = 0;
do {
uint64_t newest = 0;
hctx_for_each_ctx(hctx, ctx, i) {
blk_stat_flush_batch(&ctx->stat[BLK_STAT_READ]);
blk_stat_flush_batch(&ctx->stat[BLK_STAT_WRITE]);
if (!ctx->stat[BLK_STAT_READ].nr_samples &&
!ctx->stat[BLK_STAT_WRITE].nr_samples)
continue;
if (ctx->stat[BLK_STAT_READ].time > newest)
newest = ctx->stat[BLK_STAT_READ].time;
if (ctx->stat[BLK_STAT_WRITE].time > newest)
newest = ctx->stat[BLK_STAT_WRITE].time;
}
if (!newest)
break;
hctx_for_each_ctx(hctx, ctx, i) {
if (ctx->stat[BLK_STAT_READ].time == newest) {
blk_stat_sum(&dst[BLK_STAT_READ],
&ctx->stat[BLK_STAT_READ]);
nr++;
}
if (ctx->stat[BLK_STAT_WRITE].time == newest) {
blk_stat_sum(&dst[BLK_STAT_WRITE],
&ctx->stat[BLK_STAT_WRITE]);
nr++;
}
}
/*
* If we race on finding an entry, just loop back again.
* Should be very rare, as the window is only updated
* occasionally
*/
} while (!nr);
}
static void __blk_stat_init(struct blk_rq_stat *stat, s64 time_now)
{
stat->min = -1ULL;
stat->max = stat->nr_samples = stat->mean = 0;
stat->batch = stat->nr_batch = 0;
stat->time = time_now & BLK_STAT_NSEC_MASK;
}
void blk_stat_init(struct blk_rq_stat *stat)
{
__blk_stat_init(stat, ktime_to_ns(ktime_get()));
}
static bool __blk_stat_is_current(struct blk_rq_stat *stat, s64 now)
{
return (now & BLK_STAT_NSEC_MASK) == (stat->time & BLK_STAT_NSEC_MASK);
}
bool blk_stat_is_current(struct blk_rq_stat *stat)
{
return __blk_stat_is_current(stat, ktime_to_ns(ktime_get()));
}
void blk_stat_add(struct blk_rq_stat *stat, struct request *rq)
{
s64 now, value;
now = __blk_stat_time(ktime_to_ns(ktime_get()));
if (now < blk_stat_time(&rq->issue_stat))
return;
if (!__blk_stat_is_current(stat, now))
__blk_stat_init(stat, now);
value = now - blk_stat_time(&rq->issue_stat);
if (value > stat->max)
stat->max = value;
if (value < stat->min)
stat->min = value;
stat->min = min(stat->min, value);
stat->max = max(stat->max, value);
if (stat->batch + value < stat->batch ||
stat->nr_batch + 1 == BLK_RQ_STAT_BATCH)
@ -217,40 +78,172 @@ void blk_stat_add(struct blk_rq_stat *stat, struct request *rq)
stat->nr_batch++;
}
void blk_stat_clear(struct request_queue *q)
void blk_stat_add(struct request *rq)
{
if (q->mq_ops) {
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
int i, j;
struct request_queue *q = rq->q;
struct blk_stat_callback *cb;
struct blk_rq_stat *stat;
int bucket;
s64 now, value;
queue_for_each_hw_ctx(q, hctx, i) {
hctx_for_each_ctx(hctx, ctx, j) {
blk_stat_init(&ctx->stat[BLK_STAT_READ]);
blk_stat_init(&ctx->stat[BLK_STAT_WRITE]);
}
now = __blk_stat_time(ktime_to_ns(ktime_get()));
if (now < blk_stat_time(&rq->issue_stat))
return;
value = now - blk_stat_time(&rq->issue_stat);
blk_throtl_stat_add(rq, value);
rcu_read_lock();
list_for_each_entry_rcu(cb, &q->stats->callbacks, list) {
if (blk_stat_is_active(cb)) {
bucket = cb->bucket_fn(rq);
if (bucket < 0)
continue;
stat = &this_cpu_ptr(cb->cpu_stat)[bucket];
__blk_stat_add(stat, value);
}
} else {
blk_stat_init(&q->rq_stats[BLK_STAT_READ]);
blk_stat_init(&q->rq_stats[BLK_STAT_WRITE]);
}
rcu_read_unlock();
}
void blk_stat_set_issue_time(struct blk_issue_stat *stat)
static void blk_stat_timer_fn(unsigned long data)
{
stat->time = (stat->time & BLK_STAT_MASK) |
(ktime_to_ns(ktime_get()) & BLK_STAT_TIME_MASK);
}
struct blk_stat_callback *cb = (void *)data;
unsigned int bucket;
int cpu;
/*
* Enable stat tracking, return whether it was enabled
*/
bool blk_stat_enable(struct request_queue *q)
{
if (!test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
set_bit(QUEUE_FLAG_STATS, &q->queue_flags);
return false;
for (bucket = 0; bucket < cb->buckets; bucket++)
blk_stat_init(&cb->stat[bucket]);
for_each_online_cpu(cpu) {
struct blk_rq_stat *cpu_stat;
cpu_stat = per_cpu_ptr(cb->cpu_stat, cpu);
for (bucket = 0; bucket < cb->buckets; bucket++) {
blk_stat_sum(&cb->stat[bucket], &cpu_stat[bucket]);
blk_stat_init(&cpu_stat[bucket]);
}
}
return true;
cb->timer_fn(cb);
}
struct blk_stat_callback *
blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
int (*bucket_fn)(const struct request *),
unsigned int buckets, void *data)
{
struct blk_stat_callback *cb;
cb = kmalloc(sizeof(*cb), GFP_KERNEL);
if (!cb)
return NULL;
cb->stat = kmalloc_array(buckets, sizeof(struct blk_rq_stat),
GFP_KERNEL);
if (!cb->stat) {
kfree(cb);
return NULL;
}
cb->cpu_stat = __alloc_percpu(buckets * sizeof(struct blk_rq_stat),
__alignof__(struct blk_rq_stat));
if (!cb->cpu_stat) {
kfree(cb->stat);
kfree(cb);
return NULL;
}
cb->timer_fn = timer_fn;
cb->bucket_fn = bucket_fn;
cb->data = data;
cb->buckets = buckets;
setup_timer(&cb->timer, blk_stat_timer_fn, (unsigned long)cb);
return cb;
}
EXPORT_SYMBOL_GPL(blk_stat_alloc_callback);
void blk_stat_add_callback(struct request_queue *q,
struct blk_stat_callback *cb)
{
unsigned int bucket;
int cpu;
for_each_possible_cpu(cpu) {
struct blk_rq_stat *cpu_stat;
cpu_stat = per_cpu_ptr(cb->cpu_stat, cpu);
for (bucket = 0; bucket < cb->buckets; bucket++)
blk_stat_init(&cpu_stat[bucket]);
}
spin_lock(&q->stats->lock);
list_add_tail_rcu(&cb->list, &q->stats->callbacks);
set_bit(QUEUE_FLAG_STATS, &q->queue_flags);
spin_unlock(&q->stats->lock);
}
EXPORT_SYMBOL_GPL(blk_stat_add_callback);
void blk_stat_remove_callback(struct request_queue *q,
struct blk_stat_callback *cb)
{
spin_lock(&q->stats->lock);
list_del_rcu(&cb->list);
if (list_empty(&q->stats->callbacks) && !q->stats->enable_accounting)
clear_bit(QUEUE_FLAG_STATS, &q->queue_flags);
spin_unlock(&q->stats->lock);
del_timer_sync(&cb->timer);
}
EXPORT_SYMBOL_GPL(blk_stat_remove_callback);
static void blk_stat_free_callback_rcu(struct rcu_head *head)
{
struct blk_stat_callback *cb;
cb = container_of(head, struct blk_stat_callback, rcu);
free_percpu(cb->cpu_stat);
kfree(cb->stat);
kfree(cb);
}
void blk_stat_free_callback(struct blk_stat_callback *cb)
{
if (cb)
call_rcu(&cb->rcu, blk_stat_free_callback_rcu);
}
EXPORT_SYMBOL_GPL(blk_stat_free_callback);
void blk_stat_enable_accounting(struct request_queue *q)
{
spin_lock(&q->stats->lock);
q->stats->enable_accounting = true;
set_bit(QUEUE_FLAG_STATS, &q->queue_flags);
spin_unlock(&q->stats->lock);
}
struct blk_queue_stats *blk_alloc_queue_stats(void)
{
struct blk_queue_stats *stats;
stats = kmalloc(sizeof(*stats), GFP_KERNEL);
if (!stats)
return NULL;
INIT_LIST_HEAD(&stats->callbacks);
spin_lock_init(&stats->lock);
stats->enable_accounting = false;
return stats;
}
void blk_free_queue_stats(struct blk_queue_stats *stats)
{
if (!stats)
return;
WARN_ON(!list_empty(&stats->callbacks));
kfree(stats);
}


@ -1,33 +1,85 @@
#ifndef BLK_STAT_H
#define BLK_STAT_H
/*
* ~0.13s window as a power-of-2 (2^27 nsecs)
*/
#define BLK_STAT_NSEC 134217728ULL
#define BLK_STAT_NSEC_MASK ~(BLK_STAT_NSEC - 1)
#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/ktime.h>
#include <linux/rcupdate.h>
#include <linux/timer.h>
/*
* Upper 3 bits can be used elsewhere
* from upper:
* 3 bits: reserved for other usage
* 12 bits: size
* 49 bits: time
*/
#define BLK_STAT_RES_BITS 3
#define BLK_STAT_SHIFT (64 - BLK_STAT_RES_BITS)
#define BLK_STAT_TIME_MASK ((1ULL << BLK_STAT_SHIFT) - 1)
#define BLK_STAT_MASK ~BLK_STAT_TIME_MASK
#define BLK_STAT_SIZE_BITS 12
#define BLK_STAT_RES_SHIFT (64 - BLK_STAT_RES_BITS)
#define BLK_STAT_SIZE_SHIFT (BLK_STAT_RES_SHIFT - BLK_STAT_SIZE_BITS)
#define BLK_STAT_TIME_MASK ((1ULL << BLK_STAT_SIZE_SHIFT) - 1)
#define BLK_STAT_SIZE_MASK \
(((1ULL << BLK_STAT_SIZE_BITS) - 1) << BLK_STAT_SIZE_SHIFT)
#define BLK_STAT_RES_MASK (~((1ULL << BLK_STAT_RES_SHIFT) - 1))
enum {
BLK_STAT_READ = 0,
BLK_STAT_WRITE,
/**
* struct blk_stat_callback - Block statistics callback.
*
* A &struct blk_stat_callback is associated with a &struct request_queue. While
* @timer is active, that queue's request completion latencies are sorted into
* buckets by @bucket_fn and added to a per-cpu buffer, @cpu_stat. When the
* timer fires, @cpu_stat is flushed to @stat and @timer_fn is invoked.
*/
struct blk_stat_callback {
/*
* @list: RCU list of callbacks for a &struct request_queue.
*/
struct list_head list;
/**
* @timer: Timer for the next callback invocation.
*/
struct timer_list timer;
/**
* @cpu_stat: Per-cpu statistics buckets.
*/
struct blk_rq_stat __percpu *cpu_stat;
/**
* @bucket_fn: Given a request, returns which statistics bucket it
* should be accounted under. Return -1 for no bucket for this
* request.
*/
int (*bucket_fn)(const struct request *);
/**
* @buckets: Number of statistics buckets.
*/
unsigned int buckets;
/**
* @stat: Array of statistics buckets.
*/
struct blk_rq_stat *stat;
/**
* @timer_fn: Callback function.
*/
void (*timer_fn)(struct blk_stat_callback *);
/**
* @data: Private pointer for the user.
*/
void *data;
struct rcu_head rcu;
};
void blk_stat_add(struct blk_rq_stat *, struct request *);
void blk_hctx_stat_get(struct blk_mq_hw_ctx *, struct blk_rq_stat *);
void blk_queue_stat_get(struct request_queue *, struct blk_rq_stat *);
void blk_stat_clear(struct request_queue *);
void blk_stat_init(struct blk_rq_stat *);
bool blk_stat_is_current(struct blk_rq_stat *);
void blk_stat_set_issue_time(struct blk_issue_stat *);
bool blk_stat_enable(struct request_queue *);
struct blk_queue_stats *blk_alloc_queue_stats(void);
void blk_free_queue_stats(struct blk_queue_stats *);
void blk_stat_add(struct request *);
static inline u64 __blk_stat_time(u64 time)
{
@ -36,7 +88,117 @@ static inline u64 __blk_stat_time(u64 time)
static inline u64 blk_stat_time(struct blk_issue_stat *stat)
{
return __blk_stat_time(stat->time);
return __blk_stat_time(stat->stat);
}
static inline sector_t blk_capped_size(sector_t size)
{
return size & ((1ULL << BLK_STAT_SIZE_BITS) - 1);
}
static inline sector_t blk_stat_size(struct blk_issue_stat *stat)
{
return (stat->stat & BLK_STAT_SIZE_MASK) >> BLK_STAT_SIZE_SHIFT;
}
static inline void blk_stat_set_issue(struct blk_issue_stat *stat,
sector_t size)
{
stat->stat = (stat->stat & BLK_STAT_RES_MASK) |
(ktime_to_ns(ktime_get()) & BLK_STAT_TIME_MASK) |
(((u64)blk_capped_size(size)) << BLK_STAT_SIZE_SHIFT);
}
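
As a standalone sketch of the packing performed by blk_stat_set_issue() above, using assumed example values (the concrete numbers are illustrative only): with BLK_STAT_RES_BITS = 3, BLK_STAT_SIZE_BITS = 12 and BLK_STAT_SIZE_SHIFT = 49, a 16-sector request issued at now = 0x000123456789ABCD ns would be encoded as:

	u64 packed = 0;
	u64 now = 0x000123456789ABCDULL;	/* below 2^49, fits the time field */
	sector_t sectors = 16;			/* 8 KiB with 512-byte sectors */

	packed = (packed & BLK_STAT_RES_MASK) |		/* keep the 3 reserved bits */
		 (now & BLK_STAT_TIME_MASK) |		/* low 49 bits: issue time */
		 ((u64)(sectors & ((1ULL << BLK_STAT_SIZE_BITS) - 1))
					<< BLK_STAT_SIZE_SHIFT); /* next 12 bits: size */

	/* blk_stat_size()-style extraction now yields 16, and
	 * __blk_stat_time()-style masking yields now again. */

Note that blk_capped_size() is exactly the (sectors & ((1ULL << BLK_STAT_SIZE_BITS) - 1)) step, so sizes above 4095 sectors wrap within the 12-bit field.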
/* record time/size info in request but not add a callback */
void blk_stat_enable_accounting(struct request_queue *q);
/**
* blk_stat_alloc_callback() - Allocate a block statistics callback.
* @timer_fn: Timer callback function.
* @bucket_fn: Bucket callback function.
* @buckets: Number of statistics buckets.
* @data: Value for the @data field of the &struct blk_stat_callback.
*
* See &struct blk_stat_callback for details on the callback functions.
*
* Return: &struct blk_stat_callback on success or NULL on ENOMEM.
*/
struct blk_stat_callback *
blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
int (*bucket_fn)(const struct request *),
unsigned int buckets, void *data);
/**
* blk_stat_add_callback() - Add a block statistics callback to be run on a
* request queue.
* @q: The request queue.
* @cb: The callback.
*
* Note that a single &struct blk_stat_callback can only be added to a single
* &struct request_queue.
*/
void blk_stat_add_callback(struct request_queue *q,
struct blk_stat_callback *cb);
/**
* blk_stat_remove_callback() - Remove a block statistics callback from a
* request queue.
* @q: The request queue.
* @cb: The callback.
*
* When this returns, the callback is not running on any CPUs and will not be
* called again unless readded.
*/
void blk_stat_remove_callback(struct request_queue *q,
struct blk_stat_callback *cb);
/**
* blk_stat_free_callback() - Free a block statistics callback.
* @cb: The callback.
*
* @cb may be NULL, in which case this does nothing. If it is not NULL, @cb must
* not be associated with a request queue. I.e., if it was previously added with
* blk_stat_add_callback(), it must also have been removed since then with
* blk_stat_remove_callback().
*/
void blk_stat_free_callback(struct blk_stat_callback *cb);
/**
* blk_stat_is_active() - Check if a block statistics callback is currently
* gathering statistics.
* @cb: The callback.
*/
static inline bool blk_stat_is_active(struct blk_stat_callback *cb)
{
return timer_pending(&cb->timer);
}
/**
* blk_stat_activate_nsecs() - Gather block statistics during a time window in
* nanoseconds.
* @cb: The callback.
* @nsecs: Number of nanoseconds to gather statistics for.
*
* The timer callback will be called when the window expires.
*/
static inline void blk_stat_activate_nsecs(struct blk_stat_callback *cb,
u64 nsecs)
{
mod_timer(&cb->timer, jiffies + nsecs_to_jiffies(nsecs));
}
/**
* blk_stat_activate_msecs() - Gather block statistics during a time window in
* milliseconds.
* @cb: The callback.
* @msecs: Number of milliseconds to gather statistics for.
*
* The timer callback will be called when the window expires.
*/
static inline void blk_stat_activate_msecs(struct blk_stat_callback *cb,
unsigned int msecs)
{
mod_timer(&cb->timer, jiffies + msecs_to_jiffies(msecs));
}
#endif
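
For orientation, here is a minimal usage sketch of the callback API declared in this header, assuming a hypothetical consumer that only wants per-direction completion latencies over a ~100 ms window. example_bucket_fn, example_timer_fn and the surrounding setup/teardown context are invented for illustration; only the blk_stat_* calls are the real exported interface documented above.

	static int example_bucket_fn(const struct request *rq)
	{
		return rq_data_dir(rq);		/* bucket 0 = reads, bucket 1 = writes */
	}

	static void example_timer_fn(struct blk_stat_callback *cb)
	{
		/* by now the per-cpu buffers have been flushed into cb->stat[] */
		pr_info("reads: %llu samples, mean %llu ns\n",
			(unsigned long long)cb->stat[READ].nr_samples,
			(unsigned long long)cb->stat[READ].mean);
	}

	/* setup, e.g. from a scheduler or driver init path */
	cb = blk_stat_alloc_callback(example_timer_fn, example_bucket_fn, 2, NULL);
	if (!cb)
		return -ENOMEM;
	blk_stat_add_callback(q, cb);
	blk_stat_activate_msecs(cb, 100);	/* gather stats, fire after ~100 ms */

	/* teardown, once the consumer goes away */
	blk_stat_remove_callback(q, cb);
	blk_stat_free_callback(cb);

This mirrors how wbt_init() and the blk-mq poll statistics elsewhere in this series wire up their callbacks.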


@ -208,7 +208,7 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
{
return queue_var_show(queue_discard_zeroes_data(q), page);
return queue_var_show(0, page);
}
static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
@ -503,26 +503,6 @@ static ssize_t queue_dax_show(struct request_queue *q, char *page)
return queue_var_show(blk_queue_dax(q), page);
}
static ssize_t print_stat(char *page, struct blk_rq_stat *stat, const char *pre)
{
return sprintf(page, "%s samples=%llu, mean=%lld, min=%lld, max=%lld\n",
pre, (long long) stat->nr_samples,
(long long) stat->mean, (long long) stat->min,
(long long) stat->max);
}
static ssize_t queue_stats_show(struct request_queue *q, char *page)
{
struct blk_rq_stat stat[2];
ssize_t ret;
blk_queue_stat_get(q, stat);
ret = print_stat(page, &stat[BLK_STAT_READ], "read :");
ret += print_stat(page + ret, &stat[BLK_STAT_WRITE], "write:");
return ret;
}
static struct queue_sysfs_entry queue_requests_entry = {
.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
.show = queue_requests_show,
@ -691,17 +671,20 @@ static struct queue_sysfs_entry queue_dax_entry = {
.show = queue_dax_show,
};
static struct queue_sysfs_entry queue_stats_entry = {
.attr = {.name = "stats", .mode = S_IRUGO },
.show = queue_stats_show,
};
static struct queue_sysfs_entry queue_wb_lat_entry = {
.attr = {.name = "wbt_lat_usec", .mode = S_IRUGO | S_IWUSR },
.show = queue_wb_lat_show,
.store = queue_wb_lat_store,
};
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
static struct queue_sysfs_entry throtl_sample_time_entry = {
.attr = {.name = "throttle_sample_time", .mode = S_IRUGO | S_IWUSR },
.show = blk_throtl_sample_time_show,
.store = blk_throtl_sample_time_store,
};
#endif
static struct attribute *default_attrs[] = {
&queue_requests_entry.attr,
&queue_ra_entry.attr,
@ -733,9 +716,11 @@ static struct attribute *default_attrs[] = {
&queue_poll_entry.attr,
&queue_wc_entry.attr,
&queue_dax_entry.attr,
&queue_stats_entry.attr,
&queue_wb_lat_entry.attr,
&queue_poll_delay_entry.attr,
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
&throtl_sample_time_entry.attr,
#endif
NULL,
};
@ -810,7 +795,9 @@ static void blk_release_queue(struct kobject *kobj)
struct request_queue *q =
container_of(kobj, struct request_queue, kobj);
wbt_exit(q);
if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
blk_stat_remove_callback(q, q->poll_cb);
blk_stat_free_callback(q->poll_cb);
bdi_put(q->backing_dev_info);
blkcg_exit_queue(q);
@ -819,6 +806,8 @@ static void blk_release_queue(struct kobject *kobj)
elevator_exit(q, q->elevator);
}
blk_free_queue_stats(q->stats);
blk_exit_rl(&q->root_rl);
if (q->queue_tags)
@ -855,23 +844,6 @@ struct kobj_type blk_queue_ktype = {
.release = blk_release_queue,
};
static void blk_wb_init(struct request_queue *q)
{
#ifndef CONFIG_BLK_WBT_MQ
if (q->mq_ops)
return;
#endif
#ifndef CONFIG_BLK_WBT_SQ
if (q->request_fn)
return;
#endif
/*
* If this fails, we don't get throttling
*/
wbt_init(q);
}
int blk_register_queue(struct gendisk *disk)
{
int ret;
@ -881,6 +853,11 @@ int blk_register_queue(struct gendisk *disk)
if (WARN_ON(!q))
return -ENXIO;
WARN_ONCE(test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags),
"%s is registering an already registered queue\n",
kobject_name(&dev->kobj));
queue_flag_set_unlocked(QUEUE_FLAG_REGISTERED, q);
/*
* SCSI probing may synchronously create and destroy a lot of
* request_queues for non-existent devices. Shutting down a fully
@ -900,9 +877,6 @@ int blk_register_queue(struct gendisk *disk)
if (ret)
return ret;
if (q->mq_ops)
blk_mq_register_dev(dev, q);
/* Prevent changes through sysfs until registration is completed. */
mutex_lock(&q->sysfs_lock);
@ -912,9 +886,14 @@ int blk_register_queue(struct gendisk *disk)
goto unlock;
}
if (q->mq_ops)
__blk_mq_register_dev(dev, q);
kobject_uevent(&q->kobj, KOBJ_ADD);
blk_wb_init(q);
wbt_enable_default(q);
blk_throtl_register_queue(q);
if (q->request_fn || (q->mq_ops && q->elevator)) {
ret = elv_register_queue(q);
@ -939,6 +918,11 @@ void blk_unregister_queue(struct gendisk *disk)
if (WARN_ON(!q))
return;
queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q);
wbt_exit(q);
if (q->mq_ops)
blk_mq_unregister_dev(disk_to_dev(disk), q);

File diff suppressed because it is too large.


@ -89,7 +89,6 @@ static void blk_rq_timed_out(struct request *req)
ret = q->rq_timed_out_fn(req);
switch (ret) {
case BLK_EH_HANDLED:
/* Can we use req->errors here? */
__blk_complete_request(req);
break;
case BLK_EH_RESET_TIMER:

View File

@ -255,8 +255,8 @@ static inline bool stat_sample_valid(struct blk_rq_stat *stat)
* that it's writes impacting us, and not just some sole read on
* a device that is in a lower power state.
*/
return stat[BLK_STAT_READ].nr_samples >= 1 &&
stat[BLK_STAT_WRITE].nr_samples >= RWB_MIN_WRITE_SAMPLES;
return (stat[READ].nr_samples >= 1 &&
stat[WRITE].nr_samples >= RWB_MIN_WRITE_SAMPLES);
}
static u64 rwb_sync_issue_lat(struct rq_wb *rwb)
@ -277,7 +277,7 @@ enum {
LAT_EXCEEDED,
};
static int __latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
static int latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
{
struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
u64 thislat;
@ -293,7 +293,7 @@ static int __latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
*/
thislat = rwb_sync_issue_lat(rwb);
if (thislat > rwb->cur_win_nsec ||
(thislat > rwb->min_lat_nsec && !stat[BLK_STAT_READ].nr_samples)) {
(thislat > rwb->min_lat_nsec && !stat[READ].nr_samples)) {
trace_wbt_lat(bdi, thislat);
return LAT_EXCEEDED;
}
@ -308,8 +308,8 @@ static int __latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
* waited or still has writes in flights, consider us doing
* just writes as well.
*/
if ((stat[BLK_STAT_WRITE].nr_samples && blk_stat_is_current(stat)) ||
wb_recent_wait(rwb) || wbt_inflight(rwb))
if (stat[WRITE].nr_samples || wb_recent_wait(rwb) ||
wbt_inflight(rwb))
return LAT_UNKNOWN_WRITES;
return LAT_UNKNOWN;
}
@ -317,8 +317,8 @@ static int __latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
/*
* If the 'min' latency exceeds our target, step down.
*/
if (stat[BLK_STAT_READ].min > rwb->min_lat_nsec) {
trace_wbt_lat(bdi, stat[BLK_STAT_READ].min);
if (stat[READ].min > rwb->min_lat_nsec) {
trace_wbt_lat(bdi, stat[READ].min);
trace_wbt_stat(bdi, stat);
return LAT_EXCEEDED;
}
@ -329,14 +329,6 @@ static int __latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
return LAT_OK;
}
static int latency_exceeded(struct rq_wb *rwb)
{
struct blk_rq_stat stat[2];
blk_queue_stat_get(rwb->queue, stat);
return __latency_exceeded(rwb, stat);
}
static void rwb_trace_step(struct rq_wb *rwb, const char *msg)
{
struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
@ -355,7 +347,6 @@ static void scale_up(struct rq_wb *rwb)
rwb->scale_step--;
rwb->unknown_cnt = 0;
blk_stat_clear(rwb->queue);
rwb->scaled_max = calc_wb_limits(rwb);
@ -385,15 +376,12 @@ static void scale_down(struct rq_wb *rwb, bool hard_throttle)
rwb->scaled_max = false;
rwb->unknown_cnt = 0;
blk_stat_clear(rwb->queue);
calc_wb_limits(rwb);
rwb_trace_step(rwb, "step down");
}
static void rwb_arm_timer(struct rq_wb *rwb)
{
unsigned long expires;
if (rwb->scale_step > 0) {
/*
* We should speed this up, using some variant of a fast
@ -411,17 +399,16 @@ static void rwb_arm_timer(struct rq_wb *rwb)
rwb->cur_win_nsec = rwb->win_nsec;
}
expires = jiffies + nsecs_to_jiffies(rwb->cur_win_nsec);
mod_timer(&rwb->window_timer, expires);
blk_stat_activate_nsecs(rwb->cb, rwb->cur_win_nsec);
}
static void wb_timer_fn(unsigned long data)
static void wb_timer_fn(struct blk_stat_callback *cb)
{
struct rq_wb *rwb = (struct rq_wb *) data;
struct rq_wb *rwb = cb->data;
unsigned int inflight = wbt_inflight(rwb);
int status;
status = latency_exceeded(rwb);
status = latency_exceeded(rwb, cb->stat);
trace_wbt_timer(rwb->queue->backing_dev_info, status, rwb->scale_step,
inflight);
@ -614,7 +601,7 @@ enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock)
__wbt_wait(rwb, bio->bi_opf, lock);
if (!timer_pending(&rwb->window_timer))
if (!blk_stat_is_active(rwb->cb))
rwb_arm_timer(rwb);
if (current_is_kswapd())
@ -666,22 +653,37 @@ void wbt_set_write_cache(struct rq_wb *rwb, bool write_cache_on)
rwb->wc = write_cache_on;
}
/*
* Disable wbt, if enabled by default. Only called from CFQ, if we have
* cgroups enabled
/*
* Disable wbt, if enabled by default. Only called from CFQ.
*/
void wbt_disable_default(struct request_queue *q)
{
struct rq_wb *rwb = q->rq_wb;
if (rwb && rwb->enable_state == WBT_STATE_ON_DEFAULT) {
del_timer_sync(&rwb->window_timer);
rwb->win_nsec = rwb->min_lat_nsec = 0;
wbt_update_limits(rwb);
}
if (rwb && rwb->enable_state == WBT_STATE_ON_DEFAULT)
wbt_exit(q);
}
EXPORT_SYMBOL_GPL(wbt_disable_default);
/*
* Enable wbt if defaults are configured that way
*/
void wbt_enable_default(struct request_queue *q)
{
/* Throttling already enabled? */
if (q->rq_wb)
return;
/* Queue not registered? Maybe shutting down... */
if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
return;
if ((q->mq_ops && IS_ENABLED(CONFIG_BLK_WBT_MQ)) ||
(q->request_fn && IS_ENABLED(CONFIG_BLK_WBT_SQ)))
wbt_init(q);
}
EXPORT_SYMBOL_GPL(wbt_enable_default);
u64 wbt_default_latency_nsec(struct request_queue *q)
{
/*
@ -694,29 +696,33 @@ u64 wbt_default_latency_nsec(struct request_queue *q)
return 75000000ULL;
}
static int wbt_data_dir(const struct request *rq)
{
return rq_data_dir(rq);
}
int wbt_init(struct request_queue *q)
{
struct rq_wb *rwb;
int i;
/*
* For now, we depend on the stats window being larger than
* our monitoring window. Ensure that this isn't inadvertently
* violated.
*/
BUILD_BUG_ON(RWB_WINDOW_NSEC > BLK_STAT_NSEC);
BUILD_BUG_ON(WBT_NR_BITS > BLK_STAT_RES_BITS);
rwb = kzalloc(sizeof(*rwb), GFP_KERNEL);
if (!rwb)
return -ENOMEM;
rwb->cb = blk_stat_alloc_callback(wb_timer_fn, wbt_data_dir, 2, rwb);
if (!rwb->cb) {
kfree(rwb);
return -ENOMEM;
}
for (i = 0; i < WBT_NUM_RWQ; i++) {
atomic_set(&rwb->rq_wait[i].inflight, 0);
init_waitqueue_head(&rwb->rq_wait[i].wait);
}
setup_timer(&rwb->window_timer, wb_timer_fn, (unsigned long) rwb);
rwb->wc = 1;
rwb->queue_depth = RWB_DEF_DEPTH;
rwb->last_comp = rwb->last_issue = jiffies;
@ -726,10 +732,10 @@ int wbt_init(struct request_queue *q)
wbt_update_limits(rwb);
/*
* Assign rwb, and turn on stats tracking for this queue
* Assign rwb and add the stats callback.
*/
q->rq_wb = rwb;
blk_stat_enable(q);
blk_stat_add_callback(q, rwb->cb);
rwb->min_lat_nsec = wbt_default_latency_nsec(q);
@ -744,7 +750,8 @@ void wbt_exit(struct request_queue *q)
struct rq_wb *rwb = q->rq_wb;
if (rwb) {
del_timer_sync(&rwb->window_timer);
blk_stat_remove_callback(q, rwb->cb);
blk_stat_free_callback(rwb->cb);
q->rq_wb = NULL;
kfree(rwb);
}


@ -32,27 +32,27 @@ enum {
static inline void wbt_clear_state(struct blk_issue_stat *stat)
{
stat->time &= BLK_STAT_TIME_MASK;
stat->stat &= ~BLK_STAT_RES_MASK;
}
static inline enum wbt_flags wbt_stat_to_mask(struct blk_issue_stat *stat)
{
return (stat->time & BLK_STAT_MASK) >> BLK_STAT_SHIFT;
return (stat->stat & BLK_STAT_RES_MASK) >> BLK_STAT_RES_SHIFT;
}
static inline void wbt_track(struct blk_issue_stat *stat, enum wbt_flags wb_acct)
{
stat->time |= ((u64) wb_acct) << BLK_STAT_SHIFT;
stat->stat |= ((u64) wb_acct) << BLK_STAT_RES_SHIFT;
}
static inline bool wbt_is_tracked(struct blk_issue_stat *stat)
{
return (stat->time >> BLK_STAT_SHIFT) & WBT_TRACKED;
return (stat->stat >> BLK_STAT_RES_SHIFT) & WBT_TRACKED;
}
static inline bool wbt_is_read(struct blk_issue_stat *stat)
{
return (stat->time >> BLK_STAT_SHIFT) & WBT_READ;
return (stat->stat >> BLK_STAT_RES_SHIFT) & WBT_READ;
}
struct rq_wait {
@ -81,7 +81,7 @@ struct rq_wb {
u64 win_nsec; /* default window size */
u64 cur_win_nsec; /* current window size */
struct timer_list window_timer;
struct blk_stat_callback *cb;
s64 sync_issue;
void *sync_cookie;
@ -117,6 +117,7 @@ void wbt_update_limits(struct rq_wb *);
void wbt_requeue(struct rq_wb *, struct blk_issue_stat *);
void wbt_issue(struct rq_wb *, struct blk_issue_stat *);
void wbt_disable_default(struct request_queue *);
void wbt_enable_default(struct request_queue *);
void wbt_set_queue_depth(struct rq_wb *, unsigned int);
void wbt_set_write_cache(struct rq_wb *, bool);
@ -155,6 +156,9 @@ static inline void wbt_issue(struct rq_wb *rwb, struct blk_issue_stat *stat)
static inline void wbt_disable_default(struct request_queue *q)
{
}
static inline void wbt_enable_default(struct request_queue *q)
{
}
static inline void wbt_set_queue_depth(struct rq_wb *rwb, unsigned int depth)
{
}


@ -60,15 +60,12 @@ void blk_free_flush_queue(struct blk_flush_queue *q);
int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask);
void blk_exit_rl(struct request_list *rl);
void init_request_from_bio(struct request *req, struct bio *bio);
void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio);
void blk_queue_bypass_start(struct request_queue *q);
void blk_queue_bypass_end(struct request_queue *q);
void blk_dequeue_request(struct request *rq);
void __blk_queue_free_tags(struct request_queue *q);
bool __blk_end_bidi_request(struct request *rq, int error,
unsigned int nr_bytes, unsigned int bidi_bytes);
void blk_freeze_queue(struct request_queue *q);
static inline void blk_queue_enter_live(struct request_queue *q)
@ -319,10 +316,22 @@ static inline struct io_context *create_io_context(gfp_t gfp_mask, int node)
extern void blk_throtl_drain(struct request_queue *q);
extern int blk_throtl_init(struct request_queue *q);
extern void blk_throtl_exit(struct request_queue *q);
extern void blk_throtl_register_queue(struct request_queue *q);
#else /* CONFIG_BLK_DEV_THROTTLING */
static inline void blk_throtl_drain(struct request_queue *q) { }
static inline int blk_throtl_init(struct request_queue *q) { return 0; }
static inline void blk_throtl_exit(struct request_queue *q) { }
static inline void blk_throtl_register_queue(struct request_queue *q) { }
#endif /* CONFIG_BLK_DEV_THROTTLING */
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
extern ssize_t blk_throtl_sample_time_show(struct request_queue *q, char *page);
extern ssize_t blk_throtl_sample_time_store(struct request_queue *q,
const char *page, size_t count);
extern void blk_throtl_bio_endio(struct bio *bio);
extern void blk_throtl_stat_add(struct request *rq, u64 time);
#else
static inline void blk_throtl_bio_endio(struct bio *bio) { }
static inline void blk_throtl_stat_add(struct request *rq, u64 time) { }
#endif
#endif /* BLK_INTERNAL_H */


@ -37,7 +37,7 @@ static void bsg_destroy_job(struct kref *kref)
struct bsg_job *job = container_of(kref, struct bsg_job, kref);
struct request *rq = job->req;
blk_end_request_all(rq, rq->errors);
blk_end_request_all(rq, scsi_req(rq)->result);
put_device(job->dev); /* release reference for the request */
@ -74,7 +74,7 @@ void bsg_job_done(struct bsg_job *job, int result,
struct scsi_request *rq = scsi_req(req);
int err;
err = job->req->errors = result;
err = scsi_req(job->req)->result = result;
if (err < 0)
/* we're only returning the result field in the reply */
rq->sense_len = sizeof(u32);
@ -177,7 +177,7 @@ static int bsg_create_job(struct device *dev, struct request *req)
* @q: request queue to manage
*
* On error the create_bsg_job function should return a -Exyz error value
* that will be set to the req->errors.
* that will be set to ->result.
*
* Drivers/subsys should pass this to the queue init function.
*/
@ -201,7 +201,7 @@ static void bsg_request_fn(struct request_queue *q)
ret = bsg_create_job(dev, req);
if (ret) {
req->errors = ret;
scsi_req(req)->result = ret;
blk_end_request_all(req, ret);
spin_lock_irq(q->queue_lock);
continue;


@ -391,13 +391,13 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
struct scsi_request *req = scsi_req(rq);
int ret = 0;
dprintk("rq %p bio %p 0x%x\n", rq, bio, rq->errors);
dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result);
/*
* fill in all the output members
*/
hdr->device_status = rq->errors & 0xff;
hdr->transport_status = host_byte(rq->errors);
hdr->driver_status = driver_byte(rq->errors);
hdr->device_status = req->result & 0xff;
hdr->transport_status = host_byte(req->result);
hdr->driver_status = driver_byte(req->result);
hdr->info = 0;
if (hdr->device_status || hdr->transport_status || hdr->driver_status)
hdr->info |= SG_INFO_CHECK;
@ -431,8 +431,8 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
* just a protocol response (i.e. non negative), that gets
* processed above.
*/
if (!ret && rq->errors < 0)
ret = rq->errors;
if (!ret && req->result < 0)
ret = req->result;
blk_rq_unmap_user(bio);
scsi_req_free_cmd(req);
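blk_complete_sgv4_hdr_rq() above now decodes scsi_req(rq)->result using the traditional SCSI result-word layout: status byte in the low byte, with message, host and driver bytes above it. A self-contained sketch of that unpacking, with illustrative accessor names and a hypothetical result value:

#include <stdint.h>
#include <stdio.h>

/* Classic SCSI result word: driver | host | message | status, one byte each. */
static inline uint8_t status_of(uint32_t result) { return result & 0xff; }
static inline uint8_t msg_of(uint32_t result)    { return (result >> 8) & 0xff; }
static inline uint8_t host_of(uint32_t result)   { return (result >> 16) & 0xff; }
static inline uint8_t driver_of(uint32_t result) { return (result >> 24) & 0xff; }

int main(void)
{
        uint32_t result = (0x08u << 24) | (0x02u << 16) | 0x02u;   /* hypothetical */

        printf("status=0x%02x msg=0x%02x host=0x%02x driver=0x%02x\n",
               status_of(result), msg_of(result), host_of(result), driver_of(result));
        return 0;
}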


@ -3761,16 +3761,14 @@ static void cfq_init_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq,
}
#ifdef CONFIG_CFQ_GROUP_IOSCHED
static bool check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
{
struct cfq_data *cfqd = cic_to_cfqd(cic);
struct cfq_queue *cfqq;
uint64_t serial_nr;
bool nonroot_cg;
rcu_read_lock();
serial_nr = bio_blkcg(bio)->css.serial_nr;
nonroot_cg = bio_blkcg(bio) != &blkcg_root;
rcu_read_unlock();
/*
@ -3778,7 +3776,7 @@ static bool check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
* spuriously on a newly created cic but there's no harm.
*/
if (unlikely(!cfqd) || likely(cic->blkcg_serial_nr == serial_nr))
return nonroot_cg;
return;
/*
* Drop reference to queues. New queues will be assigned in new
@ -3799,12 +3797,10 @@ static bool check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
}
cic->blkcg_serial_nr = serial_nr;
return nonroot_cg;
}
#else
static inline bool check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
static inline void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
{
return false;
}
#endif /* CONFIG_CFQ_GROUP_IOSCHED */
@ -4449,12 +4445,11 @@ cfq_set_request(struct request_queue *q, struct request *rq, struct bio *bio,
const int rw = rq_data_dir(rq);
const bool is_sync = rq_is_sync(rq);
struct cfq_queue *cfqq;
bool disable_wbt;
spin_lock_irq(q->queue_lock);
check_ioprio_changed(cic, bio);
disable_wbt = check_blkcg_changed(cic, bio);
check_blkcg_changed(cic, bio);
new_queue:
cfqq = cic_to_cfqq(cic, is_sync);
if (!cfqq || cfqq == &cfqd->oom_cfqq) {
@ -4491,9 +4486,6 @@ cfq_set_request(struct request_queue *q, struct request *rq, struct bio *bio,
rq->elv.priv[1] = cfqq->cfqg;
spin_unlock_irq(q->queue_lock);
if (disable_wbt)
wbt_disable_default(q);
return 0;
}
@ -4706,6 +4698,7 @@ static void cfq_registered_queue(struct request_queue *q)
*/
if (blk_queue_nonrot(q))
cfqd->cfq_slice_idle = 0;
wbt_disable_default(q);
}
/*


@ -685,7 +685,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
case BLKALIGNOFF:
return compat_put_int(arg, bdev_alignment_offset(bdev));
case BLKDISCARDZEROES:
return compat_put_uint(arg, bdev_discard_zeroes_data(bdev));
return compat_put_uint(arg, 0);
case BLKFLSBUF:
case BLKROSET:
case BLKDISCARD:


@ -41,6 +41,7 @@
#include "blk.h"
#include "blk-mq-sched.h"
#include "blk-wbt.h"
static DEFINE_SPINLOCK(elv_list_lock);
static LIST_HEAD(elv_list);
@ -877,6 +878,8 @@ void elv_unregister_queue(struct request_queue *q)
kobject_uevent(&e->kobj, KOBJ_REMOVE);
kobject_del(&e->kobj);
e->registered = 0;
/* Re-enable throttling in case elevator disabled it */
wbt_enable_default(q);
}
}
EXPORT_SYMBOL(elv_unregister_queue);


@ -1060,8 +1060,19 @@ static struct attribute *disk_attrs[] = {
NULL
};
static umode_t disk_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = container_of(kobj, typeof(*dev), kobj);
struct gendisk *disk = dev_to_disk(dev);
if (a == &dev_attr_badblocks.attr && !disk->bb)
return 0;
return a->mode;
}
static struct attribute_group disk_attr_group = {
.attrs = disk_attrs,
.is_visible = disk_visible,
};
static const struct attribute_group *disk_attr_groups[] = {
@ -1352,7 +1363,7 @@ struct kobject *get_disk(struct gendisk *disk)
owner = disk->fops->owner;
if (owner && !try_module_get(owner))
return NULL;
kobj = kobject_get(&disk_to_dev(disk)->kobj);
kobj = kobject_get_unless_zero(&disk_to_dev(disk)->kobj);
if (kobj == NULL) {
module_put(owner);
return NULL;
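disk_visible() above is a standard sysfs is_visible hook: returning 0 suppresses an attribute (here badblocks when the disk has no badblocks data), while returning a->mode keeps it. A minimal userspace analogue of that filtering, with illustrative types and names:

#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct attr {
        const char *name;
        unsigned int mode;
};

struct disk {
        void *badblocks;        /* NULL when the disk has no badblocks support */
};

/* Mirror of the is_visible idea: a returned mode of 0 means "do not create". */
static unsigned int visible_mode(const struct disk *d, const struct attr *a)
{
        if (!strcmp(a->name, "badblocks") && !d->badblocks)
                return 0;
        return a->mode;
}

int main(void)
{
        struct attr attrs[] = { { "size", 0444 }, { "badblocks", 0644 } };
        struct disk plain = { .badblocks = NULL };
        size_t i;

        for (i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++)
                printf("%s -> %o\n", attrs[i].name, visible_mode(&plain, &attrs[i]));
        return 0;
}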


@ -255,7 +255,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
truncate_inode_pages_range(mapping, start, end);
return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
false);
BLKDEV_ZERO_NOUNMAP);
}
static int put_ushort(unsigned long arg, unsigned short val)
@ -547,7 +547,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
case BLKALIGNOFF:
return put_int(arg, bdev_alignment_offset(bdev));
case BLKDISCARDZEROES:
return put_uint(arg, bdev_discard_zeroes_data(bdev));
return put_uint(arg, 0);
case BLKSECTGET:
max_sectors = min_t(unsigned int, USHRT_MAX,
queue_max_sectors(bdev_get_queue(bdev)));


@ -163,22 +163,12 @@ static int get_task_ioprio(struct task_struct *p)
int ioprio_best(unsigned short aprio, unsigned short bprio)
{
unsigned short aclass;
unsigned short bclass;
if (!ioprio_valid(aprio))
aprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
if (!ioprio_valid(bprio))
bprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
aclass = IOPRIO_PRIO_CLASS(aprio);
bclass = IOPRIO_PRIO_CLASS(bprio);
if (aclass == bclass)
return min(aprio, bprio);
if (aclass > bclass)
return bprio;
else
return aprio;
return min(aprio, bprio);
}
SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
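The simplified ioprio_best() above works because an I/O priority value keeps the class in its high bits, so once invalid priorities are normalized a plain min() already prefers the stronger class (RT over BE over IDLE) and, within a class, the numerically lower level. A standalone sketch assuming the usual encoding (class shifted left by 13 bits):

#include <stdio.h>

#define IOPRIO_CLASS_SHIFT      13
#define IOPRIO_PRIO_VALUE(c, d) (((c) << IOPRIO_CLASS_SHIFT) | (d))
#define IOPRIO_PRIO_CLASS(p)    ((p) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_NORM             4

enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };

static int ioprio_valid(unsigned short prio)
{
        return IOPRIO_PRIO_CLASS(prio) != IOPRIO_CLASS_NONE;
}

static unsigned short best(unsigned short aprio, unsigned short bprio)
{
        if (!ioprio_valid(aprio))
                aprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
        if (!ioprio_valid(bprio))
                bprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);

        /* Lower class wins; within the same class, the lower level wins. */
        return aprio < bprio ? aprio : bprio;
}

int main(void)
{
        unsigned short rt = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 2);
        unsigned short be = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 0);
        unsigned short b = best(rt, be);

        printf("best=0x%x (class %d)\n", (unsigned)b, IOPRIO_PRIO_CLASS(b));
        return 0;
}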

block/kyber-iosched.c (new file, 719 lines)

@ -0,0 +1,719 @@
/*
* The Kyber I/O scheduler. Controls latency by throttling queue depths using
* scalable techniques.
*
* Copyright (C) 2017 Facebook
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <https://www.gnu.org/licenses/>.
*/
#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/elevator.h>
#include <linux/module.h>
#include <linux/sbitmap.h>
#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-sched.h"
#include "blk-mq-tag.h"
#include "blk-stat.h"
/* Scheduling domains. */
enum {
KYBER_READ,
KYBER_SYNC_WRITE,
KYBER_OTHER, /* Async writes, discard, etc. */
KYBER_NUM_DOMAINS,
};
enum {
KYBER_MIN_DEPTH = 256,
/*
* In order to prevent starvation of synchronous requests by a flood of
* asynchronous requests, we reserve 25% of requests for synchronous
* operations.
*/
KYBER_ASYNC_PERCENT = 75,
};
/*
* Initial device-wide depths for each scheduling domain.
*
* Even for fast devices with lots of tags like NVMe, you can saturate
* the device with only a fraction of the maximum possible queue depth.
* So, we cap these to a reasonable value.
*/
static const unsigned int kyber_depth[] = {
[KYBER_READ] = 256,
[KYBER_SYNC_WRITE] = 128,
[KYBER_OTHER] = 64,
};
/*
* Scheduling domain batch sizes. We favor reads.
*/
static const unsigned int kyber_batch_size[] = {
[KYBER_READ] = 16,
[KYBER_SYNC_WRITE] = 8,
[KYBER_OTHER] = 8,
};
struct kyber_queue_data {
struct request_queue *q;
struct blk_stat_callback *cb;
/*
* The device is divided into multiple scheduling domains based on the
* request type. Each domain has a fixed number of in-flight requests of
* that type device-wide, limited by these tokens.
*/
struct sbitmap_queue domain_tokens[KYBER_NUM_DOMAINS];
/*
* Async request percentage, converted to per-word depth for
* sbitmap_get_shallow().
*/
unsigned int async_depth;
/* Target latencies in nanoseconds. */
u64 read_lat_nsec, write_lat_nsec;
};
struct kyber_hctx_data {
spinlock_t lock;
struct list_head rqs[KYBER_NUM_DOMAINS];
unsigned int cur_domain;
unsigned int batching;
wait_queue_t domain_wait[KYBER_NUM_DOMAINS];
atomic_t wait_index[KYBER_NUM_DOMAINS];
};
static int rq_sched_domain(const struct request *rq)
{
unsigned int op = rq->cmd_flags;
if ((op & REQ_OP_MASK) == REQ_OP_READ)
return KYBER_READ;
else if ((op & REQ_OP_MASK) == REQ_OP_WRITE && op_is_sync(op))
return KYBER_SYNC_WRITE;
else
return KYBER_OTHER;
}
enum {
NONE = 0,
GOOD = 1,
GREAT = 2,
BAD = -1,
AWFUL = -2,
};
#define IS_GOOD(status) ((status) > 0)
#define IS_BAD(status) ((status) < 0)
static int kyber_lat_status(struct blk_stat_callback *cb,
unsigned int sched_domain, u64 target)
{
u64 latency;
if (!cb->stat[sched_domain].nr_samples)
return NONE;
latency = cb->stat[sched_domain].mean;
if (latency >= 2 * target)
return AWFUL;
else if (latency > target)
return BAD;
else if (latency <= target / 2)
return GREAT;
else /* (latency <= target) */
return GOOD;
}
/*
* Adjust the read or synchronous write depth given the status of reads and
* writes. The goal is that the latencies of the two domains are fair (i.e., if
* one is good, then the other is good).
*/
static void kyber_adjust_rw_depth(struct kyber_queue_data *kqd,
unsigned int sched_domain, int this_status,
int other_status)
{
unsigned int orig_depth, depth;
/*
* If this domain had no samples, or reads and writes are both good or
* both bad, don't adjust the depth.
*/
if (this_status == NONE ||
(IS_GOOD(this_status) && IS_GOOD(other_status)) ||
(IS_BAD(this_status) && IS_BAD(other_status)))
return;
orig_depth = depth = kqd->domain_tokens[sched_domain].sb.depth;
if (other_status == NONE) {
depth++;
} else {
switch (this_status) {
case GOOD:
if (other_status == AWFUL)
depth -= max(depth / 4, 1U);
else
depth -= max(depth / 8, 1U);
break;
case GREAT:
if (other_status == AWFUL)
depth /= 2;
else
depth -= max(depth / 4, 1U);
break;
case BAD:
depth++;
break;
case AWFUL:
if (other_status == GREAT)
depth += 2;
else
depth++;
break;
}
}
depth = clamp(depth, 1U, kyber_depth[sched_domain]);
if (depth != orig_depth)
sbitmap_queue_resize(&kqd->domain_tokens[sched_domain], depth);
}
/*
* Adjust the depth of other requests given the status of reads and synchronous
* writes. As long as either domain is doing fine, we don't throttle, but if
* both domains are doing badly, we throttle heavily.
*/
static void kyber_adjust_other_depth(struct kyber_queue_data *kqd,
int read_status, int write_status,
bool have_samples)
{
unsigned int orig_depth, depth;
int status;
orig_depth = depth = kqd->domain_tokens[KYBER_OTHER].sb.depth;
if (read_status == NONE && write_status == NONE) {
depth += 2;
} else if (have_samples) {
if (read_status == NONE)
status = write_status;
else if (write_status == NONE)
status = read_status;
else
status = max(read_status, write_status);
switch (status) {
case GREAT:
depth += 2;
break;
case GOOD:
depth++;
break;
case BAD:
depth -= max(depth / 4, 1U);
break;
case AWFUL:
depth /= 2;
break;
}
}
depth = clamp(depth, 1U, kyber_depth[KYBER_OTHER]);
if (depth != orig_depth)
sbitmap_queue_resize(&kqd->domain_tokens[KYBER_OTHER], depth);
}
/*
* Apply heuristics for limiting queue depths based on gathered latency
* statistics.
*/
static void kyber_stat_timer_fn(struct blk_stat_callback *cb)
{
struct kyber_queue_data *kqd = cb->data;
int read_status, write_status;
read_status = kyber_lat_status(cb, KYBER_READ, kqd->read_lat_nsec);
write_status = kyber_lat_status(cb, KYBER_SYNC_WRITE, kqd->write_lat_nsec);
kyber_adjust_rw_depth(kqd, KYBER_READ, read_status, write_status);
kyber_adjust_rw_depth(kqd, KYBER_SYNC_WRITE, write_status, read_status);
kyber_adjust_other_depth(kqd, read_status, write_status,
cb->stat[KYBER_OTHER].nr_samples != 0);
/*
* Continue monitoring latencies if we aren't hitting the targets or
* we're still throttling other requests.
*/
if (!blk_stat_is_active(kqd->cb) &&
((IS_BAD(read_status) || IS_BAD(write_status) ||
kqd->domain_tokens[KYBER_OTHER].sb.depth < kyber_depth[KYBER_OTHER])))
blk_stat_activate_msecs(kqd->cb, 100);
}
static unsigned int kyber_sched_tags_shift(struct kyber_queue_data *kqd)
{
/*
* All of the hardware queues have the same depth, so we can just grab
* the shift of the first one.
*/
return kqd->q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
}
static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
{
struct kyber_queue_data *kqd;
unsigned int max_tokens;
unsigned int shift;
int ret = -ENOMEM;
int i;
kqd = kmalloc_node(sizeof(*kqd), GFP_KERNEL, q->node);
if (!kqd)
goto err;
kqd->q = q;
kqd->cb = blk_stat_alloc_callback(kyber_stat_timer_fn, rq_sched_domain,
KYBER_NUM_DOMAINS, kqd);
if (!kqd->cb)
goto err_kqd;
/*
* The maximum number of tokens for any scheduling domain is at least
* the queue depth of a single hardware queue. If the hardware doesn't
* have many tags, still provide a reasonable number.
*/
max_tokens = max_t(unsigned int, q->tag_set->queue_depth,
KYBER_MIN_DEPTH);
for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
WARN_ON(!kyber_depth[i]);
WARN_ON(!kyber_batch_size[i]);
ret = sbitmap_queue_init_node(&kqd->domain_tokens[i],
max_tokens, -1, false, GFP_KERNEL,
q->node);
if (ret) {
while (--i >= 0)
sbitmap_queue_free(&kqd->domain_tokens[i]);
goto err_cb;
}
sbitmap_queue_resize(&kqd->domain_tokens[i], kyber_depth[i]);
}
shift = kyber_sched_tags_shift(kqd);
kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
kqd->read_lat_nsec = 2000000ULL;
kqd->write_lat_nsec = 10000000ULL;
return kqd;
err_cb:
blk_stat_free_callback(kqd->cb);
err_kqd:
kfree(kqd);
err:
return ERR_PTR(ret);
}
static int kyber_init_sched(struct request_queue *q, struct elevator_type *e)
{
struct kyber_queue_data *kqd;
struct elevator_queue *eq;
eq = elevator_alloc(q, e);
if (!eq)
return -ENOMEM;
kqd = kyber_queue_data_alloc(q);
if (IS_ERR(kqd)) {
kobject_put(&eq->kobj);
return PTR_ERR(kqd);
}
eq->elevator_data = kqd;
q->elevator = eq;
blk_stat_add_callback(q, kqd->cb);
return 0;
}
static void kyber_exit_sched(struct elevator_queue *e)
{
struct kyber_queue_data *kqd = e->elevator_data;
struct request_queue *q = kqd->q;
int i;
blk_stat_remove_callback(q, kqd->cb);
for (i = 0; i < KYBER_NUM_DOMAINS; i++)
sbitmap_queue_free(&kqd->domain_tokens[i]);
blk_stat_free_callback(kqd->cb);
kfree(kqd);
}
static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
{
struct kyber_hctx_data *khd;
int i;
khd = kmalloc_node(sizeof(*khd), GFP_KERNEL, hctx->numa_node);
if (!khd)
return -ENOMEM;
spin_lock_init(&khd->lock);
for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
INIT_LIST_HEAD(&khd->rqs[i]);
INIT_LIST_HEAD(&khd->domain_wait[i].task_list);
atomic_set(&khd->wait_index[i], 0);
}
khd->cur_domain = 0;
khd->batching = 0;
hctx->sched_data = khd;
return 0;
}
static void kyber_exit_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
{
kfree(hctx->sched_data);
}
static int rq_get_domain_token(struct request *rq)
{
return (long)rq->elv.priv[0];
}
static void rq_set_domain_token(struct request *rq, int token)
{
rq->elv.priv[0] = (void *)(long)token;
}
static void rq_clear_domain_token(struct kyber_queue_data *kqd,
struct request *rq)
{
unsigned int sched_domain;
int nr;
nr = rq_get_domain_token(rq);
if (nr != -1) {
sched_domain = rq_sched_domain(rq);
sbitmap_queue_clear(&kqd->domain_tokens[sched_domain], nr,
rq->mq_ctx->cpu);
}
}
static struct request *kyber_get_request(struct request_queue *q,
unsigned int op,
struct blk_mq_alloc_data *data)
{
struct kyber_queue_data *kqd = q->elevator->elevator_data;
struct request *rq;
/*
* We use the scheduler tags as per-hardware queue queueing tokens.
* Async requests can be limited at this stage.
*/
if (!op_is_sync(op))
data->shallow_depth = kqd->async_depth;
rq = __blk_mq_alloc_request(data, op);
if (rq)
rq_set_domain_token(rq, -1);
return rq;
}
static void kyber_put_request(struct request *rq)
{
struct request_queue *q = rq->q;
struct kyber_queue_data *kqd = q->elevator->elevator_data;
rq_clear_domain_token(kqd, rq);
blk_mq_finish_request(rq);
}
static void kyber_completed_request(struct request *rq)
{
struct request_queue *q = rq->q;
struct kyber_queue_data *kqd = q->elevator->elevator_data;
unsigned int sched_domain;
u64 now, latency, target;
/*
* Check if this request met our latency goal. If not, quickly gather
* some statistics and start throttling.
*/
sched_domain = rq_sched_domain(rq);
switch (sched_domain) {
case KYBER_READ:
target = kqd->read_lat_nsec;
break;
case KYBER_SYNC_WRITE:
target = kqd->write_lat_nsec;
break;
default:
return;
}
/* If we are already monitoring latencies, don't check again. */
if (blk_stat_is_active(kqd->cb))
return;
now = __blk_stat_time(ktime_to_ns(ktime_get()));
if (now < blk_stat_time(&rq->issue_stat))
return;
latency = now - blk_stat_time(&rq->issue_stat);
if (latency > target)
blk_stat_activate_msecs(kqd->cb, 10);
}
static void kyber_flush_busy_ctxs(struct kyber_hctx_data *khd,
struct blk_mq_hw_ctx *hctx)
{
LIST_HEAD(rq_list);
struct request *rq, *next;
blk_mq_flush_busy_ctxs(hctx, &rq_list);
list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
unsigned int sched_domain;
sched_domain = rq_sched_domain(rq);
list_move_tail(&rq->queuelist, &khd->rqs[sched_domain]);
}
}
static int kyber_domain_wake(wait_queue_t *wait, unsigned mode, int flags,
void *key)
{
struct blk_mq_hw_ctx *hctx = READ_ONCE(wait->private);
list_del_init(&wait->task_list);
blk_mq_run_hw_queue(hctx, true);
return 1;
}
static int kyber_get_domain_token(struct kyber_queue_data *kqd,
struct kyber_hctx_data *khd,
struct blk_mq_hw_ctx *hctx)
{
unsigned int sched_domain = khd->cur_domain;
struct sbitmap_queue *domain_tokens = &kqd->domain_tokens[sched_domain];
wait_queue_t *wait = &khd->domain_wait[sched_domain];
struct sbq_wait_state *ws;
int nr;
nr = __sbitmap_queue_get(domain_tokens);
if (nr >= 0)
return nr;
/*
* If we failed to get a domain token, make sure the hardware queue is
* run when one becomes available. Note that this is serialized on
* khd->lock, but we still need to be careful about the waker.
*/
if (list_empty_careful(&wait->task_list)) {
init_waitqueue_func_entry(wait, kyber_domain_wake);
wait->private = hctx;
ws = sbq_wait_ptr(domain_tokens,
&khd->wait_index[sched_domain]);
add_wait_queue(&ws->wait, wait);
/*
* Try again in case a token was freed before we got on the wait
* queue.
*/
nr = __sbitmap_queue_get(domain_tokens);
}
return nr;
}
static struct request *
kyber_dispatch_cur_domain(struct kyber_queue_data *kqd,
struct kyber_hctx_data *khd,
struct blk_mq_hw_ctx *hctx,
bool *flushed)
{
struct list_head *rqs;
struct request *rq;
int nr;
rqs = &khd->rqs[khd->cur_domain];
rq = list_first_entry_or_null(rqs, struct request, queuelist);
/*
* If there wasn't already a pending request and we haven't flushed the
* software queues yet, flush the software queues and check again.
*/
if (!rq && !*flushed) {
kyber_flush_busy_ctxs(khd, hctx);
*flushed = true;
rq = list_first_entry_or_null(rqs, struct request, queuelist);
}
if (rq) {
nr = kyber_get_domain_token(kqd, khd, hctx);
if (nr >= 0) {
khd->batching++;
rq_set_domain_token(rq, nr);
list_del_init(&rq->queuelist);
return rq;
}
}
/* There were either no pending requests or no tokens. */
return NULL;
}
static struct request *kyber_dispatch_request(struct blk_mq_hw_ctx *hctx)
{
struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
struct kyber_hctx_data *khd = hctx->sched_data;
bool flushed = false;
struct request *rq;
int i;
spin_lock(&khd->lock);
/*
* First, if we are still entitled to batch, try to dispatch a request
* from the batch.
*/
if (khd->batching < kyber_batch_size[khd->cur_domain]) {
rq = kyber_dispatch_cur_domain(kqd, khd, hctx, &flushed);
if (rq)
goto out;
}
/*
* Either,
* 1. We were no longer entitled to a batch.
* 2. The domain we were batching didn't have any requests.
* 3. The domain we were batching was out of tokens.
*
* Start another batch. Note that this wraps back around to the original
* domain if no other domains have requests or tokens.
*/
khd->batching = 0;
for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
if (khd->cur_domain == KYBER_NUM_DOMAINS - 1)
khd->cur_domain = 0;
else
khd->cur_domain++;
rq = kyber_dispatch_cur_domain(kqd, khd, hctx, &flushed);
if (rq)
goto out;
}
rq = NULL;
out:
spin_unlock(&khd->lock);
return rq;
}
static bool kyber_has_work(struct blk_mq_hw_ctx *hctx)
{
struct kyber_hctx_data *khd = hctx->sched_data;
int i;
for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
if (!list_empty_careful(&khd->rqs[i]))
return true;
}
return false;
}
#define KYBER_LAT_SHOW_STORE(op) \
static ssize_t kyber_##op##_lat_show(struct elevator_queue *e, \
char *page) \
{ \
struct kyber_queue_data *kqd = e->elevator_data; \
\
return sprintf(page, "%llu\n", kqd->op##_lat_nsec); \
} \
\
static ssize_t kyber_##op##_lat_store(struct elevator_queue *e, \
const char *page, size_t count) \
{ \
struct kyber_queue_data *kqd = e->elevator_data; \
unsigned long long nsec; \
int ret; \
\
ret = kstrtoull(page, 10, &nsec); \
if (ret) \
return ret; \
\
kqd->op##_lat_nsec = nsec; \
\
return count; \
}
KYBER_LAT_SHOW_STORE(read);
KYBER_LAT_SHOW_STORE(write);
#undef KYBER_LAT_SHOW_STORE
#define KYBER_LAT_ATTR(op) __ATTR(op##_lat_nsec, 0644, kyber_##op##_lat_show, kyber_##op##_lat_store)
static struct elv_fs_entry kyber_sched_attrs[] = {
KYBER_LAT_ATTR(read),
KYBER_LAT_ATTR(write),
__ATTR_NULL
};
#undef KYBER_LAT_ATTR
static struct elevator_type kyber_sched = {
.ops.mq = {
.init_sched = kyber_init_sched,
.exit_sched = kyber_exit_sched,
.init_hctx = kyber_init_hctx,
.exit_hctx = kyber_exit_hctx,
.get_request = kyber_get_request,
.put_request = kyber_put_request,
.completed_request = kyber_completed_request,
.dispatch_request = kyber_dispatch_request,
.has_work = kyber_has_work,
},
.uses_mq = true,
.elevator_attrs = kyber_sched_attrs,
.elevator_name = "kyber",
.elevator_owner = THIS_MODULE,
};
static int __init kyber_init(void)
{
return elv_register(&kyber_sched);
}
static void __exit kyber_exit(void)
{
elv_unregister(&kyber_sched);
}
module_init(kyber_init);
module_exit(kyber_exit);
MODULE_AUTHOR("Omar Sandoval");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Kyber I/O scheduler");
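Kyber's depth tuning starts by bucketing a domain's observed mean latency against its target; kyber_adjust_rw_depth() and kyber_adjust_other_depth() then grow or shrink the token depth based on those buckets. A standalone sketch of the classification step (the thresholds mirror kyber_lat_status() above; the sample values in main() are made up):

#include <stdio.h>

enum { LAT_NONE = 0, LAT_GOOD = 1, LAT_GREAT = 2, LAT_BAD = -1, LAT_AWFUL = -2 };

static int lat_status(unsigned long long nr_samples,
                      unsigned long long mean_ns,
                      unsigned long long target_ns)
{
        if (!nr_samples)
                return LAT_NONE;        /* nothing observed in this window */
        if (mean_ns >= 2 * target_ns)
                return LAT_AWFUL;
        if (mean_ns > target_ns)
                return LAT_BAD;
        if (mean_ns <= target_ns / 2)
                return LAT_GREAT;
        return LAT_GOOD;
}

int main(void)
{
        /* 2 ms read target, 5 ms observed mean: AWFUL, so the competing
         * sync-write domain would have its depth cut. */
        printf("read status: %d\n", lat_status(100, 5000000ULL, 2000000ULL));
        /* 10 ms write target, 4 ms observed mean: GREAT. */
        printf("write status: %d\n", lat_status(100, 4000000ULL, 10000000ULL));
        return 0;
}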


@ -497,7 +497,6 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
if (disk->fops->revalidate_disk)
disk->fops->revalidate_disk(disk);
blk_integrity_revalidate(disk);
check_disk_size_change(disk, bdev);
bdev->bd_invalidated = 0;
if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))


@ -262,11 +262,11 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
/*
* fill in all the output members
*/
hdr->status = rq->errors & 0xff;
hdr->masked_status = status_byte(rq->errors);
hdr->msg_status = msg_byte(rq->errors);
hdr->host_status = host_byte(rq->errors);
hdr->driver_status = driver_byte(rq->errors);
hdr->status = req->result & 0xff;
hdr->masked_status = status_byte(req->result);
hdr->msg_status = msg_byte(req->result);
hdr->host_status = host_byte(req->result);
hdr->driver_status = driver_byte(req->result);
hdr->info = 0;
if (hdr->masked_status || hdr->host_status || hdr->driver_status)
hdr->info |= SG_INFO_CHECK;
@ -362,7 +362,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
goto out_free_cdb;
bio = rq->bio;
rq->retries = 0;
req->retries = 0;
start_time = jiffies;
@ -476,13 +476,13 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
goto error;
/* default. possible overriden later */
rq->retries = 5;
req->retries = 5;
switch (opcode) {
case SEND_DIAGNOSTIC:
case FORMAT_UNIT:
rq->timeout = FORMAT_UNIT_TIMEOUT;
rq->retries = 1;
req->retries = 1;
break;
case START_STOP:
rq->timeout = START_STOP_TIMEOUT;
@ -495,7 +495,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
break;
case READ_DEFECT_DATA:
rq->timeout = READ_DEFECT_DATA_TIMEOUT;
rq->retries = 1;
req->retries = 1;
break;
default:
rq->timeout = BLK_DEFAULT_SG_TIMEOUT;
@ -509,7 +509,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
blk_execute_rq(q, disk, rq, 0);
err = rq->errors & 0xff; /* only 8 bit SCSI status */
err = req->result & 0xff; /* only 8 bit SCSI status */
if (err) {
if (req->sense_len && req->sense) {
bytes = (OMAX_SB_LEN > req->sense_len) ?
@ -547,7 +547,8 @@ static int __blk_send_generic(struct request_queue *q, struct gendisk *bd_disk,
scsi_req(rq)->cmd[0] = cmd;
scsi_req(rq)->cmd[4] = data;
scsi_req(rq)->cmd_len = 6;
err = blk_execute_rq(q, bd_disk, rq, 0);
blk_execute_rq(q, bd_disk, rq, 0);
err = scsi_req(rq)->result ? -EIO : 0;
blk_put_request(rq);
return err;


@ -275,8 +275,8 @@ static bool check_tper(const void *data)
u8 flags = tper->supported_features;
if (!(flags & TPER_SYNC_SUPPORTED)) {
pr_err("TPer sync not supported. flags = %d\n",
tper->supported_features);
pr_debug("TPer sync not supported. flags = %d\n",
tper->supported_features);
return false;
}
@ -289,7 +289,7 @@ static bool check_sum(const void *data)
u32 nlo = be32_to_cpu(sum->num_locking_objects);
if (nlo == 0) {
pr_err("Need at least one locking object.\n");
pr_debug("Need at least one locking object.\n");
return false;
}
@ -385,9 +385,9 @@ static int next(struct opal_dev *dev)
error = step->fn(dev, step->data);
if (error) {
pr_err("Error on step function: %d with error %d: %s\n",
state, error,
opal_error_to_human(error));
pr_debug("Error on step function: %d with error %d: %s\n",
state, error,
opal_error_to_human(error));
/* For each OPAL command we do a discovery0 then we
* start some sort of session.
@ -419,8 +419,8 @@ static int opal_discovery0_end(struct opal_dev *dev)
print_buffer(dev->resp, hlen);
if (hlen > IO_BUFFER_LENGTH - sizeof(*hdr)) {
pr_warn("Discovery length overflows buffer (%zu+%u)/%u\n",
sizeof(*hdr), hlen, IO_BUFFER_LENGTH);
pr_debug("Discovery length overflows buffer (%zu+%u)/%u\n",
sizeof(*hdr), hlen, IO_BUFFER_LENGTH);
return -EFAULT;
}
@ -503,7 +503,7 @@ static void add_token_u8(int *err, struct opal_dev *cmd, u8 tok)
if (*err)
return;
if (cmd->pos >= IO_BUFFER_LENGTH - 1) {
pr_err("Error adding u8: end of buffer.\n");
pr_debug("Error adding u8: end of buffer.\n");
*err = -ERANGE;
return;
}
@ -553,7 +553,7 @@ static void add_token_u64(int *err, struct opal_dev *cmd, u64 number)
len = DIV_ROUND_UP(msb, 4);
if (cmd->pos >= IO_BUFFER_LENGTH - len - 1) {
pr_err("Error adding u64: end of buffer.\n");
pr_debug("Error adding u64: end of buffer.\n");
*err = -ERANGE;
return;
}
@ -579,7 +579,7 @@ static void add_token_bytestring(int *err, struct opal_dev *cmd,
}
if (len >= IO_BUFFER_LENGTH - cmd->pos - header_len) {
pr_err("Error adding bytestring: end of buffer.\n");
pr_debug("Error adding bytestring: end of buffer.\n");
*err = -ERANGE;
return;
}
@ -597,7 +597,7 @@ static void add_token_bytestring(int *err, struct opal_dev *cmd,
static int build_locking_range(u8 *buffer, size_t length, u8 lr)
{
if (length > OPAL_UID_LENGTH) {
pr_err("Can't build locking range. Length OOB\n");
pr_debug("Can't build locking range. Length OOB\n");
return -ERANGE;
}
@ -614,7 +614,7 @@ static int build_locking_range(u8 *buffer, size_t length, u8 lr)
static int build_locking_user(u8 *buffer, size_t length, u8 lr)
{
if (length > OPAL_UID_LENGTH) {
pr_err("Can't build locking range user, Length OOB\n");
pr_debug("Can't build locking range user, Length OOB\n");
return -ERANGE;
}
@ -648,7 +648,7 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, u32 tsn)
add_token_u8(&err, cmd, OPAL_ENDLIST);
if (err) {
pr_err("Error finalizing command.\n");
pr_debug("Error finalizing command.\n");
return -EFAULT;
}
@ -660,7 +660,7 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, u32 tsn)
hdr->subpkt.length = cpu_to_be32(cmd->pos - sizeof(*hdr));
while (cmd->pos % 4) {
if (cmd->pos >= IO_BUFFER_LENGTH) {
pr_err("Error: Buffer overrun\n");
pr_debug("Error: Buffer overrun\n");
return -ERANGE;
}
cmd->cmd[cmd->pos++] = 0;
@ -679,14 +679,14 @@ static const struct opal_resp_tok *response_get_token(
const struct opal_resp_tok *tok;
if (n >= resp->num) {
pr_err("Token number doesn't exist: %d, resp: %d\n",
n, resp->num);
pr_debug("Token number doesn't exist: %d, resp: %d\n",
n, resp->num);
return ERR_PTR(-EINVAL);
}
tok = &resp->toks[n];
if (tok->len == 0) {
pr_err("Token length must be non-zero\n");
pr_debug("Token length must be non-zero\n");
return ERR_PTR(-EINVAL);
}
@ -727,7 +727,7 @@ static ssize_t response_parse_short(struct opal_resp_tok *tok,
tok->type = OPAL_DTA_TOKENID_UINT;
if (tok->len > 9) {
pr_warn("uint64 with more than 8 bytes\n");
pr_debug("uint64 with more than 8 bytes\n");
return -EINVAL;
}
for (i = tok->len - 1; i > 0; i--) {
@ -814,8 +814,8 @@ static int response_parse(const u8 *buf, size_t length,
if (clen == 0 || plen == 0 || slen == 0 ||
slen > IO_BUFFER_LENGTH - sizeof(*hdr)) {
pr_err("Bad header length. cp: %u, pkt: %u, subpkt: %u\n",
clen, plen, slen);
pr_debug("Bad header length. cp: %u, pkt: %u, subpkt: %u\n",
clen, plen, slen);
print_buffer(pos, sizeof(*hdr));
return -EINVAL;
}
@ -848,7 +848,7 @@ static int response_parse(const u8 *buf, size_t length,
}
if (num_entries == 0) {
pr_err("Couldn't parse response.\n");
pr_debug("Couldn't parse response.\n");
return -EINVAL;
}
resp->num = num_entries;
@ -861,18 +861,18 @@ static size_t response_get_string(const struct parsed_resp *resp, int n,
{
*store = NULL;
if (!resp) {
pr_err("Response is NULL\n");
pr_debug("Response is NULL\n");
return 0;
}
if (n > resp->num) {
pr_err("Response has %d tokens. Can't access %d\n",
resp->num, n);
pr_debug("Response has %d tokens. Can't access %d\n",
resp->num, n);
return 0;
}
if (resp->toks[n].type != OPAL_DTA_TOKENID_BYTESTRING) {
pr_err("Token is not a byte string!\n");
pr_debug("Token is not a byte string!\n");
return 0;
}
@ -883,26 +883,26 @@ static size_t response_get_string(const struct parsed_resp *resp, int n,
static u64 response_get_u64(const struct parsed_resp *resp, int n)
{
if (!resp) {
pr_err("Response is NULL\n");
pr_debug("Response is NULL\n");
return 0;
}
if (n > resp->num) {
pr_err("Response has %d tokens. Can't access %d\n",
resp->num, n);
pr_debug("Response has %d tokens. Can't access %d\n",
resp->num, n);
return 0;
}
if (resp->toks[n].type != OPAL_DTA_TOKENID_UINT) {
pr_err("Token is not unsigned it: %d\n",
resp->toks[n].type);
pr_debug("Token is not unsigned it: %d\n",
resp->toks[n].type);
return 0;
}
if (!(resp->toks[n].width == OPAL_WIDTH_TINY ||
resp->toks[n].width == OPAL_WIDTH_SHORT)) {
pr_err("Atom is not short or tiny: %d\n",
resp->toks[n].width);
pr_debug("Atom is not short or tiny: %d\n",
resp->toks[n].width);
return 0;
}
@ -949,7 +949,7 @@ static int parse_and_check_status(struct opal_dev *dev)
error = response_parse(dev->resp, IO_BUFFER_LENGTH, &dev->parsed);
if (error) {
pr_err("Couldn't parse response.\n");
pr_debug("Couldn't parse response.\n");
return error;
}
@ -975,7 +975,7 @@ static int start_opal_session_cont(struct opal_dev *dev)
tsn = response_get_u64(&dev->parsed, 5);
if (hsn == 0 && tsn == 0) {
pr_err("Couldn't authenticate session\n");
pr_debug("Couldn't authenticate session\n");
return -EPERM;
}
@ -1012,7 +1012,7 @@ static int finalize_and_send(struct opal_dev *dev, cont_fn cont)
ret = cmd_finalize(dev, dev->hsn, dev->tsn);
if (ret) {
pr_err("Error finalizing command buffer: %d\n", ret);
pr_debug("Error finalizing command buffer: %d\n", ret);
return ret;
}
@ -1041,7 +1041,7 @@ static int gen_key(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error building gen key command\n");
pr_debug("Error building gen key command\n");
return err;
}
@ -1059,8 +1059,8 @@ static int get_active_key_cont(struct opal_dev *dev)
return error;
keylen = response_get_string(&dev->parsed, 4, &activekey);
if (!activekey) {
pr_err("%s: Couldn't extract the Activekey from the response\n",
__func__);
pr_debug("%s: Couldn't extract the Activekey from the response\n",
__func__);
return OPAL_INVAL_PARAM;
}
dev->prev_data = kmemdup(activekey, keylen, GFP_KERNEL);
@ -1103,7 +1103,7 @@ static int get_active_key(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error building get active key command\n");
pr_debug("Error building get active key command\n");
return err;
}
@ -1159,7 +1159,7 @@ static inline int enable_global_lr(struct opal_dev *dev, u8 *uid,
err = generic_lr_enable_disable(dev, uid, !!setup->RLE, !!setup->WLE,
0, 0);
if (err)
pr_err("Failed to create enable global lr command\n");
pr_debug("Failed to create enable global lr command\n");
return err;
}
@ -1217,7 +1217,7 @@ static int setup_locking_range(struct opal_dev *dev, void *data)
}
if (err) {
pr_err("Error building Setup Locking range command.\n");
pr_debug("Error building Setup Locking range command.\n");
return err;
}
@ -1234,11 +1234,8 @@ static int start_generic_opal_session(struct opal_dev *dev,
u32 hsn;
int err = 0;
if (key == NULL && auth != OPAL_ANYBODY_UID) {
pr_err("%s: Attempted to open ADMIN_SP Session without a Host" \
"Challenge, and not as the Anybody UID\n", __func__);
if (key == NULL && auth != OPAL_ANYBODY_UID)
return OPAL_INVAL_PARAM;
}
clear_opal_cmd(dev);
@ -1273,12 +1270,12 @@ static int start_generic_opal_session(struct opal_dev *dev,
add_token_u8(&err, dev, OPAL_ENDLIST);
break;
default:
pr_err("Cannot start Admin SP session with auth %d\n", auth);
pr_debug("Cannot start Admin SP session with auth %d\n", auth);
return OPAL_INVAL_PARAM;
}
if (err) {
pr_err("Error building start adminsp session command.\n");
pr_debug("Error building start adminsp session command.\n");
return err;
}
@ -1369,7 +1366,7 @@ static int start_auth_opal_session(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error building STARTSESSION command.\n");
pr_debug("Error building STARTSESSION command.\n");
return err;
}
@ -1391,7 +1388,7 @@ static int revert_tper(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_STARTLIST);
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error building REVERT TPER command.\n");
pr_debug("Error building REVERT TPER command.\n");
return err;
}
@ -1426,7 +1423,7 @@ static int internal_activate_user(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error building Activate UserN command.\n");
pr_debug("Error building Activate UserN command.\n");
return err;
}
@ -1453,7 +1450,7 @@ static int erase_locking_range(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error building Erase Locking Range Command.\n");
pr_debug("Error building Erase Locking Range Command.\n");
return err;
}
return finalize_and_send(dev, parse_and_check_status);
@ -1484,7 +1481,7 @@ static int set_mbr_done(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error Building set MBR Done command\n");
pr_debug("Error Building set MBR Done command\n");
return err;
}
@ -1516,7 +1513,7 @@ static int set_mbr_enable_disable(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error Building set MBR done command\n");
pr_debug("Error Building set MBR done command\n");
return err;
}
@ -1567,7 +1564,7 @@ static int set_new_pw(struct opal_dev *dev, void *data)
if (generic_pw_cmd(usr->opal_key.key, usr->opal_key.key_len,
cpin_uid, dev)) {
pr_err("Error building set password command.\n");
pr_debug("Error building set password command.\n");
return -ERANGE;
}
@ -1582,7 +1579,7 @@ static int set_sid_cpin_pin(struct opal_dev *dev, void *data)
memcpy(cpin_uid, opaluid[OPAL_C_PIN_SID], OPAL_UID_LENGTH);
if (generic_pw_cmd(key->key, key->key_len, cpin_uid, dev)) {
pr_err("Error building Set SID cpin\n");
pr_debug("Error building Set SID cpin\n");
return -ERANGE;
}
return finalize_and_send(dev, parse_and_check_status);
@ -1657,7 +1654,7 @@ static int add_user_to_lr(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error building add user to locking range command.\n");
pr_debug("Error building add user to locking range command.\n");
return err;
}
@ -1691,7 +1688,7 @@ static int lock_unlock_locking_range(struct opal_dev *dev, void *data)
/* vars are initalized to locked */
break;
default:
pr_err("Tried to set an invalid locking state... returning to uland\n");
pr_debug("Tried to set an invalid locking state... returning to uland\n");
return OPAL_INVAL_PARAM;
}
@ -1718,7 +1715,7 @@ static int lock_unlock_locking_range(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error building SET command.\n");
pr_debug("Error building SET command.\n");
return err;
}
return finalize_and_send(dev, parse_and_check_status);
@ -1752,14 +1749,14 @@ static int lock_unlock_locking_range_sum(struct opal_dev *dev, void *data)
/* vars are initalized to locked */
break;
default:
pr_err("Tried to set an invalid locking state.\n");
pr_debug("Tried to set an invalid locking state.\n");
return OPAL_INVAL_PARAM;
}
ret = generic_lr_enable_disable(dev, lr_buffer, 1, 1,
read_locked, write_locked);
if (ret < 0) {
pr_err("Error building SET command.\n");
pr_debug("Error building SET command.\n");
return ret;
}
return finalize_and_send(dev, parse_and_check_status);
@ -1811,7 +1808,7 @@ static int activate_lsp(struct opal_dev *dev, void *data)
}
if (err) {
pr_err("Error building Activate LockingSP command.\n");
pr_debug("Error building Activate LockingSP command.\n");
return err;
}
@ -1831,7 +1828,7 @@ static int get_lsp_lifecycle_cont(struct opal_dev *dev)
/* 0x08 is Manufacured Inactive */
/* 0x09 is Manufactured */
if (lc_status != OPAL_MANUFACTURED_INACTIVE) {
pr_err("Couldn't determine the status of the Lifcycle state\n");
pr_debug("Couldn't determine the status of the Lifecycle state\n");
return -ENODEV;
}
@ -1868,7 +1865,7 @@ static int get_lsp_lifecycle(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error Building GET Lifecycle Status command\n");
pr_debug("Error Building GET Lifecycle Status command\n");
return err;
}
@ -1887,7 +1884,7 @@ static int get_msid_cpin_pin_cont(struct opal_dev *dev)
strlen = response_get_string(&dev->parsed, 4, &msid_pin);
if (!msid_pin) {
pr_err("%s: Couldn't extract PIN from response\n", __func__);
pr_debug("%s: Couldn't extract PIN from response\n", __func__);
return OPAL_INVAL_PARAM;
}
@ -1929,7 +1926,7 @@ static int get_msid_cpin_pin(struct opal_dev *dev, void *data)
add_token_u8(&err, dev, OPAL_ENDLIST);
if (err) {
pr_err("Error building Get MSID CPIN PIN command.\n");
pr_debug("Error building Get MSID CPIN PIN command.\n");
return err;
}
@ -2124,18 +2121,18 @@ static int opal_add_user_to_lr(struct opal_dev *dev,
if (lk_unlk->l_state != OPAL_RO &&
lk_unlk->l_state != OPAL_RW) {
pr_err("Locking state was not RO or RW\n");
pr_debug("Locking state was not RO or RW\n");
return -EINVAL;
}
if (lk_unlk->session.who < OPAL_USER1 ||
lk_unlk->session.who > OPAL_USER9) {
pr_err("Authority was not within the range of users: %d\n",
lk_unlk->session.who);
pr_debug("Authority was not within the range of users: %d\n",
lk_unlk->session.who);
return -EINVAL;
}
if (lk_unlk->session.sum) {
pr_err("%s not supported in sum. Use setup locking range\n",
__func__);
pr_debug("%s not supported in sum. Use setup locking range\n",
__func__);
return -EINVAL;
}
@ -2312,7 +2309,7 @@ static int opal_activate_user(struct opal_dev *dev,
/* We can't activate Admin1 it's active as manufactured */
if (opal_session->who < OPAL_USER1 ||
opal_session->who > OPAL_USER9) {
pr_err("Who was not a valid user: %d\n", opal_session->who);
pr_debug("Who was not a valid user: %d\n", opal_session->who);
return -EINVAL;
}
@ -2343,9 +2340,9 @@ bool opal_unlock_from_suspend(struct opal_dev *dev)
ret = __opal_lock_unlock(dev, &suspend->unlk);
if (ret) {
pr_warn("Failed to unlock LR %hhu with sum %d\n",
suspend->unlk.session.opal_key.lr,
suspend->unlk.session.sum);
pr_debug("Failed to unlock LR %hhu with sum %d\n",
suspend->unlk.session.opal_key.lr,
suspend->unlk.session.sum);
was_failure = true;
}
}
@ -2363,10 +2360,8 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
return -EACCES;
if (!dev)
return -ENOTSUPP;
if (!dev->supported) {
pr_err("Not supported\n");
if (!dev->supported)
return -ENOTSUPP;
}
p = memdup_user(arg, _IOC_SIZE(cmd));
if (IS_ERR(p))
@ -2410,7 +2405,7 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
ret = opal_secure_erase_locking_range(dev, p);
break;
default:
pr_warn("No such Opal Ioctl %u\n", cmd);
break;
}
kfree(p);


@ -160,28 +160,28 @@ static int t10_pi_type3_verify_ip(struct blk_integrity_iter *iter)
return t10_pi_verify(iter, t10_pi_ip_fn, 3);
}
struct blk_integrity_profile t10_pi_type1_crc = {
const struct blk_integrity_profile t10_pi_type1_crc = {
.name = "T10-DIF-TYPE1-CRC",
.generate_fn = t10_pi_type1_generate_crc,
.verify_fn = t10_pi_type1_verify_crc,
};
EXPORT_SYMBOL(t10_pi_type1_crc);
struct blk_integrity_profile t10_pi_type1_ip = {
const struct blk_integrity_profile t10_pi_type1_ip = {
.name = "T10-DIF-TYPE1-IP",
.generate_fn = t10_pi_type1_generate_ip,
.verify_fn = t10_pi_type1_verify_ip,
};
EXPORT_SYMBOL(t10_pi_type1_ip);
struct blk_integrity_profile t10_pi_type3_crc = {
const struct blk_integrity_profile t10_pi_type3_crc = {
.name = "T10-DIF-TYPE3-CRC",
.generate_fn = t10_pi_type3_generate_crc,
.verify_fn = t10_pi_type3_verify_crc,
};
EXPORT_SYMBOL(t10_pi_type3_crc);
struct blk_integrity_profile t10_pi_type3_ip = {
const struct blk_integrity_profile t10_pi_type3_ip = {
.name = "T10-DIF-TYPE3-IP",
.generate_fn = t10_pi_type3_generate_ip,
.verify_fn = t10_pi_type3_verify_ip,


@ -312,22 +312,6 @@ config BLK_DEV_SKD
Use device /dev/skd$N and /dev/skd$Np$M.
config BLK_DEV_OSD
tristate "OSD object-as-blkdev support"
depends on SCSI_OSD_ULD
---help---
Saying Y or M here will allow the exporting of a single SCSI
OSD (object-based storage) object as a Linux block device.
For example, if you create a 2G object on an OSD device,
you can then use this module to present that 2G object as
a Linux block device.
To compile this driver as a module, choose M here: the
module will be called osdblk.
If unsure, say N.
config BLK_DEV_SX8
tristate "Promise SATA SX8 support"
depends on PCI
@ -434,23 +418,6 @@ config ATA_OVER_ETH
This driver provides Support for ATA over Ethernet block
devices like the Coraid EtherDrive (R) Storage Blade.
config MG_DISK
tristate "mGine mflash, gflash support"
depends on ARM && GPIOLIB
help
mGine mFlash(gFlash) block device driver
config MG_DISK_RES
int "Size of reserved area before MBR"
depends on MG_DISK
default 0
help
Define size of reserved area that usually used for boot. Unit is KB.
All of the block device operation will be taken this value as start
offset
Examples:
1024 => 1 MB
config SUNVDC
tristate "Sun Virtual Disk Client support"
depends on SUN_LDOMS
@ -512,19 +479,7 @@ config VIRTIO_BLK_SCSI
Enable support for SCSI passthrough (e.g. the SG_IO ioctl) on
virtio-blk devices. This is only supported for the legacy
virtio protocol and not enabled by default by any hypervisor.
Your probably want to virtio-scsi instead.
config BLK_DEV_HD
bool "Very old hard disk (MFM/RLL/IDE) driver"
depends on HAVE_IDE
depends on !ARM || ARCH_RPC || BROKEN
help
This is a very old hard disk driver that lacks the enhanced
functionality of the newer ones.
It is required for systems with ancient MFM/RLL/ESDI drives.
If unsure, say N.
You probably want to use virtio-scsi instead.
config BLK_DEV_RBD
tristate "Rados block device (RBD)"


@ -19,10 +19,8 @@ obj-$(CONFIG_BLK_CPQ_CISS_DA) += cciss.o
obj-$(CONFIG_BLK_DEV_DAC960) += DAC960.o
obj-$(CONFIG_XILINX_SYSACE) += xsysace.o
obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
obj-$(CONFIG_MG_DISK) += mg_disk.o
obj-$(CONFIG_SUNVDC) += sunvdc.o
obj-$(CONFIG_BLK_DEV_SKD) += skd.o
obj-$(CONFIG_BLK_DEV_OSD) += osdblk.o
obj-$(CONFIG_BLK_DEV_UMEM) += umem.o
obj-$(CONFIG_BLK_DEV_NBD) += nbd.o
@ -30,7 +28,6 @@ obj-$(CONFIG_BLK_DEV_CRYPTOLOOP) += cryptoloop.o
obj-$(CONFIG_VIRTIO_BLK) += virtio_blk.o
obj-$(CONFIG_BLK_DEV_SX8) += sx8.o
obj-$(CONFIG_BLK_DEV_HD) += hd.o
obj-$(CONFIG_XEN_BLKDEV_FRONTEND) += xen-blkfront.o
obj-$(CONFIG_XEN_BLKDEV_BACKEND) += xen-blkback/


@ -617,12 +617,12 @@ static void fd_error( void )
if (!fd_request)
return;
fd_request->errors++;
if (fd_request->errors >= MAX_ERRORS) {
fd_request->error_count++;
if (fd_request->error_count >= MAX_ERRORS) {
printk(KERN_ERR "fd%d: too many errors.\n", SelectedDrive );
fd_end_request_cur(-EIO);
}
else if (fd_request->errors == RECALIBRATE_ERRORS) {
else if (fd_request->error_count == RECALIBRATE_ERRORS) {
printk(KERN_WARNING "fd%d: recalibrating\n", SelectedDrive );
if (SelectedDrive != -1)
SUD.track = -1;
@ -1386,7 +1386,7 @@ static void setup_req_params( int drive )
ReqData = ReqBuffer + 512 * ReqCnt;
if (UseTrackbuffer)
read_track = (ReqCmd == READ && fd_request->errors == 0);
read_track = (ReqCmd == READ && fd_request->error_count == 0);
else
read_track = 0;
@ -1409,8 +1409,10 @@ static struct request *set_next_request(void)
fdc_queue = 0;
if (q) {
rq = blk_fetch_request(q);
if (rq)
if (rq) {
rq->error_count = 0;
break;
}
}
} while (fdc_queue != old_pos);


@ -134,28 +134,6 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
return page;
}
static void brd_free_page(struct brd_device *brd, sector_t sector)
{
struct page *page;
pgoff_t idx;
spin_lock(&brd->brd_lock);
idx = sector >> PAGE_SECTORS_SHIFT;
page = radix_tree_delete(&brd->brd_pages, idx);
spin_unlock(&brd->brd_lock);
if (page)
__free_page(page);
}
static void brd_zero_page(struct brd_device *brd, sector_t sector)
{
struct page *page;
page = brd_lookup_page(brd, sector);
if (page)
clear_highpage(page);
}
/*
* Free all backing store pages and radix tree. This must only be called when
* there are no other users of the device.
@ -212,24 +190,6 @@ static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n)
return 0;
}
static void discard_from_brd(struct brd_device *brd,
sector_t sector, size_t n)
{
while (n >= PAGE_SIZE) {
/*
* Don't want to actually discard pages here because
* re-allocating the pages can result in writeback
* deadlocks under heavy load.
*/
if (0)
brd_free_page(brd, sector);
else
brd_zero_page(brd, sector);
sector += PAGE_SIZE >> SECTOR_SHIFT;
n -= PAGE_SIZE;
}
}
/*
* Copy n bytes from src to the brd starting at sector. Does not sleep.
*/
@ -338,14 +298,6 @@ static blk_qc_t brd_make_request(struct request_queue *q, struct bio *bio)
if (bio_end_sector(bio) > get_capacity(bdev->bd_disk))
goto io_error;
if (unlikely(bio_op(bio) == REQ_OP_DISCARD)) {
if (sector & ((PAGE_SIZE >> SECTOR_SHIFT) - 1) ||
bio->bi_iter.bi_size & ~PAGE_MASK)
goto io_error;
discard_from_brd(brd, sector, bio->bi_iter.bi_size);
goto out;
}
bio_for_each_segment(bvec, bio, iter) {
unsigned int len = bvec.bv_len;
int err;
@ -357,7 +309,6 @@ static blk_qc_t brd_make_request(struct request_queue *q, struct bio *bio)
sector += len >> SECTOR_SHIFT;
}
out:
bio_endio(bio);
return BLK_QC_T_NONE;
io_error:
@ -464,11 +415,6 @@ static struct brd_device *brd_alloc(int i)
* is harmless)
*/
blk_queue_physical_block_size(brd->brd_queue, PAGE_SIZE);
brd->brd_queue->limits.discard_granularity = PAGE_SIZE;
blk_queue_max_discard_sectors(brd->brd_queue, UINT_MAX);
brd->brd_queue->limits.discard_zeroes_data = 1;
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);
#ifdef CONFIG_BLK_DEV_RAM_DAX
queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue);
#endif


@ -1864,8 +1864,7 @@ static void cciss_softirq_done(struct request *rq)
/* set the residual count for pc requests */
if (blk_rq_is_passthrough(rq))
scsi_req(rq)->resid_len = c->err_info->ResidualCnt;
blk_end_request_all(rq, (rq->errors == 0) ? 0 : -EIO);
blk_end_request_all(rq, scsi_req(rq)->result ? -EIO : 0);
spin_lock_irqsave(&h->lock, flags);
cmd_free(h, c);
@ -3140,18 +3139,19 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
{
int retry_cmd = 0;
struct request *rq = cmd->rq;
struct scsi_request *sreq = scsi_req(rq);
rq->errors = 0;
sreq->result = 0;
if (timeout)
rq->errors = make_status_bytes(0, 0, 0, DRIVER_TIMEOUT);
sreq->result = make_status_bytes(0, 0, 0, DRIVER_TIMEOUT);
if (cmd->err_info->CommandStatus == 0) /* no error has occurred */
goto after_error_processing;
switch (cmd->err_info->CommandStatus) {
case CMD_TARGET_STATUS:
rq->errors = evaluate_target_status(h, cmd, &retry_cmd);
sreq->result = evaluate_target_status(h, cmd, &retry_cmd);
break;
case CMD_DATA_UNDERRUN:
if (!blk_rq_is_passthrough(cmd->rq)) {
@ -3169,7 +3169,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
case CMD_INVALID:
dev_warn(&h->pdev->dev, "cciss: cmd %p is "
"reported invalid\n", cmd);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ERROR);
@ -3177,7 +3177,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
case CMD_PROTOCOL_ERR:
dev_warn(&h->pdev->dev, "cciss: cmd %p has "
"protocol error\n", cmd);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ERROR);
@ -3185,7 +3185,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
case CMD_HARDWARE_ERR:
dev_warn(&h->pdev->dev, "cciss: cmd %p had "
" hardware error\n", cmd);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ERROR);
@ -3193,7 +3193,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
case CMD_CONNECTION_LOST:
dev_warn(&h->pdev->dev, "cciss: cmd %p had "
"connection lost\n", cmd);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ERROR);
@ -3201,7 +3201,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
case CMD_ABORTED:
dev_warn(&h->pdev->dev, "cciss: cmd %p was "
"aborted\n", cmd);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ABORT);
@ -3209,7 +3209,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
case CMD_ABORT_FAILED:
dev_warn(&h->pdev->dev, "cciss: cmd %p reports "
"abort failed\n", cmd);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ERROR);
@ -3224,21 +3224,21 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
} else
dev_warn(&h->pdev->dev,
"%p retried too many times\n", cmd);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ABORT);
break;
case CMD_TIMEOUT:
dev_warn(&h->pdev->dev, "cmd %p timedout\n", cmd);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ERROR);
break;
case CMD_UNABORTABLE:
dev_warn(&h->pdev->dev, "cmd %p unabortable\n", cmd);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ERROR);
@ -3247,7 +3247,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
dev_warn(&h->pdev->dev, "cmd %p returned "
"unknown status %x\n", cmd,
cmd->err_info->CommandStatus);
rq->errors = make_status_bytes(SAM_STAT_GOOD,
sreq->result = make_status_bytes(SAM_STAT_GOOD,
cmd->err_info->CommandStatus, DRIVER_OK,
blk_rq_is_passthrough(cmd->rq) ?
DID_PASSTHROUGH : DID_ERROR);
@ -3380,9 +3380,9 @@ static void do_cciss_request(struct request_queue *q)
if (dma_mapping_error(&h->pdev->dev, temp64.val)) {
dev_warn(&h->pdev->dev,
"%s: error mapping page for DMA\n", __func__);
creq->errors = make_status_bytes(SAM_STAT_GOOD,
0, DRIVER_OK,
DID_SOFT_ERROR);
scsi_req(creq)->result =
make_status_bytes(SAM_STAT_GOOD, 0, DRIVER_OK,
DID_SOFT_ERROR);
cmd_free(h, c);
return;
}
@ -3395,9 +3395,9 @@ static void do_cciss_request(struct request_queue *q)
if (cciss_map_sg_chain_block(h, c, h->cmd_sg_list[c->cmdindex],
(seg - (h->max_cmd_sgentries - 1)) *
sizeof(SGDescriptor_struct))) {
creq->errors = make_status_bytes(SAM_STAT_GOOD,
0, DRIVER_OK,
DID_SOFT_ERROR);
scsi_req(creq)->result =
make_status_bytes(SAM_STAT_GOOD, 0, DRIVER_OK,
DID_SOFT_ERROR);
cmd_free(h, c);
return;
}


@ -236,9 +236,6 @@ static void seq_print_peer_request_flags(struct seq_file *m, struct drbd_peer_re
seq_print_rq_state_bit(m, f & EE_CALL_AL_COMPLETE_IO, &sep, "in-AL");
seq_print_rq_state_bit(m, f & EE_SEND_WRITE_ACK, &sep, "C");
seq_print_rq_state_bit(m, f & EE_MAY_SET_IN_SYNC, &sep, "set-in-sync");
if (f & EE_IS_TRIM)
__seq_print_rq_state_bit(m, f & EE_IS_TRIM_USE_ZEROOUT, &sep, "zero-out", "trim");
seq_print_rq_state_bit(m, f & EE_WRITE_SAME, &sep, "write-same");
seq_putc(m, '\n');
}


@ -437,9 +437,6 @@ enum {
/* is this a TRIM aka REQ_DISCARD? */
__EE_IS_TRIM,
/* our lower level cannot handle trim,
* and we want to fall back to zeroout instead */
__EE_IS_TRIM_USE_ZEROOUT,
/* In case a barrier failed,
* we need to resubmit without the barrier flag. */
@ -482,7 +479,6 @@ enum {
#define EE_CALL_AL_COMPLETE_IO (1<<__EE_CALL_AL_COMPLETE_IO)
#define EE_MAY_SET_IN_SYNC (1<<__EE_MAY_SET_IN_SYNC)
#define EE_IS_TRIM (1<<__EE_IS_TRIM)
#define EE_IS_TRIM_USE_ZEROOUT (1<<__EE_IS_TRIM_USE_ZEROOUT)
#define EE_RESUBMITTED (1<<__EE_RESUBMITTED)
#define EE_WAS_ERROR (1<<__EE_WAS_ERROR)
#define EE_HAS_DIGEST (1<<__EE_HAS_DIGEST)
@ -1561,8 +1557,6 @@ extern void start_resync_timer_fn(unsigned long data);
extern void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req);
/* drbd_receiver.c */
extern int drbd_issue_discard_or_zero_out(struct drbd_device *device,
sector_t start, unsigned int nr_sectors, bool discard);
extern int drbd_receiver(struct drbd_thread *thi);
extern int drbd_ack_receiver(struct drbd_thread *thi);
extern void drbd_send_ping_wf(struct work_struct *ws);


@ -931,7 +931,6 @@ void assign_p_sizes_qlim(struct drbd_device *device, struct p_sizes *p, struct r
p->qlim->io_min = cpu_to_be32(queue_io_min(q));
p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
p->qlim->discard_enabled = blk_queue_discard(q);
p->qlim->discard_zeroes_data = queue_discard_zeroes_data(q);
p->qlim->write_same_capable = !!q->limits.max_write_same_sectors;
} else {
q = device->rq_queue;
@ -941,7 +940,6 @@ void assign_p_sizes_qlim(struct drbd_device *device, struct p_sizes *p, struct r
p->qlim->io_min = cpu_to_be32(queue_io_min(q));
p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
p->qlim->discard_enabled = 0;
p->qlim->discard_zeroes_data = 0;
p->qlim->write_same_capable = 0;
}
}
@ -1668,7 +1666,8 @@ static u32 bio_flags_to_wire(struct drbd_connection *connection,
(bio->bi_opf & REQ_FUA ? DP_FUA : 0) |
(bio->bi_opf & REQ_PREFLUSH ? DP_FLUSH : 0) |
(bio_op(bio) == REQ_OP_WRITE_SAME ? DP_WSAME : 0) |
(bio_op(bio) == REQ_OP_DISCARD ? DP_DISCARD : 0);
(bio_op(bio) == REQ_OP_DISCARD ? DP_DISCARD : 0) |
(bio_op(bio) == REQ_OP_WRITE_ZEROES ? DP_DISCARD : 0);
else
return bio->bi_opf & REQ_SYNC ? DP_RW_SYNC : 0;
}


@ -1199,10 +1199,6 @@ static void decide_on_discard_support(struct drbd_device *device,
struct drbd_connection *connection = first_peer_device(device)->connection;
bool can_do = b ? blk_queue_discard(b) : true;
if (can_do && b && !b->limits.discard_zeroes_data && !discard_zeroes_if_aligned) {
can_do = false;
drbd_info(device, "discard_zeroes_data=0 and discard_zeroes_if_aligned=no: disabling discards\n");
}
if (can_do && connection->cstate >= C_CONNECTED && !(connection->agreed_features & DRBD_FF_TRIM)) {
can_do = false;
drbd_info(connection, "peer DRBD too old, does not support TRIM: disabling discards\n");
@ -1217,10 +1213,12 @@ static void decide_on_discard_support(struct drbd_device *device,
blk_queue_discard_granularity(q, 512);
q->limits.max_discard_sectors = drbd_max_discard_sectors(connection);
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
q->limits.max_write_zeroes_sectors = drbd_max_discard_sectors(connection);
} else {
queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
blk_queue_discard_granularity(q, 0);
q->limits.max_discard_sectors = 0;
q->limits.max_write_zeroes_sectors = 0;
}
}
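
With discard_zeroes_data gone as a queue limit, a driver that can hand back zeroed data advertises that separately through the write-zeroes limit, as the drbd hunk above does. A minimal sketch of setting the two limits independently; the helper name and values are illustrative, not from this diff:

#include <linux/blkdev.h>

/* Illustrative only: discard and write-zeroes support are now two
 * independent queue limits. */
static void example_setup_limits(struct request_queue *q,
				 bool can_discard, bool can_zero)
{
	if (can_discard) {
		q->limits.discard_granularity = 512;
		blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
	} else {
		blk_queue_max_discard_sectors(q, 0);
		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
	}

	/* callers that need zeroed read-back submit REQ_OP_WRITE_ZEROES */
	blk_queue_max_write_zeroes_sectors(q, can_zero ? UINT_MAX >> 9 : 0);
}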
@ -1482,8 +1480,7 @@ static void sanitize_disk_conf(struct drbd_device *device, struct disk_conf *dis
if (disk_conf->al_extents > drbd_al_extents_max(nbc))
disk_conf->al_extents = drbd_al_extents_max(nbc);
if (!blk_queue_discard(q)
|| (!q->limits.discard_zeroes_data && !disk_conf->discard_zeroes_if_aligned)) {
if (!blk_queue_discard(q)) {
if (disk_conf->rs_discard_granularity) {
disk_conf->rs_discard_granularity = 0; /* disable feature */
drbd_info(device, "rs_discard_granularity feature disabled\n");


@ -1448,105 +1448,14 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
drbd_info(resource, "Method to ensure write ordering: %s\n", write_ordering_str[resource->write_ordering]);
}
/*
* We *may* ignore the discard-zeroes-data setting, if so configured.
*
* Assumption is that it "discard_zeroes_data=0" is only because the backend
* may ignore partial unaligned discards.
*
* LVM/DM thin as of at least
* LVM version: 2.02.115(2)-RHEL7 (2015-01-28)
* Library version: 1.02.93-RHEL7 (2015-01-28)
* Driver version: 4.29.0
* still behaves this way.
*
* For unaligned (wrt. alignment and granularity) or too small discards,
* we zero-out the initial (and/or) trailing unaligned partial chunks,
* but discard all the aligned full chunks.
*
* At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
*/
int drbd_issue_discard_or_zero_out(struct drbd_device *device, sector_t start, unsigned int nr_sectors, bool discard)
{
struct block_device *bdev = device->ldev->backing_bdev;
struct request_queue *q = bdev_get_queue(bdev);
sector_t tmp, nr;
unsigned int max_discard_sectors, granularity;
int alignment;
int err = 0;
if (!discard)
goto zero_out;
/* Zero-sector (unknown) and one-sector granularities are the same. */
granularity = max(q->limits.discard_granularity >> 9, 1U);
alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
max_discard_sectors = min(q->limits.max_discard_sectors, (1U << 22));
max_discard_sectors -= max_discard_sectors % granularity;
if (unlikely(!max_discard_sectors))
goto zero_out;
if (nr_sectors < granularity)
goto zero_out;
tmp = start;
if (sector_div(tmp, granularity) != alignment) {
if (nr_sectors < 2*granularity)
goto zero_out;
/* start + gran - (start + gran - align) % gran */
tmp = start + granularity - alignment;
tmp = start + granularity - sector_div(tmp, granularity);
nr = tmp - start;
err |= blkdev_issue_zeroout(bdev, start, nr, GFP_NOIO, 0);
nr_sectors -= nr;
start = tmp;
}
while (nr_sectors >= granularity) {
nr = min_t(sector_t, nr_sectors, max_discard_sectors);
err |= blkdev_issue_discard(bdev, start, nr, GFP_NOIO, 0);
nr_sectors -= nr;
start += nr;
}
zero_out:
if (nr_sectors) {
err |= blkdev_issue_zeroout(bdev, start, nr_sectors, GFP_NOIO, 0);
}
return err != 0;
}
static bool can_do_reliable_discards(struct drbd_device *device)
{
struct request_queue *q = bdev_get_queue(device->ldev->backing_bdev);
struct disk_conf *dc;
bool can_do;
if (!blk_queue_discard(q))
return false;
if (q->limits.discard_zeroes_data)
return true;
rcu_read_lock();
dc = rcu_dereference(device->ldev->disk_conf);
can_do = dc->discard_zeroes_if_aligned;
rcu_read_unlock();
return can_do;
}
static void drbd_issue_peer_discard(struct drbd_device *device, struct drbd_peer_request *peer_req)
{
/* If the backend cannot discard, or does not guarantee
* read-back zeroes in discarded ranges, we fall back to
* zero-out. Unless configuration specifically requested
* otherwise. */
if (!can_do_reliable_discards(device))
peer_req->flags |= EE_IS_TRIM_USE_ZEROOUT;
struct block_device *bdev = device->ldev->backing_bdev;
if (drbd_issue_discard_or_zero_out(device, peer_req->i.sector,
peer_req->i.size >> 9, !(peer_req->flags & EE_IS_TRIM_USE_ZEROOUT)))
if (blkdev_issue_zeroout(bdev, peer_req->i.sector, peer_req->i.size >> 9,
GFP_NOIO, 0))
peer_req->flags |= EE_WAS_ERROR;
drbd_endio_write_sec_final(peer_req);
}
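
drbd's hand-rolled discard-or-zero-out helper is dropped above in favour of letting the block layer choose the mechanism. A minimal sketch of the replacement call for a caller that needs the range to read back as zeroes; the function name is made up, and the NOUNMAP flag mentioned in the comment belongs to the reworked zeroout interface:

#include <linux/blkdev.h>

/* Illustrative only: zero a sector range and let the block layer pick
 * WRITE ZEROES, a zeroing discard, or a fallback of zero-filled writes. */
static int example_zero_range(struct block_device *bdev, sector_t start,
			      sector_t nr_sectors)
{
	/* pass BLKDEV_ZERO_NOUNMAP instead of 0 to keep the blocks allocated */
	return blkdev_issue_zeroout(bdev, start, nr_sectors, GFP_NOIO, 0);
}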
@ -2376,7 +2285,7 @@ static unsigned long wire_flags_to_bio_flags(u32 dpf)
static unsigned long wire_flags_to_bio_op(u32 dpf)
{
if (dpf & DP_DISCARD)
return REQ_OP_DISCARD;
return REQ_OP_WRITE_ZEROES;
else
return REQ_OP_WRITE;
}
@ -2567,7 +2476,7 @@ static int receive_Data(struct drbd_connection *connection, struct packet_info *
op_flags = wire_flags_to_bio_flags(dp_flags);
if (pi->cmd == P_TRIM) {
D_ASSERT(peer_device, peer_req->i.size > 0);
D_ASSERT(peer_device, op == REQ_OP_DISCARD);
D_ASSERT(peer_device, op == REQ_OP_WRITE_ZEROES);
D_ASSERT(peer_device, peer_req->pages == NULL);
} else if (peer_req->pages == NULL) {
D_ASSERT(device, peer_req->i.size == 0);
@ -4880,7 +4789,7 @@ static int receive_rs_deallocated(struct drbd_connection *connection, struct pac
if (get_ldev(device)) {
struct drbd_peer_request *peer_req;
const int op = REQ_OP_DISCARD;
const int op = REQ_OP_WRITE_ZEROES;
peer_req = drbd_alloc_peer_req(peer_device, ID_SYNCER, sector,
size, 0, GFP_NOIO);


@ -59,6 +59,7 @@ static struct drbd_request *drbd_req_new(struct drbd_device *device, struct bio
drbd_req_make_private_bio(req, bio_src);
req->rq_state = (bio_data_dir(bio_src) == WRITE ? RQ_WRITE : 0)
| (bio_op(bio_src) == REQ_OP_WRITE_SAME ? RQ_WSAME : 0)
| (bio_op(bio_src) == REQ_OP_WRITE_ZEROES ? RQ_UNMAP : 0)
| (bio_op(bio_src) == REQ_OP_DISCARD ? RQ_UNMAP : 0);
req->device = device;
req->master_bio = bio_src;
@ -1148,10 +1149,10 @@ static int drbd_process_write_request(struct drbd_request *req)
static void drbd_process_discard_req(struct drbd_request *req)
{
int err = drbd_issue_discard_or_zero_out(req->device,
req->i.sector, req->i.size >> 9, true);
struct block_device *bdev = req->device->ldev->backing_bdev;
if (err)
if (blkdev_issue_zeroout(bdev, req->i.sector, req->i.size >> 9,
GFP_NOIO, 0))
req->private_bio->bi_error = -EIO;
bio_endio(req->private_bio);
}
@ -1180,7 +1181,8 @@ drbd_submit_req_private_bio(struct drbd_request *req)
if (get_ldev(device)) {
if (drbd_insert_fault(device, type))
bio_io_error(bio);
else if (bio_op(bio) == REQ_OP_DISCARD)
else if (bio_op(bio) == REQ_OP_WRITE_ZEROES ||
bio_op(bio) == REQ_OP_DISCARD)
drbd_process_discard_req(req);
else
generic_make_request(bio);
@ -1234,7 +1236,8 @@ drbd_request_prepare(struct drbd_device *device, struct bio *bio, unsigned long
_drbd_start_io_acct(device, req);
/* process discards always from our submitter thread */
if (bio_op(bio) & REQ_OP_DISCARD)
if ((bio_op(bio) & REQ_OP_WRITE_ZEROES) ||
(bio_op(bio) & REQ_OP_DISCARD))
goto queue_for_submitter_thread;
if (rw == WRITE && req->private_bio && req->i.size


@ -174,7 +174,8 @@ void drbd_peer_request_endio(struct bio *bio)
struct drbd_peer_request *peer_req = bio->bi_private;
struct drbd_device *device = peer_req->peer_device->device;
bool is_write = bio_data_dir(bio) == WRITE;
bool is_discard = !!(bio_op(bio) == REQ_OP_DISCARD);
bool is_discard = bio_op(bio) == REQ_OP_WRITE_ZEROES ||
bio_op(bio) == REQ_OP_DISCARD;
if (bio->bi_error && __ratelimit(&drbd_ratelimit_state))
drbd_warn(device, "%s: error=%d s=%llus\n",
@ -249,6 +250,7 @@ void drbd_request_endio(struct bio *bio)
/* to avoid recursion in __req_mod */
if (unlikely(bio->bi_error)) {
switch (bio_op(bio)) {
case REQ_OP_WRITE_ZEROES:
case REQ_OP_DISCARD:
if (bio->bi_error == -EOPNOTSUPP)
what = DISCARD_COMPLETED_NOTSUPP;


@ -2805,8 +2805,10 @@ static int set_next_request(void)
fdc_queue = 0;
if (q) {
current_req = blk_fetch_request(q);
if (current_req)
if (current_req) {
current_req->error_count = 0;
break;
}
}
} while (fdc_queue != old_pos);
@ -2866,7 +2868,7 @@ static void redo_fd_request(void)
_floppy = floppy_type + DP->autodetect[DRS->probed_format];
} else
probing = 0;
errors = &(current_req->errors);
errors = &(current_req->error_count);
tmp = make_raw_rw_request();
if (tmp < 2) {
request_done(tmp);
@ -4207,9 +4209,7 @@ static int __init do_floppy_init(void)
disks[drive]->fops = &floppy_fops;
sprintf(disks[drive]->disk_name, "fd%d", drive);
init_timer(&motor_off_timer[drive]);
motor_off_timer[drive].data = drive;
motor_off_timer[drive].function = motor_off_callback;
setup_timer(&motor_off_timer[drive], motor_off_callback, drive);
}
err = register_blkdev(FLOPPY_MAJOR, "fd");
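
The floppy hunk above also folds the open-coded timer field assignments into setup_timer(). A small illustrative sketch of that pattern, with made-up names:

#include <linux/timer.h>
#include <linux/jiffies.h>
#include <linux/printk.h>

static struct timer_list example_timer;

/* timer callbacks still take the unsigned long 'data' cookie here */
static void example_timeout(unsigned long data)
{
	pr_info("example: timer for unit %lu fired\n", data);
}

static void example_arm_timer(unsigned long unit)
{
	setup_timer(&example_timer, example_timeout, unit);
	mod_timer(&example_timer, jiffies + HZ);
}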


@ -1,803 +0,0 @@
/*
* Copyright (C) 1991, 1992 Linus Torvalds
*
* This is the low-level hd interrupt support. It traverses the
* request-list, using interrupts to jump between functions. As
* all the functions are called within interrupts, we may not
* sleep. Special care is recommended.
*
* modified by Drew Eckhardt to check nr of hd's from the CMOS.
*
* Thanks to Branko Lankester, lankeste@fwi.uva.nl, who found a bug
* in the early extended-partition checks and added DM partitions
*
* IRQ-unmask, drive-id, multiple-mode, support for ">16 heads",
* and general streamlining by Mark Lord.
*
* Removed 99% of above. Use Mark's ide driver for those options.
* This is now a lightweight ST-506 driver. (Paul Gortmaker)
*
* Modified 1995 Russell King for ARM processor.
*
* Bugfix: max_sectors must be <= 255 or the wheels tend to come
* off in a hurry once you queue things up - Paul G. 02/2001
*/
/* Uncomment the following if you want verbose error reports. */
/* #define VERBOSE_ERRORS */
#include <linux/blkdev.h>
#include <linux/errno.h>
#include <linux/signal.h>
#include <linux/interrupt.h>
#include <linux/timer.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/genhd.h>
#include <linux/string.h>
#include <linux/ioport.h>
#include <linux/init.h>
#include <linux/blkpg.h>
#include <linux/ata.h>
#include <linux/hdreg.h>
#define HD_IRQ 14
#define REALLY_SLOW_IO
#include <asm/io.h>
#include <linux/uaccess.h>
#ifdef __arm__
#undef HD_IRQ
#endif
#include <asm/irq.h>
#ifdef __arm__
#define HD_IRQ IRQ_HARDDISK
#endif
/* Hd controller regster ports */
#define HD_DATA 0x1f0 /* _CTL when writing */
#define HD_ERROR 0x1f1 /* see err-bits */
#define HD_NSECTOR 0x1f2 /* nr of sectors to read/write */
#define HD_SECTOR 0x1f3 /* starting sector */
#define HD_LCYL 0x1f4 /* starting cylinder */
#define HD_HCYL 0x1f5 /* high byte of starting cyl */
#define HD_CURRENT 0x1f6 /* 101dhhhh , d=drive, hhhh=head */
#define HD_STATUS 0x1f7 /* see status-bits */
#define HD_FEATURE HD_ERROR /* same io address, read=error, write=feature */
#define HD_PRECOMP HD_FEATURE /* obsolete use of this port - predates IDE */
#define HD_COMMAND HD_STATUS /* same io address, read=status, write=cmd */
#define HD_CMD 0x3f6 /* used for resets */
#define HD_ALTSTATUS 0x3f6 /* same as HD_STATUS but doesn't clear irq */
/* Bits of HD_STATUS */
#define ERR_STAT 0x01
#define INDEX_STAT 0x02
#define ECC_STAT 0x04 /* Corrected error */
#define DRQ_STAT 0x08
#define SEEK_STAT 0x10
#define SERVICE_STAT SEEK_STAT
#define WRERR_STAT 0x20
#define READY_STAT 0x40
#define BUSY_STAT 0x80
/* Bits for HD_ERROR */
#define MARK_ERR 0x01 /* Bad address mark */
#define TRK0_ERR 0x02 /* couldn't find track 0 */
#define ABRT_ERR 0x04 /* Command aborted */
#define MCR_ERR 0x08 /* media change request */
#define ID_ERR 0x10 /* ID field not found */
#define MC_ERR 0x20 /* media changed */
#define ECC_ERR 0x40 /* Uncorrectable ECC error */
#define BBD_ERR 0x80 /* pre-EIDE meaning: block marked bad */
#define ICRC_ERR 0x80 /* new meaning: CRC error during transfer */
static DEFINE_SPINLOCK(hd_lock);
static struct request_queue *hd_queue;
static struct request *hd_req;
#define TIMEOUT_VALUE (6*HZ)
#define HD_DELAY 0
#define MAX_ERRORS 16 /* Max read/write errors/sector */
#define RESET_FREQ 8 /* Reset controller every 8th retry */
#define RECAL_FREQ 4 /* Recalibrate every 4th retry */
#define MAX_HD 2
#define STAT_OK (READY_STAT|SEEK_STAT)
#define OK_STATUS(s) (((s)&(STAT_OK|(BUSY_STAT|WRERR_STAT|ERR_STAT)))==STAT_OK)
static void recal_intr(void);
static void bad_rw_intr(void);
static int reset;
static int hd_error;
/*
* This struct defines the HD's and their types.
*/
struct hd_i_struct {
unsigned int head, sect, cyl, wpcom, lzone, ctl;
int unit;
int recalibrate;
int special_op;
};
#ifdef HD_TYPE
static struct hd_i_struct hd_info[] = { HD_TYPE };
static int NR_HD = ARRAY_SIZE(hd_info);
#else
static struct hd_i_struct hd_info[MAX_HD];
static int NR_HD;
#endif
static struct gendisk *hd_gendisk[MAX_HD];
static struct timer_list device_timer;
#define TIMEOUT_VALUE (6*HZ)
#define SET_TIMER \
do { \
mod_timer(&device_timer, jiffies + TIMEOUT_VALUE); \
} while (0)
static void (*do_hd)(void) = NULL;
#define SET_HANDLER(x) \
if ((do_hd = (x)) != NULL) \
SET_TIMER; \
else \
del_timer(&device_timer);
#if (HD_DELAY > 0)
#include <linux/i8253.h>
unsigned long last_req;
unsigned long read_timer(void)
{
unsigned long t, flags;
int i;
raw_spin_lock_irqsave(&i8253_lock, flags);
t = jiffies * 11932;
outb_p(0, 0x43);
i = inb_p(0x40);
i |= inb(0x40) << 8;
raw_spin_unlock_irqrestore(&i8253_lock, flags);
return(t - i);
}
#endif
static void __init hd_setup(char *str, int *ints)
{
int hdind = 0;
if (ints[0] != 3)
return;
if (hd_info[0].head != 0)
hdind = 1;
hd_info[hdind].head = ints[2];
hd_info[hdind].sect = ints[3];
hd_info[hdind].cyl = ints[1];
hd_info[hdind].wpcom = 0;
hd_info[hdind].lzone = ints[1];
hd_info[hdind].ctl = (ints[2] > 8 ? 8 : 0);
NR_HD = hdind+1;
}
static bool hd_end_request(int err, unsigned int bytes)
{
if (__blk_end_request(hd_req, err, bytes))
return true;
hd_req = NULL;
return false;
}
static bool hd_end_request_cur(int err)
{
return hd_end_request(err, blk_rq_cur_bytes(hd_req));
}
static void dump_status(const char *msg, unsigned int stat)
{
char *name = "hd?";
if (hd_req)
name = hd_req->rq_disk->disk_name;
#ifdef VERBOSE_ERRORS
printk("%s: %s: status=0x%02x { ", name, msg, stat & 0xff);
if (stat & BUSY_STAT) printk("Busy ");
if (stat & READY_STAT) printk("DriveReady ");
if (stat & WRERR_STAT) printk("WriteFault ");
if (stat & SEEK_STAT) printk("SeekComplete ");
if (stat & DRQ_STAT) printk("DataRequest ");
if (stat & ECC_STAT) printk("CorrectedError ");
if (stat & INDEX_STAT) printk("Index ");
if (stat & ERR_STAT) printk("Error ");
printk("}\n");
if ((stat & ERR_STAT) == 0) {
hd_error = 0;
} else {
hd_error = inb(HD_ERROR);
printk("%s: %s: error=0x%02x { ", name, msg, hd_error & 0xff);
if (hd_error & BBD_ERR) printk("BadSector ");
if (hd_error & ECC_ERR) printk("UncorrectableError ");
if (hd_error & ID_ERR) printk("SectorIdNotFound ");
if (hd_error & ABRT_ERR) printk("DriveStatusError ");
if (hd_error & TRK0_ERR) printk("TrackZeroNotFound ");
if (hd_error & MARK_ERR) printk("AddrMarkNotFound ");
printk("}");
if (hd_error & (BBD_ERR|ECC_ERR|ID_ERR|MARK_ERR)) {
printk(", CHS=%d/%d/%d", (inb(HD_HCYL)<<8) + inb(HD_LCYL),
inb(HD_CURRENT) & 0xf, inb(HD_SECTOR));
if (hd_req)
printk(", sector=%ld", blk_rq_pos(hd_req));
}
printk("\n");
}
#else
printk("%s: %s: status=0x%02x.\n", name, msg, stat & 0xff);
if ((stat & ERR_STAT) == 0) {
hd_error = 0;
} else {
hd_error = inb(HD_ERROR);
printk("%s: %s: error=0x%02x.\n", name, msg, hd_error & 0xff);
}
#endif
}
static void check_status(void)
{
int i = inb_p(HD_STATUS);
if (!OK_STATUS(i)) {
dump_status("check_status", i);
bad_rw_intr();
}
}
static int controller_busy(void)
{
int retries = 100000;
unsigned char status;
do {
status = inb_p(HD_STATUS);
} while ((status & BUSY_STAT) && --retries);
return status;
}
static int status_ok(void)
{
unsigned char status = inb_p(HD_STATUS);
if (status & BUSY_STAT)
return 1; /* Ancient, but does it make sense??? */
if (status & WRERR_STAT)
return 0;
if (!(status & READY_STAT))
return 0;
if (!(status & SEEK_STAT))
return 0;
return 1;
}
static int controller_ready(unsigned int drive, unsigned int head)
{
int retry = 100;
do {
if (controller_busy() & BUSY_STAT)
return 0;
outb_p(0xA0 | (drive<<4) | head, HD_CURRENT);
if (status_ok())
return 1;
} while (--retry);
return 0;
}
static void hd_out(struct hd_i_struct *disk,
unsigned int nsect,
unsigned int sect,
unsigned int head,
unsigned int cyl,
unsigned int cmd,
void (*intr_addr)(void))
{
unsigned short port;
#if (HD_DELAY > 0)
while (read_timer() - last_req < HD_DELAY)
/* nothing */;
#endif
if (reset)
return;
if (!controller_ready(disk->unit, head)) {
reset = 1;
return;
}
SET_HANDLER(intr_addr);
outb_p(disk->ctl, HD_CMD);
port = HD_DATA;
outb_p(disk->wpcom >> 2, ++port);
outb_p(nsect, ++port);
outb_p(sect, ++port);
outb_p(cyl, ++port);
outb_p(cyl >> 8, ++port);
outb_p(0xA0 | (disk->unit << 4) | head, ++port);
outb_p(cmd, ++port);
}
static void hd_request (void);
static int drive_busy(void)
{
unsigned int i;
unsigned char c;
for (i = 0; i < 500000 ; i++) {
c = inb_p(HD_STATUS);
if ((c & (BUSY_STAT | READY_STAT | SEEK_STAT)) == STAT_OK)
return 0;
}
dump_status("reset timed out", c);
return 1;
}
static void reset_controller(void)
{
int i;
outb_p(4, HD_CMD);
for (i = 0; i < 1000; i++) barrier();
outb_p(hd_info[0].ctl & 0x0f, HD_CMD);
for (i = 0; i < 1000; i++) barrier();
if (drive_busy())
printk("hd: controller still busy\n");
else if ((hd_error = inb(HD_ERROR)) != 1)
printk("hd: controller reset failed: %02x\n", hd_error);
}
static void reset_hd(void)
{
static int i;
repeat:
if (reset) {
reset = 0;
i = -1;
reset_controller();
} else {
check_status();
if (reset)
goto repeat;
}
if (++i < NR_HD) {
struct hd_i_struct *disk = &hd_info[i];
disk->special_op = disk->recalibrate = 1;
hd_out(disk, disk->sect, disk->sect, disk->head-1,
disk->cyl, ATA_CMD_INIT_DEV_PARAMS, &reset_hd);
if (reset)
goto repeat;
} else
hd_request();
}
/*
* Ok, don't know what to do with the unexpected interrupts: on some machines
* doing a reset and a retry seems to result in an eternal loop. Right now I
* ignore it, and just set the timeout.
*
* On laptops (and "green" PCs), an unexpected interrupt occurs whenever the
* drive enters "idle", "standby", or "sleep" mode, so if the status looks
* "good", we just ignore the interrupt completely.
*/
static void unexpected_hd_interrupt(void)
{
unsigned int stat = inb_p(HD_STATUS);
if (stat & (BUSY_STAT|DRQ_STAT|ECC_STAT|ERR_STAT)) {
dump_status("unexpected interrupt", stat);
SET_TIMER;
}
}
/*
* bad_rw_intr() now tries to be a bit smarter and does things
* according to the error returned by the controller.
* -Mika Liljeberg (liljeber@cs.Helsinki.FI)
*/
static void bad_rw_intr(void)
{
struct request *req = hd_req;
if (req != NULL) {
struct hd_i_struct *disk = req->rq_disk->private_data;
if (++req->errors >= MAX_ERRORS || (hd_error & BBD_ERR)) {
hd_end_request_cur(-EIO);
disk->special_op = disk->recalibrate = 1;
} else if (req->errors % RESET_FREQ == 0)
reset = 1;
else if ((hd_error & TRK0_ERR) || req->errors % RECAL_FREQ == 0)
disk->special_op = disk->recalibrate = 1;
/* Otherwise just retry */
}
}
static inline int wait_DRQ(void)
{
int retries;
int stat;
for (retries = 0; retries < 100000; retries++) {
stat = inb_p(HD_STATUS);
if (stat & DRQ_STAT)
return 0;
}
dump_status("wait_DRQ", stat);
return -1;
}
static void read_intr(void)
{
struct request *req;
int i, retries = 100000;
do {
i = (unsigned) inb_p(HD_STATUS);
if (i & BUSY_STAT)
continue;
if (!OK_STATUS(i))
break;
if (i & DRQ_STAT)
goto ok_to_read;
} while (--retries > 0);
dump_status("read_intr", i);
bad_rw_intr();
hd_request();
return;
ok_to_read:
req = hd_req;
insw(HD_DATA, bio_data(req->bio), 256);
#ifdef DEBUG
printk("%s: read: sector %ld, remaining = %u, buffer=%p\n",
req->rq_disk->disk_name, blk_rq_pos(req) + 1,
blk_rq_sectors(req) - 1, bio_data(req->bio)+512);
#endif
if (hd_end_request(0, 512)) {
SET_HANDLER(&read_intr);
return;
}
(void) inb_p(HD_STATUS);
#if (HD_DELAY > 0)
last_req = read_timer();
#endif
hd_request();
}
static void write_intr(void)
{
struct request *req = hd_req;
int i;
int retries = 100000;
do {
i = (unsigned) inb_p(HD_STATUS);
if (i & BUSY_STAT)
continue;
if (!OK_STATUS(i))
break;
if ((blk_rq_sectors(req) <= 1) || (i & DRQ_STAT))
goto ok_to_write;
} while (--retries > 0);
dump_status("write_intr", i);
bad_rw_intr();
hd_request();
return;
ok_to_write:
if (hd_end_request(0, 512)) {
SET_HANDLER(&write_intr);
outsw(HD_DATA, bio_data(req->bio), 256);
return;
}
#if (HD_DELAY > 0)
last_req = read_timer();
#endif
hd_request();
}
static void recal_intr(void)
{
check_status();
#if (HD_DELAY > 0)
last_req = read_timer();
#endif
hd_request();
}
/*
* This is another of the error-routines I don't know what to do with. The
* best idea seems to just set reset, and start all over again.
*/
static void hd_times_out(unsigned long dummy)
{
char *name;
do_hd = NULL;
if (!hd_req)
return;
spin_lock_irq(hd_queue->queue_lock);
reset = 1;
name = hd_req->rq_disk->disk_name;
printk("%s: timeout\n", name);
if (++hd_req->errors >= MAX_ERRORS) {
#ifdef DEBUG
printk("%s: too many errors\n", name);
#endif
hd_end_request_cur(-EIO);
}
hd_request();
spin_unlock_irq(hd_queue->queue_lock);
}
static int do_special_op(struct hd_i_struct *disk, struct request *req)
{
if (disk->recalibrate) {
disk->recalibrate = 0;
hd_out(disk, disk->sect, 0, 0, 0, ATA_CMD_RESTORE, &recal_intr);
return reset;
}
if (disk->head > 16) {
printk("%s: cannot handle device with more than 16 heads - giving up\n", req->rq_disk->disk_name);
hd_end_request_cur(-EIO);
}
disk->special_op = 0;
return 1;
}
/*
* The driver enables interrupts as much as possible. In order to do this,
* (a) the device-interrupt is disabled before entering hd_request(),
* and (b) the timeout-interrupt is disabled before the sti().
*
* Interrupts are still masked (by default) whenever we are exchanging
* data/cmds with a drive, because some drives seem to have very poor
* tolerance for latency during I/O. The IDE driver has support to unmask
* interrupts for non-broken hardware, so use that driver if required.
*/
static void hd_request(void)
{
unsigned int block, nsect, sec, track, head, cyl;
struct hd_i_struct *disk;
struct request *req;
if (do_hd)
return;
repeat:
del_timer(&device_timer);
if (!hd_req) {
hd_req = blk_fetch_request(hd_queue);
if (!hd_req) {
do_hd = NULL;
return;
}
}
req = hd_req;
if (reset) {
reset_hd();
return;
}
disk = req->rq_disk->private_data;
block = blk_rq_pos(req);
nsect = blk_rq_sectors(req);
if (block >= get_capacity(req->rq_disk) ||
((block+nsect) > get_capacity(req->rq_disk))) {
printk("%s: bad access: block=%d, count=%d\n",
req->rq_disk->disk_name, block, nsect);
hd_end_request_cur(-EIO);
goto repeat;
}
if (disk->special_op) {
if (do_special_op(disk, req))
goto repeat;
return;
}
sec = block % disk->sect + 1;
track = block / disk->sect;
head = track % disk->head;
cyl = track / disk->head;
#ifdef DEBUG
printk("%s: %sing: CHS=%d/%d/%d, sectors=%d, buffer=%p\n",
req->rq_disk->disk_name,
req_data_dir(req) == READ ? "read" : "writ",
cyl, head, sec, nsect, bio_data(req->bio));
#endif
switch (req_op(req)) {
case REQ_OP_READ:
hd_out(disk, nsect, sec, head, cyl, ATA_CMD_PIO_READ,
&read_intr);
if (reset)
goto repeat;
break;
case REQ_OP_WRITE:
hd_out(disk, nsect, sec, head, cyl, ATA_CMD_PIO_WRITE,
&write_intr);
if (reset)
goto repeat;
if (wait_DRQ()) {
bad_rw_intr();
goto repeat;
}
outsw(HD_DATA, bio_data(req->bio), 256);
break;
default:
printk("unknown hd-command\n");
hd_end_request_cur(-EIO);
break;
}
}
static void do_hd_request(struct request_queue *q)
{
hd_request();
}
static int hd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{
struct hd_i_struct *disk = bdev->bd_disk->private_data;
geo->heads = disk->head;
geo->sectors = disk->sect;
geo->cylinders = disk->cyl;
return 0;
}
/*
* Releasing a block device means we sync() it, so that it can safely
* be forgotten about...
*/
static irqreturn_t hd_interrupt(int irq, void *dev_id)
{
void (*handler)(void) = do_hd;
spin_lock(hd_queue->queue_lock);
do_hd = NULL;
del_timer(&device_timer);
if (!handler)
handler = unexpected_hd_interrupt;
handler();
spin_unlock(hd_queue->queue_lock);
return IRQ_HANDLED;
}
static const struct block_device_operations hd_fops = {
.getgeo = hd_getgeo,
};
static int __init hd_init(void)
{
int drive;
if (register_blkdev(HD_MAJOR, "hd"))
return -1;
hd_queue = blk_init_queue(do_hd_request, &hd_lock);
if (!hd_queue) {
unregister_blkdev(HD_MAJOR, "hd");
return -ENOMEM;
}
blk_queue_max_hw_sectors(hd_queue, 255);
init_timer(&device_timer);
device_timer.function = hd_times_out;
blk_queue_logical_block_size(hd_queue, 512);
if (!NR_HD) {
/*
* We don't know anything about the drive. This means
* that you *MUST* specify the drive parameters to the
* kernel yourself.
*
* If we were on an i386, we used to read this info from
* the BIOS or CMOS. This doesn't work all that well,
* since this assumes that this is a primary or secondary
* drive, and if we're using this legacy driver, it's
* probably an auxiliary controller added to recover
* legacy data off an ST-506 drive. Either way, it's
* definitely safest to have the user explicitly specify
* the information.
*/
printk("hd: no drives specified - use hd=cyl,head,sectors"
" on kernel command line\n");
goto out;
}
for (drive = 0 ; drive < NR_HD ; drive++) {
struct gendisk *disk = alloc_disk(64);
struct hd_i_struct *p = &hd_info[drive];
if (!disk)
goto Enomem;
disk->major = HD_MAJOR;
disk->first_minor = drive << 6;
disk->fops = &hd_fops;
sprintf(disk->disk_name, "hd%c", 'a'+drive);
disk->private_data = p;
set_capacity(disk, p->head * p->sect * p->cyl);
disk->queue = hd_queue;
p->unit = drive;
hd_gendisk[drive] = disk;
printk("%s: %luMB, CHS=%d/%d/%d\n",
disk->disk_name, (unsigned long)get_capacity(disk)/2048,
p->cyl, p->head, p->sect);
}
if (request_irq(HD_IRQ, hd_interrupt, 0, "hd", NULL)) {
printk("hd: unable to get IRQ%d for the hard disk driver\n",
HD_IRQ);
goto out1;
}
if (!request_region(HD_DATA, 8, "hd")) {
printk(KERN_WARNING "hd: port 0x%x busy\n", HD_DATA);
goto out2;
}
if (!request_region(HD_CMD, 1, "hd(cmd)")) {
printk(KERN_WARNING "hd: port 0x%x busy\n", HD_CMD);
goto out3;
}
/* Let them fly */
for (drive = 0; drive < NR_HD; drive++)
add_disk(hd_gendisk[drive]);
return 0;
out3:
release_region(HD_DATA, 8);
out2:
free_irq(HD_IRQ, NULL);
out1:
for (drive = 0; drive < NR_HD; drive++)
put_disk(hd_gendisk[drive]);
NR_HD = 0;
out:
del_timer(&device_timer);
unregister_blkdev(HD_MAJOR, "hd");
blk_cleanup_queue(hd_queue);
return -1;
Enomem:
while (drive--)
put_disk(hd_gendisk[drive]);
goto out;
}
static int __init parse_hd_setup(char *line)
{
int ints[6];
(void) get_options(line, ARRAY_SIZE(ints), ints);
hd_setup(NULL, ints);
return 1;
}
__setup("hd=", parse_hd_setup);
late_initcall(hd_init);
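
For reference, the geometry translation the removed hd_request() performed, as a small illustrative helper with one worked example in the comments; the geometry values and names are made up:

#include <linux/types.h>

struct example_geometry {
	unsigned int sect;	/* sectors per track */
	unsigned int head;	/* heads per cylinder */
};

/* Illustrative only: the LBA-to-CHS arithmetic used by the old driver. */
static void example_lba_to_chs(unsigned int block,
			       const struct example_geometry *g,
			       unsigned int *sec, unsigned int *head,
			       unsigned int *cyl)
{
	unsigned int track = block / g->sect;

	*sec  = block % g->sect + 1;	/* sector numbers are 1-based */
	*head = track % g->head;
	*cyl  = track / g->head;
	/*
	 * e.g. block 5000 with sect = 63, head = 16:
	 * track = 79, so sec = 24, head = 15, cyl = 4.
	 */
}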


@ -445,32 +445,27 @@ static int lo_req_flush(struct loop_device *lo, struct request *rq)
return ret;
}
static inline void handle_partial_read(struct loop_cmd *cmd, long bytes)
static void lo_complete_rq(struct request *rq)
{
if (bytes < 0 || op_is_write(req_op(cmd->rq)))
return;
struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
if (unlikely(bytes < blk_rq_bytes(cmd->rq))) {
if (unlikely(req_op(cmd->rq) == REQ_OP_READ && cmd->use_aio &&
cmd->ret >= 0 && cmd->ret < blk_rq_bytes(cmd->rq))) {
struct bio *bio = cmd->rq->bio;
bio_advance(bio, bytes);
bio_advance(bio, cmd->ret);
zero_fill_bio(bio);
}
blk_mq_end_request(rq, cmd->ret < 0 ? -EIO : 0);
}
static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
{
struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
struct request *rq = cmd->rq;
handle_partial_read(cmd, ret);
if (ret > 0)
ret = 0;
else if (ret < 0)
ret = -EIO;
blk_mq_complete_request(rq, ret);
cmd->ret = ret;
blk_mq_complete_request(cmd->rq);
}
static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
@ -528,6 +523,7 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
case REQ_OP_FLUSH:
return lo_req_flush(lo, rq);
case REQ_OP_DISCARD:
case REQ_OP_WRITE_ZEROES:
return lo_discard(lo, rq, pos);
case REQ_OP_WRITE:
if (lo->transfer)
@ -826,7 +822,7 @@ static void loop_config_discard(struct loop_device *lo)
q->limits.discard_granularity = 0;
q->limits.discard_alignment = 0;
blk_queue_max_discard_sectors(q, 0);
q->limits.discard_zeroes_data = 0;
blk_queue_max_write_zeroes_sectors(q, 0);
queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
return;
}
@ -834,7 +830,7 @@ static void loop_config_discard(struct loop_device *lo)
q->limits.discard_granularity = inode->i_sb->s_blocksize;
q->limits.discard_alignment = 0;
blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
q->limits.discard_zeroes_data = 1;
blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
}
@ -1660,6 +1656,7 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
switch (req_op(cmd->rq)) {
case REQ_OP_FLUSH:
case REQ_OP_DISCARD:
case REQ_OP_WRITE_ZEROES:
cmd->use_aio = false;
break;
default:
@ -1686,8 +1683,10 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
ret = do_req_filebacked(lo, cmd->rq);
failed:
/* complete non-aio request */
if (!cmd->use_aio || ret)
blk_mq_complete_request(cmd->rq, ret ? -EIO : 0);
if (!cmd->use_aio || ret) {
cmd->ret = ret ? -EIO : 0;
blk_mq_complete_request(cmd->rq);
}
}
static void loop_queue_work(struct kthread_work *work)
@ -1710,9 +1709,10 @@ static int loop_init_request(void *data, struct request *rq,
return 0;
}
static struct blk_mq_ops loop_mq_ops = {
static const struct blk_mq_ops loop_mq_ops = {
.queue_rq = loop_queue_rq,
.init_request = loop_init_request,
.complete = lo_complete_rq,
};
static int loop_add(struct loop_device **l, int i)


@ -70,6 +70,7 @@ struct loop_cmd {
struct request *rq;
struct list_head list;
bool use_aio; /* use AIO interface to handle I/O */
long ret;
struct kiocb iocb;
};
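
The loop changes above show the new completion contract: the driver stashes its own status in the per-request pdu, calls blk_mq_complete_request() with no error argument, and converts the stored status into the final error from the ops->complete callback. A minimal sketch with made-up names:

#include <linux/blk-mq.h>

struct example_cmd {
	int status;			/* driver-private outcome */
};

/* Illustrative only: completion path with rq->errors removed. */
static void example_io_done(struct request *rq, int status)
{
	struct example_cmd *cmd = blk_mq_rq_to_pdu(rq);

	cmd->status = status;
	blk_mq_complete_request(rq);	/* no error argument any more */
}

static void example_complete_rq(struct request *rq)
{
	struct example_cmd *cmd = blk_mq_rq_to_pdu(rq);

	blk_mq_end_request(rq, cmd->status);
}

static const struct blk_mq_ops example_mq_ops = {
	.complete	= example_complete_rq,
	/* .queue_rq and friends omitted from this sketch */
};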

File diff suppressed because it is too large.


@ -169,6 +169,25 @@ static bool mtip_check_surprise_removal(struct pci_dev *pdev)
return false; /* device present */
}
/* we have to use runtime tag to setup command header */
static void mtip_init_cmd_header(struct request *rq)
{
struct driver_data *dd = rq->q->queuedata;
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
u32 host_cap_64 = readl(dd->mmio + HOST_CAP) & HOST_CAP_64;
/* Point the command headers at the command tables. */
cmd->command_header = dd->port->command_list +
(sizeof(struct mtip_cmd_hdr) * rq->tag);
cmd->command_header_dma = dd->port->command_list_dma +
(sizeof(struct mtip_cmd_hdr) * rq->tag);
if (host_cap_64)
cmd->command_header->ctbau = __force_bit2int cpu_to_le32((cmd->command_dma >> 16) >> 16);
cmd->command_header->ctba = __force_bit2int cpu_to_le32(cmd->command_dma & 0xFFFFFFFF);
}
static struct mtip_cmd *mtip_get_int_command(struct driver_data *dd)
{
struct request *rq;
@ -180,6 +199,9 @@ static struct mtip_cmd *mtip_get_int_command(struct driver_data *dd)
if (IS_ERR(rq))
return NULL;
/* Internal cmd isn't submitted via .queue_rq */
mtip_init_cmd_header(rq);
return blk_mq_rq_to_pdu(rq);
}
@ -241,7 +263,8 @@ static void mtip_async_complete(struct mtip_port *port,
rq = mtip_rq_from_tag(dd, tag);
blk_mq_complete_request(rq, status);
cmd->status = status;
blk_mq_complete_request(rq);
}
/*
@ -2910,18 +2933,19 @@ static void mtip_softirq_done_fn(struct request *rq)
if (unlikely(cmd->unaligned))
up(&dd->port->cmd_slot_unal);
blk_mq_end_request(rq, rq->errors);
blk_mq_end_request(rq, cmd->status);
}
static void mtip_abort_cmd(struct request *req, void *data,
bool reserved)
{
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(req);
struct driver_data *dd = data;
dbg_printk(MTIP_DRV_NAME " Aborting request, tag = %d\n", req->tag);
clear_bit(req->tag, dd->port->cmds_to_issue);
req->errors = -EIO;
cmd->status = -EIO;
mtip_softirq_done_fn(req);
}
@ -3807,6 +3831,8 @@ static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx,
struct request *rq = bd->rq;
int ret;
mtip_init_cmd_header(rq);
if (unlikely(mtip_check_unal_depth(hctx, rq)))
return BLK_MQ_RQ_QUEUE_BUSY;
@ -3816,7 +3842,6 @@ static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx,
if (likely(!ret))
return BLK_MQ_RQ_QUEUE_OK;
rq->errors = ret;
return BLK_MQ_RQ_QUEUE_ERROR;
}
@ -3838,7 +3863,6 @@ static int mtip_init_cmd(void *data, struct request *rq, unsigned int hctx_idx,
{
struct driver_data *dd = data;
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
u32 host_cap_64 = readl(dd->mmio + HOST_CAP) & HOST_CAP_64;
/*
* For flush requests, request_idx starts at the end of the
@ -3855,17 +3879,6 @@ static int mtip_init_cmd(void *data, struct request *rq, unsigned int hctx_idx,
memset(cmd->command, 0, CMD_DMA_ALLOC_SZ);
/* Point the command headers at the command tables. */
cmd->command_header = dd->port->command_list +
(sizeof(struct mtip_cmd_hdr) * request_idx);
cmd->command_header_dma = dd->port->command_list_dma +
(sizeof(struct mtip_cmd_hdr) * request_idx);
if (host_cap_64)
cmd->command_header->ctbau = __force_bit2int cpu_to_le32((cmd->command_dma >> 16) >> 16);
cmd->command_header->ctba = __force_bit2int cpu_to_le32(cmd->command_dma & 0xFFFFFFFF);
sg_init_table(cmd->sg, MTIP_MAX_SG);
return 0;
}
@ -3889,7 +3902,7 @@ static enum blk_eh_timer_return mtip_cmd_timeout(struct request *req,
return BLK_EH_RESET_TIMER;
}
static struct blk_mq_ops mtip_mq_ops = {
static const struct blk_mq_ops mtip_mq_ops = {
.queue_rq = mtip_queue_rq,
.init_request = mtip_init_cmd,
.exit_request = mtip_free_cmd,
@ -4025,7 +4038,6 @@ static int mtip_block_initialize(struct driver_data *dd)
dd->queue->limits.discard_granularity = 4096;
blk_queue_max_discard_sectors(dd->queue,
MTIP_MAX_TRIM_ENTRY_LEN * MTIP_MAX_TRIM_ENTRIES);
dd->queue->limits.discard_zeroes_data = 0;
}
/* Set the capacity of the device in 512 byte sectors. */
@ -4107,9 +4119,11 @@ static void mtip_no_dev_cleanup(struct request *rq, void *data, bool reserv)
struct driver_data *dd = (struct driver_data *)data;
struct mtip_cmd *cmd;
if (likely(!reserv))
blk_mq_complete_request(rq, -ENODEV);
else if (test_bit(MTIP_PF_IC_ACTIVE_BIT, &dd->port->flags)) {
if (likely(!reserv)) {
cmd = blk_mq_rq_to_pdu(rq);
cmd->status = -ENODEV;
blk_mq_complete_request(rq);
} else if (test_bit(MTIP_PF_IC_ACTIVE_BIT, &dd->port->flags)) {
cmd = mtip_cmd_from_tag(dd, MTIP_TAG_INTERNAL);
if (cmd->comp_func)
@ -4162,7 +4176,7 @@ static int mtip_block_remove(struct driver_data *dd)
dev_info(&dd->pdev->dev, "device %s surprise removal\n",
dd->disk->disk_name);
blk_mq_freeze_queue_start(dd->queue);
blk_freeze_queue_start(dd->queue);
blk_mq_stop_hw_queues(dd->queue);
blk_mq_tagset_busy_iter(&dd->tags, mtip_no_dev_cleanup, dd);
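
With an I/O scheduler in front of the driver, the hardware tag is only assigned at dispatch, which is why mtip now sets up the command header from ->queue_rq() using rq->tag rather than from ->init_request(). A tiny illustrative sketch; all names are made up:

#include <linux/blk-mq.h>

struct example_hw_hdr { u64 ctba; };
struct example_cmd { struct example_hw_hdr *hdr; };

static struct example_hw_hdr *example_hdr_base;	/* per-port header array */

/* Illustrative only: index per-tag hardware state with the runtime tag. */
static void example_init_cmd_header(struct request *rq)
{
	struct example_cmd *cmd = blk_mq_rq_to_pdu(rq);

	cmd->hdr = example_hdr_base + rq->tag;
}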


@ -352,6 +352,7 @@ struct mtip_cmd {
int retries; /* The number of retries left for this command. */
int direction; /* Data transfer direction */
int status;
};
/* Structure used to describe a port. */

File diff suppressed because it is too large.


@ -117,6 +117,10 @@ static bool use_lightnvm;
module_param(use_lightnvm, bool, S_IRUGO);
MODULE_PARM_DESC(use_lightnvm, "Register as a LightNVM device");
static bool blocking;
module_param(blocking, bool, S_IRUGO);
MODULE_PARM_DESC(blocking, "Register as a blocking blk-mq driver device");
static int irqmode = NULL_IRQ_SOFTIRQ;
static int null_set_irqmode(const char *str, const struct kernel_param *kp)
@ -277,7 +281,7 @@ static inline void null_handle_cmd(struct nullb_cmd *cmd)
case NULL_IRQ_SOFTIRQ:
switch (queue_mode) {
case NULL_Q_MQ:
blk_mq_complete_request(cmd->rq, cmd->rq->errors);
blk_mq_complete_request(cmd->rq);
break;
case NULL_Q_RQ:
blk_complete_request(cmd->rq);
@ -357,6 +361,8 @@ static int null_queue_rq(struct blk_mq_hw_ctx *hctx,
{
struct nullb_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
if (irqmode == NULL_IRQ_TIMER) {
hrtimer_init(&cmd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
cmd->timer.function = null_cmd_timer_expired;
@ -392,7 +398,7 @@ static int null_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
return 0;
}
static struct blk_mq_ops null_mq_ops = {
static const struct blk_mq_ops null_mq_ops = {
.queue_rq = null_queue_rq,
.init_hctx = null_init_hctx,
.complete = null_softirq_done_fn,
@ -437,14 +443,7 @@ static int null_lnvm_submit_io(struct nvm_dev *dev, struct nvm_rq *rqd)
if (IS_ERR(rq))
return -ENOMEM;
rq->__sector = bio->bi_iter.bi_sector;
rq->ioprio = bio_prio(bio);
if (bio_has_data(bio))
rq->nr_phys_segments = bio_phys_segments(q, bio);
rq->__data_len = bio->bi_iter.bi_size;
rq->bio = rq->biotail = bio;
blk_init_request_from_bio(rq, bio);
rq->end_io_data = rqd;
@ -724,6 +723,9 @@ static int null_add_dev(void)
nullb->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
nullb->tag_set.driver_data = nullb;
if (blocking)
nullb->tag_set.flags |= BLK_MQ_F_BLOCKING;
rv = blk_mq_alloc_tag_set(&nullb->tag_set);
if (rv)
goto out_cleanup_queues;
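
null_blk gains a 'blocking' mode as a test vehicle for BLK_MQ_F_BLOCKING, which tells blk-mq that the driver's ->queue_rq() may sleep and must be called from process context. A minimal illustrative sketch with made-up names:

#include <linux/blk-mq.h>

/* Illustrative only: a queue_rq that is allowed to sleep. */
static int example_blocking_queue_rq(struct blk_mq_hw_ctx *hctx,
				     const struct blk_mq_queue_data *bd)
{
	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);

	blk_mq_start_request(bd->rq);
	/* ... issue bd->rq, possibly blocking ... */
	blk_mq_end_request(bd->rq, 0);
	return BLK_MQ_RQ_QUEUE_OK;
}

static const struct blk_mq_ops example_blocking_mq_ops = {
	.queue_rq	= example_blocking_queue_rq,
};

static void example_init_tag_set(struct blk_mq_tag_set *set)
{
	set->ops = &example_blocking_mq_ops;
	set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
}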


@ -1,693 +0,0 @@
/*
osdblk.c -- Export a single SCSI OSD object as a Linux block device
Copyright 2009 Red Hat, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; see the file COPYING. If not, write to
the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
Instructions for use
--------------------
1) Map a Linux block device to an existing OSD object.
In this example, we will use partition id 1234, object id 5678,
OSD device /dev/osd1.
$ echo "1234 5678 /dev/osd1" > /sys/class/osdblk/add
2) List all active blkdev<->object mappings.
In this example, we have performed step #1 twice, creating two blkdevs,
mapped to two separate OSD objects.
$ cat /sys/class/osdblk/list
0 174 1234 5678 /dev/osd1
1 179 1994 897123 /dev/osd0
The columns, in order, are:
- blkdev unique id
- blkdev assigned major
- OSD object partition id
- OSD object id
- OSD device
3) Remove an active blkdev<->object mapping.
In this example, we remove the mapping with blkdev unique id 1.
$ echo 1 > /sys/class/osdblk/remove
NOTE: The actual creation and deletion of OSD objects is outside the scope
of this driver.
*/
#include <linux/kernel.h>
#include <linux/device.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <scsi/osd_initiator.h>
#include <scsi/osd_attributes.h>
#include <scsi/osd_sec.h>
#include <scsi/scsi_device.h>
#define DRV_NAME "osdblk"
#define PFX DRV_NAME ": "
/* #define _OSDBLK_DEBUG */
#ifdef _OSDBLK_DEBUG
#define OSDBLK_DEBUG(fmt, a...) \
printk(KERN_NOTICE "osdblk @%s:%d: " fmt, __func__, __LINE__, ##a)
#else
#define OSDBLK_DEBUG(fmt, a...) \
do { if (0) printk(fmt, ##a); } while (0)
#endif
MODULE_AUTHOR("Jeff Garzik <jeff@garzik.org>");
MODULE_DESCRIPTION("block device inside an OSD object osdblk.ko");
MODULE_LICENSE("GPL");
struct osdblk_device;
enum {
OSDBLK_MINORS_PER_MAJOR = 256, /* max minors per blkdev */
OSDBLK_MAX_REQ = 32, /* max parallel requests */
OSDBLK_OP_TIMEOUT = 4 * 60, /* sync OSD req timeout */
};
struct osdblk_request {
struct request *rq; /* blk layer request */
struct bio *bio; /* cloned bio */
struct osdblk_device *osdev; /* associated blkdev */
};
struct osdblk_device {
int id; /* blkdev unique id */
int major; /* blkdev assigned major */
struct gendisk *disk; /* blkdev's gendisk and rq */
struct request_queue *q;
struct osd_dev *osd; /* associated OSD */
char name[32]; /* blkdev name, e.g. osdblk34 */
spinlock_t lock; /* queue lock */
struct osd_obj_id obj; /* OSD partition, obj id */
uint8_t obj_cred[OSD_CAP_LEN]; /* OSD cred */
struct osdblk_request req[OSDBLK_MAX_REQ]; /* request table */
struct list_head node;
char osd_path[0]; /* OSD device path */
};
static struct class *class_osdblk; /* /sys/class/osdblk */
static DEFINE_MUTEX(ctl_mutex); /* Serialize open/close/setup/teardown */
static LIST_HEAD(osdblkdev_list);
static const struct block_device_operations osdblk_bd_ops = {
.owner = THIS_MODULE,
};
static const struct osd_attr g_attr_logical_length = ATTR_DEF(
OSD_APAGE_OBJECT_INFORMATION, OSD_ATTR_OI_LOGICAL_LENGTH, 8);
static void osdblk_make_credential(u8 cred_a[OSD_CAP_LEN],
const struct osd_obj_id *obj)
{
osd_sec_init_nosec_doall_caps(cred_a, obj, false, true);
}
/* copied from exofs; move to libosd? */
/*
* Perform a synchronous OSD operation. copied from exofs; move to libosd?
*/
static int osd_sync_op(struct osd_request *or, int timeout, uint8_t *credential)
{
int ret;
or->timeout = timeout;
ret = osd_finalize_request(or, 0, credential, NULL);
if (ret)
return ret;
ret = osd_execute_request(or);
/* osd_req_decode_sense(or, ret); */
return ret;
}
/*
* Perform an asynchronous OSD operation. copied from exofs; move to libosd?
*/
static int osd_async_op(struct osd_request *or, osd_req_done_fn *async_done,
void *caller_context, u8 *cred)
{
int ret;
ret = osd_finalize_request(or, 0, cred, NULL);
if (ret)
return ret;
ret = osd_execute_request_async(or, async_done, caller_context);
return ret;
}
/* copied from exofs; move to libosd? */
static int extract_attr_from_req(struct osd_request *or, struct osd_attr *attr)
{
struct osd_attr cur_attr = {.attr_page = 0}; /* start with zeros */
void *iter = NULL;
int nelem;
do {
nelem = 1;
osd_req_decode_get_attr_list(or, &cur_attr, &nelem, &iter);
if ((cur_attr.attr_page == attr->attr_page) &&
(cur_attr.attr_id == attr->attr_id)) {
attr->len = cur_attr.len;
attr->val_ptr = cur_attr.val_ptr;
return 0;
}
} while (iter);
return -EIO;
}
static int osdblk_get_obj_size(struct osdblk_device *osdev, u64 *size_out)
{
struct osd_request *or;
struct osd_attr attr;
int ret;
/* start request */
or = osd_start_request(osdev->osd, GFP_KERNEL);
if (!or)
return -ENOMEM;
/* create a get-attributes(length) request */
osd_req_get_attributes(or, &osdev->obj);
osd_req_add_get_attr_list(or, &g_attr_logical_length, 1);
/* execute op synchronously */
ret = osd_sync_op(or, OSDBLK_OP_TIMEOUT, osdev->obj_cred);
if (ret)
goto out;
/* extract length from returned attribute info */
attr = g_attr_logical_length;
ret = extract_attr_from_req(or, &attr);
if (ret)
goto out;
*size_out = get_unaligned_be64(attr.val_ptr);
out:
osd_end_request(or);
return ret;
}
static void osdblk_osd_complete(struct osd_request *or, void *private)
{
struct osdblk_request *orq = private;
struct osd_sense_info osi;
int ret = osd_req_decode_sense(or, &osi);
if (ret) {
ret = -EIO;
OSDBLK_DEBUG("osdblk_osd_complete with err=%d\n", ret);
}
/* complete OSD request */
osd_end_request(or);
/* complete request passed to osdblk by block layer */
__blk_end_request_all(orq->rq, ret);
}
static void bio_chain_put(struct bio *chain)
{
struct bio *tmp;
while (chain) {
tmp = chain;
chain = chain->bi_next;
bio_put(tmp);
}
}
static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
{
struct bio *tmp, *new_chain = NULL, *tail = NULL;
while (old_chain) {
tmp = bio_clone_kmalloc(old_chain, gfpmask);
if (!tmp)
goto err_out;
tmp->bi_bdev = NULL;
gfpmask &= ~__GFP_DIRECT_RECLAIM;
tmp->bi_next = NULL;
if (!new_chain)
new_chain = tail = tmp;
else {
tail->bi_next = tmp;
tail = tmp;
}
old_chain = old_chain->bi_next;
}
return new_chain;
err_out:
OSDBLK_DEBUG("bio_chain_clone with err\n");
bio_chain_put(new_chain);
return NULL;
}
static void osdblk_rq_fn(struct request_queue *q)
{
struct osdblk_device *osdev = q->queuedata;
while (1) {
struct request *rq;
struct osdblk_request *orq;
struct osd_request *or;
struct bio *bio;
bool do_write, do_flush;
/* peek at request from block layer */
rq = blk_fetch_request(q);
if (!rq)
break;
/* deduce our operation (read, write, flush) */
/* I wish the block layer simplified cmd_type/cmd_flags/cmd[]
* into a clearly defined set of RPC commands:
* read, write, flush, scsi command, power mgmt req,
* driver-specific, etc.
*/
do_flush = (req_op(rq) == REQ_OP_FLUSH);
do_write = (rq_data_dir(rq) == WRITE);
if (!do_flush) { /* osd_flush does not use a bio */
/* a bio clone to be passed down to OSD request */
bio = bio_chain_clone(rq->bio, GFP_ATOMIC);
if (!bio)
break;
} else
bio = NULL;
/* alloc internal OSD request, for OSD command execution */
or = osd_start_request(osdev->osd, GFP_ATOMIC);
if (!or) {
bio_chain_put(bio);
OSDBLK_DEBUG("osd_start_request with err\n");
break;
}
orq = &osdev->req[rq->tag];
orq->rq = rq;
orq->bio = bio;
orq->osdev = osdev;
/* init OSD command: flush, write or read */
if (do_flush)
osd_req_flush_object(or, &osdev->obj,
OSD_CDB_FLUSH_ALL, 0, 0);
else if (do_write)
osd_req_write(or, &osdev->obj, blk_rq_pos(rq) * 512ULL,
bio, blk_rq_bytes(rq));
else
osd_req_read(or, &osdev->obj, blk_rq_pos(rq) * 512ULL,
bio, blk_rq_bytes(rq));
OSDBLK_DEBUG("%s 0x%x bytes at 0x%llx\n",
do_flush ? "flush" : do_write ?
"write" : "read", blk_rq_bytes(rq),
blk_rq_pos(rq) * 512ULL);
/* begin OSD command execution */
if (osd_async_op(or, osdblk_osd_complete, orq,
osdev->obj_cred)) {
osd_end_request(or);
blk_requeue_request(q, rq);
bio_chain_put(bio);
OSDBLK_DEBUG("osd_execute_request_async with err\n");
break;
}
/* remove the special 'flush' marker, now that the command
* is executing
*/
rq->special = NULL;
}
}
static void osdblk_free_disk(struct osdblk_device *osdev)
{
struct gendisk *disk = osdev->disk;
if (!disk)
return;
if (disk->flags & GENHD_FL_UP)
del_gendisk(disk);
if (disk->queue)
blk_cleanup_queue(disk->queue);
put_disk(disk);
}
static int osdblk_init_disk(struct osdblk_device *osdev)
{
struct gendisk *disk;
struct request_queue *q;
int rc;
u64 obj_size = 0;
/* contact OSD, request size info about the object being mapped */
rc = osdblk_get_obj_size(osdev, &obj_size);
if (rc)
return rc;
/* create gendisk info */
disk = alloc_disk(OSDBLK_MINORS_PER_MAJOR);
if (!disk)
return -ENOMEM;
sprintf(disk->disk_name, DRV_NAME "%d", osdev->id);
disk->major = osdev->major;
disk->first_minor = 0;
disk->fops = &osdblk_bd_ops;
disk->private_data = osdev;
/* init rq */
q = blk_init_queue(osdblk_rq_fn, &osdev->lock);
if (!q) {
put_disk(disk);
return -ENOMEM;
}
/* switch queue to TCQ mode; allocate tag map */
rc = blk_queue_init_tags(q, OSDBLK_MAX_REQ, NULL, BLK_TAG_ALLOC_FIFO);
if (rc) {
blk_cleanup_queue(q);
put_disk(disk);
return rc;
}
/* Set our limits to the lower device limits, because osdblk cannot
* sleep when allocating a lower-request and therefore cannot be
* bouncing.
*/
blk_queue_stack_limits(q, osd_request_queue(osdev->osd));
blk_queue_prep_rq(q, blk_queue_start_tag);
blk_queue_write_cache(q, true, false);
disk->queue = q;
q->queuedata = osdev;
osdev->disk = disk;
osdev->q = q;
/* finally, announce the disk to the world */
set_capacity(disk, obj_size / 512ULL);
add_disk(disk);
printk(KERN_INFO "%s: Added of size 0x%llx\n",
disk->disk_name, (unsigned long long)obj_size);
return 0;
}
/********************************************************************
* /sys/class/osdblk/
* add map OSD object to blkdev
* remove unmap OSD object
* list show mappings
*******************************************************************/
static void class_osdblk_release(struct class *cls)
{
kfree(cls);
}
static ssize_t class_osdblk_list(struct class *c,
struct class_attribute *attr,
char *data)
{
int n = 0;
struct list_head *tmp;
mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
list_for_each(tmp, &osdblkdev_list) {
struct osdblk_device *osdev;
osdev = list_entry(tmp, struct osdblk_device, node);
n += sprintf(data+n, "%d %d %llu %llu %s\n",
osdev->id,
osdev->major,
osdev->obj.partition,
osdev->obj.id,
osdev->osd_path);
}
mutex_unlock(&ctl_mutex);
return n;
}
static ssize_t class_osdblk_add(struct class *c,
struct class_attribute *attr,
const char *buf, size_t count)
{
struct osdblk_device *osdev;
ssize_t rc;
int irc, new_id = 0;
struct list_head *tmp;
if (!try_module_get(THIS_MODULE))
return -ENODEV;
/* new osdblk_device object */
osdev = kzalloc(sizeof(*osdev) + strlen(buf) + 1, GFP_KERNEL);
if (!osdev) {
rc = -ENOMEM;
goto err_out_mod;
}
/* static osdblk_device initialization */
spin_lock_init(&osdev->lock);
INIT_LIST_HEAD(&osdev->node);
/* generate unique id: find highest unique id, add one */
mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
list_for_each(tmp, &osdblkdev_list) {
struct osdblk_device *osdev;
osdev = list_entry(tmp, struct osdblk_device, node);
if (osdev->id > new_id)
new_id = osdev->id + 1;
}
osdev->id = new_id;
/* add to global list */
list_add_tail(&osdev->node, &osdblkdev_list);
mutex_unlock(&ctl_mutex);
/* parse add command */
if (sscanf(buf, "%llu %llu %s", &osdev->obj.partition, &osdev->obj.id,
osdev->osd_path) != 3) {
rc = -EINVAL;
goto err_out_slot;
}
/* initialize rest of new object */
sprintf(osdev->name, DRV_NAME "%d", osdev->id);
/* contact requested OSD */
osdev->osd = osduld_path_lookup(osdev->osd_path);
if (IS_ERR(osdev->osd)) {
rc = PTR_ERR(osdev->osd);
goto err_out_slot;
}
/* build OSD credential */
osdblk_make_credential(osdev->obj_cred, &osdev->obj);
/* register our block device */
irc = register_blkdev(0, osdev->name);
if (irc < 0) {
rc = irc;
goto err_out_osd;
}
osdev->major = irc;
/* set up and announce blkdev mapping */
rc = osdblk_init_disk(osdev);
if (rc)
goto err_out_blkdev;
return count;
err_out_blkdev:
unregister_blkdev(osdev->major, osdev->name);
err_out_osd:
osduld_put_device(osdev->osd);
err_out_slot:
mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
list_del_init(&osdev->node);
mutex_unlock(&ctl_mutex);
kfree(osdev);
err_out_mod:
OSDBLK_DEBUG("Error adding device %s\n", buf);
module_put(THIS_MODULE);
return rc;
}
static ssize_t class_osdblk_remove(struct class *c,
struct class_attribute *attr,
const char *buf,
size_t count)
{
struct osdblk_device *osdev = NULL;
int target_id, rc;
unsigned long ul;
struct list_head *tmp;
rc = kstrtoul(buf, 10, &ul);
if (rc)
return rc;
/* convert to int; abort if we lost anything in the conversion */
target_id = (int) ul;
if (target_id != ul)
return -EINVAL;
/* remove object from list immediately */
mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
list_for_each(tmp, &osdblkdev_list) {
osdev = list_entry(tmp, struct osdblk_device, node);
if (osdev->id == target_id) {
list_del_init(&osdev->node);
break;
}
osdev = NULL;
}
mutex_unlock(&ctl_mutex);
if (!osdev)
return -ENOENT;
/* clean up and free blkdev and associated OSD connection */
osdblk_free_disk(osdev);
unregister_blkdev(osdev->major, osdev->name);
osduld_put_device(osdev->osd);
kfree(osdev);
/* release module ref */
module_put(THIS_MODULE);
return count;
}
static struct class_attribute class_osdblk_attrs[] = {
__ATTR(add, 0200, NULL, class_osdblk_add),
__ATTR(remove, 0200, NULL, class_osdblk_remove),
__ATTR(list, 0444, class_osdblk_list, NULL),
__ATTR_NULL
};
static int osdblk_sysfs_init(void)
{
int ret = 0;
/*
* create control files in sysfs
* /sys/class/osdblk/...
*/
class_osdblk = kzalloc(sizeof(*class_osdblk), GFP_KERNEL);
if (!class_osdblk)
return -ENOMEM;
class_osdblk->name = DRV_NAME;
class_osdblk->owner = THIS_MODULE;
class_osdblk->class_release = class_osdblk_release;
class_osdblk->class_attrs = class_osdblk_attrs;
ret = class_register(class_osdblk);
if (ret) {
kfree(class_osdblk);
class_osdblk = NULL;
printk(PFX "failed to create class osdblk\n");
return ret;
}
return 0;
}
static void osdblk_sysfs_cleanup(void)
{
if (class_osdblk)
class_destroy(class_osdblk);
class_osdblk = NULL;
}
static int __init osdblk_init(void)
{
int rc;
rc = osdblk_sysfs_init();
if (rc)
return rc;
return 0;
}
static void __exit osdblk_exit(void)
{
osdblk_sysfs_cleanup();
}
module_init(osdblk_init);
module_exit(osdblk_exit);


@ -300,6 +300,11 @@ static void pcd_init_units(void)
struct gendisk *disk = alloc_disk(1);
if (!disk)
continue;
disk->queue = blk_init_queue(do_pcd_request, &pcd_lock);
if (!disk->queue) {
put_disk(disk);
continue;
}
cd->disk = disk;
cd->pi = &cd->pia;
cd->present = 0;
@ -735,18 +740,36 @@ static int pcd_detect(void)
}
/* I/O request processing */
static struct request_queue *pcd_queue;
static int pcd_queue;
static void do_pcd_request(struct request_queue * q)
static int set_next_request(void)
{
struct pcd_unit *cd;
struct request_queue *q;
int old_pos = pcd_queue;
do {
cd = &pcd[pcd_queue];
q = cd->present ? cd->disk->queue : NULL;
if (++pcd_queue == PCD_UNITS)
pcd_queue = 0;
if (q) {
pcd_req = blk_fetch_request(q);
if (pcd_req)
break;
}
} while (pcd_queue != old_pos);
return pcd_req != NULL;
}
static void pcd_request(void)
{
if (pcd_busy)
return;
while (1) {
if (!pcd_req) {
pcd_req = blk_fetch_request(q);
if (!pcd_req)
return;
}
if (!pcd_req && !set_next_request())
return;
if (rq_data_dir(pcd_req) == READ) {
struct pcd_unit *cd = pcd_req->rq_disk->private_data;
@ -766,6 +789,11 @@ static void do_pcd_request(struct request_queue * q)
}
}
static void do_pcd_request(struct request_queue *q)
{
pcd_request();
}
static inline void next_request(int err)
{
unsigned long saved_flags;
@ -774,7 +802,7 @@ static inline void next_request(int err)
if (!__blk_end_request_cur(pcd_req, err))
pcd_req = NULL;
pcd_busy = 0;
do_pcd_request(pcd_queue);
pcd_request();
spin_unlock_irqrestore(&pcd_lock, saved_flags);
}
@ -849,7 +877,7 @@ static void do_pcd_read_drq(void)
do_pcd_read();
spin_lock_irqsave(&pcd_lock, saved_flags);
do_pcd_request(pcd_queue);
pcd_request();
spin_unlock_irqrestore(&pcd_lock, saved_flags);
}
@ -957,19 +985,10 @@ static int __init pcd_init(void)
return -EBUSY;
}
pcd_queue = blk_init_queue(do_pcd_request, &pcd_lock);
if (!pcd_queue) {
unregister_blkdev(major, name);
for (unit = 0, cd = pcd; unit < PCD_UNITS; unit++, cd++)
put_disk(cd->disk);
return -ENOMEM;
}
for (unit = 0, cd = pcd; unit < PCD_UNITS; unit++, cd++) {
if (cd->present) {
register_cdrom(&cd->info);
cd->disk->private_data = cd;
cd->disk->queue = pcd_queue;
add_disk(cd->disk);
}
}
@ -988,9 +1007,9 @@ static void __exit pcd_exit(void)
pi_release(cd->pi);
unregister_cdrom(&cd->info);
}
blk_cleanup_queue(cd->disk->queue);
put_disk(cd->disk);
}
blk_cleanup_queue(pcd_queue);
unregister_blkdev(major, name);
pi_unregister_driver(par_drv);
}
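
The paride conversions all follow the same recipe: drop the single driver-wide queue, give every gendisk its own request queue at probe time, and round-robin between the per-unit queues from the request function (the set_next_request() helpers above). A minimal sketch of the per-disk allocation and teardown, with made-up names:

#include <linux/blkdev.h>
#include <linux/genhd.h>

static DEFINE_SPINLOCK(example_lock);

/* Illustrative only: trivial request function for the sketch. */
static void example_request_fn(struct request_queue *q)
{
	struct request *rq;

	while ((rq = blk_fetch_request(q)) != NULL)
		__blk_end_request_all(rq, -EIO);
}

static int example_add_unit(struct gendisk *disk)
{
	disk->queue = blk_init_queue(example_request_fn, &example_lock);
	if (!disk->queue)
		return -ENOMEM;

	add_disk(disk);
	return 0;
}

static void example_remove_unit(struct gendisk *disk)
{
	del_gendisk(disk);
	blk_cleanup_queue(disk->queue);
	put_disk(disk);
}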


@ -381,12 +381,33 @@ static enum action do_pd_write_start(void);
static enum action do_pd_read_drq(void);
static enum action do_pd_write_done(void);
static struct request_queue *pd_queue;
static int pd_queue;
static int pd_claimed;
static struct pd_unit *pd_current; /* current request's drive */
static PIA *pi_current; /* current request's PIA */
static int set_next_request(void)
{
struct gendisk *disk;
struct request_queue *q;
int old_pos = pd_queue;
do {
disk = pd[pd_queue].gd;
q = disk ? disk->queue : NULL;
if (++pd_queue == PD_UNITS)
pd_queue = 0;
if (q) {
pd_req = blk_fetch_request(q);
if (pd_req)
break;
}
} while (pd_queue != old_pos);
return pd_req != NULL;
}
static void run_fsm(void)
{
while (1) {
@ -418,8 +439,7 @@ static void run_fsm(void)
spin_lock_irqsave(&pd_lock, saved_flags);
if (!__blk_end_request_cur(pd_req,
res == Ok ? 0 : -EIO)) {
pd_req = blk_fetch_request(pd_queue);
if (!pd_req)
if (!set_next_request())
stop = 1;
}
spin_unlock_irqrestore(&pd_lock, saved_flags);
@ -719,18 +739,15 @@ static int pd_special_command(struct pd_unit *disk,
enum action (*func)(struct pd_unit *disk))
{
struct request *rq;
int err = 0;
rq = blk_get_request(disk->gd->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
if (IS_ERR(rq))
return PTR_ERR(rq);
rq->special = func;
err = blk_execute_rq(disk->gd->queue, disk->gd, rq, 0);
blk_execute_rq(disk->gd->queue, disk->gd, rq, 0);
blk_put_request(rq);
return err;
return 0;
}
/* kernel glue structures */
@ -839,7 +856,13 @@ static void pd_probe_drive(struct pd_unit *disk)
p->first_minor = (disk - pd) << PD_BITS;
disk->gd = p;
p->private_data = disk;
p->queue = pd_queue;
p->queue = blk_init_queue(do_pd_request, &pd_lock);
if (!p->queue) {
disk->gd = NULL;
put_disk(p);
return;
}
blk_queue_max_hw_sectors(p->queue, cluster);
if (disk->drive == -1) {
for (disk->drive = 0; disk->drive <= 1; disk->drive++)
@ -919,26 +942,18 @@ static int __init pd_init(void)
if (disable)
goto out1;
pd_queue = blk_init_queue(do_pd_request, &pd_lock);
if (!pd_queue)
goto out1;
blk_queue_max_hw_sectors(pd_queue, cluster);
if (register_blkdev(major, name))
goto out2;
goto out1;
printk("%s: %s version %s, major %d, cluster %d, nice %d\n",
name, name, PD_VERSION, major, cluster, nice);
if (!pd_detect())
goto out3;
goto out2;
return 0;
out3:
unregister_blkdev(major, name);
out2:
blk_cleanup_queue(pd_queue);
unregister_blkdev(major, name);
out1:
return -ENODEV;
}
@ -953,11 +968,11 @@ static void __exit pd_exit(void)
if (p) {
disk->gd = NULL;
del_gendisk(p);
blk_cleanup_queue(p->queue);
put_disk(p);
pi_release(disk->pi);
}
}
blk_cleanup_queue(pd_queue);
}
MODULE_LICENSE("GPL");
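The pcd and pd conversions above (and the pf and swim ones that follow) all make the same change: instead of one request_queue shared by every unit, each gendisk gets its own queue from blk_init_queue(), and the request function drains the per-unit queues round-robin via blk_fetch_request(). A condensed sketch of that shape, using hypothetical names rather than any driver's actual symbols:

	#include <linux/blkdev.h>
	#include <linux/genhd.h>

	#define NR_UNITS 4

	static struct gendisk *unit_disk[NR_UNITS];	/* one gendisk per unit */
	static struct request *cur_req;			/* request being serviced */
	static int next_unit;				/* round-robin cursor */

	/* Pull the next request from whichever per-unit queue has one pending. */
	static int set_next_request(void)
	{
		int start = next_unit;

		do {
			struct request_queue *q =
				unit_disk[next_unit] ? unit_disk[next_unit]->queue : NULL;

			if (++next_unit == NR_UNITS)
				next_unit = 0;
			if (q) {
				cur_req = blk_fetch_request(q);
				if (cur_req)
					return 1;
			}
		} while (next_unit != start);

		return 0;
	}

On the setup side each probe path now does disk->queue = blk_init_queue(request_fn, &lock) per unit, and teardown calls blk_cleanup_queue(disk->queue) before put_disk(disk), which is why the old shared pcd_queue/pd_queue/pf_queue request_queue pointers turn into plain round-robin indices.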


@ -287,6 +287,12 @@ static void __init pf_init_units(void)
struct gendisk *disk = alloc_disk(1);
if (!disk)
continue;
disk->queue = blk_init_queue(do_pf_request, &pf_spin_lock);
if (!disk->queue) {
put_disk(disk);
return;
}
blk_queue_max_segments(disk->queue, cluster);
pf->disk = disk;
pf->pi = &pf->pia;
pf->media_status = PF_NM;
@ -772,7 +778,28 @@ static int pf_ready(void)
return (((status_reg(pf_current) & (STAT_BUSY | pf_mask)) == pf_mask));
}
static struct request_queue *pf_queue;
static int pf_queue;
static int set_next_request(void)
{
struct pf_unit *pf;
struct request_queue *q;
int old_pos = pf_queue;
do {
pf = &units[pf_queue];
q = pf->present ? pf->disk->queue : NULL;
if (++pf_queue == PF_UNITS)
pf_queue = 0;
if (q) {
pf_req = blk_fetch_request(q);
if (pf_req)
break;
}
} while (pf_queue != old_pos);
return pf_req != NULL;
}
static void pf_end_request(int err)
{
@ -780,16 +807,13 @@ static void pf_end_request(int err)
pf_req = NULL;
}
static void do_pf_request(struct request_queue * q)
static void pf_request(void)
{
if (pf_busy)
return;
repeat:
if (!pf_req) {
pf_req = blk_fetch_request(q);
if (!pf_req)
return;
}
if (!pf_req && !set_next_request())
return;
pf_current = pf_req->rq_disk->private_data;
pf_block = blk_rq_pos(pf_req);
@ -817,6 +841,11 @@ static void do_pf_request(struct request_queue * q)
}
}
static void do_pf_request(struct request_queue *q)
{
pf_request();
}
static int pf_next_buf(void)
{
unsigned long saved_flags;
@ -846,7 +875,7 @@ static inline void next_request(int err)
spin_lock_irqsave(&pf_spin_lock, saved_flags);
pf_end_request(err);
pf_busy = 0;
do_pf_request(pf_queue);
pf_request();
spin_unlock_irqrestore(&pf_spin_lock, saved_flags);
}
@ -972,15 +1001,6 @@ static int __init pf_init(void)
put_disk(pf->disk);
return -EBUSY;
}
pf_queue = blk_init_queue(do_pf_request, &pf_spin_lock);
if (!pf_queue) {
unregister_blkdev(major, name);
for (pf = units, unit = 0; unit < PF_UNITS; pf++, unit++)
put_disk(pf->disk);
return -ENOMEM;
}
blk_queue_max_segments(pf_queue, cluster);
for (pf = units, unit = 0; unit < PF_UNITS; pf++, unit++) {
struct gendisk *disk = pf->disk;
@ -988,7 +1008,6 @@ static int __init pf_init(void)
if (!pf->present)
continue;
disk->private_data = pf;
disk->queue = pf_queue;
add_disk(disk);
}
return 0;
@ -1003,10 +1022,10 @@ static void __exit pf_exit(void)
if (!pf->present)
continue;
del_gendisk(pf->disk);
blk_cleanup_queue(pf->disk->queue);
put_disk(pf->disk);
pi_release(pf->pi);
}
blk_cleanup_queue(pf_queue);
}
MODULE_LICENSE("GPL");


@ -724,7 +724,7 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *
rq->rq_flags |= RQF_QUIET;
blk_execute_rq(rq->q, pd->bdev->bd_disk, rq, 0);
if (rq->errors)
if (scsi_req(rq)->result)
ret = -EIO;
out:
blk_put_request(rq);


@ -4317,7 +4317,7 @@ static int rbd_init_request(void *data, struct request *rq,
return 0;
}
static struct blk_mq_ops rbd_mq_ops = {
static const struct blk_mq_ops rbd_mq_ops = {
.queue_rq = rbd_queue_rq,
.init_request = rbd_init_request,
};
@ -4380,7 +4380,6 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
q->limits.discard_granularity = segment_size;
q->limits.discard_alignment = segment_size;
blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
q->limits.discard_zeroes_data = 1;
if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;


@ -300,7 +300,6 @@ int rsxx_setup_dev(struct rsxx_cardinfo *card)
RSXX_HW_BLK_SIZE >> 9);
card->queue->limits.discard_granularity = RSXX_HW_BLK_SIZE;
card->queue->limits.discard_alignment = RSXX_HW_BLK_SIZE;
card->queue->limits.discard_zeroes_data = 1;
}
card->queue->queuedata = card;


@ -211,7 +211,7 @@ enum head {
struct swim_priv {
struct swim __iomem *base;
spinlock_t lock;
struct request_queue *queue;
int fdc_queue;
int floppy_count;
struct floppy_state unit[FD_MAX_UNIT];
};
@ -525,12 +525,33 @@ static int floppy_read_sectors(struct floppy_state *fs,
return 0;
}
static void redo_fd_request(struct request_queue *q)
static struct request *swim_next_request(struct swim_priv *swd)
{
struct request_queue *q;
struct request *rq;
int old_pos = swd->fdc_queue;
do {
q = swd->unit[swd->fdc_queue].disk->queue;
if (++swd->fdc_queue == swd->floppy_count)
swd->fdc_queue = 0;
if (q) {
rq = blk_fetch_request(q);
if (rq)
return rq;
}
} while (swd->fdc_queue != old_pos);
return NULL;
}
static void do_fd_request(struct request_queue *q)
{
struct swim_priv *swd = q->queuedata;
struct request *req;
struct floppy_state *fs;
req = blk_fetch_request(q);
req = swim_next_request(swd);
while (req) {
int err = -EIO;
@ -554,15 +575,10 @@ static void redo_fd_request(struct request_queue *q)
}
done:
if (!__blk_end_request_cur(req, err))
req = blk_fetch_request(q);
req = swim_next_request(swd);
}
}
static void do_fd_request(struct request_queue *q)
{
redo_fd_request(q);
}
static struct floppy_struct floppy_type[4] = {
{ 0, 0, 0, 0, 0, 0x00, 0x00, 0x00, 0x00, NULL }, /* no testing */
{ 720, 9, 1, 80, 0, 0x2A, 0x02, 0xDF, 0x50, NULL }, /* 360KB SS 3.5"*/
@ -833,22 +849,25 @@ static int swim_floppy_init(struct swim_priv *swd)
return -EBUSY;
}
spin_lock_init(&swd->lock);
for (drive = 0; drive < swd->floppy_count; drive++) {
swd->unit[drive].disk = alloc_disk(1);
if (swd->unit[drive].disk == NULL) {
err = -ENOMEM;
goto exit_put_disks;
}
swd->unit[drive].disk->queue = blk_init_queue(do_fd_request,
&swd->lock);
if (!swd->unit[drive].disk->queue) {
err = -ENOMEM;
put_disk(swd->unit[drive].disk);
goto exit_put_disks;
}
swd->unit[drive].disk->queue->queuedata = swd;
swd->unit[drive].swd = swd;
}
spin_lock_init(&swd->lock);
swd->queue = blk_init_queue(do_fd_request, &swd->lock);
if (!swd->queue) {
err = -ENOMEM;
goto exit_put_disks;
}
for (drive = 0; drive < swd->floppy_count; drive++) {
swd->unit[drive].disk->flags = GENHD_FL_REMOVABLE;
swd->unit[drive].disk->major = FLOPPY_MAJOR;
@ -856,7 +875,6 @@ static int swim_floppy_init(struct swim_priv *swd)
sprintf(swd->unit[drive].disk->disk_name, "fd%d", drive);
swd->unit[drive].disk->fops = &floppy_fops;
swd->unit[drive].disk->private_data = &swd->unit[drive];
swd->unit[drive].disk->queue = swd->queue;
set_capacity(swd->unit[drive].disk, 2880);
add_disk(swd->unit[drive].disk);
}
@ -943,13 +961,12 @@ static int swim_remove(struct platform_device *dev)
for (drive = 0; drive < swd->floppy_count; drive++) {
del_gendisk(swd->unit[drive].disk);
blk_cleanup_queue(swd->unit[drive].disk->queue);
put_disk(swd->unit[drive].disk);
}
unregister_blkdev(FLOPPY_MAJOR, "fd");
blk_cleanup_queue(swd->queue);
/* eject floppies */
for (drive = 0; drive < swd->floppy_count; drive++)


@ -343,8 +343,8 @@ static void start_request(struct floppy_state *fs)
req->rq_disk->disk_name, req->cmd,
(long)blk_rq_pos(req), blk_rq_sectors(req),
bio_data(req->bio));
swim3_dbg(" errors=%d current_nr_sectors=%u\n",
req->errors, blk_rq_cur_sectors(req));
swim3_dbg(" current_nr_sectors=%u\n",
blk_rq_cur_sectors(req));
#endif
if (blk_rq_pos(req) >= fs->total_secs) {


@ -111,7 +111,7 @@ static int virtblk_add_req_scsi(struct virtqueue *vq, struct virtblk_req *vbr,
return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
}
static inline void virtblk_scsi_reques_done(struct request *req)
static inline void virtblk_scsi_request_done(struct request *req)
{
struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
struct virtio_blk *vblk = req->q->queuedata;
@ -119,7 +119,7 @@ static inline void virtblk_scsi_reques_done(struct request *req)
sreq->resid_len = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.residual);
sreq->sense_len = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.sense_len);
req->errors = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.errors);
sreq->result = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.errors);
}
static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
@ -144,7 +144,7 @@ static inline int virtblk_add_req_scsi(struct virtqueue *vq,
{
return -EIO;
}
static inline void virtblk_scsi_reques_done(struct request *req)
static inline void virtblk_scsi_request_done(struct request *req)
{
}
#define virtblk_ioctl NULL
@ -175,19 +175,15 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr,
static inline void virtblk_request_done(struct request *req)
{
struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
int error = virtblk_result(vbr);
switch (req_op(req)) {
case REQ_OP_SCSI_IN:
case REQ_OP_SCSI_OUT:
virtblk_scsi_reques_done(req);
break;
case REQ_OP_DRV_IN:
req->errors = (error != 0);
virtblk_scsi_request_done(req);
break;
}
blk_mq_end_request(req, error);
blk_mq_end_request(req, virtblk_result(vbr));
}
static void virtblk_done(struct virtqueue *vq)
@ -205,7 +201,7 @@ static void virtblk_done(struct virtqueue *vq)
while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
struct request *req = blk_mq_rq_from_pdu(vbr);
blk_mq_complete_request(req, req->errors);
blk_mq_complete_request(req);
req_done = true;
}
if (unlikely(virtqueue_is_broken(vq)))
@ -310,7 +306,8 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
if (err)
goto out;
err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
err = virtblk_result(blk_mq_rq_to_pdu(req));
out:
blk_put_request(req);
return err;
@ -597,7 +594,7 @@ static int virtblk_map_queues(struct blk_mq_tag_set *set)
return blk_mq_virtio_map_queues(set, vblk->vdev, 0);
}
static struct blk_mq_ops virtio_mq_ops = {
static const struct blk_mq_ops virtio_mq_ops = {
.queue_rq = virtio_queue_rq,
.complete = virtblk_request_done,
.init_request = virtblk_init_request,
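
The virtio_blk hunks above reflect an interface change from this series: blk_mq_complete_request() has lost its error argument, so a driver records completion status in its per-request payload at interrupt time and reports it from its ->complete handler through blk_mq_end_request(). A minimal sketch of the pattern with assumed driver names (not virtio_blk's own):

	#include <linux/blk-mq.h>

	struct mydrv_req {
		int status;			/* filled in at IRQ time */
	};

	/* ->complete handler: runs after blk_mq_complete_request() */
	static void mydrv_complete_rq(struct request *rq)
	{
		struct mydrv_req *mr = blk_mq_rq_to_pdu(rq);

		blk_mq_end_request(rq, mr->status ? -EIO : 0);
	}

	/* interrupt path: stash the hardware status, then hand off to blk-mq */
	static void mydrv_handle_completion(struct request *rq, int hw_status)
	{
		struct mydrv_req *mr = blk_mq_rq_to_pdu(rq);

		mr->status = hw_status;
		blk_mq_complete_request(rq);	/* no error argument any more */
	}

virtio_blk already kept its status in struct virtblk_req, so for it the change is mostly dropping the second argument; xen-blkfront below has to grow a per-request structure for the same reason.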


@ -115,6 +115,15 @@ struct split_bio {
atomic_t pending;
};
struct blkif_req {
int error;
};
static inline struct blkif_req *blkif_req(struct request *rq)
{
return blk_mq_rq_to_pdu(rq);
}
static DEFINE_MUTEX(blkfront_mutex);
static const struct block_device_operations xlvbd_block_fops;
@ -907,8 +916,14 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
return BLK_MQ_RQ_QUEUE_BUSY;
}
static struct blk_mq_ops blkfront_mq_ops = {
static void blkif_complete_rq(struct request *rq)
{
blk_mq_end_request(rq, blkif_req(rq)->error);
}
static const struct blk_mq_ops blkfront_mq_ops = {
.queue_rq = blkif_queue_rq,
.complete = blkif_complete_rq,
};
static void blkif_set_queue_limits(struct blkfront_info *info)
@ -969,7 +984,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
info->tag_set.queue_depth = BLK_RING_SIZE(info);
info->tag_set.numa_node = NUMA_NO_NODE;
info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
info->tag_set.cmd_size = 0;
info->tag_set.cmd_size = sizeof(struct blkif_req);
info->tag_set.driver_data = info;
if (blk_mq_alloc_tag_set(&info->tag_set))
@ -1543,7 +1558,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
unsigned long flags;
struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
struct blkfront_info *info = rinfo->dev_info;
int error;
if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
return IRQ_HANDLED;
@ -1587,37 +1601,36 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
continue;
}
error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
blkif_req(req)->error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
switch (bret->operation) {
case BLKIF_OP_DISCARD:
if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret->operation));
error = -EOPNOTSUPP;
blkif_req(req)->error = -EOPNOTSUPP;
info->feature_discard = 0;
info->feature_secdiscard = 0;
queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
queue_flag_clear(QUEUE_FLAG_SECERASE, rq);
}
blk_mq_complete_request(req, error);
break;
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret->operation));
error = -EOPNOTSUPP;
blkif_req(req)->error = -EOPNOTSUPP;
}
if (unlikely(bret->status == BLKIF_RSP_ERROR &&
rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
info->gd->disk_name, op_name(bret->operation));
error = -EOPNOTSUPP;
blkif_req(req)->error = -EOPNOTSUPP;
}
if (unlikely(error)) {
if (error == -EOPNOTSUPP)
error = 0;
if (unlikely(blkif_req(req)->error)) {
if (blkif_req(req)->error == -EOPNOTSUPP)
blkif_req(req)->error = 0;
info->feature_fua = 0;
info->feature_flush = 0;
xlvbd_flush(info);
@ -1629,11 +1642,12 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
"request: %x\n", bret->status);
blk_mq_complete_request(req, error);
break;
default:
BUG();
}
blk_mq_complete_request(req);
}
rinfo->ring.rsp_cons = i;
@ -2345,6 +2359,7 @@ static void blkfront_connect(struct blkfront_info *info)
unsigned long sector_size;
unsigned int physical_sector_size;
unsigned int binfo;
char *envp[] = { "RESIZE=1", NULL };
int err, i;
switch (info->connected) {
@ -2361,6 +2376,8 @@ static void blkfront_connect(struct blkfront_info *info)
sectors);
set_capacity(info->gd, sectors);
revalidate_disk(info->gd);
kobject_uevent_env(&disk_to_dev(info->gd)->kobj,
KOBJ_CHANGE, envp);
return;
case BLKIF_STATE_SUSPENDED:
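
Since blkfront previously passed the error straight into the two-argument blk_mq_complete_request(), it now needs somewhere to keep the status until its new ->complete handler runs; that is what struct blkif_req and the cmd_size change above provide. blk-mq allocates that per-request driver area itself when the tag set declares its size, and blk_mq_rq_to_pdu() returns a pointer to it. A sketch of the setup side with placeholder names:

	#include <linux/blk-mq.h>
	#include <linux/string.h>

	struct mydrv_req {
		int error;			/* per-request driver state */
	};

	struct mydrv {
		struct blk_mq_tag_set tag_set;
	};

	static const struct blk_mq_ops mydrv_mq_ops;	/* .queue_rq/.complete elsewhere */

	static int mydrv_init_tag_set(struct mydrv *dev)
	{
		memset(&dev->tag_set, 0, sizeof(dev->tag_set));
		dev->tag_set.ops = &mydrv_mq_ops;
		dev->tag_set.nr_hw_queues = 1;
		dev->tag_set.queue_depth = 64;
		dev->tag_set.numa_node = NUMA_NO_NODE;
		/* blk-mq appends this many bytes to every request it allocates;
		 * blk_mq_rq_to_pdu() hands the driver a pointer to that area. */
		dev->tag_set.cmd_size = sizeof(struct mydrv_req);
		dev->tag_set.driver_data = dev;

		return blk_mq_alloc_tag_set(&dev->tag_set);
	}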


@ -829,10 +829,14 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
offset = (bio->bi_iter.bi_sector &
(SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
if (unlikely(bio_op(bio) == REQ_OP_DISCARD)) {
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
case REQ_OP_WRITE_ZEROES:
zram_bio_discard(zram, index, offset, bio);
bio_endio(bio);
return;
default:
break;
}
bio_for_each_segment(bvec, bio, iter) {
@ -1192,6 +1196,8 @@ static int zram_add(void)
zram->disk->queue->limits.max_sectors = SECTORS_PER_PAGE;
zram->disk->queue->limits.chunk_sectors = 0;
blk_queue_max_discard_sectors(zram->disk->queue, UINT_MAX);
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
/*
* zram_bio_discard() will clear all logical blocks if logical block
* size is identical with physical block size(PAGE_SIZE). But if it is
@ -1201,10 +1207,7 @@ static int zram_add(void)
* zeroed.
*/
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
zram->disk->queue->limits.discard_zeroes_data = 1;
else
zram->disk->queue->limits.discard_zeroes_data = 0;
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
add_disk(zram->disk);
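
The discard_zeroes_data removals in the rbd and rsxx hunks and the zram change above belong to the same WRITE_ZEROES cleanup: rather than promising that discard reads back as zeroes, a driver that can give that guarantee now accepts REQ_OP_WRITE_ZEROES and advertises it with blk_queue_max_write_zeroes_sectors() at queue setup. For a bio-based driver that already zeroes on discard the shape is roughly this (sketch; mydrv and mydrv_zero_range are hypothetical):

	static void mydrv_setup_queue(struct request_queue *q)
	{
		/* device can produce zeroed ranges efficiently */
		blk_queue_max_write_zeroes_sectors(q, UINT_MAX);
	}

	static blk_qc_t mydrv_make_request(struct request_queue *q, struct bio *bio)
	{
		struct mydrv *dev = q->queuedata;

		switch (bio_op(bio)) {
		case REQ_OP_DISCARD:
		case REQ_OP_WRITE_ZEROES:
			/* write-zeroes must read back as zeroes; this driver's
			 * discard path already provides that, so share it */
			mydrv_zero_range(dev, bio);
			bio_endio(bio);
			return BLK_QC_T_NONE;
		default:
			break;
		}
		/* ... normal read/write handling ... */
		return BLK_QC_T_NONE;
	}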


@ -2218,7 +2218,8 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
rq->timeout = 60 * HZ;
bio = rq->bio;
if (blk_execute_rq(q, cdi->disk, rq, 0)) {
blk_execute_rq(q, cdi->disk, rq, 0);
if (scsi_req(rq)->result) {
struct request_sense *s = req->sense;
ret = -EIO;
cdi->last_sense = s->sense_key;


@ -107,7 +107,8 @@ int ide_queue_pc_tail(ide_drive_t *drive, struct gendisk *disk,
memcpy(scsi_req(rq)->cmd, pc->c, 12);
if (drive->media == ide_tape)
scsi_req(rq)->cmd[13] = REQ_IDETAPE_PC1;
error = blk_execute_rq(drive->queue, disk, rq, 0);
blk_execute_rq(drive->queue, disk, rq, 0);
error = scsi_req(rq)->result ? -EIO : 0;
put_req:
blk_put_request(rq);
return error;
@ -454,7 +455,7 @@ static ide_startstop_t ide_pc_intr(ide_drive_t *drive)
debug_log("%s: I/O error\n", drive->name);
if (drive->media != ide_tape)
pc->rq->errors++;
scsi_req(pc->rq)->result++;
if (scsi_req(rq)->cmd[0] == REQUEST_SENSE) {
printk(KERN_ERR PFX "%s: I/O error in request "
@ -488,13 +489,13 @@ static ide_startstop_t ide_pc_intr(ide_drive_t *drive)
drive->failed_pc = NULL;
if (ata_misc_request(rq)) {
rq->errors = 0;
scsi_req(rq)->result = 0;
error = 0;
} else {
if (blk_rq_is_passthrough(rq) && uptodate <= 0) {
if (rq->errors == 0)
rq->errors = -EIO;
if (scsi_req(rq)->result == 0)
scsi_req(rq)->result = -EIO;
}
error = uptodate ? 0 : -EIO;


@ -247,10 +247,10 @@ static int ide_cd_breathe(ide_drive_t *drive, struct request *rq)
struct cdrom_info *info = drive->driver_data;
if (!rq->errors)
if (!scsi_req(rq)->result)
info->write_timeout = jiffies + ATAPI_WAIT_WRITE_BUSY;
rq->errors = 1;
scsi_req(rq)->result = 1;
if (time_after(jiffies, info->write_timeout))
return 0;
@ -294,8 +294,8 @@ static int cdrom_decode_status(ide_drive_t *drive, u8 stat)
}
/* if we have an error, pass CHECK_CONDITION as the SCSI status byte */
if (blk_rq_is_scsi(rq) && !rq->errors)
rq->errors = SAM_STAT_CHECK_CONDITION;
if (blk_rq_is_scsi(rq) && !scsi_req(rq)->result)
scsi_req(rq)->result = SAM_STAT_CHECK_CONDITION;
if (blk_noretry_request(rq))
do_end_request = 1;
@ -325,7 +325,7 @@ static int cdrom_decode_status(ide_drive_t *drive, u8 stat)
* Arrange to retry the request but be sure to give up if we've
* retried too many times.
*/
if (++rq->errors > ERROR_MAX)
if (++scsi_req(rq)->result > ERROR_MAX)
do_end_request = 1;
break;
case ILLEGAL_REQUEST:
@ -372,7 +372,7 @@ static int cdrom_decode_status(ide_drive_t *drive, u8 stat)
/* go to the default handler for other errors */
ide_error(drive, "cdrom_decode_status", stat);
return 1;
} else if (++rq->errors > ERROR_MAX)
} else if (++scsi_req(rq)->result > ERROR_MAX)
/* we've racked up too many retries, abort */
do_end_request = 1;
}
@ -452,7 +452,8 @@ int ide_cd_queue_pc(ide_drive_t *drive, const unsigned char *cmd,
}
}
error = blk_execute_rq(drive->queue, info->disk, rq, 0);
blk_execute_rq(drive->queue, info->disk, rq, 0);
error = scsi_req(rq)->result ? -EIO : 0;
if (buffer)
*bufflen = scsi_req(rq)->resid_len;
@ -683,8 +684,8 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
if (cmd->nleft == 0)
uptodate = 1;
} else {
if (uptodate <= 0 && rq->errors == 0)
rq->errors = -EIO;
if (uptodate <= 0 && scsi_req(rq)->result == 0)
scsi_req(rq)->result = -EIO;
}
if (uptodate == 0 && rq->bio)
@ -1379,7 +1380,7 @@ static int ide_cdrom_prep_pc(struct request *rq)
* appropriate action
*/
if (c[0] == MODE_SENSE || c[0] == MODE_SELECT) {
rq->errors = ILLEGAL_REQUEST;
scsi_req(rq)->result = ILLEGAL_REQUEST;
return BLKPREP_KILL;
}


@ -307,7 +307,8 @@ int ide_cdrom_reset(struct cdrom_device_info *cdi)
scsi_req_init(rq);
ide_req(rq)->type = ATA_PRIV_MISC;
rq->rq_flags = RQF_QUIET;
ret = blk_execute_rq(drive->queue, cd->disk, rq, 0);
blk_execute_rq(drive->queue, cd->disk, rq, 0);
ret = scsi_req(rq)->result ? -EIO : 0;
blk_put_request(rq);
/*
* A reset will unlock the door. If it was previously locked,


@ -173,8 +173,8 @@ int ide_devset_execute(ide_drive_t *drive, const struct ide_devset *setting,
*(int *)&scsi_req(rq)->cmd[1] = arg;
rq->special = setting->set;
if (blk_execute_rq(q, NULL, rq, 0))
ret = rq->errors;
blk_execute_rq(q, NULL, rq, 0);
ret = scsi_req(rq)->result;
blk_put_request(rq);
return ret;
@ -186,7 +186,7 @@ ide_startstop_t ide_do_devset(ide_drive_t *drive, struct request *rq)
err = setfunc(drive, *(int *)&scsi_req(rq)->cmd[1]);
if (err)
rq->errors = err;
ide_complete_rq(drive, err, blk_rq_bytes(rq));
scsi_req(rq)->result = err;
ide_complete_rq(drive, 0, blk_rq_bytes(rq));
return ide_stopped;
}


@ -470,7 +470,6 @@ ide_devset_get(multcount, mult_count);
static int set_multcount(ide_drive_t *drive, int arg)
{
struct request *rq;
int error;
if (arg < 0 || arg > (drive->id[ATA_ID_MAX_MULTSECT] & 0xff))
return -EINVAL;
@ -484,7 +483,7 @@ static int set_multcount(ide_drive_t *drive, int arg)
drive->mult_req = arg;
drive->special_flags |= IDE_SFLAG_SET_MULTMODE;
error = blk_execute_rq(drive->queue, NULL, rq, 0);
blk_execute_rq(drive->queue, NULL, rq, 0);
blk_put_request(rq);
return (drive->mult_count == arg) ? 0 : -EIO;


@ -490,7 +490,7 @@ ide_startstop_t ide_dma_timeout_retry(ide_drive_t *drive, int error)
* make sure request is sane
*/
if (hwif->rq)
hwif->rq->errors = 0;
scsi_req(hwif->rq)->result = 0;
return ret;
}


@ -12,7 +12,7 @@ static ide_startstop_t ide_ata_error(ide_drive_t *drive, struct request *rq,
if ((stat & ATA_BUSY) ||
((stat & ATA_DF) && (drive->dev_flags & IDE_DFLAG_NOWERR) == 0)) {
/* other bits are useless when BUSY */
rq->errors |= ERROR_RESET;
scsi_req(rq)->result |= ERROR_RESET;
} else if (stat & ATA_ERR) {
/* err has different meaning on cdrom and tape */
if (err == ATA_ABORTED) {
@ -25,10 +25,10 @@ static ide_startstop_t ide_ata_error(ide_drive_t *drive, struct request *rq,
drive->crc_count++;
} else if (err & (ATA_BBK | ATA_UNC)) {
/* retries won't help these */
rq->errors = ERROR_MAX;
scsi_req(rq)->result = ERROR_MAX;
} else if (err & ATA_TRK0NF) {
/* help it find track zero */
rq->errors |= ERROR_RECAL;
scsi_req(rq)->result |= ERROR_RECAL;
}
}
@ -39,23 +39,23 @@ static ide_startstop_t ide_ata_error(ide_drive_t *drive, struct request *rq,
ide_pad_transfer(drive, READ, nsect * SECTOR_SIZE);
}
if (rq->errors >= ERROR_MAX || blk_noretry_request(rq)) {
if (scsi_req(rq)->result >= ERROR_MAX || blk_noretry_request(rq)) {
ide_kill_rq(drive, rq);
return ide_stopped;
}
if (hwif->tp_ops->read_status(hwif) & (ATA_BUSY | ATA_DRQ))
rq->errors |= ERROR_RESET;
scsi_req(rq)->result |= ERROR_RESET;
if ((rq->errors & ERROR_RESET) == ERROR_RESET) {
++rq->errors;
if ((scsi_req(rq)->result & ERROR_RESET) == ERROR_RESET) {
++scsi_req(rq)->result;
return ide_do_reset(drive);
}
if ((rq->errors & ERROR_RECAL) == ERROR_RECAL)
if ((scsi_req(rq)->result & ERROR_RECAL) == ERROR_RECAL)
drive->special_flags |= IDE_SFLAG_RECALIBRATE;
++rq->errors;
++scsi_req(rq)->result;
return ide_stopped;
}
@ -68,7 +68,7 @@ static ide_startstop_t ide_atapi_error(ide_drive_t *drive, struct request *rq,
if ((stat & ATA_BUSY) ||
((stat & ATA_DF) && (drive->dev_flags & IDE_DFLAG_NOWERR) == 0)) {
/* other bits are useless when BUSY */
rq->errors |= ERROR_RESET;
scsi_req(rq)->result |= ERROR_RESET;
} else {
/* add decoding error stuff */
}
@ -77,14 +77,14 @@ static ide_startstop_t ide_atapi_error(ide_drive_t *drive, struct request *rq,
/* force an abort */
hwif->tp_ops->exec_command(hwif, ATA_CMD_IDLEIMMEDIATE);
if (rq->errors >= ERROR_MAX) {
if (scsi_req(rq)->result >= ERROR_MAX) {
ide_kill_rq(drive, rq);
} else {
if ((rq->errors & ERROR_RESET) == ERROR_RESET) {
++rq->errors;
if ((scsi_req(rq)->result & ERROR_RESET) == ERROR_RESET) {
++scsi_req(rq)->result;
return ide_do_reset(drive);
}
++rq->errors;
++scsi_req(rq)->result;
}
return ide_stopped;
@ -130,11 +130,11 @@ ide_startstop_t ide_error(ide_drive_t *drive, const char *msg, u8 stat)
if (cmd)
ide_complete_cmd(drive, cmd, stat, err);
} else if (ata_pm_request(rq)) {
rq->errors = 1;
scsi_req(rq)->result = 1;
ide_complete_pm_rq(drive, rq);
return ide_stopped;
}
rq->errors = err;
scsi_req(rq)->result = err;
ide_complete_rq(drive, err ? -EIO : 0, blk_rq_bytes(rq));
return ide_stopped;
}
@ -149,8 +149,8 @@ static inline void ide_complete_drive_reset(ide_drive_t *drive, int err)
if (rq && ata_misc_request(rq) &&
scsi_req(rq)->cmd[0] == REQ_DRIVE_RESET) {
if (err <= 0 && rq->errors == 0)
rq->errors = -EIO;
if (err <= 0 && scsi_req(rq)->result == 0)
scsi_req(rq)->result = -EIO;
ide_complete_rq(drive, err ? err : 0, blk_rq_bytes(rq));
}
}


@ -98,7 +98,7 @@ static int ide_floppy_callback(ide_drive_t *drive, int dsc)
}
if (ata_misc_request(rq))
rq->errors = uptodate ? 0 : IDE_DRV_ERROR_GENERAL;
scsi_req(rq)->result = uptodate ? 0 : IDE_DRV_ERROR_GENERAL;
return uptodate;
}
@ -239,7 +239,7 @@ static ide_startstop_t ide_floppy_do_request(ide_drive_t *drive,
? rq->rq_disk->disk_name
: "dev?"));
if (rq->errors >= ERROR_MAX) {
if (scsi_req(rq)->result >= ERROR_MAX) {
if (drive->failed_pc) {
ide_floppy_report_error(floppy, drive->failed_pc);
drive->failed_pc = NULL;
@ -247,7 +247,7 @@ static ide_startstop_t ide_floppy_do_request(ide_drive_t *drive,
printk(KERN_ERR PFX "%s: I/O error\n", drive->name);
if (ata_misc_request(rq)) {
rq->errors = 0;
scsi_req(rq)->result = 0;
ide_complete_rq(drive, 0, blk_rq_bytes(rq));
return ide_stopped;
} else
@ -301,8 +301,8 @@ static ide_startstop_t ide_floppy_do_request(ide_drive_t *drive,
return ide_floppy_issue_pc(drive, &cmd, pc);
out_end:
drive->failed_pc = NULL;
if (blk_rq_is_passthrough(rq) && rq->errors == 0)
rq->errors = -EIO;
if (blk_rq_is_passthrough(rq) && scsi_req(rq)->result == 0)
scsi_req(rq)->result = -EIO;
ide_complete_rq(drive, -EIO, blk_rq_bytes(rq));
return ide_stopped;
}


@ -141,12 +141,12 @@ void ide_kill_rq(ide_drive_t *drive, struct request *rq)
drive->failed_pc = NULL;
if ((media == ide_floppy || media == ide_tape) && drv_req) {
rq->errors = 0;
scsi_req(rq)->result = 0;
} else {
if (media == ide_tape)
rq->errors = IDE_DRV_ERROR_GENERAL;
else if (blk_rq_is_passthrough(rq) && rq->errors == 0)
rq->errors = -EIO;
scsi_req(rq)->result = IDE_DRV_ERROR_GENERAL;
else if (blk_rq_is_passthrough(rq) && scsi_req(rq)->result == 0)
scsi_req(rq)->result = -EIO;
}
ide_complete_rq(drive, -EIO, blk_rq_bytes(rq));
@ -271,7 +271,7 @@ static ide_startstop_t execute_drive_cmd (ide_drive_t *drive,
#ifdef DEBUG
printk("%s: DRIVE_CMD (null)\n", drive->name);
#endif
rq->errors = 0;
scsi_req(rq)->result = 0;
ide_complete_rq(drive, 0, blk_rq_bytes(rq));
return ide_stopped;


@ -128,7 +128,8 @@ static int ide_cmd_ioctl(ide_drive_t *drive, unsigned long arg)
rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
scsi_req_init(rq);
ide_req(rq)->type = ATA_PRIV_TASKFILE;
err = blk_execute_rq(drive->queue, NULL, rq, 0);
blk_execute_rq(drive->queue, NULL, rq, 0);
err = scsi_req(rq)->result ? -EIO : 0;
blk_put_request(rq);
return err;
@ -227,8 +228,8 @@ static int generic_drive_reset(ide_drive_t *drive)
ide_req(rq)->type = ATA_PRIV_MISC;
scsi_req(rq)->cmd_len = 1;
scsi_req(rq)->cmd[0] = REQ_DRIVE_RESET;
if (blk_execute_rq(drive->queue, NULL, rq, 1))
ret = rq->errors;
blk_execute_rq(drive->queue, NULL, rq, 1);
ret = scsi_req(rq)->result;
blk_put_request(rq);
return ret;
}
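
A pattern repeated across the cdrom and IDE hunks: with rq->errors gone, blk_execute_rq()'s return value is no longer used, and callers read scsi_req(rq)->result once the request completes. The caller side of an internal command now generally looks like this (sketch; the setup mirrors the IDE helpers above, the function name is hypothetical):

	#include <linux/blkdev.h>
	#include <linux/ide.h>
	#include <scsi/scsi_request.h>

	static int mydrv_issue_internal(ide_drive_t *drive)
	{
		struct request *rq;
		int error;

		rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
		if (IS_ERR(rq))
			return PTR_ERR(rq);
		scsi_req_init(rq);
		ide_req(rq)->type = ATA_PRIV_MISC;

		blk_execute_rq(drive->queue, NULL, rq, 0);
		error = scsi_req(rq)->result ? -EIO : 0;	/* was: error = blk_execute_rq(...) */

		blk_put_request(rq);
		return error;
	}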


@ -37,7 +37,8 @@ static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
scsi_req(rq)->cmd_len = 1;
ide_req(rq)->type = ATA_PRIV_MISC;
rq->special = &timeout;
rc = blk_execute_rq(q, NULL, rq, 1);
blk_execute_rq(q, NULL, rq, 1);
rc = scsi_req(rq)->result ? -EIO : 0;
blk_put_request(rq);
if (rc)
goto out;


@ -27,7 +27,8 @@ int generic_ide_suspend(struct device *dev, pm_message_t mesg)
mesg.event = PM_EVENT_FREEZE;
rqpm.pm_state = mesg.event;
ret = blk_execute_rq(drive->queue, NULL, rq, 0);
blk_execute_rq(drive->queue, NULL, rq, 0);
ret = scsi_req(rq)->result ? -EIO : 0;
blk_put_request(rq);
if (ret == 0 && ide_port_acpi(hwif)) {
@ -55,8 +56,8 @@ static int ide_pm_execute_rq(struct request *rq)
spin_lock_irq(q->queue_lock);
if (unlikely(blk_queue_dying(q))) {
rq->rq_flags |= RQF_QUIET;
rq->errors = -ENXIO;
__blk_end_request_all(rq, rq->errors);
scsi_req(rq)->result = -ENXIO;
__blk_end_request_all(rq, 0);
spin_unlock_irq(q->queue_lock);
return -ENXIO;
}
@ -66,7 +67,7 @@ static int ide_pm_execute_rq(struct request *rq)
wait_for_completion_io(&wait);
return rq->errors ? -EIO : 0;
return scsi_req(rq)->result ? -EIO : 0;
}
int generic_ide_resume(struct device *dev)


@ -366,7 +366,7 @@ static int ide_tape_callback(ide_drive_t *drive, int dsc)
err = pc->error;
}
}
rq->errors = err;
scsi_req(rq)->result = err;
return uptodate;
}
@ -879,7 +879,7 @@ static int idetape_queue_rw_tail(ide_drive_t *drive, int cmd, int size)
tape->valid = 0;
ret = size;
if (rq->errors == IDE_DRV_ERROR_GENERAL)
if (scsi_req(rq)->result == IDE_DRV_ERROR_GENERAL)
ret = -EIO;
out_put:
blk_put_request(rq);
