Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt

The Documentation/memory-barriers.txt file was written before
the need for ACCESS_ONCE() was fully appreciated.  It therefore
contains no ACCESS_ONCE() calls, which can be a problem when
people lift examples from it.  This commit therefore adds
ACCESS_ONCE() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-1-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This commit is contained in:
Paul E. McKenney 2013-12-11 13:59:04 -08:00 committed by Ingo Molnar
parent 962d9c5757
commit 2ecf810121

View File

@ -194,18 +194,22 @@ There are some minimal guarantees that may be expected of a CPU:
(*) On any given CPU, dependent memory accesses will be issued in order, with
respect to itself. This means that for:
Q = P; D = *Q;
ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q);
the CPU will issue the following memory operations:
Q = LOAD P, D = LOAD *Q
and always in that order.
and always in that order. On most systems, smp_read_barrier_depends()
does nothing, but it is required for DEC Alpha. The ACCESS_ONCE()
is required to prevent compiler mischief. Please note that you
should normally use something like rcu_dereference() instead of
open-coding smp_read_barrier_depends().
(*) Overlapping loads and stores within a particular CPU will appear to be
ordered within that CPU. This means that for:
a = *X; *X = b;
a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b;
the CPU will only issue the following sequence of memory operations:
@ -213,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU:
And for:
*X = c; d = *X;
ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X);
the CPU will only issue:
@ -224,6 +228,41 @@ There are some minimal guarantees that may be expected of a CPU:
And there are a number of things that _must_ or _must_not_ be assumed:
(*) It _must_not_ be assumed that the compiler will do what you want with
memory references that are not protected by ACCESS_ONCE(). Without
ACCESS_ONCE(), the compiler is within its rights to do all sorts
of "creative" transformations:
(-) Repeat the load, possibly getting a different value on the second
and subsequent loads. This is especially prone to happen when
register pressure is high.
(-) Merge adjacent loads and stores to the same location. The most
familiar example is the transformation from:
while (a)
do_something();
to something like:
if (a)
for (;;)
do_something();
Using ACCESS_ONCE() as follows prevents this sort of optimization:
while (ACCESS_ONCE(a))
do_something();
(-) "Store tearing", where a single store in the source code is split
into smaller stores in the object code. Note that gcc really
will do this on some architectures when storing certain constants.
It can be cheaper to do a series of immediate stores than to
form the constant in a register and then to store that register.
(-) "Load tearing", which splits loads in a manner analogous to
store tearing.
(*) It _must_not_ be assumed that independent loads and stores will be issued
in the order given. This means that for:
@ -450,14 +489,14 @@ The usage requirements of data dependency barriers are a little subtle, and
it's not always obvious that they're needed. To illustrate, consider the
following sequence of events:
CPU 1 CPU 2
=============== ===============
CPU 1 CPU 2
=============== ===============
{ A == 1, B == 2, C = 3, P == &A, Q == &C }
B = 4;
<write barrier>
P = &B
Q = P;
D = *Q;
ACCESS_ONCE(P) = &B
Q = ACCESS_ONCE(P);
D = *Q;
There's a clear data dependency here, and it would seem that by the end of the
sequence, Q must be either &A or &B, and that:
@ -477,15 +516,15 @@ Alpha).
To deal with this, a data dependency barrier or better must be inserted
between the address load and the data load:
CPU 1 CPU 2
=============== ===============
CPU 1 CPU 2
=============== ===============
{ A == 1, B == 2, C = 3, P == &A, Q == &C }
B = 4;
<write barrier>
P = &B
Q = P;
<data dependency barrier>
D = *Q;
ACCESS_ONCE(P) = &B
Q = ACCESS_ONCE(P);
<data dependency barrier>
D = *Q;
This enforces the occurrence of one of the two implications, and prevents the
third possibility from arising.
@ -504,21 +543,22 @@ Another example of where data dependency barriers might be required is where a
number is read from memory and then used to calculate the index for an array
access:
CPU 1 CPU 2
=============== ===============
CPU 1 CPU 2
=============== ===============
{ M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
M[1] = 4;
<write barrier>
P = 1
Q = P;
<data dependency barrier>
D = M[Q];
ACCESS_ONCE(P) = 1
Q = ACCESS_ONCE(P);
<data dependency barrier>
D = M[Q];
The data dependency barrier is very important to the RCU system, for example.
See rcu_dereference() in include/linux/rcupdate.h. This permits the current
target of an RCU'd pointer to be replaced with a new modified target, without
the replacement target appearing to be incompletely initialised.
The data dependency barrier is very important to the RCU system,
for example. See rcu_assign_pointer() and rcu_dereference() in
include/linux/rcupdate.h. This permits the current target of an RCU'd
pointer to be replaced with a new modified target, without the replacement
target appearing to be incompletely initialised.
See also the subsection on "Cache Coherency" for a more thorough example.
@ -530,22 +570,23 @@ A control dependency requires a full read memory barrier, not simply a data
dependency barrier to make it work correctly. Consider the following bit of
code:
q = &a;
q = ACCESS_ONCE(a);
if (p) {
<data dependency barrier>
q = &b;
q = ACCESS_ONCE(b);
}
x = *q;
This will not have the desired effect because there is no actual data
dependency, but rather a control dependency that the CPU may short-circuit by
attempting to predict the outcome in advance. In such a case what's actually
required is:
dependency, but rather a control dependency that the CPU may short-circuit
by attempting to predict the outcome in advance, so that other CPUs see
the load from b as having happened before the load from a. In such a
case what's actually required is:
q = &a;
q = ACCESS_ONCE(a);
if (p) {
<read barrier>
q = &b;
q = ACCESS_ONCE(b);
}
x = *q;
@ -561,23 +602,23 @@ barrier, though a general barrier would also be viable. Similarly a read
barrier or a data dependency barrier should always be paired with at least an
write barrier, though, again, a general barrier is viable:
CPU 1 CPU 2
=============== ===============
a = 1;
CPU 1 CPU 2
=============== ===============
ACCESS_ONCE(a) = 1;
<write barrier>
b = 2; x = b;
<read barrier>
y = a;
ACCESS_ONCE(b) = 2; x = ACCESS_ONCE(b);
<read barrier>
y = ACCESS_ONCE(a);
Or:
CPU 1 CPU 2
=============== ===============================
CPU 1 CPU 2
=============== ===============================
a = 1;
<write barrier>
b = &a; x = b;
<data dependency barrier>
y = *x;
ACCESS_ONCE(b) = &a; x = ACCESS_ONCE(b);
<data dependency barrier>
y = *x;
Basically, the read barrier always has to be there, even though it can be of
the "weaker" type.
@ -586,13 +627,13 @@ the "weaker" type.
match the loads after the read barrier or the data dependency barrier, and vice
versa:
CPU 1 CPU 2
=============== ===============
a = 1; }---- --->{ v = c
b = 2; } \ / { w = d
<write barrier> \ <read barrier>
c = 3; } / \ { x = a;
d = 4; }---- --->{ y = b;
CPU 1 CPU 2
=================== ===================
ACCESS_ONCE(a) = 1; }---- --->{ v = ACCESS_ONCE(c);
ACCESS_ONCE(b) = 2; } \ / { w = ACCESS_ONCE(d);
<write barrier> \ <read barrier>
ACCESS_ONCE(c) = 3; } / \ { x = ACCESS_ONCE(a);
ACCESS_ONCE(d) = 4; }---- --->{ y = ACCESS_ONCE(b);
EXAMPLES OF MEMORY BARRIER SEQUENCES
@ -1435,12 +1476,12 @@ three CPUs; then should the following sequence of events occur:
CPU 1 CPU 2
=============================== ===============================
*A = a; *E = e;
ACCESS_ONCE(*A) = a; ACCESS_ONCE(*E) = e;
LOCK M LOCK Q
*B = b; *F = f;
*C = c; *G = g;
ACCESS_ONCE(*B) = b; ACCESS_ONCE(*F) = f;
ACCESS_ONCE(*C) = c; ACCESS_ONCE(*G) = g;
UNLOCK M UNLOCK Q
*D = d; *H = h;
ACCESS_ONCE(*D) = d; ACCESS_ONCE(*H) = h;
Then there is no guarantee as to what order CPU 3 will see the accesses to *A
through *H occur in, other than the constraints imposed by the separate locks
@ -1460,17 +1501,17 @@ However, if the following occurs:
CPU 1 CPU 2
=============================== ===============================
*A = a;
LOCK M [1]
*B = b;
*C = c;
UNLOCK M [1]
*D = d; *E = e;
LOCK M [2]
*F = f;
*G = g;
UNLOCK M [2]
*H = h;
ACCESS_ONCE(*A) = a;
LOCK M [1]
ACCESS_ONCE(*B) = b;
ACCESS_ONCE(*C) = c;
UNLOCK M [1]
ACCESS_ONCE(*D) = d; ACCESS_ONCE(*E) = e;
LOCK M [2]
ACCESS_ONCE(*F) = f;
ACCESS_ONCE(*G) = g;
UNLOCK M [2]
ACCESS_ONCE(*H) = h;
CPU 3 might see:
@ -2177,11 +2218,11 @@ A programmer might take it for granted that the CPU will perform memory
operations in exactly the order specified, so that if the CPU is, for example,
given the following piece of code to execute:
a = *A;
*B = b;
c = *C;
d = *D;
*E = e;
a = ACCESS_ONCE(*A);
ACCESS_ONCE(*B) = b;
c = ACCESS_ONCE(*C);
d = ACCESS_ONCE(*D);
ACCESS_ONCE(*E) = e;
they would then expect that the CPU will complete the memory operation for each
instruction before moving on to the next one, leading to a definite sequence of
@ -2228,12 +2269,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its
_own_ accesses appear to be correctly ordered, without the need for a memory
barrier. For instance with the following code:
U = *A;
*A = V;
*A = W;
X = *A;
*A = Y;
Z = *A;
U = ACCESS_ONCE(*A);
ACCESS_ONCE(*A) = V;
ACCESS_ONCE(*A) = W;
X = ACCESS_ONCE(*A);
ACCESS_ONCE(*A) = Y;
Z = ACCESS_ONCE(*A);
and assuming no intervention by an external influence, it can be assumed that
the final result will appear to be:
@ -2250,7 +2291,12 @@ accesses:
in that order, but, without intervention, the sequence may have almost any
combination of elements combined or discarded, provided the program's view of
the world remains consistent.
the world remains consistent. Note that ACCESS_ONCE() is -not- optional
in the above example, as there are architectures where a given CPU might
interchange successive loads to the same location. On such architectures,
ACCESS_ONCE() does whatever is necessary to prevent this, for example, on
Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the
special ld.acq and st.rel instructions that prevent such reordering.
The compiler may also combine, discard or defer elements of the sequence before
the CPU even sees them.
@ -2264,13 +2310,13 @@ may be reduced to:
*A = W;
since, without a write barrier, it can be assumed that the effect of the
storage of V to *A is lost. Similarly:
since, without either a write barrier or an ACCESS_ONCE(), it can be
assumed that the effect of the storage of V to *A is lost. Similarly:
*A = Y;
Z = *A;
may, without a memory barrier, be reduced to:
may, without a memory barrier or an ACCESS_ONCE(), be reduced to:
*A = Y;
Z = Y;