
Answer by HolyBlackCat for What is the significance of 'strongly happens before' compared to '(simply) happens before'?

Here's my current understanding, which could be incomplete or incorrect. A verification would be appreciated.


C++20 renamed strongly happens before to simply happens before, and introduced a new, more relaxed definition for strongly happens before, which imposes less ordering.

Simply happens before is used to reason about the presence of data races in your code. (Strictly speaking, data races are defined in terms of the plain 'happens before', but the two are equivalent in the absence of consume operations, whose use the standard discourages, since most (all?) major compilers treat them as acquires.)

The weaker strongly happens before is used to reason about the global order of seq-cst operations.


This change was introduced in proposal P0668R5: Revising the C++ memory model, which is based on the paper Repairing Sequential Consistency in C/C++11 by Lahav et al. (which I didn't fully read).

The proposal explains why the change was made. Long story short, the way most compilers implement atomics on Power and ARM architectures turned out to be non-conformant in rare edge cases, and fixing the compilers had a performance cost, so they fixed the standard instead.

The change only affects you if you mix seq-cst operations with acquire-release operations on the same atomic variable (i.e. if an acquire operation reads a value from a seq-cst store, or a seq-cst operation reads a value from a release store).

If you don't mix operations in this manner, then you're not affected (i.e. can treat simply happens before and strongly happens before as equivalent).

The gist of the change is that the synchronization between a seq-cst operation and the corresponding acquire/release operation no longer affects the position of this specific seq-cst operation in the global seq-cst order, but the synchronization itself is still there.

This makes the seq-cst order for the affected seq-cst operations moot, see below.
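To illustrate the "synchronization itself is still there" part, here is a minimal sketch, assuming my reading above is right. A seq-cst load reads the value written by a release store (the mixed case), and the reader can still safely touch plain data written before that store. The function name and the `payload` variable are made up for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns what the reader thread saw in `payload`.
int read_payload_via_mixed_sync() {
    int payload = 0;          // plain, non-atomic data
    std::atomic<int> y{0};
    int seen = -1;

    std::thread writer([&] {
        payload = 42;                           // sequenced before the store below
        y.store(1, std::memory_order_release);  // release store
    });
    std::thread reader([&] {
        // A seq_cst load is also an acquire operation. Once it reads the 1
        // written by the release store, it synchronizes with that store.
        while (y.load(std::memory_order_seq_cst) != 1) {
            std::this_thread::yield();
        }
        seen = payload;  // race-free: the write to payload happens before this read
    });

    writer.join();
    reader.join();
    assert(seen != -1);  // the reader always assigns `seen` before exiting
    return seen;
}
```

Calling `read_payload_via_mixed_sync()` should always return 42. What C++20 changed is not this guarantee, but where such a seq-cst operation may end up in the global seq-cst order.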


The proposal presents the following example, and I'll try to explain my understanding of it:

    atomic_int x = 0, y = 0;
    int a = 0, b = 0, c = 0;

    // Thread 1
    x.store(1, seq_cst);
    y.store(1, release);

    // Thread 2
    b = y.fetch_add(1, seq_cst); // b = 1 (the value of y before increment)
    c = y.load(relaxed); // c = 3

    // Thread 3
    y.store(3, seq_cst);
    a = x.load(seq_cst); // a = 0

The comments indicate one of the ways that this code can execute, which the standard used to forbid (before this change), but which actually can happen on the affected architectures.
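If you want to poke at this yourself, the example above can be wrapped into a small harness. This is only a sketch: `Outcome`, `run_litmus_once`, `is_weak_outcome` and `count_weak_outcomes` are names I made up, and spawning fresh threads for every run is a crude way to provoke reorderings, so expect the annotated outcome to be rare or never observed (and, per the proposal, only Power and ARM are affected in the first place).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Outcome { int a, b, c; };

// Runs the three threads of the example once and returns what they observed.
Outcome run_litmus_once() {
    std::atomic<int> x{0}, y{0};
    int a = 0, b = 0, c = 0;  // each is written by exactly one thread

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        y.store(1, std::memory_order_release);
    });
    std::thread t2([&] {
        b = y.fetch_add(1, std::memory_order_seq_cst);  // value of y before increment
        c = y.load(std::memory_order_relaxed);
    });
    std::thread t3([&] {
        y.store(3, std::memory_order_seq_cst);
        a = x.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    t3.join();
    return {a, b, c};  // safe to read after the joins
}

// True for the outcome annotated in the example's comments.
bool is_weak_outcome(const Outcome& o) {
    return o.a == 0 && o.b == 1 && o.c == 3;
}

// Repeats the experiment and counts how often the annotated outcome shows up.
int count_weak_outcomes(int iterations) {
    int weak = 0;
    for (int i = 0; i < iterations; ++i) {
        if (is_weak_outcome(run_litmus_once())) {
            ++weak;
        }
    }
    return weak;
}
```

A count of zero from `count_weak_outcomes` proves nothing about what the standard allows; it only says this machine and compiler didn't produce the outcome in those runs. Dedicated litmus-testing tools are a better fit for serious experiments.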

The execution proceeds as follows:

    Sequenced-before (and strongly-happens-before) in each individual thread         | Variable modification orders                     | Global seq_cst order
        T1                            T2                      T3                         X                       Y
                                                                                     .-- .load(seq_cst) == 0                              (3)
    .-> x.store(1, seq_cst)                                                          |   .store(1, seq_cst)                               (4)
    |   y.store(1, release) -.                                                       |                           .store(1, release)
    |                        '-sync-> y.fetch_add(1, seq_cst)                        |                           .fetch_add(1, seq_cst)   (1)
    |                                                         y.store(3, seq_cst)    |                           .store(3, seq_cst)       (2)
    |                                 y.load(relaxed)         x.load(seq_cst) == 0 --'                           .load(relaxed)
    |                                                            |
    |                                                         coherence-
    |                                                         ordered
    |                                                         before
    |                                                            |
    '------------------------------------------------------------'

Here time flows downwards.

But, as usual, there's no single timeline. Rather, there are several, some of which coincide in some aspects. Each column represents a separate timeline (ordering).

T1,T2,T3 represent the sequenced-before relation (internal to each thread), which implies strongly-happens-before too.

X,Y are the modification orders of the respective variables. (I've included reads, even though they technically aren't part of a modification order.) Each thread's order must be consistent with each variable's modification order, but threads don't have to be consistent with one another. Each variable's modification order must be consistent with the global seq-cst order (only counting seq-cst operations on those variables).

At first glance, x.load(seq_cst) == 0 is in a weird place in the modification order of X (hence the connecting line leading to it), but this is completely legal (it doesn't contradict the previous paragraph), regardless of the C++20 changes.

The "sync" arrow represents the synchronizes-with relation. The seq-cst order would normally agree with it, but since this synchronization is between a seq-cst and a non-seq-cst operation, it doesn't have to here. In the "seq-cst" column, you can see a discontinuity at this point.

There's also a "coherence-ordered before" arrow, which is a new relation introduced in this proposal, which is only used to define the global seq-cst order, and apparently imposes no synchronization (unlike release-acquire operations). (In this case this relation coincides with common sense: if a load doesn't see the value written by the store, then the load goes before the store.)
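A simpler case where I believe this "common sense" rule does the work is the classic store-buffering test with only seq-cst operations (so no mixing, and nothing here is affected by the C++20 change). The sketch below uses made-up names (`Reads`, `store_buffering_once`, `both_read_zero`). As far as I understand, if the load of y returns 0, it is coherence-ordered before y.store(1) and so precedes it in the seq-cst order; together with sequenced-before in each thread that gives x.store(1), then y.load, then y.store(1), then x.load, so x.load has to see 1.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Reads { int r1, r2; };

// One run of the store-buffering test, using seq_cst everywhere.
Reads store_buffering_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;  // each is written by exactly one thread

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    return {r1, r2};
}

// Returns true if any run observed r1 == 0 and r2 == 0 together,
// which the seq-cst order is supposed to rule out.
bool both_read_zero(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        Reads r = store_buffering_once();
        if (r.r1 == 0 && r.r2 == 0) {
            return true;
        }
    }
    return false;
}
```

`both_read_zero` should return false no matter how many iterations you give it. Weakening the operations to release/acquire would make the both-zero result allowed again, because then there's no single order that all four operations have to agree with.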

