Impact on existing optimizations

This very ad-hoc 'analysis' is meant to be a starting point for discussion.

Do not consider this any sort of authoritative reference!

It *only* concerns optimizations on shared memory variables that take place across an atomic operation to which those variables are otherwise unrelated.

The optimizations discussed below are the likely candidates for being allowed to operate around or across atomic operations.

Any optimization not explicitly mentioned will simply play it safe and treat all atomics as complete barriers. A summary table is provided at the end. Feel free to bring others to my attention.

My impression is that an optimization which encounters an acq_rel or seq_cst atomic would simply disallow whatever it is considering. So no dead store elimination, hoisting, sinking or anything else: a full barrier, just like a call to a function with unknown side effects.

Relaxed mode atomics would allow all these optimizations to happen across them.

That leaves only release, acquire and consume as potentially interesting. I was planning to treat consume exactly like acquire. My understanding is that it is similar, except that it only applies to things used directly in the calculation of the value read by the consume. Perhaps that means it can be treated like a relaxed mode operation, but I'm not sure and I doubt it. So it will be treated like acquire for now.

Generally, release is a barrier to sinking code and acquire is a barrier to hoisting code. A quick rundown follows.

DSE

a = 1
atomic operation
a = 2;

DSE in essence sinks the first statement to the second statement, and eliminates the first as redundant. Thus a = 1 cannot be removed across a release operation, but would be valid across an acquire operation.
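To make the two cases concrete, here is a minimal C++ sketch of the pattern above, once around a release store and once around an acquire load. The names `a`, `sync_point`, `stores_around_release` and `stores_around_acquire` are made up for illustration, and the comments only restate the rule from this page; they are not a statement of what any particular compiler does.

```cpp
#include <atomic>
#include <cassert>

int a = 0;                        // ordinary shared variable (hypothetical)
std::atomic<int> sync_point{0};   // atomic used as the synchronization point

// Under the rule above, the store a = 1 is kept: eliminating it would
// amount to sinking it below the release store.
void stores_around_release() {
    a = 1;
    sync_point.store(1, std::memory_order_release);
    a = 2;
}

// Under the rule above, a = 1 is a DSE candidate here, since an acquire
// does not block sinking. Returns the value read from sync_point.
int stores_around_acquire() {
    a = 1;
    int seen = sync_point.load(std::memory_order_acquire);
    a = 2;
    return seen;
}
```

In a single thread both functions leave `a` equal to 2; the difference only matters to another thread that synchronizes with `sync_point`.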

DCE

Dead code elimination differs from DSE in that the code being eliminated is either unreachable, or has a value which is proven to be never used. A store to shared memory is by definition a potentially used value, so it is not eliminated (thus the need for a separate DSE pass). The shared memory operations that are removed are therefore loads or unreachable code. I suspect even in the case of IPA DCE there should be no impact by atomic operations. Even for seq_cst and acq_rel!

PRE

if (some_condition) {
  y = x + 4;
}
atomic operation
z = x + 4;

becomes

if (some_condition) {
  t = x + 4;
  y = t;
}
else {
  t = x + 4;
}
atomic operation
z = t;

This really involves hoisting loads and leaving the stores where they are, so this would be valid across a release operation, but invalid across an acquire. (The acquire may cause a new value for x to be visible.)
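The transformation can be written out by hand as a C++ sketch, using a release store as the atomic operation since that is the case described as valid. The names `x`, `guard`, `original` and `after_pre` are assumptions made for this example, and `after_pre` is a manual rendering of the transformed code, not actual compiler output.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

int x = 0;                   // ordinary shared variable (hypothetical)
std::atomic<int> guard{0};   // stands in for the atomic operation

// Original form: x + 4 is computed on one path before the atomic
// operation and unconditionally after it. Returns {y, z}.
std::pair<int, int> original(bool some_condition) {
    int y = 0;
    if (some_condition) {
        y = x + 4;
    }
    guard.store(1, std::memory_order_release);
    int z = x + 4;
    return {y, z};
}

// Hand-applied PRE: the load of x is hoisted above the release store
// on both paths, and the result is reused afterwards. Returns {y, z}.
std::pair<int, int> after_pre(bool some_condition) {
    int y = 0;
    int t;
    if (some_condition) {
        t = x + 4;
        y = t;
    } else {
        t = x + 4;
    }
    guard.store(1, std::memory_order_release);
    int z = t;
    return {y, z};
}
```

With an acquire load in place of the release store, the second form would read `x` before the point at which a newer value may become visible, which is why the text rules it out.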

CSE

CSE basically hoists subexpressions into temporaries, so it would have the same logic apply as PRE: valid across release, invalid across an acquire.

Store Sinking

a = 1
atomic_operation

Almost by definition, sinking a = 1 below the atomic operation would be disabled for release and enabled for acquire.

Reassociation

Basic reassociation simply changes the order or values used in expressions. This means it moves loads around, sometimes raising them, sometimes sinking them. Usually this is all within a statement, but not always. I wouldn't expect to see a lot of reassociation opportunities across an atomic operation, but it would have to be modified based on exactly what it's trying to do. Hoisting loads would be allowed across a release operation, and sinking them would be allowed across an acquire.

I'd probably leave this one to last because there seems to be more room for error with it. The same rules seem sensible; it's just more difficult to pin down in a short summary.

Load Hoisting

Hoisting of loads and expressions is allowed across a release, but disabled across an acquire, since the acquire can cause a new value to become available for the load.
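The acquire case is easiest to picture with the usual message-passing idiom, sketched below in C++. The names `payload`, `ready`, `publish` and `try_consume` are invented for this illustration; in a real program the two functions would run on different threads.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                  // ordinary shared variable (hypothetical)
std::atomic<bool> ready{false};   // flag that publishes payload

// Writer side: the store to payload must stay above the release store.
void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader side: the load of payload must not be hoisted above the
// acquire load, because the acquire is what makes the writer's value
// visible. Returns -1 if nothing has been published yet.
int try_consume() {
    if (ready.load(std::memory_order_acquire)) {
        return payload;
    }
    return -1;
}
```

If the load of `payload` were hoisted above the acquire load of `ready`, the reader could see the flag set and still return the old value of `payload`.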

Value Numbering

w = 3;
x = 3;
atomic_operation
y = x + 4;
z = w + 4;

Value numbering in essence presumes the load of a value can be considered a constant and reused at a later point, so the code above becomes

t1 = 3
w = t1;
x = t1;
atomic_operation
y = t1 + 4;  // 7
z = t1 + 4;  // 7

The question is whether a new value of 'x' or 'w' could be made visible by the atomic_operation. An acquire operation may force a synchronization which could bring in a new value for 'x' or 'w', so this would be invalid across an acquire operation. It would be valid across a release operation.

Constant Propagation

The same logic for value numbering would apply to constant propagation.

Forward Propagation

The same logic for value numbering would apply to any other forward propagation.

Value Range Propagation

It would also seem that VRP follows the same logic as the other propagation optimizations. An acquire could result in the variable getting a synchronized value, and a release would have no synchronization effect.

Summary

Modes where optimizations are allowed to be performed:

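| Optimization                 | seq-cst | rel-acq | release | acquire/consume | relaxed |
|------------------------------|---------|---------|---------|-----------------|---------|
| DSE                          | ---     | ---     | ---     | allow           | allow   |
| DCE                          | allow   | allow   | allow   | allow           | allow   |
| PRE/FRE                      | ---     | ---     | allow   | ---             | allow   |
| CSE                          | ---     | ---     | allow   | ---             | allow   |
| store sinking                | ---     | ---     | ---     | allow           | allow   |
| reassociation *              | ---     | ---     | varies  | varies          | varies  |
| code hoisting                | ---     | ---     | allow   | ---             | allow   |
| value numbering              | ---     | ---     | allow   | ---             | allow   |
| constant/forward propagation | ---     | ---     | allow   | ---             | allow   |
| VRP                          | ---     | ---     | allow   | ---             | allow   |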

* reassociation requires much closer attention since it performs both hoisting and sinking type operations at various times. Initially it will simply be disallowed across an atomic operation.
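As a rough illustration of how a pass might consult these rules, the table can be encoded as a small C++ predicate. This is only a sketch under the assumptions of this page: the `Opt` enumeration and the `allowed_across` function are hypothetical names, consume is treated like acquire, and reassociation is left out because its entries vary.

```cpp
#include <atomic>
#include <cassert>

// Optimizations from the summary table (reassociation omitted).
enum class Opt {
    DSE, DCE, PRE, CSE, StoreSinking,
    CodeHoisting, ValueNumbering, Propagation, VRP
};

// Returns true when the table marks `opt` as "allow" across an atomic
// operation with the given memory order.
bool allowed_across(Opt opt, std::memory_order order) {
    // DCE is allowed everywhere, and relaxed atomics block nothing.
    if (opt == Opt::DCE || order == std::memory_order_relaxed) {
        return true;
    }
    // DSE and store sinking move stores downward; the rest hoist loads.
    const bool sinks_stores = (opt == Opt::DSE || opt == Opt::StoreSinking);
    switch (order) {
        case std::memory_order_release:
            return !sinks_stores;   // release blocks sinking only
        case std::memory_order_acquire:
        case std::memory_order_consume:
            return sinks_stores;    // acquire blocks hoisting only
        default:                    // acq_rel and seq_cst: full barrier
            return false;
    }
}
```

For example, `allowed_across(Opt::DSE, std::memory_order_acquire)` is true, while the same query with `std::memory_order_release` is false, matching the DSE row.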

Enhancement Requests

This section is a collection of enhancement requests filed by users in Bugzilla. Rather than copying them here, it might be more convenient to create a meta-bug in Bugzilla and collect them there.

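| Bug   | Summary                                                                          |
|-------|----------------------------------------------------------------------------------|
| 49244 | __sync or __atomic builtins will not emit 'lock bts/btr/btc'                     |
| 69130 | explicit atomic ops treating non-constant memory orders as memory_order_seq_cst |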

None: Atomic/GCCMM/Optimizations/Details (last edited 2016-04-15 20:14:46 by MartinSebor)