This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: making the new if-converter not mangle IR that is already vectorizer-friendly
- From: Alan Lawrence <alan dot lawrence at arm dot com>
- To: Abe <abe_skolnik at yahoo dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Richard Biener <richard dot guenther at gmail dot com>, Sebastian Pop <sebpop at gmail dot com>
- Date: Fri, 03 Jul 2015 10:37:14 +0100
- Subject: Re: making the new if-converter not mangle IR that is already vectorizer-friendly
- References: <55946699 dot 4070803 at yahoo dot com> <559504BE dot 40207 at arm dot com> <5595AB07 dot 7040404 at yahoo dot com>
Abe wrote:
In other words, the problem about which I was concerned is not going to be triggered by e.g. "if (c) x = ..."
which lacks an attached "else x = ..." in a multithreaded program without enough locking just because 'x' is global/static.
The only remaining case to consider is if some code being compiled takes the address of something thread-local and then "gives"
that pointer to another thread. Even for _that_ extreme case, Sebastian says that the gimplifier will detect this
"address has been taken" situation and do the right thing such that the new if converter also does the right thing.
Great :). I don't understand much, if anything, about how GCC deals with
thread-locals, but everything before that sounds good.
[Alan wrote:]
Can you give an example?
The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and "gcc.dg/vect/vect-mask-load-1.c"
currently test as: the new if-converter is "converting" something that's already vectorizer-friendly...
> [snip]
However, TTBOMK the vectorizer already "understands" that in cases where its input looks like:
x = c ? y : z;
... and 'y' and 'z' are both pure [side-effect-free] -- including, but not limited to, they must be non-"volatile" --
it may vectorize a loop containing code like the preceding, ignoring for this particular instance the C mandate
that only one of {y, z} be evaluated...
My understanding is that any decision as to whether one or both of y and z are
evaluated (when 'evaluation' involves doing any work, e.g. a load) has already
been encoded into the gimple/tree IR. Thus, if only one of 'y' or 'z' is to be
evaluated in your example, the IR will, prior to if-conversion, contain basic
blocks and control flow that jump around the one that is not evaluated.
This appears to be the case in pr61194.c: prior to if-conversion, the IR for the
loop in barX is
  <bb 3>:
  # i_16 = PHI <i_13(7), 0(2)>
  # ivtmp_21 = PHI <ivtmp_20(7), 1024(2)>
  _5 = x[i_16];
  _6 = _5 > 0.0;
  _7 = w[i_16];
  _8 = _7 < 0.0;
  _9 = _6 & _8;
  if (_9 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  iftmp.0_10 = z[i_16];
  goto <bb 6>;

  <bb 5>:
  iftmp.0_11 = y[i_16];

  <bb 6>:
  # iftmp.0_2 = PHI <iftmp.0_10(4), iftmp.0_11(5)>
  z[i_16] = iftmp.0_2;
  i_13 = i_16 + 1;
  ivtmp_20 = ivtmp_21 - 1;
  if (ivtmp_20 != 0)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  goto <bb 3>;

which clearly contains (unvectorizable!) control flow. Without
-ftree-loop-if-convert-stores, if-conversion leaves this alone, and
vectorization fails (i.e. the vectorizer bails out because the loop has >2 basic
blocks). With -ftree-loop-if-convert-stores, if-conversion produces
  <bb 3>:
  # i_16 = PHI <i_13(4), 0(2)>
  # ivtmp_21 = PHI <ivtmp_20(4), 1024(2)>
  _5 = x[i_16];
  _6 = _5 > 0.0;
  _7 = w[i_16];
  _8 = _7 < 0.0;
  _9 = _6 & _8;
  iftmp.0_10 = z[i_16]; // <== here
  iftmp.0_11 = y[i_16]; // <== here
  iftmp.0_2 = _9 ? iftmp.0_10 : iftmp.0_11;
  z[i_16] = iftmp.0_2;
  i_13 = i_16 + 1;
  ivtmp_20 = ivtmp_21 - 1;
  if (ivtmp_20 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

where I have commented the conditional loads that have become unconditional.
(Hence "-ftree-loop-if-convert-stores" looks misnamed: it affects how the
if-conversion phase converts loads, too. Richard, please correct me if I
misunderstand.) The converted loop body contains no control flow apart from the
loop back-edge, and so is vectorizable. (This is all without your scratchpad
patch, of course.) IOW, whether this loop is vectorized relies upon the
preceding if-conversion phase removing the control flow.
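
For illustration, the two dumps above correspond roughly to the two C loops
below. This is only a sketch reconstructed from the IR, not the source of
pr61194.c: the names bar_branchy, bar_select, count_mismatches and self_check,
the array parameters and the test data are all made up for the example. The
first loop loads only one of z[i] or y[i] per iteration; the second loads both
unconditionally and then selects, which is the shape the if-converter produced.

```c
#include <assert.h>

#define N 1024

/* Branchy form: mirrors the IR before if-conversion.
   Only one of z[i] / y[i] is loaded in each iteration. */
static void bar_branchy(const float *x, const float *w,
                        const float *y, float *z)
{
  for (int i = 0; i < N; ++i) {
    float tmp;
    if ((x[i] > 0.0f) & (w[i] < 0.0f))
      tmp = z[i];
    else
      tmp = y[i];
    z[i] = tmp;
  }
}

/* Select form: mirrors the IR after if-conversion.
   Both loads are unconditional; the condition only picks the result. */
static void bar_select(const float *x, const float *w,
                       const float *y, float *z)
{
  for (int i = 0; i < N; ++i) {
    int c = (x[i] > 0.0f) & (w[i] < 0.0f);
    float from_z = z[i];   /* unconditional load */
    float from_y = y[i];   /* unconditional load */
    z[i] = c ? from_z : from_y;
  }
}

/* Hypothetical helper: run both forms on identical (made-up) data and
   count the elements where the results differ. */
int count_mismatches(void)
{
  static float x[N], w[N], y[N], z1[N], z2[N];
  int mismatches = 0;

  for (int i = 0; i < N; ++i) {
    x[i] = (float)((i % 3) - 1);   /* cycles through -1, 0, 1 */
    w[i] = (float)((i % 5) - 2);   /* cycles through -2 .. 2 */
    y[i] = (float)i;
    z1[i] = z2[i] = -(float)i;
  }

  bar_branchy(x, w, y, z1);
  bar_select(x, w, y, z2);

  for (int i = 0; i < N; ++i)
    if (z1[i] != z2[i])
      ++mismatches;
  return mismatches;
}

/* Hypothetical self-check: the two forms should agree on this data. */
void self_check(void)
{
  assert(count_mismatches() == 0);
}
```

The two forms agree here because every element of y and z is valid to load. The
sketch says nothing about what a given compiler does with either loop at a
given optimization level; it only shows the source-level difference between the
two IR shapes.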
HTH
Alan