This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: making the new if-converter not mangle IR that is already vectorizer-friendly


Abe wrote:

In other words, the problem about which I was concerned is not going to be triggered by e.g. "if (c)  x = ..."
which lacks an attached "else  x = ..." in a multithreaded program without enough locking just because 'x' is global/static.

The only remaining case to consider is if some code being compiled takes the address of something thread-local and then "gives"
that pointer to another thread.  Even for _that_ extreme case, Sebastian says that the gimplifier will detect this
"address has been taken" situation and do the right thing such that the new if converter also does the right thing.

Great :). I don't understand much, if anything, about how GCC deals with thread-locals, but everything before that sounds good...

[Alan wrote:]

Can you give an example?

The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and "gcc.dg/vect/vect-mask-load-1.c"
currently test as: the new if-converter is "converting" something that's already vectorizer-friendly...
> [snip]
However, TTBOMK the vectorizer already "understands" that in cases where its input looks like:

   x = c ? y : z;

... and 'y' and 'z' are both pure [side-effect-free] -- including, but not limited to, they must be non-"volatile" --
it may vectorize a loop containing code like the preceding, ignoring for this particular instance the C mandate
that only one of {y, z} be evaluated...

My understanding is that any decision as to whether one or both of y and z is evaluated (when 'evaluation' involves doing any work, e.g. a load) has already been encoded into the gimple/tree IR. Thus, if only one of 'y' or 'z' is to be evaluated in your example, the IR will (prior to if-conversion) contain basic blocks and control flow such that we jump around the one that is not evaluated.
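As a rough illustration of that distinction, here is a minimal C sketch. The function names are hypothetical and are not taken from the GCC tree. In the first function each arm of the conditional does work (a load), so the guard has to survive as control flow; in the second, both operands are already values, so no branch is needed to preserve the semantics.

```c
#include <stddef.h>

/* Hypothetical example: each arm performs a load, so only the selected
   pointer may be dereferenced.  The other pointer may be invalid.  */
int select_guarded(int c, const int *p, const int *q)
{
  int tmp;
  if (c)
    tmp = *p;   /* load happens only when c is nonzero */
  else
    tmp = *q;   /* load happens only when c is zero */
  return tmp;
}

/* Hypothetical example: y and z are already-computed values, so
   choosing between them involves no further evaluation.  */
int select_values(int c, int y, int z)
{
  return c ? y : z;
}
```

Calling select_guarded(0, NULL, &b) is well defined because the load through the first pointer is jumped around, which is exactly the property the basic blocks in the IR encode.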

This appears to be the case in pr61194.c: prior to if-conversion, the IR for the loop in barX is

 <bb 3>:
  # i_16 = PHI <i_13(7), 0(2)>
  # ivtmp_21 = PHI <ivtmp_20(7), 1024(2)>
  _5 = x[i_16];
  _6 = _5 > 0.0;
  _7 = w[i_16];
  _8 = _7 < 0.0;
  _9 = _6 & _8;
  if (_9 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  iftmp.0_10 = z[i_16];
  goto <bb 6>;

  <bb 5>:
  iftmp.0_11 = y[i_16];

  <bb 6>:
  # iftmp.0_2 = PHI <iftmp.0_10(4), iftmp.0_11(5)>
  z[i_16] = iftmp.0_2;
  i_13 = i_16 + 1;
  ivtmp_20 = ivtmp_21 - 1;
  if (ivtmp_20 != 0)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  goto <bb 3>;

which clearly contains (unvectorizable!) control flow. Without -ftree-loop-if-convert-stores, if-conversion leaves this alone, and vectorization fails (i.e. the vectorizer bails out because the loop has >2 basic blocks). With -ftree-loop-if-convert-stores, if-conversion produces

 <bb 3>:
  # i_16 = PHI <i_13(4), 0(2)>
  # ivtmp_21 = PHI <ivtmp_20(4), 1024(2)>
  _5 = x[i_16];
  _6 = _5 > 0.0;
  _7 = w[i_16];
  _8 = _7 < 0.0;
  _9 = _6 & _8;
  iftmp.0_10 = z[i_16]; // <== here
  iftmp.0_11 = y[i_16]; // <== here
  iftmp.0_2 = _9 ? iftmp.0_10 : iftmp.0_11;
  z[i_16] = iftmp.0_2;
  i_13 = i_16 + 1;
  ivtmp_20 = ivtmp_21 - 1;
  if (ivtmp_20 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

where I have commented the conditional loads that have become unconditional. (Hence, "-ftree-loop-if-convert-stores" looks misnamed: it affects how the if-conversion phase converts loads, too. Please correct me if I misunderstand (Richard?).) This contains no control flow, and so is vectorizable.
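For readers who prefer source to IR, the two dumps correspond roughly to the following C. This is a sketch reconstructed from the dumps above, not the text of pr61194.c: the function names, the float element type, and the pointer parameters are my assumptions (the dumps operate on arrays named x, y, z and w with a trip count of 1024).

```c
/* Sketch of the loop before if-conversion: exactly one of z[i] / y[i]
   is loaded per iteration, guarded by control flow.  */
void loop_branchy(float *z, const float *x, const float *y,
                  const float *w, int n)
{
  for (int i = 0; i < n; ++i)
    {
      float tmp;
      if ((x[i] > 0.0f) & (w[i] < 0.0f))
        tmp = z[i];          /* corresponds to <bb 4> */
      else
        tmp = y[i];          /* corresponds to <bb 5> */
      z[i] = tmp;
    }
}

/* Sketch of the loop after if-conversion: both loads are unconditional
   and the result is picked with a select, so the body is straight-line.  */
void loop_ifconverted(float *z, const float *x, const float *y,
                      const float *w, int n)
{
  for (int i = 0; i < n; ++i)
    {
      int c = (x[i] > 0.0f) & (w[i] < 0.0f);
      float tz = z[i];       /* now loaded on every iteration */
      float ty = y[i];       /* now loaded on every iteration */
      z[i] = c ? tz : ty;
    }
}
```

Both functions store the same values as long as every element of y and z is readable; the second form simply performs loads that the first would have skipped.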

(This is all without your scratchpad patch, of course.) IOW whether this loop gets vectorized relies upon the preceding if-conversion phase removing the control flow.

HTH
Alan

