[Bug c++/81366] pragma omp simd reduce(max:m) not vectorizing

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Jul 10 12:52:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81366

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-10
                 CC|                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
FRE or a similar pass is unable to sufficiently clean up the extra memory
references added because std::max takes and returns its arguments by const
reference.
If one uses
m = x[i] < m ? m : x[i];
instead of
m = std::max(x[i], m);
in the loop, then before ifcvt we see:
  _21 = GOMP_SIMD_LANE (simduid.1_12(D));
  _1 = (long unsigned int) i_35;
  _2 = _1 * 8;
  _3 = x_22(D) + _2;
  _4 = *_3;
  _23 = D.2907[_21];
  if (_4 < _23)
    goto <bb 8>; [50.00%] [count: INV]
  else
    goto <bb 9>; [50.00%] [count: INV]

  <bb 8> [42.50%] [count: INV]:

  <bb 9> [85.00%] [count: INV]:
  # iftmp.0_6 = PHI <_23(8), _4(7)>
  D.2907[_21] = iftmp.0_6;
  i_25 = i_35 + 1;
  if (n_15(D) > i_25)
and that can be handled just fine.
But with std::max we have:
  # i_38 = PHI <0(6), i_25(10)>
  _20 = GOMP_SIMD_LANE (simduid.2_12(D));
  _1 = (long unsigned int) i_38;
  _2 = _1 * 8;
  _3 = x_21(D) + _2;
  _23 = MEM[(const double &)_3];
  _28 = MEM[(const double &)&D.2904][_20];
  if (_23 < _28)
    goto <bb 8>; [50.00%] [count: INV]
  else
    goto <bb 9>; [50.00%] [count: INV]

  <bb 8> [42.50%] [count: INV]:
  _42 = (sizetype) _20;
  _7 = _42 * 8;
  _22 = &D.2904 + _7;
  pretmp_50 = MEM[(const double &)_22];

  <bb 9> [85.00%] [count: INV]:
  # prephitmp_51 = PHI <_23(7), pretmp_50(8)>
  D.2904[_20] = prephitmp_51;
  i_25 = i_38 + 1;
  if (n_14(D) > i_25)
Here the compiler figured out that one of the prephitmp_51 arguments is _23,
but not that pretmp_50 = _28 (both are loads of D.2904[_20]),
and thus it wants to generate a masked load for the load in bb 8.
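
For reference, a minimal sketch of the two loop forms compared above. The
function names and the -infinity initial value are assumptions made for
illustration and are not taken from the testcase in the report; the clause is
spelled reduction(max:m) here, as in the OpenMP specification.

```cpp
#include <algorithm>
#include <limits>

// Hypothetical reproducer shape: std::max takes and returns const double&,
// which is where the MEM[(const double &)...] loads in the second dump
// come from.
double max_with_std_max(const double *x, int n)
{
  double m = -std::numeric_limits<double>::infinity();  // assumed initial value
#pragma omp simd reduction(max:m)
  for (int i = 0; i < n; i++)
    m = std::max(x[i], m);
  return m;
}

// Same reduction written with the conditional expression from the comment;
// this works on values rather than references.
double max_with_ternary(const double *x, int n)
{
  double m = -std::numeric_limits<double>::infinity();  // assumed initial value
#pragma omp simd reduction(max:m)
  for (int i = 0; i < n; i++)
    m = x[i] < m ? m : x[i];
  return m;
}
```

Both functions compute the same value; only the IL handed to ifcvt differs.
Something like g++ -O3 -fopenmp-simd -fopt-info-vec should show whether each
loop gets vectorized, though the outcome will depend on the GCC version used.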

