This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/79102] gcc fails to auto-vectorise the multiplicative reduction of an array of complex floats


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79102

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-01-17
          Component|c                           |tree-optimization
             Blocks|                            |53947
            Summary|gcc fails to auto-vectorise |gcc fails to auto-vectorise
                   |the product of an array of  |the multiplicative
                   |complex floats              |reduction of an array of
                   |                            |complex floats
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we do not support reductions of _Complex variables.  The
vectorizer sees

  <bb 3> [99.00%]:
  # i_16 = PHI <i_11(4), 0(2)>
  # p$real_13 = PHI <_21(4), 1.0e+0(2)>
  # p$imag_14 = PHI <_22(4), 0.0(2)>
  # ivtmp_48 = PHI <ivtmp_47(4), 128(2)>
  _1 = (long unsigned int) i_16;
  _2 = _1 * 8;
  _3 = x_9(D) + _2;
  _7 = REALPART_EXPR <*_3>;
  _12 = IMAGPART_EXPR <*_3>;
  _17 = _7 * p$real_13;
  _18 = _12 * p$imag_14;
  _19 = _7 * p$imag_14;
  _20 = _12 * p$real_13;
  _21 = _17 - _18;
  _22 = _19 + _20;
  i_11 = i_16 + 1;
  ivtmp_47 = ivtmp_48 - 1;
  if (ivtmp_47 != 0)
    goto <bb 4>; [98.99%]
  else
    goto <bb 5>; [1.01%]

  <bb 4> [98.00%]:
  goto <bb 3>; [100.00%]

  <bb 5> [1.00%]:
  # _50 = PHI <_21(3)>
  # _49 = PHI <_22(3)>
  p_10 = COMPLEX_EXPR <_50, _49>;
  return p_10;

which would need to be detected as a single reduction with two components.  I
guess not lowering complex operations would help here (with its own
complications, of course).  Not sinking the COMPLEX_EXPR, and instead keeping
it as a loop-carried dependency, might also help.
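
For reference, here is a sketch of the kind of source loop that would lower to
the dump above, next to a hand-split version with the two components kept as
separate accumulators.  It is reconstructed from the dump, not taken from the
bug report: the function names and the exact source shape are assumptions.  The
plain four-multiply form in the dump also suggests the multiplication was
lowered without the NaN-recovery library call (as with -ffast-math or
-fcx-limited-range), which is likewise an assumption.

```c
#include <complex.h>

#define N 128  /* trip count, taken from the initial value of ivtmp_48 */

/* Assumed source shape: multiplicative reduction over an array of
   complex floats (8 bytes per element, matching _2 = _1 * 8). */
float complex prod_complex(const float complex *x)
{
    float complex p = 1.0f;          /* p$real = 1.0, p$imag = 0.0 */
    for (int i = 0; i < N; i++)
        p *= x[i];
    return p;
}

/* Hand-lowered equivalent mirroring the dump: one logical reduction
   carried in two scalar accumulators that depend on each other. */
float complex prod_split(const float complex *x)
{
    /* A complex float is laid out as two adjacent floats (real, imag). */
    const float *xf = (const float *)x;
    float p_real = 1.0f, p_imag = 0.0f;

    for (int i = 0; i < N; i++) {
        float xr = xf[2 * i];                  /* _7  = REALPART_EXPR <*_3> */
        float xi = xf[2 * i + 1];              /* _12 = IMAGPART_EXPR <*_3> */
        float re = xr * p_real - xi * p_imag;  /* _21 = _17 - _18 */
        float im = xr * p_imag + xi * p_real;  /* _22 = _19 + _20 */
        p_real = re;
        p_imag = im;
    }
    return p_real + p_imag * I;                /* COMPLEX_EXPR after the loop */
}
```

In prod_split each accumulator reads the other one's previous value, so
neither p_real nor p_imag is a simple reduction on its own; the pair has to
be recognised as one reduction, as described above.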


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
