This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/79336] Poor vectorisation of additive reduction of complex array, final SLP reduction step inefficient
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 02 Feb 2017 12:05:57 +0000
- Auto-submitted: auto-generated
- References: <bug-79336-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79336
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-02-02
          Component|c                           |tree-optimization
             Blocks|                            |53947
            Summary|Poor vectorisation of       |Poor vectorisation of
                   |additive reduction of       |additive reduction of
                   |complex array               |complex array, final SLP
                   |                            |reduction step inefficient
     Ever confirmed|0                           |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. The reduction loop itself is fine; it is the final reduction step
involving the SLP reduction result (we reduce two scalars) that is handled
less than optimally:

  <bb 3> [96.97%]:
  # i_16 = PHI <i_11(4), 0(2)>
  # p$real_13 = PHI <_17(4), 1.0e+0(2)>
  # p$imag_14 = PHI <_18(4), 0.0(2)>
  # ivtmp_34 = PHI <ivtmp_33(4), 32(2)>
  _1 = (long unsigned int) i_16;
  _2 = _1 * 8;
  _3 = x_9(D) + _2;
  _7 = REALPART_EXPR <*_3>;
  _12 = IMAGPART_EXPR <*_3>;
  _17 = _7 + p$real_13;
  _18 = _12 + p$imag_14;
  i_11 = i_16 + 1;
  ivtmp_33 = ivtmp_34 - 1;
  if (ivtmp_33 != 0)
    goto <bb 4>; [96.88%]
  else
    goto <bb 5>; [3.12%]

  <bb 4> [93.94%]:
  goto <bb 3>; [100.00%]

  <bb 5> [3.03%]:
  # _36 = PHI <_17(3)>
  # _35 = PHI <_18(3)>
  p_10 = COMPLEX_EXPR <_36, _35>;

Here we simply try to first produce _36 and _35 from the vectorized reduction
result and then build the complex function result:

  <bb 5> [3.03%]:
  # _36 = PHI <_17(3)>
  # _35 = PHI <_18(3)>
  # vect__17.8_22 = PHI <vect__17.8_24(3)>
  stmp__17.9_21 = BIT_FIELD_REF <vect__17.8_22, 32, 0>;
  stmp__17.9_20 = BIT_FIELD_REF <vect__17.8_22, 32, 32>;
  stmp__17.9_19 = BIT_FIELD_REF <vect__17.8_22, 32, 64>;
  stmp__17.9_15 = BIT_FIELD_REF <vect__17.8_22, 32, 96>;
  stmp__17.9_6 = BIT_FIELD_REF <vect__17.8_22, 32, 128>;
  stmp__17.9_5 = BIT_FIELD_REF <vect__17.8_22, 32, 160>;
  stmp__17.9_4 = BIT_FIELD_REF <vect__17.8_22, 32, 192>;
  stmp__17.9_29 = BIT_FIELD_REF <vect__17.8_22, 32, 224>;
  stmp__17.9_28 = stmp__17.9_21 + stmp__17.9_19;
  stmp__17.9_27 = stmp__17.9_20 + stmp__17.9_15;
  stmp__17.9_26 = stmp__17.9_28 + stmp__17.9_6;
  stmp__17.9_37 = stmp__17.9_27 + stmp__17.9_5;
  stmp__17.9_38 = stmp__17.9_26 + stmp__17.9_4;
  stmp__17.9_39 = stmp__17.9_37 + stmp__17.9_29;
  p_10 = COMPLEX_EXPR <stmp__17.9_38, stmp__17.9_39>;
  return p_10;

This doesn't take advantage of the fact that we can do this kind of
final SLP reduction more efficiently (I didn't try to decipher exactly
what ICC does here). It may require ABI details or knowing that
we can type-pun a vector to a complex... (but only for complex float;
for complex double the ABI doesn't work out this way!)
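
To make the difference concrete, here is a rough sketch in plain C of the two
epilogue shapes. It is not what the vectorizer emits, only a model of it: the
eight-lane array stands in for the 256-bit accumulator vect__17.8_22 with
interleaved real/imaginary lanes, and all names (LANES, epilogue_scalar,
epilogue_halving, pun_to_complex, self_check) are made up for illustration.
epilogue_scalar mirrors the dump above (eight extracts, six scalar adds);
epilogue_halving adds the upper half onto the lower half twice, which would
correspond to two vector adds, and leaves (real, imag) in the two low lanes.
The halving variant reassociates the floating-point adds, which I assume is
acceptable here since vectorizing the reduction already requires that.

```c
#include <assert.h>
#include <string.h>

#define LANES 8 /* hypothetical: one 256-bit float vector, re0 im0 re1 im1 ... */

/* Epilogue shaped like the dump: extract every lane, then accumulate the
   even lanes (real parts) and odd lanes (imaginary parts) with scalar adds. */
static void epilogue_scalar(const float v[LANES], float out[2])
{
    float re = v[0], im = v[1];
    for (int i = 2; i < LANES; i += 2) {
        re += v[i];
        im += v[i + 1];
    }
    out[0] = re;
    out[1] = im;
}

/* Halving epilogue: fold the upper half onto the lower half until only two
   lanes remain. Each pass of the outer loop models one vector add on a
   vector half as wide as the previous one. */
static void epilogue_halving(const float v[LANES], float out[2])
{
    float t[LANES];
    memcpy(t, v, sizeof t);
    for (int width = LANES / 2; width >= 2; width /= 2)
        for (int i = 0; i < width; i++)
            t[i] += t[i + width];
    out[0] = t[0];
    out[1] = t[1];
}

/* C99 gives a complex type the same representation as an array of two
   elements of the corresponding real type, so the two remaining lanes can
   be reinterpreted as a complex float without further arithmetic. */
static float _Complex pun_to_complex(const float lanes[2])
{
    float _Complex z;
    memcpy(&z, lanes, sizeof z);
    return z;
}

/* Both epilogues agree on exactly representable inputs. */
static void self_check(void)
{
    const float v[LANES] = {1, 10, 2, 20, 3, 30, 4, 40};
    float a[2], b[2], back[2];
    epilogue_scalar(v, a);
    epilogue_halving(v, b);
    assert(a[0] == 10.0f && a[1] == 100.0f);
    assert(b[0] == a[0] && b[1] == a[1]);
    float _Complex z = pun_to_complex(b);
    memcpy(back, &z, sizeof back);
    assert(back[0] == b[0] && back[1] == b[1]);
}
```

Whether the final pun is free depends on how the target ABI returns the value.
As far as I understand the x86-64 psABI, a complex float comes back packed in
the low 64 bits of a single SSE register, which matches the two-lane layout
above, while a complex double is split across two registers, which is
presumably why the same trick does not carry over.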
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations