[Bug tree-optimization/107247] New: SLP reduction results fail to reduce to a single accumulator
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Oct 13 12:37:56 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247
Bug ID: 107247
Summary: SLP reduction results fail to reduce to a single accumulator
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
float fl[128];
int x[128];

float
foo (int n1)
{
  float sum0, sum1, sum2, sum3;
  sum0 = sum1 = sum2 = sum3 = 0.0f;
  int n = (n1 / 4) * 4;
  for (int i = 0; i < n; i += 4)
    {
      sum0 += fabs (fl[i]);
      sum1 += fabs (fl[i + 1]);
      sum2 += fabs (fl[i + 2]);
      sum3 += fabs (fl[i + 3]);
      x[i] = 1;
    }
  return sum0 + sum1 + sum2 + sum3;
}
shows how we fail to reduce the SLP reduction accumulators to a single one
before extracting the elements:
<bb 3> [local count: 567644343]:
# sum0_37 = PHI <sum0_29(7), 0.0(9)>
# sum1_39 = PHI <sum1_30(7), 0.0(9)>
# sum2_41 = PHI <sum2_31(7), 0.0(9)>
# sum3_43 = PHI <sum3_32(7), 0.0(9)>
# i_45 = PHI <i_34(7), 0(9)>
# vectp_fl.8_89 = PHI <vectp_fl.8_90(7), &fl(9)>
# vect_sum3_43.15_102 = PHI <vect_sum3_32.16_106(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
# vect_sum3_43.15_103 = PHI <vect_sum3_32.16_107(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
# vect_sum3_43.15_104 = PHI <vect_sum3_32.16_108(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
# vect_sum3_43.15_105 = PHI <vect_sum3_32.16_109(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
...
vect__12.14_98 = ABS_EXPR <vect__11.10_91>;
vect__12.14_99 = ABS_EXPR <vect__11.11_93>;
vect__12.14_100 = ABS_EXPR <vect__11.12_95>;
vect__12.14_101 = ABS_EXPR <vect__11.13_97>;
vect_sum3_32.16_106 = vect__12.14_98 + vect_sum3_43.15_102;
vect_sum3_32.16_107 = vect__12.14_99 + vect_sum3_43.15_103;
vect_sum3_32.16_108 = vect__12.14_100 + vect_sum3_43.15_104;
vect_sum3_32.16_109 = vect__12.14_101 + vect_sum3_43.15_105;
...
<bb 11> [local count: 94607391]:
# sum0_48 = PHI <sum0_29(3)>
# sum1_36 = PHI <sum1_30(3)>
# sum2_35 = PHI <sum2_31(3)>
# sum3_24 = PHI <sum3_32(3)>
# vect_sum3_32.16_110 = PHI <vect_sum3_32.16_106(3)>
# vect_sum3_32.16_111 = PHI <vect_sum3_32.16_107(3)>
# vect_sum3_32.16_112 = PHI <vect_sum3_32.16_108(3)>
# vect_sum3_32.16_113 = PHI <vect_sum3_32.16_109(3)>
_114 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 0>;
_115 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 32>;
_116 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 64>;
_117 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 96>;
_118 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 0>;
_119 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 32>;
_120 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 64>;
_121 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 96>;
_122 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 0>;
_123 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 32>;
_124 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 64>;
_125 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 96>;
_126 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 0>;
_127 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 32>;
_128 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 64>;
_129 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 96>;
_130 = _114 + _118;
_131 = _115 + _119;
_132 = _116 + _120;
_133 = _117 + _121;
_134 = _130 + _122;
_135 = _131 + _123;
_136 = _132 + _124;
_137 = _133 + _125;
_138 = _134 + _126;
_139 = _135 + _127;
_140 = _136 + _128;
_141 = _137 + _129;
...
instead of doing vector adds and a single series of extracts.
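
For illustration only, the two epilogue shapes can be sketched in C with
GCC's generic vector extensions.  This is not vectorizer code or its
output; the names v4sf, epilogue_scalar and epilogue_vector are
hypothetical and only model the four V4SF accumulators from the dump
above.  Both variants add the accumulators lane by lane in the same
order, so each lane should get the same value either way; the second
one just needs 3 vector adds and 4 extracts instead of 16 extracts and
12 scalar adds.

```c
/* Hypothetical model of a V4SF accumulator (GCC vector extension).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Shape of the current epilogue: extract every lane of every
   accumulator, then combine the lanes with scalar adds.  */
static void
epilogue_scalar (const v4sf acc[4], float out[4])
{
  for (int lane = 0; lane < 4; lane++)
    out[lane] = ((acc[0][lane] + acc[1][lane]) + acc[2][lane])
		+ acc[3][lane];
}

/* Desired shape: reduce the accumulators to one vector with vector
   adds first, then do a single series of extracts.  */
static void
epilogue_vector (const v4sf acc[4], float out[4])
{
  v4sf t = ((acc[0] + acc[1]) + acc[2]) + acc[3];
  for (int lane = 0; lane < 4; lane++)
    out[lane] = t[lane];
}
```

In both cases out[0] through out[3] play the role of sum0 through sum3,
which the scalar code after the loop then adds up.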