This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 14 Dec 2017 10:13:38 +0000
- Auto-submitted: auto-generated
- References: <bug-83326-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We no longer unroll the inner loops in cunrolli because cunrolli will leave us
with exit checks.
We fail to compute the number of iterations of the inner loop(s) (before loop
header copying):
<bb 5> [local count: 21065692]:
L.5:
_3 = _1 + 1;
_53 = (integer(kind=8)) _3;
_4 = _1 + 2;
_54 = (integer(kind=8)) _4;
_55 = (integer(kind=8)) i1_25;
_5 = _55 * 81;
_56 = _5 + -91;
$3 = <basic_block 0x7ffff689a478 (5)>
(gdb) p debug_bb_n (7)
<bb 7> [local count: 63197075]:
_6 = S.0_27 * 9;
_57 = _6 + _56;
<bb 8> [local count: 189610187]:
# S.1_28 = PHI <_53(7), S.1_59(9)>
if (S.1_28 > _54)
goto <bb 10>; [33.33%]
else
goto <bb 9>; [66.67%]
$1 = <basic_block 0x7ffff689a5b0 (8)>
(gdb) p debug_bb_n (9)
<bb 9> [local count: 126413112]:
_7 = S.1_28 + _57;
_8 = test_array[_7];
_9 = _8 + -10;
test_array[_7] = _9;
S.1_59 = S.1_28 + 1;
goto <bb 8>; [100.00%]
This one is a bit difficult, but the other (though perhaps less interesting):
<bb 17> [local count: 119292717]:
L.14:
_14 = _1 + 1;
_69 = (integer(kind=8)) _14;
_15 = _1 + 2;
_70 = (integer(kind=8)) _15;
_71 = (integer(kind=8)) i2_26;
_16 = _71 * 81;
_72 = _16 + -91;
# S.4_31 = PHI <_69(19), S.4_75(21)>
if (S.4_31 > _70)
goto <bb 22>; [33.33%]
else
goto <bb 21>; [66.67%]
<bb 21> [local count: 715863674]:
_18 = S.4_31 + _73;
_19 = test_array[_18];
_20 = _19 + 10;
test_array[_18] = _20;
S.4_75 = S.4_31 + 1;
goto <bb 20>; [100.00%]
looks like it should be doable.
And indeed it is: we are just "confused" by the maybe_zero test. IMHO
we should allow a trip count that is either zero or a constant N by
performing the loop header copying alongside the unrolling (leaving the
first exit test in place).
Testing a patch to do that.
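For illustration only, here is a rough C sketch of that idea. It is not GCC
code and not the patch under test. The function names, the array length, and
the parameters first, last and offset (stand-ins for _53, _54 and _57 in the
dump above) are assumptions made up for this example. It also assumes that
the loop, once entered, runs exactly two iterations, as the bounds _1 + 1 and
_1 + 2 suggest.

```c
#include <stdint.h>
#include <string.h>

enum { ARRAY_LEN = 64 };   /* arbitrary size, chosen for the example */

/* Rolled form, modeled on bb 8 / bb 9: the exit test runs before
   every iteration.  */
static void update_rolled(int *a, int64_t first, int64_t last,
                          int64_t offset)
{
    for (int64_t s = first; s <= last; s++)
        a[s + offset] -= 10;
}

/* Hypothetical result of header copying plus complete unrolling.
   Assumption: the trip count is either zero (first > last) or exactly
   two (last == first + 1).  Only the first exit test survives.  */
static void update_peeled_unrolled(int *a, int64_t first, int64_t last,
                                   int64_t offset)
{
    if (first > last)              /* copied header test, left in place */
        return;
    a[first + offset] -= 10;       /* iteration 1, no exit test */
    a[first + 1 + offset] -= 10;   /* iteration 2, no exit test */
}

/* Returns 1 if both forms leave identically initialized arrays in the
   same state for the given bounds.  */
static int same_result(int64_t first, int64_t last, int64_t offset)
{
    int x[ARRAY_LEN], y[ARRAY_LEN];
    for (int i = 0; i < ARRAY_LEN; i++)
        x[i] = y[i] = i;
    update_rolled(x, first, last, offset);
    update_peeled_unrolled(y, first, last, offset);
    return memcmp(x, y, sizeof x) == 0;
}
```

Under the stated assumption the two forms agree both when the loop body runs
twice, e.g. same_result(3, 4, 10), and when it is skipped entirely, e.g.
same_result(5, 4, 10). The unrolled form is not valid for any other trip
count.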