This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/66642] New: transform_to_exit_first_loop_alt doesn't use result of low iteration count loop
- From: "vries at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 23 Jun 2015 15:47:51 +0000
- Subject: [Bug tree-optimization/66642] New: transform_to_exit_first_loop_alt doesn't use result of low iteration count loop
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66642
Bug ID: 66642
Summary: transform_to_exit_first_loop_alt doesn't use result of
low iteration count loop
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
Created attachment 35831
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35831&action=edit
patch to produce test case
Using the attached patch, we exercise the low iteration count loop generated
by the parloops pass.
The libgomp.c/parloops-exit-first-loop-alt-3.c testcase fails:
...
PASS: libgomp.c/parloops-exit-first-loop-alt-2.c (test for excess errors)
PASS: libgomp.c/parloops-exit-first-loop-alt-2.c execution test
PASS: libgomp.c/parloops-exit-first-loop-alt-3.c (test for excess errors)
FAIL: libgomp.c/parloops-exit-first-loop-alt-3.c execution test
PASS: libgomp.c/parloops-exit-first-loop-alt-4.c (test for excess errors)
PASS: libgomp.c/parloops-exit-first-loop-alt-4.c execution test
PASS: libgomp.c/parloops-exit-first-loop-alt.c (test for excess errors)
PASS: libgomp.c/parloops-exit-first-loop-alt.c execution test
...
The problem is the following.
Before transform_to_exit_first_loop, we have loop header bb4, loop latch bb6,
and loop exit bb5:
...
<bb 4>:
# sum_17 = PHI <1(11), sum_11(6)>
# ivtmp_24 = PHI <0(11), ivtmp_6(6)>
i_16 = (int) ivtmp_24;
_7 = (long unsigned int) i_16;
_8 = _7 * 4;
_9 = pretmp_23 + _8;
_10 = *_9;
sum_11 = _10 + sum_17;
i_12 = i_16 + 1;
i.1_3 = (unsigned int) i_12;
if (ivtmp_24 < _19)
goto <bb 6>;
else
goto <bb 5>;
<bb 5>:
# sum_20 = PHI <sum_11(4), sum_25(8)>
goto <bb 7>;
<bb 6>:
ivtmp_6 = ivtmp_24 + 1;
goto <bb 4>;
...
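For orientation, the dump above corresponds to a reduction loop of roughly the
following shape. This is a reconstruction from the GIMPLE, not the actual
testcase source; the names sum_array, a and n, and the exact types, are
assumptions.

```c
/* Hypothetical reconstruction of the reduced loop, read off the dump.  */
unsigned int
sum_array (const unsigned int *a, unsigned int n)
{
  unsigned int sum = 1;   /* the constant 1 in the sum PHI */

  /* i is a signed int that is converted to unsigned for the bound check,
     mirroring i.1_3 = (unsigned int) i_12 in the dump.  */
  for (int i = 0; (unsigned int) i < n; i++)
    sum += a[i];          /* _10 = *_9; sum_11 = _10 + sum_17 */

  return sum;
}
```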
After transform_to_exit_first_loop, we still have loop header bb4 and loop
latch bb6, but the loop exit is now bb14:
...
<bb 4>:
# sum_27 = PHI <1(11), sum_11(6)>
# ivtmp_28 = PHI <0(11), ivtmp_6(6)>
if (ivtmp_28 < _19)
goto <bb 13>;
else
goto <bb 14>;
<bb 13>:
# sum_17 = PHI <sum_27(4)>
# ivtmp_24 = PHI <ivtmp_28(4)>
i_16 = (int) ivtmp_24;
_7 = (long unsigned int) i_16;
_8 = _7 * 4;
_9 = pretmp_23 + _8;
_10 = *_9;
sum_11 = _10 + sum_17;
i_12 = i_16 + 1;
i.1_3 = (unsigned int) i_12;
goto <bb 6>;
<bb 14>:
# sum_29 = PHI <sum_27(4)>
ivtmp_30 = _19;
i_31 = (int) ivtmp_30;
_32 = (long unsigned int) i_31;
_33 = _32 * 4;
_34 = pretmp_23 + _33;
_35 = *_34;
sum_36 = _35 + sum_29;
i_37 = i_31 + 1;
i.1_38 = (unsigned int) i_37;
<bb 5>:
# sum_20 = PHI <sum_36(14), sum_25(8)>
goto <bb 7>;
<bb 6>:
ivtmp_6 = ivtmp_24 + 1;
goto <bb 4>;
...
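In C terms, the effect of transform_to_exit_first_loop on this loop can be
sketched as below. This is a hand-written model of the two dumps, not compiler
output; the function names are hypothetical, and nit stands for _19, the index
of the last iteration (presumably n - 1).

```c
/* Before: the body runs first and the exit test follows it (bb4 -> bb5/bb6).
   The body executes for iv = 0 .. nit inclusive.  */
unsigned int
sum_exit_last (const unsigned int *a, unsigned int nit)
{
  unsigned int sum = 1, iv = 0;
  for (;;)
    {
      sum += a[iv];          /* loop body in bb4 */
      if (!(iv < nit))       /* if (ivtmp_24 < _19) ... else goto <bb 5> */
        break;
      iv++;                  /* latch bb6 */
    }
  return sum;
}

/* After: the exit test comes first (bb4), the body moves to bb13, and the
   last iteration is peeled into the new exit block bb14.  */
unsigned int
sum_exit_first (const unsigned int *a, unsigned int nit)
{
  unsigned int sum = 1, iv = 0;
  while (iv < nit)           /* header bb4 */
    {
      sum += a[iv];          /* body bb13 */
      iv++;                  /* latch bb6 */
    }
  sum += a[nit];             /* peeled iteration in bb14, using ivtmp_30 = _19 */
  return sum;
}
```

Both forms execute the body nit + 1 times, so they return the same sum. Note
that bb14 has the loop header as its only predecessor.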
A bit later, separate_decls_in_region inserts a .paral_data_store-based load
in the new exit block, replacing the sum_29 PHI. This relies on the exit block
having a single predecessor (the loop header bb4):
...
<bb 14>:
.paral_data_load.11_42 = &.paral_data_store.10;
sum_29 = .paral_data_load.11_42->sum.7;
ivtmp_30 = _19;
i_31 = (int) ivtmp_30;
_32 = (long unsigned int) i_31;
_33 = _32 * 4;
_34 = pretmp_23 + _33;
_35 = *_34;
sum_36 = _35 + sum_29;
i_37 = i_31 + 1;
i.1_38 = (unsigned int) i_37;
...
However, with transform_to_exit_first_loop_alt we keep loop latch bb6 and loop
exit bb5, but we get a new loop header bb13:
...
<bb 11>:
goto <bb 13>;
<bb 4>:
# sum_17 = PHI <sum_27(13)>
# ivtmp_24 = PHI <ivtmp_28(13)>
i_16 = (int) ivtmp_24;
_7 = (long unsigned int) i_16;
_8 = _7 * 4;
_9 = pretmp_23 + _8;
_10 = *_9;
sum_11 = _10 + sum_17;
i_12 = i_16 + 1;
i.1_3 = (unsigned int) i_12;
goto <bb 6>;
<bb 13>:
# sum_27 = PHI <sum_11(6), 1(11)>
# ivtmp_28 = PHI <ivtmp_6(6), 0(11)>
if (ivtmp_28 < n_4(D))
goto <bb 4>;
else
goto <bb 5>;
<bb 5>:
# sum_20 = PHI <sum_27(13), sum_25(8)>
goto <bb 7>;
...
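Here the loop exit bb5 has two predecessors: besides the new loop header bb13,
it is also reached from bb8, the loop header of the low iteration count loop
(the sum_25(8) PHI argument).
So when separate_decls_in_region inserts a .paral_data_store-based load in the
exit block, the load replaces the sum_20 PHI and discards the value coming
from the low iteration count loop: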
...
<bb 5>:
.paral_data_load.12_32 = &.paral_data_store.11;
sum_20 = .paral_data_load.12_32->sum.7;
goto <bb 7>;
...
The fix is probably to make sure that we split the exit edge during
transform_to_exit_first_loop_alt, so that the load is inserted in a block
whose only predecessor is the loop header.
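
As a toy model of the difference, bb5 can be pictured as a function that
receives the value from either predecessor. This only illustrates the control
flow described above and is not a proposed patch; all names in it are
hypothetical.

```c
/* from_low_iter: nonzero if control arrived from bb8 (the low iteration
                  count loop), zero if it arrived from the parallelized loop.
   sum_25:        result computed by the low iteration count loop.
   stored_sum:    value read back through .paral_data_load.  */

/* Current behaviour: the load is placed in bb5 itself, so it executes on
   both incoming edges and the value from bb8 is lost.  */
unsigned int
exit_block_shared (int from_low_iter, unsigned int sum_25,
                   unsigned int stored_sum)
{
  (void) from_low_iter;
  (void) sum_25;
  return stored_sum;         /* sum_20 = .paral_data_load->sum */
}

/* With the exit edge split: the load lives in a new block that is reached
   only from the parallelized loop, and bb5 keeps its PHI.  */
unsigned int
exit_block_split (int from_low_iter, unsigned int sum_25,
                  unsigned int stored_sum)
{
  if (from_low_iter)
    return sum_25;           /* PHI argument from bb8 survives */

  unsigned int sum_from_loop = stored_sum;   /* load in the new block */
  return sum_from_loop;      /* PHI argument from the new block */
}
```

With from_low_iter set, exit_block_shared returns stored_sum while
exit_block_split returns sum_25, which is the behaviour the testcase expects.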