This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/42720] New: Empty loop generated at unswitch-loops with -O2 -fprofile-use
- From: "jingyu at google dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 13 Jan 2010 02:47:25 -0000
- Subject: [Bug tree-optimization/42720] New: Empty loop generated at unswitch-loops with -O2 -fprofile-use
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
The bug is triggered with -O2 -fprofile-use.
test case, loop.cpp:
int fun_b(int hbs[], int num, void *obj) {
  int i;
  int s = 0;
  for (i = 0; i < num; i++) {
    if (obj != 0) {
      if ((int)obj - hbs[i] > 0) {
        s += hbs[i];
      }
    }
  }
  return s;
}
int main () {
  int i;
  int s = 0;
  int hbs[100];
  for (i = 0; i < 100; ++i) {
    hbs[i] = i * 2000 + 100000;
  }
  for (i = 0; i < 20; ++i) {
    s += fun_b (hbs, 100, &hbs[i]);
  }
  return s;
}
Profile the program. Apparently the loop inside fun_b() is hot.
$ arm-eabi-g++ loop.cpp -O2 -fprofile-use --save-temps -c -o loop.o
We see an empty loop (.L5) in function fun_b, taken when obj == 0.
_Z5fun_bPiiPv:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        cmp     r1, #0
        stmfd   sp!, {r4, r5}
        mov     r3, r0
        ble     .L57
        cmp     r2, #0      <--- "if (obj != 0)" is moved out of loop
        beq     .L5
        ....
.L3:
        ldmfd   sp!, {r4, r5}
        bx      lr
.L5:                        ;; if (obj == 0), empty loop
        add     r2, r2, #1  ;;
        cmp     r2, r1      ;;
        bne     .L5         ;;
.L57:
        mov     r0, #0
        b       .L3
The empty loop (.L5) should have been eliminated. I have tested -O2 without
-fprofile-use, where the empty loop is gone.
I find that the root cause of this inefficiency with -O2 FDO is that, during
unswitch-loops, the simplification of loop conditions is skipped when FDO is on.
Consider two versions:
Version A: "-O2 -funswitch-loops", which does the right thing.
Version B: "-O2 -fprofile-use", which generates an empty loop that should
be eliminated.
Before the unswitch-loops pass, the loop (inner-most, hot) is
loop {
  if (obj != 0) {
    ...
  }
}
Both version A and version B perform one pass of unswitch-loop on this loop
body.
In function tree_unswitch_single_loop(),
after "nloop = tree_unswitch_loop (loop, bbs[i], cond)", the loop becomes
if (obj != 0) {
  loop {           <---- original copy of the loop
    if (obj != 0) {
      ...
    }
  }
} else {
  loop {           <----- "nloop": a new copy of the loop
    if (obj != 0) {
      ...
    }
  }
}
Then, right before the end of tree_unswitch_single_loop(), the function
recursively calls itself on the modified loops.
tree_unswitch_single_loop (nloop, num + 1);
From here, Version A and Version B start to behave differently.
For Version A ("-O2 -funswitch-loops"), gcc continues looking for
unswitch-loop opportunities in the new loop "nloop".
It finds that the condition of the new loop can be simplified. Since obj is 0
when it comes to the new loop, gcc
replaces obj by 0. Thus the loop becomes
if (obj != 0) {
  loop {           <---- original copy of the loop
    if (obj != 0) {
      ...
    }
  }
} else {
  loop {           <----- "nloop": a new copy of the loop
    if (0 != 0) {  <--- obj is replaced by "0"
      ...
    }
  }
}
Therefore, in the TODO pass cleanup-cfg, the "nloop" is entirely removed.
However, for Version B ("-O2 -fprofile-use"), gcc finds that the "nloop" is a
cold loop, so it returns immediately, without checking if the condition can be
simplified. Thus nloop is not cleaned up by the following cleanup-cfg pass and
results in an empty loop.
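
At the source level, the difference between the two versions can be pictured
roughly as below. This is only an illustrative sketch, not compiler output:
the names fun_b_version_a and fun_b_version_b are hypothetical, and the cast
goes through intptr_t (an assumption made so the sketch also builds on 64-bit
hosts; the test case above uses a plain (int) cast for the ARM target).

```cpp
#include <cstdint>

// The loop from the test case. The intptr_t step in the cast is an
// assumption added for 64-bit hosts; the report uses a plain (int) cast.
int fun_b(int hbs[], int num, void *obj) {
  int s = 0;
  for (int i = 0; i < num; i++) {
    if (obj != 0) {
      if ((int)(intptr_t)obj - hbs[i] > 0) {
        s += hbs[i];
      }
    }
  }
  return s;
}

// Hypothetical picture of Version A: the test on obj is hoisted, and the
// copy of the loop for obj == 0 has been removed entirely.
int fun_b_version_a(int hbs[], int num, void *obj) {
  int s = 0;
  if (obj != 0) {
    for (int i = 0; i < num; i++) {
      if ((int)(intptr_t)obj - hbs[i] > 0) {
        s += hbs[i];
      }
    }
  }
  return s;
}

// Hypothetical picture of Version B: the test on obj is hoisted, but the
// cold copy of the loop survives and only counts from 0 to num
// (compare the .L5 loop in the assembly).
int fun_b_version_b(int hbs[], int num, void *obj) {
  int s = 0;
  if (obj != 0) {
    for (int i = 0; i < num; i++) {
      if ((int)(intptr_t)obj - hbs[i] > 0) {
        s += hbs[i];
      }
    }
  } else {
    for (int i = 0; i < num; i++) {
      // nothing left to do: the guarded body is dead when obj == 0
    }
  }
  return s;
}
```

All three functions return the same value for any input; Version B merely
spends num iterations doing nothing when obj is 0.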
The problematic code is in unswitch_single_loop() in loop-unswitch.c.
static void
unswitch_single_loop(struct loop *loop, ...)
{ ...
  /* Do not unswitch in cold areas. */
  if (optimize_loop_for_size_p (loop))
    {
      dump
      return;
    }
  ...
  do
    { ...
      /* Check whether the result can be predicted. */
      for (acond = cond_checked; acond; acond = XEXP (acond, 1))
        simplify_using_condition (XEXP (acond, 0), &cond, NULL);
      ...
    } while (repeat);
  ...
  /* Unswitch the loop on this condition. */
  nloop = unswitch_loop (loop, bbs[i], cond, cinsn);
  ...
  /* Invoke itself on modified loops. */
  unswitch_single_loop (nloop, rconds, num + 1);
  unswitch_single_loop (loop, conds, num + 1);
  ...
}
To fix the empty loop problem, my thought is to propagate the conditions
immediately after nloop is inserted.
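
The effect of the ordering can be shown with a toy model. This is not GCC
code and not a patch: ToyLoop, Guard, visit_cold_check_first,
visit_propagate_first and removable are all hypothetical names invented for
this sketch. It only assumes that a later cleanup can delete a loop body once
its guard is known to be false.

```cpp
// Toy model only: none of these types or functions exist in GCC.
enum Guard { GUARD_UNKNOWN, GUARD_KNOWN_FALSE, GUARD_KNOWN_TRUE };

struct ToyLoop {
  bool cold;    // stands in for what optimize_loop_for_size_p () reports
  Guard guard;  // what is known about the loop's "obj != 0" test
};

// Ordering described above: the cold check comes first, so a cold copy
// returns before the known condition is applied to it.
void visit_cold_check_first(ToyLoop &loop, Guard known) {
  if (loop.cold)
    return;            // cold copy: nothing is propagated
  loop.guard = known;
  // ... further unswitching would happen here ...
}

// Proposed ordering: apply the known condition first, then skip any
// further unswitching if the loop is cold.
void visit_propagate_first(ToyLoop &loop, Guard known) {
  loop.guard = known;  // recorded even for a cold copy
  if (loop.cold)
    return;            // still no further unswitching in cold areas
  // ... further unswitching would happen here ...
}

// Assumed cleanup rule: the loop body is dead only if the guard is known
// to be false.
bool removable(const ToyLoop &loop) {
  return loop.guard == GUARD_KNOWN_FALSE;
}
```

With a cold ToyLoop and a guard known to be false, visit_cold_check_first
leaves the loop non-removable, while visit_propagate_first makes it removable.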
Any suggestions?
Thanks,
Jing
--
Summary: Empty loop generated at unswitch-loops with -O2 -fprofile-use
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jingyu at google dot com
GCC build triplet: X86_64-linux-gnu
GCC host triplet: X86_64-linux-gnu
GCC target triplet: arm-unknown-eabi
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42720