[Bug tree-optimization/80155] [7/8 regression] Performance regression with code hoisting enabled
prathamesh3492 at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Oct 11 19:05:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
--- Comment #33 from prathamesh3492 at gcc dot gnu.org ---
Created attachment 42341
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42341&action=edit
Test-case to reproduce regression with cortex-m7
I have attached an artificial test-case that is fairly representative of the
regression we are seeing in a benchmark. The test-case mimics a deterministic
finite automaton. With code-hoisting there is an additional spill of r5 near
the beginning of the function.
Looking at the loop from the attached test-case:
  for (; *a && b != 'z'; a++)
    {
      next = *a;
      if (next == ',')
        {
          a++;
          break;
        }
      switch (b) { ... }
    }
The for loop has the same computation, a++, in two sibling basic blocks
(the early-exit block and the loop increment), and it gets hoisted.
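To make the transformation concrete, here is a minimal sketch of the loop shape and a hand-hoisted equivalent. This is not the attached test-case: the function names, the `step` transition function, and the `const char **s` signature are assumptions made for illustration, and the `tab` argument is left out. The second function only mimics at source level what the hoisted `a + 1` looks like; it is not compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transition function standing in for switch (b) { ... }. */
static int step(int b, int next)
{
    switch (b) {
    case 'a': return next == 'x' ? 'b' : 'a';
    case 'b': return next == 'y' ? 'z' : 'a';
    default:  return b;
    }
}

/* Source shape: a++ appears in the early-exit block and in the loop
   increment, i.e. in two sibling blocks after the comparison. */
int scan_original(const char **s)
{
    assert(s != NULL && *s != NULL);
    const char *a = *s;
    int b = 'a';
    for (; *a && b != 'z'; a++) {
        int next = *a;
        if (next == ',') {
            a++;
            break;
        }
        b = step(b, next);
    }
    *s = a;
    return b;
}

/* Hand-hoisted equivalent: a + 1 is computed once before the branch,
   so both a and a_next are live across the rest of the loop body. */
int scan_hoisted(const char **s)
{
    assert(s != NULL && *s != NULL);
    const char *a = *s;
    int b = 'a';
    while (*a && b != 'z') {
        int next = *a;
        const char *a_next = a + 1;  /* plays the role of the hoisted a + 1 */
        if (next == ',') {
            a = a_next;
            break;
        }
        b = step(b, next);
        a = a_next;
    }
    *s = a;
    return b;
}
```

Both functions compute the same result; the difference is only that the hoisted form keeps a second pointer value alive alongside `a`, which is where the extra register comes from.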
From PRE dump with code-hoisting:
  <bb 26> [23.80%] [count: INV]:
  # _25 = PHI <_151(25), _23(2)>
  # b_50 = PHI <b_152(25), 97(2)>
  # a_55 = PHI <a_153(25), a_28(2)>
  next_29 = (int) _25;
  _44 = a_55 + 1;
  if (next_29 == 44)
    goto <bb 27>; [5.00%] [count: INV]
  else
    goto <bb 12>; [95.00%] [count: INV]
(a + 1) seems to get hoisted into bb 26:
  _44 = a_55 + 1
just before
  if (next_29 == 44)
which corresponds to the if (next == ',') condition.
The issue, I think, is that there is a use of 'a' near the end of the function:
  *s = a;
which possibly results in register pressure, forcing the compiler to spill r5.
Commenting out the assignment removes the spill.
Looking at register allocation with code-hoisting, it seems r2 is used
to hold the hoisted value (a + 1):
r0 = s
r1 = tab
r3 = a
r4 = b
r5 = *a
r2 = r3 + 1 (holding the hoisted value)
And without code-hoisting, it seems only r3 is assigned to 'a'.
r0 = s
r1 = tab
r2 = b
r3 = a
r4 = *a
This is evident from asm differences for the early-exit code-path:
  if (next == ',')
    {
      a++;
      break;
    }
<breaks to>:
  *s = a;
  return b;
Without code-hoisting:
.L2:
        cmp     r4, #44
        beq     .L4
.L4:
        adds    r3, r3, #1
        ldr     r4, [sp], #4
        str     r3, [r0]
        mov     r0, r2
        bx      lr
With code-hoisting:
.L2:
        cmp     r5, #44
        add     r2, r3, #1
        beq     .L3
.L3:
        str     r2, [r0]
        mov     r0, r4
        pop     {r4, r5}
        bx      lr
Without code-hoisting the compiler reuses r3 to store a + 1, whereas with
code-hoisting it uses the extra register r2 to hold the value of the hoisted
expression a + 1.
Would it be a good idea to somehow "limit" the distance (in terms of number of
basic blocks, maybe?) between the definition of a hoisted variable and its
furthest use during PRE? If that distance exceeds a certain threshold, then PRE
should choose not to hoist that expression. The threshold could be a param that
can be set by backends.
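As a rough illustration of the proposed check, here is a toy model in C. It is only a sketch under stated assumptions, not GCC code: `furthest_use_distance`, `should_hoist`, and `max_distance` are hypothetical names, and distance is approximated by the difference between basic-block indices, whereas a real implementation would have to measure distance on the CFG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each basic block is identified by an index, and the
   distance between two blocks is the absolute difference of their indices.
   This is an assumption for illustration, not how GCC numbers or orders
   its blocks. */
int furthest_use_distance(int def_block, const int *use_blocks, size_t n_uses)
{
    assert(n_uses == 0 || use_blocks != NULL);
    int furthest = 0;
    for (size_t i = 0; i < n_uses; i++) {
        int d = use_blocks[i] - def_block;
        if (d < 0)
            d = -d;
        if (d > furthest)
            furthest = d;
    }
    return furthest;
}

/* Hoist only if the furthest use stays within the threshold.
   max_distance stands in for the proposed backend-settable param. */
bool should_hoist(int def_block, const int *use_blocks, size_t n_uses,
                  int max_distance)
{
    return furthest_use_distance(def_block, use_blocks, n_uses) <= max_distance;
}
```

With a definition in block 26 and uses in blocks 27 and 40, for example, this model reports a furthest distance of 14, so a threshold of 8 would reject the hoist and a threshold of 16 would allow it.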
Does this analysis look reasonable?
Thanks,
Prathamesh