[Bug target/70341] [7/8/9 Regression] cost model for addresses is incorrect, slsr is using reg + reg + CST for arm
jakub at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Feb 21 12:51:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70341
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
With an additional default: __builtin_unreachable (); this gets optimized
somewhat, into:
	add	r0, r0, r1
	cmp	r3, #3
	ldrls	pc, [pc, r3, asl #2]
	b	.L3
.L4:
	.word	.L7
	.word	.L6
	.word	.L5
	.word	.L3
.L5:
	ldr	r0, [r0, #8]
	b	handle_case_3
.L6:
	ldr	r0, [r0, #8]
	b	handle_case_2
.L7:
	ldr	r0, [r0, #8]
	b	handle_case_1
.L3:
	ldr	r0, [r0, #8]
	b	handle_case_4
so add r0, r0, r1 is hoisted by the jump2 pass, but strangely the same doesn't
happen for the ldr r0, [r0, #8] instruction, which is also identical in all
four arms.
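
For reference, a switch of roughly the following shape would lead to code like
the above. This is only a hypothetical reconstruction, not the testcase from
this PR: the name and signature of dispatch, the argument order (chosen so the
selector ends up in r3) and the int-returning handle_case_N stubs are all
assumptions made so the sketch is self-contained.

```c
/* Hypothetical reconstruction for illustration; not the testcase from this PR.
   The stubs return a value only so the sketch can be checked when run.  */
#define NOINLINE __attribute__ ((noinline))

NOINLINE int handle_case_1 (int x) { return x * 10 + 1; }
NOINLINE int handle_case_2 (int x) { return x * 10 + 2; }
NOINLINE int handle_case_3 (int x) { return x * 10 + 3; }
NOINLINE int handle_case_4 (int x) { return x * 10 + 4; }

/* Assumed argument order: on arm the first four arguments arrive in r0-r3,
   so sel would be in r3; unused only occupies r2.  */
int
dispatch (char *base, int offset, int unused, unsigned int sel)
{
  (void) unused;
  switch (sel)
    {
    /* Every arm loads from the same address, base + offset + 8.  */
    case 0: return handle_case_1 (*(int *) (base + offset + 8));
    case 1: return handle_case_2 (*(int *) (base + offset + 8));
    case 2: return handle_case_3 (*(int *) (base + offset + 8));
    case 3: return handle_case_4 (*(int *) (base + offset + 8));
    /* Tells the compiler that sel is always in 0..3.  */
    default: __builtin_unreachable ();
    }
}
```

Both the address computation and the load are common to all four arms here, so
in principle both could be moved in front of the jump table.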
On x86_64-linux with -O2, we also get:
	movl	8(%rsi), %edi
	jmp	handle_case_2
...
	movl	8(%rsi), %edi
	jmp	handle_case_4
etc. without the default:, and with it the jump2 pass hoists those
movl 8(%rsi), %edi instructions before the switch.
I guess the reason why jump2 doesn't hoist anything without default:
__builtin_unreachable (); is that the instruction(s) are not common to all the
paths, so hoisting them would be speculative execution, which would still be a
win for -Os.
Though, on x86_64 with -Os and 5 such cases instead of 4 (for 4 there is no
switch, just a series of conditional branches), the RA actually does hoist it.
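
To make the speculation point concrete, here is a small source-level sketch.
It is an illustration under assumptions, not what any GCC pass emits: the
functions select_plain and select_hoisted and the int-returning stubs are
hypothetical names introduced here. In the first function the out-of-range
path never touches memory; in the second the load has been moved in front of
the switch by hand, so it runs on that path too.

```c
/* Hypothetical stubs; they return a value only so the sketch can be run.  */
static int handle_case_1 (int x) { return x * 10 + 1; }
static int handle_case_2 (int x) { return x * 10 + 2; }
static int handle_case_3 (int x) { return x * 10 + 3; }
static int handle_case_4 (int x) { return x * 10 + 4; }

/* No default: __builtin_unreachable ().  For sel > 3 the function returns
   without loading p[2], so the load is not common to all paths.  */
int
select_plain (const int *p, unsigned int sel)
{
  switch (sel)
    {
    case 0: return handle_case_1 (p[2]);
    case 1: return handle_case_2 (p[2]);
    case 2: return handle_case_3 (p[2]);
    case 3: return handle_case_4 (p[2]);
    }
  return 0;
}

/* Load hoisted by hand.  It is now executed speculatively when sel > 3,
   which is only harmless if p[2] is readable on that path as well.  */
int
select_hoisted (const int *p, unsigned int sel)
{
  int v = p[2];
  switch (sel)
    {
    case 0: return handle_case_1 (v);
    case 1: return handle_case_2 (v);
    case 2: return handle_case_3 (v);
    case 3: return handle_case_4 (v);
    }
  return 0;
}
```

The hoisted form has one load instead of four, which is where the size win
would come from, at the cost of an extra load on the path that did not need it.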