[Bug target/70341] [7/8/9 Regression] cost model for addresses is incorrect, slsr is using reg + reg + CST for arm

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Feb 21 12:51:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70341

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
With additional default: __builtin_unreachable (); this gets somewhat optimized 
into:
        add     r0, r0, r1
        cmp     r3, #3
        ldrls   pc, [pc, r3, asl #2]
        b       .L3
.L4:
        .word   .L7
        .word   .L6
        .word   .L5
        .word   .L3
.L5:
        ldr     r0, [r0, #8]
        b       handle_case_3
.L6:
        ldr     r0, [r0, #8]
        b       handle_case_2
.L7:
        ldr     r0, [r0, #8]
        b       handle_case_1
.L3:
        ldr     r0, [r0, #8]
        b       handle_case_4
so add r0, r0, r1 is hoisted by the jump2 pass, but strangely the same doesn't
happen for the ldr r0, [r0, #8] instruction.
On x86_64-linux with -O2, we also get:
        movl    8(%rsi), %edi
        jmp     handle_case_2
...
        movl    8(%rsi), %edi
        jmp     handle_case_4
etc. without the default: __builtin_unreachable ();, while with it the jump2
pass hoists those movl 8(%rsi), %edi instructions before the switch.
I guess the reason why jump2 doesn't hoist anything without default:
__builtin_unreachable (); is that the instruction(s) are then not common to all
the paths, so hoisting them would be speculative execution, though that would
still be a win for -Os.
On x86_64 with -Os and 5 such cases instead of 4 (with 4 there is no switch,
just a series of conditional branches), however, the RA actually does hoist it.
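
For illustration only, a minimal sketch of the kind of source shape discussed
above. The PR's actual testcase is not quoted in this comment, so the function
name dispatch, its parameters and the stub bodies of handle_case_1 ..
handle_case_4 are assumptions (the real handlers are presumably external
functions). Every case performs the same load before the call, and default:
__builtin_unreachable (); tells the compiler that no other selector value
occurs, so that load is common to all paths:

```c
#include <stddef.h>

/* Hypothetical stubs: they only record which handler ran and with what
   value, so the sketch can be checked without external definitions.  */
static int last_case;
static int last_value;

static void handle_case_1 (int v) { last_case = 1; last_value = v; }
static void handle_case_2 (int v) { last_case = 2; last_value = v; }
static void handle_case_3 (int v) { last_case = 3; last_value = v; }
static void handle_case_4 (int v) { last_case = 4; last_value = v; }

/* Hypothetical dispatcher: the address computation (base + index) and the
   load at a constant offset (p[2]) are identical in every case.  */
void
dispatch (const int *base, size_t index, unsigned int sel)
{
  const int *p = base + index;

  switch (sel)
    {
    case 0: handle_case_1 (p[2]); break;
    case 1: handle_case_2 (p[2]); break;
    case 2: handle_case_3 (p[2]); break;
    case 3: handle_case_4 (p[2]); break;
    /* Without this default, a selector above 3 would skip the load,
       so hoisting it before the switch would be speculative.  */
    default: __builtin_unreachable ();
    }
}
```

Calling dispatch with a selector outside 0..3 is undefined behavior in this
variant; dropping the default: line gives the other variant compared above.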

