[Bug target/89822] self mov on x86_64 and not optimized-out sub on ARM/ARM64 in a jump table switch

Tue Aug 3 01:32:31 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89822

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I don't see anything wrong with what is currently done.

The aarch64 cost for a jump table is very high, which causes the jump table not
to be generated.  Indirect jumps on some/most aarch64 cores are not very
predictable, so GCC tries to avoid them.

Note that with clang, the x86_64 code has:
addl    $-1, %edi
which also acts as a zero extend (a 32-bit operation clears the upper half of
the register).
GCC does not subtract one because it is trying to avoid an instruction.
If you change the argument type to long, GCC will not produce the zero_extend
and will produce better code than clang (the table is one element bigger, but I
doubt that matters).
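
For illustration, a reduced test of the kind discussed here might look like the
sketch below.  The original test case is not quoted in this comment, so the
function names, case bodies and number of cases are all assumptions.  The int
version needs its index zero-extended to 64 bits before it can be used in the
table address; the long version does not.  Whether a jump table is emitted at
all depends on the target, GCC version and flags, so compare the output of
-O2 -S yourself.

```c
/* Hypothetical reduction, not the reporter's test case. */

/* int index: on x86_64 the value must be zero-extended to 64 bits
   before it can be used as a jump table index. */
int dispatch_int(int c, int x)
{
    switch (c) {
    case 1: return x + 1;
    case 2: return x * 3;
    case 3: return x - 7;
    case 4: return x ^ 0x55;
    case 5: return -x;
    default: return 0;
    }
}

/* long index: already 64 bits wide, so no zero extend is needed. */
long dispatch_long(long c, long x)
{
    switch (c) {
    case 1: return x + 1;
    case 2: return x * 3;
    case 3: return x - 7;
    case 4: return x ^ 0x55;
    case 5: return -x;
    default: return 0;
    }
}
```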

If you add 100 to each of the case statements (and change the type to long),
gcc still produces better code than clang:
        jmp     *.L4-808(,%rdi,8)
vs
        addq    $-101, %rdi
        jmpq    *.LJTI0_0(,%rdi,8)
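
The displacement in the GCC form is just the case bias folded into the address:
101 * 8 = 808 bytes for 8-byte table entries, so no separate subtract is
needed.  A rough sketch of the same arithmetic in C follows, with a
hypothetical data table standing in for the jump table at .L4; the names and
table contents are made up for the example.

```c
#include <stdint.h>

/* Hypothetical 4-entry table of 8-byte entries, standing in for .L4. */
static const int64_t table[4] = {10, 20, 30, 40};

/* clang-style: subtract the bias (101) first, then index the table. */
int64_t lookup_sub_then_index(int64_t i)
{
    int64_t idx = i - 101;
    return table[idx];
}

/* GCC-style: fold the bias into the displacement, i.e. compute
   base - 101*8 + i*8.  Done on uintptr_t to avoid forming an
   out-of-bounds pointer in C. */
int64_t lookup_folded(int64_t i)
{
    uintptr_t addr = (uintptr_t)table
                     - 101 * sizeof table[0]
                     + (uintptr_t)i * sizeof table[0];
    return *(const int64_t *)addr;
}
```

Both functions read the same entry for any i from 101 to 104; the second just
moves the subtraction into a constant that the assembler can resolve.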