[Bug target/89822] self mov on x86_64 and not optimized-out sub on ARM/ARM64 in a jump table switch
pinskia at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Aug 3 01:32:31 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89822
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I don't see anything wrong with what is currently done.
The aarch64 cost for a jump table is very high, which causes the jump table not
to be generated. Indirect jumps on some/most aarch64 cores are not very
predictable, so GCC tries to avoid them.
Note that with clang, the x86_64 code has:
addl $-1, %edi
which also acts as a zero extend (a 32-bit operation clears the upper 32 bits
of the destination register).
GCC does not subtract one because it is trying to avoid an instruction.
If you change the argument type to long, gcc will not produce the zero_extend
and will produce better code than clang (the table is one element bigger, but I
doubt that matters).
If you add 100 to each of the case statements (and change the type to long),
gcc still produces better code than clang:
jmp *.L4-808(,%rdi,8)
vs
addq $-101, %rdi
jmpq *.LJTI0_0(,%rdi,8)
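
For reference, a minimal sketch of the kind of function being discussed. The
original reproducer is not quoted in this comment, so the function name, the
number of cases and the case bodies below are assumptions; only the long
argument and the case labels starting at 101 follow from the text above.
The displacement in the GCC output is consistent with that lowest label:
808 = 101 * 8, so .L4-808(,%rdi,8) lands on the first table entry when
%rdi is 101, and no separate subtract is needed.

```c
/* Hypothetical test case (not the reporter's code): a dense switch on a
   long argument whose case labels start at 101 instead of 1. */
long dispatch(long x)
{
    switch (x) {
    case 101: return x * 3;     /* distinct bodies, so the cases cannot */
    case 102: return x - 7;     /* be trivially merged into one         */
    case 103: return x ^ 0x55;
    case 104: return x / 5;
    case 105: return -x;
    default:  return 0;         /* everything outside 101..105 */
    }
}
```

Compiling something like this with gcc -O2 -S and with clang -O2 -S should
allow the two dispatch sequences to be compared, although whether a jump table
is emitted at all depends on the compiler version, target and options.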