Bug 89822 - self mov on x86_64 and not optimized-out sub on ARM/ARM64 in a jump table switch
Summary: self mov on x86_64 and not optimized-out sub on ARM/ARM64 in a jump table switch
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 9.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2019-03-26 01:24 UTC by Nikita Kniazev
Modified: 2021-08-03 01:34 UTC
CC List: 0 users

See Also:
Host: x86_64
Target: x86_64-*-* i?86-*-* arm aarch64
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
A reproducer (206 bytes, text/plain)
2019-03-26 01:24 UTC, Nikita Kniazev

Description Nikita Kniazev 2019-03-26 01:24:21 UTC
Created attachment 46020 [details]
A reproducer

A simple switch that will be generated as a jump table:

int f1();
int f2();
int f3();
int f4();
int f5();

int foo(int i)
{
    switch (i) {
        case 1: return f1();
        case 2: return f2();
        case 3: return f3();
        case 4: return f4();
        case 5: return f5();
    }
    __builtin_unreachable();
}

Compiles to (first two instructions shown):

i686:
  movl 4(%esp), %eax
  jmp *.L4(,%eax,4)

x86_64:
  movl %edi, %edi
  jmp *.L4(,%rdi,8)

ARM:
  sub r0, r0, #1
  cmp r0, #16

ARM64:
  sub w0, w0, #1
  cmp w0, 16


I am not sure why there is even a cmp+bls range check on ARM, given that the default case is __builtin_unreachable(). https://godbolt.org/z/hi66cD


Possibly useful info:
GCC  x86_64
4.1  mov %edi, %eax
4.4  mov %edi, %edi
4.6  movl %edi, %edi
4.8  bogus jump became jump to ret
8.1  jump to ret removed, but self mov is still there
Comment 1 Uroš Bizjak 2019-03-26 07:26:20 UTC
(In reply to Nikita Kniazev from comment #0)
> 8.1  jump to ret removed, but self mov is still there
It's not a self move, but a zero extend.

        movl    %edi, %edi      # 6     [c=1 l=2]  *zero_extendsidi2/3
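
For context (an illustration, not from the report; the helper name as_index is made up): on x86-64, any write to a 32-bit register clears the upper 32 bits of the corresponding 64-bit register, so a single 32-bit mov is the cheapest way to zero-extend an int before using it as a 64-bit table index:

/* Illustration only: zero-extending a 32-bit int to a 64-bit index.
   A 32-bit register write already clears the upper half, so the
   compiler needs only a plain 32-bit mov, not a movzx. */
unsigned long long as_index(int i)
{
    return (unsigned int)i;   /* one 32-bit mov */
}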
Comment 2 Andrew Pinski 2021-08-03 01:32:31 UTC
I don't see anything wrong with what is currently done.

The aarch64 cost for a jump table is very high, which causes the jump table not to be generated. Indirect jumps on some/most aarch64 cores are not very predictable, so GCC tries to avoid them.

Note that with clang, the x86_64 code has:
addl    $-1, %edi
which is also a zero extend.
GCC is not subtracting one because it is trying to avoid an instruction.
If you change the argument type to long, GCC will not produce the zero_extend and produces better code than clang (the table is one element bigger, but I doubt that matters).
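
For reference, the long variant would look like this (a reconstruction; the original attachment uses int):

int foo(long i)
{
    switch (i) {
        case 1: return f1();
        case 2: return f2();
        case 3: return f3();
        case 4: return f4();
        case 5: return f5();
    }
    __builtin_unreachable();
}

With a long argument the index already fills the 64-bit register, so no zero extension is needed before the scaled indirect jump.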

If you add 100 to each of the case statements (and change the type to long), GCC still produces better code than clang:
        jmp     *.L4-808(,%rdi,8)
vs
        addq    $-101, %rdi
        jmpq    *.LJTI0_0(,%rdi,8)
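
A sketch of the shifted variant being compared (again a reconstruction, not in the original attachment):

int foo(long i)
{
    switch (i) {
        case 101: return f1();
        case 102: return f2();
        case 103: return f3();
        case 104: return f4();
        case 105: return f5();
    }
    __builtin_unreachable();
}

GCC folds the case offset into the jump-table displacement (.L4-808, since 101 * 8 = 808), while clang first subtracts 101 from the index and only then indexes its table.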
Comment 3 Andrew Pinski 2021-08-03 01:34:30 UTC
Oh, the sub issue for aarch64 is solved in GCC 7+.