Created attachment 46020 [details]
A reproducer

A simple switch that will be generated as a jump table:

int f1();
int f2();
int f3();
int f4();
int f5();

int foo(int i)
{
    switch (i) {
    case 1: return f1();
    case 2: return f2();
    case 3: return f3();
    case 4: return f4();
    case 5: return f5();
    }
    __builtin_unreachable();
}

Compiles into (first two rows):

i686:
    movl 4(%esp), %eax
    jmp *.L4(,%eax,4)

x86_64:
    movl %edi, %edi
    jmp *.L4(,%rdi,8)

ARM:
    sub r0, r0, #1
    cmp r0, #16

ARM64:
    sub w0, w0, #1
    cmp w0, 16

I am not sure why on ARM there is even a cmp+bls.

https://godbolt.org/z/hi66cD

Possibly useful info, GCC on x86_64 by version:

4.1  mov %edi, %eax
4.4  mov %edi, %edi
4.6  movl %edi, %edi
4.8  bogus jump became a jump to ret
8.1  jump to ret removed, but the self move is still there
(In reply to Nikita Kniazev from comment #0)
> 8.1 jump to ret removed, but self mov is still there

It's not a self move, but a zero extend:

movl %edi, %edi # 6 [c=1 l=2] *zero_extendsidi2/3
I don't see anything wrong with what is currently done.

The aarch64 cost for a jump table is very high, which causes the jump table not to be generated. Indirect jumps on some/most aarch64 cores are not very predictable, so GCC tries to avoid them.

Note with clang, the x86_64 code has:

addl $-1, %edi

which is also a zero extend. GCC does not subtract one, which avoids an instruction.

If you change the argument type to long, GCC will not produce the zero_extend and produces better code than clang (the table is one element bigger, but I doubt that matters).

If you add 100 to each of the case statements (and change the type to long), GCC still produces better code than clang:

jmp *.L4-808(,%rdi,8)

vs

addq $-101, %rdi
jmpq *.LJTI0_0(,%rdi,8)
Oh, the sub issue for aarch64 is solved in GCC 7+.