This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/56197] New: [SH] Use calculated jump address instead of using a jump table
- From: "olegendo at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sun, 03 Feb 2013 21:59:40 +0000
- Subject: [Bug target/56197] New: [SH] Use calculated jump address instead of using a jump table
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56197
Bug #: 56197
Summary: [SH] Use calculated jump address instead of using a jump table
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: olegendo@gcc.gnu.org
Target: sh*-*-*
I ran across this one while checking out PR 55146.
If a switch has a lot of cases and the case blocks are all of more or less the
same length, it can be beneficial to calculate the jump address directly and
eliminate the jump table. For example, code such as
int
test (int arg)
{
  int rc;
  switch (arg)
    {
    case 0:
      asm ("nop\n\tnop\n\t"
           "mov r4,%0"
           : "=r" (rc)
           : "r" (arg));
      break;
    case 1:
      asm ("nop\n\tnop\n\t"
           "mov r5,%0"
           : "=r" (rc)
           : "r" (arg));
      break;
    case 2:
      asm ("nop\n\tnop\n\t"
           "mov r6,%0"
           : "=r" (rc)
           : "r" (arg));
    [...]
    case 9:
      asm ("nop\n\tnop\n\t"
           "mov r7,%0"
           : "=r" (rc)
           : "r" (arg));
      break;
    }
  return rc;
}
Compiled with -O2 results in:
_test:
        mov     #9,r1
        cmp/hi  r1,r4
        bt      .L2
        mova    .L4,r0
        mov.b   @(r0,r4),r4
        add     r0,r4
        jmp     @r4
        nop
        .align 2
.L4:
        .byte   .L3-.L4
        .byte   .L5-.L4
        .byte   .L6-.L4
        .byte   .L7-.L4
        .byte   .L8-.L4
        .byte   .L9-.L4
        .byte   .L10-.L4
        .byte   .L11-.L4
        .byte   .L12-.L4
        .byte   .L13-.L4
        .align 1
.L13:
        mov     #9,r0
        nop
        nop
        mov     r7,r0
        .align 2
.L2:
        rts
        nop
        .align 1
.L12:
        mov     #8,r0
[...]
With a lot of cases, the jump table can become large and is likely to cause
data cache misses. The following might be better in that case (assuming that
the length of each case block is 16 bytes):
        mov     #9,r1
        cmp/hi  r1,r4
        bt      .L2
        shll2   r4
        shll2   r4
        add     #.Lcase_0 - .Lcase_default,r4
        braf    r4
        nop
.Lcase_default:
        rts
        nop
        .align 4
.Lcase_0:
        mov     #0,r0
        nop
        nop
        mov     r4,r0
        rts
        nop
        .align 4
.Lcase_1:
[...]
        .align 4
.Lcase_9:
        mov     #9,r0
        nop
        nop
        mov     r7,r0
        rts
        nop
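To make the address arithmetic of the braf sequence explicit, here is a rough C model of it. This is only a sketch: the function names, the CASE_BLOCK_SHIFT constant and the sample offsets in the note below are made up for illustration and are not taken from GCC. It assumes that braf branches relative to the instruction after its delay slot (the .Lcase_default label here) and that every case block is padded to 16 bytes.

```c
#include <stdint.h>

/* Assumption: every case block is padded to 16 bytes (1 << 4). */
enum { CASE_BLOCK_SHIFT = 4 };

/* Byte offset handed to braf, relative to the default label.
   case0_minus_default stands for the assembler constant
   .Lcase_0 - .Lcase_default; no memory load is involved.  */
static int32_t
braf_offset (uint32_t arg, int32_t case0_minus_default)
{
  return (int32_t) (arg << CASE_BLOCK_SHIFT) + case0_minus_default;
}

/* Offset fetched from a byte-sized jump table, as in the -O2 output.
   This needs one data load per dispatch (mov.b sign-extends, hence
   int8_t).  */
static int32_t
table_offset (const int8_t *table, uint32_t arg)
{
  return table[arg];
}
```

With a hypothetical .Lcase_0 - .Lcase_default of 4, braf_offset (9, 4) gives 9 * 16 + 4 = 148. The range check on arg (the cmp/hi and bt pair) still has to come first in both variants.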
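However, this requires the case blocks to be laid out in ascending order of
their case values (like the entries of the jump table), and the length of the
case blocks should not vary too much, since every block has to be padded to
the same size.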
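One way to judge whether the block lengths "vary too much" would be to estimate the padding the layout costs. The following is a hypothetical heuristic, not something GCC implements; the function names are invented here, and it assumes the common block size must be a power of two so that the index can be scaled with shifts.

```c
#include <stddef.h>

/* Smallest power of two that is >= n.  */
static unsigned
round_up_pow2 (unsigned n)
{
  unsigned p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

/* Total padding in bytes if each of the n case blocks (lengths in
   len[]) is padded to one common power-of-two size.  */
static unsigned
padding_bytes (const unsigned *len, size_t n)
{
  unsigned max = 1, total = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (len[i] > max)
        max = len[i];
      total += len[i];
    }
  return round_up_pow2 (max) * (unsigned) n - total;
}
```

For four blocks of 12 bytes each this gives 4 * 16 - 48 = 16 bytes of padding. A single 40 byte block next to a 12 byte block already costs 2 * 64 - 52 = 76 bytes, which is the kind of case where keeping the jump table is probably the better choice.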
Maybe this optimization could also be beneficial on targets other than SH. At
least PR 43462 looks somewhat related to it.