This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/56197] New: [SH] Use calculated jump address instead of using a jump table


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56197

             Bug #: 56197
           Summary: [SH] Use calculated jump address instead of using a
                    jump table
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: olegendo@gcc.gnu.org
            Target: sh*-*-*


I ran across this one while checking out PR 55146.
If there are a lot of cases in a switch and the length of the case blocks is
more or less constant, it can be beneficial to calculate the jump address and
eliminate the jump table.  For example, code such as

int
test (int arg)
{
  int rc;
  switch (arg)
    {
    case 0:
      asm ("nop\n\tnop\n\t"
           "mov r4,%0"
           : "=r" (rc)
           : "r" (arg));
      break;
    case 1:
      asm ("nop\n\tnop\n\t"
           "mov r5,%0"
           : "=r" (rc)
           : "r" (arg));
      break;
    case 2:
      asm ("nop\n\tnop\n\t"
           "mov r6,%0"
           : "=r" (rc)
           : "r" (arg));
      break;

    [...]

    case 9:
      asm ("nop\n\tnop\n\t"
           "mov r7,%0"
           : "=r" (rc)
           : "r" (arg));
      break;
    }
  return rc;
}


Compiling this with -O2 results in:

_test:
        mov     #9,r1
        cmp/hi  r1,r4
        bt      .L2
        mova    .L4,r0
        mov.b   @(r0,r4),r4
        add     r0,r4
        jmp     @r4
        nop
        .align 2
.L4:
        .byte   .L3-.L4
        .byte   .L5-.L4
        .byte   .L6-.L4
        .byte   .L7-.L4
        .byte   .L8-.L4
        .byte   .L9-.L4
        .byte   .L10-.L4
        .byte   .L11-.L4
        .byte   .L12-.L4
        .byte   .L13-.L4
        .align 1
.L13:
        mov     #9,r0
        nop
        nop
        mov r7,r0
        .align 2
.L2:
        rts    
        nop
        .align 1
.L12:
        mov     #8,r0

        [...]

With many cases, the jump table can become large and is likely to cause
data cache misses.  In that case the following might be better (assuming that
each case block is padded to a length of 16 bytes):

        mov     #9,r1
        cmp/hi  r1,r4
        bt      .Lcase_default
        shll2   r4
        shll2   r4
        add     #.Lcase_0 - .Lcase_default,r4
        braf    r4
        nop

.Lcase_default:
        rts
        nop

        .align 4
.Lcase_0:
        mov     #0,r0
        nop
        nop
        mov     r4,r0
        rts    
        nop

        .align 4
.Lcase_1:

        [...]

        .align 4
.Lcase_9:
        mov     #9,r0
        nop
        nop
        mov     r7,r0
        rts    
        nop
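
The address arithmetic in the sequence above can be modelled in C.  This is
only a sketch under the assumptions of the example (10 cases, 16-byte case
blocks); the function name and the constants are hypothetical and are not
taken from GCC.

```c
#include <stdint.h>

/* Assumed values from the example above: cases 0..9, 16-byte blocks.  */
#define LAST_CASE        9u
#define BLOCK_SIZE_LOG2  4u

/* Hypothetical helper: byte offset of the selected case block relative
   to the first case block, or -1 if the default path is taken.  */
int32_t
case_block_offset (uint32_t arg)
{
  /* Unsigned range check, like cmp/hi followed by bt.  */
  if (arg > LAST_CASE)
    return -1;

  /* Two shll2 instructions shift left by 4, i.e. multiply by 16.  */
  return (int32_t) (arg << BLOCK_SIZE_LOG2);
}
```

Adding the distance between the default block and the first case block to
this offset gives the displacement that is passed to braf.  No memory load is
needed, in contrast to the mov.b from the jump table.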

However, this requires the case blocks to be laid out in ascending order of
their case values, and the lengths of the case blocks should not vary too much,
since every block has to be padded to the same size.
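
A check for these conditions could look roughly like the following C sketch.
The struct, the function name and the padding limit are hypothetical and do
not correspond to existing GCC code.  It also assumes that the case values
are consecutive, which the shift-based address calculation needs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one case block, in layout order.  */
struct case_block
{
  long value;     /* case label value */
  size_t length;  /* code size of the block in bytes */
};

/* Return true if the blocks have consecutive ascending case values,
   every block fits into STRIDE bytes, and the total padding needed
   does not exceed MAX_WASTE bytes.  */
bool
can_use_calculated_jump (const struct case_block *blocks, size_t n,
                         size_t stride, size_t max_waste)
{
  size_t waste = 0;

  for (size_t i = 0; i < n; i++)
    {
      /* Assumption: values must be consecutive, not just ascending.  */
      if (i > 0 && blocks[i].value != blocks[i - 1].value + 1)
        return false;

      /* A block longer than the stride cannot be addressed this way.  */
      if (blocks[i].length > stride)
        return false;

      waste += stride - blocks[i].length;
    }

  return n > 0 && waste <= max_waste;
}
```

For the example above, each case block is 12 bytes long and is padded to a
stride of 16 bytes, so 4 bytes per case are spent on padding.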

Maybe this optimization could also be beneficial on targets other than SH.  At
least PR 43462 looks somewhat related to it.

