Bug 44249

Summary: [4.7 Regression] IRA generates extra register move
Product: gcc Reporter: H.J. Lu <hjl.tools>
Component: rtl-optimization Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: areg.melikadamyan, ebotcazou, gcc-bugs, vmakarov
Priority: P2 Keywords: missed-optimization, ra
Version: 4.6.0   
Target Milestone: 4.8.0   
Host: Target:
Build: Known to work: 4.8.0
Known to fail: 4.7.4 Last reconfirmed: 2010-11-24 15:14:39

Description H.J. Lu 2010-05-22 23:40:42 UTC
From PR 34653, for

---
extern unsigned long table[];

unsigned long foo(unsigned char *p) {
  unsigned long tag = *p;
  return table[tag >> 4] + table[tag & 0xf];
}
---

at -O2, IRA generates an extra register move:

gcc-4.3 with -O2 generates:

foo:
        movzbl  (%rdi), %edx
        movq    %rdx, %rax
        shrq    $4, %rdx
        andl    $15, %eax
        movq    table(,%rax,8), %rax
        addq    table(,%rdx,8), %rax
        ret

And gcc-4.4+ -O2 generates:

foo:
        movzbl  (%rdi), %eax
        movq    %rax, %rcx
>>	movq	%rax, %rdx
        andl    $15, %ecx
        shrq    $4, %rdx
        movq    table(,%rcx,8), %rax
        addq    table(,%rdx,8), %rax
        ret

Please note the extra move, marked with >> above.
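
For anyone who wants to check the function's behaviour independently of the generated code, here is a minimal sketch. The 16-entry table contents and the helper foo_matches_reference are assumptions made for illustration only; the report itself only declares table as extern.

```c
#include <assert.h>  /* for assert() in a test driver */

/* Hypothetical 16-entry table; the report only declares it extern. */
unsigned long table[16] = {
    1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597
};

/* Function from the report, unchanged. */
unsigned long foo(unsigned char *p) {
  unsigned long tag = *p;
  return table[tag >> 4] + table[tag & 0xf];
}

/* Hypothetical helper: returns 1 if foo() agrees with a direct
   high-nibble/low-nibble lookup for every possible byte value. */
int foo_matches_reference(void) {
  for (unsigned int b = 0; b < 256; b++) {
    unsigned char byte = (unsigned char)b;
    unsigned long expected = table[b / 16] + table[b % 16];
    if (foo(&byte) != expected)
      return 0;
  }
  return 1;
}
```

Defining table in the same file may change the code the compiler generates, so for comparing assembly output it is probably better to keep the original extern declaration and use this variant only as a functional check.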
Comment 1 Eric Botcazou 2010-11-24 15:14:39 UTC
There isn't just an extra move; the code is also different.  Are you sure that it results in inferior performance?
Comment 2 Vladimir Makarov 2010-11-24 17:40:56 UTC
Reload creates an additional insn for this insn:

(insn 9 7 11 2 (parallel [
            (set (reg:DI 71)
                (lshiftrt:DI (reg/v:DI 60 [ tag ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) b.i:5 533 {*lshrdi3_1}
     (expr_list:REG_DEAD (reg/v:DI 60 [ tag ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

That is because r60 and r71 got different hard registers (0 and 1), even
though there is a copy between r71 and r60, which should result in r71
getting hard register 0, the same one as r60.  It does not happen because
r68 already got 0 and it conflicts with r71:

    r71: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS
    r68: preferred AREG, alternative GENERAL_REGS, cover GENERAL_REGS
    r60: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS

;; a0(r68,l0) conflicts: a1(r71,l0)

;; a4(r67,l0) conflicts:  cp0:a1(r71)<->a3(r60)@1000:constraint

      Popping a0(r68,l0)  -- assign reg 0
      Popping a3(r60,l0)  -- assign reg 0
      Popping a1(r71,l0)  -- assign reg 1
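
The effect of the dump above can be reproduced with a toy model. This is only a sketch under simplifying assumptions, not IRA's actual algorithm: register classes and costs are ignored, only the one conflict and the one copy shown in the dump are modelled, and all names (toy_assign, conflict, copy_partner, and so on) are hypothetical.

```c
#include <assert.h>  /* for assert() in a test driver */

#define NUM_ALLOCNOS 3
#define NUM_HARD_REGS 4
#define NONE (-1)

/* Toy allocnos, numbered in the order they are popped in the dump. */
enum { A_R68 = 0, A_R60 = 1, A_R71 = 2 };

/* conflict[i][j] != 0: allocnos i and j may not share a hard register.
   Only the a0(r68)/a1(r71) conflict from the dump is modelled. */
static const int conflict[NUM_ALLOCNOS][NUM_ALLOCNOS] = {
    [A_R68] = { [A_R71] = 1 },
    [A_R71] = { [A_R68] = 1 },
};

/* Copy cp0 from the dump: a1(r71) <-> a3(r60). */
static const int copy_partner[NUM_ALLOCNOS] = {
    [A_R68] = NONE, [A_R60] = A_R71, [A_R71] = A_R60,
};

/* Returns 1 if no already-assigned conflicting allocno holds reg. */
static int reg_is_free(int a, int reg, const int assigned[]) {
  for (int other = 0; other < NUM_ALLOCNOS; other++)
    if (conflict[a][other] && assigned[other] == reg)
      return 0;
  return 1;
}

/* Assigns hard registers in pop order: try the copy partner's register
   first, otherwise fall back to the lowest-numbered free register. */
void toy_assign(int assigned[NUM_ALLOCNOS]) {
  for (int a = 0; a < NUM_ALLOCNOS; a++)
    assigned[a] = NONE;
  for (int a = 0; a < NUM_ALLOCNOS; a++) {
    int partner = copy_partner[a];
    if (partner != NONE && assigned[partner] != NONE &&
        reg_is_free(a, assigned[partner], assigned)) {
      assigned[a] = assigned[partner];
      continue;
    }
    for (int reg = 0; reg < NUM_HARD_REGS; reg++) {
      if (reg_is_free(a, reg, assigned)) {
        assigned[a] = reg;
        break;
      }
    }
  }
}
```

In this model r68 and r60 both end up in register 0, and when r71 is popped its copy partner's register 0 is already taken by the conflicting r68, so r71 falls back to register 1, matching the assignments in the dump.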

Analogous insn for gcc-4.3 looks like

(insn:HI 9 7 11 2 b.i:4 (parallel [
            (set (reg/v:DI 58 [ tag ])
                (lshiftrt:DI (reg/v:DI 58 [ tag ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) 514 {*lshrdi3_1_rex64} (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

So gcc-4.3 does not have the problem seen in gcc-4.4+.

Insn 9 for gcc-4.3 is the result of a regmove transformation.  I have no
idea why regmove (which is still present in gcc-4.4+) does not do the
same there (probably because of some changes since 4.3).

The problem could be fixed in regmove or in IRA (which is probably
harder).  But I don't know whether it is worth doing, because such
transformations result in longer live ranges of pseudos and might
result in worse code for other programs.
Comment 3 Jakub Jelinek 2012-03-13 12:48:04 UTC
4.4 branch is being closed, moving to 4.5.4 target.
Comment 4 Richard Biener 2012-07-02 11:58:03 UTC
The 4.5 branch is being closed, adjusting target milestone.
Comment 5 Steven Bosscher 2012-10-09 21:01:08 UTC
No extra move with trunk today:

$ cat t.c
extern unsigned long table[];

unsigned long foo(unsigned char *p) {
    unsigned long tag = *p;
    return table[tag >> 4] + table[tag & 0xf];
}

$ cat t.s
        .file   "t.c"
        .text
        .p2align 4,,15
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movzbl  (%rdi), %edx
        movq    %rdx, %rax
        shrq    $4, %rdx
        andl    $15, %eax
        movq    table(,%rax,8), %rax
        addq    table(,%rdx,8), %rax
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.8.0 20121008 (experimental) \
[trunk revision 192219]"
        .section        .note.GNU-stack,"",@progbits
Comment 6 Jakub Jelinek 2013-04-12 15:17:03 UTC
GCC 4.6.4 has been released and the branch has been closed.
Comment 7 Richard Biener 2014-06-12 12:57:18 UTC
Fixed with LRA.
Comment 8 Richard Biener 2015-06-22 14:23:12 UTC
.