[Bug rtl-optimization/50567] New: Reload pass generates sub-optimal spill code for registers in presence of a vec_concat insn

Thu Sep 29 15:25:00 GMT 2011

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50567

             Bug #: 50567
           Summary: Reload pass generates sub-optimal spill code for
                    registers in presence of a vec_concat insn
    Classification: Unclassified
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: siddhesh.poyarekar@gmail.com

Reduced program:

typedef long long __m128i __attribute__ ((__vector_size__ (16)));

__m128i process(char *mem1, char *mem2)
{       
        long long frag1, frag2;

        frag2 = frag1 = *((long long *) mem1);

        if (mem2 > mem1)
                frag2 = *((long long *) mem2);

        return (__m128i){frag2, frag1};
}       

This generates redundant spill code during the reload pass, even though IRA
itself does not spill anything:

process:
.LFB0:
        .cfi_startproc
        movq    (%rdi), %rax
        cmpq    %rsi, %rdi
        movq    %rax, %rdx
        jae     .L2
        movq    (%rsi), %rdx
.L2:
        movq    %rdx, -16(%rsp)        <== redundant from here onwards
        movq    -16(%rsp), %xmm1
        pinsrq  $1, %rax, %xmm1
        movdqa  %xmm1, %xmm0
        ret

This seems to happen because the pinsrq instruction (the vec_concat
implementation for x86_64) takes an SSE register as both input and output.
As a result, the reload pass generates spill code that moves %rdx to %xmm1
through the stack, followed by an extra move from %xmm1 to %xmm0.

Ideally, the generated code should look like this:

process:
.LFB0:
        .cfi_startproc
        movq    (%rdi), %rax
        cmpq    %rsi, %rdi
        movq    %rax, %rdx
        jae     .L2
        movq    (%rsi), %rdx
.L2:
        movq    %rdx, %xmm0
        pinsrq  $1, %rax, %xmm0
        ret
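
As a sanity check that the shorter sequence computes the same value, the
sketch below models the two instructions in portable C using GCC vector
extensions. It is not part of the original report: the names v2di,
model_movq, model_pinsrq_hi, ideal_sequence and self_check are made up for
illustration, and the models only capture the lane-level effect of each
instruction (movq from a general register zeroes the upper lane, pinsrq $1
replaces only the upper lane).

```c
#include <assert.h>

/* Same layout as the __m128i typedef in the report; renamed (assumption)
   so that it cannot clash with the type from <emmintrin.h>. */
typedef long long v2di __attribute__ ((__vector_size__ (16)));

/* The reduced test case from the report, with only the type renamed. */
v2di process(char *mem1, char *mem2)
{
        long long frag1, frag2;

        frag2 = frag1 = *((long long *) mem1);

        if (mem2 > mem1)
                frag2 = *((long long *) mem2);

        return (v2di){frag2, frag1};
}

/* Hypothetical model of "movq %r64, %xmm": low lane = x, high lane = 0. */
static v2di model_movq(long long x)
{
        return (v2di){x, 0};
}

/* Hypothetical model of "pinsrq $1, %r64, %xmm": replace the high lane. */
static v2di model_pinsrq_hi(v2di v, long long x)
{
        v[1] = x;
        return v;
}

/* The two-instruction sequence: %rdx holds frag2, %rax holds frag1. */
v2di ideal_sequence(long long rdx, long long rax)
{
        return model_pinsrq_hi(model_movq(rdx), rax);
}

/* Returns 1 if both lanes of a and b are equal. */
static int same(v2di a, v2di b)
{
        return a[0] == b[0] && a[1] == b[1];
}

/* Compare process() against the modelled sequence on both branches. */
void self_check(void)
{
        long long buf[2] = {11, 22};
        char *lo = (char *) &buf[0], *hi = (char *) &buf[1];

        /* mem2 > mem1: frag2 is loaded from mem2. */
        assert(same(process(lo, hi), ideal_sequence(22, 11)));
        /* mem2 <= mem1: frag2 stays equal to frag1. */
        assert(same(process(hi, lo), ideal_sequence(22, 22)));
}
```

Calling self_check() from a small main() exercises both branches. This only
checks the value computed, not the code the compiler actually emits; the
assembly still has to be inspected to see whether the spill is gone.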