[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Apr 21 07:37:56 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Joel Yliluoma from comment #13)
> GCC 4.1.2 is indicated in the bug report headers.
> Luckily, Compiler Explorer has a copy of that exact version, and it indeed
> vectorizes the second function: https://godbolt.org/z/DC_SSb
>
> On my own system, the earliest I have is 4.6. The Compiler Explorer has 4.4,
> and it, or anything newer than that, no longer vectorizes either function.
Ah, OK - that's before GCC learned to vectorize; the code is generated by RTL expansion of
return {BIT_FIELD_REF <a, 128, 0> + BIT_FIELD_REF <b, 128, 0>};
so the only vector support was GCC's generic vectors (and intrinsics). The
generated code is far from perfect, though. I also think LLVM's code
generation is bogus, since the ABI does not appear to guarantee zeroed
upper elements of the xmm0 argument, which means they could contain sNaNs:
typedef float ss2 __attribute__((vector_size(8)));
typedef float ss4 __attribute__((vector_size(16)));

ss2 add2(ss2 a, ss2 b);

void bar(ss4 a)
{
  volatile ss2 x;
  x = add2 ((ss2){a[0], a[1]}, (ss2){a[0], a[1]});
}
produces
bar:
.LFB1:
.cfi_startproc
subq $56, %rsp
.cfi_def_cfa_offset 64
movdqa %xmm0, %xmm1
call add2
movq %xmm0, 24(%rsp)
addq $56, %rsp
which means we pass through 'a' unchanged.