[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Apr 21 07:37:56 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Joel Yliluoma from comment #13)
> GCC 4.1.2 is indicated in the bug report headers.
> Luckily, Compiler Explorer has a copy of that exact version, and it indeed
> vectorizes the second function: https://godbolt.org/z/DC_SSb
>
> On my own system, the earliest I have is 4.6. The Compiler Explorer has 4.4,
> and it, or anything newer than that, no longer vectorizes either function.
Ah, OK - that's before GCC learned to vectorize; the code is generated by RTL expansion of
return {BIT_FIELD_REF <a, 128, 0> + BIT_FIELD_REF <b, 128, 0>};
so the only vector support was GCC's generic vectors (and intrinsics). The
generated code is far from perfect, though. I also think LLVM's code
generation is bogus, since the ABI does not appear to guarantee zeroed
upper elements of the xmm0 argument, which means they could contain sNaNs:
typedef float ss2 __attribute__((vector_size(8)));
typedef float ss4 __attribute__((vector_size(16)));

ss2 add2(ss2 a, ss2 b);

void bar(ss4 a)
{
  volatile ss2 x;
  x = add2 ((ss2){a[0], a[1]}, (ss2){a[0], a[1]});
}
produces
bar:
.LFB1:
.cfi_startproc
subq $56, %rsp
.cfi_def_cfa_offset 64
movdqa %xmm0, %xmm1
call add2
movq %xmm0, 24(%rsp)
addq $56, %rsp
which means we pass through 'a' unchanged.