[Bug tree-optimization/88873] New: missing vectorization for decomposed operations on a vector type
vincent-gcc at vinc17 dot net
gcc-bugzilla@gcc.gnu.org
Wed Jan 16 09:51:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873
Bug ID: 88873
Summary: missing vectorization for decomposed operations on a vector type
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincent-gcc at vinc17 dot net
Target Milestone: ---
To compute a vectorized fma, one needs to apply it to the decomposed vector
components. Here's an example with a structure type and with a vector type. The
structure type solution is just given for comparison. This bug is about the
vector type solution.
#include <math.h>

typedef struct { double x, y; } s_t;
typedef double v2df __attribute__ ((vector_size (2 * sizeof (double))));

s_t foo (s_t a, s_t b, s_t c)
{
  return (s_t) { fma (a.x, b.x, c.x), fma (a.y, b.y, c.y) };
}

v2df bar (v2df a, v2df b, v2df c)
{
  v2df r;

  r[0] = fma (a[0], b[0], c[0]);
  r[1] = fma (a[1], b[1], c[1]);
  return r;
}

With -O3, I get on x86_64:
* For function foo (struct type):
[...]
        vfmadd132pd     -40(%rsp), %xmm7, %xmm6
[...]
This is vectorized as expected, though this solution is affected by bug 65847.
* For function bar (vector type):
bar:
.LFB1:
        .cfi_startproc
        vmovapd %xmm0, %xmm3
        vunpckhpd       %xmm0, %xmm0, %xmm0
        vfmadd132sd     %xmm1, %xmm2, %xmm3
        vunpckhpd       %xmm1, %xmm1, %xmm1
        vunpckhpd       %xmm2, %xmm2, %xmm2
        vfmadd132sd     %xmm1, %xmm2, %xmm0
        vunpcklpd       %xmm0, %xmm3, %xmm0
        ret
        .cfi_endproc
This is not vectorized: one gets two scalar vfmadd132sd instructions instead of
a single packed vfmadd132pd.
Note: The problem is the same with addition, but in the addition case, one can
simply do a + b. This is not possible with fma.
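
As a minimal sketch of the addition case mentioned in the note (the function
names add_whole and add_decomposed are illustrative and not part of the
original testcase), both spellings can be written with GCC's vector
extensions. The first uses the whole-vector operator that addition has and
fma lacks; the second decomposes the operation per component, as bar does:

```c
typedef double v2df __attribute__ ((vector_size (2 * sizeof (double))));

/* Whole-vector form: the + operator applies element-wise to v2df. */
v2df add_whole (v2df a, v2df b)
{
  return a + b;
}

/* Decomposed form: same result, written one component at a time,
   mirroring the structure of bar in the testcase above. */
v2df add_decomposed (v2df a, v2df b)
{
  v2df r;

  r[0] = a[0] + b[0];
  r[1] = a[1] + b[1];
  return r;
}
```

Both functions return the same values; the difference of interest for this
report is only in the code the compiler generates for each form.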
This bug seems similar to bug 77399.