[Bug tree-optimization/88873] New: missing vectorization for decomposed operations on a vector type

vincent-gcc at vinc17 dot net gcc-bugzilla@gcc.gnu.org
Wed Jan 16 09:51:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873

            Bug ID: 88873
           Summary: missing vectorization for decomposed operations on a
                    vector type
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincent-gcc at vinc17 dot net
  Target Milestone: ---

There is no fma that takes a vector type directly, so to compute a vectorized
fma, one needs to apply fma to each vector component separately. Here's an
example with a structure type and with a vector type. The structure type
solution is given only for comparison; this bug is about the vector type
solution.

#include <math.h>

typedef struct { double x, y; } s_t;

typedef double v2df __attribute__ ((vector_size (2 * sizeof(double))));

s_t foo (s_t a, s_t b, s_t c)
{
  return (s_t) { fma (a.x, b.x, c.x), fma (a.y, b.y, c.y) };
}

v2df bar (v2df a, v2df b, v2df c)
{
  v2df r;

  r[0] = fma (a[0], b[0], c[0]);
  r[1] = fma (a[1], b[1], c[1]);
  return r;
}

With -O3 on x86_64, I get:

* For function foo (struct type):

[...]
        vfmadd132pd     -40(%rsp), %xmm7, %xmm6
[...]

This is vectorized as expected, though this solution is affected by bug 65847.

* For function bar (vector type):

bar:
.LFB1:
        .cfi_startproc
        vmovapd %xmm0, %xmm3
        vunpckhpd       %xmm0, %xmm0, %xmm0
        vfmadd132sd     %xmm1, %xmm2, %xmm3
        vunpckhpd       %xmm1, %xmm1, %xmm1
        vunpckhpd       %xmm2, %xmm2, %xmm2
        vfmadd132sd     %xmm1, %xmm2, %xmm0
        vunpcklpd       %xmm0, %xmm3, %xmm0
        ret
        .cfi_endproc

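This is not vectorized: the code uses two scalar vfmadd132sd instructions
(plus the unpack instructions needed to split and recombine the vectors)
instead of a single packed vfmadd132pd.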

Note: The problem is the same with addition, but in the addition case, one can
simply do a + b. This is not possible with fma.

This bug seems similar to bug 77399.

