[Bug tree-optimization/67612] New: Unable to vectorize DOT_PROD_EXPR (PMADDWD)

dmalcolm at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Sep 17 14:58:00 GMT 2015


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67612

            Bug ID: 67612
           Summary: Unable to vectorize DOT_PROD_EXPR (PMADDWD)
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmalcolm at gcc dot gnu.org
  Target Milestone: ---

Created attachment 36346
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36346&action=edit
Test case

The attached code is a reduced form of a loop that the user hoped would be
auto-vectorized to use PMADDWD, but no vectorization occurs (this came up whilst
investigating the possible use of libgccjit for autovectorization).

With a recent gcc trunk (r227686), I get this for the reproducer at -O3:

0000000000000000 <test_muladd>:
   0:   31 c0                   xor    %eax,%eax
   2:   85 c9                   test   %ecx,%ecx
   4:   7e 37                   jle    3d <test_muladd+0x3d>
   6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   d:   00 00 00 
  10:   44 0f bf 04 82          movswl (%rdx,%rax,4),%r8d
  15:   44 0f bf 14 86          movswl (%rsi,%rax,4),%r10d
  1a:   44 0f bf 4c 82 02       movswl 0x2(%rdx,%rax,4),%r9d
  20:   45 0f af d0             imul   %r8d,%r10d
  24:   44 0f bf 44 86 02       movswl 0x2(%rsi,%rax,4),%r8d
  2a:   45 0f af c1             imul   %r9d,%r8d
  2e:   45 01 d0                add    %r10d,%r8d
  31:   44 89 04 87             mov    %r8d,(%rdi,%rax,4)
  35:   48 83 c0 01             add    $0x1,%rax
  39:   39 c1                   cmp    %eax,%ecx
  3b:   7f d3                   jg     10 <test_muladd+0x10>
  3d:   f3 c3                   repz retq 
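The attachment isn't reproduced in this archive, but the scalar loop above can be read back into C. The sketch below is one plausible reconstruction inferred from the disassembly, not the actual test case. The parameter names `out`, `a`, `b`, `n` are hypothetical. `%rdi` holds an `int *` output, `%rsi` and `%rdx` point at arrays of `short` read two at a time, and `%ecx` is the trip count. Next to it is a hypothetical hand-written SSE2 variant, `test_muladd_sse2`, using `_mm_madd_epi16` (the intrinsic for PMADDWD). It is roughly the kind of code the reporter hoped the vectorizer would produce, and it falls back to the scalar loop when SSE2 is unavailable.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plausible reconstruction of test_muladd from the disassembly above
   (parameter names are hypothetical). Each output element is the sum of
   two widened 16-bit products, which is what one PMADDWD lane computes. */
void test_muladd(int *out, const short *a, const short *b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1];
}

/* Hypothetical hand-vectorized version. Each _mm_madd_epi16 consumes
   8 shorts from each input (4 pairs) and produces 4 ints. Results match
   the scalar loop except when all four inputs of a pair are -32768: the
   C sum then overflows int (undefined), while PMADDWD wraps to INT32_MIN. */
void test_muladd_sse2(int *out, const short *a, const short *b, int n)
{
  int i = 0;
#if defined(__SSE2__)
  for (; n - i >= 4; i += 4) {
    __m128i va = _mm_loadu_si128((const __m128i *)(a + 2 * i));
    __m128i vb = _mm_loadu_si128((const __m128i *)(b + 2 * i));
    _mm_storeu_si128((__m128i *)(out + i), _mm_madd_epi16(va, vb));
  }
#endif
  /* Scalar tail, or the whole loop when SSE2 is not enabled. */
  for (; i < n; i++)
    out[i] = a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1];
}
```

On x86-64, SSE2 is part of the baseline, so GCC defines `__SSE2__` by default and the vector path is taken.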

Building with -fdump-tree-vect-details to see why gcc -O3 fails to vectorize,
I see this in FILENAME.c.130t.vect:
  (snip)
  ../../src/vector_dot_prod.c:11:3: note: ==> examining pattern statement: patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
  ../../src/vector_dot_prod.c:11:3: note: vect_is_simple_use: operand _14
  ../../src/vector_dot_prod.c:11:3: note: def_stmt: _14 = *_13;
  ../../src/vector_dot_prod.c:11:3: note: type of def: internal
  ../../src/vector_dot_prod.c:11:3: note: not vectorized: relevant stmt not supported: patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
  ../../src/vector_dot_prod.c:11:3: note: bad operation or unsupported loop bound.
  ../../src/vector_dot_prod.c:5:1: note: vectorized 0 loops in function.
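For reference, GCC's tree.def describes `DOT_PROD_EXPR (arg1, arg2, arg3)` as a widening multiply of the two narrow operands followed by an addition into the wider third operand. The C model below gives the scalar meaning of the pattern statement in the dump for 16-bit inputs and a 32-bit accumulator. The helper names `dot_prod_expr` and `muladd_step` are hypothetical. The mapping of `_14`, `_18` and `_29` onto the loop's operands is also an assumption, since the dump only shows that `_14` is a load. The sketch simply shows how one product of the loop body can serve as the accumulator operand for the other.

```c
#include <stdint.h>

/* Scalar meaning of DOT_PROD_EXPR <x, y, acc> for 16-bit x, y and a
   32-bit acc: widen both inputs, multiply, then add into acc. */
static int32_t dot_prod_expr(int16_t x, int16_t y, int32_t acc)
{
  return (int32_t)x * (int32_t)y + acc;
}

/* Hypothetical: one iteration of the reconstructed loop body written
   with the helper, assuming the second product is the accumulator. */
static int32_t muladd_step(const int16_t *a, const int16_t *b, int i)
{
  int32_t second = (int32_t)a[2 * i + 1] * b[2 * i + 1];
  return dot_prod_expr(a[2 * i], b[2 * i], second);
}
```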

Stepping through:
  gcc/tree-vect-stmts.c:vect_analyze_stmt
for stmt:
  patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
I see that vectorizable_operation returns false here:

  4821    if (nunits_out != nunits_in)
  4910        return false;

  (gdb) p nunits_out
  $16 = 4
  (gdb) p nunits_in
  $17 = 8
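These values are consistent with 128-bit SSE vectors. A vector of `short` inputs (V8HI) has eight lanes, while a vector of `int` results (V4SI) has four. With the same vector size on both sides, any widening operation such as DOT_PROD_EXPR therefore has `nunits_in != nunits_out` and fails the check quoted above. PMADDWD is itself such an 8-lanes-in, 4-lanes-out operation. The portable scalar model below spells out that lane mapping: each output lane is the sum of two adjacent widened products. The name `pmaddwd_model` and the 16-byte vector width are assumptions for illustration.

```c
#include <stdint.h>

enum { VEC_BYTES = 16 };  /* assumed 128-bit SSE vector width */
enum {
  NUNITS_IN  = VEC_BYTES / sizeof(int16_t),  /* 8 lanes of short */
  NUNITS_OUT = VEC_BYTES / sizeof(int32_t)   /* 4 lanes of int   */
};

/* Scalar model of PMADDWD on one register pair:
   out[j] = a[2j]*b[2j] + a[2j+1]*b[2j+1], truncated to 32 bits.
   The sum is formed in 64 bits, then wrapped; the final conversion to
   int32_t is implementation-defined in ISO C (GCC wraps modulo 2^32),
   matching the hardware result 0x80000000 for all inputs = -32768. */
static void pmaddwd_model(const int16_t a[NUNITS_IN], const int16_t b[NUNITS_IN],
                          int32_t out[NUNITS_OUT])
{
  for (int j = 0; j < NUNITS_OUT; j++) {
    int64_t sum = (int64_t)a[2 * j] * b[2 * j]
                + (int64_t)a[2 * j + 1] * b[2 * j + 1];
    out[j] = (int32_t)(uint32_t)sum;
  }
}
```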

Should this statement be handled by vectorizable_operation?
