This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/67612] New: Unable to vectorize DOT_PROD_EXPR (PMADDWD)
- From: "dmalcolm at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 17 Sep 2015 14:58:22 +0000
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67612
Bug ID: 67612
Summary: Unable to vectorize DOT_PROD_EXPR (PMADDWD)
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: dmalcolm at gcc dot gnu.org
Target Milestone: ---
Created attachment 36346
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36346&action=edit
Test case
The attached code is a reduced form of a loop that the user hoped would be
auto-vectorized to use PMADDWD, but no vectorization occurs. (This came up
while investigating possible use of libgccjit for autovectorization.)
With a recent gcc trunk (r227686), I get this for the reproducer at -O3:
0000000000000000 <test_muladd>:
0: 31 c0 xor %eax,%eax
2: 85 c9 test %ecx,%ecx
4: 7e 37 jle 3d <test_muladd+0x3d>
6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
d: 00 00 00
10: 44 0f bf 04 82 movswl (%rdx,%rax,4),%r8d
15: 44 0f bf 14 86 movswl (%rsi,%rax,4),%r10d
1a: 44 0f bf 4c 82 02 movswl 0x2(%rdx,%rax,4),%r9d
20: 45 0f af d0 imul %r8d,%r10d
24: 44 0f bf 44 86 02 movswl 0x2(%rsi,%rax,4),%r8d
2a: 45 0f af c1 imul %r9d,%r8d
2e: 45 01 d0 add %r10d,%r8d
31: 44 89 04 87 mov %r8d,(%rdi,%rax,4)
35: 48 83 c0 01 add $0x1,%rax
39: 39 c1 cmp %eax,%ecx
3b: 7f d3 jg 10 <test_muladd+0x10>
3d: f3 c3 repz retq
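For readers without the attachment, the disassembly suggests a loop of roughly the following shape. This is a reconstruction inferred from the addressing modes above (sign-extending 16-bit loads at stride 4, a 32-bit store at stride 4), not the attached test case itself; the parameter names, the fixed-width types and the const qualifiers are assumptions.

```c
#include <stdint.h>

/* Hypothetical reconstruction from the disassembly; the real attachment
   may differ in names, types and qualifiers.  Each output element is the
   sum of two adjacent 16-bit products, which is what one lane of PMADDWD
   computes.  */
void test_muladd(int32_t *out, const int16_t *a, const int16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1];
}
```

Note that each iteration stores its own result; nothing is accumulated across iterations.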
Building with -fdump-tree-vect-details to see why gcc -O3 fails to vectorize,
I see this in FILENAME.c.130t.vect:
(snip)
../../src/vector_dot_prod.c:11:3: note: ==> examining pattern statement:
patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
../../src/vector_dot_prod.c:11:3: note: vect_is_simple_use: operand _14
../../src/vector_dot_prod.c:11:3: note: def_stmt: _14 = *_13;
../../src/vector_dot_prod.c:11:3: note: type of def: internal
../../src/vector_dot_prod.c:11:3: note: not vectorized: relevant stmt not supported: patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
../../src/vector_dot_prod.c:11:3: note: bad operation or unsupported loop bound.
../../src/vector_dot_prod.c:5:1: note: vectorized 0 loops in function.
Stepping through gcc/tree-vect-stmts.c:vect_analyze_stmt for the stmt:
patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
I see that vectorizable_operation returns false here:
4821 if (nunits_out != nunits_in)
4910 return false;
(gdb) p nunits_out
$16 = 4
(gdb) p nunits_in
$17 = 8
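These two values are consistent with a 16-byte vector holding 8 short elements on the input side and 4 int elements on the output side, which is also the shape of a single PMADDWD. The sketch below is a scalar model of that arithmetic, not GCC code; VECTOR_BYTES, lanes and pmaddwd_model are hypothetical names, and the 128-bit vector size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 128-bit (16-byte) vector, as with SSE2.  */
enum { VECTOR_BYTES = 16 };

/* Number of elements of the given size that fit in one vector.  */
static size_t lanes(size_t elem_size)
{
  return VECTOR_BYTES / elem_size;
}

/* Scalar model of one PMADDWD: 8 signed 16-bit inputs per operand,
   4 signed 32-bit outputs.  The single wraparound case (all four inputs
   of a pair equal to -32768) is not modelled here.  */
static void pmaddwd_model(const int16_t a[8], const int16_t b[8],
                          int32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = (int32_t) a[2 * i] * b[2 * i]
             + (int32_t) a[2 * i + 1] * b[2 * i + 1];
}
```

Under that assumption, lanes(sizeof(int16_t)) is 8 and lanes(sizeof(int32_t)) is 4, matching nunits_in and nunits_out above, so a check requiring the two to be equal cannot pass for this stmt.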
Should this be a vectorizable_operation?