[Bug middle-end/106010] Miss vectorization for complex type copy.
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Jun 20 10:10:00 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106010
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
             Target|                            |x86_64-*-*
   Last reconfirmed|                            |2022-06-20
          Component|tree-optimization           |middle-end
     Ever confirmed|0                           |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We RTL expand
;; _5 = MEM[(complex double *)q_9(D) + ivtmp.12_14 * 1];

(insn 9 8 10 (set (reg:DF 82 [ _5 ])
        (mem:DF (plus:DI (reg/v/f:DI 86 [ q ])
                (reg:DI 84 [ ivtmp.12 ])) [1 MEM[(complex double *)q_9(D) + ivtmp.12_14 * 1]+0 S8 A64])) "t.c":5:15 -1
     (nil))

(insn 10 9 0 (set (reg:DF 83 [ _5+8 ])
        (mem:DF (plus:DI (plus:DI (reg/v/f:DI 86 [ q ])
                    (reg:DI 84 [ ivtmp.12 ]))
                (const_int 8 [0x8])) [1 MEM[(complex double *)q_9(D) + ivtmp.12_14 * 1]+8 S8 A64])) "t.c":5:15 -1
     (nil))

;; MEM[(complex double *)p_10(D) + ivtmp.12_14 * 1] = _5;

(insn 11 10 12 (set (mem:DF (plus:DI (reg/v/f:DI 85 [ p ])
                (reg:DI 84 [ ivtmp.12 ])) [1 MEM[(complex double *)p_10(D) + ivtmp.12_14 * 1]+0 S8 A64])
        (reg:DF 82 [ _5 ])) "t.c":5:12 -1
     (nil))

(insn 12 11 0 (set (mem:DF (plus:DI (plus:DI (reg/v/f:DI 85 [ p ])
                    (reg:DI 84 [ ivtmp.12 ]))
                (const_int 8 [0x8])) [1 MEM[(complex double *)p_10(D) + ivtmp.12_14 * 1]+8 S8 A64])
        (reg:DF 83 [ _5+8 ])) "t.c":5:12 -1
     (nil))
This is likely because we assign (concat:CD ...) to the pseudos instead of
using xmm regs, so for the copy case that's a target issue IMHO.
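The testcase itself is not quoted in this comment. As a hypothetical reconstruction, assuming from the RTL above that t.c copies one complex double array into another through pointers p and q, the loop looks roughly like the sketch below. The function name `copy` and the trip-count parameter `n` are assumptions for illustration only.

```c
#include <complex.h>

/* Hypothetical reconstruction of the loop in t.c (name and signature are
   assumptions): each iteration copies one whole complex double from q[i]
   to p[i].  A plain element copy like this is what the RTL above expands
   into two DFmode loads and two DFmode stores per element.  */
void copy (_Complex double *p, _Complex double *q, int n)
{
  for (int i = 0; i != n; i++)
    p[i] = q[i];
}
```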
One could argue that without move patterns for complex modes we should
eventually lower memory accesses like we lower arithmetic. Note that as soon
as we write

  for (int i = 0; i != 100000; i++)
    p[i] = q[i] + 1.;

we do get the memory accesses lowered and the code vectorized.
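To make "lowering the memory accesses" concrete, here is a hedged source-level sketch of the copy with the component accesses spelled out, roughly the shape a loop takes once each complex element is split into a real and an imaginary part. It uses GCC's __real__/__imag__ extension (also accepted by Clang); the function names `copy_lowered` and `add_one` are hypothetical, and this illustrates the idea rather than what GCC does internally.

```c
#include <complex.h>

/* Hypothetical hand-lowered copy: each complex element is moved as two
   separate double accesses, mirroring the REALPART_EXPR / IMAGPART_EXPR
   form that complex lowering produces for arithmetic.  __real__ and
   __imag__ are GNU C extensions usable both to read and to assign a part.  */
void copy_lowered (_Complex double *p, const _Complex double *q, int n)
{
  for (int i = 0; i != n; i++)
    {
      double re = __real__ q[i];
      double im = __imag__ q[i];
      __real__ p[i] = re;
      __imag__ p[i] = im;
    }
}

/* The "+ 1." loop form from the comment, with the trip count made a
   parameter here (the comment uses a constant 100000).  Adding a real
   1.0 to a complex value changes only its real part.  */
void add_one (_Complex double *p, const _Complex double *q, int n)
{
  for (int i = 0; i != n; i++)
    p[i] = q[i] + 1.;
}
```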
There are extra complications with _Complex arguments, where we do _not_
want to lower the loads (at least not without further thought):

  foo (p[i]);

For

  foo (p[i] + 1.);

we get

  _6 = IMAGPART_EXPR <*_3>;
  _4 = REALPART_EXPR <*_3>;
  _5 = _4 + 1.0e+0;
  _7 = COMPLEX_EXPR <_5, _6>;
  bar (_7);

which is also similar to what we expand foo (p[i]) to.
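The GIMPLE sequence for the call case can be mirrored at source level. The sketch below is a hedged illustration: the callee `foo` with a body that records its argument in `last_arg`, and the wrapper `call_plus_one`, are hypothetical stand-ins, and __real__/__imag__ are GNU C extensions. It shows that both parts are loaded, only the real part is modified, and the value is reassembled before the call.

```c
#include <complex.h>

/* Hypothetical callee taking a complex argument by value; it only
   records the last value it was called with.  */
static _Complex double last_arg;

void foo (_Complex double z)
{
  last_arg = z;
}

/* Source-level counterpart of the GIMPLE for "foo (p[i] + 1.)".  */
void call_plus_one (const _Complex double *p, int i)
{
  double im = __imag__ p[i];   /* _6 = IMAGPART_EXPR <*_3>;    */
  double re = __real__ p[i];   /* _4 = REALPART_EXPR <*_3>;    */
  re = re + 1.0;               /* _5 = _4 + 1.0e+0;            */
  _Complex double z = 0;
  __real__ z = re;             /* _7 = COMPLEX_EXPR <_5, _6>;  */
  __imag__ z = im;
  foo (z);                     /* the call with _7             */
}
```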