[Bug middle-end/106010] Miss vectorization for complex type copy.

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Jun 20 10:10:00 GMT 2022


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106010

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
             Target|                            |x86_64-*-*
   Last reconfirmed|                            |2022-06-20
          Component|tree-optimization           |middle-end
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We RTL expand

;; _5 = MEM[(complex double *)q_9(D) + ivtmp.12_14 * 1];

(insn 9 8 10 (set (reg:DF 82 [ _5 ])
        (mem:DF (plus:DI (reg/v/f:DI 86 [ q ])
                (reg:DI 84 [ ivtmp.12 ])) [1 MEM[(complex double *)q_9(D) + ivtmp.12_14 * 1]+0 S8 A64])) "t.c":5:15 -1
     (nil))

(insn 10 9 0 (set (reg:DF 83 [ _5+8 ])
        (mem:DF (plus:DI (plus:DI (reg/v/f:DI 86 [ q ])
                    (reg:DI 84 [ ivtmp.12 ]))
                (const_int 8 [0x8])) [1 MEM[(complex double *)q_9(D) + ivtmp.12_14 * 1]+8 S8 A64])) "t.c":5:15 -1
     (nil))

;; MEM[(complex double *)p_10(D) + ivtmp.12_14 * 1] = _5;

(insn 11 10 12 (set (mem:DF (plus:DI (reg/v/f:DI 85 [ p ])
                (reg:DI 84 [ ivtmp.12 ])) [1 MEM[(complex double *)p_10(D) + ivtmp.12_14 * 1]+0 S8 A64])
        (reg:DF 82 [ _5 ])) "t.c":5:12 -1
     (nil))

(insn 12 11 0 (set (mem:DF (plus:DI (plus:DI (reg/v/f:DI 85 [ p ])
                    (reg:DI 84 [ ivtmp.12 ]))
                (const_int 8 [0x8])) [1 MEM[(complex double *)p_10(D) + ivtmp.12_14 * 1]+8 S8 A64])
        (reg:DF 83 [ _5+8 ])) "t.c":5:12 -1
     (nil))

likely because we assign (concat:DC ...) to the pseudos instead of using xmm regs.
So for the plain copy case that's a target issue IMHO.
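
For orientation, the four insns above amount to roughly the following at the source level.  This is only a sketch: it assumes the statement at t.c:5 is the plain copy p[i] = q[i], the function name and the explicit length n are made up for illustration, and __real__/__imag__ are the GNU C extensions for accessing the two halves of a complex value.

```c
/* Sketch of what the expanded RTL effectively does for one
   complex double copy: two independent DFmode moves per element,
   at byte offsets +0 and +8.  Names are hypothetical.  */
void
copy_as_expanded (double _Complex *p, const double _Complex *q, int n)
{
  for (int i = 0; i != n; i++)
    {
      double re = __real__ q[i];   /* like insn 9:  load from q, offset +0 */
      double im = __imag__ q[i];   /* like insn 10: load from q, offset +8 */
      __real__ p[i] = re;          /* like insn 11: store to p, offset +0 */
      __imag__ p[i] = im;          /* like insn 12: store to p, offset +8 */
    }
}
```

A single 16-byte move per element would be the alternative one would hope for on x86_64.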

One could argue that, without move patterns for complex modes, we should
eventually lower complex memory accesses the same way we lower complex
arithmetic.  Note that as soon as we do

    for (int i = 0; i != 100000; i++)
      p[i] = q[i] + 1.;

we do get the memory accesses lowered and the code vectorized.
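
To compare the two cases side by side, something like the following could be used.  It is a sketch only: the function names are invented, the plain-copy loop is assumed to match the testcase in the report, and the vectorization outcome is the one described in this comment rather than something checked here.  Compiling with e.g. -O3 -fopt-info-vec (options picked for illustration) should report which loops get vectorized.

```c
/* Plain copy: per this comment, the complex load/store is not
   lowered, so the loop is not vectorized.  */
void
copy_only (double _Complex *p, double _Complex *q)
{
  for (int i = 0; i != 100000; i++)
    p[i] = q[i];
}

/* Copy with arithmetic: complex lowering splits the access into
   real and imaginary parts, and per this comment the loop is then
   vectorized.  */
void
copy_plus_one (double _Complex *p, double _Complex *q)
{
  for (int i = 0; i != 100000; i++)
    p[i] = q[i] + 1.;
}
```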

An extra complication is _Complex arguments, where we do _not_ want to
lower the loads (at least not without further thought), e.g.

  foo (p[i]);

For

  foo (p[i] + 1.);

we get

  _6 = IMAGPART_EXPR <*_3>;
  _4 = REALPART_EXPR <*_3>;
  _5 = _4 + 1.0e+0;
  _7 = COMPLEX_EXPR <_5, _6>;
  bar (_7);

which is also similar to what we expand foo (p[i]) to.
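
Written back as C, the GIMPLE above corresponds to something like the second function below.  This is a hand-written sketch, not compiler output: foo is a hypothetical callee that merely records its argument, the wrapper names are invented, and __real__/__imag__ are GNU C extensions.

```c
/* Hypothetical callee: records the last argument it was given.  */
static double _Complex last_arg;

void
foo (double _Complex z)
{
  last_arg = z;
}

/* Source-level form of the call.  */
void
call_plus_one (double _Complex *p, int i)
{
  foo (p[i] + 1.);
}

/* Hand-lowered form mirroring the GIMPLE statement by statement.  */
void
call_plus_one_lowered (double _Complex *p, int i)
{
  double im = __imag__ p[i];    /* _6 = IMAGPART_EXPR <*_3>; */
  double re = __real__ p[i];    /* _4 = REALPART_EXPR <*_3>; */
  double re1 = re + 1.0;        /* _5 = _4 + 1.0e+0; */
  double _Complex z;
  __real__ z = re1;             /* _7 = COMPLEX_EXPR <_5, _6>; */
  __imag__ z = im;
  foo (z);
}
```

The point is that the argument has to be put back together as one complex value before the call, so splitting the load buys nothing by itself here.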

