This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug rtl-optimization/84101] [7/8 Regression] -O3 and -ftree-vectorize trying too hard for function returning trivial pair-of-uint64_t-structure


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
           Priority|P3                          |P2
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2018-01-30
          Component|c                           |rtl-optimization
                 CC|                            |segher at gcc dot gnu.org
     Ever confirmed|0                           |1
   Target Milestone|---                         |7.4

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
.optimized:

pair (int num)
{
  struct uint64_pair_t D.1958;
  int _1;
  long unsigned int _2;
  int _3;
  long unsigned int _4;
  vector(2) long unsigned int _9;

  <bb 2> [local count: 1073741825]:
  _1 = num_5(D) << 1;
  _2 = (long unsigned int) _1;
  _3 = num_5(D) >> 1;
  _4 = (long unsigned int) _3;
  _9 = {_2, _4};
  MEM[(struct uint64_pair *)&D.1958] = _9;
  return D.1958;
}
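
For readers without the original attachment, a source file that would plausibly produce a dump like the one above can be sketched as follows. This is a reconstruction from the GIMPLE, not the reporter's actual t.c: the field names `low` and `high` are hypothetical, and only the function name `pair`, the `int num` parameter, the two shifts and the struct tag `uint64_pair` are taken from the dump.

```c
#include <stdint.h>

/* Two 64-bit members: on x86_64 this 16-byte struct is returned in the
   rax:rdx register pair, which is what the TImode 'ax' reg refers to.
   Field names are hypothetical; the dump does not show them. */
typedef struct uint64_pair {
    uint64_t low;
    uint64_t high;
} uint64_pair_t;

uint64_pair_t pair(int num)
{
    uint64_pair_t p;

    /* Both shifts are done in int and then widened to 64 bits,
       matching _1/_2 and _3/_4 in the .optimized dump. */
    p.low = num << 1;
    p.high = num >> 1;
    return p;
}
```

Compiling this with `gcc -O3 -S` and comparing against `-O2` should show whether the two stores are merged into a vector store on a given GCC version; the exact output depends on the compiler release and target.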

There are (plenty?) of duplicates where the vectorizer makes mistakes with
respect to ABI details that are not exposed at vectorization time.  Note we
don't spill at expansion time either:

(insn 10 9 11 2 (set (reg:V2DI 95)
        (vec_concat:V2DI (reg:DI 97)
            (reg:DI 99))) "t.c":15 -1
     (nil))
(insn 11 10 12 2 (set (reg:TI 92 [ D.1958 ])
        (subreg:TI (reg:V2DI 95) 0)) "t.c":15 -1
     (nil))
(insn 12 11 16 2 (set (reg:TI 93 [ <retval> ])
        (reg:TI 92 [ D.1958 ])) "t.c":15 -1
     (nil))
(insn 16 12 17 2 (set (reg/i:TI 0 ax)
        (reg:TI 93 [ <retval> ])) "t.c":16 -1
     (nil))
(insn 17 16 0 2 (use (reg/i:TI 0 ax)) "t.c":16 -1
     (nil))

But it is at LRA time that the 'ax' TImode reg (register pair!) gets exposed.
From

(insn 10 9 16 2 (set (reg:V2DI 95)
        (vec_concat:V2DI (reg:DI 97)
            (reg:DI 99))) "t.c":15 3744 {vec_concatv2di}
     (expr_list:REG_DEAD (reg:DI 99)
        (expr_list:REG_DEAD (reg:DI 97)
            (nil))))
(insn 16 10 17 2 (set (reg/i:TI 0 ax)
        (subreg:TI (reg:V2DI 95) 0)) "t.c":16 84 {*movti_internal}
     (expr_list:REG_DEAD (reg:V2DI 95)
        (nil)))

we go to (after first spilling the DImode components):

(insn 22 9 24 2 (set (reg:DI 21 xmm0 [95])
        (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S8 A128])) "t.c":15 85 {*movdi_internal}
     (nil))
(insn 24 22 10 2 (set (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S8 A128])
        (reg:DI 5 di [99])) "t.c":15 85 {*movdi_internal}
     (nil))
(insn 10 24 23 2 (set (reg:V2DI 21 xmm0 [95])
        (vec_concat:V2DI (reg:DI 21 xmm0 [95])
            (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                    (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S8 A128]))) "t.c":15 3744 {vec_concatv2di}
     (nil))
(insn 23 10 16 2 (set (mem/c:V2DI (plus:DI (reg/f:DI 7 sp)
                (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S16 A128])
        (reg:V2DI 21 xmm0 [95])) "t.c":15 1255 {movv2di_internal}
     (nil))
(insn 16 23 17 2 (set (reg/i:TI 0 ax)
        (mem/c:TI (plus:DI (reg/f:DI 7 sp)
                (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S16 A128])) "t.c":16 84 {*movti_internal}
     (nil))


This is really hard to avoid in the vectorizer: the decl we return
isn't a RESULT_DECL but a regular VAR_DECL, so we have no idea it is
literally returned.

Note the RTL when not vectorizing isn't too different:

(insn 10 9 11 2 (set (reg:DI 97)
        (sign_extend:DI (reg:SI 96))) "t.c":13 -1
     (nil))
(insn 11 10 12 2 (set (subreg:DI (reg:TI 91 [ D.1958 ]) 8)
        (reg:DI 97)) "t.c":15 -1
     (nil))
(insn 12 11 16 2 (set (reg:TI 92 [ <retval> ])
        (reg:TI 91 [ D.1958 ])) "t.c":15 -1
     (nil))
(insn 16 12 17 2 (set (reg/i:TI 0 ax)
        (reg:TI 92 [ <retval> ])) "t.c":16 -1
     (nil))
(insn 17 16 0 2 (use (reg/i:TI 0 ax)) "t.c":16 -1
     (nil))

Here it is the subreg1 pass that exposes the register pair and lowers the
subreg:

(insn 10 9 11 2 (set (reg:DI 97)
        (sign_extend:DI (reg:SI 96))) "t.c":13 149 {*extendsidi2_rex64}
     (nil))
(insn 11 10 19 2 (set (reg:DI 100 [ D.1958+8 ])
        (reg:DI 97)) "t.c":15 85 {*movdi_internal}
     (nil))
(insn 19 11 20 2 (set (reg:DI 101 [ <retval> ])
        (reg:DI 99 [ D.1958 ])) "t.c":15 85 {*movdi_internal}
     (nil))
(insn 20 19 21 2 (set (reg:DI 102 [ <retval>+8 ])
        (reg:DI 100 [ D.1958+8 ])) "t.c":15 85 {*movdi_internal}
     (nil))
(insn 21 20 22 2 (set (reg:DI 0 ax)
        (reg:DI 101 [ <retval> ])) "t.c":16 85 {*movdi_internal}
     (nil))
(insn 22 21 17 2 (set (reg:DI 1 dx [+8 ])
        (reg:DI 102 [ <retval>+8 ])) "t.c":16 85 {*movdi_internal}
     (nil))
(insn 17 22 0 2 (use (reg/i:TI 0 ax)) "t.c":16 -1
     (nil))

I imagine it could be made to recognize the (subreg (vec_concat ..)) case
as well...  but would that be a hack?
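
To illustrate why such a lowering would be value-preserving, here is a small C sketch (not GCC internals) of the equivalence it would rely on: viewing a two-element 64-bit vector as one 128-bit integer gives the same value as placing the two scalars in the low and high halves directly. It assumes a little-endian target such as x86_64 and the GCC extensions `vector_size` and `unsigned __int128`; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* GCC vector extension: two 64-bit lanes, analogous to V2DImode. */
typedef uint64_t v2di __attribute__((vector_size(16)));

/* Analogue of (subreg:TI (vec_concat:V2DI a b) 0): build the vector,
   then reinterpret its 16 bytes as a single 128-bit integer. */
static unsigned __int128 concat_then_subreg(uint64_t a, uint64_t b)
{
    v2di v = { a, b };
    unsigned __int128 t;

    memcpy(&t, &v, sizeof t);
    return t;
}

/* Analogue of the lowered form: 'a' is the low word and 'b' the high
   word of the register pair, with no vector register involved.
   Assumes little-endian lane order. */
static unsigned __int128 two_halves(uint64_t a, uint64_t b)
{
    return ((unsigned __int128)b << 64) | a;
}
```

On a little-endian target both functions return the same value for any inputs, which is the property a (subreg (vec_concat ..)) lowering would depend on.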
