[Bug rtl-optimization/84101] [7/8/9 Regression] -O3 and -ftree-vectorize trying too hard for function returning trivial pair-of-uint64_t-structure

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Dec 18 17:16:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I don't think the right fix is to stop SLP vectorizing this; rather, we
should be able to undo the vectorization during RTL optimizations.
The user could have written it that way already, e.g. as:
#include <string.h>

struct S { long long a, b; };
typedef long long v2di __attribute__((vector_size (16)));

struct S
foo (int x)
{
  struct S p;
  v2di q = { x << 1, x >> 1 };
  memcpy (&p, &q, sizeof (p));
  return p;
}

or similar.
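
For comparison, here is a minimal sketch of the scalar form that the RTL
optimizers would ideally recover from the vector form.  The names foo_vec
and foo_scalar are hypothetical and used only for illustration; foo_vec
repeats the testcase above, and foo_scalar is what I assume a user would
write without vectors.  This only shows that the two forms compute the same
struct, not how any pass would perform the transformation.

```c
#include <string.h>

struct S { long long a, b; };
typedef long long v2di __attribute__((vector_size (16)));

/* Vector form, same body as foo in the testcase above
   (hypothetical name foo_vec).  */
struct S
foo_vec (int x)
{
  struct S p;
  v2di q = { x << 1, x >> 1 };
  memcpy (&p, &q, sizeof (p));
  return p;
}

/* Scalar form (hypothetical name foo_scalar): each half of the
   returned struct is computed and stored separately, so no vector
   register is needed.  */
struct S
foo_scalar (int x)
{
  struct S p;
  p.a = x << 1;
  p.b = x >> 1;
  return p;
}
```

Both functions return the same pair of values for a given x; the difference
is only in whether the two halves pass through a V2DI value on the way to the
return registers.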

We have:
(insn 10 9 11 2 (set (reg:V2DI 92)
        (vec_concat:V2DI (reg:DI 94)
            (reg:DI 96))) "pr84101-3.c":8:8 4110 {vec_concatv2di}
     (nil))
(insn 11 10 12 2 (set (reg/v:TI 89 [ p ])
        (subreg:TI (reg:V2DI 92) 0)) "pr84101-3.c":9:3 65 {*movti_internal}
     (nil))
(insn 12 11 13 2 (set (reg:TI 88 [ D.1915 ])
        (reg/v:TI 89 [ p ])) "pr84101-3.c":10:10 65 {*movti_internal}
     (nil))
(insn 13 12 14 2 (clobber (reg/v:TI 89 [ p ])) -1
     (nil))
(insn 14 13 18 2 (set (reg:TI 90 [ <retval> ])
        (reg:TI 88 [ D.1915 ])) "pr84101-3.c":10:10 65 {*movti_internal}
     (nil))
(insn 18 14 19 2 (set (reg/i:TI 0 ax)
        (reg:TI 90 [ <retval> ])) "pr84101-3.c":11:1 65 {*movti_internal}
     (nil))
(insn 19 18 0 2 (use (reg/i:TI 0 ax)) "pr84101-3.c":11:1 -1
     (nil))
and because there are no half-width (DImode) subregs of the TImode registers,
the subreg1 pass does nothing.  The combiner already sees
(insn 10 9 18 2 (set (reg:V2DI 92)
        (vec_concat:V2DI (reg:DI 94)
            (reg:DI 96))) "pr84101-3.c":8:8 4110 {vec_concatv2di}
     (expr_list:REG_DEAD (reg:DI 96)
        (expr_list:REG_DEAD (reg:DI 94)
            (nil))))
(insn 18 10 19 2 (set (reg/i:TI 0 ax)
        (subreg:TI (reg:V2DI 92) 0)) "pr84101-3.c":11:1 65 {*movti_internal}
     (expr_list:REG_DEAD (reg:V2DI 92)
        (nil)))
but (because of the hard register destination?) it decides not to combine
anything into insn 18.  The register allocator isn't able to cope with this
because V2DImode is not HARD_REGNO_MODE_OK in GPRs (but TImode is).
So, where do you think we should deal with it?

