This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/84101] [7/8 Regression] -O3 and -ftree-vectorize trying too hard for function returning trivial pair-of-uint64_t-structure
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 30 Jan 2018 08:50:14 +0000
- Auto-submitted: auto-generated
- References: <bug-84101-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |x86_64-*-*, i?86-*-*
Priority|P3 |P2
Status|UNCONFIRMED |NEW
Keywords| |missed-optimization
Last reconfirmed| |2018-01-30
Component|c |rtl-optimization
CC| |segher at gcc dot gnu.org
Ever confirmed|0 |1
Target Milestone|--- |7.4
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
.optimized:
pair (int num)
{
  struct uint64_pair_t D.1958;
  int _1;
  long unsigned int _2;
  int _3;
  long unsigned int _4;
  vector(2) long unsigned int _9;

  <bb 2> [local count: 1073741825]:
  _1 = num_5(D) << 1;
  _2 = (long unsigned int) _1;
  _3 = num_5(D) >> 1;
  _4 = (long unsigned int) _3;
  _9 = {_2, _4};
  MEM[(struct uint64_pair *)&D.1958] = _9;
  return D.1958;
}
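The testcase itself is not quoted in this comment. A minimal C sketch consistent with the dump above might look like the following; the field names `low` and `high` and the exact struct/typedef spelling are assumptions, and only the function name, its `int num` parameter and the two shifts are taken from the dump.

```c
#include <stdint.h>

/* Hypothetical reconstruction: struct layout and field names are assumed. */
struct uint64_pair {
    uint64_t low;
    uint64_t high;
};
typedef struct uint64_pair uint64_pair_t;

/* Two adjacent 64-bit stores into a local that is then returned by value.
   Compiled with -O3, the two stores are what get combined into the single
   vector(2) store shown in the dump.  */
uint64_pair_t pair(int num)
{
    uint64_pair_t p;
    p.low = num << 1;   /* _1 = num << 1; _2 = (long unsigned int) _1 */
    p.high = num >> 1;  /* _3 = num >> 1; _4 = (long unsigned int) _3 */
    return p;
}
```

Comparing the output of `gcc -O3 -S` with that of `gcc -O2 -S` on a file like this should show whether the vector store is being formed.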
There are (plenty?) of duplicates where the vectorizer makes mistakes with
respect to ABI details that are not exposed at vectorization time. Note that
we don't spill at expansion time either:
(insn 10 9 11 2 (set (reg:V2DI 95)
(vec_concat:V2DI (reg:DI 97)
(reg:DI 99))) "t.c":15 -1
(nil))
(insn 11 10 12 2 (set (reg:TI 92 [ D.1958 ])
(subreg:TI (reg:V2DI 95) 0)) "t.c":15 -1
(nil))
(insn 12 11 16 2 (set (reg:TI 93 [ <retval> ])
(reg:TI 92 [ D.1958 ])) "t.c":15 -1
(nil))
(insn 16 12 17 2 (set (reg/i:TI 0 ax)
(reg:TI 93 [ <retval> ])) "t.c":16 -1
(nil))
(insn 17 16 0 2 (use (reg/i:TI 0 ax)) "t.c":16 -1
(nil))
It is only at LRA time that the 'ax' TImode reg (a register pair!) gets
exposed. From
(insn 10 9 16 2 (set (reg:V2DI 95)
(vec_concat:V2DI (reg:DI 97)
(reg:DI 99))) "t.c":15 3744 {vec_concatv2di}
(expr_list:REG_DEAD (reg:DI 99)
(expr_list:REG_DEAD (reg:DI 97)
(nil))))
(insn 16 10 17 2 (set (reg/i:TI 0 ax)
(subreg:TI (reg:V2DI 95) 0)) "t.c":16 84 {*movti_internal}
(expr_list:REG_DEAD (reg:V2DI 95)
(nil)))
we go to (after first spilling the DImode components):
(insn 22 9 24 2 (set (reg:DI 21 xmm0 [95])
(mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S8 A128]))
"t.c":15 85 {*movdi_internal}
(nil))
(insn 24 22 10 2 (set (mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S8 A128])
(reg:DI 5 di [99])) "t.c":15 85 {*movdi_internal}
(nil))
(insn 10 24 23 2 (set (reg:V2DI 21 xmm0 [95])
(vec_concat:V2DI (reg:DI 21 xmm0 [95])
(mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S8 A128]))) "t.c":15 3744 {vec_concatv2di}
(nil))
(insn 23 10 16 2 (set (mem/c:V2DI (plus:DI (reg/f:DI 7 sp)
(const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S16 A128])
(reg:V2DI 21 xmm0 [95])) "t.c":15 1255 {movv2di_internal}
(nil))
(insn 16 23 17 2 (set (reg/i:TI 0 ax)
(mem/c:TI (plus:DI (reg/f:DI 7 sp)
(const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S16 A128]))
"t.c":16 84 {*movti_internal}
(nil))
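As a rough source-level model of the sequence above (not what GCC emits, only an illustration), the value is assembled in a vector register, stored to a 16-byte stack slot, and then reloaded as the two 64-bit halves of the return value. The sketch below uses the GCC/Clang `vector_size` extension; the names `v2di`, `u64_pair`, `slot` and `pair_via_vector` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang generic vector extension: two 64-bit lanes, 16 bytes total. */
typedef uint64_t v2di __attribute__((vector_size(16)));

/* Hypothetical stand-in for the returned pair-of-uint64_t structure. */
struct u64_pair {
    uint64_t low;
    uint64_t high;
};

struct u64_pair pair_via_vector(int num)
{
    uint64_t lo = (uint64_t)(num << 1);
    uint64_t hi = (uint64_t)(num >> 1);

    /* Models the vec_concat: both halves are gathered into one vector. */
    v2di v = { lo, hi };

    /* Models insn 23: the whole vector is stored to a 16-byte slot. */
    unsigned char slot[16];
    memcpy(slot, &v, sizeof v);

    /* Models insn 16: the slot is read back as one 128-bit value that
       lands in the ax/dx register pair.  */
    struct u64_pair r;
    memcpy(&r, slot, sizeof r);
    return r;
}
```

An optimizing compiler is free to simplify this model, so it only illustrates the data flow; the round trip through memory in the dump comes from register allocation, not from the source.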
This is really hard to avoid in the vectorizer, given that the decl we return
isn't a RESULT_DECL but a regular VAR_DECL, so we have no idea it is
literally returned.
Note the RTL when not vectorizing isn't too different:
(insn 10 9 11 2 (set (reg:DI 97)
(sign_extend:DI (reg:SI 96))) "t.c":13 -1
(nil))
(insn 11 10 12 2 (set (subreg:DI (reg:TI 91 [ D.1958 ]) 8)
(reg:DI 97)) "t.c":15 -1
(nil))
(insn 12 11 16 2 (set (reg:TI 92 [ <retval> ])
(reg:TI 91 [ D.1958 ])) "t.c":15 -1
(nil))
(insn 16 12 17 2 (set (reg/i:TI 0 ax)
(reg:TI 92 [ <retval> ])) "t.c":16 -1
(nil))
(insn 17 16 0 2 (use (reg/i:TI 0 ax)) "t.c":16 -1
(nil))
Here it is the subreg1 pass that exposes the register pair and lowers the
subreg:
(insn 10 9 11 2 (set (reg:DI 97)
(sign_extend:DI (reg:SI 96))) "t.c":13 149 {*extendsidi2_rex64}
(nil))
(insn 11 10 19 2 (set (reg:DI 100 [ D.1958+8 ])
(reg:DI 97)) "t.c":15 85 {*movdi_internal}
(nil))
(insn 19 11 20 2 (set (reg:DI 101 [ <retval> ])
(reg:DI 99 [ D.1958 ])) "t.c":15 85 {*movdi_internal}
(nil))
(insn 20 19 21 2 (set (reg:DI 102 [ <retval>+8 ])
(reg:DI 100 [ D.1958+8 ])) "t.c":15 85 {*movdi_internal}
(nil))
(insn 21 20 22 2 (set (reg:DI 0 ax)
(reg:DI 101 [ <retval> ])) "t.c":16 85 {*movdi_internal}
(nil))
(insn 22 21 17 2 (set (reg:DI 1 dx [+8 ])
(reg:DI 102 [ <retval>+8 ])) "t.c":16 85 {*movdi_internal}
(nil))
(insn 17 22 0 2 (use (reg/i:TI 0 ax)) "t.c":16 -1
(nil))
I imagine it could be made to recognize the (subreg (vec_concat ..)) case
as well... but would that be a hack?