This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)
- From: Richard Biener <rguenther at suse dot de>
- To: Richard Sandiford <rdsandiford at googlemail dot com>
- Cc: gcc-patches at gcc dot gnu dot org, segher at kernel dot crashing dot org
- Date: Mon, 4 Jul 2016 12:47:44 +0200 (CEST)
- Subject: Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)
- Authentication-results: sourceware.org; auth=none
- References: <alpine.LSU.2.11.1606101111140.13824@t29.fhfr.qr> <877fdqti2o.fsf@googlemail.com> <alpine.LSU.2.11.1606271010100.29772@t29.fhfr.qr> <87y45ir5vo.fsf@googlemail.com>
On Sun, 3 Jul 2016, Richard Sandiford wrote:
> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 15 Jun 2016, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > With the proposed cost change for vector construction we will end up
> >> > vectorizing the testcase in PR68961 again (on x86_64 and likely
> >> > on ppc64le as well after that target gets adjustments). Currently
> >> > we can't optimize that away again noticing the direct overlap of
> >> > argument and return registers. The obstackle is
> >> >
> >> > (insn 7 4 8 2 (set (reg:V2DF 93)
> >> > (vec_concat:V2DF (reg/v:DF 91 [ a ])
> >> > (reg/v:DF 92 [ aa ])))
> >> > ...
> >> > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> >> > (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> >> > (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> >> > (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> >> >
> >> > which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> >> >
> >> > First of all simplify_subreg doesn't handle the subregs of a vec_concat
> >> > (easy fix below).
> >> >
> >> > Then combine doesn't like to simplify the multi-use (it tries some
> >> > parallel it seems). So I went to forwprop which eventually manages
> >> > to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> >> > because it is not a constant. Thus I allow arbitrary simplification
> >> > results for SUBREGs of [VEC_]CONCAT operations. There doesn't seem
> >> > to be a magic flag to tell it to restrict to the case where all
> >> > uses can be simplified or so, nor to restrict simplifications to a REG.
> >> > But I don't see any undesirable simplifications of (subreg
> >> > ([vec_]concat)).
> >>
> >> Adding that as a special case to propgate_rtx feels like a hack though :-)
> >> I think:
> >>
> >> > Index: gcc/fwprop.c
> >> > ===================================================================
> >> > *** gcc/fwprop.c (revision 237286)
> >> > --- gcc/fwprop.c (working copy)
> >> > *************** propagate_rtx (rtx x, machine_mode mode,
> >> > *** 664,670 ****
> >> > || (GET_CODE (new_rtx) == SUBREG
> >> > && REG_P (SUBREG_REG (new_rtx))
> >> > && (GET_MODE_SIZE (mode)
> >> > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))))))
> >> > flags |= PR_CAN_APPEAR;
> >> > if (!varying_mem_p (new_rtx))
> >> > flags |= PR_HANDLE_MEM;
> >> > --- 664,673 ----
> >> > || (GET_CODE (new_rtx) == SUBREG
> >> > && REG_P (SUBREG_REG (new_rtx))
> >> > && (GET_MODE_SIZE (mode)
> >> > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)))))
> >> > ! || ((GET_CODE (new_rtx) == VEC_CONCAT
> >> > ! || GET_CODE (new_rtx) == CONCAT)
> >> > ! && GET_CODE (x) == SUBREG))
> >> > flags |= PR_CAN_APPEAR;
> >> > if (!varying_mem_p (new_rtx))
> >> > flags |= PR_HANDLE_MEM;
> >>
> >> ...this if statement should fundamentally only test new_rtx.
> >> E.g. we'd want the same thing for any SUBREG inside X.
> >>
> >> How about changing:
> >>
> >> /* The replacement we made so far is valid, if all of the recursive
> >> replacements were valid, or we could simplify everything to
> >> a constant. */
> >> return valid_ops || can_appear || CONSTANT_P (tem);
> >>
> >> so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid?
> >> I suppose that's likely to increase register pressure though,
> >> if only some uses of new_rtx simplify. (There again, requiring all
> >> uses to be replacable could make hot code the hostage of cold code.)
> >
> > Yes, my fear was about register presure increase for the case not all
> > uses can be replaced (fwprop doesn't seem to have code to verify or
> > require that).
> >
> > I can avoid checking for GET_CODE (x) == SUBREG and add a PR_REG
> > case to restrict REG_P (tem) && !HARD_REGISTER_P (tem) to the
> > new_rtx == [VEC_]CONCAT case for example.
>
> I don't think that helps though. There might be other uses of a
> VEC_CONCAT that aren't SUBREGs, in which case we'd have the same
> problem of keeping both values live at once.
>
> How about restricting the REG_P (tem) && !HARD_REGISTER_P (tem)
> to cases where new_rtx has more words than tem?
So would you really make a simple mode-size check here? I wonder
which cases are there other than the subreg of [vec_concat] that
would end up with this case. That is,
if (REG_P (tem) && !HARD_REGISTER_P (tem)
&& GET_MODE (tem) == GET_MODE_INNER (GET_MODE (new_rtx))
&& (VECTOR_MODE_P (GET_MODE (new_rtx))
|| COMPLEX_MODE_P (GET_MODE (new_rtx))))
return true;
works as would constraining new_rtx to [VEC_]CONCAT instead of
vector/complex modes. I'm worried about relaxing it further
as partial int modes also have a GET_MODE_INNER - relaxing to
&& GET_MODE_INNER (GET_MODE (new_rtx)) != GET_MODE (new_rtx)
I mean. If we restrict it to [VEC_]CONCAT we can even allow
any REG_P && !HARD_REGISTER_P as I can't see any outer (non-noop)
operation that will simplify into a REG_P besides a subreg.
But the above form would capture the intent to allow simplifying
operations on vector / complex to a scalar component.
Which means I can test the following if that looks better
than what I had originally. There's Seghers 2->2 combine
patch but that has wider impact as well.
Thanks,
Richard.
2016-07-04 Richard Biener <rguenther@suse.de>
PR rtl-optimization/68961
* fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and CONCAT
to simplify to a non-constant.
* gcc.target/i386/pr68961.c: New testcase.
Index: gcc/testsuite/gcc.target/i386/pr68961.c
===================================================================
*** gcc/testsuite/gcc.target/i386/pr68961.c (revision 0)
--- gcc/testsuite/gcc.target/i386/pr68961.c (working copy)
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-do compile { target lp64 } } */
+ /* { dg-options "-O3 -fno-vect-cost-model -fdump-tree-slp2-details" } */
+
+ struct x { double d[2]; };
+
+ struct x
+ pack (double a, double aa)
+ {
+ struct x u;
+ u.d[0] = a;
+ u.d[1] = aa;
+ return u;
+ }
+
+ /* The function should be optimized to just return as arguments and
+ result exactly overlap even when previously vectorized. */
+
+ /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" } } */
+ /* { dg-final { scan-assembler-not "mov" } } */
Index: gcc/fwprop.c
===================================================================
*** gcc/fwprop.c (revision 237955)
--- gcc/fwprop.c (working copy)
*************** propagate_rtx_1 (rtx *px, rtx old_rtx, r
*** 619,624 ****
--- 619,633 ----
*px = tem;
+ /* Allow replacements that simplify operations on a vector or complex
+ value to a component. The most prominent case is
+ (subreg ([vec_]concat ...)). */
+ if (REG_P (tem) && !HARD_REGISTER_P (tem)
+ && GET_MODE (tem) == GET_MODE_INNER (GET_MODE (new_rtx))
+ && (VECTOR_MODE_P (GET_MODE (new_rtx))
+ || COMPLEX_MODE_P (GET_MODE (new_rtx))))
+ return true;
+
/* The replacement we made so far is valid, if all of the recursive
replacements were valid, or we could simplify everything to
a constant. */