This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)
- From: Richard Sandiford <rdsandiford at googlemail dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: gcc-patches at gcc dot gnu dot org, segher at kernel dot crashing dot org
- Date: Wed, 15 Jun 2016 19:36:47 +0100
- Subject: Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)
- Authentication-results: sourceware.org; auth=none
- References: <alpine dot LSU dot 2 dot 11 dot 1606101111140 dot 13824 at t29 dot fhfr dot qr>
Richard Biener <rguenther@suse.de> writes:
> With the proposed cost change for vector construction we will end up
> vectorizing the testcase in PR68961 again (on x86_64 and likely
> on ppc64le as well after that target gets adjustments). Currently
> we can't optimize that away again noticing the direct overlap of
> argument and return registers. The obstackle is
>
> (insn 7 4 8 2 (set (reg:V2DF 93)
> (vec_concat:V2DF (reg/v:DF 91 [ a ])
> (reg/v:DF 92 [ aa ])))
> ...
> (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
>
> which we eventually optimize to DFmode subregs of (reg:V2DF 93).
>
> First of all simplify_subreg doesn't handle the subregs of a vec_concat
> (easy fix below).
>
> Then combine doesn't like to simplify the multi-use (it tries some
> parallel it seems). So I went to forwprop which eventually manages
> to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> because it is not a constant. Thus I allow arbitrary simplification
> results for SUBREGs of [VEC_]CONCAT operations. There doesn't seem
> to be a magic flag to tell it to restrict to the case where all
> uses can be simplified or so, nor to restrict simplifications to a REG.
> But I don't see any undesirable simplifications of (subreg
> ([vec_]concat)).
Adding that as a special case to propgate_rtx feels like a hack though :-)
I think:
> Index: gcc/fwprop.c
> ===================================================================
> *** gcc/fwprop.c (revision 237286)
> --- gcc/fwprop.c (working copy)
> *************** propagate_rtx (rtx x, machine_mode mode,
> *** 664,670 ****
> || (GET_CODE (new_rtx) == SUBREG
> && REG_P (SUBREG_REG (new_rtx))
> && (GET_MODE_SIZE (mode)
> ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))))))
> flags |= PR_CAN_APPEAR;
> if (!varying_mem_p (new_rtx))
> flags |= PR_HANDLE_MEM;
> --- 664,673 ----
> || (GET_CODE (new_rtx) == SUBREG
> && REG_P (SUBREG_REG (new_rtx))
> && (GET_MODE_SIZE (mode)
> ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)))))
> ! || ((GET_CODE (new_rtx) == VEC_CONCAT
> ! || GET_CODE (new_rtx) == CONCAT)
> ! && GET_CODE (x) == SUBREG))
> flags |= PR_CAN_APPEAR;
> if (!varying_mem_p (new_rtx))
> flags |= PR_HANDLE_MEM;
...this if statement should fundamentally only test new_rtx.
E.g. we'd want the same thing for any SUBREG inside X.
How about changing:
/* The replacement we made so far is valid, if all of the recursive
replacements were valid, or we could simplify everything to
a constant. */
return valid_ops || can_appear || CONSTANT_P (tem);
so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid?
I suppose that's likely to increase register pressure though,
if only some uses of new_rtx simplify. (There again, requiring all
uses to be replacable could make hot code the hostage of cold code.)
Thanks,
Richard