[PATCH][RFC] Come up with VEC_COND_OP_EXPRs.

Tue Sep 24 12:18:00 GMT 2019

On Tue, Sep 24, 2019 at 1:57 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Sep 24, 2019 at 1:11 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Martin Liška <mliska@suse.cz> writes:
> >> > Hi.
> >> >
> >> > The patch introduces couple of new TREE_CODEs that will help us to have
> >> > a proper GIMPLE representation of current VECT_COND_EXPR. Right now,
> >> > the first argument is typically a GENERIC tcc_expression tree with 2 operands
> >> > that are visited at various places in GIMPLE code. That said, based on the discussion
> >> > with Richi, I'm suggesting to come up with e.g.
> >> > VECT_COND_LT_EXPR<COND_LHS, COND_RHS, IF_CLAUSE, ELSE_CLAUSE>. Such a change logically
> >> > introduces new GIMPLE_QUATERNARY_RHS gassignments. For now, the VEC_COND_EXPR remains
> >> > and is only valid in GENERIC and gimplifier will take care of the corresponding transition.
> >> >
> >> > The patch is a prototype and missing bits are:
> >> > - folding support addition for GIMPLE_QUATERNARY_RHS is missing
> >> > - fancy tcc_comparison expressions like LTGT_EXPR, UNORDERED_EXPR, ORDERED_EXPR,
> >> >   UNLT_EXPR and others are not supported right now
> >> > - comments are missing for various functions added
> >> >
> >> > Apart from that I was able to bootstrap and run tests with a quite small fallout.
> >> > Thoughts?
> >> > Martin
> >>
> >> I think this is going in the wrong direction.  There are some targets
> >> that can only handle VEC_COND_EXPRs well if we know the associated
> >> condition, and others where a compare-and-VEC_COND_EXPR will always be
> >> two operations.  In that situation, it seems like the native gimple
> >> representation should be the simpler representation rather than the
> >> more complex one.  That way the comparisons can be optimised
> >> independently of any VEC_COND_EXPRs on targets that benefit from that.
> >>
> >> So IMO it would be better to use three-operand VEC_COND_EXPRs with
> >> no embedded conditions as the preferred gimple representation and
> >> have internal functions for the fused operations that some targets
> >> prefer.  This means that using fused operations is "just" an instruction
> >> selection decision rather than hard-coded throughout gimple.  (And that
> >> fits in well with the idea of doing more instruction selection in gimple.)
> >
> > So I've been doing that before, but more generally also for COND_EXPR.
> > We cannot rely on TER and the existing RTL expansion "magic" for the
> > instruction selection issue you mention because TER isn't reliable.  With
> > IFNs for optabs we could do actual [vector] condition instruction selection
> > before RTL expansion, ignoring "single-use" issues - is that what you are
> > hinting at?
>
> Yeah.  It'd be similar to how most FMA selection happens after
> vectorisation but before expand.
>
> > How should the vectorizer deal with this?  Should it directly
> > use the optab IFNs then when facing "split" COND_EXPRs?  IIRC the
> > most fallout of a simple patch (adjusting is_gimple_condexpr) is in the
> > vectorizer.
>
> I guess that would be down to how well the vector costings work if we
> just stick to VEC_COND_EXPR and cost the comparison separately.  Using
> optabs directly in the vectoriser definitely sounds OK if that ends up
> being necessary for good code.  But if (like you say) the COND_EXPR is
> also split apart, we'd be costing the scalar comparison and selection
> separately as well.
>
> > Note I'm specifically looking for a solution that applies to both COND_EXPR
> > and VEC_COND_EXPR since both suffer from the same issues.
>
> Yeah, think the same approach would work for COND_EXPR if it's needed.
> (And I think the same trade-off applies there too.  Some targets will
> always need a separate comparison to implement a four-operand COND_EXPR.)
>
> > There was also recent work in putting back possibly trapping comparisons
> > into [VEC_]COND_EXPR because it doesn't interfere with EH and allows
> > better code.
>
> OK, that's a good counter-reason :-)  But it seems quite special-purpose.
> I assume this works even for targets that do split the VEC_COND_EXPR
> because the result is undefined on entry to the EH receiver if the
> operation didn't complete.  But that should be true of any non-trapping
> work done after the comparison, with the same proviso.
>
> So this still seems like an instruction-selection issue.  We're just
> saying that it's OK to combine a trapping comparison and a VEC_COND_EXPR
> from the non-trapping path.  The same would be true for any other
> instruction selection that combines trapping and non-trapping
> operations, provided that the speculated parts can never trap.

Sure, but that case would necessarily be combining the compare and the
select to the compare place which is "backwards" (and would speculate
the select).  Certainly something we don't do anywhere.  This case btw
made me consider going the four-operand way (I've pondered with all available
ops multiple times...).

> > Also you SVE people had VN issues with cond-exprs and
> > VN runs into the exact same issue (but would handle separate comparisons
> > better - with the caveat of breaking TER).
>
> The VN thing turned out to be a red herring there, sorry.  I think
> I was remembering the state before ifcvt did its own value numbering.
> The remaining issue for the vectoriser is that we don't avoid duplicate
> cast conversions in vect_recog_mask_conversion_pattern, but that's
> mostly a cost thing.  The redundancies do get removed by later passes.

Well, I checked and value-numbering doesn't really handle non-trivial
"equalities" of the condition operand (if one of the operands of the
condition need to be valueized to be detected equal).

So to go forward and to make sure we don't regress the appropriate
way would probably to tackle the expansion part first.  I guess we'll
not notice for scalar COND_EXPRs (because those don't happen
very often) so we could "lower" VEC_COND_EXPRs to the desired
form (and key IL verificataion on PROP_gimple_lvec), which then
means late FRE/DOM have the chance to break things by doing
CSE.  At the same time we'd remove the forwprop pieces that put
the condition back in.  Then we can see to implement the
instruction selection somehow somewhere... (does it need to happen
at -O0?  I think that might be desirable - looking at vectorizer
intrinsic code might help to decide).

Does that sound sensible?  I've searched my patch archieves and
could share several incomplete attempts on tackling this, dating
back to as far as 2010...)

Richard.

> Thanks,
> Richard