Use conditional internal functions in if-conversion

Fri May 25 13:16:00 GMT 2018

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, May 25, 2018 at 1:49 PM Richard Sandiford <
> richard.sandiford@linaro.org> wrote:
>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Fri, May 25, 2018 at 12:12 PM Richard Sandiford <
>> > richard.sandiford@linaro.org> wrote:
>> >
>> >> Richard Biener <richard.guenther@gmail.com> writes:
>> >> > On Wed, May 16, 2018 at 12:17 PM Richard Sandiford <
>> >> > richard.sandiford@linaro.org> wrote:
>> >> >> This patch uses IFN_COND_* to vectorise conditionally-executed,
>> >> >> potentially-trapping arithmetic, such as most floating-point
>> >> >> ops with -ftrapping-math.  E.g.:
>> >> >
>> >> >>      if (cond) { ... x = a + b; ... }
>> >> >
>> >> >> becomes:
>> >> >
>> >> >>      ...
>> >> >>      x = IFN_COND_ADD (cond, a, b);
>> >> >>      ...
>> >> >
>> >> >> When this transformation is done on its own, the value of x for
>> >> >> !cond isn't important.
>> >> >
>> >> >> However, the patch also looks for the equivalent of:
>> >> >
>> >> >>      y = cond ? x : a;
>> >> >
>> >> > As generated by predicate_all_scalar_phis, right?  So this tries to
>> > capture
>> >> >
>> >> >   if (cond)
>> >> >     y = a / b;
>> >> >   else
>> >> >     y = a;
>> >> >
>> >> > But I think it would be more useful to represent the else value
>> > explicitely
>> >> > as extra operand given a constant like 0 is sth that will happen in
>> > practice
>> >> > and is also sth that is I think supported as a result by some
> targets.
>> >> >
>> >> > So to support the flow of current if-conversion you'd convert to
>> >> >
>> >> >   IFN_COND_ADD (cond, a, b, a); // or any other reasonable "default"
>> > value
>> >> > (target controlled?)
>> >> >
>> >> > and you merge that with a single-use COND_EXPR by replacing that
>> > operand.
>> >> >
>> >> > If a target then cannot support the default value it needs to emit a
>> > blend
>> >> > afterwards.
>> >
>> >> Yeah, that's definitely better.  I was a bit worried about the extra
>> >> generality, but it turns out that (on SVE) having the else argument and
>> >> getting the .md patterns to blend the input can avoid an unnecessary
>> >> move that would be difficult to remove otherwise.
>> >
>> > Btw, I've checked AVX512 and their masked operations either blend
>> > into the destination or zero the masked elements.  Possibly useful
>> > for plus reductions.
>
>> OK.
>
>> >> I've committed the patches to add the else argument and the target hook
>> >> to select the preferred else value (thanks for the reviews).  The patch
>> >> below now uses that preferred else value if there's no ?: or if the
>> >> else value of the ?: isn't available at the point of the conditional
>> >> operation.
>> >
>> >> This version also copes with:
>> >
>> >>      if (cond) { ... x = a + b; ... }
>> >>      y = !cond ? c : x;
>> >
>> >> by adding a new utility function inverse_conditions_p.
>> >
>> > interpret_as_condition_p is a bit of a layering violation in
> fold-const.c
>> > since it looks at SSA_NAME_DEF_STMT.  In gimple.c we have a related
>> > thing, mainly used by forwprop - canonicalize_cond_expr_cond which is
>> > used to interpret a GENERIC folded thing as condition.
>> >
>> > Do you really need the def stmt lookup?
>
>> Yeah, since these internal functions always take a gimple value
>> as input, not a comparison code.  But I can put the code elsewhere.
>
>> > Note that one item on my TODO list is to remove [VEC_]COND_EXPR from
>> > gimple in favor of using a quaternary assign stmt with tcc_comparison
> code
>> > so we'd have
>> >
>> >   op0 = op1 < op2 ? op3 : op4;
>> >
>> > that means currently valid op0 = op1 < op2; would become
>> > a more explicit op0 = op1 < op2 ? true : false;
>> >
>> > That means you'd for example may need to handle _1 != 0 being
>> > feeded by _1 = _2 < _3 ? true : false;  -- or simply rely on those
>> > to be combined.
>> >
>> > Sooo - maybe you feel like implementing the above? ;)
>
>> So is just requiring a gimple value in COND_EXPR and VEC_COND_EXPR
>> a non-starter?  That seems neater IMO.  (And probably replacing
>> them with a single code.)
>
> That's the alternative.  OTOH with the above we have a single canonical
> way to compute the value of a comparison rather than two as we have now:
>   _1 = _2 < _3;
> vs.
>   _1 = _2 < _3 ? true : false;
>
> so I somewhat settled on that "fix" to the GENERIC expression in
> our [VEC_]COND_EXPRs.

But the second feels as redundant as x + 0 to me :-)  Booleans should
be 0 or 1 whereever they come from, whether it's directly from a
comparison, from a BIT_FIELD_REF, from a logical operation, etc.
So if we don*t fold the second to the first at the moment, that seems
like a missed opportunity.

> Also targets seem to like "repeated" conditions better.

What kinds of situation do you mean?

>> I think part of the problem with the 4-way stmt is that it becomes
>> harder to know what canonical form to use.  E.g. should it be
>> canonicalised so that op4 is a constant where possible?
>> Or should it be canonicalised based on the condition?
>
> We have the same issue with COND_EXPR right now...
>
>>   I've seen
>> this lead to cases in which we end up inserting opposite comparisons
>> with the current embedded VEC_COND_EXPR stuff, whereas splitting
>> them out would make it more obvious that the comparison and the
>> select are logically two separate things.
>
> True, but you'd still have the choice of inverting the condition and thus
> swapping the operands.
>
>> It would also be a bit inconsistent with IFN_COND_*, since they'd
>> still take the result of a separate comparison, and also the older
>> IFN_MASK* functions.
>
> True...  it shows that IFNs while "nice" are not really a replacement
> for "proper" predicated stmts ;)

But IFN_*s can in general be used to predicate an entire block.
I don't think we want to repeat the comparison in the block predicate
for each individual statement.

If we're moving to doing more instruction selection in gimple
(IMO a good thing), then I don't think we'd want to bake-in a
requirement that every conditional op has to do a comparison.
E.g. if the same condition is used in two selections:

   _5 = _1 < _2 ? _3 : _4;
   _8 = _1 < _2 ? _6 : _7;

then as it stands it would be hard to remove the redundant
comparison.  I suppose we could have:

   _9 = _1 < _2 ? true : false;
   _5 = _9 != 0 ? _3 : _4;
   _8 = _9 != 0 ? _6 : _7;

and assume that targets can handle != 0 more efficiently than
anything else.  But that makes comparisons with zero special,
whereas using a 4-op statement suggests that it shouldn't be.
It also adds another variant to the canonicalisation rules,
since there's a choice over all of:

  - the way _9 is written (true : false vs. false : true)
  - whether _9 should be inverted and the inversion propagated
    to the users (which like you say is a problem whatever we do)
  - whether != 0 should be preferred over == 0.

>> >> >> in which the "then" value is the result of the
> conditionally-executed
>> >> >> operation and the "else" value is the first operand of that
> operation.
>> >> >> This "else" value is the one guaranteed by IFN_COND_* and so we can
>> >> >> replace y with x.
>> >> >
>> >> >> The patch also adds new conditional functions for multiplication
>> >> >> and division, which previously weren't needed.  This enables an
>> >> >> extra fully-masked reduction (of dubious value) in
>> > gcc.dg/vect/pr53773.c.
>> >> >
>> >> >> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
>> >> >> and x86_64-linux-gnu.  OK to install?
>> >> >
>> >> > It does look like a clever way of expressing conditional execution.
> It
>> >> > seems
>> >> > "scaling" this to more operations is a bit awkward and certainly it
>> > would be
>> >> > nice to see for example the division case also prototyped for AVX512
> to
>> > see
>> >> > if it works for anything besides ARM.  Kirill CCed.
>> >
>> >> Yeah, I think the new version should be general enough for AVX512.
>> >> I've added a couple of gcc.dg/vect tests that are keyed off
>> >> vect_double_cond_arith.
>> >
>> >> Tested as before.  OK to install?
>> >
>> > In the if-conversion part how can it be that you need
>> > value_available_p?  Why do you delay propagating out
>> > redundant SSA names?
>> >
>> > It feels somewhat ugly possibly because if-conversion is implemented
>> > like it is right now?
>
>> I guess (although it didn't seem so ugly to me :-)).  The statements
>> are converted in-place and then the blocks are stitched together,
>> and since there's no relationship between the "then" and "else"
>> values of the COND_EXPR, there's no guarantee that the "else" is
>> computed before the "then" value, or that the "then" value can
>> be delayed to the point of the COND_EXPR.  Some statement reordering
>> would be necessary to handle the general case (and that might in itself
>> not be valid due to aliasing etc.).
>
> Hmm, but the ultimate "merges" are inserted for PHIs and only redundancies
> with those are handled?  At least that's how I read the examples you quoted.

Right.  But say we have:

  if (cond)
    a1 = b1 op1 c1;
  else
    a2 = b2 op2 c2;
  a = PHI <a1, a2>

and op2 is safe to speculate and op1 isn't.  We can't assume that
a2's block comes before a1's, so we can't use:

  a1 = IFN_COND_FOO (cond, b1, c1, a2)

in place of the original a1.  And we also can't always defer a1 to the
join block because it might be used later in a1's block.

Thanks,
Richard