This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: vec_cond_expr adjustments

From: Richard Guenther <richard dot guenther at gmail dot com>
To: Marc Glisse <marc dot glisse at inria dot fr>
Cc: gcc-patches at gcc dot gnu dot org
Date: Tue, 2 Oct 2012 14:41:12 +0200
Subject: Re: vec_cond_expr adjustments
References: <alpine.DEB.2.02.1209291454140.11224@stedding.saclay.inria.fr> <CAFiYyc3E5wvNuT+yfM1THfEA5s3OfzbNRAuu9dQHg___mMknVw@mail.gmail.com> <alpine.DEB.2.02.1210011424250.4973@stedding.saclay.inria.fr>

On Mon, Oct 1, 2012 at 5:57 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> [merging both threads, thanks for the answers]
>
>
> On Mon, 1 Oct 2012, Richard Guenther wrote:
>
>>>> optabs should be fixed instead, an is_gimple_val condition is
>>>> implicitely
>>>> val != 0.
>>>
>>>
>>> For vectors, I think it should be val < 0 (with an appropriate cast of
>>> val
>>> to a signed integer vector type if necessary). Or (val & highbit) != 0,
>>> but
>>> that's longer.
>>
>>
>> I don't think so.  Throughout the compiler we generally assume false == 0
>> and anything else is true.  (yes, for FP there is STORE_FLAG_VALUE, but
>> it's scope is quite limited - if we want sth similar for vectors we'd have
>> to
>> invent it).
>
>
> See below.
>
>
>>>> If we for example have
>>>>
>>>> predicate = a < b;
>>>> x = predicate ? d : e;
>>>> y = predicate ? f : g;
>>>>
>>>> we ideally want to re-use the predicate computation on targets where
>>>> that would be optimal (and combine should be able to recover the
>>>> case where it is not).
>>>
>>>
>>> That I don't understand. The vcond instruction implemented by targets
>>> takes
>>> as arguments d, e, cmp, a, b and emits the comparison itself. I don't see
>>> how I can avoid sending to the targets both (d,e,<,a,b) and (f,g,<,a,b).
>>> They will notice eventually that a<b is computed twice and remove one of
>>> the
>>> two, but I don't see how to do that in optabs.c. Or I can compute x = a <
>>> b,
>>> use x < 0 as the comparison passed to the targets, and expect targets
>>> (those
>>> for which it is true) to recognize that < 0 is useless in a vector
>>> condition
>>> (PR54700), or is useless on a comparison result.
>>
>>
>> But that's a limitation of how vcond works.  ISTR there is/was a vselect
>> instruction as well, taking a "mask" and two vectors to select from.  At
>> least
>> that's how vcond works internally for some sub-targets.
>
>
> vselect seems to only appear in config/. Would it be defined as:
> vselect(m,a,b)=(a&m)|(b&~m) ? I would almost be tempted to just define a
> pattern in .md files and let combine handle it, although it might be one
> instruction too long for that (and if m is x<y, ~m might look like x>=y).
> Or would it match the OpenCL select: "For each component of a vector type,
> result[i] = if MSB of c[i] is set ? b[i] : a[i]."? Or the pattern with &
> and | but with a precondition that the value of each element of the mask
> must be 0 or ±1?
>
> I don't find vcond that bad, as long as targets check for trivial
> comparisons in the expansion (what trivial means may depend on the
> platform). It is quite flexible for targets.

Well, ok.

>
> On Mon, 1 Oct 2012, Richard Guenther wrote:
>
>>>         tmp = fold_build2_loc (gimple_location (def_stmt),
>>>                                code,
>>> -                              boolean_type_node,
>>> +                              TREE_TYPE (cond),
>>
>>
>> That's obvious.
>
>
> Ok, I'll test and commit that line separately.
>
>>> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>> +    {
>>> +      int count = VECTOR_CST_NELTS (op0);
>>> +      tree *elts =  XALLOCAVEC (tree, count);
>>> +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
>>> +
>>> +      for (int i = 0; i < count; i++)
>>> +       {
>>> +         tree elem_type = TREE_TYPE (type);
>>> +         tree elem0 = VECTOR_CST_ELT (op0, i);
>>> +         tree elem1 = VECTOR_CST_ELT (op1, i);
>>> +
>>> +         elts[i] = fold_relational_const (code, elem_type,
>>> +                                          elem0, elem1);
>>> +
>>> +         if(elts[i] == NULL_TREE)
>>> +           return NULL_TREE;
>>> +
>>> +         elts[i] = fold_negate_const (elts[i], elem_type);
>>
>>
>> I think you need to invent something new similar to STORE_FLAG_VALUE
>> or use STORE_FLAG_VALUE here.  With the above you try to map
>> {0, 1} to {0, -1} which is only true if the operation on the element types
>> returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
>
>
> Er, seems to me that constant folding of a scalar comparison in the
> front/middle-end only returns {0, 1}.

The point is we need to define some semantics for vector comparison
results.  One variant is to make it target independent which in turn
would inhibit (or make it more difficult) to exploit some target features.
You for example use {0, -1} for truth values - probably to exploit target
features - even though the most natural middle-end way would be to
use {0, 1} as for everything else (caveat: there may be both signed
and unsigned bools, we don't allow vector components with non-mode precision,
thus you could argue that a signed bool : 1 is just "sign-extended"
for your solution).  A different variant is to make it target dependent
to leverage optimization opportunities - that's why STORE_FLAG_VALUE
exists.  For example with vector comparisons a < v result, when
performing bitwise operations on it, you either have to make the target
expand code to produce {0, -1} even if the natural compare instruction
would, say, produce {0, 0x80000} - or not constrain the possible values
of its result (like forwprop would do with your patch).  In general we
want constant folding to yield the same results as if the HW carried
out the operation to make -O0 code not diverge from -O1.  Thus,

v4si g;
int main() { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1}; }

should not assign different values to g dependent on constant propagation
performed or not.  The easiest way out is something like STORE_FLAG_VALUE
if there does not exist a middle-end choice for vector true / false components
that can be easily generated from what the target produces.

Like if you perform a FP comparison

int main () { double x = 1.0; static _Bool b; b = x < 3.0; }

you get without CCP on x86_64:

        ucomisd -8(%rbp), %xmm0
        seta    %al
        movb    %al, b.1715(%rip)

thus the equivalent of

    flag_reg = x < 3.0;
    b = flag_reg ? 1 : 0;

for vector compares you get something similar:

    flag_vec = x < y;
    res = flag_vec ? { 1, ... } : { 0, ... };

which I think you can see being produced by generic vector lowering
(in do_compare).  Where I can see we indeed use {0, -1} ... which
would match your constant folding behavior.

We may not be able to easily recover from this intermediate step
with combine (I'm not sure), so a target dependent value may
be prefered.


>>> +/* Return true if EXPR is an integer constant representing true.  */
>>> +
>>> +bool
>>> +integer_truep (const_tree expr)
>>> +{
>>> +  STRIP_NOPS (expr);
>>> +
>>> +  switch (TREE_CODE (expr))
>>> +    {
>>> +    case INTEGER_CST:
>>> +      /* Do not just test != 0, some places expect the value 1.  */
>>> +      return (TREE_INT_CST_LOW (expr) == 1
>>> +             && TREE_INT_CST_HIGH (expr) == 0);
>>
>>
>> I wonder if using STORE_FLAG_VALUE is better here (note that it
>> usually differs for FP vs. integral comparisons and the mode passed
>> to STORE_FLAG_VALUE is that of the comparison result).
>
>
> I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
> simplify-rtx, in a way that seems a bit strange but I'll try to
> understand that later). Thanks for showing me this macro, it seems
> important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
> the RTL level.
>
> It looks like it would be possible to have 3 different semantics:
> source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
> and back-end is whatever the target wants. The front-end would generate
> for a<b : vec_cond_expr(a<b,-1,0)

seems like the middle-end uses this for lowering vector compares,
a < b -> { a[0] < b[0] ? -1 : 0, ... }

> and for a?b:c : vec_cond_expr(a<0,b,c)

it looks like ?: is not generally handled by tree-vect-generic, so it must
be either not supported by the frontend or lowered therein (ISTR
it is forced to appear as a != {0,...} ? ... : ...)

> and there is no need for the middle-end to use the same representation
> of comparisons as the front-ends or targets (expand of a vec_cond_expr
> whose first argument is not a comparison would use != 0 if we chose a 0
> / 1 encoding).
>
> However, since the front-ends and many targets agree on the OpenCL
> semantics, it means introducing a conversion back and forth in the
> middle-end, which may complicate things a bit. Note also that we already
> have constant_boolean_node that returns a vector of -1 for true, and
> that the front-end doesn't currently generate a vec_cond_expr for a<b.

Yeah, I realized that now.

> It is true though that using 1 for true would fit better with the scalar
> ops.
>
> Well, if someone feels like taking a decision... I am happy with either
> choice, as long as I know where to go.

I'd say adjust your fold-const patch to not negate the scalar result
but build a proper -1 / 0 value based on integer_zerop().

Thanks,
Richard.

>> That said, until we are sure what semantics we want here (forwprop
>> for example doesn't look at 'comparisons' but operations on special
>> values and types) I'd prefer to not introduce integer_truep ().
>
>
> I completely agree that defining the semantics comes first :-)
>
> --
> Marc Glisse

Follow-Ups:
- Re: vec_cond_expr adjustments
  - From: Marc Glisse

References:
- Re: Constant-fold vector comparisons
  - From: Richard Guenther
- Re: vec_cond_expr adjustments
  - From: Marc Glisse

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]