vec_cond_expr adjustments

Fri Oct 5 15:01:00 GMT 2012

[I am still a little confused, sorry for the long email...]

On Tue, 2 Oct 2012, Richard Guenther wrote:

>>>> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>> +    {
>>>> +      int count = VECTOR_CST_NELTS (op0);
>>>> +      tree *elts =  XALLOCAVEC (tree, count);
>>>> +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
>>>> +
>>>> +      for (int i = 0; i < count; i++)
>>>> +       {
>>>> +         tree elem_type = TREE_TYPE (type);
>>>> +         tree elem0 = VECTOR_CST_ELT (op0, i);
>>>> +         tree elem1 = VECTOR_CST_ELT (op1, i);
>>>> +
>>>> +         elts[i] = fold_relational_const (code, elem_type,
>>>> +                                          elem0, elem1);
>>>> +
>>>> +         if(elts[i] == NULL_TREE)
>>>> +           return NULL_TREE;
>>>> +
>>>> +         elts[i] = fold_negate_const (elts[i], elem_type);
>>>
>>>
>>> I think you need to invent something new similar to STORE_FLAG_VALUE
>>> or use STORE_FLAG_VALUE here.  With the above you try to map
>>> {0, 1} to {0, -1} which is only true if the operation on the element types
>>> returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
>>
>> Er, seems to me that constant folding of a scalar comparison in the
>> front/middle-end only returns {0, 1}.
[and later]
> I'd say adjust your fold-const patch to not negate the scalar result
> but build a proper -1 / 0 value based on integer_zerop().

I don't mind doing it that way, but I would like to understand first. 
LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1. So 
negating should be the same as testing with integer_zerop to build -1 or 
0. Is it just a matter of style (then I am ok), or am I missing a reason 
which makes the negation wrong?

> The point is we need to define some semantics for vector comparison
> results.

Yes. I think a documentation patch should come first: generic.texi is 
missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't 
mention vectors. But before that we need to decide what to put there...

> One variant is to make it target independent which in turn
> would inhibit (or make it more difficult) to exploit some target features.
> You for example use {0, -1} for truth values - probably to exploit target
> features -

Actually it was mostly because that is the meaning in the language. OpenCL 
says that a<b is a vector of 0 and -1, and that ?: only looks at the MSB 
of the elements in the condition. The fact that it matches what some 
targets do is a simple consequence of the fact that OpenCL was based on 
what hardware already did.

> even though the most natural middle-end way would be to
> use {0, 1} as for everything else

I agree that it would be natural and convenient in a number of places.

> (caveat: there may be both signed and unsigned bools, we don't allow 
> vector components with non-mode precision, thus you could argue that a 
> signed bool : 1 is just "sign-extended" for your solution).

Not sure how that would translate in the code.

> A different variant is to make it target dependent to leverage 
> optimization opportunities

That's an interesting possibility...

> that's why STORE_FLAG_VALUE exists.

AFAICS it only appears when we go from gimple to rtl, not before (and 
there is already a VECTOR_STORE_FLAG_VALUE, although no target defines 
it). Which doesn't mean we couldn't make it appear earlier for vectors.

> For example with vector comparisons a < v result, when
> performing bitwise operations on it, you either have to make the target
> expand code to produce {0, -1} even if the natural compare instruction
> would, say, produce {0, 0x80000} - or not constrain the possible values
> of its result (like forwprop would do with your patch).  In general we
> want constant folding to yield the same results as if the HW carried
> out the operation to make -O0 code not diverge from -O1.  Thus,
>
> v4si g;
> int main() { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1}; }
>
> should not assign different values to g dependent on constant propagation
> performed or not.

That one is clear, OpenCL constrains the answer to be {-1,-1,0,0}, whether 
your target likes it or not. Depending on how things are handled, 
comparisons could be constrained internally to only appear (possibly 
indirectly) in the first argument of a vec_cond_expr.

> The easiest way out is something like STORE_FLAG_VALUE
> if there does not exist a middle-end choice for vector true / false components
> that can be easily generated from what the target produces.
>
> Like if you perform a FP comparison
>
> int main () { double x = 1.0; static _Bool b; b = x < 3.0; }
>
> you get without CCP on x86_64:
>
>        ucomisd -8(%rbp), %xmm0
>        seta    %al
>        movb    %al, b.1715(%rip)
>
> thus the equivalent of
>
>    flag_reg = x < 3.0;
>    b = flag_reg ? 1 : 0;

where this expansion happens in the back-end.

> for vector compares you get something similar:
>
>    flag_vec = x < y;
>    res = flag_vec ? { 1, ... } : { 0, ... };
>
> which I think you can see being produced by generic vector lowering
> (in do_compare).  Where I can see we indeed use {0, -1} ... which
> would match your constant folding behavior.
>
> We may not be able to easily recover from this intermediate step
> with combine (I'm not sure), so a target dependent value may
> be prefered.

Being able to optimize it is indeed a key point. Let's try on an example 
(not assuming any specific representation in the middle-end for now). Say 
I write this C/OpenCL code: ((a<b)&&(c<d))?x:y (not currently supported)

The front-end gives to the middle-end: ((((a<b)?-1:0)&((c<d)?-1:0))<0)?x:y

On an architecture like sse, neon or altivec where VECTOR_STORE_FLAG_VALUE 
is -1 (well, should be), expansion of (a<b)?-1:0 would just be a<b. The <0 
can also disappear if the vcond instruction only looks at the MSB (x86). 
And we are left in the back-end with ((a<b)&(c<d))?x:y, as desired.

On other architectures, expecting the back-end to simplify everything does 
seem hard. But it isn't obvious how to handle it in the middle end either.
Some other forms we could imagine the middle-end producing:
(a<b)?(c<d)?x:y:y
or assuming that VECTOR_STORE_FLAG_VALUE is defined:
(((a<b)&(c<d))!=0)?x:y (back-end would remove the != 0 on altivec)
Both would require special code to happen.

But then how do we handle for instance sparc, where IIUC comparing 2 
vectors returns an integer, where bits 0, 1, etc of the integer represent 
true/false for the comparisons of elements 0, 1, etc of the vectors (as in 
vec_merge, but not constant)? Defining VECTOR_STORE_FLAG_VALUE is not 
possible since comparisons don't return a vector, but we would still want 
to compute a<b, c<d, and perform an AND of those 2 integers before calling 
the usual code for the selection.

If we assume a -1/0 and MSB representation in the middle-end, the 
front-end could just pass ((a<b)&(c<d))?x:y to the middle-end. When 
moving to the back-end, "nothing" would happen on x86.

Comparing x86, neon and altivec, they all have comparisons that return a 
vector of -1 and 0. On the other hand, they have different selection 
instructions. x86 uses <0, altivec uses !=0 and neon has a bitwise select 
and thus requires exactly -1 or 0. It thus seems to me that we should 
decide in the middle-end that vector comparisons return vectors of -1 and 
0. VEC_COND_EXPR is more complicated. We could for instance require that 
it takes as first argument a vector of -1 and 0 (thus <0, !=0 and the neon 
thing are equivalent). Which would leave to decide what the expansion of 
vec_cond_expr passes to the targets when the first argument is not a 
comparison, between !=0, <0, ==-1 or others (I vote for <0 because of 
opencl). One issue is that targets wouldn't know if it was a dummy 
comparison that can safely be ignored because the other part is the result 
of logical operations on comparisons (thus composed of -1 and 0) or a 
genuine comparison with an arbitrary vector, so a new optimization would 
be needed (in the back-end I guess or we would need an alternate 
instruction to vcond) to detect if a vector is a "signed boolean" vector.
We could instead say that vec_cond_expr really follows OpenCL's semantics 
and looks at the MSB of each element. I am not sure that would change 
much, it would mostly delay the apparition of <0 to RTL expansion time 
(and thus make gimple slightly lighter).

>>>> +/* Return true if EXPR is an integer constant representing true.  */
>>>> +
>>>> +bool
>>>> +integer_truep (const_tree expr)
>>>> +{
>>>> +  STRIP_NOPS (expr);
>>>> +
>>>> +  switch (TREE_CODE (expr))
>>>> +    {
>>>> +    case INTEGER_CST:
>>>> +      /* Do not just test != 0, some places expect the value 1.  */
>>>> +      return (TREE_INT_CST_LOW (expr) == 1
>>>> +             && TREE_INT_CST_HIGH (expr) == 0);
>>>
>>>
>>> I wonder if using STORE_FLAG_VALUE is better here (note that it
>>> usually differs for FP vs. integral comparisons and the mode passed
>>> to STORE_FLAG_VALUE is that of the comparison result).
>>
>>
>> I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
>> simplify-rtx, in a way that seems a bit strange but I'll try to
>> understand that later). Thanks for showing me this macro, it seems
>> important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
>> the RTL level.
>>
>> It looks like it would be possible to have 3 different semantics:
>> source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
>> and back-end is whatever the target wants. The front-end would generate
>> for a<b : vec_cond_expr(a<b,-1,0)
>
> seems like the middle-end uses this for lowering vector compares,
> a < b -> { a[0] < b[0] ? -1 : 0, ... }
>
>> and for a?b:c : vec_cond_expr(a<0,b,c)
>
> it looks like ?: is not generally handled by tree-vect-generic, so it must
> be either not supported by the frontend or lowered therein (ISTR
> it is forced to appear as a != {0,...} ? ... : ...)

Not supported by the front-end yet (not even by the gimplifier), I have 
(bad) patches but I can't really finish them before this conversation is 
done.

I think there are quite few places in the middle-end that assume that 
comparisons return a vector of -1/0 and even fewer that vec_cond_expr only 
looks at the MSB of each element. So it is still time to change that if 
you want to. But if we want to change it, I think it should happen now 
before even more vector code gets in (not particularly my patches, I am 
thinking of cilk and others too).

Ok, that's long enough, I need to send it now...

-- 
Marc Glisse