vec_cond_expr adjustments
Marc Glisse
marc.glisse@inria.fr
Fri Oct 5 15:01:00 GMT 2012
[I am still a little confused, sorry for the long email...]
On Tue, 2 Oct 2012, Richard Guenther wrote:
>>>> + if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>> + {
>>>> + int count = VECTOR_CST_NELTS (op0);
>>>> + tree *elts = XALLOCAVEC (tree, count);
>>>> + gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
>>>> +
>>>> + for (int i = 0; i < count; i++)
>>>> + {
>>>> + tree elem_type = TREE_TYPE (type);
>>>> + tree elem0 = VECTOR_CST_ELT (op0, i);
>>>> + tree elem1 = VECTOR_CST_ELT (op1, i);
>>>> +
>>>> + elts[i] = fold_relational_const (code, elem_type,
>>>> + elem0, elem1);
>>>> +
>>>> + if(elts[i] == NULL_TREE)
>>>> + return NULL_TREE;
>>>> +
>>>> + elts[i] = fold_negate_const (elts[i], elem_type);
>>>
>>>
>>> I think you need to invent something new similar to STORE_FLAG_VALUE
>>> or use STORE_FLAG_VALUE here. With the above you try to map
>>> {0, 1} to {0, -1} which is only true if the operation on the element types
>>> returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
>>
>> Er, seems to me that constant folding of a scalar comparison in the
>> front/middle-end only returns {0, 1}.
[and later]
> I'd say adjust your fold-const patch to not negate the scalar result
> but build a proper -1 / 0 value based on integer_zerop().
I don't mind doing it that way, but I would like to understand first.
LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1. So
negating should be the same as testing with integer_zerop to build -1 or
0. Is it just a matter of style (then I am ok), or am I missing a reason
which makes the negation wrong?
> The point is we need to define some semantics for vector comparison
> results.
Yes. I think a documentation patch should come first: generic.texi is
missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't
mention vectors. But before that we need to decide what to put there...
> One variant is to make it target independent which in turn
> would inhibit (or make it more difficult) to exploit some target features.
> You for example use {0, -1} for truth values - probably to exploit target
> features -
Actually it was mostly because that is the meaning in the language. OpenCL
says that a<b is a vector of 0 and -1, and that ?: only looks at the MSB
of the elements in the condition. The fact that it matches what some
targets do is a simple consequence of the fact that OpenCL was based on
what hardware already did.
> even though the most natural middle-end way would be to
> use {0, 1} as for everything else
I agree that it would be natural and convenient in a number of places.
> (caveat: there may be both signed and unsigned bools, we don't allow
> vector components with non-mode precision, thus you could argue that a
> signed bool : 1 is just "sign-extended" for your solution).
Not sure how that would translate in the code.
> A different variant is to make it target dependent to leverage
> optimization opportunities
That's an interesting possibility...
> that's why STORE_FLAG_VALUE exists.
AFAICS it only appears when we go from gimple to rtl, not before (and
there is already a VECTOR_STORE_FLAG_VALUE, although no target defines
it). Which doesn't mean we couldn't make it appear earlier for vectors.
> For example with vector comparisons a < v result, when
> performing bitwise operations on it, you either have to make the target
> expand code to produce {0, -1} even if the natural compare instruction
> would, say, produce {0, 0x80000} - or not constrain the possible values
> of its result (like forwprop would do with your patch). In general we
> want constant folding to yield the same results as if the HW carried
> out the operation to make -O0 code not diverge from -O1. Thus,
>
> v4si g;
> int main() { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1}; }
>
> should not assign different values to g dependent on constant propagation
> performed or not.
That one is clear, OpenCL constrains the answer to be {-1,-1,0,0}, whether
your target likes it or not. Depending on how things are handled,
comparisons could be constrained internally to only appear (possibly
indirectly) in the first argument of a vec_cond_expr.
> The easiest way out is something like STORE_FLAG_VALUE
> if there does not exist a middle-end choice for vector true / false components
> that can be easily generated from what the target produces.
>
> Like if you perform a FP comparison
>
> int main () { double x = 1.0; static _Bool b; b = x < 3.0; }
>
> you get without CCP on x86_64:
>
> ucomisd -8(%rbp), %xmm0
> seta %al
> movb %al, b.1715(%rip)
>
> thus the equivalent of
>
> flag_reg = x < 3.0;
> b = flag_reg ? 1 : 0;
where this expansion happens in the back-end.
> for vector compares you get something similar:
>
> flag_vec = x < y;
> res = flag_vec ? { 1, ... } : { 0, ... };
>
> which I think you can see being produced by generic vector lowering
> (in do_compare). Where I can see we indeed use {0, -1} ... which
> would match your constant folding behavior.
>
> We may not be able to easily recover from this intermediate step
> with combine (I'm not sure), so a target dependent value may
> be prefered.
Being able to optimize it is indeed a key point. Let's try on an example
(not assuming any specific representation in the middle-end for now). Say
I write this C/OpenCL code: ((a<b)&&(c<d))?x:y (not currently supported)
The front-end gives to the middle-end: ((((a<b)?-1:0)&((c<d)?-1:0))<0)?x:y
On an architecture like sse, neon or altivec where VECTOR_STORE_FLAG_VALUE
is -1 (well, should be), expansion of (a<b)?-1:0 would just be a<b. The <0
can also disappear if the vcond instruction only looks at the MSB (x86).
And we are left in the back-end with ((a<b)&(c<d))?x:y, as desired.
On other architectures, expecting the back-end to simplify everything does
seem hard. But it isn't obvious how to handle it in the middle end either.
Some other forms we could imagine the middle-end producing:
(a<b)?(c<d)?x:y:y
or assuming that VECTOR_STORE_FLAG_VALUE is defined:
(((a<b)&(c<d))!=0)?x:y (back-end would remove the != 0 on altivec)
Both would require special code to happen.
But then how do we handle for instance sparc, where IIUC comparing 2
vectors returns an integer, where bits 0, 1, etc of the integer represent
true/false for the comparisons of elements 0, 1, etc of the vectors (as in
vec_merge, but not constant)? Defining VECTOR_STORE_FLAG_VALUE is not
possible since comparisons don't return a vector, but we would still want
to compute a<b, c<d, and perform an AND of those 2 integers before calling
the usual code for the selection.
If we assume a -1/0 and MSB representation in the middle-end, the
front-end could just pass ((a<b)&(c<d))?x:y to the middle-end. When
moving to the back-end, "nothing" would happen on x86.
Comparing x86, neon and altivec, they all have comparisons that return a
vector of -1 and 0. On the other hand, they have different selection
instructions. x86 uses <0, altivec uses !=0 and neon has a bitwise select
and thus requires exactly -1 or 0. It thus seems to me that we should
decide in the middle-end that vector comparisons return vectors of -1 and
0. VEC_COND_EXPR is more complicated. We could for instance require that
it takes as first argument a vector of -1 and 0 (thus <0, !=0 and the neon
thing are equivalent). Which would leave to decide what the expansion of
vec_cond_expr passes to the targets when the first argument is not a
comparison, between !=0, <0, ==-1 or others (I vote for <0 because of
opencl). One issue is that targets wouldn't know if it was a dummy
comparison that can safely be ignored because the other part is the result
of logical operations on comparisons (thus composed of -1 and 0) or a
genuine comparison with an arbitrary vector, so a new optimization would
be needed (in the back-end I guess or we would need an alternate
instruction to vcond) to detect if a vector is a "signed boolean" vector.
We could instead say that vec_cond_expr really follows OpenCL's semantics
and looks at the MSB of each element. I am not sure that would change
much, it would mostly delay the apparition of <0 to RTL expansion time
(and thus make gimple slightly lighter).
>>>> +/* Return true if EXPR is an integer constant representing true. */
>>>> +
>>>> +bool
>>>> +integer_truep (const_tree expr)
>>>> +{
>>>> + STRIP_NOPS (expr);
>>>> +
>>>> + switch (TREE_CODE (expr))
>>>> + {
>>>> + case INTEGER_CST:
>>>> + /* Do not just test != 0, some places expect the value 1. */
>>>> + return (TREE_INT_CST_LOW (expr) == 1
>>>> + && TREE_INT_CST_HIGH (expr) == 0);
>>>
>>>
>>> I wonder if using STORE_FLAG_VALUE is better here (note that it
>>> usually differs for FP vs. integral comparisons and the mode passed
>>> to STORE_FLAG_VALUE is that of the comparison result).
>>
>>
>> I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
>> simplify-rtx, in a way that seems a bit strange but I'll try to
>> understand that later). Thanks for showing me this macro, it seems
>> important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
>> the RTL level.
>>
>> It looks like it would be possible to have 3 different semantics:
>> source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
>> and back-end is whatever the target wants. The front-end would generate
>> for a<b : vec_cond_expr(a<b,-1,0)
>
> seems like the middle-end uses this for lowering vector compares,
> a < b -> { a[0] < b[0] ? -1 : 0, ... }
>
>> and for a?b:c : vec_cond_expr(a<0,b,c)
>
> it looks like ?: is not generally handled by tree-vect-generic, so it must
> be either not supported by the frontend or lowered therein (ISTR
> it is forced to appear as a != {0,...} ? ... : ...)
Not supported by the front-end yet (not even by the gimplifier), I have
(bad) patches but I can't really finish them before this conversation is
done.
I think there are quite few places in the middle-end that assume that
comparisons return a vector of -1/0 and even fewer that vec_cond_expr only
looks at the MSB of each element. So it is still time to change that if
you want to. But if we want to change it, I think it should happen now
before even more vector code gets in (not particularly my patches, I am
thinking of cilk and others too).
Ok, that's long enough, I need to send it now...
--
Marc Glisse
More information about the Gcc-patches
mailing list