This is the mail archive of the
mailing list for the GCC project.
Re: [RFC] [patch] Support vectorization of min/max location pattern
On 07/08/2010 11:19 AM, Ira Rosen wrote:
> It's minloc pattern, i.e., a loop that finds the location of the minimum:
> float arr[N];
> for (i = 0; i < N; i++)
>   if (arr[i] < limit)
>     {
>       pos = i + 1;
>       limit = arr[i];
>     }
> Vectorizer's input code:
> # pos_22 = PHI <pos_1(4), 1(2)>
> # limit_24 = PHI <limit_4(4), 0(2)>
> pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22; //
> limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24; // min
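For readers following along, here is a rough scalar sketch in C of how the branchy loop and the two cond_exprs relate. The function names and the explicit initial values passed as parameters are assumptions made for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form, as in the quoted source loop.  Returns the 1-based
   position of the first occurrence of the smallest element below the
   initial limit, or init_pos if no element is below it.  */
int minloc_branchy(const float *arr, size_t n, float limit, int init_pos)
{
    assert(arr != NULL || n == 0);
    int pos = init_pos;
    for (size_t i = 0; i < n; i++) {
        if (arr[i] < limit) {
            pos = (int)i + 1;
            limit = arr[i];
        }
    }
    return pos;
}

/* If-converted form: both updates become selects on the same
   comparison, mirroring the two cond_exprs in the vectorizer input.  */
int minloc_select(const float *arr, size_t n, float limit, int init_pos)
{
    assert(arr != NULL || n == 0);
    int pos = init_pos;
    for (size_t i = 0; i < n; i++) {
        int lt = arr[i] < limit;
        pos = lt ? (int)i + 1 : pos;     /* location update */
        limit = lt ? arr[i] : limit;     /* min update */
    }
    return pos;
}
```

Note that the comparison is on floats while one of the selected values is an integer position, which is the case the rest of this message is concerned with.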
Ok, I get it now.
So your thinking was that you needed the builtin to replace the
comparison portion of the VEC_COND_EXPR? Or, looking again, I see
that you don't actually use VEC_COND_EXPR, you use ...
> + /* Create: VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK). */
... explicit masking. I.e. you assume that the return value of
the builtin is a bit mask of the full width, and that there's no
better way to implement the VEC_COND.
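To make the masking assumption concrete, here is a minimal sketch in plain C that models a 4-lane vector as an array. LANES and mask_select are hypothetical names, and the sketch assumes every mask element is either all ones or all zeros. It uses the bitwise complement where the patch comment writes !MASK.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* dest = (a & mask) | (b & ~mask), lane by lane.  Each mask element is
   assumed to be all ones (take a) or all zeros (take b).  */
void mask_select(uint32_t dest[LANES], const uint32_t a[LANES],
                 const uint32_t b[LANES], const uint32_t mask[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(mask[i] == 0 || mask[i] == UINT32_MAX);
        dest[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
    }
}
```

This only works if the comparison really delivers a full-width bit mask per element.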
I wonder if it wouldn't be better to extend the definition
of VEC_COND_EXPR so that the comparison values can be of a
different type than the data operands (with the caveat that the
number of elements should be the same -- i.e. 4-wide compare must
match 4-wide data movement).
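As a sketch of the semantics I have in mind, again with arrays standing in for vectors: the comparison operands are floats, the data operands are ints, and both are 4 wide. The name vec_cond_lt and the fixed LANES value are made up for illustration and are not an existing GCC interface.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* dest[i] = x[i] < y[i] ? if_true[i] : if_false[i].
   The compare type (float) differs from the data type (int); only the
   number of elements has to agree.  How the condition is represented
   (bit mask, condition codes, ...) is not visible to the caller.  */
void vec_cond_lt(int dest[LANES],
                 const float x[LANES], const float y[LANES],
                 const int if_true[LANES], const int if_false[LANES])
{
    assert(dest != NULL && x != NULL && y != NULL);
    assert(if_true != NULL && if_false != NULL);
    for (int i = 0; i < LANES; i++)
        dest[i] = x[i] < y[i] ? if_true[i] : if_false[i];
}
```

Keeping the compare and the select together in one operation leaves each target free to pick its own representation of the condition.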
I can think of 2 portability problems with your current solution:
(1) SSE4.1 would prefer to use BLEND instructions, which perform
that entire (X & M) | (Y & ~M) operation in one insn.
(2) The MIPS C.cond.PS instruction does *not* produce a bitmask
like AltiVec or SSE do. Instead it sets multiple condition
codes. One then uses MOV[TF].PS to merge the elements based
on the individual condition codes. While there's no direct
corresponding instruction that will operate on integers, I
don't think it would be too difficult to use MOV[TF].G or
BC1AND2[FT] instructions to emulate it. In any case, this
is again a case where you don't want to expose any part of
the VEC_COND at the gimple level.
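A loose C model of point (2), for a 2-element paired-single vector. This is only a sketch of the idea: cmp_lt_ps and movt_ps are hypothetical helpers that imitate the behaviour described above, with booleans standing in for the condition codes. They are not the instructions' exact semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PS_LANES 2

/* Model of a paired-single compare: the result is one condition code
   per element, not a bit mask.  */
void cmp_lt_ps(bool cc[PS_LANES],
               const float x[PS_LANES], const float y[PS_LANES])
{
    assert(cc != NULL && x != NULL && y != NULL);
    for (int i = 0; i < PS_LANES; i++)
        cc[i] = x[i] < y[i];
}

/* Model of a conditional move on those codes: copy src into dest only
   in the elements whose condition code is true.  */
void movt_ps(float dest[PS_LANES], const float src[PS_LANES],
             const bool cc[PS_LANES])
{
    assert(dest != NULL && src != NULL && cc != NULL);
    for (int i = 0; i < PS_LANES; i++)
        if (cc[i])
            dest[i] = src[i];
}
```

There is no mask value anywhere in this model that an explicit AND/OR sequence could consume.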