[PATCH, x86] Improves x86 permutation expand
Evgeny Stupachenko
evstupac@gmail.com
Mon Jun 9 10:13:00 GMT 2014
Right now we need to cover the permutations that come from groups of
3 loads/stores. My case covers them. I agree that the other order of
pblend and pshufb covers additional cases. Good point. We can cover
this in a separate patch.
The asserts are OK if we exclude AVX2 from the ISA checks.
Is the following patch OK?
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8827256..1fe2398 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43185,6 +43185,90 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
return ok;
}
+/* A subroutine of ix86_expand_vec_perm_const_1. Try to simplify
+ the permutation using the SSE4_1 pblendv instruction. Potentially
+ reduces permutation from 2 pshufb and or to 1 pshufb and pblendv. */
+
+static bool
+expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
+{
+ unsigned i, which, nelt = d->nelt;
+ struct expand_vec_perm_d dcopy, dcopy1;
+ enum machine_mode vmode = d->vmode;
+ bool ok;
+
+ /* Use the same checks as in expand_vec_perm_blend, but skipping
+ AVX2 as it requires more than 2 instructions for general case. */
+ if (d->one_operand_p)
+ return false;
+ if (TARGET_AVX && (vmode == V4DFmode || vmode == V8SFmode))
+ ;
+ else if (TARGET_SSE4_1 && GET_MODE_SIZE (vmode) == 16)
+ ;
+ else
+ return false;
+
+ /* Figure out where permutation elements stay not in their
+ respective lanes. */
+ for (i = 0, which = 0; i < nelt; ++i)
+ {
+ unsigned e = d->perm[i];
+ if (e != i)
+ which |= (e < nelt ? 1 : 2);
+ }
+ /* We can pblend the part where elements stay not in their
+ respective lanes only when these elements are all in one
+ half of a permutation.
+ {0 1 8 3 4 5 9 7} is ok as 8, 9 are not at their respective
+ lanes, but both 8 and 9 >= 8
+ {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their
+ respective lanes and 8 >= 8, but 2 not. */
+ if (which != 1 && which != 2)
+ return false;
+ if (d->testing_p)
+ return true;
+
+ /* First we apply one operand permutation to the part where
+ elements stay not in their respective lanes. */
+ dcopy = *d;
+ if (which == 2)
+ dcopy.op0 = dcopy.op1 = d->op1;
+ else
+ dcopy.op0 = dcopy.op1 = d->op0;
+ dcopy.one_operand_p = true;
+
+ for (i = 0; i < nelt; ++i)
+ {
+ unsigned e = d->perm[i];
+ if (which == 2)
+ dcopy.perm[i] = ((e >= nelt) ? (e - nelt) : e);
+ }
+
+ ok = expand_vec_perm_1 (&dcopy);
+ gcc_assert (ok);
+
+ /* Next we put permuted elements into their positions. */
+ dcopy1 = *d;
+ if (which == 2)
+ dcopy1.op1 = dcopy.target;
+ else
+ dcopy1.op0 = dcopy.target;
+
+ for (i = 0; i < nelt; ++i)
+ {
+ unsigned e = d->perm[i];
+ if (which == 2)
+ dcopy1.perm[i] = ((e >= nelt) ? (nelt + i) : e);
+ else
+ dcopy1.perm[i] = ((e < nelt) ? i : e);
+ }
+
+ ok = expand_vec_perm_blend (&dcopy1);
+ gcc_assert (ok);
+
+ return true;
+}
+
static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d);
/* A subroutine of ix86_expand_vec_perm_builtin_1. Try to simplify
@@ -44557,6 +44641,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
if (expand_vec_perm_vperm2f128 (d))
return true;
+ if (expand_vec_perm_pblendv (d))
+ return true;
+
/* Try sequences of three instructions. */
if (expand_vec_perm_2vperm2f128_vshuf (d))
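For readers who want to check the shuffle-then-blend decomposition outside of GCC, here is a minimal standalone sketch in plain C. It models the vectors as scalar arrays and is not part of the patch: the names classify_moved, shuffle_then_blend and check_examples, and the fixed NELT of 8, are hypothetical and chosen only for illustration. The index arithmetic follows the two loops in expand_vec_perm_pblendv above, and the self-check uses the two example permutations from the patch comment.

```c
#include <assert.h>
#include <stdbool.h>

#define NELT 8 /* Hypothetical fixed element count for this sketch.  */

/* Same classification as the patch: bit 0 is set when a moved element
   comes from op0, bit 1 when it comes from op1.  */
static unsigned
classify_moved (const unsigned perm[NELT])
{
  unsigned i, which = 0;

  for (i = 0; i < NELT; ++i)
    if (perm[i] != i)
      which |= (perm[i] < NELT ? 1 : 2);
  return which;
}

/* Model of shuffle-then-blend on scalar arrays.  Returns false when
   the moved elements come from both operands.  */
static bool
shuffle_then_blend (const int op0[NELT], const int op1[NELT],
                    const unsigned perm[NELT], int out[NELT])
{
  unsigned i, which = classify_moved (perm);
  const int *src;
  int tmp[NELT];

  if (which != 1 && which != 2)
    return false;

  /* Step 1: one-operand shuffle of the operand that supplies the
     moved elements (stands in for pshufb).  */
  src = (which == 2) ? op1 : op0;
  for (i = 0; i < NELT; ++i)
    tmp[i] = src[perm[i] >= NELT ? perm[i] - NELT : perm[i]];

  /* Step 2: per-position select between the untouched operand and
     the shuffled temporary (stands in for pblendv).  */
  for (i = 0; i < NELT; ++i)
    {
      unsigned e = perm[i];
      if (which == 2)
        out[i] = (e >= NELT) ? tmp[i] : op0[e];
      else
        out[i] = (e < NELT) ? tmp[i] : op1[e - NELT];
    }
  return true;
}

/* Self-check on the two examples from the patch comment.  */
static void
check_examples (void)
{
  const int op0[NELT] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  const int op1[NELT] = { 8, 9, 10, 11, 12, 13, 14, 15 };
  const unsigned ok_perm[NELT] = { 0, 1, 8, 3, 4, 5, 9, 7 };
  const unsigned bad_perm[NELT] = { 0, 1, 8, 3, 4, 5, 2, 7 };
  int out[NELT];
  unsigned i;

  /* Moved elements 8 and 9 both come from op1, so this is accepted.  */
  assert (shuffle_then_blend (op0, op1, ok_perm, out));
  for (i = 0; i < NELT; ++i)
    assert (out[i] == (int) ok_perm[i]);

  /* Moved elements 8 and 2 come from different operands: rejected.  */
  assert (!shuffle_then_blend (op0, op1, bad_perm, out));
}
```

Because op0 holds 0..7 and op1 holds 8..15, the expected output of an accepted permutation is simply the permutation itself, which keeps the check easy to read.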
On Thu, Jun 5, 2014 at 8:49 PM, Richard Henderson <rth@redhat.com> wrote:
> On 06/05/2014 08:29 AM, Evgeny Stupachenko wrote:
>> + /* Figure out where permutation elements stay not in their
>> + respective lanes. */
>> + for (i = 0, which = 0; i < nelt; ++i)
>> + {
>> + unsigned e = d->perm[i];
>> + if (e != i)
>> + which |= (e < nelt ? 1 : 2);
>> + }
>> + /* We can pblend the part where elements stay not in their
>> + respective lanes only when these elements are all in one
>> + half of a permutation.
>> + {0 1 8 3 4 5 9 7} is ok as 8, 9 are not at their respective
>> + lanes, but both 8 and 9 >= 8
>> + {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their
>> + respective lanes and 8 >= 8, but 2 not. */
>> + if (which != 1 && which != 2)
>> + return false;
>
> I was about to suggest that you'd get more success by putting the blend first,
> and do the shuffle second. But I suppose it does cover a few cases that the
> other way would miss, e.g.
>
> { 0 4 7 3 }
>
> because we can't blend 0 and 4 (or 3 and 7) into the same vector. Whereas the
> direction you're trying can't handle
>
> { 0 6 6 1 }
>
> But that can be implemented with
>
> { 0 1 2 3 }
> { 4 5 6 7 }
> -----------
> { 0 1 6 3 } (pblend)
> -----------
> { 0 6 6 1 } (pshufb)
>
> So I guess we should cover these two cases in successive patches.
>
>> + if (!expand_vec_perm_blend (&dcopy1))
>> + return false;
>> +
>> + return true;
>
> You should avoid doing any work in this function if the ISA isn't enabled.
> Don't wait until the last test for blend to fail. Separate that out from the
> start of expand_vec_perm_blend as a subroutine, perhaps.
>
> We should be able to prove that we've got a valid blend as input here, so I'd
> be more inclined to write
>
> ok = expand_vec_perm_blend (&dcopy1);
> gcc_assert (ok);
> return true;
>
>> + if (!expand_vec_perm_1 (&dcopy))
>> + return false;
>
> If we know we have pblend, then we know we have pshufb, so again I don't see
> how expand_vec_perm_1 can fail. Another assert would be good.
>
> There is a point, earlier in the function, where we know whether we're going to
> succeed or not. I believe just after
>
>> + if (which != 1 && which != 2)
>> + return false;
>
> You should add a
>
> if (d->testing_p)
> return true;
>
> at that point.
>
>
> r~
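The blend-then-shuffle order from the review above can be checked the same way. The following is only a sketch on scalar arrays, not a proposed implementation: blend, shuffle1 and check_blend_then_shuffle are hypothetical names, and the one-operand shuffle indices { 0 2 2 1 } are derived here from the { 0 1 6 3 } intermediate vector rather than taken from the mail.

```c
#include <assert.h>

#define NELT 4 /* Hypothetical element count, matching the example.  */

/* Per-position select: SEL[i] must be i (take A[i]) or NELT + i
   (take B[i]).  Stands in for pblend.  */
static void
blend (const int a[NELT], const int b[NELT],
       const unsigned sel[NELT], int out[NELT])
{
  unsigned i;

  for (i = 0; i < NELT; ++i)
    out[i] = (sel[i] < NELT) ? a[i] : b[i];
}

/* One-operand shuffle: OUT[i] = V[IDX[i]].  Stands in for pshufb.  */
static void
shuffle1 (const int v[NELT], const unsigned idx[NELT], int out[NELT])
{
  unsigned i;

  for (i = 0; i < NELT; ++i)
    out[i] = v[idx[i]];
}

/* Reproduce { 0 6 6 1 } as a blend followed by a shuffle.  */
static void
check_blend_then_shuffle (void)
{
  const int op0[NELT] = { 0, 1, 2, 3 };
  const int op1[NELT] = { 4, 5, 6, 7 };
  const unsigned sel[NELT] = { 0, 1, 6, 3 }; /* Blend step from the mail.  */
  const unsigned idx[NELT] = { 0, 2, 2, 1 }; /* Derived for this sketch.  */
  const int want[NELT] = { 0, 6, 6, 1 };
  int blended[NELT], out[NELT];
  unsigned i;

  blend (op0, op1, sel, blended);
  shuffle1 (blended, idx, out);
  for (i = 0; i < NELT; ++i)
    assert (out[i] == want[i]);
}
```

The shuffle-then-blend order cannot produce this result because element 6 has to land in two positions and element 1 has to move as well, so the moved elements come from both operands.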