This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH X86, PR62128] Rotate pattern for AVX2
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>
- Cc: Evgeny Stupachenko <evstupac at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Henderson <rth at redhat dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>
- Date: Wed, 1 Oct 2014 09:32:08 +0200
- Subject: Re: [PATCH X86, PR62128] Rotate pattern for AVX2
- Authentication-results: sourceware.org; auth=none
- References: <CAOvf_xytRo3-vFOdbOuDibk3LCsWrAMsqe1Vd_uSg+QwA71+XA at mail dot gmail dot com> <CAFULd4axXUAuf655zX0NTr13NmDSHF-jqZnZkQAQ-VYqefpqCQ at mail dot gmail dot com> <CAFULd4Y5-mUyU0NcXg+vtidbPuGCM=Tw20M5=k88kouCGPMk-g at mail dot gmail dot com> <CAFULd4Y2yiRW+bWcb7JW6aW_Qmk_E34DT0Y5-=ALjrR9eikBgg at mail dot gmail dot com> <CAOvf_xxcEftax3_j5-TaXL3LDfO-tVeGuaGK1Yb3Lj8A9J0TZw at mail dot gmail dot com> <CAFULd4YyMR615-mZ=5pjj2L2GYEX6+jD8HYDhtoUrYyFptawhw at mail dot gmail dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Wed, Oct 01, 2014 at 08:26:27AM +0200, Uros Bizjak wrote:
> On Wed, Oct 1, 2014 at 12:13 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> > expand_vselect for some reason ignores the expander.
> > Does it work with expanders?
> > The comment talks about insn only:
> > /* Construct (set target (vec_select op0 (parallel perm))) and
> > return true if that's a valid instruction in the active ISA. */
>
> It looks to me that the whole approach is wrong from the beginning.
>
> There is already a function that generates perm/palignr sequence,
> conveniently named expand_vec_perm_palignr. This function should be
> extended to handle AVX2 sequence. You don't have to add any new
> patterns, existing avx2_permv2ti and avx2_palignr2ti should do the
> trick.
Yeah, I'd expect something like (so far not even compile tested):
--- gcc/config/i386/i386.c 2014-09-26 10:33:18.789645102 +0200
+++ gcc/config/i386/i386.c 2014-10-01 09:27:41.435813522 +0200
@@ -43297,44 +43297,79 @@ expand_vec_perm_palignr (struct expand_v
rtx shift, target;
struct expand_vec_perm_d dcopy;
- /* Even with AVX, palignr only operates on 128-bit vectors. */
- if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+ /* palignr doesn't help for one_operand_p case. */
+ if (d->one_operand_p)
return false;
- min = nelt, max = 0;
+ /* Even with AVX, palignr only operates on 128-bit vectors,
+ in AVX2 palignr operates on both 128-bit lanes. */
+ if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+ && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))
+ return false;
+
+ min = 2 * nelt, max = 0;
for (i = 0; i < nelt; ++i)
{
unsigned e = d->perm[i];
+ if (GET_MODE_SIZE (d->vmode) == 32)
+ e = (e & ((nelt / 2) - 1)) | ((e & nelt) >> 1);
if (e < min)
min = e;
if (e > max)
max = e;
}
- if (min == 0 || max - min >= nelt)
+ if (min == 0
+ || max - min >= (GET_MODE_SIZE (d->vmode) == 32 ? nelt / 2 : nelt))
return false;
/* Given that we have SSSE3, we know we'll be able to implement the
- single operand permutation after the palignr with pshufb. */
- if (d->testing_p)
+ single operand permutation after the palignr with pshufb for
+ 128-bit vectors. */
+ if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16)
return true;
dcopy = *d;
- shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
- target = gen_reg_rtx (TImode);
- emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
- gen_lowpart (TImode, d->op0), shift));
-
- dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
- dcopy.one_operand_p = true;
in_order = true;
for (i = 0; i < nelt; ++i)
{
- unsigned e = dcopy.perm[i] - min;
+ unsigned e = dcopy.perm[i];
+ if (GET_MODE_SIZE (d->vmode) == 32
+ && e >= nelt
+ && (e & (nelt / 2 - 1)) < min)
+ e = e - min - (nelt / 2);
+ else
+ e = e - min;
if (e != i)
in_order = false;
dcopy.perm[i] = e;
}
+ dcopy.one_operand_p = true;
+
+ /* For AVX2, test whether we can permute the result in one instruction. */
+ if (d->testing_p)
+ {
+ if (in_order)
+ return true;
+ dcopy.op1 = dcopy.op0;
+ return expand_vec_perm_1 (&dcopy);
+ }
+
+ shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+ if (GET_MODE_SIZE (d->vmode) == 16)
+ {
+ target = gen_reg_rtx (TImode);
+ emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+ gen_lowpart (TImode, d->op0), shift));
+ }
+ else
+ {
+ target = gen_reg_rtx (V2TImode);
+ emit_insn (gen_avx2_palignr2vti (target, gen_lowpart (V2TImode, d->op1),
+ gen_lowpart (V2TImode, d->op0), shift));
+ }
+
+ dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
/* Test for the degenerate case where the alignment by itself
produces the desired permutation. */
> And, as said by H.J., please add the testcase from the PR that will
> exercise the code path. Without the testcase included, the patch is
> unreviewable, and this is the reason why no maintainer (including me)
> wants to approve it in its current form.
I'd hope my gcc.dg/torture/vshuf* tests catch it, if not in normal
mode, at least in the expensive one. OT, note, those tests haven't been
extended for AVX-512 modes (for most modes it is trivial, just
V64QI will need new vshuf-64.inc).
Jakub