This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFC PATCH] Enable V32HI/V64QI const permutations
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Ilya Tocar <tocarip dot intel at gmail dot com>
- Cc: Uros Bizjak <ubizjak at gmail dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>, Evgeny Stupachenko <evstupac at gmail dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 6 Oct 2014 17:32:18 +0200
- Subject: Re: [RFC PATCH] Enable V32HI/V64QI const permutations
- References: <20141003143908 dot GT1986 at tucnak dot redhat dot com> <20141006070841 dot GW1986 at tucnak dot redhat dot com> <20141006140907 dot GA98884 at msticlxl7 dot ims dot intel dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Mon, Oct 06, 2014 at 06:09:07PM +0400, Ilya Tocar wrote:
> > Speaking of -mavx512{bw,vl,f}, there apparently is a full 2 operand shuffle
> > for V32HI, V16S[IF], V8D[IF], so the only one instruction full
> > 2 operand shuffle we are missing is V64QI, right?
> >
> > What would be the best worst-case sequence for that?
> >
> > I'd think 2x vpermi2w, 2x vpshufb and one vpor could achieve that,
> > (first vpermi2w would put the even bytes into the right word positions
> > (i.e. at the right position or one above it), second vpermi2w would put
> > the odd bytes into the right word positions (i.e. at the right position
> > or one below it),
> I think we will also need to spend insns converting the byte-sized mask
> into a word-sized mask.
I'm talking about the constant permutations here (see my other mail to
Kirill). In that case, you can tweak the mask as much as you want.
I mean something like (completely untested, would need a separate function):
if (TARGET_AVX512BW && d->vmode == V64QImode)
  ;
else
  return false;

/* We can emit arbitrary two operand V64QImode permutations
   with 2 vpermi2w, 2 vpshufb and one vpor instruction.  */
if (d->testing_p)
  return true;

struct expand_vec_perm_d ds[2];
rtx rperm[128], vperm, target0, target1;
unsigned int i, nelt = d->nelt;
for (i = 0; i < 2; i++)
  {
    ds[i] = *d;
    ds[i].vmode = V32HImode;
    ds[i].nelt = 32;
    ds[i].target = gen_reg_rtx (V32HImode);
    ds[i].op0 = gen_lowpart (V32HImode, d->op0);
    ds[i].op1 = gen_lowpart (V32HImode, d->op1);
  }
/* Prepare permutations such that the first one takes care of
   putting the even bytes into the right positions or one position
   higher (ds[0]) and the second one takes care of putting the odd
   bytes into the right positions or one below (ds[1]).  */
for (i = 0; i < nelt; i++)
  {
    ds[i & 1].perm[i / 2] = d->perm[i] / 2;
    if (i & 1)
      {
        rperm[i] = constm1_rtx;
        rperm[i + 64] = GEN_INT ((i & 14) + (d->perm[i] & 1));
      }
    else
      {
        rperm[i] = GEN_INT ((i & 14) + (d->perm[i] & 1));
        rperm[i + 64] = constm1_rtx;
      }
  }
bool ok = expand_vec_perm_1 (&ds[0]);
gcc_assert (ok);
ds[0].target = gen_lowpart (V64QImode, ds[0].target);
ok = expand_vec_perm_1 (&ds[1]);
gcc_assert (ok);
ds[1].target = gen_lowpart (V64QImode, ds[1].target);
/* Select the even bytes from ds[0].target and the odd bytes from
   ds[1].target with vpshufb, then merge the two with vpor.  */
vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm));
vperm = force_reg (V64QImode, vperm);
target0 = gen_reg_rtx (V64QImode);
emit_insn (gen_avx512bw_pshufbv64qi3 (target0, ds[0].target, vperm));
vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm + 64));
vperm = force_reg (V64QImode, vperm);
target1 = gen_reg_rtx (V64QImode);
emit_insn (gen_avx512bw_pshufbv64qi3 (target1, ds[1].target, vperm));
emit_insn (gen_iorv64qi3 (d->target, target0, target1));
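
To sanity check that index arithmetic outside of GCC, a plain C simulation
along these lines should reproduce the byte permutation (standalone sketch,
nothing below is from the patch; it models vpermi2w as a full word-granularity
two-source shuffle and vpshufb as a per-16-byte-lane byte select that zeroes
bytes whose control byte is negative):

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  unsigned char src[128], perm[64], interm[2][64], shuf[2][64];
  unsigned char wperm[2][32];
  int bmask[2][64];
  unsigned int i, k;

  /* Arbitrary input bytes (op0 followed by op1) and an arbitrary
     two-operand byte selection.  */
  for (i = 0; i < 128; i++)
    src[i] = (unsigned char) (i * 37 + 11);
  srand (42);
  for (i = 0; i < 64; i++)
    perm[i] = rand () % 128;

  /* Split as in the sketch: even output bytes go through wperm[0]/bmask[0],
     odd output bytes through wperm[1]/bmask[1].  */
  for (i = 0; i < 64; i++)
    {
      wperm[i & 1][i / 2] = perm[i] / 2;
      bmask[i & 1][i] = (i & 14) + (perm[i] & 1);
      bmask[(i & 1) ^ 1][i] = -1;  /* Zeroed by the byte shuffle.  */
    }

  for (k = 0; k < 2; k++)
    {
      /* "vpermi2w": output word i is source word wperm[k][i].  */
      for (i = 0; i < 32; i++)
        {
          interm[k][2 * i] = src[2 * wperm[k][i]];
          interm[k][2 * i + 1] = src[2 * wperm[k][i] + 1];
        }
      /* "vpshufb": byte indices are local to each 16-byte lane.  */
      for (i = 0; i < 64; i++)
        shuf[k][i] = bmask[k][i] < 0
                     ? 0 : interm[k][(i & ~15U) + (bmask[k][i] & 15)];
    }

  /* "vpor", compared against doing the byte selection directly.  */
  for (i = 0; i < 64; i++)
    assert ((shuf[0][i] | shuf[1][i]) == src[perm[i]]);

  puts ("ok");
  return 0;
}

Compiled with any C compiler and run, it should just print ok for the randomly
chosen selection, which is the same even/odd split the sketch above performs
with 2 vpermi2w, 2 vpshufb and one vpor.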
Jakub