This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC PATCH] Enable V32HI/V64QI const permutations


On Mon, Oct 06, 2014 at 06:09:07PM +0400, Ilya Tocar wrote:
> > Speaking of -mavx512{bw,vl,f}, there apparently is a full 2 operand shuffle
> > for V32HI, V16S[IF], V8D[IF], so the only single-instruction full
> > 2 operand shuffle we are missing is V64QI, right?
> > 
> > What would be the best worst-case sequence for that?
> > 
> > I'd think 2x vpermi2w, 2x vpshufb and one vpor could achieve that,
> > (first vpermi2w would put the even bytes into the right word positions
> > (i.e. at the right position or one above it), second vpermi2w would put
> > the odd bytes into the right word positions (i.e. at the right position
> > or one below it),
> I think we will also need to spend insns converting the byte-sized mask into
> a word-sized mask.

I'm talking about the constant permutations here (see my other mail to
Kirill).  In that case, you can tweak the mask as much as you want.
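
A minimal sketch of that mask tweak, in plain C rather than GCC internals.
`split_byte_perm` and its parameter names are hypothetical, not anything in
i386.c. It only illustrates that, for a constant permutation, turning byte
indices into word indices is integer arithmetic done at expansion time, so
it should cost no extra instructions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: split a constant 64-entry byte
   permutation (indices 0..127 into the concatenation op0:op1) into two
   32-entry word permutations.  even_sel places the word holding each even
   result byte, odd_sel the word holding each odd result byte.  */
static void
split_byte_perm (const uint8_t perm[64], uint8_t even_sel[32],
                 uint8_t odd_sel[32])
{
  for (int i = 0; i < 64; i++)
    {
      assert (perm[i] < 128);
      /* Byte index / 2 is the index of the 16-bit word containing it.  */
      if (i & 1)
        odd_sel[i / 2] = perm[i] / 2;
      else
        even_sel[i / 2] = perm[i] / 2;
    }
}
```

The low bit of each byte index (`perm[i] & 1`) is dropped here; it is what
the later byte shuffle would use to pick the right half of each word.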

I mean something like (completely untested, would need a separate function):

  if (!TARGET_AVX512BW || d->vmode != V64QImode)
    return false;

  /* We can emit arbitrary two operand V64QImode permutations
     with 2 vpermi2w, 2 vpshufb and one vpor instruction.  */
  if (d->testing_p)
    return true;

  struct expand_vec_perm_d ds[2];
  rtx rperm[128], vperm, target0, target1;
  unsigned int i;
  for (i = 0; i < 2; i++)
    {
      ds[i] = *d;
      ds[i].vmode = V32HImode;
      ds[i].nelt = 32;
      ds[i].target = gen_reg_rtx (V32HImode);
      ds[i].op0 = gen_lowpart (V32HImode, d->op0);
      ds[i].op1 = gen_lowpart (V32HImode, d->op1);
    }
  /* Prepare permutations such that the first one takes care of
     putting the even bytes into the right positions or one higher
     positions (ds[0]) and the second one takes care of
     putting the odd bytes into the right positions or one below
     (ds[1]).  */

  for (i = 0; i < d->nelt; i++)
    {
      ds[i & 1].perm[i / 2] = d->perm[i] / 2;
      if (i & 1)
	{
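	  rperm[i] = constm1_rtx;
	  /* Same index as the even case: ds[1] puts the wanted word at
	     bytes i - 1 and i, and (i & 14) is its low byte in the lane.  */
	  rperm[i + 64] = GEN_INT ((i & 14) + (d->perm[i] & 1));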
	}
      else
	{
	  rperm[i] = GEN_INT ((i & 14) + (d->perm[i] & 1));
	  rperm[i + 64] = constm1_rtx;
	}
    }

  bool ok = expand_vec_perm_1 (&ds[0]);
  gcc_assert (ok);
  ds[0].target = gen_lowpart (V64QImode, ds[0].target);

  ok = expand_vec_perm_1 (&ds[1]);
  gcc_assert (ok);
  ds[1].target = gen_lowpart (V64QImode, ds[1].target);

  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm));
  vperm = force_reg (V64QImode, vperm);
  target0 = gen_reg_rtx (V64QImode);
  emit_insn (gen_avx512bw_pshufbv64qi3 (target0, ds[0].target, vperm));

  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm + 64));
  vperm = force_reg (V64QImode, vperm);
  target1 = gen_reg_rtx (V64QImode);
  emit_insn (gen_avx512bw_pshufbv64qi3 (target1, ds[1].target, vperm));

  emit_insn (gen_iorv64qi3 (d->target, target0, target1));
  return true;
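
One way to sanity-check the sequence without AVX512BW hardware is a scalar
model. The sketch below is an assumption-laden illustration, not GCC code:
`model_perm2w`, `model_pshufb` and `model_v64qi_perm` are hypothetical names,
the word permute is modelled as a plain two-source word select, and the byte
shuffle as a per-128-bit-lane lookup where a negative control byte yields
zero. It uses `(i & 14) + (perm[i] & 1)` as the byte-shuffle index for both
even and odd result bytes.

```c
#include <assert.h>
#include <stdint.h>

enum { NBYTES = 64, NWORDS = 32 };

/* Scalar model of a two-source word permute: word j of the result is
   word sel[j] of the 64-word concatenation of op0 and op1.  */
static void
model_perm2w (uint8_t *dst, const uint8_t *op0, const uint8_t *op1,
              const uint8_t *sel)
{
  for (int j = 0; j < NWORDS; j++)
    {
      const uint8_t *src = sel[j] < NWORDS ? op0 : op1;
      int w = sel[j] % NWORDS;
      dst[2 * j] = src[2 * w];
      dst[2 * j + 1] = src[2 * w + 1];
    }
}

/* Scalar model of a 512-bit byte shuffle: each 16-byte lane is shuffled
   on its own, and a control byte with the top bit set yields zero.  */
static void
model_pshufb (uint8_t *dst, const uint8_t *src, const int8_t *ctl)
{
  for (int i = 0; i < NBYTES; i++)
    dst[i] = ctl[i] < 0 ? 0 : src[(i & ~15) + (ctl[i] & 15)];
}

/* Hypothetical driver: two word permutes, two byte shuffles, one OR.
   perm[i] in 0..127 selects a byte of the concatenation op0:op1.  */
static void
model_v64qi_perm (uint8_t *dst, const uint8_t *op0, const uint8_t *op1,
                  const uint8_t *perm)
{
  uint8_t wsel[2][NWORDS], tmp[2][NBYTES], out[2][NBYTES];
  int8_t ctl[2][NBYTES];

  for (int i = 0; i < NBYTES; i++)
    {
      assert (perm[i] < 2 * NBYTES);
      /* Even bytes go through pass 0, odd bytes through pass 1.  */
      wsel[i & 1][i / 2] = perm[i] / 2;
      /* Pick the low or high byte of the word now sitting at word i / 2.  */
      ctl[i & 1][i] = (int8_t) ((i & 14) + (perm[i] & 1));
      /* The other pass contributes zero at this position.  */
      ctl[(i & 1) ^ 1][i] = -1;
    }

  for (int k = 0; k < 2; k++)
    {
      model_perm2w (tmp[k], op0, op1, wsel[k]);
      model_pshufb (out[k], tmp[k], ctl[k]);
    }

  for (int i = 0; i < NBYTES; i++)
    dst[i] = out[0][i] | out[1][i];
}
```

Comparing `model_v64qi_perm` against a direct byte-by-byte lookup for a few
constant permutations is a cheap way to check the index arithmetic before
touching the real expander.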

	Jakub

