This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] AVX2 vec_widen_[su]mult_{hi,lo}*, sdot_prod* and udot_prod*


On 10/14/2011 07:18 AM, Jakub Jelinek wrote:
> +  /* This would be 2 insns shorter if
> +     rperm[i] = GEN_INT (((~i & 1) << 2) + i / 2);
> +     had been used instead (both vpsrlq insns wouldn't be needed),
> +     but vec_widen_*mult_hi_* is usually used together with
> +     vec_widen_*mult_lo_* and by writing it this way the load
> +     of the constant and the two vpermd instructions (cross-lane)
> +     can be CSEd together.  */
> +  for (i = 0; i < 8; ++i)
> +    rperm[i] = GEN_INT (((i & 1) << 2) + i / 2);
> +  vperm = gen_rtx_CONST_VECTOR (V8SImode, gen_rtvec_v (8, rperm));
> +  vperm = force_reg (V8SImode, vperm);
> +  emit_insn (gen_avx2_permvarv8si (t1, vperm, operands[1]));
> +  emit_insn (gen_avx2_permvarv8si (t2, vperm, operands[2]));
> +  emit_insn (gen_lshrv4di3 (gen_lowpart (V4DImode, t3),
> +			    gen_lowpart (V4DImode, t1), GEN_INT (32)));
> +  emit_insn (gen_lshrv4di3 (gen_lowpart (V4DImode, t4),
> +			    gen_lowpart (V4DImode, t2), GEN_INT (32)));
> +  emit_insn (gen_avx2_<u>mulv4siv4di3 (operands[0], t3, t4));

So what you're doing here is the low-part permutation:

	0 4 1 5 2 6 3 7

followed by a shift to get

	4 . 5 . 6 . 7 .

But you need to load a 256-bit constant from memory to get it.
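
To make the data movement concrete, here is a scalar sketch of that sequence in plain C.  It is only a model under my own assumptions: each V8SI is an array of eight dwords with element 0 in the lowest position, and the helper names (build_rperm, model_vpermd, model_vpsrlq32, check_hi_path) are made up for illustration and are not GCC or intrinsic APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Index vector from the patch: rperm[i] = ((i & 1) << 2) + i / 2,
   which yields 0 4 1 5 2 6 3 7.  */
static void build_rperm(uint32_t rperm[8])
{
    for (int i = 0; i < 8; ++i)
        rperm[i] = ((uint32_t)(i & 1) << 2) + (uint32_t)i / 2;
}

/* Scalar model of VPERMD: dst[i] = src[idx[i] & 7] (cross-lane).  */
static void model_vpermd(uint32_t dst[8], const uint32_t idx[8],
                         const uint32_t src[8])
{
    for (int i = 0; i < 8; ++i)
        dst[i] = src[idx[i] & 7];
}

/* Scalar model of a 64-bit logical right shift by 32 on a V4DI viewed
   as eight dwords: the high dword of each qword moves to the low
   dword, and the high dword becomes zero.  */
static void model_vpsrlq32(uint32_t dst[8], const uint32_t src[8])
{
    for (int q = 0; q < 4; ++q) {
        dst[2 * q] = src[2 * q + 1];
        dst[2 * q + 1] = 0;
    }
}

/* Hypothetical self-check: after permute + shift, the even dwords
   (the ones a widening multiply reads) hold source elements 4..7.  */
static void check_hi_path(void)
{
    const uint32_t src[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    uint32_t rperm[8], t1[8], t3[8];

    build_rperm(rperm);
    model_vpermd(t1, rperm, src);
    model_vpsrlq32(t3, t1);
    for (int q = 0; q < 4; ++q)
        assert(t3[2 * q] == (uint32_t)(4 + q));
}
```

Without the shift, the even dwords of the permuted vector already hold elements 0..3, which is why the lo variant can reuse the same constant and permutes.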

I wonder if it wouldn't be better to use VPERMQ to handle the lane change:

	0   2   1   3
	0 1 4 5 2 3 6 7

shared between the hi/lo, and a VPSHUFD to handle the in-lane ordering:

	0 0 1 1 2 2 3 3
	4 4 5 5 6 6 7 7
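
A scalar sketch of this alternative, again only a model in plain C.  The immediates 0xD8, 0x50 and 0xFA are my own encoding of the orderings shown above (two selector bits per element, lowest element first) and do not come from any patch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of VPERMQ with an 8-bit immediate: qword q of dst is
   qword ((imm >> 2*q) & 3) of src.  Vectors are viewed as 8 dwords.  */
static void model_vpermq(uint32_t dst[8], const uint32_t src[8],
                         unsigned imm)
{
    for (int q = 0; q < 4; ++q) {
        unsigned sel = (imm >> (2 * q)) & 3;
        dst[2 * q] = src[2 * sel];
        dst[2 * q + 1] = src[2 * sel + 1];
    }
}

/* Scalar model of 256-bit VPSHUFD: the same 4-dword selection is
   applied independently within each 128-bit lane.  */
static void model_vpshufd(uint32_t dst[8], const uint32_t src[8],
                          unsigned imm)
{
    for (int lane = 0; lane < 2; ++lane)
        for (int i = 0; i < 4; ++i)
            dst[4 * lane + i] = src[4 * lane + ((imm >> (2 * i)) & 3)];
}

/* Hypothetical self-check of the orderings written out above.  */
static void check_vpermq_vpshufd_path(void)
{
    const uint32_t src[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    const uint32_t want_q[8]  = { 0, 1, 4, 5, 2, 3, 6, 7 };
    const uint32_t want_lo[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
    const uint32_t want_hi[8] = { 4, 4, 5, 5, 6, 6, 7, 7 };
    uint32_t t[8], lo[8], hi[8];

    model_vpermq(t, src, 0xD8);   /* qword order 0 2 1 3 */
    model_vpshufd(lo, t, 0x50);   /* in-lane dwords 0 0 1 1 */
    model_vpshufd(hi, t, 0xFA);   /* in-lane dwords 2 2 3 3 */
    for (int i = 0; i < 8; ++i) {
        assert(t[i] == want_q[i]);
        assert(lo[i] == want_lo[i]);
        assert(hi[i] == want_hi[i]);
    }
}
```

Both shuffles take immediates, so nothing has to be loaded from memory; the cost is one extra in-lane shuffle per operand for each of hi and lo.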

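In the end we get 2+(2+2)=6 insns as setup prior to the VPMULDQs (two shared
VPERMQs plus two VPSHUFDs each for lo and hi), as compared to your
1+2+(0+2)=5 insns (one constant load, two shared VPERMDs, and two shifts for
hi only), but with no need to wait for the constant load.  Of course, if the
constant load gets hoisted out of the loop, yours will likely win on
throughput.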

Thoughts, Uros and those looking in from Intel?

Otherwise it looks ok.


r~

