
Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.


On Tue, May 17, 2016 at 09:02:22AM +0000, Bin Cheng wrote:
> Hi,
> Alan and Renlin noticed that some vcond patterns are not supported in
> the AArch64 (or AArch32?) backend, and they both had patches fixing this.
> After investigation, I agree with them that vcond/vcondu in AArch64's
> backend should be re-implemented using vec_cmp/vcond_mask patterns, so
> here comes this patch, which is based on Alan's.  This patch supports all
> vcond/vcondu patterns by implementing/using vec_cmp and vcond_mask
> patterns.  Unlike the original patch, it doesn't change GCC's expanding
> process, and it keeps the vcond patterns.  The patch also introduces
> vec_cmp*_internal to support the special-case optimizations for
> vcond/vcondu that the current implementation performs.  Besides Alan's
> patch, I also borrowed ideas from Renlin's; any bugs introduced are my
> own.
> 
> With this patch, GCC's test condition "vect_cond_mixed" can be enabled on
> AArch64 (in a following patch).  Bootstrapped and tested on AArch64.  Is
> it OK?  BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on
> AArch64), which was added earlier by the tree if-conversion patch.

Splitting this patch would have been very helpful: one patch for each of
the new standard pattern names, and one for the refactoring of vcond. As
it is, this patch is rather difficult to read.

> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index bd73bce..f51473a 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1053,7 +1053,7 @@
>      }
>  
>    cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
> -  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
> +  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
>                operands[2], cmp_fmt, operands[1], operands[2]));
>    DONE;
>  })
> @@ -2225,204 +2225,215 @@
>    DONE;
>  })
>  
> -(define_expand "aarch64_vcond_internal<mode><mode>"
> +(define_expand "vcond_mask_<mode><v_cmp_result>"
> +  [(match_operand:VALLDI 0 "register_operand")
> +   (match_operand:VALLDI 1 "nonmemory_operand")
> +   (match_operand:VALLDI 2 "nonmemory_operand")
> +   (match_operand:<V_cmp_result> 3 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  /* If we have (a = (P) ? -1 : 0);
> +     Then we can simply move the generated mask (result must be int).  */
> +  if (operands[1] == CONSTM1_RTX (<MODE>mode)
> +      && operands[2] == CONST0_RTX (<MODE>mode))
> +    emit_move_insn (operands[0], operands[3]);
> +  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
> +  else if (operands[1] == CONST0_RTX (<MODE>mode)
> +	   && operands[2] == CONSTM1_RTX (<MODE>mode))
> +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
> +  else
> +    {
> +      if (!REG_P (operands[1]))
> +	operands[1] = force_reg (<MODE>mode, operands[1]);
> +      if (!REG_P (operands[2]))
> +	operands[2] = force_reg (<MODE>mode, operands[2]);
> +      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
> +					     operands[1], operands[2]));
> +    }
> +
> +  DONE;
> +})
> +

This pattern is fine.
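
For readers following along, here is a rough C-level sketch of what
vcond_mask computes per lane (a bitwise select, which the BSL path
implements); the function name and scalar form are mine, not part of
the patch:

  #include <stdint.h>

  /* Bitwise select, one lane: take the bits of T where MASK is 1 and
     the bits of F where MASK is 0.  With the all-ones/all-zeros lane
     masks that vec_cmp produces, this selects whole lanes, which is
     the semantics BSL provides.  */
  static inline uint32_t
  bsl_lane (uint32_t mask, uint32_t t, uint32_t f)
  {
    return (mask & t) | (~mask & f);
  }

The two special cases in the expander are then just the degenerate
selects: t = -1, f = 0 collapses to a plain move of the mask, and
t = 0, f = -1 to its one's complement.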

> +;; Patterns comparing two vectors to produce a mask.

This comment is insufficient. The logic in vec_cmp<mode><mode>_internal
does not always return the expected mask (in particular for NE), but this
is not made clear in the comment.
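
To make the NE case concrete, what I believe the expansion amounts to
per lane is the following (my sketch, not code from the patch):

  #include <stdint.h>

  /* NEON has CMEQ but no direct NE compare, so a != b is computed as
     ~(a == b): do the EQ compare, then bitwise-invert the 0/-1 mask.  */
  uint32_t ne_lane (uint32_t a, uint32_t b)
  {
    uint32_t eq = (a == b) ? ~0u : 0u;   /* what CMEQ produces  */
    return ~eq;                          /* inverted => NE mask */
  }

A comment of roughly that shape next to each such case would make the
split between the _internal patterns and their callers much easier to
audit.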

> +
> +(define_expand "vec_cmp<mode><mode>_internal"
>    [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> -	(if_then_else:VSDQ_I_DI
> -	  (match_operator 3 "comparison_operator"
> -	    [(match_operand:VSDQ_I_DI 4 "register_operand")
> -	     (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
> -	  (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
> -	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
> +	  (match_operator 1 "comparison_operator"
> +	    [(match_operand:VSDQ_I_DI 2 "register_operand")
> +	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
>    "TARGET_SIMD"

<snip>

> +(define_expand "vec_cmp<mode><mode>"
> +  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> +	  (match_operator 1 "comparison_operator"
> +	    [(match_operand:VSDQ_I_DI 2 "register_operand")
> +	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> +  "TARGET_SIMD"
> +{
> +  enum rtx_code code = GET_CODE (operands[1]);
> +
> +  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
> +					 operands[1], operands[2],
> +					 operands[3]));
> +  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
> +     operators are computed using other operators, and we need to
> +     revert the result.  */

s/revert/invert

There are no comments in vec_cmp<mode><v_cmp_result>_internal for this
to refer to. But, there should be - more comments would definitely help
this patch!

These comments apply equally to the other copies of this comment in the
patch.

> +  if (code == NE)
> +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>    DONE;
>  })
>  
> -(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
> -  [(set (match_operand:VDQF_COND 0 "register_operand")
> -	(if_then_else:VDQF
> -	  (match_operator 3 "comparison_operator"
> -	    [(match_operand:VDQF 4 "register_operand")
> -	     (match_operand:VDQF 5 "nonmemory_operand")])
> -	  (match_operand:VDQF_COND 1 "nonmemory_operand")
> -	  (match_operand:VDQF_COND 2 "nonmemory_operand")))]
> +(define_expand "vec_cmp<mode><v_cmp_result>_internal"
> +  [(set (match_operand:<V_cmp_result> 0 "register_operand")
> +	(match_operator 1 "comparison_operator"
> +	    [(match_operand:VDQF 2 "register_operand")
> +	     (match_operand:VDQF 3 "nonmemory_operand")]))]

This needs much more thorough commenting. The logic is now extremely
difficult to verify as to which pattern is responsible for inverting the
mask. This makes it hard to spot whether the inversions applied to the
unordered comparisons are correct. I can't really review the patch with
the logic split across uncommented patterns in this way - is there
anything you can do to clean it up?
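
For the unordered cases, the invariant I'd want stated explicitly is
that each UN* compare is the bitwise inverse of an ordered compare,
because the ordered compare yields a zero mask for NaN inputs; e.g.
UNGE (a, b) == ~LT (a, b).  A scalar sketch of that identity (my code,
not the patch's):

  #include <math.h>
  #include <stdint.h>

  /* FCMLT sets a lane to all-ones only for an ordered a < b; both
     a >= b and NaN inputs give zero.  Inverting the mask therefore
     yields "a >= b or unordered", i.e. UNGE.  */
  uint64_t unge_lane (double a, double b)
  {
    uint64_t lt = isless (a, b) ? ~0ull : 0ull;  /* ordered a < b */
    return ~lt;                                  /* UNGE mask     */
  }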

> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vcond.c b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
> new file mode 100644
> index 0000000..b0ca6d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c

None of this is AArch64 specific?

> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +
> +float fa[1024] = {0.0}, fb[1024] = {0.0}, fc[1024];
> +double da[1024] = {0.0}, db[1024] = {0.0}, dc[1024];
> +
> +int vcond_f_i (int *x)
> +{
> +  int i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      fa[i] -= 1.0;
> +      fb[i] += 1.0;
> +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> +    }
> +
> +  return 0;
> +}
> +
> +int vcond_f_u (unsigned int *x)
> +{
> +  unsigned int i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      fa[i] -= 1.0;
> +      fb[i] += 1.0;
> +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> +    }
> +
> +  return 0;
> +}
> +
> +int vcond_d_i (long long *x)
> +{
> +  long long i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      da[i] -= 1.0;
> +      db[i] += 1.0;
> +      dc[i] = (x[i] < i) ? da[i] : db[i];
> +    }
> +
> +  return 0;
> +}
> +
> +int vcond_d_u (unsigned long long *x)
> +{
> +  unsigned long long i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      da[i] -= 1.0;
> +      db[i] += 1.0;
> +      dc[i] = (x[i] < i) ? da[i] : db[i];
> +    }
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */

Thanks,
James

