Bug 115659 - powerpc fallout from removing vcond{,u,eq} patterns
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 15.0
Importance: P3 normal
Target Milestone: 15.0
Assignee: Kewen Lin
URL:
Keywords: missed-optimization
Depends on:
Blocks: 114189
Reported: 2024-06-26 05:54 UTC by Kewen Lin
Modified: 2024-07-12 06:44 UTC
CC: 5 users

See Also:
Host:
Target: powerpc*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-06-26 00:00:00


Description Kewen Lin 2024-06-26 05:54:32 UTC
After applying the patch that drops vcond{,u,eq}_optab support (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114189#c2), there is only one failure on both BE and LE:

FAIL: gcc.target/powerpc/pr66144-3.c scan-assembler-not \\mvspltisw\\M

Previously I blindly took it as a false alarm, but after further checking I realized it exposed a missed optimization.

In the function rs6000_emit_vector_cond_expr, there is one optimization:

  /* Optimize vec1 == vec2, to know the mask generates -1/0.  */
  if (GET_MODE_CLASS (dest_mode) == MODE_VECTOR_INT
      && (GET_CODE (op_true) == CONST_VECTOR
	  || GET_CODE (op_false) == CONST_VECTOR))

  ...

This is some special handling for the cases:

   1) op_true -1 and op_false 0
   2) op_true 0 and op_false -1
   3) op_true -1
   4) op_false 0

all reusing the result of the vector comparison, since it already returns -1 or 0 in each lane.
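
A minimal GNU C sketch (made-up function, not the actual pr66144-3.c testcase) of case 3): since the comparison already yields the -1/0 mask per lane, the select collapses into a plain OR with it.

```
typedef int v4si __attribute__ ((vector_size (16)));

v4si
mask_or (v4si x, v4si y, v4si z)
{
  v4si c = x == y;   /* each lane: -1 if equal, 0 otherwise */
  return c | z;      /* (x == y) ? -1 : z without a vector select */
}
```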
Comment 1 Kewen Lin 2024-06-26 06:18:49 UTC
Now ISEL has some handling to fold x CMP y ? -1 : 0 into x CMP y:

	  /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
	  if (can_compute_op0
	      && integer_minus_onep (op1)
	      && integer_zerop (op2)
	      && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
	    {
	      tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
	      gassign *new_stmt = gimple_build_assign (lhs, conv_op);
	      gsi_replace (gsi, new_stmt, true);
	      return new_stmt;
	    }

It looks like this can be extended to cover:

   c = x CMP y 
   r = c ? -1 : z  =>  r = c ? c : z
   r = c ?  z : 0  =>  r = c ? z : c

But would this be better supported in match.pd?

The handling in rs6000_emit_vector_cond_expr already knows whether an inversion happens, so it further handles cases like:

   c = x CMP y  // c' = x OP y, c = ~c'
 
   r = c ?  0 : z   =>  r = c' ?  z  : c'
   r = c ?  z : -1  =>  r = c' ?  c' : z

It seems we need a helper to query whether the target would expand a given comparison operator with an inversion?
Comment 2 Andrew Pinski 2024-06-26 06:24:15 UTC
Note I think this could help scalar code too:
```
int a[1], b[1], c[1];

void
test (void)
{
  a[0] = (b[0] == c[0]) ? -1 : a[0];
}

void
test1 (void)
{
  a[0] = (-(b[0] == c[0])) | a[0];
}

```

So this could be something like:
```
(simplify
 (cond @0 @1 integer_all_ones_p)
 (bit_ior (negate (convert @0)) @1))
(simplify
 (vec_cond @0 @1 integer_all_ones_p)
 (bit_ior (view_convert @0) @1))
```
The second one might need a target_supports_op_p for the bit_ior.
Comment 3 Richard Biener 2024-06-26 07:39:21 UTC
   c = x CMP y 
   r = c ? -1 : z  =>  r = c ? c : z
   r = c ?  z : 0  =>  r = c ? z : c

this is probably best left for ISEL.  I agree the transforms eliminating
the COND are useful in general and suitable also for match.pd.  Watch
out for vectorizer patterns though, which create scalar COND_EXPRs for
bool mask <-> bool value transforms.

Note the transforms need guarding with the mode check, since they don't work
for targets whose compares produce mask modes (x86; smaller vectors could
resort to AVX compares, but the way the mask-mode target hook operates, this
doesn't look easy).
Comment 4 Kewen Lin 2024-06-27 06:07:06 UTC
(In reply to Richard Biener from comment #3)
>    c = x CMP y 
>    r = c ? -1 : z  =>  r = c ? c : z
>    r = c ?  z : 0  =>  r = c ? z : c
> 
> this is probably best left for ISEL.  I agree the transforms eliminating
> the COND are useful in general and suitable also for match.pd.  Watch
> out for vectorizer patterns though, which create scalar COND_EXPRs for
> bool mask <-> bool value transforms.

Thanks for the suggestion! Going with ISEL, the patch would look like this:

-----
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 54c1801038b..abb18932228 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -240,16 +240,34 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
             can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
                                                      tcode);

-          /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
 	  if (can_compute_op0
-	      && integer_minus_onep (op1)
-	      && integer_zerop (op2)
 	      && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
 	    {
-	      tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
-	      gassign *new_stmt = gimple_build_assign (lhs, conv_op);
-	      gsi_replace (gsi, new_stmt, true);
-	      return new_stmt;
+	      bool op1_minus_onep = integer_minus_onep (op1);
+	      bool op2_zerop = integer_zerop (op2);
+	      /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
+	      if (op1_minus_onep && op2_zerop)
+		{
+		  tree conv_op
+		    = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+		  gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+		  gsi_replace (gsi, new_stmt, true);
+		  return new_stmt;
+		}
+	      /* Try to fold x CMP y ? -1 : z to x CMP y ? x CMP y : z,
+		 or x CMP y ? z : 0 to x CMP y ? z : x CMP y.  */
+	      if (op1_minus_onep || op2_zerop)
+		{
+		  tree conv_op
+		    = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+		  tree new_op = make_ssa_name (TREE_TYPE (lhs));
+		  gassign *new_stmt = gimple_build_assign (new_op, conv_op);
+		  if (op1_minus_onep)
+		    op1 = new_op;
+		  else
+		    op2 = new_op;
+		  gsi_insert_seq_before (gsi, new_stmt, GSI_SAME_STMT);
+		}
 	    }

 	  /* When the compare has EH we do not want to forward it when

-----

But this doesn't help the exposed failure, as that one belongs to the inversion case. Going further with some hacks for inversion:

-----
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index abb18932228..afc2c9f1386 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -240,6 +240,15 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
            can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
                                                     tcode);

+         auto need_inverted_p = [](tree_code c, machine_mode m) {
+           if (GET_MODE_CLASS (m) == MODE_VECTOR_INT)
+             return (c == NE_EXPR || c == GE_EXPR || c == LE_EXPR);
+           gcc_assert (GET_MODE_CLASS (m) == MODE_VECTOR_FLOAT);
+           return (c == NE_EXPR || c == UNLE_EXPR || c == UNLT_EXPR
+                   || c == UNGE_EXPR || c == UNGT_EXPR || c == UNORDERED_EXPR
+                   || c == UNEQ_EXPR);
+         };
+
          if (can_compute_op0
              && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
            {
@@ -254,6 +263,23 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
                  gsi_replace (gsi, new_stmt, true);
                  return new_stmt;
                }
+             bool inverted_p = need_inverted_p (tcode, TYPE_MODE (op0a_type));
+             bool op1_zerop = integer_zerop (op1);
+             bool op2_minus_onep = integer_minus_onep (op2);
+             /* Try to fold x CMP y ? 0 : -1 to ~(x CMP y), it can reuse
+                the comparison before the inversion.  */
+             if (inverted_p && op1_zerop && op2_minus_onep)
+               {
+                 tree inv_op0 = make_ssa_name (TREE_TYPE (op0));
+                 gassign *inv_stmt
+                   = gimple_build_assign (inv_op0, BIT_NOT_EXPR, op0);
+                 gsi_insert_seq_before (gsi, inv_stmt, GSI_SAME_STMT);
+                 tree conv_op
+                   = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), inv_op0);
+                 gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+                 gsi_replace (gsi, new_stmt, true);
+                 return new_stmt;
+               }
              /* Try to fold x CMP y ? -1 : z to x CMP y ? x CMP y : z,
                 or x CMP y ? z : 0 to x CMP y ? z : x CMP y.  */
              if (op1_minus_onep || op2_zerop)
@@ -268,6 +294,25 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
                    op2 = new_op;
                  gsi_insert_seq_before (gsi, new_stmt, GSI_SAME_STMT);
                }
+             /* Try to fold x CMP y ? z : -1 to x CMP y ? z : ~(x CMP y),
+                or x CMP y ? 0 : z to x CMP y ? ~(x CMP y) : z, expect it
+                can reuse the comparison before the inversion.  */
+             else if (inverted_p && (op1_zerop || op2_minus_onep))
+               {
+                 tree inv_op0 = make_ssa_name (TREE_TYPE (op0));
+                 gassign *inv_stmt
+                   = gimple_build_assign (inv_op0, BIT_NOT_EXPR, op0);
+                 gsi_insert_seq_before (gsi, inv_stmt, GSI_SAME_STMT);
+                 tree conv_op
+                   = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), inv_op0);
+                 tree new_op = make_ssa_name (TREE_TYPE (lhs));
+                 gassign *new_stmt = gimple_build_assign (new_op, conv_op);
+                 gsi_insert_seq_before (gsi, new_stmt, GSI_SAME_STMT);
+                 if (integer_minus_onep (op2))
+                   op2 = new_op;
+                 else
+                   op1 = new_op;
+               }
            }

          /* When the compare has EH we do not want to forward it when

-----

As per the mailing-list discussion, I think we will expand target-unsupported vector comparisons early; then we can see this inversion and canonicalize the latter form into the former one?
Comment 5 Kewen Lin 2024-06-27 06:42:37 UTC
(In reply to Andrew Pinski from comment #2)
> Note I think this could help scalar code too:
> ```
> int a[1], b[1], c[1];
> 
> void
> test (void)
> {
>   a[0] = (b[0] == c[0]) ? -1 : a[0];
> }
> 
> void
> test1 (void)
> {
>   a[0] = (-(b[0] == c[0])) | a[0];
> }
> 
> ```
> 

Good catch!

> So this could be something like:
> ```
> (simplify
>  (cond @0 @1 integer_all_ones_p)
>  (bit_ior (negate (convert @0)) @1))
> (simplify
>  (vec_cond @0 @1 integer_all_ones_p)
>  (bit_ior (view_convert @0) @1))
> ```

Missing negate for the vector one?

> The second one might need a target_supports_op_p for the bit_ior.

Thanks for the hints! This looks simpler than keeping the vec_cond; do we need to consider the target cost of cond (conditional select) vs. negate + or?
Comment 6 Andrew Pinski 2024-06-27 06:48:27 UTC
(In reply to Kewen Lin from comment #5)
> (In reply to Andrew Pinski from comment #2)
> > Note I think this could help scalar code too:
> > ```
> > int a[1], b[1], c[1];
> > 
> > void
> > test (void)
> > {
> >   a[0] = (b[0] == c[0]) ? -1 : a[0];
> > }
> > 
> > void
> > test1 (void)
> > {
> >   a[0] = (-(b[0] == c[0])) | a[0];
> > }
> > 
> > ```
> > 
> 
> Good catch!
> 
> > So this could be something like:
> > ```
> > (simplify
> >  (cond @0 @1 integer_all_ones_p)
> >  (bit_ior (negate (convert @0)) @1))
> > (simplify
> >  (vec_cond @0 @1 integer_all_ones_p)
> >  (bit_ior (view_convert @0) @1))
> > ```
> 
> Missing negate for the vector one?

No because vector true is already -1 :).
Comment 7 Kewen Lin 2024-06-27 08:59:07 UTC
> > > (simplify
> > >  (vec_cond @0 @1 integer_all_ones_p)
> > >  (bit_ior (view_convert @0) @1))
> > > ```
> > 
> > Missing negate for the vector one?
> 
> No because vector true is already -1 :).

I could be wrong, but this vector transformation seems wrong: when @0 is -1 the original wants @1 but this simplification returns -1, and when @0 is 0 the original wants -1 but this simplification returns @1. The results get switched?
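
A quick scalar stand-in for one vector lane (a hedged sanity check with arbitrary values, not GCC code) confirms the switch:

```
#include <assert.h>

int
main (void)
{
  int z = 42;                      /* @1 */
  /* Lane where @0 = -1 (true): the vec_cond selects @1,
     but bit_ior yields -1.  */
  assert ((-1 | z) == -1);
  /* Lane where @0 = 0 (false): the vec_cond selects -1,
     but bit_ior yields @1.  */
  assert ((0 | z) == z);
  return 0;
}
```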
Comment 8 Kewen Lin 2024-06-27 09:58:14 UTC
Inspired by Andrew's comments, it looks like we can have:

   c = x CMP y
   r = c ?  0 :  z   =>  r =  ~c & z      (1)
   r = c ?  z :  0   =>  r =   c & z      (2)
   r = c ? -1 :  z   =>  r =   c | z      (3)
   r = c ?  z : -1   =>  r =  ~c | z      (4)

So if the target supports vector "and" and "or", (2) and (3) are clearly improvements (a basic logical operation should not be slower than a vector select), while (1) and (4) may need a further cost comparison (or, if the target supports the compound operation, a query via optab support).
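
A minimal GNU C sketch of the four folds (function names are made up; c stands for the -1/0 comparison mask, z for an arbitrary vector):

```
typedef int v4si __attribute__ ((vector_size (16)));

v4si f1 (v4si c, v4si z) { return ~c & z; }  /* (1) c ?  0 : z  */
v4si f2 (v4si c, v4si z) { return  c & z; }  /* (2) c ?  z : 0  */
v4si f3 (v4si c, v4si z) { return  c | z; }  /* (3) c ? -1 : z  */
v4si f4 (v4si c, v4si z) { return ~c | z; }  /* (4) c ?  z : -1 */
```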
Comment 9 Richard Biener 2024-06-27 10:31:30 UTC
I think the inversion code wants to check invert_tree_comparison and see if
the inverted compare is supported and only if not fall back to inverting the
comparison result (there is of course the multi-use case to consider).
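
A hedged C++ sketch of that idea (hypothetical helper name; the simple single-use version, which recomputes the comparison instead of reusing an existing SSA result):

```
/* Build the inverted mask ~(op0a CODE op0b): prefer emitting the
   inverted comparison directly if the target supports it, and only
   fall back to BIT_NOT on the original comparison result.  */
static tree
build_inverted_mask (gimple_stmt_iterator *gsi, tree_code code,
		     tree op0a, tree op0b, tree mask_type)
{
  tree_code icode
    = invert_tree_comparison (code, HONOR_NANS (TREE_TYPE (op0a)));
  tree mask = make_ssa_name (mask_type);
  gassign *stmt;
  if (icode != ERROR_MARK
      && expand_vec_cmp_expr_p (TREE_TYPE (op0a), mask_type, icode))
    /* Compute the inverted comparison directly.  */
    stmt = gimple_build_assign (mask, icode, op0a, op0b);
  else
    {
      /* Fall back to inverting the comparison result.  */
      tree c = make_ssa_name (mask_type);
      gassign *cmp = gimple_build_assign (c, code, op0a, op0b);
      gsi_insert_before (gsi, cmp, GSI_SAME_STMT);
      stmt = gimple_build_assign (mask, BIT_NOT_EXPR, c);
    }
  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
  return mask;
}
```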

I also think that incrementally improving the /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */ handling is fine; we don't have to handle everything in one patch.

Thanks for working on this.  The x86 folks seem to be able to handle most
things within the backend, which is also fine; handling common problems in
the middle-end is of course better.
Comment 10 Kewen Lin 2024-07-01 07:30:47 UTC
(In reply to Richard Biener from comment #9)
> I think the inversion code wants to check invert_tree_comparison and see if
> the inverted compare is supported and only if not fall back to inverting the
> comparison result (there is of course the multi-use case to consider).

OK. For now, all/most targets claim to support all comparisons (doing swapping, inversion, etc. in the expanders themselves), so it seems we have to handle this ourselves until we have some generic handling for them.

> I also think that incrementally improving the /* Try to fold x CMP y ? -1 :
> 0 to x CMP y.  */ handling is fine; we don't have to handle everything in
> one patch.
> 
> Thanks for working on this.  The x86 folks seem to be able to handle most
> things within the backend, which is also fine; handling common problems in
> the middle-end is of course better.

Thanks for the suggestions; I posted two patches for review and comments. Yes, I realized that some define_insn_and_split patterns in the backend can also catch such patterns and generate the expected code.
Comment 11 GCC Commits 2024-07-02 07:15:32 UTC
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:56670281c6db19d75c7b63e38971ab84681b245c

commit r15-1763-g56670281c6db19d75c7b63e38971ab84681b245c
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Tue Jul 2 02:13:35 2024 -0500

    isel: Fold more in gimple_expand_vec_cond_expr [PR115659]
    
    As PR115659 shows, assuming c = x CMP y, there are some
    folding chances for patterns r = c ? -1/z : z/0.
    
    For r = c ? -1 : z, it can be folded into:
      - r = c | z (with ior_optab supported)
      - or r = c ? c : z
    
    while for r = c ?  z : 0, it can be folded into:
      - r = c & z (with and_optab supported)
      - or r = c ? z : c
    
    This patch teaches ISEL to take care of them and also
    removes the redundant gsi_replace, as the caller of
    gimple_expand_vec_cond_expr will handle it.
    
            PR tree-optimization/115659
    
    gcc/ChangeLog:
    
            * gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
            patterns x CMP y ? -1 : z and x CMP y ? z : 0.
Comment 12 GCC Commits 2024-07-08 05:16:25 UTC
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:f379596e0ba99df249d6e8b3f2e66edfcea916fe

commit r15-1890-gf379596e0ba99df249d6e8b3f2e66edfcea916fe
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Mon Jul 8 00:14:59 2024 -0500

    isel: Fold more in gimple_expand_vec_cond_expr with andc and iorc [PR115659]
    
    As PR115659 shows, assuming c = x CMP y, there are some
    folding chances for patterns r = c ? 0/z : z/-1:
      - for r = c ? 0 : z, it can be folded into r = ~c & z.
      - for r = c ? z : -1, it can be folded into r = ~c | z.
    
    But BIT_AND/BIT_IOR applied on one BIT_NOT operand is a
    compound operation; it's arguable whether it beats
    vector selection.  So this patch introduces new optabs
    andc and iorc and their corresponding internal functions
    BIT_{ANDC,IORC}; if a target defines such optabs for
    vector modes, it means the target supports these hardware
    insns, which should be no worse than vector selection.
    
            PR tree-optimization/115659
    
    gcc/ChangeLog:
    
            * doc/md.texi: Document andcm3 and iorcm3.
            * gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
            patterns x CMP y ? 0 : z and x CMP y ? z : -1.
            * internal-fn.def (BIT_ANDC): New internal function.
            (BIT_IORC): Likewise.
            * optabs.def (andc, iorc): New optab.
Comment 13 GCC Commits 2024-07-08 05:16:30 UTC
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:6425dae07aa4be58abade03455c2d9744f73d4e1

commit r15-1891-g6425dae07aa4be58abade03455c2d9744f73d4e1
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Mon Jul 8 00:15:00 2024 -0500

    rs6000: Replace orc with iorc [PR115659]
    
    Since the iorc optab is introduced, this patch updates the
    expander names and all the related uses, such as bif expanders
    and gen functions, accordingly.
    
            PR tree-optimization/115659
    
    gcc/ChangeLog:
    
            * config/rs6000/rs6000-builtins.def: Update some bif expanders by
            replacing orc<mode>3 with iorc<mode>3.
            * config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update gen
            function by replacing orc<mode>3 with iorc<mode>3.
            * config/rs6000/rs6000.md (orc<mode>3): Rename to ...
            (iorc<mode>3): ... this.
Comment 14 GCC Commits 2024-07-12 06:34:25 UTC
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:f7e4000397842671fe7e5c0473f1fa62707e1db9

commit r15-1991-gf7e4000397842671fe7e5c0473f1fa62707e1db9
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Fri Jul 12 01:32:57 2024 -0500

    rs6000: Remove vcond{,u} expanders
    
    As PR114189 shows, middle-end will obsolete vcond, vcondu
    and vcondeq optabs soon.  This patch removes all the
    vcond{,u} expanders in the rs6000 port and makes the function
    rs6000_emit_vector_cond_expr, which was called by those
    expanders, static.
    
            PR target/115659
    
    gcc/ChangeLog:
    
            * config/rs6000/rs6000-protos.h (rs6000_emit_vector_cond_expr): Remove.
            * config/rs6000/rs6000.cc (rs6000_emit_vector_cond_expr): Add static
            qualifier as it is only called by rs6000_emit_swsqrt now.
            * config/rs6000/vector.md (vcond<VEC_F:mode><VEC_F:mode>): Remove.
            (vcond<VEC_I:mode><VEC_I:mode>): Remove.
            (vcondv4sfv4si): Likewise.
            (vcondv4siv4sf): Likewise.
            (vcondv2dfv2di): Likewise.
            (vcondv2div2df): Likewise.
            (vcondu<VEC_I:mode><VEC_I:mode>): Likewise.
            (vconduv4sfv4si): Likewise.
            (vconduv2dfv2di): Likewise.
Comment 15 Kewen Lin 2024-07-12 06:44:52 UTC
Commits r15-1890 and r15-1891 make the regression go away, and r15-1991 removes the vcond{,u} expanders in rs6000, so marking this as resolved. For the potential enhancement on targets without iorc & andc support, I'll file a PR once the middle-end takes over expanding target-unsupported comparisons.