In this case, LLVM succeeds at auto-vectorization but GCC fails: https://godbolt.org/z/1x68jo36d
Please read https://gcc.gnu.org/bugs/ on how to correctly report a bug.
I think we can do conditional reduction but maybe not for variable-length vectors? Otherwise I'm sure there's a duplicate bug about this.
(In reply to Richard Biener from comment #2)
> I think we can do conditional reduction but maybe not for variable-length
> vectors? Otherwise I'm sure there's a duplicate bug about this.

No, GCC still cannot vectorize it when I specify fixed-length vectors
(-msve-vector-bits=512). If it's a known issue, do you have any suggestions?
I am willing to fix it once I have finished the RVV auto-vectorization work.
Created attachment 54635 [details] testcase
GCC does vectorize:

  if (a[i] > b[i])
    result += a[i];

even for variable-length vectors. It's just that we don't vectorize when
there is an extra conditional operation. GCC will even vectorize this with
variable-length vectors:

#include <stdint.h>

uint64_t
single_loop_with_if_condition (uint64_t *restrict a, uint64_t *restrict b,
                               uint64_t *restrict c, int loop_size)
{
  uint64_t result = 0;
  for (int i = 0; i < loop_size; i++)
    {
      c[i] = a[i] + 1;
      if (a[i] > b[i])
        result += c[i];
    }
  return result;
}
After investigation: GCC fails to vectorize a reduction with multiple
conditional operations.

ifcvt dump:

  # result_20 = PHI <result_9(8), 0(18)>
  ...
  _11 = result_20 + 10;
  result_17 = _4 + _11;
  _23 = _4 > _7;
  result_9 = _23 ? result_17 : result_20;

It's odd that GCC fails to vectorize this, since these are not complicated
statements. LLVM vectorizes them into:

  vector_ssa_2 = <vector_ssa_result, 0>
  ...
  vector_ssa_1 = vector_ssa_2 + 10;
  vector_ssa_3 = vector_ssa_1 + 10;
  mask_ssa_1 = vector_ssa_4 > vector_ssa_5;
  vector_ssa_result = select <mask_ssa_1, vector_ssa_3, vector_ssa_2>

I think GCC should be able to vectorize it the same way:

  vector_ssa_2 = <vector_ssa_result, 0>
  ...
  vector_ssa_1 = vector_ssa_2 + 10;
  vector_ssa_3 = vector_ssa_1 + 10;
  mask_ssa_1 = vector_ssa_4 > vector_ssa_5;
  vector_ssa_result = VCOND_MASK <mask_ssa_1, vector_ssa_3, vector_ssa_2>

I saw this code disable the vectorization:

      else if (!bbs.is_empty ()
               && bb->loop_father->header == bb
               && bb->loop_father->dont_vectorize)
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "splitting region at dont-vectorize loop %d "
                             "entry at bb%d\n",
                             bb->loop_father->num, bb->index);
          split = true;
        }

I am not familiar with this code; any ideas? Thanks.
Update on the analysis: we fail to recognize this as a reduction because the
PHI result has 2 uses:

  # result_20 = PHI <result_9(8), 0(18)>
  ...
  _11 = result_20 + 10;                      --------> first use
  result_17 = _4 + _11;
  _23 = _4 > _7;
  result_9 = _23 ? result_17 : result_20;    --------> second use

It seems to be an if-conversion issue that makes the loop vectorizer fail to
vectorize it. I have checked the LLVM implementation; their "result" SSA name
always has a single use no matter how I modify the code (for example,
result += a[i] + b[i] + c[i]).
It's because of the order of the operations we perform. For code like:

  result += mask ? a[i] + x : 0;

GCC:

  result_ssa_1 = PHI <result_ssa_2, 0>
  ...
  STMT 1. tmp = a[i] + x;
  STMT 2. tmp2 = tmp + result_ssa_1;
  STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;

Here both STMT 2 and STMT 3 use 'result_ssa_1', so we end up with 2 uses of
the PHI result. Then we fail to vectorize.

Whereas LLVM:

  result_ssa_1 = PHI <result_ssa_2, 0>
  ...
  IR 1. tmp = a[i] + x;
  IR 2. tmp2 = mask ? tmp : 0;
  IR 3. result_ssa_2 = tmp2 + result_ssa_1;

LLVM has only 1 use.

Is it reasonable to swap the order in match.pd?
(In reply to JuzheZhong from comment #8)
> It's because of the order of the operations we perform. For code like:
>
>   result += mask ? a[i] + x : 0;
>
> GCC:
>   result_ssa_1 = PHI <result_ssa_2, 0>
>   ...
>   STMT 1. tmp = a[i] + x;
>   STMT 2. tmp2 = tmp + result_ssa_1;
>   STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;
>
> Here both STMT 2 and STMT 3 use 'result_ssa_1', so we end up with 2 uses
> of the PHI result. Then we fail to vectorize.
>
> Whereas LLVM:
>
>   result_ssa_1 = PHI <result_ssa_2, 0>
>   ...
>   IR 1. tmp = a[i] + x;
>   IR 2. tmp2 = mask ? tmp : 0;
>   IR 3. result_ssa_2 = tmp2 + result_ssa_1;

For floating point these are not equivalent (adding zero isn't a no-op).

> LLVM has only 1 use.
>
> Is it reasonable to swap the order in match.pd?

if-conversion could be taught to swap this (it's if-conversion that creates
the IL for conditional reductions) when valid. IIRC Robin Dapp also has a
patch to make if-conversion emit .COND_ADD instead, which should make it
even better to vectorize.
(In reply to Richard Biener from comment #9)
> (In reply to JuzheZhong from comment #8)
> > [...]
>
> For floating point these are not equivalent (adding zero isn't a no-op).

Yes, I agree they are not equivalent for floating point. But they are
equivalent if we specify -ffast-math.

I have double-checked LLVM: it also fails to vectorize a conditional
floating-point reduction by default. However, with -ffast-math LLVM
generates the same if-conversion IR sequence as for integers, and
vectorization succeeds.

> > LLVM has only 1 use.
> >
> > Is it reasonable to swap the order in match.pd?
>
> if-conversion could be taught to swap this (it's if-conversion that
> creates the IL for conditional reductions) when valid. IIRC Robin Dapp
> also has a patch to make if-conversion emit .COND_ADD instead, which
> should make it even better to vectorize.

I knew about that patch; Robin is trying to fix the (in-order reduction)
issue that I posted. I have confirmed that the patch can't help here, since
it doesn't change the code for this case: we still end up with multiple uses
in the conditional reduction. The reduction recognition fails at:

  /* If this isn't a nested cycle or if the nested cycle reduction value
     is used ouside of the inner loop we cannot handle uses of the
     reduction value.  */
  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction used in loop.\n");
      return NULL;
    }

When nphi_def_loop_uses > 1, we fail to vectorize.

I have checked the LLVM code, and I think we can extend this function:

  strip_nop_cond_scalar_reduction

We should be able to strip all the statements until we reach the use of the
PHI result. LLVM is able to handle this case:

  for (...)
    if (cond)
      result += a[i] + b[i] + c[i] + ...;

no matter how many variables are added in the conditional reduction. It
handles that by iterating over all the statements until it reaches the
result:

  result_ssa_1 = PHI <>
  tmp1 = result_ssa_1 + a[i];
  tmp2 = tmp1 + b[i];
  tmp3 = tmp2 + c[i];
  ...

We keep iterating until we find result_ssa_1 holding the reduction variable.

Is this LLVM approach reasonable for GCC? If yes, I can translate the LLVM
code into GCC. Thanks.
I don't think strip_nop_cond_scalar_reduction is the place to adjust here;
maybe it's the caller. I don't have time to dig into the specific issue
right now, but if we require scalar code adjustments then we need to perform
those in if-conversion.

But to me it looks like allowing

> > STMT 1. tmp = a[i] + x;
> > STMT 2. tmp2 = tmp + result_ssa_1;
> > STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;

in vect_is_simple_reduction might also be a reasonable approach. The use in
the COND_EXPR isn't really a use - it's a conditional update.
(In reply to Richard Biener from comment #11)
> I don't think strip_nop_cond_scalar_reduction is the place to adjust here,
> maybe it's the caller. I don't have time to dig into the specific issue
> right now but if we require scalar code adjustments then we need to
> perform those in if-conversion.
>
> But to me it looks like allowing
>
> > > STMT 1. tmp = a[i] + x;
> > > STMT 2. tmp2 = tmp + result_ssa_1;
> > > STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;
>
> in vect_is_simple_reduction might also be a reasonable approach. The
> use in the COND_EXPR isn't really a use - it's a conditional update.

Thanks, Richi. Enhancing vect_is_simple_reduction in the loop vectorizer is
also a good approach, but I think it's better to recognize the scalar
conditional reduction as early as possible. Obviously, the current
if-conversion fails to recognize it as a feasible conditional reduction.
Enhancing vect_is_simple_reduction would be the approach if it turns out we
cannot simplify the scalar code in if-conversion to fit the current loop
vectorizer. I believe we will eventually have to enhance both if-conversion
and the loop vectorizer. For now I prefer improving if-conversion and will
work on that. Will keep you posted. Thanks a lot!
Hi, Richi. This is my draft approach to enhance finding more potential
conditional reductions.

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index a8c915913ae..c25d2038f16 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1790,8 +1790,72 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, tree arg_0, tree arg_1,
       std::swap (r_op1, r_op2);
       std::swap (r_nop1, r_nop2);
     }
-  else if (r_nop1 != PHI_RESULT (header_phi))
-    return false;
+  else if (r_nop1 == PHI_RESULT (header_phi))
+    ;
+  else
+    {
+      /* Analyze the statement chain of STMT so that we can generate a
+         better if-conversion code sequence.  We are trying to catch the
+         following situation:
+
+           loop-header:
+           reduc_1 = PHI <..., reduc_2>
+           ...
+           if (...)
+             tmp1 = reduc_1 + rhs1;
+             tmp2 = tmp1 + rhs2;
+             tmp3 = tmp2 + rhs3;
+             ...
+             reduc_3 = tmpN-1 + rhsN-1;
+
+           reduc_2 = PHI <reduc_1, reduc_3>
+
+         and convert it to
+
+           reduc_2 = PHI <0, reduc_1>
+           tmp1 = rhs1 + rhs2;
+           tmp2 = tmp1 + rhs3;
+           tmp3 = tmp2 + rhs4;
+           ...
+           tmpN-1 = tmpN-2 + rhsN;
+           ifcvt = cond_expr ? tmpN-1 : 0
+           reduc_1 = tmpN-1 +/- ifcvt;  */
+      if (num_imm_uses (PHI_RESULT (header_phi)) != 2)
+        return false;
+      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (header_phi))
+        {
+          gimple *use_stmt = USE_STMT (use_p);
+          if (is_gimple_assign (use_stmt))
+            {
+              if (gimple_assign_rhs_code (use_stmt) != reduction_op)
+                return false;
+              if (TREE_CODE (gimple_assign_lhs (use_stmt)) != SSA_NAME)
+                return false;
+
+              bool visited_p = false;
+              while (!visited_p)
+                {
+                  use_operand_p use;
+                  if (!single_imm_use (gimple_assign_lhs (use_stmt), &use,
+                                       &use_stmt)
+                      || gimple_bb (use_stmt) != gimple_bb (stmt)
+                      || !is_gimple_assign (use_stmt)
+                      || TREE_CODE (gimple_assign_lhs (use_stmt)) != SSA_NAME
+                      || gimple_assign_rhs_code (use_stmt) != reduction_op)
+                    return false;
+
+                  if (gimple_assign_lhs (use_stmt) == gimple_assign_lhs (stmt))
+                    {
+                      r_op2 = r_op1;
+                      r_op1 = PHI_RESULT (header_phi);
+                      visited_p = true;
+                    }
+                }
+            }
+          else if (use_stmt != phi)
+            return false;
+        }
+    }

My approach does the check as follows:

  tmp1 = reduc_1 + rhs1;
  tmp2 = tmp1 + rhs2;
  tmp3 = tmp2 + rhs3;
  ...
  reduc_3 = tmpN-1 + rhsN-1;

Start the iteration from "tmp1 = reduc_1 + rhs1;" and walk the chain until
"reduc_3 = tmpN-1 + rhsN-1;", making sure each statement is a PLUS_EXPR for
the reduction sum.

Does it look reasonable? It succeeds on vectorization.
Sorry for the delay - but this looks exactly like Robin's transform to
COND_ADD, no? But sure, the current code doesn't handle a reduction path
through multiple stmts; still, when if-conversion converts the final add to
a COND_ADD, it should be a matter of teaching
tree-vect-loop.cc:check_reduction_path about conditional operations?
Hi, Richard. I've confirmed that Robin's patch doesn't help with this issue.

The root cause is that we fail to recognize this as a vectorizable
conditional reduction, i.e. is_cond_scalar_reduction returns false.

The following patch, which bootstraps and passes regtest on x86 and also
passes on aarch64, enhances if-conversion's conditional reduction
recognition:

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index a8c915913ae..2bdd3710a65 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1784,14 +1784,119 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, tree arg_0, tree arg_1,
     r_nop2 = strip_nop_cond_scalar_reduction (*has_nop, r_op2);

   /* Make R_OP1 to hold reduction variable.  */
+  gimple *nonphi_use_stmt = NULL;
   if (r_nop2 == PHI_RESULT (header_phi)
       && commutative_tree_code (reduction_op))
     {
       std::swap (r_op1, r_op2);
       std::swap (r_nop1, r_nop2);
     }
-  else if (r_nop1 != PHI_RESULT (header_phi))
-    return false;
+  else if (r_nop1 == PHI_RESULT (header_phi))
+    ;
+  else
+    {
+      /* Analyze the statement chain of STMT so that we can generate a
+         better if-conversion code sequence.  We are trying to catch the
+         following situation:
+
+           loop-header:
+           reduc_1 = PHI <..., reduc_2>
+           ...
+           if (...)
+             tmp1 = reduc_1 + rhs1;
+             tmp2 = tmp1 + rhs2;
+             tmp3 = tmp2 + rhs3;
+             ...
+             reduc_3 = tmpN-1 + rhsN-1;
+
+           reduc_2 = PHI <reduc_1, reduc_3>
+
+         and convert it to
+
+           reduc_2 = PHI <0, reduc_1>
+           tmp1 = rhs1;
+           tmp2 = tmp1 + rhs2;
+           tmp3 = tmp2 + rhs3;
+           ...
+           reduc_3 = tmpN-1 + rhsN-1;
+           ifcvt = cond_expr ? reduc_3 : 0;
+           reduc_1 = reduc_1 +/- ifcvt;  */
+      if (num_imm_uses (PHI_RESULT (header_phi)) != 2)
+        return false;
+      if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
+          && !(FLOAT_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
+               && !HONOR_SIGNED_ZEROS (TREE_TYPE (PHI_RESULT (phi)))
+               && !HONOR_SIGN_DEPENDENT_ROUNDING (TREE_TYPE (PHI_RESULT (phi)))
+               && !HONOR_NANS (TREE_TYPE (PHI_RESULT (phi)))))
+        return false;
+      gimple *prev_use_stmt, *curr_use_stmt;
+      use_operand_p use;
+      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (header_phi))
+        {
+          prev_use_stmt = curr_use_stmt = USE_STMT (use_p);
+          if (is_gimple_assign (curr_use_stmt))
+            {
+              if (TREE_CODE (gimple_assign_lhs (curr_use_stmt)) != SSA_NAME)
+                return false;
+              if (*has_nop)
+                {
+                  if (!CONVERT_EXPR_CODE_P (
+                        gimple_assign_rhs_code (curr_use_stmt)))
+                    return false;
+                }
+              else
+                {
+                  if (gimple_assign_rhs_code (curr_use_stmt) != reduction_op)
+                    return false;
+                }
+
+              bool visited_p = false;
+              nonphi_use_stmt = curr_use_stmt;
+              while (!visited_p)
+                {
+                  if (!single_imm_use (gimple_assign_lhs (prev_use_stmt),
+                                       &use, &curr_use_stmt)
+                      || gimple_bb (curr_use_stmt) != gimple_bb (stmt)
+                      || !is_gimple_assign (curr_use_stmt)
+                      || TREE_CODE (gimple_assign_lhs (curr_use_stmt))
+                           != SSA_NAME
+                      || gimple_assign_rhs_code (curr_use_stmt)
+                           != reduction_op)
+                    return false;
+                  if (curr_use_stmt == stmt)
+                    {
+                      if (*has_nop)
+                        {
+                          if (!single_imm_use (gimple_assign_lhs (
+                                                 nonphi_use_stmt),
+                                               &use, &curr_use_stmt))
+                            return false;
+                          r_op1 = gimple_assign_lhs (nonphi_use_stmt);
+                          r_nop1 = PHI_RESULT (header_phi);
+                          nonphi_use_stmt = curr_use_stmt;
+                        }
+                      else
+                        r_op1 = PHI_RESULT (header_phi);
+
+                      if (*has_nop)
+                        {
+                          if (!single_imm_use (gimple_assign_lhs (stmt),
+                                               &use, &curr_use_stmt))
+                            return false;
+                          r_op2 = gimple_assign_lhs (stmt);
+                          r_nop2 = gimple_assign_lhs (curr_use_stmt);
+                        }
+                      else
+                        r_op2 = gimple_assign_lhs (stmt);
+                      visited_p = true;
+                    }
+                  else
+                    prev_use_stmt = curr_use_stmt;
+                }
+            }
+          else if (curr_use_stmt != phi)
+            return false;
+        }
+    }

   if (*has_nop)
     {
@@ -1816,12 +1921,41 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, tree arg_0, tree arg_1,
        continue;
       if (use_stmt == stmt)
        continue;
+      if (use_stmt == nonphi_use_stmt)
+       continue;
       if (gimple_code (use_stmt) != GIMPLE_PHI)
        return false;
     }

   *op0 = r_op1;
   *op1 = r_op2;
   *reduc = stmt;
+
+  if (nonphi_use_stmt)
+    {
+      /* Transform:
+
+           if (...)
+             tmp1 = reduc_1 + rhs1;
+             tmp2 = tmp1 + rhs2;
+             tmp3 = tmp2 + rhs3;
+
+         into:
+
+           tmp1 = rhs1 + 0;  ---> We replace reduc_1 with '0'
+           tmp2 = tmp1 + rhs2;
+           tmp3 = tmp2 + rhs3;
+           ...
+           reduc_3 = tmpN-1 + rhsN-1;
+           ifcvt = cond_expr ? reduc_3 : 0;  */
+      gcc_assert (gimple_assign_rhs_code (nonphi_use_stmt) == reduction_op);
+      if (gimple_assign_rhs1 (nonphi_use_stmt) == r_op1)
+        gimple_assign_set_rhs1 (nonphi_use_stmt,
+                                build_zero_cst (TREE_TYPE (r_op1)));
+      else if (gimple_assign_rhs2 (nonphi_use_stmt) == r_op1)
+        gimple_assign_set_rhs2 (nonphi_use_stmt,
+                                build_zero_cst (TREE_TYPE (r_op1)));
+      update_stmt (nonphi_use_stmt);
+    }
   return true;
 }

@@ -1886,12 +2020,17 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
       gsi_remove (&stmt_it, true);
       release_defs (nop_reduc);
     }
+  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);

   /* Delete original reduction stmt.  */
-  stmt_it = gsi_for_stmt (reduc);
-  gsi_remove (&stmt_it, true);
-  release_defs (reduc);
+  if (op1 != gimple_assign_lhs (reduc))
+    {
+      stmt_it = gsi_for_stmt (reduc);
+      gsi_remove (&stmt_it, true);
+      release_defs (reduc);
+    }
+
   return rhs;

I have fully tested it with different kinds of conditional reductions; all
of them can be vectorized. I am not sure whether it is a feasible solution.
It's not exactly clear what the transform you propose is. It looks like a
re-association, but the SSA names do not match up, and the transform only
replaces a single op with 0!?
Sorry for the confusion and the lack of information. I am trying to
transform:

  reduc_1 = PHI <..., reduc_2>
  ...
  if (...)
    tmp1 = reduc_1 + rhs1;
    tmp2 = tmp1 + rhs2;
    tmp3 = tmp2 + rhs3;
    ...
    reduc_3 = tmpN-1 + rhsN-1;

  reduc_2 = PHI <reduc_1, reduc_3>

First, transform the first statement:

  tmp1 = reduc_1 + rhs1;

into

  tmp1 = rhs1 + 0;

It then becomes a trivial copy, tmp1 = rhs1, which a later pass will
eliminate.

Second, transform the reduction PHI:

  reduc_1 = PHI <..., reduc_2>

into the if-converted statement:

  reduc_1 = PHI <_ifc__35(8), 0(18)>

Third, transform

  reduc_3 = tmpN-1 + rhsN-1;

  reduc_2 = PHI <reduc_1, reduc_3>

into:

  reduc_3 = tmpN-1 + rhsN-1;
  _ifc__35 = .COND_ADD (condition, reduc_1, reduc_3, reduc_1);

So finally:

  result_1 = PHI <_ifc__35(8), 0(18)>
  ...
  tmp1 = rhs1;
  tmp2 = tmp1 + rhs2;
  tmp3 = tmp2 + rhs3;
  ...
  reduc_3 = tmpN-1 + rhsN-1;
  _ifc__35 = .COND_ADD (condition, reduc_1, reduc_3, reduc_1);
Ah, so this is for

  int foo (int *a, int x)
  {
    int sum = 0;
    for (int i = 0; i < 1024; ++i)
      if (a[i] < 10)
        sum = sum + a[i] + x;
    return sum;
  }

transforming it to

  int foo (int *a, int x)
  {
    int sum = 0;
    for (int i = 0; i < 1024; ++i)
      {
        tem = a[i] + x;
        if (a[i] < 10)
          sum = sum + tem;
      }
    return sum;
  }

Note this is re-associating and thus not always valid. It also executes
stmts unconditionally (also not always valid; for FP, spurious exceptions
might be raised).
I have added this check for floating point:

+      if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
+          && !(FLOAT_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
+               && !HONOR_SIGNED_ZEROS (TREE_TYPE (PHI_RESULT (phi)))
+               && !HONOR_SIGN_DEPENDENT_ROUNDING (TREE_TYPE (PHI_RESULT (phi)))
+               && !HONOR_NANS (TREE_TYPE (PHI_RESULT (phi)))))
+        return false;

I fail to see which situation would cause FP exceptions?
On Wed, 15 Nov 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088
>
> --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> I have added:
>
> +      if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))

For TYPE_OVERFLOW_UNDEFINED you have to convert the ops to unsigned to
avoid spurious undefined overflow.

> +          && !(FLOAT_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
> +               && !HONOR_SIGNED_ZEROS (TREE_TYPE (PHI_RESULT (phi)))
> +               && !HONOR_SIGN_DEPENDENT_ROUNDING (TREE_TYPE (PHI_RESULT (phi)))
> +               && !HONOR_NANS (TREE_TYPE (PHI_RESULT (phi)))))

You should check flag_associative_math, which covers one half, and
!flag_trapping_math, which covers spurious FP exceptions.

> +        return false;
>
> for floating-point. I failed to see which situation will cause FP exceptions ?

Ops with NaN raise INVALID, but there's also INEXACT, which can be set
differently after re-association.
Thanks, Richi. Is re-associating (while avoiding the spurious exceptions) in
if-conversion a reasonable approach? If yes, I am going to revisit this PR
for GCC 15 and refine the code.
On Thu, 16 Nov 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088
>
> --- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> Thanks Richi.
>
> Does re-associating (with eliminating exceptions) in if-convert is a
> reasonable approach ?

Yeah, I don't think we have a much better place at the moment.