
Re: ivopts improvement


On 03/04/2011 11:37 PM, Zdenek Dvorak wrote:
> Hi,
> 
>>    /* Whether the loop body includes any function calls.  */
>>    bool body_includes_call;
>> +
>> +  /* Whether the loop body includes any function calls that possibly have side
>> +     effects.  */
>> +  bool body_includes_side_effect_call;
>>  };
>>  
>>  /* An assignment of iv candidates to uses.  */
>> @@ -456,6 +460,20 @@
>>    return exit;
>>  }
>>  
>> +/* Returns true if single_exit (DATA->current_loop) is the only possible exit.
>> +   Uses the same logic as loop_only_exit_p.  */
> 
> why are you duplicating the functionality, instead of simply caching the result
> of loop_only_exit_p?
> 

I was trying to avoid iterating over the loop body twice: once for
body_includes_call and once for body_includes_side_effect_call (or
loop_only_exit_p). But indeed, duplicating the functionality is not ideal
either. I also tried a version that avoids the duplication while still
walking the body only once, but the implementation became very
convoluted, so I settled for your suggestion of caching the result of
loop_only_exit_p.
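
To make the trade-off concrete: the idea was to set both flags in the
single walk that already computes body_includes_call, roughly like this
(a sketch only, with a hypothetical helper name, not code from any of
the attached patches):

/* Sketch: one walk over the loop body that records both whether there
   are calls and whether any of them can have side effects.  */

static void
scan_loop_body_calls (struct ivopts_data *data, basic_block *body,
                      unsigned num_nodes)
{
  unsigned i;
  gimple_stmt_iterator gsi;

  for (i = 0; i < num_nodes; i++)
    for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
      {
        gimple stmt = gsi_stmt (gsi);

        if (!is_gimple_call (stmt))
          continue;

        data->body_includes_call = true;
        if (gimple_has_side_effects (stmt))
          data->body_includes_side_effect_call = true;
      }
}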

>> +/* Tries to detect
>> +     NIT == (use_iv_max < USE->iv->base)
>> +            ? 0
>> +            : (use_iv_max - USE->iv->base)
>> +   where
>> +     use_iv_real_base == (USE->iv->base - USE->iv->step)
>> +     && CAND->iv->base == base_ptr + use_iv_real_base
>> +   and returns the exclusive upper bound for CAND->var_after:
>> +     base_ptr + use_iv_max.  */
>> +
>> +static tree
>> +get_lt_bound (struct iv_use *use, struct iv_cand *cand, tree nit)
>> +{
> ...
>> +  /* use_iv_real_base == use->iv->base - use->iv->step.  */
>> +  use_iv_real_base = fold_build_plus (MINUS_EXPR, use->iv->base, use->iv->step);
>> +
>> +  /* cand_iv_base.  */
>> +
>> +  /* cand->iv->base == base_ptr + use_iv_real_base.  */
> ...
>> +  /* 0.  */
> ...
> 
> This function seriously needs better comments.  All that are currently present just
> give relations between variables that can be as easily seen from the code (but
> do not at all explain what the variables are supposed to mean), 

I see.

> or make no sense
> (what does the 0. comment mean?)

I was trying to repeat parts of the function header comment bit by bit,
but got too terse in the process.

> Otherwise the patch looks ok (but I would still like to see get_lt_bound with proper
> comments, currently I don't really understand what happens there),

Changes compared to the last submission:
iterator.6.3-ml.patch:
- split up fold_build_plus into fold_build_plus and robust_plus, in
  order to reuse robust_plus in fold_plus in iterator.6.6-ml.patch.
iterator.6.4-ml.patch:
- just cache result of loop_only_exit_p.
- make loop_only_exit_p robust against exit == NULL.
iterator.6.5-ml.patch:
- new patch. keep ssa_name field valid.
iterator.6.6-ml.patch:
- improved comments.
- factored out folding functionality into fold_plus and
  fold_walk_def_plus.
- detect use loop bound based on use->stmt rather than on the nit
  COND_EXPR.
- improved code to handle 'int' iterator.
- improved code to handle '<=' case (illustrated below).
- improved code to handle negative step (illustrated below).
- improved code to handle iv increments after loop exit.
iterator.6.6-ml.test.patch:
- duplicated test for int iterator.
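
To illustrate what the '<=' and negative-step items above are aimed at,
these are the kinds of source loops in question (illustrative only, not
part of iterator.6.6-ml.test.patch):

void
g1 (char *p, unsigned long int i, unsigned long int n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i <= n);	/* Inclusive bound; needs -funsafe-loop-optimizations.  */
}

void
g2 (char *p, unsigned long int i)
{
  p += i;
  do
    {
      *p = '\0';
      p -= 1;
      i--;
    }
  while (i > 0);	/* Negative step.  */
}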

reg-tested on x86_64.

I hope the comments are better now.

Thanks,
- Tom
diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -340,6 +340,44 @@
 
 static VEC(tree,heap) *decl_rtl_to_reset;
 
+/* Detects whether A is of POINTER_TYPE, and modifies CODE and B to make
+   A CODE B type-safe.  */
+
+static inline void
+robust_plus (enum tree_code *code, tree a, tree *b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree b_type = TREE_TYPE (*b);
+
+  if (POINTER_TYPE_P (a_type))
+    {
+      switch (*code)
+        {
+        case MINUS_EXPR:
+          *b = fold_build1 (NEGATE_EXPR, b_type, *b);
+
+          /* Fall-through.  */
+        case PLUS_EXPR:
+          *code = POINTER_PLUS_EXPR;
+          break;
+        default:
+          gcc_unreachable ();
+        }
+    }
+  else
+    *b = fold_convert (a_type, *b);
+}
+
+/* Returns (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR.  Handles the case that A is a pointer robustly.  */
+
+static inline tree
+fold_build_plus (enum tree_code code, tree a, tree b)
+{
+  robust_plus (&code, a, &b);
+  return fold_build2 (code, TREE_TYPE (a), a, b);
+}
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -2255,18 +2293,7 @@
   if ((HAVE_PRE_INCREMENT && GET_MODE_SIZE (mem_mode) == cstepi)
       || (HAVE_PRE_DECREMENT && GET_MODE_SIZE (mem_mode) == -cstepi))
     {
-      enum tree_code code = MINUS_EXPR;
-      tree new_base;
-      tree new_step = step;
-
-      if (POINTER_TYPE_P (TREE_TYPE (base)))
-	{
-	  new_step = fold_build1 (NEGATE_EXPR, TREE_TYPE (step), step);
-	  code = POINTER_PLUS_EXPR;
-	}
-      else
-	new_step = fold_convert (TREE_TYPE (base), new_step);
-      new_base = fold_build2 (code, TREE_TYPE (base), base, new_step);
+      tree new_base = fold_build_plus (MINUS_EXPR, base, step);
       add_candidate_1 (data, new_base, step, important, IP_BEFORE_USE, use,
 		       use->stmt);
     }
diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -292,6 +292,9 @@
 
   /* Whether the loop body includes any function calls.  */
   bool body_includes_call;
+
+  /* Whether the loop body can only be exited via single exit.  */
+  bool loop_single_exit_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -4403,7 +4406,7 @@
       if (double_int_ucmp (max_niter, period_value) > 0)
         {
           /* See if we can take advantage of infered loop bound information.  */
-          if (loop_only_exit_p (loop, exit))
+          if (data->loop_single_exit_p)
             {
               if (!estimated_loop_iterations (loop, true, &max_niter))
                 return false;
@@ -6397,6 +6400,8 @@
   renumber_gimple_stmt_uids_in_blocks (body, loop->num_nodes);
   free (body);
 
+  data->loop_single_exit_p = loop_only_exit_p (loop, single_exit (loop));
+
   /* For each ssa name determines whether it behaves as an induction variable
      in some loop.  */
   if (!find_induction_variables (data))
only in patch2:
unchanged:
--- gcc/tree-ssa-loop-niter.c	(revision 170268)
+++ gcc/tree-ssa-loop-niter.c	(working copy)
@@ -1771,7 +1771,7 @@ loop_only_exit_p (const struct loop *loo
   unsigned i;
   gimple call;
 
-  if (exit != single_exit (loop))
+  if (exit == NULL || exit != single_exit (loop))
     return false;
 
   body = get_loop_body (loop);
diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -1194,6 +1300,7 @@ record_use (struct ivopts_data *data, tr
 	    gimple stmt, enum use_type use_type)
 {
   struct iv_use *use = XCNEW (struct iv_use);
+  tree tmp;
 
   use->id = n_iv_uses (data);
   use->type = use_type;
@@ -1204,11 +1311,14 @@ record_use (struct ivopts_data *data, tr
 
   /* To avoid showing ssa name in the dumps, if it was not reset by the
      caller.  */
+  tmp = iv->ssa_name;
   iv->ssa_name = NULL_TREE;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     dump_use (dump_file, use);
 
+  iv->ssa_name = tmp;
+
   VEC_safe_push (iv_use_p, heap, data->iv_uses, use);
 
   return use;
diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -419,6 +419,67 @@
   return fold_build2 (code, TREE_TYPE (a), a, b);
 }
 
+/* Folds (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR.  Returns the folded expression if folding is successful.
+   Otherwise, return NULL_TREE.  Handles the case that A is a pointer
+   robustly.  */
+
+static inline tree
+fold_plus (enum tree_code code, tree a, tree b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree res;
+
+  STRIP_NOPS (a);
+  robust_plus (&code, a, &b);
+
+  res = fold_binary (code, TREE_TYPE (a), a, b);
+  if (res == NULL_TREE)
+    return NULL_TREE;
+
+  return fold_convert (a_type, res);
+}
+
+/* Folds (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR, possibly using the defining stmt of A.  Returns the folded
+   expression if folding is successful.  Otherwise, return NULL_TREE.  */
+
+static inline tree
+fold_walk_def_plus (enum tree_code code, tree a, tree b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree res, a0, a1;
+  gimple def_stmt;
+
+  res = fold_plus (code, a, b);
+  if (res != NULL_TREE)
+    return res;
+
+  STRIP_NOPS (a);
+
+  if (TREE_CODE (a) != SSA_NAME)
+    return NULL_TREE;
+
+  def_stmt = SSA_NAME_DEF_STMT (a);
+  if (!is_gimple_assign (def_stmt)
+      || (gimple_assign_rhs_code (def_stmt) != PLUS_EXPR
+	  && gimple_assign_rhs_code (def_stmt) != POINTER_PLUS_EXPR))
+    return NULL_TREE;
+  a0 = gimple_assign_rhs1 (def_stmt);
+  a1 = gimple_assign_rhs2 (def_stmt);
+
+  def_stmt = SSA_NAME_DEF_STMT (a1);
+  if (is_gimple_assign (def_stmt)
+      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
+    a1 = fold_convert (TREE_TYPE (a1), gimple_assign_rhs1 (def_stmt));
+
+  res = fold_plus (code, fold_build_plus (PLUS_EXPR, a0, a1), b);
+  if (res == NULL_TREE)
+    return NULL_TREE;
+
+  return fold_convert (a_type, res);
+}
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -825,17 +886,25 @@
 
   if (!slot)
     {
-      /* Try to determine number of iterations.  We must know it
-	 unconditionally (i.e., without possibility of # of iterations
-	 being zero).  Also, we cannot safely work with ssa names that
-	 appear in phi nodes on abnormal edges, so that we do not create
-	 overlapping life ranges for them (PR 27283).  */
+      /* Try to determine number of iterations.  We cannot safely work with ssa
+         names that appear in phi nodes on abnormal edges, so that we do not
+         create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
 				     exit, desc, true)
-	  && integer_zerop (desc->may_be_zero)
      	  && !contains_abnormal_ssa_name_p (desc->niter))
-	niter = desc->niter;
+	{
+	  if (!integer_zerop (desc->may_be_zero))
+            /* Construct COND_EXPR that describes the number of iterations.
+               Either the COND_EXPR is not too expensive, and we can use it as
+               loop bound, or we can deduce a LT_EXPR bound from it.  */
+	    niter
+	      = build3 (COND_EXPR, TREE_TYPE (desc->niter), desc->may_be_zero,
+			build_int_cst_type (TREE_TYPE (desc->niter), 0),
+			desc->niter);
+	  else
+	    niter = desc->niter;
+	}
       else
 	niter = NULL_TREE;
 
@@ -4357,6 +4426,153 @@
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Get the loop bound and comparison operator of USE->iv, and store them in
+   BOUND_P and COMP_P.  Returns false if unsuccessful.  */
+
+static bool
+get_use_lt_bound (struct iv_use *use, tree *bound_p, enum tree_code *comp_p)
+{
+  gimple stmt = use->stmt;
+
+  if (gimple_code (stmt) != GIMPLE_COND
+      || gimple_cond_lhs (stmt) != use->iv->ssa_name)
+    return false;
+
+  *comp_p = gimple_cond_code (stmt);
+  *bound_p = gimple_cond_rhs (stmt);
+
+  return true;
+}
+
+/* Tries to replace loop exit test USE, by one formulated in terms of a LT_EXPR
+   comparison with CAND.  Stores the resulting comparison in COMP_P and bound in
+   BOUND_P.  */
+
+static bool
+iv_elimination_compare_lt (struct ivopts_data *data, struct iv_use *use,
+                           struct iv_cand *cand, tree *bound_p,
+			   enum tree_code *comp_p)
+{
+  enum tree_code use_comp, canon_comp;
+  tree base_ptr, use_iv_real_base, use_lt_bound, bound;
+  bool use_uses_inced_iv, use_after_cand_inc;
+  tree use_type = TREE_TYPE (use->iv->ssa_name);
+  tree cand_type, cand_iv_base = cand->iv->base;
+  STRIP_NOPS (cand_iv_base);
+  cand_type = TREE_TYPE (cand_iv_base);
+
+  /* We're trying to replace 'i < n' with 'p < base + n' in
+
+     void
+     f1 (char *base, unsigned long int s, unsigned long int n)
+     {
+       unsigned long int i = s;
+       char *p = base + s;
+       do
+         {
+	   *p = '\0';
+	   p++;
+	   i++;
+	 }
+       while (i < n);
+     }
+
+     Overflow of base + n can't happen because either:
+     - s < n, and i will step to n, and p will step to base + n, or
+     - s >= n, so base + n < base + s, and assuming pointer arithmetic
+       doesn't overflow, base + s doesn't overflow, so base + n won't.
+
+     This transformation is not valid if i and n are signed, because
+     base + n might underflow.
+  */
+
+  /* Use should be an unsigned integral.  */
+  if (!INTEGRAL_TYPE_P (use_type) || !TYPE_UNSIGNED (use_type))
+    return false;
+
+  /* Cand should be a pointer, and pointer overflow should be undefined.  */
+  if (!POINTER_TYPE_P (cand_type) || !POINTER_TYPE_OVERFLOW_UNDEFINED)
+    return false;
+
+  /* Make sure that the loop iterates till the loop bound is hit.  */
+  if (!data->loop_single_exit_p)
+    return false;
+
+  /* We only handle this case for the moment.  */
+  if (!tree_int_cst_equal (use->iv->step, cand->iv->step))
+    return false;
+
+  /* For now, we only handle the case that cand is a source level cand.  It
+     is possible to also allow other cands, provided we can prove there is
+     pointer arithmetic in the loop body reaching base + n.  */
+  if (cand->pos != IP_ORIGINAL)
+    return false;
+
+  /* Determine if the exit test is formulated in terms of the phi or the
+     increment of the use iv.  */
+  use_uses_inced_iv
+    = gimple_code (SSA_NAME_DEF_STMT (use->iv->ssa_name)) != GIMPLE_PHI;
+
+  /* Determine if the exit test is before or after the increment of the
+     cand.  */
+  use_after_cand_inc
+    = stmt_after_increment (data->current_loop, cand, use->stmt);
+
+  /* For now, we only handle these cases.  */
+  if (use_after_cand_inc != use_uses_inced_iv)
+    return false;
+
+  /* Get the base of the non-incremented loop var.  */
+  if (use_uses_inced_iv)
+    {
+      use_iv_real_base = fold_plus (MINUS_EXPR, use->iv->base, use->iv->step);
+      if (use_iv_real_base == NULL_TREE)
+	return false;
+    }
+  else
+    use_iv_real_base = use->iv->base;
+
+  /* Detect p = base + s.  */
+  base_ptr = fold_walk_def_plus (MINUS_EXPR, cand->iv->base,
+				 fold_convert (sizetype, use_iv_real_base));
+  if (base_ptr == NULL_TREE)
+    return false;
+  STRIP_NOPS (base_ptr);
+  if (TREE_CODE (base_ptr) != SSA_NAME)
+    return false;
+
+  /* Get the bound of the iv of the use.  */
+  if (!get_use_lt_bound (use, &use_lt_bound, &use_comp))
+    return false;
+
+  /* Determine canon_comp.  */
+  if (*comp_p == NE_EXPR)
+    canon_comp = use_comp;
+  else if (*comp_p == EQ_EXPR)
+    canon_comp = invert_tree_comparison (use_comp, false);
+  else
+    gcc_unreachable ();
+
+  /* Allow positive and negative step, and inclusive and exclusive bound.
+     To trigger inclusive bound, we need -funsafe-loop-optimizations.  */
+  if (canon_comp != LT_EXPR && canon_comp != GT_EXPR
+      && canon_comp != LE_EXPR && canon_comp != GE_EXPR)
+    return false;
+
+  /* Calculate bound.  */
+  bound = fold_build_plus (PLUS_EXPR, base_ptr,
+			   fold_convert (sizetype, use_lt_bound));
+  if (bound == NULL_TREE)
+    return false;
+
+  if (expression_expensive_p (bound))
+    return false;
+
+  *comp_p = use_comp;
+  *bound_p = bound;
+  return true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND, and the
    comparison operator to COMP.  */
@@ -4435,6 +4651,21 @@
   *bound = aff_combination_to_tree (&bnd);
   *comp = iv_elimination_compare (data, use);
 
+  /* Try to implement nit using a '<' instead.  */
+  if (TREE_CODE (nit) == COND_EXPR)
+    {
+      if (iv_elimination_compare_lt (data, use, cand, bound, comp))
+        return true;
+
+      /* We could try to see if the non-lt bound is not too expensive, but the
+         cost infrastructure needs tuning for that first.  Even though
+         expression_expensive_p always returns true for COND_EXPRs, it happens
+         that the bound is folded into a MAX_EXPR, which is approved by
+         expression_expensive_p, but attributed a too low cost by force_var_cost
+         in case the MAX_EXPR would expand into control flow.  */
+      return false;
+    }
+
   /* It is unlikely that computing the number of iterations using division
      would be more profitable than keeping the original induction variable.  */
   if (expression_expensive_p (*bound))
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c	(revision 0)
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts -fno-tree-vectorize -fno-tree-loop-ivcanon" } */
+
+void
+f1 (char *p, unsigned long int i, unsigned long int n)
+{
+  p += i;
+  do
+    {
+      *p = '\0';
+      p += 1;
+      i++;
+    }
+  while (i < n);
+}
+
+void
+f2 (char *p, unsigned int i, unsigned int n)
+{
+  p += i;
+  do
+    {
+      *p = '\0';
+      p += 1;
+      i++;
+    }
+  while (i < n);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 2 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <p_" 2 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 2 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
