This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [CFG] Loop unswitching

From: Jan Hubicka <jh at suse dot cz>
To: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
Cc: gcc-patches at gcc dot gnu dot org, gcc-pdo at atrey dot karlin dot mff dot cuni dot cz
Date: Sat, 23 Feb 2002 13:46:27 +0100
Subject: Re: [CFG] Loop unswitching
References: <20020222211516.GA13376@atrey.karlin.mff.cuni.cz>
> + /* This controls how big loops are optimized, and by how much we allow
> +    them to grow.  */
> +   
> + #ifndef MAX_UNSWITCH_INSNS
> + #define MAX_UNSWITCH_INSNS 200
> + #endif
> + 
> + #ifndef MAX_NUM_UNSWITCH
> + #define MAX_NUM_UNSWITCH 8
> + #endif
> + 

:) Nasty issue...

> +   /* Latch must be dominated by it (ugly technical restriction,
> +      we should remove this later).  */
> +   if (!dominated_by_p (loops->cfg.dom, loop->latch, bb))
> +     return false;

Why exactly it is needed?
(you don't need to solve it now, I just want to understand :)
> + 
> +   /* Condition must be invariant.  */
> +   set = pc_set (bb->end);
> +   if (!set)
> +     abort ();
> +   test = XEXP (SET_SRC (set), 0);
> + 
> +   /* Test must be non-trivial (we use trivial ones to disable unreachable
> +      branches.  */
> +   if (rtx_equal_p (test, always_true_cond)
> +       || rtx_equal_p (test, always_false_cond))
> +     return false;

Hmm, really?  Where? Why?
> + 
> +   for (i = 0; i < loop->num_nodes; i++)
> +     if (modified_between_p (test, body[i]->head, NEXT_INSN (body[i]->end)))
> +       return false;
> + 
> +   return true;
> + }
> + 
> + /* Reverses CONDition; returns NULL if we cannot.  */
> + static rtx
> + reversed_condition (cond)
> +      rtx cond;
> + {
> +   enum rtx_code reversed;
> +   reversed = reversed_comparison_code (cond, NULL);
> +   if (reversed == UNKNOWN)
> +     return NULL_RTX;
> +   else
> +     return gen_rtx_fmt_ee (reversed,
> + 	 		     GET_MODE (cond), XEXP (cond, 0),
> + 			     XEXP (cond, 1));
> + }

Perhaps this can go into jump, it is usefull from elsewhere as well
(and whole code dealing with conditionals should go to rtlanal and jump.c
should die BTW).

Why do you need it?  (we do have function to reverse conditional branch. Isn't
that enought?)
> + 
> + /* Unswitch single LOOP.  COND_CHECKED holds list of conditions we already
> +    unswitched on and are true in our branch.  NUM is number of unswitchings
> +    done; do not allow it to grow too much, it is too easy to create example
> +    on that the code would grow exponentially.  */
> + static void
> + unswitch_single_loop (loops, loop, cond_checked, num)
> +      struct loops *loops;
> +      struct loop *loop;
> +      rtx cond_checked;
> +      int *num;
> + {
> +   basic_block *bbs, bb;
> +   struct loop *nloop;
> +   int i, true_first;
> +   rtx cond, rcond, conds, rconds, acond, set;
> + 
> +   /* Do not unswitch too much.  */
> +   if (*num > MAX_NUM_UNSWITCH)
> +     return;
> + 
> +   /* We only unswitch innermost loops (at least for now).  */
> +   if (loop->inner)
> +     return;
> + 
> +   /* And we must be able to duplicate loop body.  */
> +   if (!can_duplicate_loop_p (loop))
> +     return;
> + 
> +   /* Check the size of loop.  */
> +   if (num_loop_insns (loop) > MAX_UNSWITCH_INSNS)
> +     return;

With profile feedback, you may implement function to estimate number
of iterations of the loop and unswitch just loops with more than 1 iteration
at average.

Also use maybe_hot_bb_p for the conditional being unswitched, so we do not
unswitch on rarely executed block.
Perhaps you may compute ratio of entry edge frequency and the conditional
to determine average number of executions of the conditional for one
invokation of the loop.
THat should replace the number of iterations heuristics.  Additionally we
may require conditional to be very hot place in the loop, perhaps something
more strong than maybe_hot_bb_p is desirable.
> + 
> +   /* Find a bb to unswitch on.  We use just a stupid test of invariantness
> +      of the condition: all used regs must not be modified inside loop body.  */
> +   bbs = get_loop_body (loop);
> +   for (i = 0; i < loop->num_nodes; i++)
> +     if (may_unswitch_on_p (loops, bbs[i], loop, bbs))
> +       {
> +         int always_true = 0;
> +         int always_false = 0;
> + 
> + 	if (!(cond = get_condition (bbs[i]->end, NULL)))
> + 	  abort ();
> + 	rcond = reversed_condition (cond);

indent -gnu, please

> + 
> + 	/* Check whether the result can be predicted.  */
> + 	for (acond = cond_checked; acond; acond = XEXP (acond, 1))
> + 	  {
> + 	    if (rtx_equal_p (cond, XEXP (acond, 0)))
> + 	      {
> + 		/* True.  */
> + 		always_true = 1;
> + 		break;
> + 	      }
> + 	    if (rtx_equal_p (rcond, XEXP (acond, 0)))
> + 	      {
> + 		/* False.  */
> + 		always_false = 1;
> + 		break;
> + 	      }
> + 	  }

Simplify_relational is easier to do I guess.
I believe that get_comparison (or however the function in loop.c is named
will do the job for you as well as canonicalizing it).
> + 
> + 	/* In case we are able to predict result, we replace condition by
> + 	   corresponding trivial one.  It is not best way -- removing the
> + 	   unreachable branch would be better as it could make more
> + 	   opportunities for unswitching/ unrolling.  There are two problems
> + 	   with this.  The first is handling of dominators -- we would have to
> + 	   recount dominators for big area.  The second and more important one
> + 	   is that we could create unreachable blocks; we would have to remove
> + 	   them too, having big trouble with keeping loop structure up-to-date.
> + 	   */

This is nasty.  W/o midlevel RTL we can't do such trick easilly.  We must
eighter always canonicalize the conditional, that is probably easy, or find
way around this.

Are the unreachable blocks really so big problem for loop structure?  I would
guess that it is relatively easy... hmm, I see, you may turn loop into non-loop
Nasty :)
> + 	set = pc_set (bbs[i]->end);
> + 	if (!set)
> + 	  abort ();
> + 	if (always_true)
> + 	  {
> + 	    if (!validate_change (bbs[i]->end, &XEXP (SET_SRC (set), 0),
> + 				  copy_rtx (always_true_cond), 0))
> + 	      break;
> + 	  }
> + 	else if (always_false)
> + 	  {
> + 	    if (!validate_change (bbs[i]->end, &XEXP (SET_SRC (set), 0),
> + 				  copy_rtx (always_false_cond), 0))
> + 	      break;
> + 	  }
> + 	else
> + 	  break;
> +       }
> +  
> +   if (i == loop->num_nodes)
> +     {
> +       free (bbs);
> +       return;
> +     }
> + 
> +   conds = alloc_EXPR_LIST (0, cond, cond_checked);
> +   if (rcond)
> +     rconds = alloc_EXPR_LIST (0, rcond, cond_checked);
> +   else
> +     rconds = cond_checked;
> + 
> +   /* Separate condition.  */
> +   bb = split_loop_bb (loops, bbs[i], PREV_INSN (bbs[i]->end))->dest;
> +   free (bbs);
> +   true_first = !(bb->succ->flags & EDGE_FALLTHRU);
> + 
> +   /* Unswitch the loop.  */
> +   nloop = unswitch_loop (loops, loop, bb);
> +   if (!nloop)
> +     abort ();
> + 
> +   /* Count number of unswitchings.  */
> +   (*num)++;
> + 
> +   /* Invoke itself on modified loops.  */
> +   unswitch_single_loop (loops, nloop, true_first ? conds : rconds, num);
> +   unswitch_single_loop (loops, loop, true_first ? rconds : conds, num);

Please avoid recursions.  On "systems", like DOS, where stack size is limited,
gcc may get crashes. Politics is to avoid stack consumption that depends
on program size.  Of course this is not hold elsewhere, but lets not make
it worse.
> +   free_EXPR_LIST_node (conds);
> +   if (rcond)
> +     free_EXPR_LIST_node (rconds);
> + }
> + 
> + /* Checks whether BB is inside RPE_LOOP and is dominated by RPE_DOM.  */
> + struct rpe_data
> +  {
> +    struct loop *loop;
> +    basic_block dom;
> +    sbitmap *doms;
> +  };
> + 
> + static bool
> + rpe_enum_p (bb, data)
> +      basic_block bb;
> +      void *data;
> + {
> +   struct rpe_data *rpe = data;
> +   return flow_bb_inside_loop_p (rpe->loop, bb)
> + 	 && dominated_by_p (rpe->doms, bb, rpe->dom);
> + }
> + 
> + /* Remove given edge E and all bbs inside LOOP that are dominated by target 
> +    of E.  Fix dominators by redirecting to copy if FIX is set.  */
> + static void
> + remove_path (e, loop, doms, fix)
> +      edge e;
> +      struct loop *loop;
> +      sbitmap *doms;
> +      int fix;

Hmm, this can be probably made generic in cfganal or somewhere.  It looks
like relativly usefull cleanup function.S

Why you can't use remove_path to cleanup after the constant conditionals are
recognized above?
> + {
> +   int nrem, i, j;
> +   basic_block *rem_bbs, bb, nbb, *dom_bbs, dom_bb, ndom;
> +   int n_dom_bbs;
> +   edge ae;
> +   struct rpe_data rpe;
> +  
> +   /* Check for case when we are able to get to e->dest by other ways.  */
> +   if (e->dest->pred->pred_next)
> +     {
> +       for (ae = e->dest->pred; ae; ae = ae->pred_next)
> + 	if (ae != e && !dominated_by_p (doms, ae->src, e->dest))
> + 	  break;
> +       if (ae)
> + 	{
> + 	  /* We just want to cancel edge.  */
> + 	  loop_delete_branch_edge (e);
> + 	  return;
> + 	}
> +     }
> +   
> +   /* Find bbs we are interested in.  */
> +   rpe.loop = loop;
> +   rpe.dom = e->dest;
> +   rpe.doms = doms;
> +   rem_bbs = xcalloc (loop->num_nodes, sizeof (basic_block));
> +   nrem = dfs_enumerate_from (e->dest, 0, rpe_enum_p, rem_bbs,
> + 			     loop->num_nodes, &rpe);
> +   
> +   if (fix)
> +     {
> +       /* Fix dominators.  */
> +       for (i = 0; i < nrem; i++)
> + 	{
> + 	  bb = rem_bbs[i];
> + 	  ndom = RBI (bb)->copy;
> + 	  n_dom_bbs = get_dominated_by (doms, bb, &dom_bbs);
> + 	  for (j = 0; j < n_dom_bbs; j++)
> + 	    {
> + 	      /* This may also be bbs inside loop; but it does not matter,
> + 		 as they will be deleted anyway.  */
> + 	      dom_bb = dom_bbs[j];
> + 	      set_immediate_dominator (doms, dom_bb, ndom);
> + 	    }
> + 	  free (dom_bbs);
> + 	}
> +       /* Fix frequencies.  */
> +       for (i = 0; i < nrem; i++)
> + 	{
> + 	  bb = rem_bbs[i];
> + 	  nbb = RBI (bb)->copy;
> + 	  nbb->frequency += bb->frequency;
You should update branch probabilities as well.
Please try to be curefull and not mess them by messed up frequencies
(try to find robust formula to update them)

> + /* Compute loop from header and latch info filled in LOOP and add it to
> +    LOOPS.  */
> + static void
> + add_loop (loops, loop)
> +      struct loops *loops;
> +      struct loop *loop;

Hmm, also sounds like generally usefull function.

> +   /* Some sanity checking.  */
> +   if (!flow_bb_inside_loop_p (loop, unswitch_on))
> +     abort ();
> +   if (!unswitch_on->succ || !unswitch_on->succ->succ_next ||
> +       unswitch_on->succ->succ_next->succ_next)
> +     abort ();
> +   if (!dominated_by_p (loops->cfg.dom, loop->latch, unswitch_on))
> +     abort ();
> +   if (loop->inner)
> +     abort ();
> +   if (!flow_bb_inside_loop_p (loop, unswitch_on->succ->dest))
> +     abort ();
> +   if (!flow_bb_inside_loop_p (loop, unswitch_on->succ->succ_next->dest))
> +     abort ();
> +   
> +   /* Will we be able to perform redirection?  */
> +   if (!any_condjump_p (unswitch_on->end))
> +     return NULL;
Maybe better is check whether there are abnormals around.  Not that
important.  I don't think we want to unswitch computed jumps, but
perhaps...
> +   if (!cfg_layout_can_duplicate_bb_p (unswitch_on))
> +     return NULL;
Didn't you tested that whole loop is duplicable earlier?

> +   /* Cfglayout says: */
> +   /* Avoid redirect_edge_and_branch from overactive optimizing.  */
> +   dest->index = n_basic_blocks + 1;
> +   if (e->flags & EDGE_FALLTHRU)
> +     redirect_edge_succ (e, dest);
> +   else
> +     {
> +       if (!redirect_edge_and_branch (e, dest))
> + 	abort ();
> +     }

Please do not propagate dirty hacks to code.  Just create
cfg_layout_redirect_edge and hide the hackery there.

This can be probably called loop_redirect_edge or something,
as most redirect_edge do update of something.
> +   dest->index = dest_index;
> +   }
> + 
> + /* Deletes edge if possible.  */
> + static void
> + loop_delete_branch_edge (e)
> +      edge e;
> + {
> +   basic_block src = e->src;
> + 
> +   if (src->succ->succ_next)
> +     {
> +       int tot_pr, tot_rem;
> +       edge ae;
> + 
> +       /* Cannot handle more than two exit edges.  */
> +       if (src->succ->succ_next->succ_next)
> + 	return;
> +       /* Neither this.  */
> +       if (!any_condjump_p (src->end))
> + 	return;
> + 
> +       /* Try to fix probabilities.  This is probably as wrong as to leave
> + 	 them as they are.  */
> +       tot_pr = 0;
> +       for (ae = src->succ; ae; ae = ae->succ_next)
> + 	tot_pr += ae->probability;
> +       tot_rem = tot_pr - e->probability;
> +       if (tot_rem > 0)
> + 	{
> + 	  for (ae = src->succ; ae; ae = ae->succ_next)
> + 	    ae->probability = (ae->probability * tot_pr) / tot_rem;
> + 	}
> + 
> +       /* I do not know how much is this correct.  */
> +       delete_insn (src->end);
> +       
> +       remove_edge (e);
> +       src->succ->flags |= EDGE_FALLTHRU;

Should be safe for condjumps. Perhaps we can create function for that
in case later we will want to do something else.
Other way around can be cfg_layout_redirect_edge that will put both
edges to same destination.  Jump should be optimized away then.


> +     }
> +   else
> +     {
> +       /* Cannot happen -- we are duplicating loop! */
> +       abort ();
> +     }
> + }
> + 

> + /* Duplicates body of LOOP to given edge E NDUPL times.  Takes care of
> +    updating LOOPS structure.  E's destination must be LOOP header for this to
> +    work.  Remove exit edges from copies corresponding to set bits in WONT_EXIT
> +    (bit 0 corresponds to original LOOP body).  If UPDATE_DOMINATORS is set,
> +    also take care of updating dominators outside LOOP.  If UPDATE_FREQ is set,
> +    also update frequencies, otherwise just copy them.  Returns false if
> +    duplication is impossible.  */
> + /* Grrr... this function is getting far too complicated for me to like it.
> +    I should make something with it.  */

Did you seenthe unrool.c equivalent?  Still MUCH better :)
It does nasty operation, so it needs to be compelx..

Looks really good!

Honza
Follow-Ups:
- Re: [CFG] Loop unswitching
  - From: Zdenek Dvorak
- Re: [CFG] Loop unswitching
  - From: Mark Mitchell
References:
- [CFG] Loop unswitching
  - From: Zdenek Dvorak
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]