This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[patch] Make jump threading preserve loops


Hello,

this patch makes jump threading preserve loops.  At the moment, jump
threading tries to keep the loops valid by using edge flags; this is
not quite reliable, and not all cases are handled correctly.
Keeping the current_loops structure valid instead ensures much more
reliably that no invalid loops are created.

To achieve this, some restructuring of tree-ssa-threadupdate turned
out to be useful -- threading through loop headers is now handled
completely separately (except for the actual code & cfg rewriting,
which is of course shared), since the reasoning is most complicated
there.

Although we now prohibit jump threading in some cases where it was
allowed before, the most common cases where threading through a loop
header is useful are handled.  Also, some other cases that used to be
forbidden are now allowed, which e.g. forces the changes to the
vect-102.c and vect-103.c testcases in order to prevent jump
threading from simplifying them too much.

Some changes to other parts of the compiler turned out to be necessary:
1) cleanup_tree_cfg_loop is now run a bit more often, which revealed
   problems in fix_loop_structure and several other functions related
   to maintaining information about loops.
2) Jump threading sometimes peels iterations from loops, which is
   annoying and may prevent e.g. the vectorizer from working
   efficiently, by spoiling the alignment of arrays.  With this patch,
   loop peeling occurred more often, so I was forced to restrict it a
   bit (otherwise, since vrp + dom are run about 5 times through the
   compilation, we would end up peeling 4 or 5 iterations from many
   loops); in particular, some types of jump threading are only
   allowed in the first instance of the dom pass, which required some
   changes to enable us to detect whether the first instance of a
   pass is being run.

One somewhat annoying point is that although we update the
information about loops, we do not manage to update dominators during
jump threading; the main problem is that jump threading may create
unreachable blocks, and the dominator infrastructure does not like
that.  I am thinking about a solution, but for the moment, we use
determine_bb_domination_status to recover the parts of the dominance
information we need by scanning the cfg.

Bootstrapped & regtested on i686, x86_64 and ppc64.

Zdenek

	* tree-vrp.c (finalize_jump_threads): Do not care about dominance info.
	(execute_vrp): Keep loops through jump threading.
	* tree-ssa-threadupdate.c (set_loop_for_bb, thread_single_edge,
	dbds_continue_enumeration_p, determine_bb_domination_status,
	thread_through_loop_header): New functions.
	(create_edge_and_update_destination_phis): Take local_info argument.
	Set loops for the new blocks.
	(prune_undesirable_thread_requests): Removed.
	(redirect_edges): Do not pretend that redirect_edge_and_branch can
	create new blocks.
	(thread_block): Do not call prune_undesirable_thread_requests.
	Update loops.
	(mark_threaded_blocks): Select edges to thread here.
	(thread_through_all_blocks): Take may_peel_loop_headers argument.
	Thread edges through loop headers independently.
	* cfgloopmanip.c (create_preheader, mfb_keep_just): Export.
	(create_preheaders): Handle the case some loops were removed.
	(fix_loop_structure): Rewritten.
	* tree-pass.h (TODO_mark_first_instance): New.
	(first_pass_instance): Declare.
	* tree-scalar-evolution.c (scev_finalize): Clear scalar_evolution_info.
	* tree-ssa-dom.c (tree_ssa_dominator_optimize): Keep loops through
	jump threading.  Pass may_peel_loop_headers to
	thread_through_all_blocks according to first_pass_instance.
	* predict.c (tree_estimate_probability):  Do not call
	calculate_dominance_info unnecessarily.  Call create_preheaders.
	* tree-cfgcleanup.c (cleanup_tree_cfg_loop): Add fix_lcssa argument.
	Update loop closed ssa form according to it.
	* cfgloop.c (mfb_keep_just): Removed.
	(loop_preheader_edge): Assert that we indeed have preheaders.
	* cfgloop.h (create_preheader): Declare.
	* tree-flow.h (cleanup_tree_cfg_loop, thread_through_all_blocks):
	Declaration changed.
	* basic-block.h (mfb_keep_just): Declare.
	* tree-cfg.c (tree_split_edge): Assert that the edge does not change
	by redirection.
	* passes.c (first_pass_instance): New variable.
	(next_pass_1): Set TODO_mark_first_instance.
	(execute_todo): Set first_pass_instance.  Add argument to
	cleanup_tree_cfg_loop call.
	* tree-ssa-threadedge.c (thread_across_edge): Improve comments.

	* gcc.dg/tree-ssa/ssa-dom-thread-2.c: New test.
	* gcc.dg/vect/vect-102.c, gcc.dg/vect/vect-103.c: Use more complex
	construction to prevent vectorizing.

Index: tree-vrp.c
===================================================================
*** tree-vrp.c	(revision 118220)
--- tree-vrp.c	(working copy)
*************** identify_jump_threads (void)
*** 4652,4668 ****
  static void
  finalize_jump_threads (void)
  {
!   bool cfg_altered = false;
!   cfg_altered = thread_through_all_blocks ();
  
-   /* If we threaded jumps, then we need to recompute the dominance
-      information, to safely do that we must clean up the CFG first.  */
-   if (cfg_altered)
-     {
-       free_dominance_info (CDI_DOMINATORS);
-       cleanup_tree_cfg ();
-       calculate_dominance_info (CDI_DOMINATORS);
-     }
    VEC_free (tree, heap, stack);
  }
  
--- 4652,4662 ----
  static void
  finalize_jump_threads (void)
  {
!   bool altered = thread_through_all_blocks (false);
! 
!   if (altered)
!     cleanup_tree_cfg_loop (false);
  
    VEC_free (tree, heap, stack);
  }
  
*************** execute_vrp (void)
*** 4779,4785 ****
  {
    insert_range_assertions ();
  
!   current_loops = loop_optimizer_init (LOOPS_NORMAL);
    if (current_loops)
      scev_initialize (current_loops);
  
--- 4773,4780 ----
  {
    insert_range_assertions ();
  
!   current_loops = loop_optimizer_init (LOOPS_NORMAL
! 				       | LOOPS_HAVE_MARKED_SINGLE_EXITS);
    if (current_loops)
      scev_initialize (current_loops);
  
*************** execute_vrp (void)
*** 4788,4798 ****
    vrp_finalize ();
  
    if (current_loops)
!     {
!       scev_finalize ();
!       loop_optimizer_finalize (current_loops);
!       current_loops = NULL;
!     }
  
    /* ASSERT_EXPRs must be removed before finalizing jump threads
       as finalizing jump threads calls the CFG cleanup code which
--- 4783,4789 ----
    vrp_finalize ();
  
    if (current_loops)
!     scev_finalize ();
  
    /* ASSERT_EXPRs must be removed before finalizing jump threads
       as finalizing jump threads calls the CFG cleanup code which
*************** execute_vrp (void)
*** 4807,4812 ****
--- 4798,4810 ----
    update_ssa (TODO_update_ssa);
  
    finalize_jump_threads ();
+ 
+   if (current_loops)
+     {
+       loop_optimizer_finalize (current_loops);
+       current_loops = NULL;
+     }
+ 
    return 0;
  }
  
Index: tree-ssa-threadupdate.c
===================================================================
*** tree-ssa-threadupdate.c	(revision 118220)
--- tree-ssa-threadupdate.c	(working copy)
*************** lookup_redirection_data (edge e, edge in
*** 301,306 ****
--- 301,318 ----
      }
  }
  
+ /* Sets the loop_father for BB to LOOP.  */
+ 
+ static void
+ set_loop_for_bb (struct loop *loop, basic_block bb)
+ {
+   if (!current_loops)
+     return;
+ 
+   gcc_assert (bb->loop_father == NULL);
+   add_bb_to_loop (bb, loop);
+ }
+ 
  /* Given a duplicate block and its single destination (both stored
     in RD).  Create an edge between the duplicate and its single
     destination.
*************** lookup_redirection_data (edge e, edge in
*** 309,321 ****
     destination.  */
  
  static void
! create_edge_and_update_destination_phis (struct redirection_data *rd)
  {
    edge e = make_edge (rd->dup_block, rd->outgoing_edge->dest, EDGE_FALLTHRU);
    tree phi;
  
    e->probability = REG_BR_PROB_BASE;
    e->count = rd->dup_block->count;
  
    /* If there are any PHI nodes at the destination of the outgoing edge
       from the duplicate block, then we will need to add a new argument
--- 321,337 ----
     destination.  */
  
  static void
! create_edge_and_update_destination_phis (struct local_info *local_info,
! 					 struct redirection_data *rd)
  {
    edge e = make_edge (rd->dup_block, rd->outgoing_edge->dest, EDGE_FALLTHRU);
    tree phi;
  
    e->probability = REG_BR_PROB_BASE;
    e->count = rd->dup_block->count;
+   e->aux = rd->outgoing_edge->aux;
+ 
+   set_loop_for_bb (local_info->bb->loop_father, rd->dup_block);
  
    /* If there are any PHI nodes at the destination of the outgoing edge
       from the duplicate block, then we will need to add a new argument
*************** create_duplicates (void **slot, void *da
*** 358,364 ****
  
        /* Go ahead and wire up outgoing edges and update PHIs for the duplicate
           block.  */
!       create_edge_and_update_destination_phis (rd);
      }
  
    /* Keep walking the hash table.  */
--- 374,380 ----
  
        /* Go ahead and wire up outgoing edges and update PHIs for the duplicate
           block.  */
!       create_edge_and_update_destination_phis (local_info, rd);
      }
  
    /* Keep walking the hash table.  */
*************** fixup_template_block (void **slot, void 
*** 379,584 ****
       and halt the hash table traversal.  */
    if (rd->dup_block && rd->dup_block == local_info->template_block)
      {
!       create_edge_and_update_destination_phis (rd);
        return 0;
      }
  
    return 1;
  }
  
- /* Not all jump threading requests are useful.  In particular some
-    jump threading requests can create irreducible regions which are
-    undesirable.
- 
-    This routine will examine the BB's incoming edges for jump threading
-    requests which, if acted upon, would create irreducible regions.  Any
-    such jump threading requests found will be pruned away.  */
- 
- static void
- prune_undesirable_thread_requests (basic_block bb)
- {
-   edge e;
-   edge_iterator ei;
-   bool may_create_irreducible_region = false;
-   unsigned int num_outgoing_edges_into_loop = 0;
- 
-   /* For the heuristics below, we need to know if BB has more than
-      one outgoing edge into a loop.  */
-   FOR_EACH_EDGE (e, ei, bb->succs)
-     num_outgoing_edges_into_loop += ((e->flags & EDGE_LOOP_EXIT) == 0);
- 
-   if (num_outgoing_edges_into_loop > 1)
-     {
-       edge backedge = NULL;
- 
-       /* Consider the effect of threading the edge (0, 1) to 2 on the left
- 	 CFG to produce the right CFG:
-     
- 
-              0            0
-              |            |
-              1<--+        2<--------+
-             / \  |        |         |
-            2   3 |        4<----+   |
-             \ /  |       / \    |   |
-              4---+      E   1-- | --+
-              |              |   |
-              E              3---+
- 
- 
-  	Threading the (0, 1) edge to 2 effectively creates two loops
-  	(2, 4, 1) and (4, 1, 3) which are neither disjoint nor nested.
- 	This is not good.
- 
- 	However, we do need to be able to thread  (0, 1) to 2 or 3
- 	in the left CFG below (which creates the middle and right
- 	CFGs with nested loops).
- 
-              0          0             0
-              |          |             |
-              1<--+      2<----+       3<-+<-+
-             /|   |      |     |       |  |  |
-            2 |   |      3<-+  |       1--+  |
-             \|   |      |  |  |       |     |
-              3---+      1--+--+       2-----+
- 
- 	 
- 	 A safe heuristic appears to be to only allow threading if BB
- 	 has a single incoming backedge from one of its direct successors.  */
- 
-       FOR_EACH_EDGE (e, ei, bb->preds)
- 	{
- 	  if (e->flags & EDGE_DFS_BACK)
- 	    {
- 	      if (backedge)
- 		{
- 		  backedge = NULL;
- 		  break;
- 		}
- 	      else
- 		{
- 		  backedge = e;
- 		}
- 	    }
- 	}
- 
-       if (backedge && find_edge (bb, backedge->src))
- 	;
-       else
-         may_create_irreducible_region = true;
-     }
-   else
-     {
-       edge dest = NULL;
- 
-       /* If we thread across the loop entry block (BB) into the
- 	 loop and BB is still reached from outside the loop, then
- 	 we would create an irreducible CFG.  Consider the effect
- 	 of threading the edge (1, 4) to 5 on the left CFG to produce
- 	 the right CFG
- 
-              0               0
-             / \             / \
-            1   2           1   2
-             \ /            |   |
-              4<----+       5<->4
-             / \    |           |
-            E   5---+           E
- 
- 
- 	 Threading the (1, 4) edge to 5 creates two entry points
- 	 into the loop (4, 5) (one from block 1, the other from
- 	 block 2).  A classic irreducible region. 
- 
- 	 So look at all of BB's incoming edges which are not
- 	 backedges and which are not threaded to the loop exit.
- 	 If that subset of incoming edges do not all thread
- 	 to the same block, then threading any of them will create
- 	 an irreducible region.  */
- 
-       FOR_EACH_EDGE (e, ei, bb->preds)
- 	{
- 	  edge e2;
- 
- 	  /* We ignore back edges for now.  This may need refinement
-     	     as threading a backedge creates an inner loop which
- 	     we would need to verify has a single entry point. 
- 
- 	     If all backedges thread to new locations, then this
- 	     block will no longer have incoming backedges and we
- 	     need not worry about creating irreducible regions
- 	     by threading through BB.  I don't think this happens
- 	     enough in practice to worry about it.  */
- 	  if (e->flags & EDGE_DFS_BACK)
- 	    continue;
- 
- 	  /* If the incoming edge threads to the loop exit, then it
- 	     is clearly safe.  */
- 	  e2 = e->aux;
- 	  if (e2 && (e2->flags & EDGE_LOOP_EXIT))
- 	    continue;
- 
- 	  /* E enters the loop header and is not threaded.  We can
- 	     not allow any other incoming edges to thread into
- 	     the loop as that would create an irreducible region.  */
- 	  if (!e2)
- 	    {
- 	      may_create_irreducible_region = true;
- 	      break;
- 	    }
- 
- 	  /* We know that this incoming edge threads to a block inside
- 	     the loop.  This edge must thread to the same target in
- 	     the loop as any previously seen threaded edges.  Otherwise
- 	     we will create an irreducible region.  */
- 	  if (!dest)
- 	    dest = e2;
- 	  else if (e2 != dest)
- 	    {
- 	      may_create_irreducible_region = true;
- 	      break;
- 	    }
- 	}
-     }
- 
-   /* If we might create an irreducible region, then cancel any of
-      the jump threading requests for incoming edges which are
-      not backedges and which do not thread to the exit block.  */
-   if (may_create_irreducible_region)
-     {
-       FOR_EACH_EDGE (e, ei, bb->preds)
- 	{
- 	  edge e2;
- 
- 	  /* Ignore back edges.  */
- 	  if (e->flags & EDGE_DFS_BACK)
- 	    continue;
- 
- 	  e2 = e->aux;
- 
- 	  /* If this incoming edge was not threaded, then there is
- 	     nothing to do.  */
- 	  if (!e2)
- 	    continue;
- 
- 	  /* If this incoming edge threaded to the loop exit,
- 	     then it can be ignored as it is safe.  */
- 	  if (e2->flags & EDGE_LOOP_EXIT)
- 	    continue;
- 
- 	  if (e2)
- 	    {
- 	      /* This edge threaded into the loop and the jump thread
- 		 request must be cancelled.  */
- 	      if (dump_file && (dump_flags & TDF_DETAILS))
- 		fprintf (dump_file, "  Not threading jump %d --> %d to %d\n",
- 			 e->src->index, e->dest->index, e2->dest->index);
- 	      e->aux = NULL;
- 	    }
- 	}
-     }
- }
- 
  /* Hash table traversal callback to redirect each incoming edge
     associated with this hash table element to its new destination.  */
  
--- 395,407 ----
       and halt the hash table traversal.  */
    if (rd->dup_block && rd->dup_block == local_info->template_block)
      {
!       create_edge_and_update_destination_phis (local_info, rd);
        return 0;
      }
  
    return 1;
  }
  
  /* Hash table traversal callback to redirect each incoming edge
     associated with this hash table element to its new destination.  */
  
*************** redirect_edges (void **slot, void *data)
*** 621,631 ****
  	  /* Redirect the incoming edge to the appropriate duplicate
  	     block.  */
  	  e2 = redirect_edge_and_branch (e, rd->dup_block);
  	  flush_pending_stmts (e2);
- 
- 	  if ((dump_file && (dump_flags & TDF_DETAILS))
- 	      && e->src != e2->src)
- 	    fprintf (dump_file, "    basic block %d created\n", e2->src->index);
  	}
        else
  	{
--- 444,451 ----
  	  /* Redirect the incoming edge to the appropriate duplicate
  	     block.  */
  	  e2 = redirect_edge_and_branch (e, rd->dup_block);
+ 	  gcc_assert (e == e2);
  	  flush_pending_stmts (e2);
  	}
        else
  	{
*************** redirection_block_p (basic_block bb)
*** 697,742 ****
     successor of BB.  We then revector the incoming edges into BB to
     the appropriate duplicate of BB.
  
!    BB and its duplicates will have assignments to the same set of
!    SSA_NAMEs.  Right now, we just call into update_ssa to update the
!    SSA graph for those names.
! 
!    We are also going to experiment with a true incremental update
!    scheme for the duplicated resources.  One of the interesting
!    properties we can exploit here is that all the resources set
!    in BB will have the same IDFS, so we have one IDFS computation
!    per block with incoming threaded edges, which can lower the
!    cost of the true incremental update algorithm.  */
  
  static bool
! thread_block (basic_block bb)
  {
    /* E is an incoming edge into BB that we may or may not want to
       redirect to a duplicate of BB.  */
!   edge e;
    edge_iterator ei;
    struct local_info local_info;
! 
!   /* FOUND_BACKEDGE indicates that we found an incoming backedge
!      into BB, in which case we may ignore certain jump threads
!      to avoid creating irreducible regions.  */
!   bool found_backedge = false;
  
    /* ALL indicates whether or not all incoming edges into BB should
       be threaded to a duplicate of BB.  */
    bool all = true;
  
-   /* If optimizing for size, only thread this block if we don't have
-      to duplicate it or it's an otherwise empty redirection block.  */
-   if (optimize_size
-       && EDGE_COUNT (bb->preds) > 1
-       && !redirection_block_p (bb))
-     {
-       FOR_EACH_EDGE (e, ei, bb->preds)
- 	e->aux = NULL;
-       return false;
-     }
- 
    /* To avoid scanning a linear array for the element we need we instead
       use a hash table.  For normal code there should be no noticeable
       difference.  However, if we have a block with a large number of
--- 517,539 ----
     successor of BB.  We then revector the incoming edges into BB to
     the appropriate duplicate of BB.
  
!    If NOLOOP_ONLY is true, we only perform the threading as long as it
!    does not affect the structure of the loops in a nontrivial way.  */
  
  static bool
! thread_block (basic_block bb, bool noloop_only)
  {
    /* E is an incoming edge into BB that we may or may not want to
       redirect to a duplicate of BB.  */
!   edge e, e2;
    edge_iterator ei;
    struct local_info local_info;
!   struct loop *loop = bb->loop_father;
  
    /* ALL indicates whether or not all incoming edges into BB should
       be threaded to a duplicate of BB.  */
    bool all = true;
  
    /* To avoid scanning a linear array for the element we need we instead
       use a hash table.  For normal code there should be no noticeable
       difference.  However, if we have a block with a large number of
*************** thread_block (basic_block bb)
*** 746,780 ****
  				  redirection_data_eq,
  				  free);
  
!   FOR_EACH_EDGE (e, ei, bb->preds)
!     found_backedge |= ((e->flags & EDGE_DFS_BACK) != 0);
  
!   /* If BB has incoming backedges, then threading across BB might
!      introduce an irreducible region, which would be undesirable
!      as that inhibits various optimizations later.  Prune away
!      any jump threading requests which we know will result in
!      an irreducible region.  */
!   if (found_backedge)
!     prune_undesirable_thread_requests (bb);
  
    /* Record each unique threaded destination into a hash table for
       efficient lookups.  */
    FOR_EACH_EDGE (e, ei, bb->preds)
      {
!       if (!e->aux)
  	{
  	  all = false;
  	}
!       else
! 	{
! 	  edge e2 = e->aux;
! 	  update_bb_profile_for_threading (e->dest, EDGE_FREQUENCY (e),
! 					   e->count, e->aux);
! 
! 	  /* Insert the outgoing edge into the hash table if it is not
! 	     already in the hash table.  */
! 	  lookup_redirection_data (e2, e, INSERT);
! 	}
      }
  
    /* If we are going to thread all incoming edges to an outgoing edge, then
--- 543,587 ----
  				  redirection_data_eq,
  				  free);
  
!   /* If we thread the latch of the loop to its exit, the loop ceases to
!      exist.  Make sure we do not restrict ourselves in order to preserve
!      this loop.  */
!   if (current_loops && loop->header == bb)
!     {
!       e = loop_latch_edge (loop);
!       e2 = e->aux;
  
!       if (e2 && loop_exit_edge_p (loop, e2))
! 	{
! 	  loop->header = NULL;
! 	  loop->latch = NULL;
! 	}
!     }
  
    /* Record each unique threaded destination into a hash table for
       efficient lookups.  */
    FOR_EACH_EDGE (e, ei, bb->preds)
      {
!       e2 = e->aux;
! 
!       if (!e2
! 	  /* If NOLOOP_ONLY is true, we only allow threading through the
! 	     header of a loop to exit edges.  */
! 	  || (noloop_only
! 	      && current_loops
! 	      && bb == bb->loop_father->header
! 	      && !loop_exit_edge_p (bb->loop_father, e2)))
  	{
  	  all = false;
+ 	  continue;
  	}
! 
!       update_bb_profile_for_threading (e->dest, EDGE_FREQUENCY (e),
! 				       e->count, e->aux);
! 
!       /* Insert the outgoing edge into the hash table if it is not
! 	 already in the hash table.  */
!       lookup_redirection_data (e2, e, INSERT);
      }
  
    /* If we are going to thread all incoming edges to an outgoing edge, then
*************** thread_block (basic_block bb)
*** 822,827 ****
--- 629,959 ----
    return local_info.jumps_threaded;
  }
  
+ /* Threads edge E through E->dest to the edge E->aux.  Returns the copy
+    of E->dest created during threading, or E->dest if it was not necessary
+    to copy it (E is its single predecessor).  */
+ 
+ static basic_block
+ thread_single_edge (edge e)
+ {
+   basic_block bb = e->dest;
+   edge eto = e->aux;
+   struct redirection_data rd;
+   struct local_info local_info;
+ 
+   e->aux = NULL;
+ 
+   thread_stats.num_threaded_edges++;
+ 
+   if (single_pred_p (bb))
+     {
+       /* If BB has just a single predecessor, we should only remove the
+ 	 control statements at its end, and successors except for ETO.  */
+       remove_ctrl_stmt_and_useless_edges (bb, eto->dest);
+       return bb;
+     }
+ 
+   /* Otherwise, we need to create a copy.  */
+   update_bb_profile_for_threading (bb, EDGE_FREQUENCY (e), e->count, eto);
+ 
+   local_info.bb = bb;
+   rd.outgoing_edge = eto;
+ 
+   create_block_for_threading (bb, &rd);
+   create_edge_and_update_destination_phis (&local_info, &rd);
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+     fprintf (dump_file, "  Threaded jump %d --> %d to %d\n",
+ 	     e->src->index, e->dest->index, rd.dup_block->index);
+ 
+   rd.dup_block->count = e->count;
+   rd.dup_block->frequency = EDGE_FREQUENCY (e);
+   single_succ_edge (rd.dup_block)->count = e->count;
+   redirect_edge_and_branch (e, rd.dup_block);
+   flush_pending_stmts (e);
+ 
+   return rd.dup_block;
+ }
+ 
+ /* Callback for dfs_enumerate_from.  Returns true if BB is different
+    from STOP and DBDS_CE_STOP.  */
+ 
+ static basic_block dbds_ce_stop;
+ static bool
+ dbds_continue_enumeration_p (basic_block bb, void *stop)
+ {
+   return (bb != (basic_block) stop
+ 	  && bb != dbds_ce_stop);
+ }
+ 
+ /* Evaluates the dominance relationship of latch of the LOOP and BB, and
+    returns the state.  */
+ 
+ enum bb_dom_status
+ {
+   /* BB does not dominate latch of the LOOP.  */
+   DOMST_NONDOMINATING,
+   /* The LOOP is broken (there is no path from the header to its latch).  */
+   DOMST_LOOP_BROKEN,
+   /* BB dominates the latch of the LOOP.  */
+   DOMST_DOMINATING
+ };
+ 
+ static enum bb_dom_status
+ determine_bb_domination_status (struct loop *loop, basic_block bb)
+ {
+   basic_block *bblocks;
+   unsigned nblocks, i;
+   bool bb_reachable = false;
+   edge_iterator ei;
+   edge e;
+ 
+ #ifdef ENABLE_CHECKING
+   /* This function assumes BB is a successor of LOOP->header.  */
+     {
+       bool ok = false;
+ 
+       FOR_EACH_EDGE (e, ei, bb->preds)
+ 	{
+      	  if (e->src == loop->header)
+ 	    {
+ 	      ok = true;
+ 	      break;
+ 	    }
+ 	}
+ 
+       gcc_assert (ok);
+     }
+ #endif
+ 
+   if (bb == loop->latch)
+     return DOMST_DOMINATING;
+ 
+   /* Check that BB dominates LOOP->latch, and that it is back-reachable
+      from it.  */
+ 
+   bblocks = XCNEWVEC (basic_block, loop->num_nodes);
+   dbds_ce_stop = loop->header;
+   nblocks = dfs_enumerate_from (loop->latch, 1, dbds_continue_enumeration_p,
+ 				bblocks, loop->num_nodes, bb);
+   for (i = 0; i < nblocks; i++)
+     FOR_EACH_EDGE (e, ei, bblocks[i]->preds)
+       {
+ 	if (e->src == loop->header)
+ 	  {
+ 	    free (bblocks);
+ 	    return DOMST_NONDOMINATING;
+ 	  }
+ 	if (e->src == bb)
+ 	  bb_reachable = true;
+       }
+ 
+   free (bblocks);
+   return (bb_reachable ? DOMST_DOMINATING : DOMST_LOOP_BROKEN);
+ }
+ 
+ /* Thread jumps through the header of LOOP.  Returns true if cfg changes.
+    If MAY_PEEL_LOOP_HEADERS is false, we avoid threading from entry edges
+    to the inside of the loop.  */
+ 
+ static bool
+ thread_through_loop_header (struct loop *loop, bool may_peel_loop_headers)
+ {
+   basic_block header = loop->header;
+   edge e, tgt_edge, latch = loop_latch_edge (loop);
+   edge_iterator ei;
+   basic_block tgt_bb, atgt_bb;
+   enum bb_dom_status domst;
+ 
+   /* We have already threaded through headers to exits, so all the threading
+      requests now are to the inside of the loop.  We need to avoid creating
+      irreducible regions (i.e., loops with more than one entry block), and
+      also loops with several latch edges, or new subloops of the loop (although
+      there are cases where it might be appropriate, it is difficult to decide,
+      and doing it wrongly may confuse other optimizers).
+ 
+      We could handle more general cases here.  However, the intention is to
+      preserve some information about the loop, which is impossible if its
+      structure changes significantly, in a way that is not well understood.
+      Thus we only handle a few important special cases, in which also updating
+      of the loop-carried information should be feasible:
+ 
+      1) Propagation of latch edge to a block that dominates the latch block
+ 	of a loop.  This aims to handle the following idiom:
+ 
+ 	first = 1;
+ 	while (1)
+ 	  {
+ 	    if (first)
+ 	      initialize;
+ 	    first = 0;
+ 	    body;
+ 	  }
+ 
+ 	After threading the latch edge, this becomes
+ 
+ 	first = 1;
+ 	if (first)
+ 	  initialize;
+ 	while (1)
+ 	  {
+ 	    first = 0;
+ 	    body;
+ 	  }
+ 
+ 	The original header of the loop is moved out of it, and we may thread
+ 	the remaining edges through it without further constraints.
+ 
+      2) All entry edges are propagated to a single basic block that dominates
+ 	the latch block of the loop.  This aims to handle the following idiom
+ 	(normally created for "for" loops):
+ 
+ 	i = 0;
+ 	while (1)
+ 	  {
+ 	    if (i >= 100)
+ 	      break;
+ 	    body;
+ 	    i++;
+ 	  }
+ 
+ 	This becomes
+ 
+ 	i = 0;
+ 	while (1)
+ 	  {
+ 	    body;
+ 	    i++;
+ 	    if (i >= 100)
+ 	      break;
+ 	  }
+      */
+ 
+   /* Threading through the header won't improve the code if the header has just
+      one successor.  */
+   if (single_succ_p (header))
+     goto fail;
+ 
+   if (latch->aux)
+     {
+       tgt_edge = latch->aux;
+       tgt_bb = tgt_edge->dest;
+     }
+   else if (!may_peel_loop_headers
+ 	   && !redirection_block_p (loop->header))
+     goto fail;
+   else
+     {
+       tgt_bb = NULL;
+       tgt_edge = NULL;
+       FOR_EACH_EDGE (e, ei, header->preds)
+ 	{
+ 	  if (!e->aux)
+ 	    {
+ 	      if (e == latch)
+ 		continue;
+ 
+ 	      /* If the latch is not threaded, and there is a header
+ 		 edge that is not threaded, we would create a loop
+ 		 with multiple entries.  */
+ 	      goto fail;
+ 	    }
+ 
+ 	  tgt_edge = e->aux;
+ 	  atgt_bb = tgt_edge->dest;
+ 	  if (!tgt_bb)
+ 	    tgt_bb = atgt_bb;
+ 	  /* Two targets of threading would make us create a loop
+ 	     with multiple entries.  */
+ 	  else if (tgt_bb != atgt_bb)
+ 	    goto fail;
+ 	}
+ 
+       if (!tgt_bb)
+ 	{
+ 	  /* There are no threading requests.  */
+ 	     return false;
+ 	}
+ 
+       /* Redirecting to empty loop latch is useless.  */
+       if (tgt_bb == loop->latch
+ 	  && empty_block_p (loop->latch))
+ 	goto fail;
+     }
+ 
+   /* The target block must dominate the loop latch, otherwise we would be
+      creating a subloop.  */
+   domst = determine_bb_domination_status (loop, tgt_bb);
+   if (domst == DOMST_NONDOMINATING)
+     goto fail;
+   if (domst == DOMST_LOOP_BROKEN)
+     {
+       /* If the loop ceased to exist, mark it as such, and thread through its
+ 	 original header.  */
+       loop->header = NULL;
+       loop->latch = NULL;
+       return thread_block (header, false);
+     }
+ 
+   if (tgt_bb->loop_father->header == tgt_bb)
+     {
+       /* If the target of the threading is a header of a subloop, we need
+ 	 to create a preheader for it, so that the headers of the two loops
+ 	 do not merge.  */
+       if (EDGE_COUNT (tgt_bb->preds) > 2)
+ 	{
+ 	  tgt_bb = create_preheader (tgt_bb->loop_father, 0);
+ 	  gcc_assert (tgt_bb != NULL);
+ 	}
+       else
+ 	tgt_bb = loop_split_edge_with (tgt_edge, NULL);
+     }
+       
+   if (latch->aux)
+     {
+       /* First handle the case when the latch edge is redirected.  */
+       loop->latch = thread_single_edge (latch);
+       gcc_assert (single_succ (loop->latch) == tgt_bb);
+       loop->header = tgt_bb;
+ 
+       /* Thread the remaining edges through the former header.  */
+       thread_block (header, false);
+     }
+   else
+     {
+       basic_block new_preheader;
+ 
+       /* Now consider the case entry edges are redirected to the new entry
+ 	 block.  */
+       FOR_EACH_EDGE (e, ei, header->preds)
+ 	{
+ 	  if (e->aux)
+ 	    break;
+ 	}
+       thread_block (header, false);
+       new_preheader = e->dest;
+ 
+       /* Create the new latch block.  This is always necessary, as the latch
+ 	 must have only a single successor, but the original header had at
+ 	 least two successors.  */
+       mfb_kj_edge = single_succ_edge (new_preheader);
+       latch = make_forwarder_block (tgt_bb, mfb_keep_just, NULL);
+       add_bb_to_loop (latch->dest, loop);
+       loop->latch = latch->src;
+       loop->header = latch->dest;
+     }
+   
+   return true;
+ 
+ fail:
+   /* We failed to thread anything.  Cancel the requests.  */
+   FOR_EACH_EDGE (e, ei, header->preds)
+     {
+       e->aux = NULL;
+     }
+   return false;
+ }
+ 
  /* Walk through the registered jump threads and convert them into a
     form convenient for this pass.
  
*************** static void
*** 839,844 ****
--- 971,981 ----
  mark_threaded_blocks (bitmap threaded_blocks)
  {
    unsigned int i;
+   bitmap_iterator bi;
+   bitmap tmp = BITMAP_ALLOC (NULL);
+   basic_block bb;
+   edge e;
+   edge_iterator ei;
  
    for (i = 0; i < VEC_length (edge, threaded_edges); i += 2)
      {
*************** mark_threaded_blocks (bitmap threaded_bl
*** 846,853 ****
        edge e2 = VEC_index (edge, threaded_edges, i + 1);
  
        e->aux = e2;
!       bitmap_set_bit (threaded_blocks, e->dest->index);
      }
  }
  
  
--- 983,1012 ----
        edge e2 = VEC_index (edge, threaded_edges, i + 1);
  
        e->aux = e2;
!       bitmap_set_bit (tmp, e->dest->index);
!     }
! 
!   /* If optimizing for size, only thread through a block if we don't have
!      to duplicate it, or it is an otherwise empty redirection block.  */
!   if (optimize_size)
!     {
!       EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
! 	{
! 	  bb = BASIC_BLOCK (i);
! 	  if (EDGE_COUNT (bb->preds) > 1
! 	      && !redirection_block_p (bb))
! 	    {
! 	      FOR_EACH_EDGE (e, ei, bb->preds)
! 		      e->aux = NULL;
! 	    }
! 	  else
! 	    bitmap_set_bit (threaded_blocks, i);
! 	}
      }
+   else
+     bitmap_copy (threaded_blocks, tmp);
+ 
+   BITMAP_FREE (tmp);
  }
  
  
*************** mark_threaded_blocks (bitmap threaded_bl
*** 857,866 ****
     It is the caller's responsibility to fix the dominance information
     and rewrite duplicated SSA_NAMEs back into SSA form.
  
     Returns true if one or more edges were threaded, false otherwise.  */
  
  bool
! thread_through_all_blocks (void)
  {
    bool retval = false;
    unsigned int i;
--- 1016,1028 ----
     It is the caller's responsibility to fix the dominance information
     and rewrite duplicated SSA_NAMEs back into SSA form.
  
+    If MAY_PEEL_LOOP_HEADERS is false, we avoid threading edges through
+    loop headers unless doing so simplifies the loop.
+ 
     Returns true if one or more edges were threaded, false otherwise.  */
  
  bool
! thread_through_all_blocks (bool may_peel_loop_headers)
  {
    bool retval = false;
    unsigned int i;
*************** thread_through_all_blocks (void)
*** 875,888 ****
  
    mark_threaded_blocks (threaded_blocks);
  
    EXECUTE_IF_SET_IN_BITMAP (threaded_blocks, 0, i, bi)
      {
        basic_block bb = BASIC_BLOCK (i);
  
        if (EDGE_COUNT (bb->preds) > 0)
! 	retval |= thread_block (bb);
      }
  
    if (dump_file && (dump_flags & TDF_STATS))
      fprintf (dump_file, "\nJumps threaded: %lu\n",
  	     thread_stats.num_threaded_edges);
--- 1037,1073 ----
  
    mark_threaded_blocks (threaded_blocks);
  
+   /* First perform the threading requests that do not affect
+      loop structure.  */
    EXECUTE_IF_SET_IN_BITMAP (threaded_blocks, 0, i, bi)
      {
        basic_block bb = BASIC_BLOCK (i);
  
        if (EDGE_COUNT (bb->preds) > 0)
! 	retval |= thread_block (bb, true);
      }
  
+   /* Then perform the threading through loop headers.  We start with the
+      innermost loop, so that the changes to the CFG we perform do not
+      affect further threading.  */
+   if (current_loops)
+     {
+       for (i = current_loops->num - 1; i > 0; i--)
+ 	{
+ 	  struct loop *loop = current_loops->parray[i];
+ 
+ 	  if (!loop
+ 	      || !loop->header
+ 	      || !bitmap_bit_p (threaded_blocks, loop->header->index))
+ 	    continue;
+ 
+ 	  retval |= thread_through_loop_header (loop, may_peel_loop_headers);
+ 	}
+     }
+ 
+   if (retval)
+     free_dominance_info (CDI_DOMINATORS);
+ 
    if (dump_file && (dump_flags & TDF_STATS))
      fprintf (dump_file, "\nJumps threaded: %lu\n",
  	     thread_stats.num_threaded_edges);
*************** thread_through_all_blocks (void)
*** 891,896 ****
--- 1076,1082 ----
    threaded_blocks = NULL;
    VEC_free (edge, heap, threaded_edges);
    threaded_edges = NULL;
+ 
    return retval;
  }
  
Index: cfgloopmanip.c
===================================================================
*** cfgloopmanip.c	(revision 118220)
--- cfgloopmanip.c	(working copy)
*************** static bool fix_bb_placement (struct loo
*** 46,52 ****
  static void fix_bb_placements (struct loops *, basic_block, bool *);
  static void place_new_loop (struct loops *, struct loop *);
  static void scale_loop_frequencies (struct loop *, int, int);
- static basic_block create_preheader (struct loop *, int);
  static void unloop (struct loops *, struct loop *, bool *);
  
  #define RDIV(X,Y) (((X) + (Y) / 2) / (Y))
--- 46,51 ----
*************** duplicate_loop_to_header_edge (struct lo
*** 1061,1068 ****
     MFB_KJ_EDGE to the entry part.  E is the edge for that we should decide
     whether to redirect it.  */
  
! static edge mfb_kj_edge;
! static bool
  mfb_keep_just (edge e)
  {
    return e != mfb_kj_edge;
--- 1060,1067 ----
     MFB_KJ_EDGE to the entry part.  E is the edge for that we should decide
     whether to redirect it.  */
  
! edge mfb_kj_edge;
! bool
  mfb_keep_just (edge e)
  {
    return e != mfb_kj_edge;
*************** mfb_update_loops (basic_block jump)
*** 1088,1094 ****
     entry; otherwise we also force preheader block to have only one successor.
     The function also updates dominators.  */
  
! static basic_block
  create_preheader (struct loop *loop, int flags)
  {
    edge e, fallthru;
--- 1087,1093 ----
     entry; otherwise we also force preheader block to have only one successor.
     The function also updates dominators.  */
  
! basic_block
  create_preheader (struct loop *loop, int flags)
  {
    edge e, fallthru;
*************** create_preheader (struct loop *loop, int
*** 1172,1183 ****
  
  /* Create preheaders for each loop from loop tree stored in LOOPS; for meaning
     of FLAGS see create_preheader.  */
  void
  create_preheaders (struct loops *loops, int flags)
  {
    unsigned i;
    for (i = 1; i < loops->num; i++)
!     create_preheader (loops->parray[i], flags);
    loops->state |= LOOPS_HAVE_PREHEADERS;
  }
  
--- 1171,1184 ----
  
  /* Create preheaders for each loop from loop tree stored in LOOPS; for meaning
     of FLAGS see create_preheader.  */
+ 
  void
  create_preheaders (struct loops *loops, int flags)
  {
    unsigned i;
    for (i = 1; i < loops->num; i++)
!     if (loops->parray[i])
!       create_preheader (loops->parray[i], flags);
    loops->state |= LOOPS_HAVE_PREHEADERS;
  }
  
*************** void
*** 1401,1409 ****
  fix_loop_structure (struct loops *loops, bitmap changed_bbs)
  {
    basic_block bb;
!   struct loop *loop, *ploop;
    unsigned i;
  
    /* Remove the old bb -> loop mapping.  */
    FOR_EACH_BB (bb)
      {
--- 1402,1412 ----
  fix_loop_structure (struct loops *loops, bitmap changed_bbs)
  {
    basic_block bb;
!   struct loop *loop;
    unsigned i;
  
+   gcc_assert (loops->state & LOOPS_HAVE_SIMPLE_LATCHES);
+ 
    /* Remove the old bb -> loop mapping.  */
    FOR_EACH_BB (bb)
      {
*************** fix_loop_structure (struct loops *loops,
*** 1420,1474 ****
  	continue;
  
        loop->num_nodes = 0;
-       if (loop->header)
- 	continue;
- 
-       while (loop->inner)
- 	{
- 	  ploop = loop->inner;
- 	  flow_loop_tree_node_remove (ploop);
- 	  flow_loop_tree_node_add (loop->outer, ploop);
- 	}
- 
-       /* Remove the loop and free its data.  */
        flow_loop_tree_node_remove (loop);
-       loops->parray[loop->num] = NULL;
-       flow_loop_free (loop);
-     }
  
!   /* Rescan the bodies of loops, starting from the outermost.  */
!   loop = loops->tree_root;
!   while (1)
!     {
!       if (loop->inner)
! 	loop = loop->inner;
!       else
  	{
! 	  while (!loop->next
! 		 && loop != loops->tree_root)
! 	    loop = loop->outer;
! 	  if (loop == loops->tree_root)
! 	    break;
! 
! 	  loop = loop->next;
  	}
- 
-       loop->num_nodes = flow_loop_nodes_find (loop->header, loop);
      }
  
!   /* Now fix the loop nesting.  */
    for (i = 1; i < loops->num; i++)
      {
        loop = loops->parray[i];
        if (!loop)
  	continue;
  
!       bb = loop_preheader_edge (loop)->src;
!       if (bb->loop_father != loop->outer)
! 	{
! 	  flow_loop_tree_node_remove (loop);
! 	  flow_loop_tree_node_add (bb->loop_father, loop);
! 	}
      }
  
    /* Mark the blocks whose loop has changed.  */
--- 1423,1448 ----
  	continue;
  
        loop->num_nodes = 0;
        flow_loop_tree_node_remove (loop);
  
!       if (!loop->header)
  	{
! 	  /* Remove the loop and free its data.  */
! 	  loops->parray[i] = NULL;
! 	  flow_loop_free (loop);
  	}
      }
  
!   /* Rescan the bodies of loops, starting from the outermost, and reconstruct
!      the loop nesting structure.  */
    for (i = 1; i < loops->num; i++)
      {
        loop = loops->parray[i];
        if (!loop)
  	continue;
  
!       flow_loop_tree_node_add (loop->header->loop_father, loop);
!       loop->num_nodes = flow_loop_nodes_find (loop->header, loop);
      }
  
    /* Mark the blocks whose loop has changed.  */
*************** fix_loop_structure (struct loops *loops,
*** 1481,1486 ****
--- 1455,1462 ----
        bb->aux = NULL;
      }
  
+   if (loops->state & LOOPS_HAVE_PREHEADERS)
+     create_preheaders (loops, CP_SIMPLE_PREHEADERS);
    if (loops->state & LOOPS_HAVE_MARKED_SINGLE_EXITS)
      mark_single_exit_loops (loops);
    if (loops->state & LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS)
Index: tree-pass.h
===================================================================
*** tree-pass.h	(revision 118220)
--- tree-pass.h	(working copy)
*************** struct dump_file_info
*** 217,222 ****
--- 217,225 ----
     in statements, used.  */
  #define TODO_update_smt_usage           (1 << 14)
  
+ /* Internally used for the first instance of a pass.  */
+ #define TODO_mark_first_instance	(1 << 15)
+ 
  #define TODO_update_ssa_any		\
      (TODO_update_ssa			\
       | TODO_update_ssa_no_phi		\
*************** extern struct tree_opt_pass *all_passes,
*** 401,404 ****
--- 404,411 ----
  extern void execute_pass_list (struct tree_opt_pass *);
  extern void execute_ipa_pass_list (struct tree_opt_pass *);
  
+ /* Set to true if the pass is called the first time during compilation of the
+    current function.  */
+ extern bool first_pass_instance;
+ 
  #endif /* GCC_TREE_PASS_H */
Index: tree-scalar-evolution.c
===================================================================
*** tree-scalar-evolution.c	(revision 118220)
--- tree-scalar-evolution.c	(working copy)
*************** void
*** 2866,2871 ****
--- 2866,2872 ----
  scev_finalize (void)
  {
    htab_delete (scalar_evolution_info);
+   scalar_evolution_info = NULL;
    BITMAP_FREE (already_instantiated);
  }
  
Index: testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2.c
===================================================================
*** testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2.c	(revision 0)
--- testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2.c	(revision 0)
***************
*** 0 ****
--- 1,119 ----
+ /* { dg-do compile } */ 
+ /* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom1-stats" } */
+ 
+ void foo();
+ void bla();
+ void bar();
+ 
+ /* In the following two cases, we should be able to thread edge through
+    the loop header.  */
+ 
+ void thread_entry_through_header (void)
+ {
+   int i;
+ 
+   for (i = 0; i < 170; i++)
+     bla ();
+ }
+ 
+ void thread_latch_through_header (void)
+ {
+   int i = 0;
+   int first = 1;
+ 
+   do
+     {
+       if (first)
+ 	foo ();
+ 
+       first = 0;
+       bla ();
+     } while (i++ < 100);
+ }
+ 
+ /* This is a TODO -- it is correct to thread both entry and latch edge through
+    the header, but we do not handle this case yet.  */
+ 
+ void dont_thread_1 (void)
+ {
+   int i = 0;
+   int first = 1;
+ 
+   do
+     {
+       if (first)
+ 	foo ();
+       else
+ 	bar ();
+ 
+       first = 0;
+       bla ();
+     } while (i++ < 100);
+ }
+ 
+ /* Avoid threading in the following two cases, to prevent creating subloops.  */
+ 
+ void dont_thread_2 (int first)
+ {
+   int i = 0;
+ 
+   do
+     {
+       if (first)
+ 	foo ();
+       else
+ 	bar ();
+ 
+       first = 0;
+       bla ();
+     } while (i++ < 100);
+ }
+ 
+ void dont_thread_3 (int nfirst)
+ {
+   int i = 0;
+   int first = 0;
+ 
+   do
+     {
+       if (first)
+ 	foo ();
+       else
+ 	bar ();
+ 
+       first = nfirst;
+       bla ();
+     } while (i++ < 100);
+ }
+ 
+ /* Avoid threading in this case, in order to avoid creating loop with
+    multiple entries.  */
+ 
+ void dont_thread_4 (int a, int nfirst)
+ {
+   int i = 0;
+   int first;
+ 
+   if (a)
+     first = 0;
+   else
+     first = 1;
+ 
+   do
+     {
+       if (first)
+ 	foo ();
+       else
+ 	bar ();
+ 
+       first = nfirst;
+       bla ();
+     } while (i++ < 100);
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp1"} } */
+ /* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp1"} } */
+ /* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 0 "dom1"} } */
+ /* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 1 "dom1"} } */
+ /* { dg-final { cleanup-tree-dump "dom1" } } */
+ /* { dg-final { cleanup-tree-dump "vrp1" } } */
Index: testsuite/gcc.dg/vect/vect-102.c
===================================================================
*** testsuite/gcc.dg/vect/vect-102.c	(revision 118220)
--- testsuite/gcc.dg/vect/vect-102.c	(working copy)
*************** int main1 (int x, int y) {
*** 23,29 ****
    for (i = 0; i < N; i++)
      {
         p->a[i] = a[i];
!        if (x == 135)
  	 abort (); /* to avoid vectorization  */
      }
  
--- 23,29 ----
    for (i = 0; i < N; i++)
      {
         p->a[i] = a[i];
!        if (x == 135 || x == 137)
  	 abort (); /* to avoid vectorization  */
      }
  
Index: testsuite/gcc.dg/vect/vect-103.c
===================================================================
*** testsuite/gcc.dg/vect/vect-103.c	(revision 118220)
--- testsuite/gcc.dg/vect/vect-103.c	(working copy)
*************** int main1 (int x, int y) {
*** 25,31 ****
      {
         p->a[i] = a[i];
         p->b[i] = b[i];
!        if (x == 135)
  	 abort (); /* to avoid vectorization  */
      }
  
--- 25,31 ----
      {
         p->a[i] = a[i];
         p->b[i] = b[i];
!        if (x == 135 || x == 137)
  	 abort (); /* to avoid vectorization  */
      }
  
Index: tree-ssa-dom.c
===================================================================
*** tree-ssa-dom.c	(revision 118220)
--- tree-ssa-dom.c	(working copy)
*************** tree_ssa_dominator_optimize (void)
*** 240,246 ****
  {
    struct dom_walk_data walk_data;
    unsigned int i;
-   struct loops loops_info;
  
    memset (&opt_stats, 0, sizeof (opt_stats));
  
--- 240,245 ----
*************** tree_ssa_dominator_optimize (void)
*** 273,292 ****
  
    calculate_dominance_info (CDI_DOMINATORS);
  
!   /* We need to know which edges exit loops so that we can
!      aggressively thread through loop headers to an exit
!      edge.  */
!   flow_loops_find (&loops_info);
!   mark_loop_exit_edges (&loops_info);
!   flow_loops_free (&loops_info);
! 
!   /* Clean up the CFG so that any forwarder blocks created by loop
!      canonicalization are removed.  */
!   cleanup_tree_cfg ();
!   calculate_dominance_info (CDI_DOMINATORS);
  
    /* We need accurate information regarding back edges in the CFG
!      for jump threading.  */
    mark_dfs_back_edges ();
  
    /* Recursively walk the dominator tree optimizing statements.  */
--- 272,286 ----
  
    calculate_dominance_info (CDI_DOMINATORS);
  
!   /* We need to know the loop structures in order to avoid destroying them.
!      Note that we still can e.g. thread through loop headers to an exit
!      edge, or through a loop header into the loop body, assuming that we
!      update the loop info.  */
!   current_loops = loop_optimizer_init (LOOPS_HAVE_SIMPLE_LATCHES);
  
    /* We need accurate information regarding back edges in the CFG
!      for jump threading; this may include back edges that are not part of
!      a single loop.  */
    mark_dfs_back_edges ();
  
    /* Recursively walk the dominator tree optimizing statements.  */
*************** tree_ssa_dominator_optimize (void)
*** 312,318 ****
    free_all_edge_infos ();
  
    /* Thread jumps, creating duplicate blocks as needed.  */
!   cfg_altered |= thread_through_all_blocks ();
  
    /* Removal of statements may make some EH edges dead.  Purge
       such edges from the CFG as needed.  */
--- 306,312 ----
    free_all_edge_infos ();
  
    /* Thread jumps, creating duplicate blocks as needed.  */
!   cfg_altered |= thread_through_all_blocks (first_pass_instance);
  
    /* Removal of statements may make some EH edges dead.  Purge
       such edges from the CFG as needed.  */
*************** tree_ssa_dominator_optimize (void)
*** 322,330 ****
        bitmap_zero (need_eh_cleanup);
      }
  
-   if (cfg_altered)
-     free_dominance_info (CDI_DOMINATORS);
- 
    /* Finally, remove everything except invariants in SSA_NAME_VALUE.
  
       Long term we will be able to let everything in SSA_NAME_VALUE
--- 316,321 ----
*************** tree_ssa_dominator_optimize (void)
*** 346,351 ****
--- 337,346 ----
    if (dump_file && (dump_flags & TDF_STATS))
      dump_dominator_optimization_stats (dump_file);
  
+   cleanup_tree_cfg_loop (false);
+   loop_optimizer_finalize (current_loops);
+   current_loops = NULL;
+ 
    /* Delete our main hashtable.  */
    htab_delete (avail_exprs);
  
Index: predict.c
===================================================================
*** predict.c	(revision 118220)
--- predict.c	(working copy)
*************** tree_estimate_probability (void)
*** 1294,1303 ****
    flow_loops_find (&loops_info);
    if (dump_file && (dump_flags & TDF_DETAILS))
      flow_loops_dump (&loops_info, dump_file, NULL, 0);
  
    add_noreturn_fake_exit_edges ();
    connect_infinite_loops_to_exit ();
-   calculate_dominance_info (CDI_DOMINATORS);
    calculate_dominance_info (CDI_POST_DOMINATORS);
  
    tree_bb_level_predictions ();
--- 1294,1303 ----
    flow_loops_find (&loops_info);
    if (dump_file && (dump_flags & TDF_DETAILS))
      flow_loops_dump (&loops_info, dump_file, NULL, 0);
+   create_preheaders (&loops_info, 0);
  
    add_noreturn_fake_exit_edges ();
    connect_infinite_loops_to_exit ();
    calculate_dominance_info (CDI_POST_DOMINATORS);
  
    tree_bb_level_predictions ();
Index: tree-cfgcleanup.c
===================================================================
*** tree-cfgcleanup.c	(revision 118220)
--- tree-cfgcleanup.c	(working copy)
*************** cleanup_tree_cfg (void)
*** 579,601 ****
    return changed;
  }
  
! /* Cleanup cfg and repair loop structures.  */
  
  void
! cleanup_tree_cfg_loop (void)
  {
    bool changed = cleanup_tree_cfg ();
  
!   if (changed)
      {
        bitmap changed_bbs = BITMAP_ALLOC (NULL);
        fix_loop_structure (current_loops, changed_bbs);
        calculate_dominance_info (CDI_DOMINATORS);
  
!       /* This usually does nothing.  But sometimes parts of cfg that originally
! 	 were inside a loop get out of it due to edge removal (since they
! 	 become unreachable by back edges from latch).  */
!       rewrite_into_loop_closed_ssa (changed_bbs, TODO_update_ssa);
  
        BITMAP_FREE (changed_bbs);
  
--- 579,605 ----
    return changed;
  }
  
! /* Cleanup cfg and repair loop structures.  If FIX_LCSSA is true, loop-closed
!    SSA form is updated as well.  */
  
  void
! cleanup_tree_cfg_loop (bool fix_lcssa)
  {
    bool changed = cleanup_tree_cfg ();
  
!   if (changed && current_loops != NULL)
      {
        bitmap changed_bbs = BITMAP_ALLOC (NULL);
        fix_loop_structure (current_loops, changed_bbs);
        calculate_dominance_info (CDI_DOMINATORS);
  
!       if (fix_lcssa)
! 	{
! 	  /* This usually does nothing.  But sometimes parts of cfg that
! 	     originally were inside a loop get out of it due to edge removal
! 	     (since they become unreachable by back edges from latch).  */
! 	  rewrite_into_loop_closed_ssa (changed_bbs, TODO_update_ssa);
! 	}
  
        BITMAP_FREE (changed_bbs);
  
Index: cfgloop.c
===================================================================
*** cfgloop.c	(revision 118220)
--- cfgloop.c	(working copy)
*************** update_latch_info (basic_block jump)
*** 434,450 ****
    set_immediate_dominator (CDI_DOMINATORS, jump, single_pred (jump));
  }
  
- /* A callback for make_forwarder block, to redirect all edges except for
-    MFB_KJ_EDGE to the entry part.  E is the edge for that we should decide
-    whether to redirect it.  */
- 
- static edge mfb_kj_edge;
- static bool
- mfb_keep_just (edge e)
- {
-   return e != mfb_kj_edge;
- }
- 
  /* A callback for make_forwarder block, to redirect the latch edges into an
     entry part.  E is the edge for that we should decide whether to redirect
     it.  */
--- 434,439 ----
*************** loop_preheader_edge (const struct loop *
*** 1284,1289 ****
--- 1273,1280 ----
    edge e;
    edge_iterator ei;
  
+   gcc_assert (!current_loops
+ 	      || (current_loops->state & LOOPS_HAVE_PREHEADERS) != 0);
    FOR_EACH_EDGE (e, ei, loop->header->preds)
      if (e->src != loop->latch)
        break;
Index: cfgloop.h
===================================================================
*** cfgloop.h	(revision 118220)
--- cfgloop.h	(working copy)
*************** enum
*** 228,233 ****
--- 228,234 ----
    CP_SIMPLE_PREHEADERS = 1
  };
  
+ basic_block create_preheader (struct loop *, int);
  extern void create_preheaders (struct loops *, int);
  extern void force_single_succ_latches (struct loops *);
  
Index: tree-flow.h
===================================================================
*** tree-flow.h	(revision 118220)
--- tree-flow.h	(working copy)
*************** extern basic_block move_sese_region_to_f
*** 627,633 ****
  
  /* In tree-cfgcleanup.c  */
  extern bool cleanup_tree_cfg (void);
! extern void cleanup_tree_cfg_loop (void);
  
  /* In tree-pretty-print.c.  */
  extern void dump_generic_bb (FILE *, basic_block, int, int);
--- 627,633 ----
  
  /* In tree-cfgcleanup.c  */
  extern bool cleanup_tree_cfg (void);
! extern void cleanup_tree_cfg_loop (bool);
  
  /* In tree-pretty-print.c.  */
  extern void dump_generic_bb (FILE *, basic_block, int, int);
*************** bool multiplier_allowed_in_address_p (HO
*** 952,958 ****
  unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode);
  
  /* In tree-ssa-threadupdate.c.  */
! extern bool thread_through_all_blocks (void);
  extern void register_jump_thread (edge, edge);
  
  /* In gimplify.c  */
--- 952,958 ----
  unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode);
  
  /* In tree-ssa-threadupdate.c.  */
! extern bool thread_through_all_blocks (bool);
  extern void register_jump_thread (edge, edge);
  
  /* In gimplify.c  */
Index: basic-block.h
===================================================================
*** basic-block.h	(revision 118220)
--- basic-block.h	(working copy)
*************** static inline bool bb_has_eh_pred (basic
*** 1189,1192 ****
--- 1189,1196 ----
    return false;
  }
  
+ /* In cfgloopmanip.c.  */
+ extern edge mfb_kj_edge;
+ bool mfb_keep_just (edge);
+ 
  #endif /* GCC_BASIC_BLOCK_H */
Index: tree-cfg.c
===================================================================
*** tree-cfg.c	(revision 118220)
--- tree-cfg.c	(working copy)
*************** tree_split_edge (edge edge_in)
*** 3175,3181 ****
    new_edge->count = edge_in->count;
  
    e = redirect_edge_and_branch (edge_in, new_bb);
!   gcc_assert (e);
    reinstall_phi_args (new_edge, e);
  
    return new_bb;
--- 3175,3181 ----
    new_edge->count = edge_in->count;
  
    e = redirect_edge_and_branch (edge_in, new_bb);
!   gcc_assert (e == edge_in);
    reinstall_phi_args (new_edge, e);
  
    return new_bb;
Index: passes.c
===================================================================
*** passes.c	(revision 118220)
--- passes.c	(working copy)
*************** Software Foundation, 51 Franklin Street,
*** 103,108 ****
--- 103,109 ----
  /* Global variables used to communicate with passes.  */
  int dump_flags;
  bool in_gimple_form;
+ bool first_pass_instance;
  
  
  /* This is called from various places for FUNCTION_DECL, VAR_DECL,
*************** next_pass_1 (struct tree_opt_pass **list
*** 390,395 ****
--- 391,398 ----
        new = xmalloc (sizeof (*new));
        memcpy (new, pass, sizeof (*new));
  
+       new->todo_flags_start &= ~TODO_mark_first_instance;
+ 
        /* Indicate to register_dump_files that this pass has duplicates,
           and so it should rename the dump file.  The first instance will
           be -1, and be number of duplicates = -static_pass_number - 1.
*************** next_pass_1 (struct tree_opt_pass **list
*** 404,409 ****
--- 407,413 ----
      }
    else
      {
+       pass->todo_flags_start |= TODO_mark_first_instance;
        pass->static_pass_number = -1;
        *list = pass;
      }  
*************** execute_todo (unsigned int flags)
*** 714,719 ****
--- 718,727 ----
      gcc_assert (flags & TODO_update_ssa_any);
  #endif
  
+   /* Inform the pass whether it is the first time it is run.  */
+   first_pass_instance = (flags & TODO_mark_first_instance) != 0;
+   flags &= ~TODO_mark_first_instance;
+ 
    if (curr_properties & PROP_ssa)
      flags |= TODO_verify_ssa;
    flags &= ~last_verified;
*************** execute_todo (unsigned int flags)
*** 731,737 ****
        updating_used_alone = true;
  
        if (current_loops)
! 	cleanup_tree_cfg_loop ();
        else
  	cleanup_tree_cfg ();
  
--- 739,745 ----
        updating_used_alone = true;
  
        if (current_loops)
! 	cleanup_tree_cfg_loop (true);
        else
  	cleanup_tree_cfg ();
  
Index: tree-ssa-threadedge.c
===================================================================
*** tree-ssa-threadedge.c	(revision 118220)
--- tree-ssa-threadedge.c	(working copy)
*************** simplify_control_stmt_condition (edge e,
*** 482,488 ****
     Note it is quite common for the first block inside a loop to
     end with a conditional which is either always true or always
     false when reached via the loop backedge.  Thus we do not want
!    to blindly disable threading across a loop backedge.  */
  
  void
  thread_across_edge (tree dummy_cond,
--- 482,500 ----
     Note it is quite common for the first block inside a loop to
     end with a conditional which is either always true or always
     false when reached via the loop backedge.  Thus we do not want
!    to blindly disable threading across a loop backedge.
!  
!    DUMMY_COND is a shared cond_expr used by condition simplification as scratch,
!    to avoid allocating memory.
!  
!    HANDLE_DOMINATING_ASSERTS is true if we should try to replace operands of
!    the simplified condition with the left-hand sides of the ASSERT_EXPRs
!    they are used in.
!  
!    STACK is used to undo temporary equivalences created during the walk of
!    E->dest.
! 
!    SIMPLIFY is a pass-specific function used to simplify statements.  */
  
  void
  thread_across_edge (tree dummy_cond,

