This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[killloop] Prefetching


Hello,

this implements a prefetching pass on trees (almost the same as the one
posted in http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01079.html).
The minor changes are the removal of the parts that were already merged
to mainline, and a change to the way the addresses of the prefetched
memory references are computed, in order to make ivopts happier.
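
To illustrate the kind of code the pass aims to produce, here is a
hand-written sketch (it is not generated by or contained in the patch;
add_arrays and AHEAD are made-up names, with AHEAD standing for the
prefetch distance the pass derives from PREFETCH_LATENCY and the size
of the loop body):

  /* Sketch only.  Prefetch data AHEAD iterations before it is used.  */
  #define AHEAD 16

  void
  add_arrays (int *a, int *b, int n)
  {
    int i;

    for (i = 0; i < n; i++)
      {
        __builtin_prefetch (&a[i + AHEAD], 1);  /* prefetch for write */
        __builtin_prefetch (&b[i + AHEAD], 0);  /* prefetch for read */
        a[i] += b[i];
      }
  }

In addition, the loop may be unrolled so that each cache line is only
prefetched once; see the PREFETCH_MOD discussion in the comments of the
new tree-ssa-loop-prefetch.c below.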

Zdenek

	* Makefile.in (tree-ssa-loop-prefetch.o): Add.
	(tree-cfgcleanup.o): Add SCEV_H dependency.
	* cfgloop.h (loopify, loop_version): Declaration changed.
	* cfgloopmanip.c (loopify, lv_adjust_loop_entry_edge,
	loop_version): Allow the caller to specify how the frequencies
	should be scaled.
	* common.opt (fprefetch-loop-arrays-rtl): New.
	* loop-unswitch.c (unswitch_loop): Specify how the frequencies should
	be scaled.
	* modulo-sched.c (sms_schedule): Ditto.
	* loop.c (rest_of_handle_loop_optimize): Handle
	-fprefetch-loop-arrays-rtl.
	* timevar.def (TV_TREE_PREFETCH): New.
	* tree-cfgcleanup.c: Include tree-scalar-evolution.h.
	(cleanup_tree_cfg_loop): Flush scev cache if anything changes.
	* tree-flow.h (tree_ssa_prefetch_arrays, tree_unroll_loop): Declare.
	(single_dom_exit): Make argument const.
	* tree-optimize.c (init_tree_optimization_passes): Add
	pass_loop_prefetch.
	* tree-outof-ssa.c (struct temp_expr_table_d): New field expr_vars.
	(new_temp_expr_table, free_temp_expr_table, check_replaceable,
	find_replaceable_in_bb): Stop TER when a different ssa name for the
	same variable is encountered.
	* tree-pass.h (pass_loop_prefetch): Declare.
	* tree-ssa-loop-ivopts.c (single_dom_exit): Make argument const.
	* tree-ssa-loop-manip.c (build_if_stmt, tree_unroll_loop): New
	functions.
	* tree-ssa-loop-prefetch.c: New file.
	* tree-ssa-loop.c (tree_ssa_loop_prefetch, gate_tree_ssa_loop_prefetch,
	pass_loop_prefetch): New.
	* doc/invoke.texi (-fprefetch-loop-arrays-rtl): Document.

Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1530.2.5
diff -c -3 -p -r1.1530.2.5 Makefile.in
*** Makefile.in	9 Sep 2005 16:52:15 -0000	1.1530.2.5
--- Makefile.in	14 Sep 2005 14:10:13 -0000
*************** OBJS-common = \
*** 940,946 ****
   tree-ssa-dom.o domwalk.o tree-tailcall.o gimple-low.o tree-iterator.o	   \
   tree-phinodes.o tree-ssanames.o tree-sra.o tree-complex.o		   \
   tree-vect-generic.o tree-ssa-loop.o tree-ssa-loop-niter.o		   \
!  tree-ssa-loop-manip.o tree-ssa-threadupdate.o				   \
   tree-vectorizer.o tree-vect-analyze.o tree-vect-transform.o		   \
   tree-ssa-loop-ivcanon.o tree-ssa-propagate.o tree-ssa-address.o	   \
   tree-ssa-math-opts.o tree-ssa-loop-endcond.o tree-affine.o		   \
--- 940,946 ----
   tree-ssa-dom.o domwalk.o tree-tailcall.o gimple-low.o tree-iterator.o	   \
   tree-phinodes.o tree-ssanames.o tree-sra.o tree-complex.o		   \
   tree-vect-generic.o tree-ssa-loop.o tree-ssa-loop-niter.o		   \
!  tree-ssa-loop-manip.o tree-ssa-threadupdate.o tree-ssa-loop-prefetch.o	   \
   tree-vectorizer.o tree-vect-analyze.o tree-vect-transform.o		   \
   tree-ssa-loop-ivcanon.o tree-ssa-propagate.o tree-ssa-address.o	   \
   tree-ssa-math-opts.o tree-ssa-loop-endcond.o tree-affine.o		   \
*************** tree-cfg.o : tree-cfg.c $(TREE_FLOW_H) $
*** 1826,1834 ****
  tree-cfgcleanup.o : tree-cfgcleanup.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(FLAGS_H) output.h \
     $(DIAGNOSTIC_H) errors.h function.h $(TIMEVAR_H) $(TM_H) coretypes.h \
!    $(TREE_DUMP_H) except.h langhooks.h $(CFGLOOP_H) tree-pass.h \
!    $(CFGLAYOUT_H) $(BASIC_BLOCK_H) hard-reg-set.h $(HASHTAB_H) toplev.h \
!    tree-ssa-propagate.h
  tree-tailcall.o : tree-tailcall.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(RTL_H) $(TREE_H) $(TM_P_H) function.h $(TM_H) coretypes.h \
     $(TREE_DUMP_H) $(DIAGNOSTIC_H) except.h tree-pass.h $(FLAGS_H) langhooks.h \
--- 1826,1832 ----
  tree-cfgcleanup.o : tree-cfgcleanup.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(FLAGS_H) output.h \
     $(DIAGNOSTIC_H) errors.h function.h $(TIMEVAR_H) $(TM_H) coretypes.h \
!    tree-ssa-propagate.h $(SCEV_H)
  tree-tailcall.o : tree-tailcall.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(RTL_H) $(TREE_H) $(TM_P_H) function.h $(TM_H) coretypes.h \
     $(TREE_DUMP_H) $(DIAGNOSTIC_H) except.h tree-pass.h $(FLAGS_H) langhooks.h \
*************** tree-ssa-loop-ivopts.o : tree-ssa-loop-i
*** 1898,1903 ****
--- 1896,1907 ----
     tree-pass.h $(GGC_H) $(RECOG_H) insn-config.h $(HASHTAB_H) $(SCEV_H) \
     $(CFGLOOP_H) $(PARAMS_H) langhooks.h $(BASIC_BLOCK_H) hard-reg-set.h \
     tree-chrec.h $(VARRAY_H) tree-affine.h
+ tree-ssa-loop-prefetch.o: tree-ssa-loop-prefetch.c $(TREE_FLOW_H) $(CONFIG_H) \
+    $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) $(EXPR_H) \
+    output.h $(DIAGNOSTIC_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
+    tree-pass.h $(GGC_H) $(RECOG_H) insn-config.h $(HASHTAB_H) $(SCEV_H) \
+    $(CFGLOOP_H) $(PARAMS_H) langhooks.h $(BASIC_BLOCK_H) hard-reg-set.h \
+    tree-chrec.h toplev.h langhooks.h
  tree-ssa-loop-endcond.o : tree-ssa-loop-endcond.c $(TREE_FLOW_H) $(CONFIG_H) \
     $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) $(EXPR_H) \
     output.h $(DIAGNOSTIC_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
Index: cfgloop.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloop.h,v
retrieving revision 1.48.2.2
diff -c -3 -p -r1.48.2.2 cfgloop.h
*** cfgloop.h	2 Sep 2005 16:41:09 -0000	1.48.2.2
--- cfgloop.h	14 Sep 2005 14:10:13 -0000
*************** extern bool duplicate_loop_to_header_edg
*** 307,315 ****
  					   unsigned, sbitmap, edge, edge *,
  					   unsigned *, int);
  extern struct loop *loopify (struct loops *, edge, edge,
! 			     basic_block, edge, edge, bool);
  struct loop * loop_version (struct loops *, struct loop *, void *,
! 			    basic_block *, bool);
  extern bool remove_path (struct loops *, edge);
  extern edge split_loop_bb (basic_block, void *);
  
--- 307,316 ----
  					   unsigned, sbitmap, edge, edge *,
  					   unsigned *, int);
  extern struct loop *loopify (struct loops *, edge, edge,
! 			     basic_block, edge, edge, bool,
! 			     unsigned, unsigned);
  struct loop * loop_version (struct loops *, struct loop *, void *,
! 			    basic_block *, unsigned, unsigned, unsigned, bool);
  extern bool remove_path (struct loops *, edge);
  extern edge split_loop_bb (basic_block, void *);
  
Index: cfgloopmanip.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloopmanip.c,v
retrieving revision 1.50.2.1
diff -c -3 -p -r1.50.2.1 cfgloopmanip.c
*** cfgloopmanip.c	2 Sep 2005 16:41:09 -0000	1.50.2.1
--- cfgloopmanip.c	14 Sep 2005 14:10:13 -0000
*************** scale_loop_frequencies (struct loop *loo
*** 468,479 ****
     LATCH_EDGE source.  HEADER_EDGE is redirected to basic block SWITCH_BB,
     FALSE_EDGE of SWITCH_BB to original destination of HEADER_EDGE and
     TRUE_EDGE of SWITCH_BB to original destination of LATCH_EDGE.
!    Returns newly created loop.  */
  
  struct loop *
  loopify (struct loops *loops, edge latch_edge, edge header_edge, 
  	 basic_block switch_bb, edge true_edge, edge false_edge,
! 	 bool redirect_all_edges)
  {
    basic_block succ_bb = latch_edge->dest;
    basic_block pred_bb = header_edge->src;
--- 468,480 ----
     LATCH_EDGE source.  HEADER_EDGE is redirected to basic block SWITCH_BB,
     FALSE_EDGE of SWITCH_BB to original destination of HEADER_EDGE and
     TRUE_EDGE of SWITCH_BB to original destination of LATCH_EDGE.
!    Returns newly created loop.  Frequencies and counts in the new loop
!    are scaled by NEW_SCALE and in the old one by OLD_SCALE.  */
  
  struct loop *
  loopify (struct loops *loops, edge latch_edge, edge header_edge, 
  	 basic_block switch_bb, edge true_edge, edge false_edge,
! 	 bool redirect_all_edges, unsigned old_scale, unsigned new_scale)
  {
    basic_block succ_bb = latch_edge->dest;
    basic_block pred_bb = header_edge->src;
*************** loopify (struct loops *loops, edge latch
*** 482,488 ****
    sbitmap seen;
    struct loop *loop = xcalloc (1, sizeof (struct loop));
    struct loop *outer = succ_bb->loop_father->outer;
!   int freq, prob, tot_prob;
    gcov_type cnt;
    edge e;
    edge_iterator ei;
--- 483,489 ----
    sbitmap seen;
    struct loop *loop = xcalloc (1, sizeof (struct loop));
    struct loop *outer = succ_bb->loop_father->outer;
!   int freq;
    gcov_type cnt;
    edge e;
    edge_iterator ei;
*************** loopify (struct loops *loops, edge latch
*** 492,501 ****
  
    freq = EDGE_FREQUENCY (header_edge);
    cnt = header_edge->count;
-   prob = EDGE_SUCC (switch_bb, 0)->probability;
-   tot_prob = prob + EDGE_SUCC (switch_bb, 1)->probability;
-   if (tot_prob == 0)
-     tot_prob = 1;
  
    /* Redirect edges.  */
    loop_redirect_edge (latch_edge, loop->header);
--- 493,498 ----
*************** loopify (struct loops *loops, edge latch
*** 523,534 ****
    add_bb_to_loop (switch_bb, outer);
  
    /* Fix frequencies.  */
!   switch_bb->frequency = freq;
!   switch_bb->count = cnt;
!   FOR_EACH_EDGE (e, ei, switch_bb->succs)
!     e->count = (switch_bb->count * e->probability) / REG_BR_PROB_BASE;
!   scale_loop_frequencies (loop, prob, tot_prob);
!   scale_loop_frequencies (succ_bb->loop_father, tot_prob - prob, tot_prob);
  
    /* Update dominators of blocks outside of LOOP.  */
    dom_bbs = xcalloc (n_basic_blocks, sizeof (basic_block));
--- 520,536 ----
    add_bb_to_loop (switch_bb, outer);
  
    /* Fix frequencies.  */
!   if (redirect_all_edges)
!     {
!       switch_bb->frequency = freq;
!       switch_bb->count = cnt;
!       FOR_EACH_EDGE (e, ei, switch_bb->succs)
! 	{
! 	  e->count = (switch_bb->count * e->probability) / REG_BR_PROB_BASE;
! 	}
!     }
!   scale_loop_frequencies (loop, old_scale, REG_BR_PROB_BASE);
!   scale_loop_frequencies (succ_bb->loop_father, new_scale, REG_BR_PROB_BASE);
  
    /* Update dominators of blocks outside of LOOP.  */
    dom_bbs = xcalloc (n_basic_blocks, sizeof (basic_block));
*************** create_loop_notes (void)
*** 1398,1410 ****
      --- edge e ---> [cond expr] ---> [first_head]
                          |
                          +---------> [second_head]
! */
  
  static basic_block
  lv_adjust_loop_entry_edge (basic_block first_head,
  			   basic_block second_head,
  			   edge e,
! 			   tree cond_expr)
  {
    basic_block new_head = NULL;
    edge e1;
--- 1400,1414 ----
      --- edge e ---> [cond expr] ---> [first_head]
                          |
                          +---------> [second_head]
! 
!   THEN_PROB is the probability of then branch of the condition.  */
  
  static basic_block
  lv_adjust_loop_entry_edge (basic_block first_head,
  			   basic_block second_head,
  			   edge e,
! 			   tree cond_expr,
! 			   unsigned then_prob)
  {
    basic_block new_head = NULL;
    edge e1;
*************** lv_adjust_loop_entry_edge (basic_block f
*** 1415,1425 ****
       insert conditional expr.  */
    new_head = split_edge (e);
  
- 
    lv_add_condition_to_bb (first_head, second_head, new_head,
  			  cond_expr);
  
    e1 = make_edge (new_head, first_head, EDGE_TRUE_VALUE);
    set_immediate_dominator (CDI_DOMINATORS, first_head, new_head);
    set_immediate_dominator (CDI_DOMINATORS, second_head, new_head);
  
--- 1419,1434 ----
       insert conditional expr.  */
    new_head = split_edge (e);
  
    lv_add_condition_to_bb (first_head, second_head, new_head,
  			  cond_expr);
  
+   e = single_succ_edge (new_head);
    e1 = make_edge (new_head, first_head, EDGE_TRUE_VALUE);
+   e1->probability = then_prob;
+   e->probability = REG_BR_PROB_BASE - then_prob;
+   e1->count = RDIV (e->count * e1->probability, REG_BR_PROB_BASE);
+   e->count = RDIV (e->count * e->probability, REG_BR_PROB_BASE);
+ 
    set_immediate_dominator (CDI_DOMINATORS, first_head, new_head);
    set_immediate_dominator (CDI_DOMINATORS, second_head, new_head);
  
*************** lv_adjust_loop_entry_edge (basic_block f
*** 1438,1449 ****
--- 1447,1461 ----
     may be a run time test for things that were not resolved by static
     analysis (overlapping ranges (anti-aliasing), alignment, etc.).
  
+    THEN_PROB is the probability of the then edge of the if.
+    
     If PLACE_AFTER is true, we place the new loop after LOOP in the
     instruction stream, otherwise it is placed before LOOP.  */
  
  struct loop *
  loop_version (struct loops *loops, struct loop * loop, 
  	      void *cond_expr, basic_block *condition_bb,
+ 	      unsigned then_prob, unsigned then_scale, unsigned else_scale,
  	      bool place_after)
  {
    basic_block first_head, second_head;
*************** loop_version (struct loops *loops, struc
*** 1475,1481 ****
  
    /* Split loop entry edge and insert new block with cond expr.  */
    cond_bb =  lv_adjust_loop_entry_edge (first_head, second_head,
! 					entry, cond_expr);
    if (condition_bb)
      *condition_bb = cond_bb;
  
--- 1487,1493 ----
  
    /* Split loop entry edge and insert new block with cond expr.  */
    cond_bb =  lv_adjust_loop_entry_edge (first_head, second_head,
! 					entry, cond_expr, then_prob);
    if (condition_bb)
      *condition_bb = cond_bb;
  
*************** loop_version (struct loops *loops, struc
*** 1492,1498 ****
  		   latch_edge,
  		   single_pred_edge (get_bb_copy (loop->header)),
  		   cond_bb, true_edge, false_edge,
! 		   false /* Do not redirect all edges.  */);
  
    exit = loop->single_exit;
    if (exit)
--- 1504,1511 ----
  		   latch_edge,
  		   single_pred_edge (get_bb_copy (loop->header)),
  		   cond_bb, true_edge, false_edge,
! 		   false /* Do not redirect all edges.  */,
! 		   then_scale, else_scale);
  
    exit = loop->single_exit;
    if (exit)
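
(Aside, not part of the patch: THEN_PROB, OLD_SCALE and NEW_SCALE are
fixed-point fractions with denominator REG_BR_PROB_BASE, which is 10000.
For instance, with then_prob == 8000 (i.e. 80%) and an entry edge count
of 1000, the code in lv_adjust_loop_entry_edge above yields

    e1->probability = 8000
    e->probability  = 10000 - 8000        = 2000
    e1->count       = 1000 * 8000 / 10000 = 800
    e->count        = 1000 * 2000 / 10000 = 200

and scale_loop_frequencies (loop, old_scale, REG_BR_PROB_BASE)
multiplies the frequencies and counts inside the loop by
old_scale / REG_BR_PROB_BASE.)
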
Index: common.opt
===================================================================
RCS file: /cvs/gcc/gcc/gcc/common.opt,v
retrieving revision 1.83.2.2
diff -c -3 -p -r1.83.2.2 common.opt
*** common.opt	2 Sep 2005 16:41:14 -0000	1.83.2.2
--- common.opt	14 Sep 2005 14:10:14 -0000
*************** Common Report Var(flag_pie,1) VarExists
*** 643,649 ****
  Generate position-independent code for executables if possible (small mode)
  
  fprefetch-loop-arrays
! Common Report Var(flag_prefetch_loop_arrays)
  Generate prefetch instructions, if available, for arrays in loops
  
  fprofile
--- 643,653 ----
  Generate position-independent code for executables if possible (small mode)
  
  fprefetch-loop-arrays
! Common Report Var(flag_prefetch_loop_arrays,1)
! Generate prefetch instructions, if available, for arrays in loops
! 
! fprefetch-loop-arrays-rtl
! Common Report Var(flag_prefetch_loop_arrays,2)
  Generate prefetch instructions, if available, for arrays in loops
  
  fprofile
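
(Aside, not part of the patch: the Var(flag_prefetch_loop_arrays,N)
form makes both options set the same variable, only to different
values, so a pass gate can tell the two implementations apart, roughly

    /* sketch only */
    if (flag_prefetch_loop_arrays == 2)   /* -fprefetch-loop-arrays-rtl */
      do_prefetch = LOOP_PREFETCH;

as the loop.c hunk below does for the old RTL prefetching; the gate of
the new tree-level pass lives in tree-ssa-loop.c, which is not part of
this excerpt.)
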
Index: loop-unswitch.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/loop-unswitch.c,v
retrieving revision 1.32.2.1
diff -c -3 -p -r1.32.2.1 loop-unswitch.c
*** loop-unswitch.c	30 Aug 2005 09:31:52 -0000	1.32.2.1
--- loop-unswitch.c	14 Sep 2005 14:10:14 -0000
*************** unswitch_loop (struct loops *loops, stru
*** 465,471 ****
    /* Loopify from the copy of LOOP body, constructing the new loop.  */
    nloop = loopify (loops, latch_edge,
  		   single_pred_edge (get_bb_copy (loop->header)), switch_bb,
! 		   BRANCH_EDGE (switch_bb), FALLTHRU_EDGE (switch_bb), true);
  
    /* Remove branches that are now unreachable in new loops.  */
    remove_path (loops, true_edge);
--- 465,472 ----
    /* Loopify from the copy of LOOP body, constructing the new loop.  */
    nloop = loopify (loops, latch_edge,
  		   single_pred_edge (get_bb_copy (loop->header)), switch_bb,
! 		   BRANCH_EDGE (switch_bb), FALLTHRU_EDGE (switch_bb), true,
! 		   prob, REG_BR_PROB_BASE - prob);
  
    /* Remove branches that are now unreachable in new loops.  */
    remove_path (loops, true_edge);
Index: loop.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/loop.c,v
retrieving revision 1.537.2.1
diff -c -3 -p -r1.537.2.1 loop.c
*** loop.c	2 Sep 2005 16:42:21 -0000	1.537.2.1
--- loop.c	14 Sep 2005 14:10:14 -0000
*************** rest_of_handle_loop_optimize (void)
*** 11763,11769 ****
    free_bb_for_insn ();
    profile_status = PROFILE_ABSENT;
    
!   do_prefetch = flag_prefetch_loop_arrays ? LOOP_PREFETCH : 0;
    
    if (flag_rerun_loop_opt)
      {
--- 11763,11769 ----
    free_bb_for_insn ();
    profile_status = PROFILE_ABSENT;
    
!   do_prefetch = flag_prefetch_loop_arrays == 2 ? LOOP_PREFETCH : 0;
    
    if (flag_rerun_loop_opt)
      {
Index: modulo-sched.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/modulo-sched.c,v
retrieving revision 1.37.2.1
diff -c -3 -p -r1.37.2.1 modulo-sched.c
*** modulo-sched.c	2 Sep 2005 16:42:23 -0000	1.37.2.1
--- modulo-sched.c	14 Sep 2005 14:10:14 -0000
*************** build_loops_structure (FILE *dumpfile)
*** 932,937 ****
--- 932,941 ----
    return loops;
  }
  
+ /* Probability in % that the sms-ed loop rolls enough so that optimized
+    version may be entered.  Just a guess.  */
+ #define PROB_SMS_ENOUGH_ITERATIONS 80
+ 
  /* Main entry point, perform SMS scheduling on the loops of the function
     that consist of single basic blocks.  */
  void
*************** sms_schedule (FILE *dump_file)
*** 947,954 ****
    struct df *df;
    struct loops *loops;
    basic_block bb = NULL;
-   /* vars to the versioning only if needed*/
-   struct loop * nloop;
    basic_block condition_bb = NULL;
    edge latch_edge;
    gcov_type trip_count = 0;
--- 951,956 ----
*************** sms_schedule (FILE *dump_file)
*** 1260,1268 ****
  		{
  		  rtx comp_rtx = gen_rtx_fmt_ee (GT, VOIDmode, count_reg,
  						 GEN_INT(stage_count));
  
! 		  nloop = loop_version (loops, loop, comp_rtx, &condition_bb,
! 					true);
  		}
  
  	      /* Set new iteration count of loop kernel.  */
--- 1262,1273 ----
  		{
  		  rtx comp_rtx = gen_rtx_fmt_ee (GT, VOIDmode, count_reg,
  						 GEN_INT(stage_count));
+ 		  unsigned prob = (PROB_SMS_ENOUGH_ITERATIONS
+ 				   * REG_BR_PROB_BASE) / 100;
  
! 		  loop_version (loops, loop, comp_rtx, &condition_bb,
! 				prob, prob, REG_BR_PROB_BASE - prob,
! 				true);
  		}
  
  	      /* Set new iteration count of loop kernel.  */
Index: passes.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/passes.c,v
retrieving revision 2.105.2.1
diff -c -3 -p -r2.105.2.1 passes.c
*** passes.c	2 Sep 2005 16:42:26 -0000	2.105.2.1
--- passes.c	14 Sep 2005 14:10:14 -0000
*************** init_optimization_passes (void)
*** 573,578 ****
--- 573,579 ----
       pass_may_alias.  */
    NEXT_PASS (pass_lower_vector_ssa);
    NEXT_PASS (pass_complete_unroll);
+   NEXT_PASS (pass_loop_prefetch);
    NEXT_PASS (pass_iv_optimize);
    NEXT_PASS (pass_tree_loop_done);
    *p = NULL;
Index: timevar.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/timevar.def,v
retrieving revision 1.53.2.1
diff -c -3 -p -r1.53.2.1 timevar.def
*** timevar.def	2 Sep 2005 16:42:48 -0000	1.53.2.1
--- timevar.def	14 Sep 2005 14:10:14 -0000
*************** DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "
*** 106,111 ****
--- 106,112 ----
  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
  DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
  DEFTIMEVAR (TV_TREE_LINEAR_TRANSFORM , "tree loop linear")
+ DEFTIMEVAR (TV_TREE_PREFETCH	     , "tree prefetching")
  DEFTIMEVAR (TV_TREE_LOOP_IVOPTS	     , "tree iv optimization")
  DEFTIMEVAR (TV_TREE_LOOP_INIT	     , "tree loop init")
  DEFTIMEVAR (TV_TREE_LOOP_FINI	     , "tree loop fini")
Index: tree-cfgcleanup.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-cfgcleanup.c,v
retrieving revision 2.3.4.1
diff -c -3 -p -r2.3.4.1 tree-cfgcleanup.c
*** tree-cfgcleanup.c	2 Sep 2005 16:42:54 -0000	2.3.4.1
--- tree-cfgcleanup.c	14 Sep 2005 14:10:14 -0000
*************** Boston, MA 02110-1301, USA.  */
*** 44,49 ****
--- 44,50 ----
  #include "cfgloop.h"
  #include "cfglayout.h"
  #include "hashtab.h"
+ #include "tree-scalar-evolution.h"
  #include "tree-ssa-propagate.h"
  
  /* Remove any fallthru edge from EV.  Return true if an edge was removed.  */
*************** cleanup_tree_cfg (void)
*** 559,581 ****
  void
  cleanup_tree_cfg_loop (void)
  {
!   bitmap changed_bbs = BITMAP_ALLOC (NULL);
  
!   cleanup_tree_cfg ();
! 
!   fix_loop_structure (current_loops, changed_bbs);
!   calculate_dominance_info (CDI_DOMINATORS);
  
!   /* This usually does nothing.  But sometimes parts of cfg that originally
!      were inside a loop get out of it due to edge removal (since they
!      become unreachable by back edges from latch).  */
!   rewrite_into_loop_closed_ssa (changed_bbs, TODO_update_ssa);
  
!   BITMAP_FREE (changed_bbs);
  
  #ifdef ENABLE_CHECKING
!   verify_loop_structure (current_loops);
  #endif
  }
  
  /* Merge the PHI nodes at BB into those at BB's sole successor.  */
--- 560,585 ----
  void
  cleanup_tree_cfg_loop (void)
  {
!   bool changed = cleanup_tree_cfg ();
  
!   if (changed)
!     {
!       bitmap changed_bbs = BITMAP_ALLOC (NULL);
!       fix_loop_structure (current_loops, changed_bbs);
!       calculate_dominance_info (CDI_DOMINATORS);
  
!       /* This usually does nothing.  But sometimes parts of cfg that originally
! 	 were inside a loop get out of it due to edge removal (since they
! 	 become unreachable by back edges from latch).  */
!       rewrite_into_loop_closed_ssa (changed_bbs, TODO_update_ssa);
  
!       BITMAP_FREE (changed_bbs);
  
  #ifdef ENABLE_CHECKING
!       verify_loop_structure (current_loops);
  #endif
+       scev_reset ();
+     }
  }
  
  /* Merge the PHI nodes at BB into those at BB's sole successor.  */
Index: tree-flow.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-flow.h,v
retrieving revision 2.130.2.4
diff -c -3 -p -r2.130.2.4 tree-flow.h
*** tree-flow.h	14 Sep 2005 09:44:17 -0000	2.130.2.4
--- tree-flow.h	14 Sep 2005 14:10:14 -0000
*************** void tree_ssa_lim (struct loops *);
*** 719,724 ****
--- 719,725 ----
  void tree_ssa_unswitch_loops (struct loops *);
  void canonicalize_induction_variables (struct loops *);
  void tree_unroll_loops_completely (struct loops *, bool);
+ void tree_ssa_prefetch_arrays (struct loops *);
  void remove_empty_loops (struct loops *);
  void tree_ssa_iv_optimize (struct loops *);
  void tree_ssa_reverse_loops (struct loops *);
*************** bool tree_duplicate_loop_to_header_edge 
*** 754,764 ****
  					 unsigned int, sbitmap,
  					 edge, edge *,
  					 unsigned int *, int);
  struct loop *tree_ssa_loop_version (struct loops *, struct loop *, tree,
  				    basic_block *);
  tree expand_simple_operations (tree);
  void substitute_in_loop_info (struct loop *, tree, tree);
- edge single_dom_exit (struct loop *);
  bool select_condition_shape (tree, tree, tree, bool, enum tree_code *, tree *);
  unsigned compare_cost (tree);
  
--- 755,767 ----
  					 unsigned int, sbitmap,
  					 edge, edge *,
  					 unsigned int *, int);
+ void tree_unroll_loop (struct loops *, struct loop *, unsigned, edge,
+ 		       struct tree_niter_desc *);
  struct loop *tree_ssa_loop_version (struct loops *, struct loop *, tree,
  				    basic_block *);
+ edge single_dom_exit (const struct loop *);
  tree expand_simple_operations (tree);
  void substitute_in_loop_info (struct loop *, tree, tree);
  bool select_condition_shape (tree, tree, tree, bool, enum tree_code *, tree *);
  unsigned compare_cost (tree);
  
Index: tree-outof-ssa.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-outof-ssa.c,v
retrieving revision 2.64.2.1
diff -c -3 -p -r2.64.2.1 tree-outof-ssa.c
*** tree-outof-ssa.c	2 Sep 2005 16:43:01 -0000	2.64.2.1
--- tree-outof-ssa.c	14 Sep 2005 14:10:14 -0000
*************** typedef struct value_expr_d 
*** 1299,1305 ****
  typedef struct temp_expr_table_d 
  {
    var_map map;
!   void **version_info;		
    value_expr_p *partition_dep_list;
    bitmap replaceable;
    bool saw_replaceable;
--- 1299,1306 ----
  typedef struct temp_expr_table_d 
  {
    var_map map;
!   void **version_info;
!   bitmap *expr_vars;
    value_expr_p *partition_dep_list;
    bitmap replaceable;
    bool saw_replaceable;
*************** new_temp_expr_table (var_map map)
*** 1344,1349 ****
--- 1345,1351 ----
    t->map = map;
  
    t->version_info = xcalloc (num_ssa_names + 1, sizeof (void *));
+   t->expr_vars = xcalloc (num_ssa_names + 1, sizeof (bitmap));
    t->partition_dep_list = xcalloc (num_var_partitions (map) + 1, 
  				   sizeof (value_expr_p));
  
*************** free_temp_expr_table (temp_expr_table_p 
*** 1367,1372 ****
--- 1369,1375 ----
  {
    value_expr_p p;
    tree *ret = NULL;
+   unsigned i;
  
  #ifdef ENABLE_CHECKING
    unsigned x;
*************** free_temp_expr_table (temp_expr_table_p 
*** 1383,1388 ****
--- 1386,1396 ----
    BITMAP_FREE (t->partition_in_use);
    BITMAP_FREE (t->replaceable);
  
+   for (i = 0; i <= num_ssa_names; i++)
+     if (t->expr_vars[i])
+       BITMAP_FREE (t->expr_vars[i]);
+   free (t->expr_vars);
+ 
    free (t->partition_dep_list);
    if (t->saw_replaceable)
      ret = (tree *)t->version_info;
*************** add_dependance (temp_expr_table_p tab, i
*** 1545,1555 ****
  static bool
  check_replaceable (temp_expr_table_p tab, tree stmt)
  {
!   tree var, def;
    int version;
    var_map map = tab->map;
    ssa_op_iter iter;
    tree call_expr;
  
    if (TREE_CODE (stmt) != MODIFY_EXPR)
      return false;
--- 1553,1564 ----
  static bool
  check_replaceable (temp_expr_table_p tab, tree stmt)
  {
!   tree var, def, basevar;
    int version;
    var_map map = tab->map;
    ssa_op_iter iter;
    tree call_expr;
+   bitmap def_vars = BITMAP_ALLOC (NULL), use_vars;
  
    if (TREE_CODE (stmt) != MODIFY_EXPR)
      return false;
*************** check_replaceable (temp_expr_table_p tab
*** 1580,1591 ****
--- 1589,1607 ----
      }
  
    version = SSA_NAME_VERSION (def);
+   basevar = SSA_NAME_VAR (def);
+   bitmap_set_bit (def_vars, DECL_UID (basevar));
  
    /* Add this expression to the dependency list for each use partition.  */
    FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
      {
        add_dependance (tab, version, var);
+ 
+       use_vars = tab->expr_vars[SSA_NAME_VERSION (var)];
+       if (use_vars)
+ 	bitmap_ior_into (def_vars, use_vars);
      }
+   tab->expr_vars[version] = def_vars;
  
    /* If there are VUSES, add a dependence on virtual defs.  */
    if (!ZERO_SSA_OPERANDS (stmt, SSA_OP_VUSE))
*************** static void
*** 1704,1710 ****
  find_replaceable_in_bb (temp_expr_table_p tab, basic_block bb)
  {
    block_stmt_iterator bsi;
!   tree stmt, def;
    stmt_ann_t ann;
    int partition;
    var_map map = tab->map;
--- 1720,1726 ----
  find_replaceable_in_bb (temp_expr_table_p tab, basic_block bb)
  {
    block_stmt_iterator bsi;
!   tree stmt, def, use;
    stmt_ann_t ann;
    int partition;
    var_map map = tab->map;
*************** find_replaceable_in_bb (temp_expr_table_
*** 1717,1746 ****
        ann = stmt_ann (stmt);
  
        /* Determine if this stmt finishes an existing expression.  */
!       FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE)
  	{
! 	  if (tab->version_info[SSA_NAME_VERSION (def)])
  	    {
  	      bool same_root_var = false;
- 	      tree def2;
  	      ssa_op_iter iter2;
  
  	      /* See if the root variables are the same.  If they are, we
  		 do not want to do the replacement to avoid problems with
  		 code size, see PR tree-optimization/17549.  */
! 	      FOR_EACH_SSA_TREE_OPERAND (def2, stmt, iter2, SSA_OP_DEF)
! 		if (SSA_NAME_VAR (def) == SSA_NAME_VAR (def2))
! 		  {
! 		    same_root_var = true;
! 		    break;
! 		  }
  
  	      /* Mark expression as replaceable unless stmt is volatile
  		 or DEF sets the same root variable as STMT.  */
  	      if (!ann->has_volatile_ops && !same_root_var)
! 		mark_replaceable (tab, def);
  	      else
! 		finish_expr (tab, SSA_NAME_VERSION (def), false);
  	    }
  	}
        
--- 1733,1766 ----
        ann = stmt_ann (stmt);
  
        /* Determine if this stmt finishes an existing expression.  */
!       FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
  	{
! 	  unsigned ver = SSA_NAME_VERSION (use);
! 
! 	  if (tab->version_info[ver])
  	    {
  	      bool same_root_var = false;
  	      ssa_op_iter iter2;
+ 	      bitmap vars = tab->expr_vars[ver];
  
  	      /* See if the root variables are the same.  If they are, we
  		 do not want to do the replacement to avoid problems with
  		 code size, see PR tree-optimization/17549.  */
! 	      FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter2, SSA_OP_DEF)
! 		{
! 		  if (bitmap_bit_p (vars, DECL_UID (SSA_NAME_VAR (def))))
! 		    {
! 		      same_root_var = true;
! 		      break;
! 		    }
! 		}
  
  	      /* Mark expression as replaceable unless stmt is volatile
  		 or DEF sets the same root variable as STMT.  */
  	      if (!ann->has_volatile_ops && !same_root_var)
! 		mark_replaceable (tab, use);
  	      else
! 		finish_expr (tab, ver, false);
  	    }
  	}
        
Index: tree-pass.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-pass.h,v
retrieving revision 2.53.2.1
diff -c -3 -p -r2.53.2.1 tree-pass.h
*** tree-pass.h	2 Sep 2005 16:43:02 -0000	2.53.2.1
--- tree-pass.h	14 Sep 2005 14:10:14 -0000
*************** extern struct tree_opt_pass pass_record_
*** 235,240 ****
--- 237,243 ----
  extern struct tree_opt_pass pass_if_conversion;
  extern struct tree_opt_pass pass_vectorize;
  extern struct tree_opt_pass pass_complete_unroll;
+ extern struct tree_opt_pass pass_loop_prefetch;
  extern struct tree_opt_pass pass_iv_optimize;
  extern struct tree_opt_pass pass_tree_loop_done;
  extern struct tree_opt_pass pass_ch;
Index: tree-ssa-loop-ivopts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v
retrieving revision 2.85.2.6
diff -c -3 -p -r2.85.2.6 tree-ssa-loop-ivopts.c
*** tree-ssa-loop-ivopts.c	14 Sep 2005 09:51:50 -0000	2.85.2.6
--- tree-ssa-loop-ivopts.c	14 Sep 2005 14:10:14 -0000
*************** loop_data (struct loop *loop)
*** 370,376 ****
  /* The single loop exit if it dominates the latch, NULL otherwise.  */
  
  edge
! single_dom_exit (struct loop *loop)
  {
    edge exit = loop->single_exit;
  
--- 370,376 ----
  /* The single loop exit if it dominates the latch, NULL otherwise.  */
  
  edge
! single_dom_exit (const struct loop *loop)
  {
    edge exit = loop->single_exit;
  
Index: tree-ssa-loop-manip.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-manip.c,v
retrieving revision 2.35.2.1
diff -c -3 -p -r2.35.2.1 tree-ssa-loop-manip.c
*** tree-ssa-loop-manip.c	14 Sep 2005 09:44:19 -0000	2.35.2.1
--- tree-ssa-loop-manip.c	14 Sep 2005 14:10:14 -0000
*************** tree_duplicate_loop_to_header_edge (stru
*** 635,637 ****
--- 635,852 ----
  
    return true;
  }
+ 
+ /* Build if (COND) goto THEN_LABEL; else goto ELSE_LABEL;  */
+ 
+ static tree
+ build_if_stmt (tree cond, tree then_label, tree else_label)
+ {
+   return build3 (COND_EXPR, void_type_node,
+ 		 cond,
+ 		 build1 (GOTO_EXPR, void_type_node, then_label),
+ 		 build1 (GOTO_EXPR, void_type_node, else_label));
+ }
+ 
+ /* Unroll LOOP FACTOR times.  LOOPS is the loops tree.  DESC describes
+    the number of iterations of LOOP.  EXIT is the exit of the loop to which
+    DESC corresponds.
+ 
+    If N is the number of iterations of the loop and MAY_BE_ZERO is the
+    condition under which the loop exits in the first iteration even if N != 0,
+    
+    while (1)
+      {
+        x = phi (init, next);
+ 
+        pre;
+        if (st)
+          break;
+        post;
+      }
+ 
+    becomes
+    
+    if (MAY_BE_ZERO || N < FACTOR)
+      goto rest;
+ 
+    do
+      {
+        x = phi (init, next);
+ 
+        pre;
+        post;
+        pre;
+        post;
+        ...
+        pre;
+        post;
+        N -= FACTOR;
+        
+      } while (N >= FACTOR);
+ 
+    rest:
+      init' = phi (init, next);
+ 
+    while (1)
+      {
+        x = phi (init', next);
+ 
+        pre;
+        if (st)
+          break;
+        post;
+      } */
+ 
+ /* Probability in % that the unrolled loop is entered.  Just a guess.  */
+ #define PROB_UNROLLED_LOOP_ENTERED 80
+ 
+ void
+ tree_unroll_loop (struct loops *loops, struct loop *loop, unsigned factor,
+ 		  edge exit, struct tree_niter_desc *desc)
+ {
+   tree cond, type, niter, stmts, ctr_before, ctr_after, dont_exit, exit_if;
+   tree phi_old_loop, phi_new_loop, phi_rest, init, next, new_init, var;
+   struct loop *new_loop;
+   basic_block rest, exit_bb;
+   edge old_entry, new_entry, old_latch, precond_edge, new_exit;
+   edge nonexit, new_nonexit;
+   block_stmt_iterator bsi;
+   use_operand_p op;
+   bool ok;
+   unsigned prob_entry, scale_unrolled, scale_rest, est_niter;
+   sbitmap wont_exit;
+ 
+   est_niter = expected_loop_iterations (loop);
+   niter = force_gimple_operand (unshare_expr (desc->niter), &stmts,
+ 				true, NULL_TREE);
+   if (stmts)
+     bsi_insert_on_edge_immediate_loop (loop_preheader_edge (loop), stmts);
+ 
+   type = TREE_TYPE (niter);
+   cond = fold_build2 (GE_EXPR, boolean_type_node,
+ 		      niter, build_int_cst_type (type, factor));
+   if (!zero_p (desc->may_be_zero))
+     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+ 			invert_truthvalue (unshare_expr (desc->may_be_zero)),
+ 			cond);
+   cond = force_gimple_operand (cond, &stmts, false, NULL_TREE);
+   if (stmts)
+     bsi_insert_on_edge_immediate_loop (loop_preheader_edge (loop), stmts);
+   /* cond now may be a gimple comparison, which would be OK, but also any
+      other gimple rhs (say a && b).  In this case we need to force it
+      into an operand.  */
+   if (!is_gimple_condexpr (cond))
+     {
+       cond = force_gimple_operand (cond, &stmts, true, NULL_TREE);
+       if (stmts)
+ 	bsi_insert_on_edge_immediate_loop (loop_preheader_edge (loop), stmts);
+     }
+ 
+   /* Let us assume that the unrolled loop is quite likely to be entered.  */
+   if (nonzero_p (cond))
+     prob_entry = REG_BR_PROB_BASE;
+   else
+     prob_entry = PROB_UNROLLED_LOOP_ENTERED * REG_BR_PROB_BASE / 100;
+ 
+   /* Getting the profile right is nontrivial.  These values should at least
+      keep it consistent, and somewhat close to correct.  */
+   scale_unrolled = prob_entry;
+   scale_rest = REG_BR_PROB_BASE;
+      
+   new_loop = loop_version (loops, loop, cond, NULL, prob_entry, scale_unrolled,
+ 			   scale_rest, true);
+   gcc_assert (new_loop != NULL);
+   update_ssa (TODO_update_ssa);
+ 
+   /* Unroll the loop and remove the old exits.  */
+   dont_exit = ((exit->flags & EDGE_TRUE_VALUE)
+ 	       ? boolean_false_node
+ 	       : boolean_true_node);
+   if (exit == EDGE_SUCC (exit->src, 0))
+     nonexit = EDGE_SUCC (exit->src, 1);
+   else
+     nonexit = EDGE_SUCC (exit->src, 0);
+   nonexit->probability = REG_BR_PROB_BASE;
+   exit->probability = 0;
+   nonexit->count += exit->count;
+   exit->count = 0;
+   exit_if = last_stmt (exit->src);
+   COND_EXPR_COND (exit_if) = dont_exit;
+   update_stmt (exit_if);
+       
+   wont_exit = sbitmap_alloc (factor);
+   sbitmap_ones (wont_exit);
+   ok = tree_duplicate_loop_to_header_edge
+ 	  (loop, loop_latch_edge (loop), loops, factor - 1,
+ 	   wont_exit, NULL, NULL, NULL, DLTHE_FLAG_UPDATE_FREQ);
+   free (wont_exit);
+   gcc_assert (ok);
+   update_ssa (TODO_update_ssa);
+ 
+   /* Prepare the cfg and update the phi nodes.  */
+   rest = loop_preheader_edge (new_loop)->src;
+   precond_edge = single_pred_edge (rest);
+   loop_split_edge_with (loop_latch_edge (loop), NULL);
+   exit_bb = single_pred (loop->latch);
+ 
+   new_exit = make_edge (exit_bb, rest, EDGE_FALSE_VALUE);
+   new_exit->count = loop_preheader_edge (loop)->count;
+   est_niter = est_niter / factor + 1;
+   new_exit->probability = REG_BR_PROB_BASE / est_niter;
+ 
+   new_nonexit = single_pred_edge (loop->latch);
+   new_nonexit->flags = EDGE_TRUE_VALUE;
+   new_nonexit->probability = REG_BR_PROB_BASE - new_exit->probability;
+ 
+   old_entry = loop_preheader_edge (loop);
+   new_entry = loop_preheader_edge (new_loop);
+   old_latch = loop_latch_edge (loop);
+   for (phi_old_loop = phi_nodes (loop->header),
+        phi_new_loop = phi_nodes (new_loop->header);
+        phi_old_loop;
+        phi_old_loop = PHI_CHAIN (phi_old_loop),
+        phi_new_loop = PHI_CHAIN (phi_new_loop))
+     {
+       init = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_entry);
+       op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_new_loop, new_entry);
+       gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
+       next = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_latch);
+ 
+       /* Prefer using original variable as a base for the new ssa name.
+ 	 This is necessary for virtual ops, and useful in order to avoid
+ 	 losing debug info for real ops.  */
+       if (TREE_CODE (init) == SSA_NAME)
+ 	var = SSA_NAME_VAR (init);
+       else if (TREE_CODE (next) == SSA_NAME)
+ 	var = SSA_NAME_VAR (next);
+       else
+ 	{
+ 	  var = create_tmp_var (TREE_TYPE (init), "unrinittmp");
+ 	  add_referenced_tmp_var (var);
+ 	}
+ 
+       new_init = make_ssa_name (var, NULL_TREE);
+       phi_rest = create_phi_node (new_init, rest);
+       SSA_NAME_DEF_STMT (new_init) = phi_rest;
+ 
+       add_phi_arg (phi_rest, init, precond_edge);
+       add_phi_arg (phi_rest, next, new_exit);
+       SET_USE (op, new_init);
+     }
+ 
+   /* Finally create the new counter for number of iterations and add the new
+      exit instruction.  */
+   bsi = bsi_last (exit_bb);
+   create_iv (niter, build_int_cst_type (type, -factor), NULL_TREE, loop,
+ 	     &bsi, true, &ctr_before, &ctr_after);
+   exit_if = build_if_stmt (build2 (GE_EXPR, boolean_type_node, ctr_after,
+ 				   build_int_cst_type (type, factor)),
+ 			   tree_block_label (loop->latch),
+ 			   tree_block_label (rest));
+   bsi_insert_after (&bsi, exit_if, BSI_NEW_STMT);
+ 
+   verify_flow_info ();
+   verify_dominators (CDI_DOMINATORS);
+   verify_loop_structure (loops);
+   verify_loop_closed_ssa ();
+ }
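
(Worked example for the profile updates in tree_unroll_loop above, not
part of the patch: suppose expected_loop_iterations returns 100 and
FACTOR is 4.  Assuming the guard condition is not known to be always
true, we get

    prob_entry = 80 * REG_BR_PROB_BASE / 100 = 8000
    est_niter  = 100 / 4 + 1                 = 26
    new_exit->probability    = REG_BR_PROB_BASE / 26 = 384
    new_nonexit->probability = 10000 - 384           = 9616

i.e. the unrolled loop is assumed to be entered with probability 0.8
and to iterate about 26 times per entry, which matches the expected 100
original iterations divided by the unroll factor.)
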
Index: tree-ssa-loop-prefetch.c
===================================================================
RCS file: tree-ssa-loop-prefetch.c
diff -N tree-ssa-loop-prefetch.c
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- tree-ssa-loop-prefetch.c	14 Sep 2005 14:10:14 -0000
***************
*** 0 ****
--- 1,1078 ----
+ /* Array prefetching.
+    Copyright (C) 2005 Free Software Foundation, Inc.
+    
+ This file is part of GCC.
+    
+ GCC is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 2, or (at your option) any
+ later version.
+    
+ GCC is distributed in the hope that it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+    
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING.  If not, write to the Free
+ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
+ 02111-1307, USA.  */
+ 
+ #include "config.h"
+ #include "system.h"
+ #include "coretypes.h"
+ #include "tm.h"
+ #include "tree.h"
+ #include "rtl.h"
+ #include "tm_p.h"
+ #include "hard-reg-set.h"
+ #include "basic-block.h"
+ #include "output.h"
+ #include "diagnostic.h"
+ #include "tree-flow.h"
+ #include "tree-dump.h"
+ #include "timevar.h"
+ #include "cfgloop.h"
+ #include "varray.h"
+ #include "expr.h"
+ #include "tree-pass.h"
+ #include "ggc.h"
+ #include "insn-config.h"
+ #include "recog.h"
+ #include "hashtab.h"
+ #include "tree-chrec.h"
+ #include "tree-scalar-evolution.h"
+ #include "toplev.h"
+ #include "params.h"
+ #include "langhooks.h"
+ 
+ /* This pass inserts prefetch instructions to optimize cache usage during
+    accesses to arrays in loops.  It processes loops sequentially and:
+ 
+    1) Gathers all memory references in the single loop.
+    2) For each of the references it decides when it is profitable to prefetch
+       it.  To do this, we evaluate the reuse among the accesses and determine
+       two values: PREFETCH_BEFORE (meaning that it only makes sense to do
+       prefetching in the first PREFETCH_BEFORE iterations of the loop) and
+       PREFETCH_MOD (meaning that it only makes sense to prefetch in the
+       iterations of the loop that are zero modulo PREFETCH_MOD).  For example
+       (assuming cache line size is 64 bytes, char has size 1 byte and there
+       is no hardware sequential prefetch):
+ 
+       char *a;
+       for (i = 0; i < max; i++)
+ 	{
+ 	  a[255] = ...;		(0)
+ 	  a[i] = ...;		(1)
+ 	  a[i + 64] = ...;	(2)
+ 	  a[16*i] = ...;	(3)
+ 	  a[187*i] = ...;	(4)
+ 	  a[187*i + 50] = ...;	(5)
+ 	}
+ 
+        (0) obviously has PREFETCH_BEFORE 1
+        (1) has PREFETCH_BEFORE 64, since (2) accesses the same memory
+            location 64 iterations before it, and PREFETCH_MOD 64 (since
+ 	   it hits the same cache line otherwise).
+        (2) has PREFETCH_MOD 64
+        (3) has PREFETCH_MOD 4
+        (4) has PREFETCH_MOD 1.  We do not set PREFETCH_BEFORE here, since
+            the cache line accessed by (4) is the same with probability only
+ 	   7/32.
+        (5) has PREFETCH_MOD 1 as well.
+ 
+    3) We determine how far ahead we need to prefetch.  The number of
+       iterations needed is time to fetch / time spent in one iteration of
+       the loop.  The problem is that we do not know either of these values,
+       so we just make a heuristic guess based on a magic (possibly)
+       target-specific constant and size of the loop.
+ 
+    4) Determine which of the references we prefetch.  We take into account
+       that there is a maximum number of simultaneous prefetches (provided
+       by the machine description).  We issue as many prefetches as possible
+       while still within this bound (starting with those with lowest
+       prefetch_mod, since they are responsible for most of the cache
+       misses).
+       
+    5) We unroll and peel loops so that we are able to satisfy PREFETCH_MOD
+       and PREFETCH_BEFORE requirements (within some bounds), and to avoid
+       prefetching nonaccessed memory.
+       TODO -- actually implement this.
+       
+    6) We actually emit the prefetch instructions.  ??? Perhaps emit the
+       prefetch instructions with guards in cases where 5) was not sufficient
+       to satisfy the constraints?
+ 
+    Some other TODO:
+       -- write and use more general reuse analysis (that could be also used
+ 	 in other cache aimed loop optimizations)
+       -- make it behave sanely together with the prefetches given by user
+ 	 (now we just ignore them; at the very least we should avoid
+ 	 optimizing loops in which the user put his own prefetches)
+       -- we assume cache line size alignment of arrays; this could be
+ 	 improved.  */
+ 
+ /* Magic constants follow.  These should be replaced by machine specific
+    numbers.  */
+ 
+ /* A number that should roughly correspond to the number of instructions
+    executed before the prefetch is completed.  */
+ 
+ #ifndef PREFETCH_LATENCY
+ #define PREFETCH_LATENCY 200
+ #endif
+ 
+ /* Number of prefetches that can run at the same time.  */
+ 
+ #ifndef SIMULTANEOUS_PREFETCHES
+ #define SIMULTANEOUS_PREFETCHES 3
+ #endif
+ 
+ /* True if write can be prefetched by a read prefetch.  */
+ 
+ #ifndef WRITE_CAN_USE_READ_PREFETCH
+ #define WRITE_CAN_USE_READ_PREFETCH 1
+ #endif
+ 
+ /* True if read can be prefetched by a write prefetch. */
+ 
+ #ifndef READ_CAN_USE_WRITE_PREFETCH
+ #define READ_CAN_USE_WRITE_PREFETCH 0
+ #endif
+ 
+ /* Cache line size.  Assumed to be a power of two.  */
+ 
+ #ifndef PREFETCH_BLOCK
+ #define PREFETCH_BLOCK 32
+ #endif
+ 
+ /* Do we have forward hardware sequential prefetching?  */
+ 
+ #ifndef HAVE_FORWARD_PREFETCH
+ #define HAVE_FORWARD_PREFETCH 0
+ #endif
+ 
+ /* Do we have backward hardware sequential prefetching?  */
+ 
+ #ifndef HAVE_BACKWARD_PREFETCH
+ #define HAVE_BACKWARD_PREFETCH 0
+ #endif
+ 
+ /* In some cases we are only able to determine that there is a certain
+    probability that the two accesses hit the same cache line.  In this
+    case, we issue the prefetches for both of them if this probability
+    is less than (1000 - ACCEPTABLE_MISS_RATE) per mille.  */
+ 
+ #ifndef ACCEPTABLE_MISS_RATE
+ #define ACCEPTABLE_MISS_RATE 50
+ #endif
+ 
+ #ifndef HAVE_prefetch
+ #define HAVE_prefetch 0
+ #endif
+ 
+ /* The group of references between which reuse may occur.  */
+ 
+ struct mem_ref_group
+ {
+   tree base;			/* Base of the reference.  */
+   HOST_WIDE_INT step;		/* Step of the reference.  */
+   struct mem_ref *refs;		/* References in the group.  */
+   struct mem_ref_group *next;	/* Next group of references.  */
+ };
+ 
+ /* Assigned to PREFETCH_BEFORE when all iterations are to be prefetched.  */
+ 
+ #define PREFETCH_ALL		(~(unsigned HOST_WIDE_INT) 0)
+ 
+ /* The memory reference.  */
+ 
+ struct mem_ref
+ {
+   tree stmt;			/* Statement in which the reference appears.  */
+   tree mem;			/* The reference.  */
+   HOST_WIDE_INT delta;		/* Constant offset of the reference.  */
+   bool write_p;			/* Is it a write?  */
+   struct mem_ref_group *group;	/* The group of references it belongs to.  */
+   unsigned HOST_WIDE_INT prefetch_mod;
+ 				/* Prefetch only each PREFETCH_MOD-th
+ 				   iteration.  */
+   unsigned HOST_WIDE_INT prefetch_before;
+ 				/* Prefetch only first PREFETCH_BEFORE
+ 				   iterations.  */
+   bool issue_prefetch_p;	/* Should we really issue the prefetch?  */
+   struct mem_ref *next;		/* The next reference in the group.  */
+ };
+ 
+ /* Dumps information about reference REF to FILE.  */
+ 
+ static void
+ dump_mem_ref (FILE *file, struct mem_ref *ref)
+ {
+   fprintf (file, "Reference %p:\n", (void *) ref);
+ 
+   fprintf (file, "  group %p (base ", (void *) ref->group);
+   print_generic_expr (file, ref->group->base, TDF_SLIM);
+   fprintf (file, ", step ");
+   fprintf (file, HOST_WIDE_INT_PRINT_DEC, ref->group->step);
+   fprintf (file, ")\n");
+ 
+   fprintf (file, "  delta ");
+   fprintf (file, HOST_WIDE_INT_PRINT_DEC, ref->delta);
+   fprintf (file, "\n");
+ 
+   fprintf (file, "  %s\n", ref->write_p ? "write" : "read");
+ 
+   fprintf (file, "\n");
+ }
+ 
+ /* Finds a group with BASE and STEP in GROUPS, or creates one if it does not
+    exist.  */
+ 
+ static struct mem_ref_group *
+ find_or_create_group (struct mem_ref_group **groups, tree base,
+ 		      HOST_WIDE_INT step)
+ {
+   struct mem_ref_group *group;
+ 
+   for (; *groups; groups = &(*groups)->next)
+     {
+       if ((*groups)->step == step
+ 	  && operand_equal_p ((*groups)->base, base, 0))
+ 	return *groups;
+ 
+       /* Keep the list of groups sorted by decreasing step.  */
+       if ((*groups)->step < step)
+ 	break;
+     }
+ 
+   group = xcalloc (1, sizeof (struct mem_ref_group));
+   group->base = base;
+   group->step = step;
+   group->refs = NULL;
+   group->next = *groups;
+   *groups = group;
+ 
+   return group;
+ }
+ 
+ /* Records a memory reference MEM in GROUP with offset DELTA and write status
+    WRITE_P.  The reference occurs in statement STMT.  */
+ 
+ static void
+ record_ref (struct mem_ref_group *group, tree stmt, tree mem,
+ 	    HOST_WIDE_INT delta, bool write_p)
+ {
+   struct mem_ref **aref;
+ 
+   /* Do not record the same address twice.  */
+   for (aref = &group->refs; *aref; aref = &(*aref)->next)
+     {
+       /* It may not be possible for a write reference to reuse a read
+ 	 prefetch, or vice versa.  */
+       if (!WRITE_CAN_USE_READ_PREFETCH
+ 	  && write_p
+ 	  && !(*aref)->write_p)
+ 	continue;
+       if (!READ_CAN_USE_WRITE_PREFETCH
+ 	  && !write_p
+ 	  && (*aref)->write_p)
+ 	continue;
+ 
+       if ((*aref)->delta == delta)
+ 	return;
+     }
+ 
+   (*aref) = xcalloc (1, sizeof (struct mem_ref));
+   (*aref)->stmt = stmt;
+   (*aref)->mem = mem;
+   (*aref)->delta = delta;
+   (*aref)->write_p = write_p;
+   (*aref)->prefetch_before = PREFETCH_ALL;
+   (*aref)->prefetch_mod = 1;
+   (*aref)->issue_prefetch_p = false;
+   (*aref)->group = group;
+   (*aref)->next = NULL;
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+     dump_mem_ref (dump_file, *aref);
+ }
+ 
+ /* Release memory references in GROUPS.  */
+ 
+ static void
+ release_mem_refs (struct mem_ref_group *groups)
+ {
+   struct mem_ref_group *next_g;
+   struct mem_ref *ref, *next_r;
+ 
+   for (; groups; groups = next_g)
+     {
+       next_g = groups->next;
+       for (ref = groups->refs; ref; ref = next_r)
+ 	{
+ 	  next_r = ref->next;
+ 	  free (ref);
+ 	}
+       free (groups);
+     }
+ }
+ 
+ /* A structure used to pass arguments to idx_analyze_ref.  */
+ 
+ struct ar_data
+ {
+   struct loop *loop;			/* Loop of the reference.  */
+   tree stmt;				/* Statement of the reference.  */
+   HOST_WIDE_INT *step;			/* Step of the memory reference.  */
+   HOST_WIDE_INT *delta;			/* Offset of the memory reference.  */
+ };
+ 
+ /* Analyzes a single INDEX of a memory reference to obtain information
+    described at analyze_ref.  Callback for for_each_index.  */
+ 
+ static bool
+ idx_analyze_ref (tree base, tree *index, void *data)
+ {
+   struct ar_data *ar_data = data;
+   tree ibase, step, stepsize;
+   HOST_WIDE_INT istep, idelta = 0, imult = 1;
+ 
+   if (TREE_CODE (base) == MISALIGNED_INDIRECT_REF
+       || TREE_CODE (base) == ALIGN_INDIRECT_REF)
+     return false;
+ 
+   if (!simple_iv (ar_data->loop, ar_data->stmt, *index, &ibase, &step, false))
+     return false;
+ 
+   if (zero_p (step))
+     istep = 0;
+   else
+     {
+       if (!cst_and_fits_in_hwi (step))
+ 	return false;
+       istep = int_cst_value (step);
+     }
+ 
+   if (TREE_CODE (ibase) == PLUS_EXPR
+       && cst_and_fits_in_hwi (TREE_OPERAND (ibase, 1)))
+     {
+       idelta = int_cst_value (TREE_OPERAND (ibase, 1));
+       ibase = TREE_OPERAND (ibase, 0);
+     }
+   if (cst_and_fits_in_hwi (ibase))
+     {
+       idelta += int_cst_value (ibase);
+       ibase = build_int_cst_type (TREE_TYPE (ibase), 0);
+     }
+ 
+   if (TREE_CODE (base) == ARRAY_REF)
+     {
+       stepsize = array_ref_element_size (base);
+       if (!cst_and_fits_in_hwi (stepsize))
+ 	return false;
+       imult = int_cst_value (stepsize);
+ 
+       istep *= imult;
+       idelta *= imult;
+     }
+ 
+   *ar_data->step += istep;
+   *ar_data->delta += idelta;
+   *index = ibase;
+ 
+   return true;
+ }
+ 
+ /* Tries to express REF in shape &BASE + STEP * iter + DELTA, where DELTA and
+    STEP are integer constants and iter is number of iterations of LOOP.  The
+    reference occurs in statement STMT.  */
+ 
+ static bool
+ analyze_ref (struct loop *loop, tree ref, tree *base,
+ 	     HOST_WIDE_INT *step, HOST_WIDE_INT *delta,
+ 	     tree stmt)
+ {
+   struct ar_data ar_data;
+   tree off;
+   HOST_WIDE_INT bit_offset;
+ 
+   *step = 0;
+   *delta = 0;
+ 
+   /* First strip off the component references.  Ignore bitfields.  */
+   if (TREE_CODE (ref) == COMPONENT_REF
+       && DECL_NONADDRESSABLE_P (TREE_OPERAND (ref, 1)))
+     ref = TREE_OPERAND (ref, 0);
+ 
+   for (; TREE_CODE (ref) == COMPONENT_REF; ref = TREE_OPERAND (ref, 0))
+     {
+       off = DECL_FIELD_BIT_OFFSET (TREE_OPERAND (ref, 1));
+       bit_offset = TREE_INT_CST_LOW (off);
+       gcc_assert (bit_offset % BITS_PER_UNIT == 0);
+       
+       *delta += bit_offset / BITS_PER_UNIT;
+     }
+ 
+   *base = unshare_expr (ref);
+   ar_data.loop = loop;
+   ar_data.stmt = stmt;
+   ar_data.step = step;
+   ar_data.delta = delta;
+   return for_each_index (base, idx_analyze_ref, &ar_data);
+ }
+ 
+ /* Record a memory reference REF to the list REFS.  The reference occurs in
+    LOOP in statement STMT, and it is a write if WRITE_P is true.  */
+ 
+ static void
+ gather_memory_references_ref (struct loop *loop, struct mem_ref_group **refs,
+ 			      tree ref, bool write_p, tree stmt)
+ {
+   tree base;
+   HOST_WIDE_INT step, delta;
+   struct mem_ref_group *agrp;
+ 
+   if (!analyze_ref (loop, ref, &base, &step, &delta, stmt))
+     return;
+ 
+   /* Now we know that REF = &BASE + STEP * iter + DELTA, where DELTA and STEP
+      are integer constants.  */
+   agrp = find_or_create_group (refs, base, step);
+   record_ref (agrp, stmt, ref, delta, write_p);
+ }
+ 
+ /* Record the suitable memory references in LOOP.  */
+ 
+ static struct mem_ref_group *
+ gather_memory_references (struct loop *loop)
+ {
+   basic_block *body = get_loop_body_in_dom_order (loop);
+   basic_block bb;
+   unsigned i;
+   block_stmt_iterator bsi;
+   tree stmt, lhs, rhs;
+   struct mem_ref_group *refs = NULL;
+ 
+   /* Scan the loop body in dominator order, so that earlier references
+      precede later ones.  */
+   for (i = 0; i < loop->num_nodes; i++)
+     {
+       bb = body[i];
+       if (bb->loop_father != loop)
+ 	continue;
+ 
+       for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi))
+ 	{
+ 	  stmt = bsi_stmt (bsi);
+ 	  if (TREE_CODE (stmt) != MODIFY_EXPR)
+ 	    continue;
+ 
+ 	  lhs = TREE_OPERAND (stmt, 0);
+ 	  rhs = TREE_OPERAND (stmt, 1);
+ 
+ 	  if (REFERENCE_CLASS_P (rhs))
+ 	    gather_memory_references_ref (loop, &refs, rhs, false, stmt);
+ 	  if (REFERENCE_CLASS_P (lhs))
+ 	    gather_memory_references_ref (loop, &refs, lhs, true, stmt);
+ 	}
+     }
+   free (body);
+ 
+   return refs;
+ }
+ 
+ /* Prune the prefetch candidate REF using the self-reuse.  */
+ 
+ static void
+ prune_ref_by_self_reuse (struct mem_ref *ref)
+ {
+   HOST_WIDE_INT step = ref->group->step;
+   bool backward = step < 0;
+ 
+   if (step == 0)
+     {
+       /* Prefetch references to invariant address just once.  */
+       ref->prefetch_before = 1;
+       return;
+     }
+ 
+   if (backward)
+     step = -step;
+ 
+   if (step > PREFETCH_BLOCK)
+     return;
+ 
+   if ((backward && HAVE_BACKWARD_PREFETCH)
+       || (!backward && HAVE_FORWARD_PREFETCH))
+     {
+       ref->prefetch_before = 1;
+       return;
+     }
+ 
+   ref->prefetch_mod = PREFETCH_BLOCK / step;
+ }
+ 
+ /* Divides X by BY, rounding down.  */
+ 
+ static HOST_WIDE_INT
+ ddown (HOST_WIDE_INT x, unsigned HOST_WIDE_INT by)
+ {
+   gcc_assert (by > 0);
+ 
+   if (x >= 0)
+     return x / by;
+   else
+     return (x + by - 1) / by;
+ }
+ 
+ /* Prune the prefetch candidate REF using the reuse with BY.
+    If BY_IS_BEFORE is true, BY is before REF in the loop.  */
+ 
+ static void
+ prune_ref_by_group_reuse (struct mem_ref *ref, struct mem_ref *by,
+ 			  bool by_is_before)
+ {
+   HOST_WIDE_INT step = ref->group->step;
+   bool backward = step < 0;
+   HOST_WIDE_INT delta_r = ref->delta, delta_b = by->delta;
+   HOST_WIDE_INT delta = delta_b - delta_r;
+   HOST_WIDE_INT hit_from;
+   unsigned HOST_WIDE_INT prefetch_before, prefetch_block;
+ 
+   if (delta == 0)
+     {
+       /* If the references have the same address, only prefetch the
+ 	 former.  */
+       if (by_is_before)
+ 	ref->prefetch_before = 0;
+       
+       return;
+     }
+ 
+   if (!step)
+     {
+       /* If the reference addresses are invariant and fall into the
+ 	 same cache line, prefetch just the first one.  */
+       if (!by_is_before)
+ 	return;
+ 
+       if (ddown (ref->delta, PREFETCH_BLOCK)
+ 	  != ddown (by->delta, PREFETCH_BLOCK))
+ 	return;
+ 
+       ref->prefetch_before = 0;
+       return;
+     }
+ 
+   /* Only prune the reference that is behind in the array.  */
+   if (backward)
+     {
+       if (delta > 0)
+ 	return;
+ 
+       /* Transform the data so that we may assume that the accesses
+ 	 are forward.  */
+       delta = - delta;
+       step = -step;
+       delta_r = PREFETCH_BLOCK - 1 - delta_r;
+       delta_b = PREFETCH_BLOCK - 1 - delta_b;
+     }
+   else
+     {
+       if (delta < 0)
+ 	return;
+     }
+ 
+   /* Check whether the two references are likely to hit the same cache
+      line, and how distant the iterations in that it occurs are from
+      each other.  */
+ 
+   if (step <= PREFETCH_BLOCK)
+     {
+       /* The accesses are sure to meet.  Let us check when.  */
+       hit_from = ddown (delta_b, PREFETCH_BLOCK) * PREFETCH_BLOCK;
+       prefetch_before = (hit_from - delta_r + step - 1) / step;
+ 
+       if (prefetch_before < ref->prefetch_before)
+ 	ref->prefetch_before = prefetch_before;
+ 
+       return;
+     }
+ 
+   /* A more complicated case.  First let us make the cache line size and
+      step coprime (here we use the fact that PREFETCH_BLOCK is a power
+      of two).  */
+   prefetch_block = PREFETCH_BLOCK;
+   while ((step & 1) == 0
+ 	 && prefetch_block > 1)
+     {
+       step >>= 1;
+       prefetch_block >>= 1;
+       delta >>= 1;
+     }
+ 
+   /* Now step > prefetch_block, and step and prefetch_block are coprime.
+      Determine the probability that the accesses hit the same cache line.  */
+ 
+   prefetch_before = delta / step;
+   delta %= step;
+   if ((unsigned HOST_WIDE_INT) delta
+       <= (prefetch_block * ACCEPTABLE_MISS_RATE / 1000))
+     {
+       if (prefetch_before < ref->prefetch_before)
+ 	ref->prefetch_before = prefetch_before;
+ 
+       return;
+     }
+ 
+   /* Try also the following iteration.  */
+   prefetch_before++;
+   delta = step - delta;
+   if ((unsigned HOST_WIDE_INT) delta
+       <= (prefetch_block * ACCEPTABLE_MISS_RATE / 1000))
+     {
+       if (prefetch_before < ref->prefetch_before)
+ 	ref->prefetch_before = prefetch_before;
+ 
+       return;
+     }
+ 
+   /* REF probably does not reuse BY.  */
+   return;
+ }
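+ 
+ /* Illustrative example (editor's sketch, assuming PREFETCH_BLOCK == 64):
+    for REF == a[i] (delta_r == 0) and BY == a[i + 8] (delta_b == 32) with
+    4-byte elements, step == 4 and delta == 32; since step <= PREFETCH_BLOCK
+    the accesses are sure to meet, hit_from == ddown (32, 64) * 64 == 0 and
+    prefetch_before == (0 - 0 + 4 - 1) / 4 == 0, i.e. BY already touches
+    every cache line that REF needs, and REF gets no prefetches of its
+    own.  */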
+ 
+ /* Prune the prefetch candidate REF using the reuses with other references
+    in REFS.  */
+ 
+ static void
+ prune_ref_by_reuse (struct mem_ref *ref, struct mem_ref *refs)
+ {
+   struct mem_ref *prune_by;
+   bool before = true;
+ 
+   prune_ref_by_self_reuse (ref);
+ 
+   for (prune_by = refs; prune_by; prune_by = prune_by->next)
+     {
+       if (prune_by == ref)
+ 	{
+ 	  before = false;
+ 	  continue;
+ 	}
+ 
+       if (!WRITE_CAN_USE_READ_PREFETCH
+ 	  && ref->write_p
+ 	  && !prune_by->write_p)
+ 	continue;
+       if (!READ_CAN_USE_WRITE_PREFETCH
+ 	  && !ref->write_p
+ 	  && prune_by->write_p)
+ 	continue;
+ 
+       prune_ref_by_group_reuse (ref, prune_by, before);
+     }
+ }
+ 
+ /* Prune the prefetch candidates in GROUP using the reuse analysis.  */
+ 
+ static void
+ prune_group_by_reuse (struct mem_ref_group *group)
+ {
+   struct mem_ref *ref_pruned;
+ 
+   for (ref_pruned = group->refs; ref_pruned; ref_pruned = ref_pruned->next)
+     {
+       prune_ref_by_reuse (ref_pruned, group->refs);
+ 
+       if (dump_file && (dump_flags & TDF_DETAILS))
+ 	{
+ 	  fprintf (dump_file, "Reference %p:", (void *) ref_pruned);
+ 
+ 	  if (ref_pruned->prefetch_before == PREFETCH_ALL
+ 	      && ref_pruned->prefetch_mod == 1)
+ 	    fprintf (dump_file, " no restrictions");
+ 	  else if (ref_pruned->prefetch_before == 0)
+ 	    fprintf (dump_file, " do not prefetch");
+ 	  else if (ref_pruned->prefetch_before <= ref_pruned->prefetch_mod)
+ 	    fprintf (dump_file, " prefetch once");
+ 	  else
+ 	    {
+ 	      if (ref_pruned->prefetch_before != PREFETCH_ALL)
+ 		{
+ 		  fprintf (dump_file, " prefetch before ");
+ 		  fprintf (dump_file, HOST_WIDE_INT_PRINT_DEC,
+ 			   ref_pruned->prefetch_before);
+ 		}
+ 	      if (ref_pruned->prefetch_mod != 1)
+ 		{
+ 		  fprintf (dump_file, " prefetch mod ");
+ 		  fprintf (dump_file, HOST_WIDE_INT_PRINT_DEC,
+ 			   ref_pruned->prefetch_mod);
+ 		}
+ 	    }
+ 	  fprintf (dump_file, "\n");
+ 	}
+     }
+ }
+ 
+ /* Prune the list of prefetch candidates GROUPS using the reuse analysis.  */
+ 
+ static void
+ prune_by_reuse (struct mem_ref_group *groups)
+ {
+   for (; groups; groups = groups->next)
+     prune_group_by_reuse (groups);
+ }
+ 
+ /* Returns true if we should issue prefetch for REF.  */
+ 
+ static bool
+ should_issue_prefetch_p (struct mem_ref *ref)
+ {
+   /* For now, do not issue prefetches that would only be useful for the
+      first few iterations.  */
+   if (ref->prefetch_before != PREFETCH_ALL)
+     return false;
+ 
+   return true;
+ }
+ 
+ /* Decide which of the prefetch candidates in GROUPS to prefetch.
+    AHEAD is the number of iterations to prefetch ahead (which corresponds
+    to the number of simultaneous instances of one prefetch running at a
+    time).  UNROLL_FACTOR is the factor by which the loop is going to be
+    unrolled.  Returns true if there is anything to prefetch.  */
+ 
+ static bool
+ schedule_prefetches (struct mem_ref_group *groups, unsigned unroll_factor,
+ 		     unsigned ahead)
+ {
+   unsigned max_prefetches, n_prefetches;
+   struct mem_ref *ref;
+   bool any = false;
+ 
+   max_prefetches = (SIMULTANEOUS_PREFETCHES * unroll_factor) / ahead;
+   if (max_prefetches > (unsigned) SIMULTANEOUS_PREFETCHES)
+     max_prefetches = SIMULTANEOUS_PREFETCHES;
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+     fprintf (dump_file, "Max prefetches to issue: %d.\n", max_prefetches);
+ 
+   if (!max_prefetches)
+     return false;
+ 
+   /* For now we just take memory references one by one and issue
+      prefetches for as many as possible.  The groups are sorted
+      starting with the largest step, since the references with
+      large steps are more likely to cause many cache misses.  */
+ 
+   for (; groups; groups = groups->next)
+     for (ref = groups->refs; ref; ref = ref->next)
+       {
+ 	if (!should_issue_prefetch_p (ref))
+ 	  continue;
+ 
+ 	ref->issue_prefetch_p = true;
+ 
+ 	/* If prefetch_mod is less than unroll_factor, we need to insert
+ 	   several prefetches for the reference.  */
+ 	n_prefetches = ((unroll_factor + ref->prefetch_mod - 1)
+ 			/ ref->prefetch_mod);
+ 	if (max_prefetches <= n_prefetches)
+ 	  return true;
+ 
+ 	max_prefetches -= n_prefetches;
+ 	any = true;
+       }
+ 
+   return any;
+ }
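+ 
+ /* Illustrative example (editor's sketch with hypothetical values): with
+    SIMULTANEOUS_PREFETCHES == 6, unroll_factor == 4 and ahead == 3, the
+    formula above gives (6 * 4) / 3 == 8, which is capped back to
+    SIMULTANEOUS_PREFETCHES == 6; references are then marked for issuing
+    until this budget is used up.  */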
+ 
+ /* Determine whether there is any reference suitable for prefetching
+    in GROUPS.  */
+ 
+ static bool
+ anything_to_prefetch_p (struct mem_ref_group *groups)
+ {
+   struct mem_ref *ref;
+ 
+   for (; groups; groups = groups->next)
+     for (ref = groups->refs; ref; ref = ref->next)
+       if (should_issue_prefetch_p (ref))
+ 	return true;
+ 
+   return false;
+ }
+ 
+ /* Issue prefetches for the reference REF into the loop as decided before.
+    AHEAD is the number of iterations to prefetch ahead.  UNROLL_FACTOR
+    is the factor by which LOOP was unrolled.  */
+ 
+ static void
+ issue_prefetch_ref (struct mem_ref *ref, unsigned unroll_factor, unsigned ahead)
+ {
+   HOST_WIDE_INT delta;
+   tree addr, addr_base, prefetch, params, write_p;
+   block_stmt_iterator bsi;
+   unsigned n_prefetches, ap;
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+     fprintf (dump_file, "Issued prefetch for %p.\n", (void *) ref);
+ 
+   bsi = bsi_for_stmt (ref->stmt);
+ 
+   n_prefetches = ((unroll_factor + ref->prefetch_mod - 1)
+ 		  / ref->prefetch_mod);
+   addr_base = build_fold_addr_expr_with_type (ref->mem, ptr_type_node);
+   addr_base = force_gimple_operand_bsi (&bsi, unshare_expr (addr_base), true, NULL);
+ 
+   for (ap = 0; ap < n_prefetches; ap++)
+     {
+       /* Determine the address to prefetch.  */
+       delta = (ahead + ap * ref->prefetch_mod) * ref->group->step;
+       addr = fold_build2 (PLUS_EXPR, ptr_type_node,
+ 			  addr_base, build_int_cst (ptr_type_node, delta));
+       addr = force_gimple_operand_bsi (&bsi, unshare_expr (addr), true, NULL);
+ 
+       /* Create the prefetch instruction.  */
+       write_p = ref->write_p ? integer_one_node : integer_zero_node;
+       params = tree_cons (NULL_TREE, addr,
+ 			  tree_cons (NULL_TREE, write_p, NULL_TREE));
+ 				 
+       prefetch = build_function_call_expr (built_in_decls[BUILT_IN_PREFETCH],
+ 					   params);
+       bsi_insert_before (&bsi, prefetch, BSI_SAME_STMT);
+     }
+ }
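+ 
+ /* Illustrative example (editor's sketch): for a read of a[i] with step == 4,
+    prefetch_mod == 1, unroll_factor == 4 and ahead == 8, n_prefetches == 4
+    and the calls inserted before the statement are roughly
+ 
+      __builtin_prefetch (&a[i] + 32, 0);
+      __builtin_prefetch (&a[i] + 36, 0);
+      __builtin_prefetch (&a[i] + 40, 0);
+      __builtin_prefetch (&a[i] + 44, 0);
+ 
+    i.e. the data is requested eight original iterations ahead of its use.  */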
+ 
+ /* Issue prefetches for the references in GROUPS into the loop as decided
+    before.  AHEAD is the number of iterations to prefetch ahead.
+    UNROLL_FACTOR is the factor by which LOOP was unrolled.  */
+ 
+ static void
+ issue_prefetches (struct mem_ref_group *groups,
+ 		  unsigned unroll_factor, unsigned ahead)
+ {
+   struct mem_ref *ref;
+ 
+   for (; groups; groups = groups->next)
+     for (ref = groups->refs; ref; ref = ref->next)
+       if (ref->issue_prefetch_p)
+ 	issue_prefetch_ref (ref, unroll_factor, ahead);
+ }
+ 
+ /* Determines whether we can profitably unroll LOOP, and if this is the
+    case, fills in DESC with the description of the number of iterations.  */
+ 
+ static bool
+ can_unroll_loop_p (struct loop *loop, struct tree_niter_desc *desc)
+ {
+   edge exit;
+ 
+   /* Only consider innermost loops for unrolling.  */
+   if (loop->inner)
+     return false;
+ 
+   /* We only consider loops without control flow for unrolling.  This is not
+      a hard restriction -- tree_unroll_loop works with arbitrary loops
+      as well; but the unrolling/prefetching is usually more profitable for
+      loops consisting of a single basic block, and we want to limit the
+      code growth.  */
+   if (loop->num_nodes > 2)
+     return false;
+ 
+   /* We only want to unroll loops for which we are able to determine the
+      number of iterations.  We also want to split the extra iterations of
+      the loop from its end, therefore we require that the loop has precisely
+      one exit.  */
+ 
+   exit = single_dom_exit (loop);
+   if (!exit)
+     return false;
+ 
+   if (!number_of_iterations_exit (loop, exit, desc, false))
+     return false;
+ 
+   /* And of course, we must be able to duplicate the loop.  */
+   if (!can_duplicate_loop_p (loop))
+     return false;
+ 
+   return true;
+ }
+ 
+ /* Determine the factor by which to unroll LOOP, from the information
+    contained in the list of memory references REFS.  The description of
+    the number of iterations of LOOP is stored to DESC.  AHEAD is the number
+    of iterations that we need to prefetch ahead.  NINSNS is the number of
+    insns of the LOOP.  */
+ 
+ static unsigned
+ determine_unroll_factor (struct loop *loop, struct mem_ref_group *refs,
+ 			 unsigned ahead, unsigned ninsns,
+ 			 struct tree_niter_desc *desc)
+ {
+   unsigned upper_bound, size_factor, constraint_factor;
+   unsigned factor, max_mod_constraint, ahead_factor;
+   struct mem_ref_group *agp;
+   struct mem_ref *ref;
+ 
+   if (!can_unroll_loop_p (loop, desc))
+     return 1;
+ 
+   upper_bound = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+ 
+   /* First check whether the loop is not too large to unroll.  */
+   size_factor = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / ninsns;
+   if (size_factor <= 1)
+     return 1;
+ 
+   if (size_factor < upper_bound)
+     upper_bound = size_factor;
+ 
+   max_mod_constraint = 1;
+   for (agp = refs; agp; agp = agp->next)
+     for (ref = agp->refs; ref; ref = ref->next)
+       if (should_issue_prefetch_p (ref)
+ 	  && ref->prefetch_mod > max_mod_constraint)
+ 	max_mod_constraint = ref->prefetch_mod;
+ 
+   /* Set constraint_factor as large as needed to be able to satisfy the
+      largest modulo constraint.  */
+   constraint_factor = max_mod_constraint;
+ 
+   /* If AHEAD is too large in comparison with the number of available
+      prefetches, unroll the loop enough that at least some of the
+      references in the loop can be prefetched, at least partially.  */
+   ahead_factor = ((ahead + SIMULTANEOUS_PREFETCHES - 1)
+ 		  / SIMULTANEOUS_PREFETCHES);
+ 
+   /* Unroll as much as useful, but bound the code size growth.  */
+   if (constraint_factor < ahead_factor)
+     factor = ahead_factor;
+   else
+     factor = constraint_factor;
+   if (factor > upper_bound)
+     factor = upper_bound;
+ 
+   return factor;
+ }
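+ 
+ /* Illustrative example (editor's sketch with hypothetical parameter values):
+    with ninsns == 20, PARAM_MAX_UNROLLED_INSNS == 200 and
+    PARAM_MAX_UNROLL_TIMES == 8, the size bound allows a factor of up to
+    200 / 20 == 10, so upper_bound remains 8.  If the largest prefetch_mod of
+    a scheduled reference is 16 and ahead does not exceed
+    SIMULTANEOUS_PREFETCHES, the desired factor is 16, which is clipped to
+    upper_bound, i.e. the loop is unrolled 8 times.  */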
+ 
+ /* Issue prefetch instructions for array references in LOOP.  Returns
+    true if LOOP was unrolled.  LOOPS is the structure containing all
+    loops.  */
+ 
+ static bool
+ loop_prefetch_arrays (struct loops *loops, struct loop *loop)
+ {
+   struct mem_ref_group *refs;
+   unsigned ahead, ninsns, unroll_factor;
+   struct tree_niter_desc desc;
+   bool unrolled = false;
+ 
+   /* Step 1: gather the memory references.  */
+   refs = gather_memory_references (loop);
+ 
+   /* Step 2: estimate the reuse effects.  */
+   prune_by_reuse (refs);
+ 
+   if (!anything_to_prefetch_p (refs))
+     goto fail;
+ 
+   /* Step 3: determine the ahead and unroll factor.  */
+ 
+   /* FIXME: We should not use the size of the loop, but the average number
+      of instructions executed per iteration of the loop.  */
+   ninsns = tree_num_loop_insns (loop);
+   ahead = (PREFETCH_LATENCY + ninsns - 1) / ninsns;
+   unroll_factor = determine_unroll_factor (loop, refs, ahead, ninsns,
+ 					   &desc);
+   if (dump_file && (dump_flags & TDF_DETAILS))
+     fprintf (dump_file, "Ahead %d, unroll factor %d\n", ahead, unroll_factor);
+ 
+   /* If the loop rolls fewer times than the required unroll factor,
+      prefetching is useless.  */
+   if (unroll_factor > 1
+       && cst_and_fits_in_hwi (desc.niter)
+       && (unsigned HOST_WIDE_INT) int_cst_value (desc.niter) < unroll_factor)
+     goto fail;
+ 
+   /* Step 4: what to prefetch?  */
+   if (!schedule_prefetches (refs, unroll_factor, ahead))
+     goto fail;
+ 
+   /* Step 5: unroll the loop.  TODO -- peel the first and last few
+      iterations so that we do not issue superfluous prefetches.  */
+   if (unroll_factor != 1)
+     {
+       tree_unroll_loop (loops, loop, unroll_factor,
+ 			single_dom_exit (loop), &desc);
+       unrolled = true;
+     }
+ 
+   /* Step 6: issue the prefetches.  */
+   issue_prefetches (refs, unroll_factor, ahead);
+ 
+ fail:
+   release_mem_refs (refs);
+   return unrolled;
+ }
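+ 
+ /* Illustrative example of the overall effect (editor's sketch, assuming
+    PREFETCH_LATENCY == 200 and a loop body of 50 insns, so
+    ahead == (200 + 50 - 1) / 50 == 4): a loop such as
+ 
+      for (i = 0; i < n; i++)
+        a[i] = b[i] + 1;
+ 
+    becomes, ignoring the unrolling decided above, roughly
+ 
+      for (i = 0; i < n; i++)
+        {
+          __builtin_prefetch (&b[i + 4], 0);
+          __builtin_prefetch (&a[i + 4], 1);
+          a[i] = b[i] + 1;
+        }
+ 
+    so that the data for iteration i + ahead is already on its way when that
+    iteration executes.  */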
+ 
+ /* Issue prefetch instructions for array references in LOOPS.  */
+ 
+ void
+ tree_ssa_prefetch_arrays (struct loops *loops)
+ {
+   unsigned i;
+   struct loop *loop;
+   bool unrolled = false;
+ 
+   if (!HAVE_prefetch
+       /* It is possible to ask the compiler for, say, -mtune=i486
+ 	 -march=pentium4.  -mtune=i486 causes PREFETCH_BLOCK to be 0, since
+ 	 it is part of the processor costs and the i486 does not have a
+ 	 prefetch instruction, while -march=pentium4 causes HAVE_prefetch
+ 	 to be true.  Ugh.  */
+       || PREFETCH_BLOCK == 0)
+     return;
+ 
+   initialize_original_copy_tables ();
+ 
+   if (!built_in_decls[BUILT_IN_PREFETCH])
+     {
+       tree type = build_function_type (void_type_node,
+ 				       tree_cons (NULL_TREE,
+ 						  const_ptr_type_node,
+ 						  NULL_TREE));
+       tree decl = lang_hooks.builtin_function ("__builtin_prefetch", type,
+ 			BUILT_IN_PREFETCH, BUILT_IN_NORMAL,
+ 			NULL, NULL_TREE);
+       DECL_IS_NOVOPS (decl) = true;
+       built_in_decls[BUILT_IN_PREFETCH] = decl;
+     }
+ 
+   /* We assume that the cache line size is a power of two, so verify this
+      here.  */
+   gcc_assert ((PREFETCH_BLOCK & (PREFETCH_BLOCK - 1)) == 0);
+ 
+   for (i = loops->num - 1; i > 0; i--)
+     {
+       loop = loops->parray[i];
+ 
+       if (dump_file && (dump_flags & TDF_DETAILS))
+ 	fprintf (dump_file, "Processing loop %d:\n", loop->num);
+ 
+       if (loop)
+ 	unrolled |= loop_prefetch_arrays (loops, loop);
+ 
+       if (dump_file && (dump_flags & TDF_DETAILS))
+ 	fprintf (dump_file, "\n\n");
+     }
+ 
+   if (unrolled)
+     {
+       scev_reset ();
+       cleanup_tree_cfg_loop ();
+     }
+ 
+   free_original_copy_tables ();
+ }
Index: tree-ssa-loop-unswitch.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-unswitch.c,v
retrieving revision 2.13.2.1
diff -c -3 -p -r2.13.2.1 tree-ssa-loop-unswitch.c
*** tree-ssa-loop-unswitch.c	2 Sep 2005 16:43:15 -0000	2.13.2.1
--- tree-ssa-loop-unswitch.c	14 Sep 2005 14:10:14 -0000
*************** Software Foundation, 51 Franklin Street,
*** 74,80 ****
     shape.  */
  
  static struct loop *tree_unswitch_loop (struct loops *, struct loop *, basic_block,
! 				   tree);
  static bool tree_unswitch_single_loop (struct loops *, struct loop *, int);
  static tree tree_may_unswitch_on (basic_block, struct loop *);
  
--- 74,80 ----
     shape.  */
  
  static struct loop *tree_unswitch_loop (struct loops *, struct loop *, basic_block,
! 					tree);
  static bool tree_unswitch_single_loop (struct loops *, struct loop *, int);
  static tree tree_may_unswitch_on (basic_block, struct loop *);
  
*************** static struct loop *
*** 274,286 ****
  tree_unswitch_loop (struct loops *loops, struct loop *loop,
  		    basic_block unswitch_on, tree cond)
  {
!   basic_block condition_bb;
  
    /* Some sanity checking.  */
    gcc_assert (flow_bb_inside_loop_p (loop, unswitch_on));
    gcc_assert (EDGE_COUNT (unswitch_on->succs) == 2);
    gcc_assert (loop->inner == NULL);
  
    return loop_version (loops, loop, unshare_expr (cond), 
! 		       &condition_bb, false);
  }
--- 274,290 ----
  tree_unswitch_loop (struct loops *loops, struct loop *loop,
  		    basic_block unswitch_on, tree cond)
  {
!   unsigned prob_true;
!   edge edge_true, edge_false;
  
    /* Some sanity checking.  */
    gcc_assert (flow_bb_inside_loop_p (loop, unswitch_on));
    gcc_assert (EDGE_COUNT (unswitch_on->succs) == 2);
    gcc_assert (loop->inner == NULL);
  
+   extract_true_false_edges_from_block (unswitch_on, &edge_true, &edge_false);
+   prob_true = edge_true->probability;
    return loop_version (loops, loop, unshare_expr (cond), 
! 		       NULL, prob_true, prob_true,
! 		       REG_BR_PROB_BASE - prob_true, false);
  }
Index: tree-ssa-loop.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop.c,v
retrieving revision 2.33.2.1
diff -c -3 -p -r2.33.2.1 tree-ssa-loop.c
*** tree-ssa-loop.c	2 Sep 2005 16:43:15 -0000	2.33.2.1
--- tree-ssa-loop.c	14 Sep 2005 14:10:14 -0000
*************** struct tree_opt_pass pass_complete_unrol
*** 406,411 ****
--- 406,445 ----
    0					/* letter */
  };
  
+ /* Prefetching.  */
+ 
+ static void
+ tree_ssa_loop_prefetch (void)
+ {
+   if (!current_loops)
+     return;
+ 
+   tree_ssa_prefetch_arrays (current_loops);
+ }
+ 
+ static bool
+ gate_tree_ssa_loop_prefetch (void)
+ {
+   return flag_prefetch_loop_arrays == 1;
+ }
+ 
+ struct tree_opt_pass pass_loop_prefetch =
+ {
+   "prefetch",				/* name */
+   gate_tree_ssa_loop_prefetch,		/* gate */
+   tree_ssa_loop_prefetch,	       	/* execute */
+   NULL,					/* sub */
+   NULL,					/* next */
+   0,					/* static_pass_number */
+   TV_TREE_PREFETCH,	  		/* tv_id */
+   PROP_cfg | PROP_ssa,			/* properties_required */
+   0,					/* properties_provided */
+   0,					/* properties_destroyed */
+   0,					/* todo_flags_start */
+   TODO_dump_func | TODO_verify_loops,	/* todo_flags_finish */
+   0					/* letter */
+ };
+ 
  /* Induction variable optimizations.  */
  
  static void
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.655.2.2
diff -c -3 -p -r1.655.2.2 invoke.texi
*** doc/invoke.texi	2 Sep 2005 16:45:08 -0000	1.655.2.2
--- doc/invoke.texi	14 Sep 2005 14:10:18 -0000
*************** Objective-C and Objective-C++ Dialects}.
*** 316,322 ****
  -funsafe-math-optimizations  -funsafe-loop-optimizations  -ffinite-math-only @gol
  -fno-trapping-math  -fno-zero-initialized-in-bss @gol
  -fomit-frame-pointer  -foptimize-register-move @gol
! -foptimize-sibling-calls  -fprefetch-loop-arrays @gol
  -fprofile-generate -fprofile-use @gol
  -fregmove  -frename-registers @gol
  -freorder-blocks  -freorder-blocks-and-partition -freorder-functions @gol
--- 316,322 ----
  -funsafe-math-optimizations  -funsafe-loop-optimizations  -ffinite-math-only @gol
  -fno-trapping-math  -fno-zero-initialized-in-bss @gol
  -fomit-frame-pointer  -foptimize-register-move @gol
! -foptimize-sibling-calls  -fprefetch-loop-arrays -fprefetch-loop-arrays-rtl @gol
  -fprofile-generate -fprofile-use @gol
  -fregmove  -frename-registers @gol
  -freorder-blocks  -freorder-blocks-and-partition -freorder-functions @gol
*************** With this option, the compiler will crea
*** 5080,5086 ****
--- 5080,5088 ----
  local variables when unrolling a loop which can result in superior code.
  
  @item -fprefetch-loop-arrays
+ @itemx -fprefetch-loop-arrays-rtl
  @opindex fprefetch-loop-arrays
+ @opindex fprefetch-loop-arrays-rtl
  If supported by the target machine, generate instructions to prefetch
  memory to improve the performance of loops that access large arrays.
  
*************** Move branches with loop invariant condit
*** 5600,5606 ****
--- 5602,5610 ----
  of the loop on both branches (modified according to result of the condition).
  
  @item -fprefetch-loop-arrays
+ @itemx -fprefetch-loop-arrays-rtl
  @opindex fprefetch-loop-arrays
+ @opindex fprefetch-loop-arrays-rtl
  If supported by the target machine, generate instructions to prefetch
  memory to improve the performance of loops that access large arrays.
  

