[patch] Tree level array prefetching


Hello,

this patch implements prefetching on the tree level.  It is an updated and
upgraded version of the prefetching pass I developed on the lno branch
about a year ago.  I am not sure whether this type of patch is suitable
at the current stage (most likely not), but comments are welcome anyway.

A description of how the pass works can be found at the beginning of
tree-ssa-loop-prefetch.c.  Basically, we find the memory references in a
loop, check for reuses to determine which references do not need to be
prefetched at all and which do not need to be prefetched in every
iteration, then unroll the loop as necessary and insert the prefetch
instructions (calls to builtin_prefetch).
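
To give a rough idea, here is a minimal hand-written sketch of the kind
of code the pass aims to produce (not actual output of the pass; the
unroll factor and prefetch distance below are made-up values chosen for
readability):

  /* Original loop.  */
  void
  add_arrays (double *a, double *b, int n)
  {
    int i;

    for (i = 0; i < n; i++)
      a[i] += b[i];
  }

  /* Roughly what the pass produces: the loop is unrolled so that a single
     prefetch can cover several iterations, and __builtin_prefetch calls
     are issued several iterations ahead of the actual accesses.
     Prefetching past the end of the arrays is harmless; the builtin
     never faults.  */
  void
  add_arrays_prefetched (double *a, double *b, int n)
  {
    int i;

    for (i = 0; i + 4 <= n; i += 4)
      {
        __builtin_prefetch (&a[i + 32], 1);  /* prefetch for write */
        __builtin_prefetch (&b[i + 32], 0);  /* prefetch for read */
        a[i] += b[i];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
      }

    /* The remaining iterations.  */
    for (; i < n; i++)
      a[i] += b[i];
  }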

The patch does not remove the rtl prefetching pass, in order to keep the
patch shorter.  It also includes quite a few changes that are necessary or
useful to make other optimizers handle loops after unrolling and with
prefetch instructions (updating of frequencies after loop versioning and
unrolling, making the order of blocks after unrolling more sensible, a
change to tree-outof-ssa to prevent TER from increasing register pressure
too much in unrolled loops, nicer names for temporary variables created
by store motion, etc.).  I will submit those separately, as they are
interesting regardless of this patch.

The patch was bootstrapped & regtested on i686 and x86_64 with the pass
enabled.  Below are the results (compared with the old rtl prefetching
pass) of spec2000 on athlon; it seems to be a clear win on specint
(with the only noticeable regression on crafty), and it performs reasonably
on specfp (although there are significant regressions on a few tests;
I tried to investigate some of these, and they are caused by problems
that I was not able to fix, like the fact that the register allocator
sometimes does not manage to assign registers in an unrolled loop
as well as it does in the non-unrolled one).

Zdenek

                     ------- base -------          ------- peak -------
   Benchmark         Ref   Time     Ratio          Ref   Time     Ratio
   164.gzip          1400   199       704    *     1400   188       745    *
   175.vpr           1400   406       345    *     1400   378       370    *
   176.gcc                                   X                             X
   181.mcf           1800   729       247    *     1800   732       246    *
   186.crafty        1000   120       831    *     1000   122       822    *
   197.parser        1800   404       446    *     1800   395       455    *
   252.eon           1300   132       986    *     1300   133       981    *
   253.perlbmk       1800   209       861    *     1800   201       897    *
   254.gap           1100   165       667    *     1100   165       666    *
   255.vortex        1900   234       812    *     1900   233       816    *
   256.bzip2         1500   330       454    *     1500   325       461    *
   300.twolf         3000   817       367    *     3000   778       386    *
   Est. SPECint_base2000              560    
   Est. SPECint2000                                                 572    

   168.wupwise       1600   258       619    *     1600   272       588    *
   171.swim          3100   686       452    *     3100   689       450    *
   172.mgrid         1800   531       339    *     1800   487       370    *
   173.applu         2100   526       400    *     2100   500       420    *
   177.mesa          1400   197       709    *     1400   200       699    *
   178.galgel                                X                             X
   179.art           2600  1459       178    *     2600  1458       178    *
   183.equake        1300   287       453    *     1300   289       450    *
   187.facerec       1900   467       407    *     1900   474       400    *
   188.ammp          2200   626       351    *     2200   614       358    *
   189.lucas         2000   428       468    *     2000   419       478    *
   191.fma3d         2100   385       545    *     2100   392       536    *
   200.sixtrack      1100   229       480    *     1100   219       503    *
   301.apsi          2600   655       397    *     2600   636       409    *
   Est. SPECfp_base2000               426    
   Est. SPECfp2000                                                  430    

	* Makefile.in (tree-ssa-loop-prefetch.o): Add.
	(tree-cfgcleanup.o): Add SCEV_H dependency.
	* bb-reorder.c (copy_bb, duplicate_computed_gotos): Add argument to
	duplicate_block call.
	* cfghooks.c (duplicate_block): Allow specifying the place where the
	duplicate should be created.
	* cfghooks.h (duplicate_block): Declaration changed.
	* cfglayout.c (copy_bbs): Allow specifying the place where the
	duplicates should be created.
	* cfglayout.h (copy_bbs): Declaration changed.
	* cfgloop.h (loopify, loop_version): Declaration changed.
	* cfgloopmanip.c (loopify, lv_adjust_loop_entry_edge,
	loop_version): Allow specifying how the frequencies should be
	scaled.
	(duplicate_loop_to_header_edge): Specify place where new blocks are
	created.
	* common.opt (fprefetch-loop-arrays-rtl): New.
	* fold-const.c (ptr_difference_const): Use cst_and_fits_in_hwi instead
	of host_integerp.
	* loop-unswitch.c (unswitch_loop): Specify how the frequencies should
	be scaled.
	* modulo-sched.c (sms_schedule): Ditto.
	* passes.c (rest_of_handle_loop_optimize): Handle
	-fprefetch-loop-arrays-rtl.
	* timevar.def (TV_TREE_PREFETCH): New.
	* tracer.c (tail_duplicate): Specify where the new blocks should be
	placed.
	* tree-cfg.c (split_edge_bb_loc): New function.
	(tree_split_edge, tree_duplicate_sese_region): Use it.
	* tree-cfgcleanup.c: Include tree-scalar-evolution.h.
	(cleanup_tree_cfg_loop): Flush scev cache if anything changes.
	* tree-flow.h (tree_ssa_prefetch_arrays, tree_unroll_loop,
	single_dom_exit): Declare.
	* tree-optimize.c (init_tree_optimization_passes): Add
	pass_loop_prefetch and pass_ccp after store motion.
	(execute_todo): Do not call ggc_collect within loop optimizer passes.
	* tree-outof-ssa.c (struct temp_expr_table_d): New field expr_vars.
	(new_temp_expr_table, free_temp_expr_table, check_replaceable,
	find_replaceable_in_bb): Stop TER when a different ssa name for the
	same variable is encountered.
	* tree-pass.h (pass_loop_prefetch): Declare.
	* tree-pretty-print.c (dump_bb_header): Show frequencies of blocks.
	* tree-ssa-loop-im.c (lsm_tmp_name_add, gen_lsm_tmp_name,
	get_lsm_tmp_name): New.
	(schedule_sm): Use get_lsm_tmp_name.
	* tree-ssa-loop-ivopts.c (single_dom_exit): Export.
	(get_address_cost): Check all possible offsets for availability.
	(determine_use_iv_cost_condition): Try both iv elimination and
	expressing the original iv.
	(BEFORE_LOOP_COST): New macro.
	(determine_use_iv_cost_outer, determine_iv_cost): Use it.
	* tree-ssa-loop-manip.c (build_if_stmt, tree_unroll_loop): New
	functions.
	* tree-ssa-loop-niter.c (nonzero_p): Export.
	(expand_simple_operations): Consider addresses simple.
	* tree-ssa-loop-prefetch.c: New file.
	* tree-ssa-loop-unswitch.c (tree_unswitch_loop): Specify how the
	frequencies should be scaled.
	* tree-ssa-loop.c (tree_ssa_loop_prefetch, gate_tree_ssa_loop_prefetch,
	pass_loop_prefetch): New.
	* tree-ssa-threadupdate.c (create_block_for_threading): Add argument to
	duplicate_block call.
	* tree-vectorizer.c (slpeel_tree_duplicate_loop_to_edge_cfg): Specify
	where the duplicated blocks should be placed.
	* tree.h (nonzero_p): Declare.
	* doc/invoke.texi (-fprefetch-loop-arrays-rtl): Document.

Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1502
diff -c -3 -p -r1.1502 Makefile.in
*** Makefile.in	9 Jun 2005 13:05:29 -0000	1.1502
--- Makefile.in	12 Jun 2005 20:44:02 -0000
*************** OBJS-common = \
*** 936,942 ****
   tree-ssa-loop-manip.o tree-ssa-threadupdate.o				   \
   tree-vectorizer.o tree-vect-analyze.o tree-vect-transform.o		   \
   tree-ssa-loop-ivcanon.o tree-ssa-propagate.o tree-ssa-address.o	   \
!  tree-ssa-math-opts.o							   \
   tree-ssa-loop-ivopts.o tree-if-conv.o tree-ssa-loop-unswitch.o		   \
   alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o	  	   \
   cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o		   \
--- 936,942 ----
   tree-ssa-loop-manip.o tree-ssa-threadupdate.o				   \
   tree-vectorizer.o tree-vect-analyze.o tree-vect-transform.o		   \
   tree-ssa-loop-ivcanon.o tree-ssa-propagate.o tree-ssa-address.o	   \
!  tree-ssa-math-opts.o tree-ssa-loop-prefetch.o				   \
   tree-ssa-loop-ivopts.o tree-if-conv.o tree-ssa-loop-unswitch.o		   \
   alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o	  	   \
   cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o		   \
*************** tree-cfg.o : tree-cfg.c $(TREE_FLOW_H) $
*** 1761,1769 ****
  tree-cfgcleanup.o : tree-cfg.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(FLAGS_H) output.h \
     $(DIAGNOSTIC_H) errors.h function.h $(TIMEVAR_H) $(TM_H) coretypes.h \
!    $(TREE_DUMP_H) except.h langhooks.h $(CFGLOOP_H) tree-pass.h \
!    $(CFGLAYOUT_H) $(BASIC_BLOCK_H) hard-reg-set.h $(HASHTAB_H) toplev.h \
!    tree-ssa-propagate.h
  tree-tailcall.o : tree-tailcall.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(RTL_H) $(TREE_H) $(TM_P_H) function.h $(TM_H) coretypes.h \
     $(TREE_DUMP_H) $(DIAGNOSTIC_H) except.h tree-pass.h $(FLAGS_H) langhooks.h \
--- 1761,1767 ----
  tree-cfgcleanup.o : tree-cfg.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(FLAGS_H) output.h \
     $(DIAGNOSTIC_H) errors.h function.h $(TIMEVAR_H) $(TM_H) coretypes.h \
!    tree-ssa-propagate.h $(SCEV_H)
  tree-tailcall.o : tree-tailcall.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(RTL_H) $(TREE_H) $(TM_P_H) function.h $(TM_H) coretypes.h \
     $(TREE_DUMP_H) $(DIAGNOSTIC_H) except.h tree-pass.h $(FLAGS_H) langhooks.h \
*************** tree-ssa-loop-ivopts.o : tree-ssa-loop-i
*** 1832,1837 ****
--- 1830,1841 ----
     tree-pass.h $(GGC_H) $(RECOG_H) insn-config.h $(HASHTAB_H) $(SCEV_H) \
     $(CFGLOOP_H) $(PARAMS_H) langhooks.h $(BASIC_BLOCK_H) hard-reg-set.h \
     tree-chrec.h $(VARRAY_H)
+ tree-ssa-loop-prefetch.o: tree-ssa-loop-prefetch.c $(TREE_FLOW_H) $(CONFIG_H) \
+    $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) $(EXPR_H) \
+    output.h $(DIAGNOSTIC_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
+    tree-pass.h $(GGC_H) $(RECOG_H) insn-config.h $(HASHTAB_H) $(SCEV_H) \
+    $(CFGLOOP_H) $(PARAMS_H) langhooks.h $(BASIC_BLOCK_H) hard-reg-set.h \
+    tree-chrec.h toplev.h langhooks.h
  tree-ssa-loop-manip.o : tree-ssa-loop-manip.c $(TREE_FLOW_H) $(CONFIG_H) \
     $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) \
     output.h $(DIAGNOSTIC_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
Index: bb-reorder.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/bb-reorder.c,v
retrieving revision 1.104
diff -c -3 -p -r1.104 bb-reorder.c
*** bb-reorder.c	1 Jun 2005 02:50:51 -0000	1.104
--- bb-reorder.c	12 Jun 2005 20:44:02 -0000
*************** copy_bb (basic_block old_bb, edge e, bas
*** 752,758 ****
  {
    basic_block new_bb;
  
!   new_bb = duplicate_block (old_bb, e);
    BB_COPY_PARTITION (new_bb, old_bb);
  
    gcc_assert (e->dest == new_bb);
--- 752,758 ----
  {
    basic_block new_bb;
  
!   new_bb = duplicate_block (old_bb, e, bb);
    BB_COPY_PARTITION (new_bb, old_bb);
  
    gcc_assert (e->dest == new_bb);
*************** duplicate_computed_gotos (void)
*** 2065,2071 ****
        if (!bitmap_bit_p (candidates, single_succ (bb)->index))
  	continue;
  
!       new_bb = duplicate_block (single_succ (bb), single_succ_edge (bb));
        new_bb->rbi->next = bb->rbi->next;
        bb->rbi->next = new_bb;
        new_bb->rbi->visited = 1;
--- 2065,2071 ----
        if (!bitmap_bit_p (candidates, single_succ (bb)->index))
  	continue;
  
!       new_bb = duplicate_block (single_succ (bb), single_succ_edge (bb), bb);
        new_bb->rbi->next = bb->rbi->next;
        bb->rbi->next = new_bb;
        new_bb->rbi->visited = 1;
Index: cfghooks.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfghooks.c,v
retrieving revision 1.27
diff -c -3 -p -r1.27 cfghooks.c
*** cfghooks.c	26 May 2005 18:14:38 -0000	1.27
--- cfghooks.c	12 Jun 2005 20:44:02 -0000
*************** can_duplicate_block_p (basic_block bb)
*** 692,701 ****
  }
  
  /* Duplicates basic block BB and redirects edge E to it.  Returns the
!    new basic block.  */
  
  basic_block
! duplicate_block (basic_block bb, edge e)
  {
    edge s, n;
    basic_block new_bb;
--- 692,702 ----
  }
  
  /* Duplicates basic block BB and redirects edge E to it.  Returns the
!    new basic block.  The new basic block is placed after the basic block
!    AFTER.  */
  
  basic_block
! duplicate_block (basic_block bb, edge e, basic_block after)
  {
    edge s, n;
    basic_block new_bb;
*************** duplicate_block (basic_block bb, edge e)
*** 714,719 ****
--- 715,722 ----
  #endif
  
    new_bb = cfg_hooks->duplicate_block (bb);
+   if (after)
+     move_block_after (new_bb, after);
  
    new_bb->loop_depth = bb->loop_depth;
    new_bb->flags = bb->flags;
Index: cfghooks.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfghooks.h,v
retrieving revision 1.16
diff -c -3 -p -r1.16 cfghooks.h
*** cfghooks.h	26 May 2005 18:14:38 -0000	1.16
--- cfghooks.h	12 Jun 2005 20:44:02 -0000
*************** extern void tidy_fallthru_edges (void);
*** 157,163 ****
  extern void predict_edge (edge e, enum br_predictor predictor, int probability);
  extern bool predicted_by_p (basic_block bb, enum br_predictor predictor);
  extern bool can_duplicate_block_p (basic_block);
! extern basic_block duplicate_block (basic_block, edge);
  extern bool block_ends_with_call_p (basic_block bb);
  extern bool block_ends_with_condjump_p (basic_block bb);
  extern int flow_call_edges_add (sbitmap);
--- 157,163 ----
  extern void predict_edge (edge e, enum br_predictor predictor, int probability);
  extern bool predicted_by_p (basic_block bb, enum br_predictor predictor);
  extern bool can_duplicate_block_p (basic_block);
! extern basic_block duplicate_block (basic_block, edge, basic_block);
  extern bool block_ends_with_call_p (basic_block bb);
  extern bool block_ends_with_condjump_p (basic_block bb);
  extern int flow_call_edges_add (sbitmap);
Index: cfglayout.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfglayout.c,v
retrieving revision 1.87
diff -c -3 -p -r1.87 cfglayout.c
*** cfglayout.c	3 May 2005 16:35:17 -0000	1.87
--- cfglayout.c	12 Jun 2005 20:44:02 -0000
*************** end:
*** 1218,1229 ****
     is copied, we do not set the new blocks as header or latch.
  
     Created copies of N_EDGES edges in array EDGES are stored in array NEW_EDGES,
!    also in the same order.  */
  
  void
  copy_bbs (basic_block *bbs, unsigned n, basic_block *new_bbs,
  	  edge *edges, unsigned num_edges, edge *new_edges,
! 	  struct loop *base)
  {
    unsigned i, j;
    basic_block bb, new_bb, dom_bb;
--- 1218,1232 ----
     is copied, we do not set the new blocks as header or latch.
  
     Created copies of N_EDGES edges in array EDGES are stored in array NEW_EDGES,
!    also in the same order.
!    
!    Newly created basic blocks are put after the basic block AFTER in the
!    instruction stream, and the order of the blocks in BBS array is preserved.  */
  
  void
  copy_bbs (basic_block *bbs, unsigned n, basic_block *new_bbs,
  	  edge *edges, unsigned num_edges, edge *new_edges,
! 	  struct loop *base, basic_block after)
  {
    unsigned i, j;
    basic_block bb, new_bb, dom_bb;
*************** copy_bbs (basic_block *bbs, unsigned n, 
*** 1234,1240 ****
      {
        /* Duplicate.  */
        bb = bbs[i];
!       new_bb = new_bbs[i] = duplicate_block (bb, NULL);
        bb->rbi->duplicated = 1;
        /* Add to loop.  */
        add_bb_to_loop (new_bb, bb->loop_father->copy);
--- 1237,1244 ----
      {
        /* Duplicate.  */
        bb = bbs[i];
!       new_bb = new_bbs[i] = duplicate_block (bb, NULL, after);
!       after = new_bb;
        bb->rbi->duplicated = 1;
        /* Add to loop.  */
        add_bb_to_loop (new_bb, bb->loop_father->copy);
Index: cfglayout.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfglayout.h,v
retrieving revision 1.17
diff -c -3 -p -r1.17 cfglayout.h
*** cfglayout.h	31 Mar 2005 14:59:48 -0000	1.17
--- cfglayout.h	12 Jun 2005 20:44:02 -0000
*************** extern void insn_locators_initialize (vo
*** 31,37 ****
  extern void reemit_insn_block_notes (void);
  extern bool can_copy_bbs_p (basic_block *, unsigned);
  extern void copy_bbs (basic_block *, unsigned, basic_block *,
! 		      edge *, unsigned, edge *, struct loop *);
  extern rtx duplicate_insn_chain (rtx, rtx);
  
  #endif /* GCC_CFGLAYOUT_H */
--- 31,38 ----
  extern void reemit_insn_block_notes (void);
  extern bool can_copy_bbs_p (basic_block *, unsigned);
  extern void copy_bbs (basic_block *, unsigned, basic_block *,
! 		      edge *, unsigned, edge *, struct loop *,
! 		      basic_block);
  extern rtx duplicate_insn_chain (rtx, rtx);
  
  #endif /* GCC_CFGLAYOUT_H */
Index: cfgloop.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloop.h,v
retrieving revision 1.46
diff -c -3 -p -r1.46 cfgloop.h
*** cfgloop.h	2 Jun 2005 10:19:12 -0000	1.46
--- cfgloop.h	12 Jun 2005 20:44:02 -0000
*************** extern bool duplicate_loop_to_header_edg
*** 303,311 ****
  					   unsigned, sbitmap, edge, edge *,
  					   unsigned *, int);
  extern struct loop *loopify (struct loops *, edge, edge,
! 			     basic_block, edge, edge, bool);
! struct loop * loop_version (struct loops *, struct loop *, void *,
! 			    basic_block *);			     
  extern bool remove_path (struct loops *, edge);
  extern edge split_loop_bb (basic_block, void *);
  
--- 303,312 ----
  					   unsigned, sbitmap, edge, edge *,
  					   unsigned *, int);
  extern struct loop *loopify (struct loops *, edge, edge,
! 			     basic_block, edge, edge, bool,
! 			     unsigned, unsigned);
! struct loop *loop_version (struct loops *, struct loop *, void *,
! 			   basic_block *, unsigned, unsigned, unsigned, bool);
  extern bool remove_path (struct loops *, edge);
  extern edge split_loop_bb (basic_block, void *);
  
Index: cfgloopmanip.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloopmanip.c,v
retrieving revision 1.47
diff -c -3 -p -r1.47 cfgloopmanip.c
*** cfgloopmanip.c	9 Apr 2005 16:09:10 -0000	1.47
--- cfgloopmanip.c	12 Jun 2005 20:44:02 -0000
*************** scale_loop_frequencies (struct loop *loo
*** 468,479 ****
     LATCH_EDGE source.  HEADER_EDGE is redirected to basic block SWITCH_BB,
     FALSE_EDGE of SWITCH_BB to original destination of HEADER_EDGE and
     TRUE_EDGE of SWITCH_BB to original destination of LATCH_EDGE.
!    Returns newly created loop.  */
  
  struct loop *
  loopify (struct loops *loops, edge latch_edge, edge header_edge, 
  	 basic_block switch_bb, edge true_edge, edge false_edge,
! 	 bool redirect_all_edges)
  {
    basic_block succ_bb = latch_edge->dest;
    basic_block pred_bb = header_edge->src;
--- 468,480 ----
     LATCH_EDGE source.  HEADER_EDGE is redirected to basic block SWITCH_BB,
     FALSE_EDGE of SWITCH_BB to original destination of HEADER_EDGE and
     TRUE_EDGE of SWITCH_BB to original destination of LATCH_EDGE.
!    Returns newly created loop.  Frequencies and counts in the new loop
!    are scaled by NEW_SCALE and in the old one by OLD_SCALE.  */
  
  struct loop *
  loopify (struct loops *loops, edge latch_edge, edge header_edge, 
  	 basic_block switch_bb, edge true_edge, edge false_edge,
! 	 bool redirect_all_edges, unsigned old_scale, unsigned new_scale)
  {
    basic_block succ_bb = latch_edge->dest;
    basic_block pred_bb = header_edge->src;
*************** loopify (struct loops *loops, edge latch
*** 482,488 ****
    sbitmap seen;
    struct loop *loop = xcalloc (1, sizeof (struct loop));
    struct loop *outer = succ_bb->loop_father->outer;
!   int freq, prob, tot_prob;
    gcov_type cnt;
    edge e;
    edge_iterator ei;
--- 483,489 ----
    sbitmap seen;
    struct loop *loop = xcalloc (1, sizeof (struct loop));
    struct loop *outer = succ_bb->loop_father->outer;
!   int freq;
    gcov_type cnt;
    edge e;
    edge_iterator ei;
*************** loopify (struct loops *loops, edge latch
*** 492,501 ****
  
    freq = EDGE_FREQUENCY (header_edge);
    cnt = header_edge->count;
-   prob = EDGE_SUCC (switch_bb, 0)->probability;
-   tot_prob = prob + EDGE_SUCC (switch_bb, 1)->probability;
-   if (tot_prob == 0)
-     tot_prob = 1;
  
    /* Redirect edges.  */
    loop_redirect_edge (latch_edge, loop->header);
--- 493,498 ----
*************** loopify (struct loops *loops, edge latch
*** 523,534 ****
    add_bb_to_loop (switch_bb, outer);
  
    /* Fix frequencies.  */
!   switch_bb->frequency = freq;
!   switch_bb->count = cnt;
!   FOR_EACH_EDGE (e, ei, switch_bb->succs)
!     e->count = (switch_bb->count * e->probability) / REG_BR_PROB_BASE;
!   scale_loop_frequencies (loop, prob, tot_prob);
!   scale_loop_frequencies (succ_bb->loop_father, tot_prob - prob, tot_prob);
  
    /* Update dominators of blocks outside of LOOP.  */
    dom_bbs = xcalloc (n_basic_blocks, sizeof (basic_block));
--- 520,536 ----
    add_bb_to_loop (switch_bb, outer);
  
    /* Fix frequencies.  */
!   if (redirect_all_edges)
!     {
!       switch_bb->frequency = freq;
!       switch_bb->count = cnt;
!       FOR_EACH_EDGE (e, ei, switch_bb->succs)
! 	{
! 	  e->count = (switch_bb->count * e->probability) / REG_BR_PROB_BASE;
! 	}
!     }
!   scale_loop_frequencies (loop, old_scale, REG_BR_PROB_BASE);
!   scale_loop_frequencies (succ_bb->loop_father, new_scale, REG_BR_PROB_BASE);
  
    /* Update dominators of blocks outside of LOOP.  */
    dom_bbs = xcalloc (n_basic_blocks, sizeof (basic_block));
*************** duplicate_loop_to_header_edge (struct lo
*** 860,865 ****
--- 862,868 ----
    int p, freq_in, freq_le, freq_out_orig;
    int prob_pass_thru, prob_pass_wont_exit, prob_pass_main;
    int add_irreducible_flag;
+   basic_block place_after;
  
    gcc_assert (e->dest == loop->header);
    gcc_assert (ndupl > 0);
*************** duplicate_loop_to_header_edge (struct lo
*** 871,877 ****
        gcc_assert (!flow_bb_inside_loop_p (loop, orig->dest));
      }
  
!   bbs = get_loop_body (loop);
  
    /* Check whether duplication is possible.  */
    if (!can_copy_bbs_p (bbs, loop->num_nodes))
--- 874,883 ----
        gcc_assert (!flow_bb_inside_loop_p (loop, orig->dest));
      }
  
!   n = loop->num_nodes;
!   bbs = get_loop_body_in_dom_order (loop);
!   gcc_assert (bbs[0] == loop->header);
!   gcc_assert (bbs[n  - 1] == loop->latch);
  
    /* Check whether duplication is possible.  */
    if (!can_copy_bbs_p (bbs, loop->num_nodes))
*************** duplicate_loop_to_header_edge (struct lo
*** 908,917 ****
  
        scale_step = xmalloc (ndupl * sizeof (int));
  
! 	for (i = 1; i <= ndupl; i++)
! 	  scale_step[i - 1] = TEST_BIT (wont_exit, i)
! 				? prob_pass_wont_exit
! 				: prob_pass_thru;
  
        if (is_latch)
  	{
--- 914,923 ----
  
        scale_step = xmalloc (ndupl * sizeof (int));
  
!       for (i = 1; i <= ndupl; i++)
! 	scale_step[i - 1] = TEST_BIT (wont_exit, i)
! 		? prob_pass_wont_exit
! 		: prob_pass_thru;
  
        if (is_latch)
  	{
*************** duplicate_loop_to_header_edge (struct lo
*** 954,961 ****
  
    loop->copy = target;
  
-   n = loop->num_nodes;
- 
    first_active = xmalloc (n * sizeof (basic_block));
    if (is_latch)
      {
--- 960,965 ----
*************** duplicate_loop_to_header_edge (struct lo
*** 974,986 ****
    spec_edges[SE_ORIG] = orig;
    spec_edges[SE_LATCH] = latch_edge;
  
    for (j = 0; j < ndupl; j++)
      {
        /* Copy loops.  */
        copy_loops_to (loops, orig_loops, n_orig_loops, target);
  
        /* Copy bbs.  */
!       copy_bbs (bbs, n, new_bbs, spec_edges, 2, new_spec_edges, loop);
  
        for (i = 0; i < n; i++)
  	new_bbs[i]->rbi->copy_number = j + 1;
--- 978,993 ----
    spec_edges[SE_ORIG] = orig;
    spec_edges[SE_LATCH] = latch_edge;
  
+   place_after = e->src;
    for (j = 0; j < ndupl; j++)
      {
        /* Copy loops.  */
        copy_loops_to (loops, orig_loops, n_orig_loops, target);
  
        /* Copy bbs.  */
!       copy_bbs (bbs, n, new_bbs, spec_edges, 2, new_spec_edges, loop,
! 		place_after);
!       place_after = new_spec_edges[SE_LATCH]->src;
  
        for (i = 0; i < n; i++)
  	new_bbs[i]->rbi->copy_number = j + 1;
*************** duplicate_loop_to_header_edge (struct lo
*** 1014,1020 ****
  	  redirect_edge_and_branch_force (new_spec_edges[SE_LATCH],
  					  loop->header);
  	  set_immediate_dominator (CDI_DOMINATORS, new_bbs[0], latch);
! 	  latch = loop->latch = new_bbs[1];
  	  e = latch_edge = new_spec_edges[SE_LATCH];
  	}
        else
--- 1021,1027 ----
  	  redirect_edge_and_branch_force (new_spec_edges[SE_LATCH],
  					  loop->header);
  	  set_immediate_dominator (CDI_DOMINATORS, new_bbs[0], latch);
! 	  latch = loop->latch = new_bbs[n - 1];
  	  e = latch_edge = new_spec_edges[SE_LATCH];
  	}
        else
*************** duplicate_loop_to_header_edge (struct lo
*** 1035,1041 ****
        if (!first_active_latch)
  	{
  	  memcpy (first_active, new_bbs, n * sizeof (basic_block));
! 	  first_active_latch = new_bbs[1];
  	}
  
        /* Set counts and frequencies.  */
--- 1042,1048 ----
        if (!first_active_latch)
  	{
  	  memcpy (first_active, new_bbs, n * sizeof (basic_block));
! 	  first_active_latch = new_bbs[n - 1];
  	}
  
        /* Set counts and frequencies.  */
*************** create_loop_notes (void)
*** 1368,1380 ****
      --- edge e ---> [cond expr] ---> [first_head]
                          |
                          +---------> [second_head]
! */
  
  static basic_block
  lv_adjust_loop_entry_edge (basic_block first_head,
  			   basic_block second_head,
  			   edge e,
! 			   tree cond_expr)
  {
    basic_block new_head = NULL;
    edge e1;
--- 1375,1389 ----
      --- edge e ---> [cond expr] ---> [first_head]
                          |
                          +---------> [second_head]
! 
!   THEN_PROB is the probability of then branch of the condition.  */
  
  static basic_block
  lv_adjust_loop_entry_edge (basic_block first_head,
  			   basic_block second_head,
  			   edge e,
! 			   tree cond_expr,
! 			   unsigned then_prob)
  {
    basic_block new_head = NULL;
    edge e1;
*************** lv_adjust_loop_entry_edge (basic_block f
*** 1385,1395 ****
       insert conditional expr.  */
    new_head = split_edge (e);
  
- 
    lv_add_condition_to_bb (first_head, second_head, new_head,
  			  cond_expr);
  
    e1 = make_edge (new_head, first_head, EDGE_TRUE_VALUE);
    set_immediate_dominator (CDI_DOMINATORS, first_head, new_head);
    set_immediate_dominator (CDI_DOMINATORS, second_head, new_head);
  
--- 1394,1409 ----
       insert conditional expr.  */
    new_head = split_edge (e);
  
    lv_add_condition_to_bb (first_head, second_head, new_head,
  			  cond_expr);
  
+   e = single_succ_edge (new_head);
    e1 = make_edge (new_head, first_head, EDGE_TRUE_VALUE);
+   e1->probability = then_prob;
+   e->probability = REG_BR_PROB_BASE - then_prob;
+   e1->count = RDIV (e->count * e1->probability, REG_BR_PROB_BASE);
+   e->count = RDIV (e->count * e->probability, REG_BR_PROB_BASE);
+ 
    set_immediate_dominator (CDI_DOMINATORS, first_head, new_head);
    set_immediate_dominator (CDI_DOMINATORS, second_head, new_head);
  
*************** lv_adjust_loop_entry_edge (basic_block f
*** 1401,1421 ****
  
  /* Main entry point for Loop Versioning transformation.
     
! This transformation given a condition and a loop, creates
! -if (condition) { loop_copy1 } else { loop_copy2 },
! where loop_copy1 is the loop transformed in one way, and loop_copy2
! is the loop transformed in another way (or unchanged). 'condition'
! may be a run time test for things that were not resolved by static
! analysis (overlapping ranges (anti-aliasing), alignment, etc.).  */
  
  struct loop *
  loop_version (struct loops *loops, struct loop * loop, 
! 	      void *cond_expr, basic_block *condition_bb)
  {
    basic_block first_head, second_head;
    edge entry, latch_edge, exit, true_edge, false_edge;
    int irred_flag;
    struct loop *nloop;
  
    /* CHECKME: Loop versioning does not handle nested loop at this point.  */
    if (loop->inner)
--- 1415,1443 ----
  
  /* Main entry point for Loop Versioning transformation.
     
!    This transformation given a condition and a loop, creates
!    -if (condition) { loop_copy1 } else { loop_copy2 },
!    where loop_copy1 is the loop transformed in one way, and loop_copy2
!    is the loop transformed in another way (or unchanged). 'condition'
!    may be a run time test for things that were not resolved by static
!    analysis (overlapping ranges (anti-aliasing), alignment, etc.).
! 
!    THEN_PROB is the probability of the then edge of the if.
!    
!    If PLACE_AFTER is true, we place the new loop after LOOP in the
!    instruction stream, otherwise it is placed before LOOP.  */
  
  struct loop *
  loop_version (struct loops *loops, struct loop * loop, 
! 	      void *cond_expr, basic_block *condition_bb,
! 	      unsigned then_prob, unsigned then_scale, unsigned else_scale,
! 	      bool place_after)
  {
    basic_block first_head, second_head;
    edge entry, latch_edge, exit, true_edge, false_edge;
    int irred_flag;
    struct loop *nloop;
+   basic_block cond_bb;
  
    /* CHECKME: Loop versioning does not handle nested loop at this point.  */
    if (loop->inner)
*************** loop_version (struct loops *loops, struc
*** 1439,1447 ****
    second_head = entry->dest;
  
    /* Split loop entry edge and insert new block with cond expr.  */
!   *condition_bb =  lv_adjust_loop_entry_edge (first_head, second_head,
! 					      entry, cond_expr);
!   if (!*condition_bb)
      {
        entry->flags |= irred_flag;
        return NULL;
--- 1461,1473 ----
    second_head = entry->dest;
  
    /* Split loop entry edge and insert new block with cond expr.  */
! 
!   cond_bb = lv_adjust_loop_entry_edge (first_head, second_head,
! 				       entry, cond_expr, then_prob);
!   if (condition_bb)
!     *condition_bb = cond_bb;
! 
!   if (!cond_bb)
      {
        entry->flags |= irred_flag;
        return NULL;
*************** loop_version (struct loops *loops, struc
*** 1449,1460 ****
  
    latch_edge = single_succ_edge (loop->latch->rbi->copy);
    
!   extract_cond_bb_edges (*condition_bb, &true_edge, &false_edge);
    nloop = loopify (loops,
  		   latch_edge,
  		   single_pred_edge (loop->header->rbi->copy),
! 		   *condition_bb, true_edge, false_edge,
! 		   false /* Do not redirect all edges.  */);
  
    exit = loop->single_exit;
    if (exit)
--- 1475,1487 ----
  
    latch_edge = single_succ_edge (loop->latch->rbi->copy);
    
!   extract_cond_bb_edges (cond_bb, &true_edge, &false_edge);
    nloop = loopify (loops,
  		   latch_edge,
  		   single_pred_edge (loop->header->rbi->copy),
! 		   cond_bb, true_edge, false_edge,
! 		   false /* Do not redirect all edges.  */,
! 		   then_scale, else_scale);
  
    exit = loop->single_exit;
    if (exit)
*************** loop_version (struct loops *loops, struc
*** 1464,1478 ****
    lv_flush_pending_stmts (latch_edge);
  
    /* loopify redirected condition_bb's succ edge. Update its PENDING_STMTS.  */ 
!   extract_cond_bb_edges (*condition_bb, &true_edge, &false_edge);
    lv_flush_pending_stmts (false_edge);
    /* Adjust irreducible flag.  */
    if (irred_flag)
      {
!       (*condition_bb)->flags |= BB_IRREDUCIBLE_LOOP;
        loop_preheader_edge (loop)->flags |= EDGE_IRREDUCIBLE_LOOP;
        loop_preheader_edge (nloop)->flags |= EDGE_IRREDUCIBLE_LOOP;
!       single_pred_edge ((*condition_bb))->flags |= EDGE_IRREDUCIBLE_LOOP;
      }
  
    /* At this point condition_bb is loop predheader with two successors, 
--- 1491,1520 ----
    lv_flush_pending_stmts (latch_edge);
  
    /* loopify redirected condition_bb's succ edge. Update its PENDING_STMTS.  */ 
!   extract_cond_bb_edges (cond_bb, &true_edge, &false_edge);
    lv_flush_pending_stmts (false_edge);
    /* Adjust irreducible flag.  */
    if (irred_flag)
      {
!       cond_bb->flags |= BB_IRREDUCIBLE_LOOP;
        loop_preheader_edge (loop)->flags |= EDGE_IRREDUCIBLE_LOOP;
        loop_preheader_edge (nloop)->flags |= EDGE_IRREDUCIBLE_LOOP;
!       single_pred_edge (cond_bb)->flags |= EDGE_IRREDUCIBLE_LOOP;
!     }
! 
!   if (place_after)
!     {
!       basic_block *bbs = get_loop_body_in_dom_order (nloop), after;
!       unsigned i;
! 
!       after = loop->latch;
! 
!       for (i = 0; i < nloop->num_nodes; i++)
! 	{
! 	  move_block_after (bbs[i], after);
! 	  after = bbs[i];
! 	}
!       free (bbs);
      }
  
    /* At this point condition_bb is loop predheader with two successors, 
Index: common.opt
===================================================================
RCS file: /cvs/gcc/gcc/gcc/common.opt,v
retrieving revision 1.73
diff -c -3 -p -r1.73 common.opt
*** common.opt	4 Jun 2005 17:07:55 -0000	1.73
--- common.opt	12 Jun 2005 20:44:02 -0000
*************** Common Report Var(flag_pie,1) VarExists
*** 615,621 ****
  Generate position-independent code for executables if possible (small mode)
  
  fprefetch-loop-arrays
! Common Report Var(flag_prefetch_loop_arrays)
  Generate prefetch instructions, if available, for arrays in loops
  
  fprofile
--- 615,625 ----
  Generate position-independent code for executables if possible (small mode)
  
  fprefetch-loop-arrays
! Common Report Var(flag_prefetch_loop_arrays,1)
! Generate prefetch instructions, if available, for arrays in loops
! 
! fprefetch-loop-arrays-rtl
! Common Report Var(flag_prefetch_loop_arrays,2)
  Generate prefetch instructions, if available, for arrays in loops
  
  fprofile
Index: fold-const.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fold-const.c,v
retrieving revision 1.593
diff -c -3 -p -r1.593 fold-const.c
*** fold-const.c	12 Jun 2005 08:03:09 -0000	1.593
--- fold-const.c	12 Jun 2005 20:44:03 -0000
*************** ptr_difference_const (tree e1, tree e2, 
*** 11841,11850 ****
  	toffset2 = fold_convert (type, toffset2);
  
        tdiff = fold_build2 (MINUS_EXPR, type, toffset1, toffset2);
!       if (!host_integerp (tdiff, 0))
  	return false;
  
!       *diff = tree_low_cst (tdiff, 0);
      }
    else if (toffset1 || toffset2)
      {
--- 11841,11850 ----
  	toffset2 = fold_convert (type, toffset2);
  
        tdiff = fold_build2 (MINUS_EXPR, type, toffset1, toffset2);
!       if (!cst_and_fits_in_hwi (tdiff))
  	return false;
  
!       *diff = int_cst_value (tdiff);
      }
    else if (toffset1 || toffset2)
      {
Index: loop-unswitch.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/loop-unswitch.c,v
retrieving revision 1.30
diff -c -3 -p -r1.30 loop-unswitch.c
*** loop-unswitch.c	3 May 2005 13:09:33 -0000	1.30
--- loop-unswitch.c	12 Jun 2005 20:44:03 -0000
*************** unswitch_loop (struct loops *loops, stru
*** 466,472 ****
    /* Loopify from the copy of LOOP body, constructing the new loop.  */
    nloop = loopify (loops, latch_edge,
  		   single_pred_edge (loop->header->rbi->copy), switch_bb,
! 		   BRANCH_EDGE (switch_bb), FALLTHRU_EDGE (switch_bb), true);
  
    /* Remove branches that are now unreachable in new loops.  */
    remove_path (loops, true_edge);
--- 466,473 ----
    /* Loopify from the copy of LOOP body, constructing the new loop.  */
    nloop = loopify (loops, latch_edge,
  		   single_pred_edge (loop->header->rbi->copy), switch_bb,
! 		   BRANCH_EDGE (switch_bb), FALLTHRU_EDGE (switch_bb), true,
! 		   prob, REG_BR_PROB_BASE - prob);
  
    /* Remove branches that are now unreachable in new loops.  */
    remove_path (loops, true_edge);
Index: modulo-sched.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/modulo-sched.c,v
retrieving revision 1.34
diff -c -3 -p -r1.34 modulo-sched.c
*** modulo-sched.c	7 Jun 2005 14:30:20 -0000	1.34
--- modulo-sched.c	12 Jun 2005 20:44:03 -0000
*************** build_loops_structure (FILE *dumpfile)
*** 930,935 ****
--- 930,939 ----
    return loops;
  }
  
+ /* Probability in % that the sms-ed loop rolls enough so that optimized
+    version may be entered.  Just a guess.  */
+ #define PROB_SMS_ENOUGH_ITERATIONS 80
+ 
  /* Main entry point, perform SMS scheduling on the loops of the function
     that consist of single basic blocks.  */
  void
*************** sms_schedule (FILE *dump_file)
*** 945,952 ****
    struct df *df;
    struct loops *loops;
    basic_block bb = NULL;
-   /* vars to the versioning only if needed*/
-   struct loop * nloop;
    basic_block condition_bb = NULL;
    edge latch_edge;
    gcov_type trip_count = 0;
--- 949,954 ----
*************** sms_schedule (FILE *dump_file)
*** 1257,1265 ****
  	      if (count_reg && ! count_init)
  		{
  		  rtx comp_rtx = gen_rtx_fmt_ee (GT, VOIDmode, count_reg,
! 						 GEN_INT(stage_count));
! 
! 		  nloop = loop_version (loops, loop, comp_rtx, &condition_bb);
  		}
  
  	      /* Set new iteration count of loop kernel.  */
--- 1259,1271 ----
  	      if (count_reg && ! count_init)
  		{
  		  rtx comp_rtx = gen_rtx_fmt_ee (GT, VOIDmode, count_reg,
! 						 GEN_INT (stage_count));
! 		  unsigned prob = (PROB_SMS_ENOUGH_ITERATIONS
! 				   * REG_BR_PROB_BASE) / 100;
! 
! 		  loop_version (loops, loop, comp_rtx, &condition_bb,
! 				prob, prob, REG_BR_PROB_BASE - prob,
! 				true);
  		}
  
  	      /* Set new iteration count of loop kernel.  */
Index: passes.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/passes.c,v
retrieving revision 2.92
diff -c -3 -p -r2.92 passes.c
*** passes.c	9 Jun 2005 16:21:35 -0000	2.92
--- passes.c	12 Jun 2005 20:44:03 -0000
*************** rest_of_handle_loop_optimize (void)
*** 1101,1107 ****
    free_bb_for_insn ();
    profile_status = PROFILE_ABSENT;
  
!   do_prefetch = flag_prefetch_loop_arrays ? LOOP_PREFETCH : 0;
  
    if (flag_rerun_loop_opt)
      {
--- 1101,1107 ----
    free_bb_for_insn ();
    profile_status = PROFILE_ABSENT;
  
!   do_prefetch = flag_prefetch_loop_arrays == 2 ? LOOP_PREFETCH : 0;
  
    if (flag_rerun_loop_opt)
      {
Index: timevar.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/timevar.def,v
retrieving revision 1.50
diff -c -3 -p -r1.50 timevar.def
*** timevar.def	6 Jun 2005 18:55:58 -0000	1.50
--- timevar.def	12 Jun 2005 20:44:03 -0000
*************** DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "
*** 102,107 ****
--- 102,108 ----
  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
  DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
  DEFTIMEVAR (TV_TREE_LINEAR_TRANSFORM , "tree loop linear")
+ DEFTIMEVAR (TV_TREE_PREFETCH	     , "tree prefetching")
  DEFTIMEVAR (TV_TREE_LOOP_IVOPTS	     , "tree iv optimization")
  DEFTIMEVAR (TV_TREE_LOOP_INIT	     , "tree loop init")
  DEFTIMEVAR (TV_TREE_LOOP_FINI	     , "tree loop fini")
Index: tracer.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tracer.c,v
retrieving revision 1.25
diff -c -3 -p -r1.25 tracer.c
*** tracer.c	12 Mar 2005 00:34:04 -0000	1.25
--- tracer.c	12 Jun 2005 20:44:03 -0000
*************** tail_duplicate (void)
*** 280,286 ****
  	      e = find_edge (bb, bb2);
  
  	      nduplicated += counts [bb2->index];
! 	      bb2 = duplicate_block (bb2, e);
  
  	      /* Reconsider the original copy of block we've duplicated.
  	         Removing the most common predecessor may make it to be
--- 280,286 ----
  	      e = find_edge (bb, bb2);
  
  	      nduplicated += counts [bb2->index];
! 	      bb2 = duplicate_block (bb2, e, bb);
  
  	      /* Reconsider the original copy of block we've duplicated.
  	         Removing the most common predecessor may make it to be
Index: tree-cfg.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-cfg.c,v
retrieving revision 2.203
diff -c -3 -p -r2.203 tree-cfg.c
*** tree-cfg.c	5 Jun 2005 18:07:40 -0000	2.203
--- tree-cfg.c	12 Jun 2005 20:44:03 -0000
*************** reinstall_phi_args (edge new_edge, edge 
*** 3012,3017 ****
--- 3012,3033 ----
    PENDING_STMT (old_edge) = NULL;
  }
  
+ /* Returns the basic block after that the new basic block created
+    by splitting edge EDGE_IN should be placed.  Tries to keep the new block
+    near its "logical" location.  This is of most help to humans looking
+    at debugging dumps.  */
+ 
+ static basic_block
+ split_edge_bb_loc (edge edge_in)
+ {
+   basic_block dest = edge_in->dest;
+ 
+   if (dest->prev_bb && find_edge (dest->prev_bb, dest))
+     return edge_in->src;
+   else
+     return dest->prev_bb;
+ }
+ 
  /* Split a (typically critical) edge EDGE_IN.  Return the new block.
     Abort on abnormal edges.  */
  
*************** tree_split_edge (edge edge_in)
*** 3027,3039 ****
    src = edge_in->src;
    dest = edge_in->dest;
  
!   /* Place the new block in the block list.  Try to keep the new block
!      near its "logical" location.  This is of most help to humans looking
!      at debugging dumps.  */
!   if (dest->prev_bb && find_edge (dest->prev_bb, dest))
!     after_bb = edge_in->src;
!   else
!     after_bb = dest->prev_bb;
  
    new_bb = create_empty_bb (after_bb);
    new_bb->frequency = EDGE_FREQUENCY (edge_in);
--- 3043,3049 ----
    src = edge_in->src;
    dest = edge_in->dest;
  
!   after_bb = split_edge_bb_loc (edge_in);
  
    new_bb = create_empty_bb (after_bb);
    new_bb->frequency = EDGE_FREQUENCY (edge_in);
*************** tree_duplicate_sese_region (edge entry, 
*** 4309,4315 ****
    else if (entry_freq > total_freq)
      entry_freq = total_freq;
  
!   copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop);
    scale_bbs_frequencies_int (region, n_region, total_freq - entry_freq,
  			     total_freq);
    scale_bbs_frequencies_int (region_copy, n_region, entry_freq, total_freq);
--- 4319,4326 ----
    else if (entry_freq > total_freq)
      entry_freq = total_freq;
  
!   copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop,
! 	    split_edge_bb_loc (entry));
    scale_bbs_frequencies_int (region, n_region, total_freq - entry_freq,
  			     total_freq);
    scale_bbs_frequencies_int (region_copy, n_region, entry_freq, total_freq);
Index: tree-cfgcleanup.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-cfgcleanup.c,v
retrieving revision 2.1
diff -c -3 -p -r2.1 tree-cfgcleanup.c
*** tree-cfgcleanup.c	28 May 2005 21:09:17 -0000	2.1
--- tree-cfgcleanup.c	12 Jun 2005 20:44:03 -0000
*************** Boston, MA 02111-1307, USA.  */
*** 44,49 ****
--- 44,50 ----
  #include "cfgloop.h"
  #include "cfglayout.h"
  #include "hashtab.h"
+ #include "tree-scalar-evolution.h"
  #include "tree-ssa-propagate.h"
  
  /* Remove any fallthru edge from EV.  Return true if an edge was removed.  */
*************** cleanup_tree_cfg (void)
*** 536,558 ****
  void
  cleanup_tree_cfg_loop (void)
  {
!   bitmap changed_bbs = BITMAP_ALLOC (NULL);
  
!   cleanup_tree_cfg ();
! 
!   fix_loop_structure (current_loops, changed_bbs);
!   calculate_dominance_info (CDI_DOMINATORS);
  
!   /* This usually does nothing.  But sometimes parts of cfg that originally
!      were inside a loop get out of it due to edge removal (since they
!      become unreachable by back edges from latch).  */
!   rewrite_into_loop_closed_ssa (changed_bbs, TODO_update_ssa);
  
!   BITMAP_FREE (changed_bbs);
  
  #ifdef ENABLE_CHECKING
!   verify_loop_structure (current_loops);
  #endif
  }
  
  /* Merge the PHI nodes at BB into those at BB's sole successor.  */
--- 537,562 ----
  void
  cleanup_tree_cfg_loop (void)
  {
!   bool changed = cleanup_tree_cfg ();
  
!   if (changed)
!     {
!       bitmap changed_bbs = BITMAP_ALLOC (NULL);
!       fix_loop_structure (current_loops, changed_bbs);
!       calculate_dominance_info (CDI_DOMINATORS);
  
!       /* This usually does nothing.  But sometimes parts of cfg that originally
! 	 were inside a loop get out of it due to edge removal (since they
! 	 become unreachable by back edges from latch).  */
!       rewrite_into_loop_closed_ssa (changed_bbs, TODO_update_ssa);
  
!       BITMAP_FREE (changed_bbs);
  
  #ifdef ENABLE_CHECKING
!       verify_loop_structure (current_loops);
  #endif
+       scev_reset ();
+     }
  }
  
  /* Merge the PHI nodes at BB into those at BB's sole successor.  */
Index: tree-flow.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-flow.h,v
retrieving revision 2.120
diff -c -3 -p -r2.120 tree-flow.h
*** tree-flow.h	9 Jun 2005 13:05:35 -0000	2.120
--- tree-flow.h	12 Jun 2005 20:44:03 -0000
*************** void tree_ssa_lim (struct loops *);
*** 650,655 ****
--- 650,656 ----
  void tree_ssa_unswitch_loops (struct loops *);
  void canonicalize_induction_variables (struct loops *);
  void tree_unroll_loops_completely (struct loops *, bool);
+ void tree_ssa_prefetch_arrays (struct loops *);
  void tree_ssa_iv_optimize (struct loops *);
  
  bool number_of_iterations_exit (struct loop *, edge,
*************** bool tree_duplicate_loop_to_header_edge 
*** 677,684 ****
--- 678,688 ----
  					 unsigned int, sbitmap,
  					 edge, edge *,
  					 unsigned int *, int);
+ void tree_unroll_loop (struct loops *, struct loop *, unsigned, edge,
+ 		       struct tree_niter_desc *);
  struct loop *tree_ssa_loop_version (struct loops *, struct loop *, tree,
  				    basic_block *);
+ edge single_dom_exit (const struct loop *);
  tree expand_simple_operations (tree);
  void substitute_in_loop_info (struct loop *, tree, tree);
  
Index: tree-optimize.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-optimize.c,v
retrieving revision 2.107
diff -c -3 -p -r2.107 tree-optimize.c
*** tree-optimize.c	9 Jun 2005 13:05:36 -0000	2.107
--- tree-optimize.c	12 Jun 2005 20:44:03 -0000
*************** init_tree_optimization_passes (void)
*** 480,485 ****
--- 480,486 ----
    NEXT_PASS (pass_lim);
    NEXT_PASS (pass_unswitch);
    NEXT_PASS (pass_scev_cprop);
+   NEXT_PASS (pass_ccp);
    NEXT_PASS (pass_record_bounds);
    NEXT_PASS (pass_linear_transform);
    NEXT_PASS (pass_iv_canon);
*************** init_tree_optimization_passes (void)
*** 490,495 ****
--- 491,497 ----
       pass_may_alias.  */
    NEXT_PASS (pass_lower_vector_ssa);
    NEXT_PASS (pass_complete_unroll);
+   NEXT_PASS (pass_loop_prefetch);
    NEXT_PASS (pass_iv_optimize);
    NEXT_PASS (pass_loop_done);
    *p = NULL;
*************** execute_todo (struct tree_opt_pass *pass
*** 558,564 ****
        fflush (dump_file);
      }
  
!   if (flags & TODO_ggc_collect)
      {
        ggc_collect ();
      }
--- 560,569 ----
        fflush (dump_file);
      }
  
!   if ((flags & TODO_ggc_collect)
!       /* Loop structures are not ggc-safe.  Temporary hack until the problems
! 	 are solved.  */
!       && !current_loops)
      {
        ggc_collect ();
      }
Index: tree-outof-ssa.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-outof-ssa.c,v
retrieving revision 2.63
diff -c -3 -p -r2.63 tree-outof-ssa.c
*** tree-outof-ssa.c	7 Jun 2005 14:30:22 -0000	2.63
--- tree-outof-ssa.c	12 Jun 2005 20:44:03 -0000
*************** typedef struct value_expr_d 
*** 1206,1212 ****
  typedef struct temp_expr_table_d 
  {
    var_map map;
!   void **version_info;		
    value_expr_p *partition_dep_list;
    bitmap replaceable;
    bool saw_replaceable;
--- 1206,1213 ----
  typedef struct temp_expr_table_d 
  {
    var_map map;
!   void **version_info;
!   bitmap *expr_vars;
    value_expr_p *partition_dep_list;
    bitmap replaceable;
    bool saw_replaceable;
*************** new_temp_expr_table (var_map map)
*** 1251,1256 ****
--- 1252,1258 ----
    t->map = map;
  
    t->version_info = xcalloc (num_ssa_names + 1, sizeof (void *));
+   t->expr_vars = xcalloc (num_ssa_names + 1, sizeof (bitmap));
    t->partition_dep_list = xcalloc (num_var_partitions (map) + 1, 
  				   sizeof (value_expr_p));
  
*************** free_temp_expr_table (temp_expr_table_p 
*** 1274,1279 ****
--- 1276,1282 ----
  {
    value_expr_p p;
    tree *ret = NULL;
+   unsigned i;
  
  #ifdef ENABLE_CHECKING
    unsigned x;
*************** free_temp_expr_table (temp_expr_table_p 
*** 1290,1295 ****
--- 1293,1303 ----
    BITMAP_FREE (t->partition_in_use);
    BITMAP_FREE (t->replaceable);
  
+   for (i = 0; i <= num_ssa_names; i++)
+     if (t->expr_vars[i])
+       BITMAP_FREE (t->expr_vars[i]);
+   free (t->expr_vars);
+ 
    free (t->partition_dep_list);
    if (t->saw_replaceable)
      ret = (tree *)t->version_info;
*************** add_dependance (temp_expr_table_p tab, i
*** 1452,1462 ****
  static bool
  check_replaceable (temp_expr_table_p tab, tree stmt)
  {
!   tree var, def;
    int version;
    var_map map = tab->map;
    ssa_op_iter iter;
    tree call_expr;
  
    if (TREE_CODE (stmt) != MODIFY_EXPR)
      return false;
--- 1460,1471 ----
  static bool
  check_replaceable (temp_expr_table_p tab, tree stmt)
  {
!   tree var, def, basevar;
    int version;
    var_map map = tab->map;
    ssa_op_iter iter;
    tree call_expr;
+   bitmap def_vars = BITMAP_ALLOC (NULL), use_vars;
  
    if (TREE_CODE (stmt) != MODIFY_EXPR)
      return false;
*************** check_replaceable (temp_expr_table_p tab
*** 1487,1498 ****
--- 1496,1514 ----
      }
  
    version = SSA_NAME_VERSION (def);
+   basevar = SSA_NAME_VAR (def);
+   bitmap_set_bit (def_vars, var_ann (basevar)->uid);
  
    /* Add this expression to the dependency list for each use partition.  */
    FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
      {
        add_dependance (tab, version, var);
+ 
+       use_vars = tab->expr_vars[SSA_NAME_VERSION (var)];
+       if (use_vars)
+ 	bitmap_ior_into (def_vars, use_vars);
      }
+   tab->expr_vars[version] = def_vars;
  
    /* If there are VUSES, add a dependence on virtual defs.  */
    if (!ZERO_SSA_OPERANDS (stmt, SSA_OP_VUSE))
*************** static void
*** 1611,1617 ****
  find_replaceable_in_bb (temp_expr_table_p tab, basic_block bb)
  {
    block_stmt_iterator bsi;
!   tree stmt, def;
    stmt_ann_t ann;
    int partition;
    var_map map = tab->map;
--- 1627,1633 ----
  find_replaceable_in_bb (temp_expr_table_p tab, basic_block bb)
  {
    block_stmt_iterator bsi;
!   tree stmt, def, use;
    stmt_ann_t ann;
    int partition;
    var_map map = tab->map;
*************** find_replaceable_in_bb (temp_expr_table_
*** 1624,1653 ****
        ann = stmt_ann (stmt);
  
        /* Determine if this stmt finishes an existing expression.  */
!       FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE)
  	{
! 	  if (tab->version_info[SSA_NAME_VERSION (def)])
  	    {
  	      bool same_root_var = false;
- 	      tree def2;
  	      ssa_op_iter iter2;
  
  	      /* See if the root variables are the same.  If they are, we
  		 do not want to do the replacement to avoid problems with
  		 code size, see PR tree-optimization/17549.  */
! 	      FOR_EACH_SSA_TREE_OPERAND (def2, stmt, iter2, SSA_OP_DEF)
! 		if (SSA_NAME_VAR (def) == SSA_NAME_VAR (def2))
! 		  {
! 		    same_root_var = true;
! 		    break;
! 		  }
  
  	      /* Mark expression as replaceable unless stmt is volatile
  		 or DEF sets the same root variable as STMT.  */
  	      if (!ann->has_volatile_ops && !same_root_var)
! 		mark_replaceable (tab, def);
  	      else
! 		finish_expr (tab, SSA_NAME_VERSION (def), false);
  	    }
  	}
        
--- 1640,1673 ----
        ann = stmt_ann (stmt);
  
        /* Determine if this stmt finishes an existing expression.  */
!       FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
  	{
! 	  unsigned ver = SSA_NAME_VERSION (use);
! 
! 	  if (tab->version_info[ver])
  	    {
  	      bool same_root_var = false;
  	      ssa_op_iter iter2;
+ 	      bitmap vars = tab->expr_vars[ver];
  
  	      /* See if the root variables are the same.  If they are, we
  		 do not want to do the replacement to avoid problems with
  		 code size, see PR tree-optimization/17549.  */
! 	      FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter2, SSA_OP_DEF)
! 		{
! 		  if (bitmap_bit_p (vars, var_ann (SSA_NAME_VAR (def))->uid))
! 		    {
! 		      same_root_var = true;
! 		      break;
! 		    }
! 		}
  
  	      /* Mark expression as replaceable unless stmt is volatile
  		 or DEF sets the same root variable as STMT.  */
  	      if (!ann->has_volatile_ops && !same_root_var)
! 		mark_replaceable (tab, use);
  	      else
! 		finish_expr (tab, ver, false);
  	    }
  	}
        
Index: tree-pass.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-pass.h,v
retrieving revision 2.42
diff -c -3 -p -r2.42 tree-pass.h
*** tree-pass.h	9 Jun 2005 13:05:37 -0000	2.42
--- tree-pass.h	12 Jun 2005 20:44:03 -0000
*************** extern struct tree_opt_pass pass_record_
*** 178,183 ****
--- 178,184 ----
  extern struct tree_opt_pass pass_if_conversion;
  extern struct tree_opt_pass pass_vectorize;
  extern struct tree_opt_pass pass_complete_unroll;
+ extern struct tree_opt_pass pass_loop_prefetch;
  extern struct tree_opt_pass pass_iv_optimize;
  extern struct tree_opt_pass pass_loop_done;
  extern struct tree_opt_pass pass_ch;
Index: tree-pretty-print.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-pretty-print.c,v
retrieving revision 2.61
diff -c -3 -p -r2.61 tree-pretty-print.c
*** tree-pretty-print.c	7 Jun 2005 12:01:27 -0000	2.61
--- tree-pretty-print.c	12 Jun 2005 20:44:03 -0000
*************** dump_bb_header (pretty_printer *buffer, 
*** 2218,2223 ****
--- 2218,2225 ----
        INDENT (indent);
        pp_string (buffer, "# BLOCK ");
        pp_decimal_int (buffer, bb->index);
+       pp_string (buffer, ", freq ");
+       pp_decimal_int (buffer, bb->frequency);
  
        if (flags & TDF_LINENO)
  	{
Index: tree-ssa-loop-im.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-im.c,v
retrieving revision 2.45
diff -c -3 -p -r2.45 tree-ssa-loop-im.c
*** tree-ssa-loop-im.c	7 Jun 2005 12:01:29 -0000	2.45
--- tree-ssa-loop-im.c	12 Jun 2005 20:44:03 -0000
*************** rewrite_mem_refs (tree tmp_var, struct m
*** 925,930 ****
--- 925,1033 ----
      }
  }
  
+ /* The name and the length of the currently generated variable
+    for lsm.  */
+ #define MAX_LSM_NAME_LENGTH 40
+ static char lsm_tmp_name[MAX_LSM_NAME_LENGTH + 1];
+ static int lsm_tmp_name_length;
+ 
+ /* Adds S to lsm_tmp_name.  */
+ 
+ static void
+ lsm_tmp_name_add (const char *s)
+ {
+   int l = strlen (s) + lsm_tmp_name_length;
+   if (l > MAX_LSM_NAME_LENGTH)
+     return;
+ 
+   strcpy (lsm_tmp_name + lsm_tmp_name_length, s);
+   lsm_tmp_name_length = l;
+ }
+ 
+ /* Stores the name for temporary variable that replaces REF to
+    lsm_tmp_name.  */
+ 
+ static void
+ gen_lsm_tmp_name (tree ref)
+ {
+   const char *name;
+ 
+   switch (TREE_CODE (ref))
+     {
+     case MISALIGNED_INDIRECT_REF:
+     case ALIGN_INDIRECT_REF:
+     case INDIRECT_REF:
+       gen_lsm_tmp_name (TREE_OPERAND (ref, 0));
+       lsm_tmp_name_add ("_");
+       break;
+ 
+     case BIT_FIELD_REF:
+     case VIEW_CONVERT_EXPR:
+     case ARRAY_RANGE_REF:
+       gen_lsm_tmp_name (TREE_OPERAND (ref, 0));
+       break;
+ 
+     case REALPART_EXPR:
+       gen_lsm_tmp_name (TREE_OPERAND (ref, 0));
+       lsm_tmp_name_add ("_RE");
+       break;
+       
+     case IMAGPART_EXPR:
+       gen_lsm_tmp_name (TREE_OPERAND (ref, 0));
+       lsm_tmp_name_add ("_IM");
+       break;
+ 
+     case COMPONENT_REF:
+       gen_lsm_tmp_name (TREE_OPERAND (ref, 0));
+       lsm_tmp_name_add ("_");
+       name = get_name (TREE_OPERAND (ref, 1));
+       if (!name)
+ 	name = "F";
+       lsm_tmp_name_add ("_");
+       lsm_tmp_name_add (name);
+ 
+     case ARRAY_REF:
+       gen_lsm_tmp_name (TREE_OPERAND (ref, 0));
+       lsm_tmp_name_add ("_I");
+       break;
+ 
+     case SSA_NAME:
+       ref = SSA_NAME_VAR (ref);
+       /* Fallthru.  */
+ 
+     case VAR_DECL:
+     case PARM_DECL:
+       name = get_name (ref);
+       if (!name)
+ 	name = "D";
+       lsm_tmp_name_add (name);
+       break;
+ 
+     case STRING_CST:
+       lsm_tmp_name_add ("S");
+       break;
+ 
+     case RESULT_DECL:
+       lsm_tmp_name_add ("R");
+       break;
+ 
+     default:
+       gcc_unreachable ();
+     }
+ }
+ 
+ /* Determines name for temporary variable that replaces REF.
+    The name is accumulated into the lsm_tmp_name variable.  */
+ 
+ static char *
+ get_lsm_tmp_name (tree ref)
+ {
+   lsm_tmp_name_length = 0;
+   gen_lsm_tmp_name (ref);
+   lsm_tmp_name_add ("_lsm");
+   return lsm_tmp_name;
+ }
+ 
  /* Records request for store motion of memory reference REF from LOOP.
     MEM_REFS is the list of occurrences of the reference REF inside LOOP;
     these references are rewritten by a new temporary variable.
*************** schedule_sm (struct loop *loop, edge *ex
*** 950,956 ****
        fprintf (dump_file, " from loop %d\n", loop->num);
      }
  
!   tmp_var = make_rename_temp (TREE_TYPE (ref), "lsm_tmp");
  
    fmt_data.loop = loop;
    fmt_data.orig_loop = loop;
--- 1053,1060 ----
        fprintf (dump_file, " from loop %d\n", loop->num);
      }
  
!   tmp_var = make_rename_temp (TREE_TYPE (ref),
! 			      get_lsm_tmp_name (ref));
  
    fmt_data.loop = loop;
    fmt_data.orig_loop = loop;
Index: tree-ssa-loop-ivopts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v
retrieving revision 2.80
diff -c -3 -p -r2.80 tree-ssa-loop-ivopts.c
*** tree-ssa-loop-ivopts.c	12 Jun 2005 14:02:58 -0000	2.80
--- tree-ssa-loop-ivopts.c	12 Jun 2005 20:44:03 -0000
*************** Software Foundation, 59 Temple Place - S
*** 97,102 ****
--- 97,106 ----
     this.  */
  #define AVG_LOOP_NITER(LOOP) 5
  
+ /* Cost accounted for computations that are performed before the loop.  */
+ 
+ #define BEFORE_LOOP_COST(LOOP, COST) \
+   (((COST) + AVG_LOOP_NITER (LOOP) - 1) / AVG_LOOP_NITER (LOOP))
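+ 
+ /* For example, with AVG_LOOP_NITER 5 a one-time cost of 7 is accounted
+    as (7 + 5 - 1) / 5 = 2, i.e. the before-loop cost is amortized (rounded
+    up) over the expected number of iterations.  */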
  
  /* Representation of the induction variable.  */
  struct iv
*************** loop_data (struct loop *loop)
*** 358,365 ****
  
  /* The single loop exit if it dominates the latch, NULL otherwise.  */
  
! static edge
! single_dom_exit (struct loop *loop)
  {
    edge exit = loop->single_exit;
  
--- 362,369 ----
  
  /* The single loop exit if it dominates the latch, NULL otherwise.  */
  
! edge
! single_dom_exit (const struct loop *loop)
  {
    edge exit = loop->single_exit;
  
*************** get_address_cost (bool symbol_present, b
*** 3302,3329 ****
  
    if (!initialized)
      {
!       HOST_WIDE_INT i;
        initialized = true;
- 
        reg1 = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 1);
- 
        addr = gen_rtx_fmt_ee (PLUS, Pmode, reg1, NULL_RTX);
!       for (i = 1; i <= 1 << 20; i <<= 1)
  	{
  	  XEXP (addr, 1) = gen_int_mode (i, Pmode);
  	  if (!memory_address_p (Pmode, addr))
  	    break;
  	}
-       max_offset = i >> 1;
        off = max_offset;
  
!       for (i = 1; i <= 1 << 20; i <<= 1)
  	{
! 	  XEXP (addr, 1) = gen_int_mode (-i, Pmode);
  	  if (!memory_address_p (Pmode, addr))
  	    break;
  	}
-       min_offset = -(i >> 1);
  
        if (dump_file && (dump_flags & TDF_DETAILS))
  	{
--- 3306,3341 ----
  
    if (!initialized)
      {
!       HOST_WIDE_INT i, j;
!       HOST_WIDE_INT bits = HOST_BITS_PER_WIDE_INT;
!       if (GET_MODE_BITSIZE (Pmode) < bits)
! 	bits = GET_MODE_BITSIZE (Pmode);
        initialized = true;
        reg1 = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 1);
        addr = gen_rtx_fmt_ee (PLUS, Pmode, reg1, NULL_RTX);
! 
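!       /* Probe the offsets 1, 3, 7, ..., 2^j - 1 (and below -1, -2, -4, ...)
!          to find the largest and smallest constant offsets that still give
!          a valid memory address.  */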
!       max_offset = 0;
!       i = 0;
!       for (j = 0; j < bits - 1; j++)
  	{
+ 	  i = (i << 1) + 1;
  	  XEXP (addr, 1) = gen_int_mode (i, Pmode);
  	  if (!memory_address_p (Pmode, addr))
  	    break;
+ 	  max_offset = i;
  	}
        off = max_offset;
  
!       min_offset = 0;
!       i = -1;
!       for (j = 0; j < bits; j++)
  	{
! 	  XEXP (addr, 1) = gen_int_mode (i, Pmode);
  	  if (!memory_address_p (Pmode, addr))
  	    break;
+ 	  min_offset = i;
+ 	  i <<= 1;
  	}
  
        if (dump_file && (dump_flags & TDF_DETAILS))
  	{
*************** determine_use_iv_cost_condition (struct 
*** 4042,4049 ****
  				 struct iv_use *use, struct iv_cand *cand)
  {
    tree bound = NULL_TREE, op, cond;
!   bitmap depends_on = NULL;
!   unsigned cost;
  
    /* Only consider real candidates.  */
    if (!cand->iv)
--- 4054,4061 ----
  				 struct iv_use *use, struct iv_cand *cand)
  {
    tree bound = NULL_TREE, op, cond;
!   bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on;
!   unsigned elim_cost, express_cost, cost;
  
    /* Only consider real candidates.  */
    if (!cand->iv)
*************** determine_use_iv_cost_condition (struct 
*** 4052,4069 ****
        return false;
      }
  
    if (may_eliminate_iv (data, use, cand, &bound))
!     {
!       cost = force_var_cost (data, bound, &depends_on);
  
!       set_use_iv_cost (data, use, cand, cost, depends_on, bound);
!       return cost != INFTY;
!     }
! 
!   /* The induction variable elimination failed; just express the original
!      giv.  If it is compared with an invariant, note that we cannot get
!      rid of it.  */
!   cost = get_computation_cost (data, use, cand, false, &depends_on);
  
    cond = *use->op_p;
    if (TREE_CODE (cond) != SSA_NAME)
--- 4064,4079 ----
        return false;
      }
  
+   /* Try iv elimination.  */
    if (may_eliminate_iv (data, use, cand, &bound))
!     elim_cost = force_var_cost (data, bound, &depends_on_elim);
!   else
!     elim_cost = INFTY;
  
!   /* Try expressing the original giv.  If it is compared with an invariant,
!      note that we cannot get rid of it.  */
!   express_cost = get_computation_cost (data, use, cand, false,
! 				       &depends_on_express);
  
    cond = *use->op_p;
    if (TREE_CODE (cond) != SSA_NAME)
*************** determine_use_iv_cost_condition (struct 
*** 4075,4085 ****
  	{
  	  op = get_iv (data, op)->base;
  	  fd_ivopts_data = data;
! 	  walk_tree (&op, find_depends, &depends_on, NULL);
  	}
      }
  
!   set_use_iv_cost (data, use, cand, cost, depends_on, NULL);
    return cost != INFTY;
  }
  
--- 4085,4125 ----
  	{
  	  op = get_iv (data, op)->base;
  	  fd_ivopts_data = data;
! 	  walk_tree (&op, find_depends, &depends_on_express, NULL);
  	}
      }
  
!   if (elim_cost == INFTY)
!     {
!       set_use_iv_cost (data, use, cand, express_cost, depends_on_express,
! 		       NULL);
!       return express_cost != INFTY;
!     }
! 
!   /* The computation for iv elimination is loop invariant, and thus does not
!      need to be performed inside the loop.  */
!   elim_cost = BEFORE_LOOP_COST (data->current_loop, elim_cost);
!   if (elim_cost < express_cost)
!     {
!       cost = elim_cost;
!       depends_on = depends_on_elim;
!       depends_on_elim = NULL;
!     }
!   else
!     {
!       cost = express_cost;
!       depends_on = depends_on_express;
!       depends_on_express = NULL;
!       bound = NULL_TREE;
!     }
! 
!   set_use_iv_cost (data, use, cand, cost, depends_on, bound);
! 
!   if (depends_on_elim)
!     BITMAP_FREE (depends_on_elim);
!   if (depends_on_express)
!     BITMAP_FREE (depends_on_express);
! 
    return cost != INFTY;
  }
  
*************** determine_use_iv_cost_outer (struct ivop
*** 4144,4151 ****
  
        depends_on = NULL;
        cost = force_var_cost (data, value, &depends_on);
! 
!       cost /= AVG_LOOP_NITER (loop);
  
        set_use_iv_cost (data, use, cand, cost, depends_on, value);
        return cost != INFTY;
--- 4184,4190 ----
  
        depends_on = NULL;
        cost = force_var_cost (data, value, &depends_on);
!       cost = BEFORE_LOOP_COST (loop, cost);
  
        set_use_iv_cost (data, use, cand, cost, depends_on, value);
        return cost != INFTY;
*************** determine_use_iv_cost_outer (struct ivop
*** 4159,4165 ****
        cost = get_computation_cost_at (data, use, cand, false, &depends_on,
  				      last_stmt (exit->src));
        if (cost != INFTY)
! 	cost /= AVG_LOOP_NITER (loop);
      }
    else
      {
--- 4198,4204 ----
        cost = get_computation_cost_at (data, use, cand, false, &depends_on,
  				      last_stmt (exit->src));
        if (cost != INFTY)
! 	cost = BEFORE_LOOP_COST (loop, cost);
      }
    else
      {
*************** determine_iv_cost (struct ivopts_data *d
*** 4295,4301 ****
    cost_base = force_var_cost (data, base, NULL);
    cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)));
  
!   cand->cost = cost_step + cost_base / AVG_LOOP_NITER (current_loop);
  
    /* Prefer the original iv unless we may gain something by replacing it;
       this is not really relevant for artificial ivs created by other
--- 4334,4340 ----
    cost_base = force_var_cost (data, base, NULL);
    cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)));
  
!   cand->cost = cost_step + BEFORE_LOOP_COST (current_loop, cost_base);
  
    /* Prefer the original iv unless we may gain something by replacing it;
       this is not really relevant for artificial ivs created by other
Index: tree-ssa-loop-manip.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-manip.c,v
retrieving revision 2.32
diff -c -3 -p -r2.32 tree-ssa-loop-manip.c
*** tree-ssa-loop-manip.c	1 May 2005 08:08:06 -0000	2.32
--- tree-ssa-loop-manip.c	12 Jun 2005 20:44:03 -0000
*************** tree_duplicate_loop_to_header_edge (stru
*** 618,620 ****
--- 618,835 ----
  
    return true;
  }
+ 
+ /* Build if (COND) goto THEN_LABEL; else goto ELSE_LABEL;  */
+ 
+ static tree
+ build_if_stmt (tree cond, tree then_label, tree else_label)
+ {
+   return build3 (COND_EXPR, void_type_node,
+ 		 cond,
+ 		 build1 (GOTO_EXPR, void_type_node, then_label),
+ 		 build1 (GOTO_EXPR, void_type_node, else_label));
+ }
+ 
+ /* Unroll LOOP FACTOR times.  LOOPS is the loops tree.  DESC describes the
+    number of iterations of LOOP.  EXIT is the exit of the loop to which
+    DESC corresponds.
+    
+    If N is the number of iterations of the loop and MAY_BE_ZERO is the
+    condition under which the loop exits in the first iteration even if N != 0,
+    
+    while (1)
+      {
+        x = phi (init, next);
+ 
+        pre;
+        if (st)
+          break;
+        post;
+      }
+ 
+    becomes
+    
+    if (MAY_BE_ZERO || N < FACTOR)
+      goto rest;
+ 
+    do
+      {
+        x = phi (init, next);
+ 
+        pre;
+        post;
+        pre;
+        post;
+        ...
+        pre;
+        post;
+        N -= FACTOR;
+        
+      } while (N >= FACTOR);
+ 
+    rest:
+      init' = phi (init, next);
+ 
+    while (1)
+      {
+        x = phi (init', next);
+ 
+        pre;
+        if (st)
+          break;
+        post;
+      } */
+ 
+ /* Probability in % that the unrolled loop is entered.  Just a guess.  */
+ #define PROB_UNROLLED_LOOP_ENTERED 80
+ 
+ void
+ tree_unroll_loop (struct loops *loops, struct loop *loop, unsigned factor,
+ 		  edge exit, struct tree_niter_desc *desc)
+ {
+   tree cond, type, niter, stmts, ctr_before, ctr_after, dont_exit, exit_if;
+   tree phi_old_loop, phi_new_loop, phi_rest, init, next, new_init, var;
+   struct loop *new_loop;
+   basic_block rest, exit_bb;
+   edge old_entry, new_entry, old_latch, precond_edge, new_exit;
+   edge nonexit, new_nonexit;
+   block_stmt_iterator bsi;
+   use_operand_p op;
+   bool ok;
+   unsigned prob_entry, scale_unrolled, scale_rest, est_niter;
+   sbitmap wont_exit;
+ 
+   est_niter = expected_loop_iterations (loop);
+   niter = force_gimple_operand (unshare_expr (desc->niter), &stmts,
+ 				true, NULL_TREE);
+   if (stmts)
+     bsi_insert_on_edge_immediate_loop (loop_preheader_edge (loop), stmts);
+ 
+   type = TREE_TYPE (niter);
+   cond = fold_build2 (GE_EXPR, boolean_type_node,
+ 		      niter, build_int_cst_type (type, factor));
+   if (!zero_p (desc->may_be_zero))
+     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+ 			invert_truthvalue (unshare_expr (desc->may_be_zero)),
+ 			cond);
+   cond = force_gimple_operand (cond, &stmts, false, NULL_TREE);
+   if (stmts)
+     bsi_insert_on_edge_immediate_loop (loop_preheader_edge (loop), stmts);
+   /* cond now may be a gimple comparison, which would be OK, but also any
+      other gimple rhs (say a && b).  In this case we need to force it into
+      an operand.  */
+   if (!is_gimple_condexpr (cond))
+     {
+       cond = force_gimple_operand (cond, &stmts, true, NULL_TREE);
+       if (stmts)
+ 	bsi_insert_on_edge_immediate_loop (loop_preheader_edge (loop), stmts);
+     }
+ 
+   /* Let us assume that the unrolled loop is quite likely to be entered.  */
+   if (nonzero_p (cond))
+     prob_entry = REG_BR_PROB_BASE;
+   else
+     prob_entry = PROB_UNROLLED_LOOP_ENTERED * REG_BR_PROB_BASE / 100;
+ 
+   /* Getting the profile right is nontrivial.  These values should at least
+      keep it consistent, and somewhat close to correct.  */
+   scale_unrolled = prob_entry;
+   scale_rest = REG_BR_PROB_BASE;
+      
+   new_loop = loop_version (loops, loop, cond, NULL, prob_entry, scale_unrolled,
+ 			   scale_rest, true);
+   gcc_assert (new_loop != NULL);
+   update_ssa (TODO_update_ssa);
+ 
+   /* Unroll the loop and remove the old exits.  */
+   dont_exit = ((exit->flags & EDGE_TRUE_VALUE)
+ 	       ? boolean_false_node
+ 	       : boolean_true_node);
+   if (exit == EDGE_SUCC (exit->src, 0))
+     nonexit = EDGE_SUCC (exit->src, 1);
+   else
+     nonexit = EDGE_SUCC (exit->src, 0);
+   nonexit->probability = REG_BR_PROB_BASE;
+   exit->probability = 0;
+   nonexit->count += exit->count;
+   exit->count = 0;
+   exit_if = last_stmt (exit->src);
+   COND_EXPR_COND (exit_if) = dont_exit;
+   update_stmt (exit_if);
+       
+   wont_exit = sbitmap_alloc (factor);
+   sbitmap_ones (wont_exit);
+   ok = tree_duplicate_loop_to_header_edge
+ 	  (loop, loop_latch_edge (loop), loops, factor - 1,
+ 	   wont_exit, NULL, NULL, NULL, DLTHE_FLAG_UPDATE_FREQ);
+   free (wont_exit);
+   gcc_assert (ok);
+   update_ssa (TODO_update_ssa);
+ 
+   /* Prepare the cfg and update the phi nodes.  */
+   rest = loop_preheader_edge (new_loop)->src;
+   precond_edge = single_pred_edge (rest);
+   loop_split_edge_with (loop_latch_edge (loop), NULL);
+   exit_bb = single_pred (loop->latch);
+ 
+   new_exit = make_edge (exit_bb, rest, EDGE_FALSE_VALUE);
+   new_exit->count = loop_preheader_edge (loop)->count;
+   est_niter = est_niter / factor + 1;
+   new_exit->probability = REG_BR_PROB_BASE / est_niter;
+ 
+   new_nonexit = single_pred_edge (loop->latch);
+   new_nonexit->flags = EDGE_TRUE_VALUE;
+   new_nonexit->probability = REG_BR_PROB_BASE - new_exit->probability;
+ 
+   old_entry = loop_preheader_edge (loop);
+   new_entry = loop_preheader_edge (new_loop);
+   old_latch = loop_latch_edge (loop);
+   for (phi_old_loop = phi_nodes (loop->header),
+        phi_new_loop = phi_nodes (new_loop->header);
+        phi_old_loop;
+        phi_old_loop = PHI_CHAIN (phi_old_loop),
+        phi_new_loop = PHI_CHAIN (phi_new_loop))
+     {
+       init = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_entry);
+       op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_new_loop, new_entry);
+       gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
+       next = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_latch);
+ 
+       /* Prefer using original variable as a base for the new ssa name.
+ 	 This is necessary for virtual ops, and useful in order to avoid
+ 	 losing debug info for real ops.  */
+       if (TREE_CODE (init) == SSA_NAME)
+ 	var = SSA_NAME_VAR (init);
+       else if (TREE_CODE (next) == SSA_NAME)
+ 	var = SSA_NAME_VAR (next);
+       else
+ 	{
+ 	  var = create_tmp_var (TREE_TYPE (init), "unrinittmp");
+ 	  add_referenced_tmp_var (var);
+ 	}
+ 
+       new_init = make_ssa_name (var, NULL_TREE);
+       phi_rest = create_phi_node (new_init, rest);
+       SSA_NAME_DEF_STMT (new_init) = phi_rest;
+ 
+       add_phi_arg (phi_rest, init, precond_edge);
+       add_phi_arg (phi_rest, next, new_exit);
+       SET_USE (op, new_init);
+     }
+ 
+   /* Finally create the new counter for number of iterations and add the new
+      exit instruction.  */
+   bsi = bsi_last (exit_bb);
+   create_iv (niter, build_int_cst_type (type, -factor), NULL_TREE, loop,
+ 	     &bsi, true, &ctr_before, &ctr_after);
+   exit_if = build_if_stmt (build2 (GE_EXPR, boolean_type_node, ctr_after,
+ 				   build_int_cst_type (type, factor)),
+ 			   tree_block_label (loop->latch),
+ 			   tree_block_label (rest));
+   bsi_insert_after (&bsi, exit_if, BSI_NEW_STMT);
+ 
+   verify_flow_info ();
+   verify_dominators (CDI_DOMINATORS);
+   verify_loop_structure (loops);
+   verify_loop_closed_ssa ();
+ }
Index: tree-ssa-loop-niter.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-niter.c,v
retrieving revision 2.30
diff -c -3 -p -r2.30 tree-ssa-loop-niter.c
*** tree-ssa-loop-niter.c	8 Jun 2005 08:47:04 -0000	2.30
--- tree-ssa-loop-niter.c	12 Jun 2005 20:44:03 -0000
*************** zero_p (tree arg)
*** 68,74 ****
  /* Returns true if ARG a nonzero constant.  Unlike integer_nonzerop, it does
     not care about overflow flags.  */
  
! static bool
  nonzero_p (tree arg)
  {
    if (!arg)
--- 68,74 ----
  /* Returns true if ARG a nonzero constant.  Unlike integer_nonzerop, it does
     not care about overflow flags.  */
  
! bool
  nonzero_p (tree arg)
  {
    if (!arg)
*************** expand_simple_operations (tree expr)
*** 632,638 ****
  {
    unsigned i, n;
    tree ret = NULL_TREE, e, ee, stmt;
!   enum tree_code code = TREE_CODE (expr);
  
    if (is_gimple_min_invariant (expr))
      return expr;
--- 632,642 ----
  {
    unsigned i, n;
    tree ret = NULL_TREE, e, ee, stmt;
!   enum tree_code code;
! 
!   if (!expr)
!     return NULL_TREE;
!   code = TREE_CODE (expr);
  
    if (is_gimple_min_invariant (expr))
      return expr;
*************** expand_simple_operations (tree expr)
*** 671,676 ****
--- 675,683 ----
        && TREE_CODE (e) != SSA_NAME
        /* Assignments of invariants are simple.  */
        && !is_gimple_min_invariant (e)
+       /* Taking an address must be considered simple, so that
+ 	 we are able to handle manual strength reduction/iv elimination.  */
+       && TREE_CODE (e) != ADDR_EXPR
        /* And increments and decrements by a constant are simple.  */
        && !((TREE_CODE (e) == PLUS_EXPR
  	    || TREE_CODE (e) == MINUS_EXPR)
Index: tree-ssa-loop-prefetch.c
===================================================================
RCS file: tree-ssa-loop-prefetch.c
diff -N tree-ssa-loop-prefetch.c
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- tree-ssa-loop-prefetch.c	12 Jun 2005 20:44:03 -0000
***************
*** 0 ****
--- 1,1101 ----
+ /* Array prefetching.
+    Copyright (C) 2005 Free Software Foundation, Inc.
+    
+ This file is part of GCC.
+    
+ GCC is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 2, or (at your option) any
+ later version.
+    
+ GCC is distributed in the hope that it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+    
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING.  If not, write to the Free
+ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
+ 02111-1307, USA.  */
+ 
+ #include "config.h"
+ #include "system.h"
+ #include "coretypes.h"
+ #include "tm.h"
+ #include "tree.h"
+ #include "rtl.h"
+ #include "tm_p.h"
+ #include "hard-reg-set.h"
+ #include "basic-block.h"
+ #include "output.h"
+ #include "diagnostic.h"
+ #include "tree-flow.h"
+ #include "tree-dump.h"
+ #include "timevar.h"
+ #include "cfgloop.h"
+ #include "varray.h"
+ #include "expr.h"
+ #include "tree-pass.h"
+ #include "ggc.h"
+ #include "insn-config.h"
+ #include "recog.h"
+ #include "hashtab.h"
+ #include "tree-chrec.h"
+ #include "tree-scalar-evolution.h"
+ #include "toplev.h"
+ #include "params.h"
+ #include "langhooks.h"
+ 
+ /* This pass inserts prefetch instructions to optimize cache usage during
+    accesses to arrays in loops.  It processes loops sequentially and:
+ 
+    1) Gathers all memory references in the single loop.
+    2) For each of the references it decides when it is profitable to prefetch
+       it.  To do it, we evaluate the reuse among the accesses, and determine
+       two values: PREFETCH_BEFORE (meaning that it only makes sense to do
+       prefetching in the first PREFETCH_BEFORE iterations of the loop) and
+       PREFETCH_MOD (meaning that it only makes sense to prefetch in the
+       iterations of the loop that are zero modulo PREFETCH_MOD).  For example
+       (assuming cache line size is 64 bytes, char has size 1 byte and there
+       is no hardware sequential prefetch):
+ 
+       char *a;
+       for (i = 0; i < max; i++)
+ 	{
+ 	  a[255] = ...;		(0)
+ 	  a[i] = ...;		(1)
+ 	  a[i + 64] = ...;	(2)
+ 	  a[16*i] = ...;	(3)
+ 	  a[187*i] = ...;	(4)
+ 	  a[187*i + 50] = ...;	(5)
+ 	}
+ 
+        (0) obviously has PREFETCH_BEFORE 1
+        (1) has PREFETCH_BEFORE 64, since (2) accesses the same memory
+            location 64 iterations before it, and PREFETCH_MOD 64 (since
+ 	   it hits the same cache line otherwise).
+        (2) has PREFETCH_MOD 64
+        (3) has PREFETCH_MOD 4
+        (4) has PREFETCH_MOD 1.  We do not set PREFETCH_BEFORE here, since
+            the cache line accessed by (4) is the same with probability only
+ 	   7/32.
+        (5) has PREFETCH_MOD 1 as well.
+ 
+    3) We determine how much ahead we need to prefetch.  The number of
+       iterations needed is time to fetch / time spent in one iteration of
+       the loop.  The problem is that we do not know either of these values,
+       so we just make a heuristic guess based on a magic (possibly)
+       target-specific constant and size of the loop.
+ 
+    4) Determine which of the references we prefetch.  We take into account
+       that there is a maximum number of simultaneous prefetches (provided
+       by machine description).  We issue as many prefetches as possible
+       while still staying within this bound (starting with those with lowest
+       prefetch_mod, since they are responsible for most of the cache
+       misses).
+       
+    5) We unroll and peel loops so that we are able to satisfy PREFETCH_MOD
+       and PREFETCH_BEFORE requirements (within some bounds), and to avoid
+       prefetching nonaccessed memory.
+       TODO -- actually implement this.
+       
+    6) We actually emit the prefetch instructions.  ??? Perhaps emit the
+       prefetch instructions with guards in cases where 5) was not sufficient
+       to satisfy the constraints?
+ 
+    Some other TODO:
+       -- write and use more general reuse analysis (that could be also used
+ 	 in other cache aimed loop optimizations)
+       -- make it behave sanely together with the prefetches given by user
+ 	 (now we just ignore them; at the very least we should avoid
+ 	 optimizing loops in which the user put his own prefetches)
+       -- we assume cache line size alignment of arrays; this could be
+ 	 improved.  */
+ 
+ /* Magic constants follow.  These should be replaced by machine specific
+    numbers.  */
+ 
+ /* A number that should roughly correspond to the number of instructions
+    executed before the prefetch is completed.  */
+ 
+ #ifndef PREFETCH_LATENCY
+ #define PREFETCH_LATENCY 200
+ #endif
+ 
+ /* Number of prefetches that can run at the same time.  */
+ 
+ #ifndef SIMULTANEOUS_PREFETCHES
+ #define SIMULTANEOUS_PREFETCHES 3
+ #endif
+ 
+ /* True if write can be prefetched by a read prefetch.  */
+ 
+ #ifndef WRITE_CAN_USE_READ_PREFETCH
+ #define WRITE_CAN_USE_READ_PREFETCH 1
+ #endif
+ 
+ /* True if read can be prefetched by a write prefetch. */
+ 
+ #ifndef READ_CAN_USE_WRITE_PREFETCH
+ #define READ_CAN_USE_WRITE_PREFETCH 0
+ #endif
+ 
+ /* Cache line size.  Assumed to be a power of two.  */
+ 
+ #ifndef PREFETCH_BLOCK
+ #define PREFETCH_BLOCK 32
+ #endif
+ 
+ /* Do we have forward hardware sequential prefetching?  */
+ 
+ #ifndef HAVE_FORWARD_PREFETCH
+ #define HAVE_FORWARD_PREFETCH 0
+ #endif
+ 
+ /* Do we have backward hardware sequential prefetching?  */
+ 
+ #ifndef HAVE_BACKWARD_PREFETCH
+ #define HAVE_BACKWARD_PREFETCH 0
+ #endif
+ 
+ /* In some cases we are only able to determine that there is a certain
+    probability that the two accesses hit the same cache line.  In this
+    case, we issue the prefetches for both of them if this probability
+    is less than (1000 - ACCEPTABLE_MISS_RATE) per mille.  */
+ 
+ #ifndef ACCEPTABLE_MISS_RATE
+ #define ACCEPTABLE_MISS_RATE 50
+ #endif
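+ 
+ /* E.g. with the default value 50 we rely on the reuse (and omit the extra
+    prefetch) only when the probability that the accesses fall into different
+    cache lines is at most 50 per mille.  */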
+ 
+ #ifndef HAVE_prefetch
+ #define HAVE_prefetch 0
+ #endif
+ 
+ /* The group of references between which reuse may occur.  */
+ 
+ struct mem_ref_group
+ {
+   tree base;			/* Base of the reference.  */
+   HOST_WIDE_INT step;		/* Step of the reference.  */
+   tree group_iv;		/* Induction variable for the group.  */
+   bool issue_prefetch_p;	/* Is there any prefetch issued in the
+ 				   group?  */
+   struct mem_ref *refs;		/* References in the group.  */
+   struct mem_ref_group *next;	/* Next group of references.  */
+ };
+ 
+ /* Assigned to PREFETCH_BEFORE when all iterations are to be prefetched.  */
+ 
+ #define PREFETCH_ALL		(~(unsigned HOST_WIDE_INT) 0)
+ 
+ /* The memory reference.  */
+ 
+ struct mem_ref
+ {
+   HOST_WIDE_INT delta;		/* Constant offset of the reference.  */
+   bool write_p;			/* Is it a write?  */
+   struct mem_ref_group *group;	/* The group of references it belongs to.  */
+   unsigned HOST_WIDE_INT prefetch_mod;
+ 				/* Prefetch only each PREFETCH_MOD-th
+ 				   iteration.  */
+   unsigned HOST_WIDE_INT prefetch_before;
+ 				/* Prefetch only first PREFETCH_BEFORE
+ 				   iterations.  */
+   bool issue_prefetch_p;	/* Should we really issue the prefetch?  */
+   struct mem_ref *next;		/* The next reference in the group.  */
+ };
+ 
+ /* Dumps information about reference REF to FILE.  */
+ 
+ static void
+ dump_mem_ref (FILE *file, struct mem_ref *ref)
+ {
+   fprintf (file, "Reference %p:\n", (void *) ref);
+ 
+   fprintf (file, "  group %p (base ", (void *) ref->group);
+   print_generic_expr (file, ref->group->base, TDF_SLIM);
+   fprintf (file, ", step ");
+   fprintf (file, HOST_WIDE_INT_PRINT_DEC, ref->group->step);
+   fprintf (file, ")\n");
+ 
+   fprintf (dump_file, "  delta ");
+   fprintf (file, HOST_WIDE_INT_PRINT_DEC, ref->delta);
+   fprintf (file, "\n");
+ 
+   fprintf (file, "  %s\n", ref->write_p ? "write" : "read");
+ 
+   fprintf (file, "\n");
+ }
+ 
+ /* Finds a group with BASE and STEP in GROUPS, or creates one if it does not
+    exist.  */
+ 
+ static struct mem_ref_group *
+ find_or_create_group (struct mem_ref_group **groups, tree base,
+ 		      HOST_WIDE_INT step)
+ {
+   struct mem_ref_group *group;
+ 
+   for (; *groups; groups = &(*groups)->next)
+     {
+       if ((*groups)->step == step
+ 	  && operand_equal_p ((*groups)->base, base, 0))
+ 	return *groups;
+ 
+       /* Keep the list of groups sorted by decreasing step.  */
+       if ((*groups)->step < step)
+ 	break;
+     }
+ 
+   group = xcalloc (1, sizeof (struct mem_ref_group));
+   group->base = base;
+   group->step = step;
+   group->group_iv = NULL_TREE;
+   group->refs = NULL;
+   group->next = *groups;
+   *groups = group;
+ 
+   return group;
+ }
+ 
+ /* Records a memory reference in GROUP with offset DELTA and write status
+    WRITE_P.  */
+ 
+ static void
+ record_ref (struct mem_ref_group *group, HOST_WIDE_INT delta,
+ 	    bool write_p)
+ {
+   struct mem_ref **aref;
+ 
+   for (aref = &group->refs; *aref; aref = &(*aref)->next)
+     {
+       /* It need not be possible for a write reference to reuse a read
+ 	 prefetch, or vice versa.  */
+       if (!WRITE_CAN_USE_READ_PREFETCH
+ 	  && write_p
+ 	  && !(*aref)->write_p)
+ 	continue;
+       if (!READ_CAN_USE_WRITE_PREFETCH
+ 	  && !write_p
+ 	  && (*aref)->write_p)
+ 	continue;
+ 
+       if ((*aref)->delta == delta)
+ 	return;
+     }
+ 
+   (*aref) = xcalloc (1, sizeof (struct mem_ref));
+   (*aref)->delta = delta;
+   (*aref)->write_p = write_p;
+   (*aref)->prefetch_before = PREFETCH_ALL;
+   (*aref)->prefetch_mod = 1;
+   (*aref)->issue_prefetch_p = false;
+   (*aref)->group = group;
+   (*aref)->next = NULL;
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+     dump_mem_ref (dump_file, *aref);
+ }
+ 
+ /* Release memory references in GROUPS.  */
+ 
+ static void
+ release_mem_refs (struct mem_ref_group *groups)
+ {
+   struct mem_ref_group *next_g;
+   struct mem_ref *ref, *next_r;
+ 
+   for (; groups; groups = next_g)
+     {
+       next_g = groups->next;
+       for (ref = groups->refs; ref; ref = next_r)
+ 	{
+ 	  next_r = ref->next;
+ 	  free (ref);
+ 	}
+       free (groups);
+     }
+ }
+ 
+ /* A structure used to pass arguments to idx_analyze_ref.  */
+ 
+ struct ar_data
+ {
+   struct loop *loop;			/* Loop of the reference.  */
+   tree stmt;				/* Statement of the reference.  */
+   HOST_WIDE_INT *step;			/* Step of the memory reference.  */
+   HOST_WIDE_INT *delta;			/* Offset of the memory reference.  */
+ };
+ 
+ /* Analyzes a single INDEX of a memory reference to obtain information
+    described at analyze_ref.  Callback for for_each_index.  */
+ 
+ static bool
+ idx_analyze_ref (tree base, tree *index, void *data)
+ {
+   struct ar_data *ar_data = data;
+   tree ibase, step, stepsize;
+   HOST_WIDE_INT istep, idelta = 0, imult = 1;
+ 
+   if (TREE_CODE (base) == MISALIGNED_INDIRECT_REF
+       || TREE_CODE (base) == ALIGN_INDIRECT_REF)
+     return false;
+ 
+   if (!simple_iv (ar_data->loop, ar_data->stmt, *index, &ibase, &step, false))
+     return false;
+ 
+   if (zero_p (step))
+     istep = 0;
+   else
+     {
+       if (!cst_and_fits_in_hwi (step))
+ 	return false;
+       istep = int_cst_value (step);
+     }
+ 
+   if (TREE_CODE (ibase) == PLUS_EXPR
+       && cst_and_fits_in_hwi (TREE_OPERAND (ibase, 1)))
+     {
+       idelta = int_cst_value (TREE_OPERAND (ibase, 1));
+       ibase = TREE_OPERAND (ibase, 0);
+     }
+   if (cst_and_fits_in_hwi (ibase))
+     {
+       idelta += int_cst_value (ibase);
+       ibase = build_int_cst_type (TREE_TYPE (ibase), 0);
+     }
+ 
+   if (TREE_CODE (base) == ARRAY_REF)
+     {
+       stepsize = array_ref_element_size (base);
+       if (!cst_and_fits_in_hwi (stepsize))
+ 	return false;
+       imult = int_cst_value (stepsize);
+ 
+       istep *= imult;
+       idelta *= imult;
+     }
+ 
+   *ar_data->step += istep;
+   *ar_data->delta += idelta;
+   *index = ibase;
+ 
+   return true;
+ }
+ 
+ /* Tries to express REF in the shape &BASE + STEP * iter + DELTA, where DELTA
+    and STEP are integer constants and iter is the number of iterations of
+    LOOP.  The reference occurs in statement STMT.  */
+ 
+ static bool
+ analyze_ref (struct loop *loop, tree ref, tree *base,
+ 	     HOST_WIDE_INT *step, HOST_WIDE_INT *delta,
+ 	     tree stmt)
+ {
+   struct ar_data ar_data;
+   tree off;
+   HOST_WIDE_INT bit_offset;
+ 
+   *step = 0;
+   *delta = 0;
+ 
+   /* First strip off the component references.  Ignore bitfields.  */
+   if (TREE_CODE (ref) == COMPONENT_REF
+       && DECL_NONADDRESSABLE_P (TREE_OPERAND (ref, 1)))
+     ref = TREE_OPERAND (ref, 0);
+ 
+   for (; TREE_CODE (ref) == COMPONENT_REF; ref = TREE_OPERAND (ref, 0))
+     {
+       off = DECL_FIELD_BIT_OFFSET (TREE_OPERAND (ref, 1));
+       bit_offset = TREE_INT_CST_LOW (off);
+       gcc_assert (bit_offset % BITS_PER_UNIT == 0);
+       
+       *delta += bit_offset / BITS_PER_UNIT;
+     }
+ 
+   *base = unshare_expr (ref);
+   ar_data.loop = loop;
+   ar_data.stmt = stmt;
+   ar_data.step = step;
+   ar_data.delta = delta;
+   return for_each_index (base, idx_analyze_ref, &ar_data);
+ }
+ 
+ /* Record a memory reference REF to the list REFS.  The reference occurs in
+    LOOP in statement STMT, and it is a write if WRITE_P is true.  */
+ 
+ static void
+ gather_memory_references_ref (struct loop *loop, struct mem_ref_group **refs,
+ 			      tree ref, bool write_p, tree stmt)
+ {
+   tree base;
+   HOST_WIDE_INT step, delta;
+   struct mem_ref_group *agrp;
+ 
+   if (!analyze_ref (loop, ref, &base, &step, &delta, stmt))
+     return;
+ 
+   /* Now we know that REF = &BASE + STEP * iter + DELTA, where DELTA and STEP
+      are integer constants.  */
+   agrp = find_or_create_group (refs, base, step);
+   record_ref (agrp, delta, write_p);
+ }
+ 
+ /* Record the suitable memory references in LOOP.  */
+ 
+ static struct mem_ref_group *
+ gather_memory_references (struct loop *loop)
+ {
+   basic_block *body = get_loop_body_in_dom_order (loop);
+   basic_block bb;
+   unsigned i;
+   block_stmt_iterator bsi;
+   tree stmt, lhs, rhs;
+   struct mem_ref_group *refs = NULL;
+ 
+   /* Scan the loop body in order, so that the former references precede the
+      later ones.  */
+   for (i = 0; i < loop->num_nodes; i++)
+     {
+       bb = body[i];
+       if (bb->loop_father != loop)
+ 	continue;
+ 
+       for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi))
+ 	{
+ 	  stmt = bsi_stmt (bsi);
+ 	  if (TREE_CODE (stmt) != MODIFY_EXPR)
+ 	    continue;
+ 
+ 	  lhs = TREE_OPERAND (stmt, 0);
+ 	  rhs = TREE_OPERAND (stmt, 1);
+ 
+ 	  if (REFERENCE_CLASS_P (rhs))
+ 	    gather_memory_references_ref (loop, &refs, rhs, false, stmt);
+ 	  if (REFERENCE_CLASS_P (lhs))
+ 	    gather_memory_references_ref (loop, &refs, lhs, true, stmt);
+ 	}
+     }
+   free (body);
+ 
+   return refs;
+ }
+ 
+ /* Prune the prefetch candidate REF using the self-reuse.  */
+ 
+ static void
+ prune_ref_by_self_reuse (struct mem_ref *ref)
+ {
+   HOST_WIDE_INT step = ref->group->step;
+   bool backward = step < 0;
+ 
+   if (step == 0)
+     {
+       /* Prefetch references to invariant address just once.  */
+       ref->prefetch_before = 1;
+       return;
+     }
+ 
+   if (backward)
+     step = -step;
+ 
+   if (step > PREFETCH_BLOCK)
+     return;
+ 
+   if ((backward && HAVE_BACKWARD_PREFETCH)
+       || (!backward && HAVE_FORWARD_PREFETCH))
+     {
+       ref->prefetch_before = 1;
+       return;
+     }
+ 
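+   /* E.g. with PREFETCH_BLOCK 32 and a step of 4 bytes, a new cache line is
+      entered only once per 8 iterations, so prefetching in every 8th
+      iteration (prefetch_mod = 8) is enough.  */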
+   ref->prefetch_mod = PREFETCH_BLOCK / step;
+ }
+ 
+ /* Divides X by BY, rounding down.  */
+ 
+ static HOST_WIDE_INT
+ ddown (HOST_WIDE_INT x, unsigned HOST_WIDE_INT by)
+ {
+   gcc_assert (by > 0);
+ 
+   if (x >= 0)
+     return x / by;
+   else
+     return (x + by - 1) / by;
+ }
+ 
+ /* Prune the prefetch candidate REF using the reuse with BY.
+    If BY_IS_BEFORE is true, BY is before REF in the loop.  */
+ 
+ static void
+ prune_ref_by_group_reuse (struct mem_ref *ref, struct mem_ref *by,
+ 			  bool by_is_before)
+ {
+   HOST_WIDE_INT step = ref->group->step;
+   bool backward = step < 0;
+   HOST_WIDE_INT delta_r = ref->delta, delta_b = by->delta;
+   HOST_WIDE_INT delta = delta_b - delta_r;
+   HOST_WIDE_INT hit_from;
+   unsigned HOST_WIDE_INT prefetch_before, prefetch_block;
+ 
+   if (delta == 0)
+     {
+       /* If the references have the same address, only prefetch the
+ 	 former.  */
+       if (by_is_before)
+ 	ref->prefetch_before = 0;
+       
+       return;
+     }
+ 
+   if (!step)
+     {
+       /* If the reference addresses are invariant and fall into the
+ 	 same cache line, prefetch just the first one.  */
+       if (!by_is_before)
+ 	return;
+ 
+       if (ddown (ref->delta, PREFETCH_BLOCK)
+ 	  != ddown (by->delta, PREFETCH_BLOCK))
+ 	return;
+ 
+       ref->prefetch_before = 0;
+       return;
+     }
+ 
+   /* Only prune the reference that is behind in the array.  */
+   if (backward)
+     {
+       if (delta > 0)
+ 	return;
+ 
+       /* Transform the data so that we may assume that the accesses
+ 	 are forward.  */
+       delta = - delta;
+       step = -step;
+       delta_r = PREFETCH_BLOCK - 1 - delta_r;
+       delta_b = PREFETCH_BLOCK - 1 - delta_b;
+     }
+   else
+     {
+       if (delta < 0)
+ 	return;
+     }
+ 
+   /* Check whether the two references are likely to hit the same cache
+      line, and how distant the iterations in which it occurs are from
+      each other.  */
+ 
+   if (step <= PREFETCH_BLOCK)
+     {
+       /* The accesses are sure to meet.  Let us check when.  */
+       hit_from = ddown (delta_b, PREFETCH_BLOCK) * PREFETCH_BLOCK;
+       prefetch_before = (hit_from - delta_r + step - 1) / step;
+ 
+       if (prefetch_before < ref->prefetch_before)
+ 	ref->prefetch_before = prefetch_before;
+ 
+       return;
+     }
+ 
+   /* A more complicated case.  First let us ensure that the size of the
+      cache line and the step are coprime (here we assume that PREFETCH_BLOCK
+      is a power of two).  */
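+ 
+   /* For example, with PREFETCH_BLOCK 32, step 48 and delta 80, the loop
+      below halves all three values until step becomes odd, leaving step 3,
+      prefetch_block 2 and delta 5.  */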
+   prefetch_block = PREFETCH_BLOCK;
+   while ((step & 1) == 0
+ 	 && prefetch_block > 1)
+     {
+       step >>= 1;
+       prefetch_block >>= 1;
+       delta >>= 1;
+     }
+ 
+   /* Now step > prefetch_block, and step and prefetch_block are coprime.
+      Determine the probability that the accesses hit the same cache line.  */
+ 
+   prefetch_before = delta / step;
+   delta %= step;
+   if ((unsigned HOST_WIDE_INT) delta
+       <= (prefetch_block * ACCEPTABLE_MISS_RATE / 1000))
+     {
+       if (prefetch_before < ref->prefetch_before)
+ 	ref->prefetch_before = prefetch_before;
+ 
+       return;
+     }
+ 
+   /* Try also the following iteration.  */
+   prefetch_before++;
+   delta = step - delta;
+   if ((unsigned HOST_WIDE_INT) delta
+       <= (prefetch_block * ACCEPTABLE_MISS_RATE / 1000))
+     {
+       if (prefetch_before < ref->prefetch_before)
+ 	ref->prefetch_before = prefetch_before;
+ 
+       return;
+     }
+ 
+   /* REF probably does not reuse BY.  */
+   return;
+ }
+ 
+ /* Prune the prefetch candidate REF using the reuses with other references
+    in REFS.  */
+ 
+ static void
+ prune_ref_by_reuse (struct mem_ref *ref, struct mem_ref *refs)
+ {
+   struct mem_ref *prune_by;
+   bool before = true;
+ 
+   prune_ref_by_self_reuse (ref);
+ 
+   for (prune_by = refs; prune_by; prune_by = prune_by->next)
+     {
+       if (prune_by == ref)
+ 	{
+ 	  before = false;
+ 	  continue;
+ 	}
+ 
+       if (!WRITE_CAN_USE_READ_PREFETCH
+ 	  && ref->write_p
+ 	  && !prune_by->write_p)
+ 	continue;
+       if (!READ_CAN_USE_WRITE_PREFETCH
+ 	  && !ref->write_p
+ 	  && prune_by->write_p)
+ 	continue;
+ 
+       prune_ref_by_group_reuse (ref, prune_by, before);
+     }
+ }
+ 
+ /* Prune the prefetch candidates in GROUP using the reuse analysis.  */
+ 
+ static void
+ prune_group_by_reuse (struct mem_ref_group *group)
+ {
+   struct mem_ref *ref_pruned;
+ 
+   for (ref_pruned = group->refs; ref_pruned; ref_pruned = ref_pruned->next)
+     {
+       prune_ref_by_reuse (ref_pruned, group->refs);
+ 
+       if (dump_file && (dump_flags & TDF_DETAILS))
+ 	{
+ 	  fprintf (dump_file, "Reference %p:", (void *) ref_pruned);
+ 
+ 	  if (ref_pruned->prefetch_before == PREFETCH_ALL
+ 	      && ref_pruned->prefetch_mod == 1)
+ 	    fprintf (dump_file, " no restrictions");
+ 	  else if (ref_pruned->prefetch_before == 0)
+ 	    fprintf (dump_file, " do not prefetch");
+ 	  else if (ref_pruned->prefetch_before <= ref_pruned->prefetch_mod)
+ 	    fprintf (dump_file, " prefetch once");
+ 	  else
+ 	    {
+ 	      if (ref_pruned->prefetch_before != PREFETCH_ALL)
+ 		{
+ 		  fprintf (dump_file, " prefetch before ");
+ 		  fprintf (dump_file, HOST_WIDE_INT_PRINT_DEC,
+ 			   ref_pruned->prefetch_before);
+ 		}
+ 	      if (ref_pruned->prefetch_mod != 1)
+ 		{
+ 		  fprintf (dump_file, " prefetch mod ");
+ 		  fprintf (dump_file, HOST_WIDE_INT_PRINT_DEC,
+ 			   ref_pruned->prefetch_mod);
+ 		}
+ 	    }
+ 	  fprintf (dump_file, "\n");
+ 	}
+     }
+ }
+ 
+ /* Prune the list of prefetch candidates GROUPS using the reuse analysis.  */
+ 
+ static void
+ prune_by_reuse (struct mem_ref_group *groups)
+ {
+   for (; groups; groups = groups->next)
+     prune_group_by_reuse (groups);
+ }
+ 
+ /* Returns true if we should issue prefetch for REF.  */
+ 
+ static bool
+ should_issue_prefetch_p (struct mem_ref *ref)
+ {
+   /* For now, do not issue prefetches for references that only need to be
+      prefetched in the first few iterations.  */
+   if (ref->prefetch_before != PREFETCH_ALL)
+     return false;
+ 
+   return true;
+ }
+ 
+ /* Decide which of the prefetch candidates in GROUPS to prefetch.
+    AHEAD is the number of iterations to prefetch ahead (which corresponds
+    to the number of simultaneous instances of one prefetch running at a
+    time).  UNROLL_FACTOR is the factor by which the loop is going to be
+    unrolled.  Returns true if there is anything to prefetch.  */
+ 
+ static bool
+ schedule_prefetches (struct mem_ref_group *groups, unsigned unroll_factor,
+ 		     unsigned ahead)
+ {
+   unsigned max_prefetches, n_prefetches;
+   struct mem_ref *ref;
+   bool any = false;
+ 
+   max_prefetches = (SIMULTANEOUS_PREFETCHES * unroll_factor) / ahead;
+   if (max_prefetches > (unsigned) SIMULTANEOUS_PREFETCHES)
+     max_prefetches = SIMULTANEOUS_PREFETCHES;
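+ 
+   /* E.g. with SIMULTANEOUS_PREFETCHES 3, unroll_factor 4 and ahead 2, the
+      division gives 6, which the bound above brings back down to 3.  */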
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+     fprintf (dump_file, "Max prefetches to issue: %d.\n", max_prefetches);
+ 
+   if (!max_prefetches)
+     return false;
+ 
+   /* For now we just take memory references one by one and issue
+      prefetches for as many as possible.  The groups are sorted
+      starting with the largest step, since the references with
+      large step are more likely to cause many cache misses.  */
+ 
+   for (; groups; groups = groups->next)
+     for (ref = groups->refs; ref; ref = ref->next)
+       {
+ 	if (!should_issue_prefetch_p (ref))
+ 	  continue;
+ 
+ 	ref->issue_prefetch_p = true;
+ 	groups->issue_prefetch_p = true;
+ 
+ 	/* If prefetch_mod is less than unroll_factor, we need to insert
+ 	   several prefetches for the reference.  */
+ 	n_prefetches = ((unroll_factor + ref->prefetch_mod - 1)
+ 			/ ref->prefetch_mod);
+ 	if (max_prefetches <= n_prefetches)
+ 	  return true;
+ 
+ 	max_prefetches -= n_prefetches;
+ 	any = true;
+       }
+ 
+   return any;
+ }
+ 
+ /* Determine whether there is any reference suitable for prefetching
+    in GROUPS.  */
+ 
+ static bool
+ anything_to_prefetch_p (struct mem_ref_group *groups)
+ {
+   struct mem_ref *ref;
+ 
+   for (; groups; groups = groups->next)
+     for (ref = groups->refs; ref; ref = ref->next)
+       if (should_issue_prefetch_p (ref))
+ 	return true;
+ 
+   return false;
+ }
+ 
+ /* Issue prefetches for the reference REF into LOOP as decided before.
+    AHEAD is the number of iterations to prefetch ahead.  UNROLL_FACTOR
+    is the factor by which LOOP was unrolled.  */
+ 
+ static void
+ issue_prefetch_ref (struct loop *loop, struct mem_ref *ref,
+ 		    unsigned unroll_factor, unsigned ahead)
+ {
+   HOST_WIDE_INT delta;
+   tree addr, stmts, prefetch, params, write_p;
+   block_stmt_iterator bsi;
+   unsigned n_prefetches, ap;
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+     fprintf (dump_file, "Issued prefetch for %p.\n", (void *) ref);
+ 
+   /* We put the prefetch to the loop header, so that it runs early.  */
+   bsi = bsi_after_labels (loop->header);
+ 
+   n_prefetches = ((unroll_factor + ref->prefetch_mod - 1)
+ 		  / ref->prefetch_mod);
+   for (ap = 0; ap < n_prefetches; ap++)
+     {
+       /* Determine the address to prefetch.  */
+       addr = ref->group->group_iv;
+ 
+       delta = (ahead + ap * ref->prefetch_mod) * ref->group->step + ref->delta;
+       if (delta)
+ 	{
+ 	  addr = build (PLUS_EXPR, ptr_type_node,
+ 			addr, build_int_cst (ptr_type_node, delta));
+ 	  addr = force_gimple_operand (addr, &stmts, true,
+ 				       SSA_NAME_VAR (ref->group->group_iv));
+ 	  if (stmts)
+ 	    bsi_insert_after (&bsi, stmts, BSI_CONTINUE_LINKING);
+ 	}
+ 
+       /* Create the prefetch instruction.  */
+       write_p = ref->write_p ? integer_one_node : integer_zero_node;
+       params = tree_cons (NULL_TREE, addr,
+ 			  tree_cons (NULL_TREE, write_p, NULL_TREE));
+ 				 
+       prefetch = build_function_call_expr (built_in_decls[BUILT_IN_PREFETCH],
+ 					   params);
+       bsi_insert_after (&bsi, prefetch, BSI_NEW_STMT);
+     }
+ }
+ 
+ /* Issue prefetches for the references in GROUPS into LOOP as decided before.
+    AHEAD is the number of iterations to prefetch ahead.  UNROLL_FACTOR is the
+    factor by which LOOP was unrolled.  */
+ 
+ static void
+ issue_prefetches (struct loop *loop, struct mem_ref_group *groups,
+ 		  unsigned unroll_factor, unsigned ahead)
+ {
+   struct mem_ref *ref;
+   tree iv_var, base, step;
+   block_stmt_iterator bsi;
+   bool after;
+ 
+   for (; groups; groups = groups->next)
+     {
+       if (!groups->issue_prefetch_p)
+ 	continue;
+ 
+       /* Create the induction variable for the group.  */
+       iv_var = create_tmp_var (ptr_type_node, "prefetchtmp");
+       add_referenced_tmp_var (iv_var);
+ 
+       base = fold_convert (ptr_type_node,
+ 			   build_fold_addr_expr (unshare_expr (groups->base)));
+       step = build_int_cst (ptr_type_node, groups->step * unroll_factor);
+ 
+       standard_iv_increment_position (loop, &bsi, &after);
+       create_iv (base, step, iv_var, loop, &bsi, after, &groups->group_iv,
+ 		 NULL);
+ 
+       for (ref = groups->refs; ref; ref = ref->next)
+ 	if (ref->issue_prefetch_p)
+ 	  issue_prefetch_ref (loop, ref, unroll_factor, ahead);
+     }
+ }
+ 
+ /* Determines whether we can profitably unroll LOOP, and if this is the
+    case, fills in DESC with the description of the number of iterations.  */
+ 
+ static bool
+ can_unroll_loop_p (struct loop *loop, struct tree_niter_desc *desc)
+ {
+   edge exit;
+ 
+   /* Only consider innermost loops for unrolling.  */
+   if (loop->inner)
+     return false;
+ 
+   /* We only consider loops without control flow for unrolling.  This is not
+      a hard restriction -- tree_unroll_loop works with arbitrary loops
+      as well; but the unrolling/prefetching is usually more profitable for
+      loops consisting of a single basic block, and we want to limit the
+      code growth.  */
+   if (loop->num_nodes > 2)
+     return false;
+ 
+   /* We only want to unroll loops for which we are able to determine the number
+      of iterations.  We also want to split the extra iterations of the loop
+      from its end, therefore we require that the loop has precisely one
+      exit.  */
+ 
+   exit = single_dom_exit (loop);
+   if (!exit)
+     return false;
+ 
+   if (!number_of_iterations_exit (loop, exit, desc))
+     return false;
+ 
+   /* And of course, we must be able to duplicate the loop.  */
+   if (!can_duplicate_loop_p (loop))
+     return false;
+ 
+   return true;
+ }
+ 
+ /* Determine the factor by which to unroll LOOP, from the information
+    contained in the list of memory references REFS.  The description of the
+    number of iterations of LOOP is stored to DESC.  AHEAD is the number
+    of iterations ahead that we need to prefetch.  NINSNS is the number of
+    insns of the LOOP.  */
+ 
+ static unsigned
+ determine_unroll_factor (struct loop *loop, struct mem_ref_group *refs,
+ 			 unsigned ahead, unsigned ninsns,
+ 			 struct tree_niter_desc *desc)
+ {
+   unsigned upper_bound, size_factor, constraint_factor;
+   unsigned factor, max_mod_constraint, ahead_factor;
+   struct mem_ref_group *agp;
+   struct mem_ref *ref;
+ 
+   if (!can_unroll_loop_p (loop, desc))
+     return 1;
+ 
+   upper_bound = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+ 
+   /* First check whether the loop is not too large to unroll.  */
+   size_factor = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / ninsns;
+   if (size_factor <= 1)
+     return 1;
+ 
+   if (size_factor < upper_bound)
+     upper_bound = size_factor;
+ 
+   max_mod_constraint = 1;
+   for (agp = refs; agp; agp = agp->next)
+     for (ref = agp->refs; ref; ref = ref->next)
+       if (should_issue_prefetch_p (ref)
+ 	  && ref->prefetch_mod > max_mod_constraint)
+ 	max_mod_constraint = ref->prefetch_mod;
+ 
+   /* Set constraint_factor as large as needed to be able to satisfy the
+      largest modulo constraint.  */
+   constraint_factor = max_mod_constraint;
+ 
+   /* If ahead is too large in comparison with the number of available
+      prefetches, unroll the loop as much as needed to be able to prefetch
+      at least partially some of the references in the loop.  */
+   ahead_factor = ((ahead + SIMULTANEOUS_PREFETCHES - 1)
+ 		  / SIMULTANEOUS_PREFETCHES);
+ 
+   /* Unroll as much as useful, but bound the code size growth.  */
+   if (constraint_factor < ahead_factor)
+     factor = ahead_factor;
+   else
+     factor = constraint_factor;
+   if (factor > upper_bound)
+     factor = upper_bound;
+ 
+   return factor;
+ }
+ 
+ /* Issue prefetch instructions for array references in LOOP.  Returns
+    true if the LOOP was unrolled.  LOOPS is the array containing all
+    loops.  */
+ 
+ static bool
+ loop_prefetch_arrays (struct loops *loops, struct loop *loop)
+ {
+   struct mem_ref_group *refs;
+   unsigned ahead, ninsns, unroll_factor;
+   struct tree_niter_desc desc;
+   bool unrolled = false;
+ 
+   /* Step 1: gather the memory references.  */
+   refs = gather_memory_references (loop);
+ 
+   /* Step 2: estimate the reuse effects.  */
+   prune_by_reuse (refs);
+ 
+   if (!anything_to_prefetch_p (refs))
+     goto fail;
+ 
+   /* Step 3: determine the ahead and unroll factor.  */
+ 
+   /* FIXME: We should not use the size of the loop, but the average number of
+      instructions executed per iteration of the loop.  */
+   ninsns = tree_num_loop_insns (loop);
+   ahead = (PREFETCH_LATENCY + ninsns - 1) / ninsns;
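+   /* E.g. with PREFETCH_LATENCY 200 and a loop body of 60 insns, the data
+      is prefetched ahead = (200 + 59) / 60 = 4 iterations in advance.  */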
+   unroll_factor = determine_unroll_factor (loop, refs, ahead, ninsns,
+ 					   &desc);
+   if (dump_file && (dump_flags & TDF_DETAILS))
+     fprintf (dump_file, "Ahead %d, unroll factor %d\n", ahead, unroll_factor);
+ 
+   /* If the loop rolls less than the required unroll factor, prefetching
+      is useless.  */
+   if (unroll_factor > 1
+       && cst_and_fits_in_hwi (desc.niter)
+       && (unsigned HOST_WIDE_INT) int_cst_value (desc.niter) < unroll_factor)
+     goto fail;
+ 
+   /* Step 4: what to prefetch?  */
+   if (!schedule_prefetches (refs, unroll_factor, ahead))
+     goto fail;
+ 
+   /* Step 5: unroll the loop.  TODO -- peeling of first and last few
+      iterations so that we do not issue superfluous prefetches.  */
+   if (unroll_factor != 1)
+     {
+       tree_unroll_loop (loops, loop, unroll_factor,
+ 			single_dom_exit (loop), &desc);
+       unrolled = true;
+     }
+ 
+   /* Step 6: issue the prefetches.  */
+   issue_prefetches (loop, refs, unroll_factor, ahead);
+ 
+ fail:
+   release_mem_refs (refs);
+   return unrolled;
+ }
+ 
+ /* Issue prefetch instructions for array references in LOOPS.  */
+ 
+ void
+ tree_ssa_prefetch_arrays (struct loops *loops)
+ {
+   unsigned i;
+   struct loop *loop;
+   bool unrolled = false;
+ 
+   if (!HAVE_prefetch
+       /* It is possible to ask the compiler for, say, -mtune=i486 -march=pentium4.
+ 	 -mtune=i486 causes us to have PREFETCH_BLOCK 0, since this is part
+ 	 of processor costs and i486 does not have prefetch, but
+ 	 -march=pentium4 causes HAVE_prefetch to be true.  Ugh.  */
+       || PREFETCH_BLOCK == 0)
+     return;
+ 
+   if (!built_in_decls[BUILT_IN_PREFETCH])
+     {
+       tree type = build_function_type (void_type_node,
+ 				       tree_cons (NULL_TREE,
+ 						  const_ptr_type_node,
+ 						  NULL_TREE));
+       tree decl = lang_hooks.builtin_function ("__builtin_prefetch", type,
+ 			BUILT_IN_PREFETCH, BUILT_IN_NORMAL,
+ 			NULL, NULL_TREE);
+       DECL_IS_NOVOPS (decl) = true;
+       built_in_decls[BUILT_IN_PREFETCH] = decl;
+     }
+ 
+   /* We assume that size of cache line is a power of two, so verify this
+      here.  */
+   gcc_assert ((PREFETCH_BLOCK & (PREFETCH_BLOCK - 1)) == 0);
+ 
+   for (i = loops->num - 1; i > 0; i--)
+     {
+       loop = loops->parray[i];
+ 
+       if (dump_file && (dump_flags & TDF_DETAILS))
+ 	fprintf (dump_file, "Processing loop %d:\n", loop->num);
+ 
+       if (loop)
+ 	unrolled |= loop_prefetch_arrays (loops, loop);
+ 
+       if (dump_file && (dump_flags & TDF_DETAILS))
+ 	fprintf (dump_file, "\n\n");
+     }
+ 
+   if (unrolled)
+     {
+       scev_reset ();
+       cleanup_tree_cfg_loop ();
+     }
+ }
Index: tree-ssa-loop-unswitch.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-unswitch.c,v
retrieving revision 2.11
diff -c -3 -p -r2.11 tree-ssa-loop-unswitch.c
*** tree-ssa-loop-unswitch.c	3 May 2005 12:19:46 -0000	2.11
--- tree-ssa-loop-unswitch.c	12 Jun 2005 20:44:03 -0000
*************** Software Foundation, 59 Temple Place - S
*** 74,80 ****
     shape.  */
  
  static struct loop *tree_unswitch_loop (struct loops *, struct loop *, basic_block,
! 				   tree);
  static bool tree_unswitch_single_loop (struct loops *, struct loop *, int);
  static tree tree_may_unswitch_on (basic_block, struct loop *);
  
--- 74,80 ----
     shape.  */
  
  static struct loop *tree_unswitch_loop (struct loops *, struct loop *, basic_block,
! 					tree);
  static bool tree_unswitch_single_loop (struct loops *, struct loop *, int);
  static tree tree_may_unswitch_on (basic_block, struct loop *);
  
*************** tree_unswitch_loop (struct loops *loops,
*** 273,284 ****
  		    basic_block unswitch_on, tree cond)
  {
    basic_block condition_bb;
  
    /* Some sanity checking.  */
    gcc_assert (flow_bb_inside_loop_p (loop, unswitch_on));
    gcc_assert (EDGE_COUNT (unswitch_on->succs) == 2);
    gcc_assert (loop->inner == NULL);
  
    return loop_version (loops, loop, unshare_expr (cond), 
! 		       &condition_bb);
  }
--- 273,289 ----
  		    basic_block unswitch_on, tree cond)
  {
    basic_block condition_bb;
+   unsigned prob_true;
+   edge edge_true, edge_false;
  
    /* Some sanity checking.  */
    gcc_assert (flow_bb_inside_loop_p (loop, unswitch_on));
    gcc_assert (EDGE_COUNT (unswitch_on->succs) == 2);
    gcc_assert (loop->inner == NULL);
  
+   extract_true_false_edges_from_block (unswitch_on, &edge_true, &edge_false);
+   prob_true = edge_true->probability;
    return loop_version (loops, loop, unshare_expr (cond), 
! 		       &condition_bb, prob_true, prob_true,
! 		       REG_BR_PROB_BASE - prob_true, false);
  }
Index: tree-ssa-loop.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop.c,v
retrieving revision 2.30
diff -c -3 -p -r2.30 tree-ssa-loop.c
*** tree-ssa-loop.c	17 May 2005 20:28:30 -0000	2.30
--- tree-ssa-loop.c	12 Jun 2005 20:44:03 -0000
*************** struct tree_opt_pass pass_complete_unrol
*** 377,382 ****
--- 377,416 ----
    0					/* letter */
  };
  
+ /* Prefetching.  */
+ 
+ static void
+ tree_ssa_loop_prefetch (void)
+ {
+   if (!current_loops)
+     return;
+ 
+   tree_ssa_prefetch_arrays (current_loops);
+ }
+ 
+ static bool
+ gate_tree_ssa_loop_prefetch (void)
+ {
+   return flag_prefetch_loop_arrays == 1;
+ }
+ 
+ struct tree_opt_pass pass_loop_prefetch =
+ {
+   "prefetch",				/* name */
+   gate_tree_ssa_loop_prefetch,		/* gate */
+   tree_ssa_loop_prefetch,	       	/* execute */
+   NULL,					/* sub */
+   NULL,					/* next */
+   0,					/* static_pass_number */
+   TV_TREE_PREFETCH,	  		/* tv_id */
+   PROP_cfg | PROP_ssa,			/* properties_required */
+   0,					/* properties_provided */
+   0,					/* properties_destroyed */
+   0,					/* todo_flags_start */
+   TODO_dump_func | TODO_verify_loops,	/* todo_flags_finish */
+   0					/* letter */
+ };
+ 
  /* Induction variable optimizations.  */
  
  static void
Index: tree-ssa-threadupdate.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-threadupdate.c,v
retrieving revision 2.30
diff -c -3 -p -r2.30 tree-ssa-threadupdate.c
*** tree-ssa-threadupdate.c	7 Jun 2005 21:01:14 -0000	2.30
--- tree-ssa-threadupdate.c	12 Jun 2005 20:44:03 -0000
*************** create_block_for_threading (basic_block 
*** 199,205 ****
  {
    /* We can use the generic block duplication code and simply remove
       the stuff we do not need.  */
!   rd->dup_block = duplicate_block (bb, NULL);
  
    /* Zero out the profile, since the block is unreachable for now.  */
    rd->dup_block->frequency = 0;
--- 199,205 ----
  {
    /* We can use the generic block duplication code and simply remove
       the stuff we do not need.  */
!   rd->dup_block = duplicate_block (bb, NULL, NULL);
  
    /* Zero out the profile, since the block is unreachable for now.  */
    rd->dup_block->frequency = 0;
Index: tree-vectorizer.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-vectorizer.c,v
retrieving revision 2.96
diff -c -3 -p -r2.96 tree-vectorizer.c
*** tree-vectorizer.c	12 Jun 2005 14:02:59 -0000	2.96
--- tree-vectorizer.c	12 Jun 2005 20:44:03 -0000
*************** slpeel_tree_duplicate_loop_to_edge_cfg (
*** 846,852 ****
    new_bbs = xmalloc (sizeof (basic_block) * loop->num_nodes);
  
    copy_bbs (bbs, loop->num_nodes, new_bbs,
! 	    &loop->single_exit, 1, &new_loop->single_exit, NULL);
  
    /* Duplicating phi args at exit bbs as coming 
       also from exit of duplicated loop.  */
--- 846,853 ----
    new_bbs = xmalloc (sizeof (basic_block) * loop->num_nodes);
  
    copy_bbs (bbs, loop->num_nodes, new_bbs,
! 	    &loop->single_exit, 1, &new_loop->single_exit, NULL,
! 	    e->src);
  
    /* Duplicating phi args at exit bbs as coming 
       also from exit of duplicated loop.  */
Index: tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.h,v
retrieving revision 1.736
diff -c -3 -p -r1.736 tree.h
*** tree.h	9 Jun 2005 07:43:40 -0000	1.736
--- tree.h	12 Jun 2005 20:44:04 -0000
*************** extern int integer_pow2p (tree);
*** 3345,3350 ****
--- 3345,3351 ----
  extern int integer_nonzerop (tree);
  
  extern bool zero_p (tree);
+ extern bool nonzero_p (tree);
  extern bool cst_and_fits_in_hwi (tree);
  extern tree num_ending_zeros (tree);
  
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.634
diff -c -3 -p -r1.634 invoke.texi
*** doc/invoke.texi	10 Jun 2005 01:16:52 -0000	1.634
--- doc/invoke.texi	12 Jun 2005 20:44:05 -0000
*************** Objective-C and Objective-C++ Dialects}.
*** 314,320 ****
  -funsafe-math-optimizations  -ffinite-math-only @gol
  -fno-trapping-math  -fno-zero-initialized-in-bss @gol
  -fomit-frame-pointer  -foptimize-register-move @gol
! -foptimize-sibling-calls  -fprefetch-loop-arrays @gol
  -fprofile-generate -fprofile-use @gol
  -fregmove  -frename-registers @gol
  -freorder-blocks  -freorder-blocks-and-partition -freorder-functions @gol
--- 314,320 ----
  -funsafe-math-optimizations  -ffinite-math-only @gol
  -fno-trapping-math  -fno-zero-initialized-in-bss @gol
  -fomit-frame-pointer  -foptimize-register-move @gol
! -foptimize-sibling-calls  -fprefetch-loop-arrays -fprefetch-loop-arrays-rtl @gol
  -fprofile-generate -fprofile-use @gol
  -fregmove  -frename-registers @gol
  -freorder-blocks  -freorder-blocks-and-partition -freorder-functions @gol
*************** With this option, the compiler will crea
*** 5024,5030 ****
--- 5024,5032 ----
  local variables when unrolling a loop which can result in superior code.
  
  @item -fprefetch-loop-arrays
+ @itemx -fprefetch-loop-arrays-rtl
  @opindex fprefetch-loop-arrays
+ @opindex fprefetch-loop-arrays-rtl
  If supported by the target machine, generate instructions to prefetch
  memory to improve the performance of loops that access large arrays.
  
*************** Move branches with loop invariant condit
*** 5548,5554 ****
--- 5550,5558 ----
  of the loop on both branches (modified according to result of the condition).
  
  @item -fprefetch-loop-arrays
+ @itemx -fprefetch-loop-arrays-rtl
  @opindex fprefetch-loop-arrays
+ @opindex fprefetch-loop-arrays-rtl
  If supported by the target machine, generate instructions to prefetch
  memory to improve the performance of loops that access large arrays.
  