This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[patch] Splitting induction variables in unroller


Hello,

this patch makes it possible to express the induction variables
in later iterations of the unrolled loop in terms of their value in the
first iteration, e.g. it replaces

   i = i + 1;
   ...
   i = i + 1;
   ...
   i = i + 1;
   ...

   type chains by

   i0 = i + 1
   ...
   i = i0 + 1
   ...
   i = i0 + 2
   ...
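
To make the difference concrete, here is a minimal C sketch of a loop
unrolled three times, once in the chain form and once in the split form.
The names sum_chain and sum_split and the summing loop body are made up
for illustration; the patch itself works on RTL, not on C source, and n
is assumed to be a multiple of 3.

```c
#include <assert.h>

/* Chain form: each increment of i depends on the previous one, so the
   three copies of the body form one long dependency chain.
   Assumes n is a multiple of 3 (illustrative restriction).  */
static long sum_chain(const int *a, int n)
{
    long s = 0;
    int i = -1;

    while (i + 1 < n) {
        i = i + 1;
        s += a[i];
        i = i + 1;
        s += a[i];
        i = i + 1;
        s += a[i];
    }
    return s;
}

/* Split form: every copy computes i directly from the base value i0
   taken in the first copy, so the copies no longer depend on each
   other through i.  Same restriction on n.  */
static long sum_split(const int *a, int n)
{
    long s = 0;
    int i = -1;

    while (i + 1 < n) {
        int i0 = i + 1;   /* base variable, initialized in the first copy */

        i = i0;
        s += a[i];
        i = i0 + 1;
        s += a[i];
        i = i0 + 2;
        s += a[i];
    }
    return s;
}
```

Both functions visit the same elements in the same order; only the way
i is computed in the second and third copy differs.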

Note that we do not perform live range splitting; this is left to the
webizer.  The reason we do not do it here is that I did not want to
clutter the code with yet another copy propagation pass, and the attempt
to persuade the existing copy propagation pass to do it (by inserting
i1 = i0 + 1; i = i1;) failed because cse undoes the copy propagation.
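
As a rough picture of what is left to the webizer, the sketch below shows
the split form after each independent live range of i has been given its
own variable.  This is an illustration in C under my own naming (i_a,
i_b, sum_split_webbed), not output of the web pass; n is again assumed
to be a multiple of 3.

```c
#include <assert.h>

/* Split form after live range splitting (illustrative): the values of i
   used by the first and second copy get their own variables, while the
   last definition keeps the name i because it is the one that reaches
   the next trip through the loop.  */
static long sum_split_webbed(const int *a, int n)
{
    long s = 0;
    int i = -1;

    while (i + 1 < n) {
        int i0 = i + 1;     /* base variable */
        int i_a = i0;       /* live range of the first copy */
        int i_b = i0 + 1;   /* live range of the second copy */

        s += a[i_a];
        s += a[i_b];
        i = i0 + 2;         /* carried around the back edge */
        s += a[i];
    }
    return s;
}
```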

Webizer + CSE often achieve the same effect; however, there are still
reasons for having this patch:

1) CSE will not break the chains when the cfg of the loop body is
   complicated (and this will occur more often in the future, provided
   that we still want to simplify the CSE pass).
2) CSE fails to break the chains on platforms with autoincrement
   memory modes (for quite obscure reasons; unfortunately fixing
   this behavior spoils the code).
3) It provides a framework for optimizations of the type
   http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00922.html
   (making addressing relative to the addresses used in the first
   iteration).

Bootstrapped (with -funroll-all-loops) & regtested on i686.

Zdenek

	* Makefile.in (loop-unroll.o): Add HASHTAB_H and RECOG_H dependency.
	* basic-block.h (struct reorder_block_def): Add copy_number field.
	* cfgloop.h (biv_p): Declare.
	* cfgloopmanip.c (duplicate_loop_to_header_edge): Set copy_number.
	* common.opt (fsplit-ivs-in-unroller): New flag.
	* loop-iv.c (biv_p): New function.
	* loop-unroll.c: Include hashtab.h and recog.h.
	(struct iv_to_split, struct split_ivs_info): New types.
	(analyze_ivs_to_split, si_info_start_duplication, split_ivs_in_copies,
	free_si_info, si_info_hash, si_info_eq, analyze_iv_to_split_insn,
	determine_split_iv_delta, get_ivts_expr, allocate_basic_variable,
	insert_base_initialization, split_iv): New functions.
	(peel_loop_completely, unroll_loop_constant_iterations,
	unroll_loop_runtime_iterations, peel_loop_simple, unroll_loop_stupid):
	Use them.
	* doc/invoke.texi (-fsplit-ivs-in-unroller): Document.

Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1306
diff -c -3 -p -r1.1306 Makefile.in
*** Makefile.in	23 Jun 2004 20:12:41 -0000	1.1306
--- Makefile.in	27 Jun 2004 20:37:22 -0000
*************** loop-unswitch.o : loop-unswitch.c $(CONF
*** 1969,1975 ****
     output.h $(EXPR_H) coretypes.h $(TM_H)
  loop-unroll.o: loop-unroll.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) $(TM_H) \
     $(BASIC_BLOCK_H) hard-reg-set.h $(CFGLOOP_H) $(CFGLAYOUT_H) $(PARAMS_H) \
!    output.h $(EXPR_H) coretypes.h $(TM_H)
  dominance.o : dominance.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
     hard-reg-set.h $(BASIC_BLOCK_H) et-forest.h
  et-forest.o : et-forest.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) et-forest.h alloc-pool.h
--- 1969,1975 ----
     output.h $(EXPR_H) coretypes.h $(TM_H)
  loop-unroll.o: loop-unroll.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) $(TM_H) \
     $(BASIC_BLOCK_H) hard-reg-set.h $(CFGLOOP_H) $(CFGLAYOUT_H) $(PARAMS_H) \
!    output.h $(EXPR_H) coretypes.h $(TM_H) $(HASHTAB_H) $(RECOG_H)
  dominance.o : dominance.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
     hard-reg-set.h $(BASIC_BLOCK_H) et-forest.h
  et-forest.o : et-forest.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) et-forest.h alloc-pool.h
Index: basic-block.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/basic-block.h,v
retrieving revision 1.199
diff -c -3 -p -r1.199 basic-block.h
*** basic-block.h	14 Jun 2004 12:09:06 -0000	1.199
--- basic-block.h	27 Jun 2004 20:37:22 -0000
*************** typedef struct reorder_block_def
*** 297,302 ****
--- 297,303 ----
    /* Used by loop copying.  */
    basic_block copy;
    int duplicated;
+   int copy_number;
  
    /* These fields are used by bb-reorder pass.  */
    int visited;
Index: cfgloop.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloop.h,v
retrieving revision 1.20
diff -c -3 -p -r1.20 cfgloop.h
*** cfgloop.h	20 Jun 2004 21:31:28 -0000	1.20
--- cfgloop.h	27 Jun 2004 20:37:22 -0000
*************** extern void iv_analysis_loop_init (struc
*** 397,402 ****
--- 397,403 ----
  extern rtx iv_get_reaching_def (rtx, rtx);
  extern bool iv_analyze (rtx, rtx, struct rtx_iv *);
  extern rtx get_iv_value (struct rtx_iv *, rtx);
+ extern bool biv_p (rtx, rtx);
  extern void find_simple_exit (struct loop *, struct niter_desc *);
  extern void iv_number_of_iterations (struct loop *, rtx, rtx,
  				     struct niter_desc *);
Index: cfgloopmanip.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloopmanip.c,v
retrieving revision 1.26
diff -c -3 -p -r1.26 cfgloopmanip.c
*** cfgloopmanip.c	13 May 2004 06:39:32 -0000	1.26
--- cfgloopmanip.c	27 Jun 2004 20:37:22 -0000
*************** duplicate_loop_to_header_edge (struct lo
*** 979,984 ****
--- 979,987 ----
        /* Copy bbs.  */
        copy_bbs (bbs, n, new_bbs, spec_edges, 2, new_spec_edges, loop);
  
+       for (i = 0; i < n; i++)
+ 	new_bbs[i]->rbi->copy_number = j + 1;
+ 
        /* Note whether the blocks and edges belong to an irreducible loop.  */
        if (add_irreducible_flag)
  	{
*************** duplicate_loop_to_header_edge (struct lo
*** 1057,1062 ****
--- 1060,1067 ----
        int n_dom_bbs,j;
  
        bb = bbs[i];
+       bb->rbi->copy_number = 0;
+ 
        n_dom_bbs = get_dominated_by (CDI_DOMINATORS, bb, &dom_bbs);
        for (j = 0; j < n_dom_bbs; j++)
  	{
Index: common.opt
===================================================================
RCS file: /cvs/gcc/gcc/gcc/common.opt,v
retrieving revision 1.37
diff -c -3 -p -r1.37 common.opt
*** common.opt	20 Jun 2004 21:31:28 -0000	1.37
--- common.opt	27 Jun 2004 20:37:22 -0000
*************** fsingle-precision-constant
*** 686,691 ****
--- 686,695 ----
  Common Report Var(flag_single_precision_constant)
  Convert floating point constants to single precision constants
  
+ fsplit-ivs-in-unroller
+ Common Report Var(flag_split_ivs_in_unroller) Init(1)
+ Split lifetimes of induction variables when loops are unrolled.
+ 
  fstack-check
  Common Report Var(flag_stack_check)
  Insert stack checking code into the program
Index: loop-iv.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/loop-iv.c,v
retrieving revision 2.10
diff -c -3 -p -r2.10 loop-iv.c
*** loop-iv.c	24 Jun 2004 16:50:35 -0000	2.10
--- loop-iv.c	27 Jun 2004 20:37:22 -0000
*************** iv_analyze (rtx insn, rtx def, struct rt
*** 1153,1158 ****
--- 1153,1175 ----
    return iv->base != NULL_RTX;
  }
  
+ /* Checks whether the definition of register REG in INSN is a basic induction
+    variable.  */
+ 
+ bool
+ biv_p (rtx insn, rtx reg)
+ {
+   struct rtx_iv iv;
+ 
+   if (!REG_P (reg))
+     return false;
+ 
+   if (last_def[REGNO (reg)] != insn)
+     return false;
+ 
+   return iv_analyze_biv (reg, &iv);
+ }
+ 
  /* Calculates value of IV at ITERATION-th iteration.  */
  
  rtx
Index: loop-unroll.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/loop-unroll.c,v
retrieving revision 1.16
diff -c -3 -p -r1.16 loop-unroll.c
*** loop-unroll.c	24 Feb 2004 23:39:55 -0000	1.16
--- loop-unroll.c	27 Jun 2004 20:37:22 -0000
*************** Software Foundation, 59 Temple Place - S
*** 30,35 ****
--- 30,37 ----
  #include "params.h"
  #include "output.h"
  #include "expr.h"
+ #include "hashtab.h"
+ #include "recog.h"
  
  /* This pass performs loop unrolling and peeling.  We only perform these
     optimizations on innermost loops (with single exception) because
*************** Software Foundation, 59 Temple Place - S
*** 66,71 ****
--- 68,95 ----
     showed that this choice may affect performance in order of several %.
     */
  
+ /* Information about induction variables to split.  */
+ 
+ struct iv_to_split
+ {
+   rtx insn;		/* The insn in which the induction variable occurs.  */
+   rtx base_var;		/* The variable on which the values in the further
+ 			   iterations are based.  */
+   rtx step;		/* Step of the induction variable.  */
+   unsigned n_loc;	/* Number of entries of LOC that are in use.  */
+   unsigned loc[3];	/* Location where the definition of the induction
+ 			   variable occurs in the insn.  For example if
+ 			   N_LOC is 2, the expression is located at
+ 			   XEXP (XEXP (single_set, loc[0]), loc[1]).  */ 
+ };
+ 
+ struct split_ivs_info
+ {
+   htab_t insns_to_split;	/* A hashtable of insns to split.  */
+   unsigned first_new_block;	/* The first basic block that was
+ 				   duplicated.  */
+ };
+ 
  static void decide_unrolling_and_peeling (struct loops *, int);
  static void peel_loops_completely (struct loops *, int);
  static void decide_peel_simple (struct loop *, int);
*************** static void peel_loop_completely (struct
*** 79,84 ****
--- 103,112 ----
  static void unroll_loop_stupid (struct loops *, struct loop *);
  static void unroll_loop_constant_iterations (struct loops *, struct loop *);
  static void unroll_loop_runtime_iterations (struct loops *, struct loop *);
+ static struct split_ivs_info *analyze_ivs_to_split (struct loop *);
+ static void si_info_start_duplication (struct split_ivs_info *);
+ static void split_ivs_in_copies (struct split_ivs_info *, unsigned, bool, bool);
+ static void free_si_info (struct split_ivs_info *);
  
  /* Unroll and/or peel (depending on FLAGS) LOOPS.  */
  void
*************** peel_loop_completely (struct loops *loop
*** 428,433 ****
--- 456,462 ----
    unsigned n_remove_edges, i;
    edge *remove_edges, ei;
    struct niter_desc *desc = get_simple_loop_desc (loop);
+   struct split_ivs_info *si_info = NULL;
  
    npeel = desc->niter;
  
*************** peel_loop_completely (struct loops *loop
*** 442,447 ****
--- 471,480 ----
        remove_edges = xcalloc (npeel, sizeof (edge));
        n_remove_edges = 0;
  
+       if (flag_split_ivs_in_unroller)
+ 	si_info = analyze_ivs_to_split (loop);
+ 
+       si_info_start_duplication (si_info);
        if (!duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
  		loops, npeel,
  		wont_exit, desc->out_edge, remove_edges, &n_remove_edges,
*************** peel_loop_completely (struct loops *loop
*** 450,455 ****
--- 483,494 ----
  
        free (wont_exit);
  
+       if (si_info)
+ 	{
+ 	  split_ivs_in_copies (si_info, npeel, false, true);
+ 	  free_si_info (si_info);
+ 	}
+ 
        /* Remove the exit edges.  */
        for (i = 0; i < n_remove_edges; i++)
  	remove_path (loops, remove_edges[i]);
*************** unroll_loop_constant_iterations (struct 
*** 597,602 ****
--- 636,642 ----
    unsigned max_unroll = loop->lpt_decision.times;
    struct niter_desc *desc = get_simple_loop_desc (loop);
    bool exit_at_end = loop_exit_at_end_p (loop);
+   struct split_ivs_info *si_info = NULL;
  
    niter = desc->niter;
  
*************** unroll_loop_constant_iterations (struct 
*** 611,616 ****
--- 651,659 ----
    remove_edges = xcalloc (max_unroll + exit_mod + 1, sizeof (edge));
    n_remove_edges = 0;
  
+   if (flag_split_ivs_in_unroller)
+     si_info = analyze_ivs_to_split (loop);
+ 
    if (!exit_at_end)
      {
        /* The exit is not at the end of the loop; leave exit test
*************** unroll_loop_constant_iterations (struct 
*** 627,632 ****
--- 670,676 ----
  
        if (exit_mod)
  	{
+ 	  si_info_start_duplication (si_info);
  	  if (!duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
  					      loops, exit_mod,
  					      wont_exit, desc->out_edge,
*************** unroll_loop_constant_iterations (struct 
*** 634,639 ****
--- 678,686 ----
  					      DLTHE_FLAG_UPDATE_FREQ))
  	    abort ();
  
+ 	  if (si_info && exit_mod > 1)
+ 	    split_ivs_in_copies (si_info, exit_mod, false, false);
+ 
  	  desc->noloop_assumptions = NULL_RTX;
  	  desc->niter -= exit_mod;
  	  desc->niter_max -= exit_mod;
*************** unroll_loop_constant_iterations (struct 
*** 659,670 ****
--- 706,721 ----
  	  if (desc->noloop_assumptions)
  	    RESET_BIT (wont_exit, 1);
  
+ 	  si_info_start_duplication (si_info);
  	  if (!duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
  		loops, exit_mod + 1,
  		wont_exit, desc->out_edge, remove_edges, &n_remove_edges,
  		DLTHE_FLAG_UPDATE_FREQ))
  	    abort ();
  
+ 	  if (si_info && exit_mod > 0)
+ 	    split_ivs_in_copies (si_info, exit_mod + 1, false, false);
+ 
  	  desc->niter -= exit_mod + 1;
  	  desc->niter_max -= exit_mod + 1;
  	  desc->noloop_assumptions = NULL_RTX;
*************** unroll_loop_constant_iterations (struct 
*** 677,688 ****
--- 728,746 ----
      }
  
    /* Now unroll the loop.  */
+   si_info_start_duplication (si_info);
    if (!duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
  		loops, max_unroll,
  		wont_exit, desc->out_edge, remove_edges, &n_remove_edges,
  		DLTHE_FLAG_UPDATE_FREQ))
      abort ();
  
+   if (si_info)
+     {
+       split_ivs_in_copies (si_info, max_unroll, true, true);
+       free_si_info (si_info);
+     }
+ 
    free (wont_exit);
  
    if (exit_at_end)
*************** unroll_loop_runtime_iterations (struct l
*** 842,847 ****
--- 900,909 ----
    unsigned max_unroll = loop->lpt_decision.times;
    struct niter_desc *desc = get_simple_loop_desc (loop);
    bool exit_at_end = loop_exit_at_end_p (loop);
+   struct split_ivs_info *si_info = NULL;
+ 
+   if (flag_split_ivs_in_unroller)
+     si_info = analyze_ivs_to_split (loop);
  
    /* Remember blocks whose dominators will have to be updated.  */
    dom_bbs = xcalloc (n_basic_blocks, sizeof (basic_block));
*************** unroll_loop_runtime_iterations (struct l
*** 979,990 ****
--- 1041,1059 ----
    sbitmap_ones (wont_exit);
    RESET_BIT (wont_exit, may_exit_copy);
  
+   si_info_start_duplication (si_info);
    if (!duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
  		loops, max_unroll,
  		wont_exit, desc->out_edge, remove_edges, &n_remove_edges,
  		DLTHE_FLAG_UPDATE_FREQ))
      abort ();
  
+   if (si_info)
+     {
+       split_ivs_in_copies (si_info, max_unroll, true, true);
+       free_si_info (si_info);
+     }
+ 
    free (wont_exit);
  
    if (exit_at_end)
*************** peel_loop_simple (struct loops *loops, s
*** 1138,1147 ****
--- 1207,1221 ----
    sbitmap wont_exit;
    unsigned npeel = loop->lpt_decision.times;
    struct niter_desc *desc = get_simple_loop_desc (loop);
+   struct split_ivs_info *si_info = NULL;
+ 
+   if (flag_split_ivs_in_unroller && npeel > 1)
+     si_info = analyze_ivs_to_split (loop);
  
    wont_exit = sbitmap_alloc (npeel + 1);
    sbitmap_zero (wont_exit);
  
+   si_info_start_duplication (si_info);
    if (!duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
  		loops, npeel, wont_exit, NULL, NULL, NULL,
  		DLTHE_FLAG_UPDATE_FREQ))
*************** peel_loop_simple (struct loops *loops, s
*** 1149,1154 ****
--- 1223,1234 ----
  
    free (wont_exit);
  
+   if (si_info)
+     {
+       split_ivs_in_copies (si_info, npeel, false, false);
+       free_si_info (si_info);
+     }
+ 
    if (desc->simple_p)
      {
        if (desc->const_iter)
*************** unroll_loop_stupid (struct loops *loops,
*** 1271,1285 ****
--- 1351,1376 ----
    sbitmap wont_exit;
    unsigned nunroll = loop->lpt_decision.times;
    struct niter_desc *desc = get_simple_loop_desc (loop);
+   struct split_ivs_info *si_info = NULL;
+ 
+   if (flag_split_ivs_in_unroller)
+     si_info = analyze_ivs_to_split (loop);
  
    wont_exit = sbitmap_alloc (nunroll + 1);
    sbitmap_zero (wont_exit);
  
+   si_info_start_duplication (si_info);
    if (!duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
  		loops, nunroll, wont_exit, NULL, NULL, NULL,
  		DLTHE_FLAG_UPDATE_FREQ))
      abort ();
  
+   if (si_info)
+     {
+       split_ivs_in_copies (si_info, nunroll, true, true);
+       free_si_info (si_info);
+     }
+ 
    free (wont_exit);
  
    if (desc->simple_p)
*************** unroll_loop_stupid (struct loops *loops,
*** 1297,1299 ****
--- 1388,1761 ----
      fprintf (dump_file, ";; Unrolled loop %d times, %i insns\n",
  	     nunroll, num_loop_insns (loop));
  }
+ 
+ /* A hash function for information about insns to split.  */
+ 
+ static hashval_t
+ si_info_hash (const void *ivts)
+ {
+   return htab_hash_pointer (((struct iv_to_split *) ivts)->insn);
+ }
+ 
+ /* An equality function for information about insns to split.  */
+ 
+ static int
+ si_info_eq (const void *ivts1, const void *ivts2)
+ {
+   const struct iv_to_split *i1 = ivts1;
+   const struct iv_to_split *i2 = ivts2;
+ 
+   return i1->insn == i2->insn;
+ }
+ 
+ /* Determine whether there is an induction variable in INSN that
+    we would like to split during unrolling.  */
+ 
+ static struct iv_to_split *
+ analyze_iv_to_split_insn (rtx insn)
+ {
+   rtx set, dest;
+   struct rtx_iv iv;
+   struct iv_to_split *ivts;
+ 
+   /* For now we just split the basic induction variables.  Later this may be
+      extended for example by selecting also addresses of memory references.  */
+   set = single_set (insn);
+   if (!set)
+     return NULL;
+ 
+   dest = SET_DEST (set);
+   if (!REG_P (dest))
+     return NULL;
+ 
+   if (!biv_p (insn, dest))
+     return NULL;
+ 
+   if (!iv_analyze (insn, dest, &iv))
+     abort ();
+ 
+   if (iv.step == const0_rtx
+       || iv.mode != iv.extend_mode)
+     return NULL;
+ 
+   /* Record the insn to split.  */
+   ivts = xmalloc (sizeof (struct iv_to_split));
+   ivts->insn = insn;
+   ivts->base_var = NULL_RTX;
+   ivts->step = iv.step;
+   ivts->n_loc = 1;
+   ivts->loc[0] = 1;
+   
+   return ivts;
+ }
+ 
+ /* Determines which of the induction variables in LOOP to split.  */
+ 
+ static struct split_ivs_info *
+ analyze_ivs_to_split (struct loop *loop)
+ {
+   basic_block *body, bb;
+   unsigned i;
+   struct split_ivs_info *si_info = xcalloc (1, sizeof (struct split_ivs_info));
+   rtx insn;
+   struct iv_to_split *ivts;
+   PTR *slot;
+ 
+   si_info->insns_to_split = htab_create (5 * loop->num_nodes,
+ 					 si_info_hash, si_info_eq, free);
+ 
+   iv_analysis_loop_init (loop);
+ 
+   body = get_loop_body (loop);
+   for (i = 0; i < loop->num_nodes; i++)
+     {
+       bb = body[i];
+       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+ 	continue;
+ 
+       FOR_BB_INSNS (bb, insn)
+ 	{
+ 	  if (!INSN_P (insn))
+ 	    continue;
+ 
+ 	  ivts = analyze_iv_to_split_insn (insn);
+ 
+ 	  if (!ivts)
+ 	    continue;
+ 
+ 	  slot = htab_find_slot (si_info->insns_to_split, ivts, INSERT);
+ 	  *slot = ivts;
+ 	}
+     }
+ 
+   free (body);
+ 
+   return si_info;
+ }
+ 
+ /* Called just before loop duplication.  Records the start of the duplicated
+    area in SI_INFO.  */
+ 
+ static void 
+ si_info_start_duplication (struct split_ivs_info *si_info)
+ {
+   if (si_info)
+     si_info->first_new_block = last_basic_block;
+ }
+ 
+ /* Determine the number of iterations between initialization of the base
+    variable and the current copy (N_COPY).  N_COPIES is the total number
+    of newly created copies.  UNROLLING is true if we are unrolling
+    (not peeling) the loop.  */
+ 
+ static unsigned
+ determine_split_iv_delta (unsigned n_copy, unsigned n_copies, bool unrolling)
+ {
+   if (unrolling)
+     {
+       /* If we are unrolling, initialization is done in the original loop
+ 	 body (number 0).  */
+       return n_copy;
+     }
+   else
+     {
+       /* If we are peeling, the copy in which the initialization occurs has
+ 	 number 1.  The original loop (number 0) is the last.  */
+       if (n_copy)
+ 	return n_copy - 1;
+       else
+ 	return n_copies;
+     }
+ }
+ 
+ /* Locates expression corresponding to the location recorded in IVTS
+    in EXPR.  */
+ 
+ static rtx *
+ get_ivts_expr (rtx expr, struct iv_to_split *ivts)
+ {
+   unsigned i;
+   rtx *ret = &expr;
+ 
+   for (i = 0; i < ivts->n_loc; i++)
+     ret = &XEXP (*ret, ivts->loc[i]);
+ 
+   return ret;
+ }
+ 
+ /* Allocate basic variable for the induction variable chain.  Callback for
+    htab_traverse.  */
+ 
+ static int
+ allocate_basic_variable (void **slot, void *data ATTRIBUTE_UNUSED)
+ {
+   struct iv_to_split *ivts = *slot;
+   rtx expr = *get_ivts_expr (single_set (ivts->insn), ivts);
+ 
+   ivts->base_var = gen_reg_rtx (GET_MODE (expr));
+ 
+   return 1;
+ }
+ 
+ /* Insert initialization of basic variable of IVTS before INSN, taking
+    the initial value from it.  */
+ 
+ static void
+ insert_base_initialization (struct iv_to_split *ivts, rtx insn)
+ {
+   rtx expr = copy_rtx (*get_ivts_expr (single_set (insn), ivts));
+   rtx seq;
+ 
+   start_sequence ();
+   expr = force_operand (expr, ivts->base_var);
+   if (expr != ivts->base_var)
+     emit_move_insn (ivts->base_var, expr);
+   seq = get_insns ();
+   end_sequence ();
+ 
+   emit_insn_before (seq, insn);
+ }
+ 
+ /* Replace the use of induction variable described in IVTS in INSN
+    by base variable + DELTA * step.  */
+ 
+ static void
+ split_iv (struct iv_to_split *ivts, rtx insn, unsigned delta)
+ {
+   rtx expr, *loc, seq, incr, var;
+   enum machine_mode mode = GET_MODE (ivts->base_var);
+   rtx src, dest, set;
+ 
+   if (!delta)
+     expr = ivts->base_var;
+   else
+     {
+       incr = simplify_gen_binary (MULT, mode,
+ 				  ivts->step, gen_int_mode (delta, mode));
+       expr = simplify_gen_binary (PLUS, GET_MODE (ivts->base_var),
+ 				  ivts->base_var, incr);
+     }
+ 
+   loc = get_ivts_expr (single_set (insn), ivts);
+ 
+   if (validate_change (insn, loc, expr, 0))
+     return;
+ 
+   start_sequence ();
+   if (REG_P (expr))
+     var = expr;
+   else
+     {
+       var = gen_reg_rtx (mode);
+   
+       expr = force_operand (expr, var);
+       if (expr != var)
+ 	emit_move_insn (var, expr);
+     }
+   seq = get_insns ();
+   end_sequence ();
+ 
+   emit_insn_before (seq, insn);
+       
+   if (validate_change (insn, loc, var, 0))
+     return;
+ 
+   /* The last chance.  Try recreating the assignment in insn completely from
+      scratch.  */
+   set = single_set (insn);
+   if (!set)
+     abort ();
+ 
+   start_sequence ();
+   *loc = var;
+   src = copy_rtx (SET_SRC (set));
+   dest = copy_rtx (SET_DEST (set));
+   src = force_operand (src, dest);
+   if (src != dest)
+     emit_move_insn (dest, src);
+   seq = get_insns ();
+   end_sequence ();
+      
+   emit_insn_before (seq, insn);
+   delete_insn (insn);
+ }
+ 
+ /* Splits induction variables (that are marked in SI_INFO) in copies of loop.
+    I.e. replace
+ 
+    i = i + 1;
+    ...
+    i = i + 1;
+    ...
+    i = i + 1;
+    ...
+ 
+    type chains by
+ 
+    i0 = i + 1
+    ...
+    i = i0 + 1
+    ...
+    i = i0 + 2
+    ...
+ 
+    UNROLLING is true if we unrolled (not peeled) the loop.
+    REWRITE_ORIGINAL_LOOP is true if we should also rewrite the original body of
+    the loop (as it should happen in complete unrolling, but not in ordinary
+    peeling of the loop).  */
+ 
+ static void
+ split_ivs_in_copies (struct split_ivs_info *si_info, unsigned n_copies,
+ 		     bool unrolling, bool rewrite_original_loop)
+ {
+   unsigned i, delta;
+   basic_block bb, orig_bb;
+   rtx insn, orig_insn, next;
+   struct iv_to_split ivts_templ, *ivts;
+ 
+   /* Sanity check -- we need to put initialization in the original loop
+      body.  */
+   if (unrolling && !rewrite_original_loop)
+     abort ();
+ 
+   /* Allocate the basic variables (i0).  */
+   htab_traverse (si_info->insns_to_split, allocate_basic_variable, NULL);
+ 
+   for (i = si_info->first_new_block; i < (unsigned) last_basic_block; i++)
+     {
+       bb = BASIC_BLOCK (i);
+       orig_bb = bb->rbi->original;
+ 
+       delta = determine_split_iv_delta (bb->rbi->copy_number, n_copies,
+ 					unrolling);
+       orig_insn = BB_HEAD (orig_bb);
+       for (insn = BB_HEAD (bb); insn != NEXT_INSN (BB_END (bb)); insn = next)
+ 	{
+ 	  next = NEXT_INSN (insn);
+ 	  if (!INSN_P (insn))
+ 	    continue;
+ 
+ 	  while (!INSN_P (orig_insn))
+ 	    orig_insn = NEXT_INSN (orig_insn);
+ 
+ 	  ivts_templ.insn = orig_insn;
+ 	  ivts = htab_find (si_info->insns_to_split, &ivts_templ);
+ 	  if (ivts)
+ 	    {
+ 
+ #ifdef ENABLE_CHECKING
+ 	      if (!rtx_equal_p (PATTERN (insn), PATTERN (orig_insn)))
+ 		abort ();
+ #endif
+ 
+ 	      if (!delta)
+ 		insert_base_initialization (ivts, insn);
+ 	      split_iv (ivts, insn, delta);
+ 	    }
+ 	  orig_insn = NEXT_INSN (orig_insn);
+ 	}
+     }
+ 
+   if (!rewrite_original_loop)
+     return;
+ 
+   /* Rewrite also the original loop body.  Find them as originals of the blocks
+      in the last copied iteration, i.e. those that have
+      bb->rbi->original->copy == bb.  */
+   for (i = si_info->first_new_block; i < (unsigned) last_basic_block; i++)
+     {
+       bb = BASIC_BLOCK (i);
+       orig_bb = bb->rbi->original;
+       if (orig_bb->rbi->copy != bb)
+ 	continue;
+ 
+       delta = determine_split_iv_delta (0, n_copies, unrolling);
+       for (orig_insn = BB_HEAD (orig_bb);
+ 	   orig_insn != NEXT_INSN (BB_END (orig_bb));
+ 	   orig_insn = next)
+ 	{
+ 	  next = NEXT_INSN (orig_insn);
+ 
+ 	  if (!INSN_P (orig_insn))
+ 	    continue;
+ 
+ 	  ivts_templ.insn = orig_insn;
+ 	  ivts = htab_find (si_info->insns_to_split, &ivts_templ);
+ 	  if (!ivts)
+ 	    continue;
+ 
+ 	  if (!delta)
+ 	    insert_base_initialization (ivts, orig_insn);
+ 	  split_iv (ivts, orig_insn, delta);
+ 	}
+     }
+ }
+ 
+ /* Release SI_INFO.  */
+ 
+ static void
+ free_si_info (struct split_ivs_info *si_info)
+ {
+   htab_delete (si_info->insns_to_split);
+   free (si_info);
+ }
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.473
diff -c -3 -p -r1.473 invoke.texi
*** doc/invoke.texi	22 Jun 2004 08:32:31 -0000	1.473
--- doc/invoke.texi	27 Jun 2004 20:37:22 -0000
*************** in the following sections.
*** 310,315 ****
--- 310,316 ----
  -fsingle-precision-constant  @gol
  -fstrength-reduce  -fstrict-aliasing  -ftracer  -fthread-jumps @gol
  -funroll-all-loops  -funroll-loops  -fpeel-loops @gol
+ -fsplit-ivs-in-unroller @gol
  -funswitch-loops  -fold-unroll-loops  -fold-unroll-all-loops @gol
  -ftree-pre  -ftree-ccp  -ftree-dce  @gol
  -ftree-dominator-opts -ftree-dse -ftree-copyrename @gol
*************** the loop is entered.  This usually makes
*** 4430,4435 ****
--- 4431,4448 ----
  @option{-funroll-all-loops} implies the same options as
  @option{-funroll-loops},
  
+ @item -fsplit-ivs-in-unroller
+ @opindex fsplit-ivs-in-unroller
+ Enables expressing the values of induction variables in later iterations
+ of the unrolled loop using the value in the first iteration.  This breaks
+ long dependency chains, thus improving efficiency of the scheduling passes
+ (for best results, @option{-fweb} should be used as well).
+ 
+ A combination of @option{-fweb} and CSE is often sufficient to obtain the
+ same effect.  However, in cases where the loop body is more complicated than
+ a single basic block, this is not reliable.  It also does not work at all
+ on some architectures due to restrictions in the CSE pass.
+ 
  @item -fprefetch-loop-arrays
  @opindex fprefetch-loop-arrays
  If supported by the target machine, generate instructions to prefetch

