This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Modulo-scheduling improvements. Patch 1 of 2.


Hi,

This is the first of two patches intended to fix and improve
applicability of SMS. It concerns SMS in general. The second patch
adds support for SMS on SPU architecture which does not have a PARALLEL
jump instruction.

This patch addresses the third and the fourth items on
http://gcc.gnu.org/wiki/SwingModuloScheduling:
The way the profitability estimation of SMS currently works cripples
SMS. This is because SMS attempts to verify that the insertion of all
regmoves into the kernel does not increase its total cycle count, as
determined by using the DFA engine (i.e. considering resource
utilization and disregarding dependencies). This inaccurate estimate
was found to be overly conservative. This patch drops the
attempt to estimate the cost of inserting regmoves to the kernel, to
help test and tune SMS. We bound the size of scheduling window in
order to prevent
the dependencies spanning more than one iteration thus requiring
regmoves. This default
behaviour can be changed by specifying the
"-fmodulo-sched-allow-regmoves" flag.


In addition we fix several problems of SMS. One of them was described in the second item on http://gcc.gnu.org/wiki/SwingModuloScheduling. The modulo-scheduler may violate anti-dependencies when scheduling both the source and the sink of the dependence in the same cycle. This can occur when we have two or more defs of the same register, and the last use of the register is scheduled II cycles after the first def (it cannot be scheduled later than that because of a dependence arc), so both are assigned to the same row, but possibly in the wrong order. Note also that for a single def and it's uses this problem does not arise, as register moves are used to handle live-ranges exceeding II cycles. This patch preserves the correct order for such cases.


Another problem this patch fixes is that ignoring the ANTI_DEPs can lead to dependence/ordering violation in case of multiple definitions of the same register. This caused a bootstrap failure on PowerPC with SMS enabled and fixed in ddg.c. Reduced testcase attached.

We have tested the performance of SMS on benchmarks from embedded suites
on SPU port and didn't see any significant degradation as result of SMS:
there are no additional instructions in the inner loops and it is enabled only
if it's able to find a schedule with at least two overlapping iterations. On
the other hand there were cases of significant improvements. For example
of SMS impact, there is more than 50% improvement on the loop of
vect-widen-mult-sum.c from the vectorizer testsuite. Currently SMS
doesn't work on vectorized loops, (where it's impact is often more significant
due to longer latencies), support for this is in works.

The combined patch was bootstrapped (with -fmodulo-sched enabled)
enabled on i586-linux and powerpc-linux. make-check with and without
the patch completed without new regressions for the C testsuite on
powerpc-linux and spu, now retesting for other languages (though the
only testcases for the modulo-scheduling are in gcc.dg).

ok for mainline when the tests complete?

:ADDPATCH modulo-sched:

thanks,
Vladimir

2007-01-18 Vladimir Yanovsky <yanov@il.ibm.com>

      (calculate_maxii): remove function.
      (undo_generate_reg_moves): Likewise.
      (undo_permute_partial_schedule): Likewise.
      (kernel_number_of_cycles): Likewise.
      (sms_schedule): remove profitability checks.
      (get_sched_window): prevent necessitating regmoves by limiting
      the scheduling window, unless FLAG_MODULO_SCHED_ALLOW_REGMOVES==1.
      (sms_schedule_by_order): use must_follow & must_precede bitmaps to
      determine order of nodes within the cycle.
      * common.opt (fmodulo-sched-allow-regmoves): new flag.
      * testsuite/gcc.dg/sms-antideps.c: new test.



Index: ddg.c
===================================================================
*** ddg.c	(revision 120757)
--- ddg.c	(working copy)
*************** create_ddg_dependence (ddg_ptr g, ddg_no
*** 178,194 ****
     {
       /* Some interloop dependencies are relaxed:
 	 1. Every insn is output dependent on itself; ignore such deps.
! 	 2. Every true/flow dependence is an anti dependence in the
 	 opposite direction with distance 1; such register deps
! 	 will be removed by renaming if broken --- ignore them.  */
!       if (!(t == OUTPUT_DEP && src_node == dest_node)
! 	  && !(t == ANTI_DEP && dt == REG_DEP))
 	add_backarc_to_ddg (g, e);
       else
 	free (e);
     }
-   else if (t == ANTI_DEP && dt == REG_DEP)
-     free (e);  /* We can fix broken anti register deps using reg-moves.  */
   else
     add_edge_to_ddg (g, e);
 }
--- 178,196 ----
     {
       /* Some interloop dependencies are relaxed:
 	 1. Every insn is output dependent on itself; ignore such deps.
! 	 2. Assuming a single use of any register,
! 	 If there were a single definition of every register we could
! 	 have ignored the antidependencies as
! 	 every true/flow dependence is an anti dependence in the
 	 opposite direction with distance 1; such register deps
! 	 will be removed by renaming if broken. But in some cases we do see
! 	 multiple definition, for instance in case of long long registers so
! 	 we don't remove them.*/
!       if (!(t == OUTPUT_DEP && src_node == dest_node))
 	add_backarc_to_ddg (g, e);
       else
 	free (e);
     }
   else
     add_edge_to_ddg (g, e);
 }
Index: ChangeLog
===================================================================
*** ChangeLog	(revision 120757)
--- ChangeLog	(working copy)
***************
*** 1,3 ****
--- 1,18 ----
+ 2007-01-17  Vladimir Yanovsky <yanov@il.ibm.com>
+ 	* ddg.c (create_ddg_dependence): don't ignore ANTI_DEPs
+ 	* modulo-sched.c (print_node_sched_params): add parameter and verbosity.
+ 	(calculate_maxii): remove function.
+ 	(undo_generate_reg_moves): Likewise.
+ 	(undo_permute_partial_schedule): Likewise.
+ 	(kernel_number_of_cycles): Likewise.
+ 	(sms_schedule): remove profitability checks.
+ 	(get_sched_window): prevent necessitating regmoves by limiting
+ 	the scheduling window, unless FLAG_MODULO_SCHED_ALLOW_REGMOVES==1.
+ 	(sms_schedule_by_order): use must_follow & must_precede bitmaps to
+ 	determine order of nodes within the cycle.
+ 	* common.opt (fmodulo-sched-allow-regmoves): new flag.
+ 	* testsuite/gcc.dg/sms-antideps.c: new test.
+
 2007-01-13  Zdenek Dvorak <dvorakz@suse.cz>

 	* ipa-reference.c (analyze_function): Consider also addresses taken
Index: testsuite/gcc.dg/sms-antideps.c
===================================================================
*** testsuite/gcc.dg/sms-antideps.c	(revision 0)
--- testsuite/gcc.dg/sms-antideps.c	(revision 0)
***************
*** 0 ****
--- 1,34 ----
+ /* This test is a reduced test case for a bug that caused
+  *   bootstrapping with -fmodulo-sched.
+  *     Related to a broken anti-dep that was not fixed by
+  *       reg-moves.  */
+
+ /* { dg-do run } */
+ /* { dg-options "-O2 -fmodulo-sched" } */
+
+ #include <stdlib.h>
+
+ unsigned long long
+ foo (long long ixi, unsigned ctr)
+ {
+    unsigned long long irslt = 1;
+     long long ix = ixi;
+
+      for (; ctr; ctr--)
+         {
+ 	     irslt *= ix;
+ 	        ix *= ix;
+ 		 }
+
+       if (irslt != 14348907)
+ 	   abort ();
+        return irslt;
+ }
+
+
+ int main ()
+ {
+    unsigned long long res;
+
+     res = foo (3, 4);
+ }
Index: testsuite/ChangeLog
===================================================================
*** testsuite/ChangeLog	(revision 120757)
--- testsuite/ChangeLog	(working copy)
***************
*** 1,3 ****
--- 1,6 ----
+ 2007-01-14  Vladimir Yanovsky <yanov@il.ibm.com>
+ 	* gcc.dg/sms-antideps.c: New test.
+
 2007-01-13  Zdenek Dvorak <dvorakz@suse.cz>

 	* gcc.dg/20070112-1.c: New test.
Index: modulo-sched.c
===================================================================
*** modulo-sched.c	(revision 120757)
--- modulo-sched.c	(working copy)
*************** static partial_schedule_ptr create_parti
*** 161,167 ****
 static void free_partial_schedule (partial_schedule_ptr);
 static void reset_partial_schedule (partial_schedule_ptr, int new_ii);
 void print_partial_schedule (partial_schedule_ptr, FILE *);
- static int kernel_number_of_cycles (rtx first_insn, rtx last_insn);
 static ps_insn_ptr ps_add_node_check_conflicts (partial_schedule_ptr,
 						ddg_node_ptr node, int cycle,
 						sbitmap must_precede,
--- 161,166 ----
*************** set_node_sched_params (ddg_ptr g)
*** 370,376 ****
 }

 static void
! print_node_sched_params (FILE * file, int num_nodes)
 {
   int i;

--- 369,375 ----
 }

 static void
! print_node_sched_params (FILE * file, int num_nodes, ddg_ptr g)
 {
   int i;

*************** print_node_sched_params (FILE * file, in
*** 382,388 ****
       rtx reg_move = nsp->first_reg_move;
       int j;

!       fprintf (file, "Node %d:\n", i);
       fprintf (file, " asap = %d:\n", nsp->asap);
       fprintf (file, " time = %d:\n", nsp->time);
       fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
--- 381,387 ----
       rtx reg_move = nsp->first_reg_move;
       int j;

!       fprintf (file, "Node = %d; INSN = %d\n", i,  (INSN_UID
(g->nodes[i].insn)));
       fprintf (file, " asap = %d:\n", nsp->asap);
       fprintf (file, " time = %d:\n", nsp->time);
       fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
*************** print_node_sched_params (FILE * file, in
*** 395,434 ****
     }
 }

- /* Calculate an upper bound for II.  SMS should not schedule the loop if it
-    requires more cycles than this bound.  Currently set to the sum of the
-    longest latency edge for each node.  Reset based on experiments.  */
- static int
- calculate_maxii (ddg_ptr g)
- {
-   int i;
-   int maxii = 0;
-
-   for (i = 0; i < g->num_nodes; i++)
-     {
-       ddg_node_ptr u = &g->nodes[i];
-       ddg_edge_ptr e;
-       int max_edge_latency = 0;

-       for (e = u->out; e; e = e->next_out)
- 	max_edge_latency = MAX (max_edge_latency, e->latency);
-
-       maxii += max_edge_latency;
-     }
-   return maxii;
- }
-
- /*
-    Breaking intra-loop register anti-dependences:
-    Each intra-loop register anti-dependence implies a cross-iteration true
-    dependence of distance 1. Therefore, we can remove such false dependencies
-    and figure out if the partial schedule broke them by checking if (for a
-    true-dependence of distance 1): SCHED_TIME (def) < SCHED_TIME (use) and
-    if so generate a register move.   The number of such moves is equal to:
-               SCHED_TIME (use) - SCHED_TIME (def)       { 0 broken
-    nreg_moves = ----------------------------------- + 1 - {   dependence.
-                             ii                          { 1 if not.
- */
 static struct undo_replace_buff_elem *
 generate_reg_moves (partial_schedule_ptr ps)
 {
--- 394,400 ----
*************** generate_reg_moves (partial_schedule_ptr
*** 536,574 ****
   return reg_move_replaces;
 }

- /* We call this when we want to undo the SMS schedule for a given loop.
-    One of the things that we do is to delete the register moves generated
-    for the sake of SMS; this function deletes the register move instructions
-    recorded in the undo buffer.  */
- static void
- undo_generate_reg_moves (partial_schedule_ptr ps,
- 			 struct undo_replace_buff_elem *reg_move_replaces)
- {
-   int i,j;
-
-   for (i = 0; i < ps->g->num_nodes; i++)
-     {
-       ddg_node_ptr u = &ps->g->nodes[i];
-       rtx prev;
-       rtx crr = SCHED_FIRST_REG_MOVE (u);
-
-       for (j = 0; j < SCHED_NREG_MOVES (u); j++)
- 	{
- 	  prev = PREV_INSN (crr);
- 	  delete_insn (crr);
- 	  crr = prev;
- 	}
-       SCHED_FIRST_REG_MOVE (u) = NULL_RTX;
-     }
-
-   while (reg_move_replaces)
-     {
-       struct undo_replace_buff_elem *rep = reg_move_replaces;

-       reg_move_replaces = reg_move_replaces->next;
-       replace_rtx (rep->insn, rep->new_reg, rep->orig_reg);
-     }
- }

 /* Free memory allocated for the undo buffer.  */
 static void
--- 502,508 ----
*************** permute_partial_schedule (partial_schedu
*** 641,668 ****
 			    PREV_INSN (last));
 }

- /* As part of undoing SMS we return to the original ordering of the
-    instructions inside the loop kernel.  Given the partial schedule PS, this
-    function returns the ordering of the instruction according to their CUID
-    in the DDG (PS->G), which is the original order of the instruction before
-    performing SMS.  */
- static void
- undo_permute_partial_schedule (partial_schedule_ptr ps, rtx last)
- {
-   int i;
-
-   for (i = 0 ; i < ps->g->num_nodes; i++)
-     if (last == ps->g->nodes[i].insn
- 	|| last == ps->g->nodes[i].first_note)
-       break;
-     else if (PREV_INSN (last) != ps->g->nodes[i].insn)
-       reorder_insns_nobb (ps->g->nodes[i].first_note, ps->g->nodes[i].insn,
- 			  PREV_INSN (last));
- }
-
- /* Used to generate the prologue & epilogue.  Duplicate the subset of
-    nodes whose stages are between FROM_STAGE and TO_STAGE (inclusive
-    of both), together with a prefix/suffix of their reg_moves.  */
 static void
 duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
 			   int to_stage, int for_prolog)
--- 575,580 ----
*************** sms_schedule (void)
*** 1098,1104 ****
       mii = 1; /* Need to pass some estimate of mii.  */
       rec_mii = sms_order_nodes (g, mii, node_order);
       mii = MAX (res_MII (g), rec_mii);
!       maxii = (calculate_maxii (g) * SMS_MAX_II_FACTOR) / 100;

       if (dump_file)
 	fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
--- 1010,1016 ----
       mii = 1; /* Need to pass some estimate of mii.  */
       rec_mii = sms_order_nodes (g, mii, node_order);
       mii = MAX (res_MII (g), rec_mii);
!       maxii = (2 * mii * SMS_MAX_II_FACTOR) / 100;

       if (dump_file)
 	fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
*************** sms_schedule (void)
*** 1132,1139 ****
 	}
       else
 	{
- 	  int orig_cycles = kernel_number_of_cycles (BB_HEAD (g->bb),
BB_END (g->bb));
- 	  int new_cycles;
 	  struct undo_replace_buff_elem *reg_move_replaces;

 	  if (dump_file)
--- 1044,1049 ----
*************** sms_schedule (void)
*** 1146,1152 ****
 		       "SMS Branch (%d) will later be scheduled at cycle %d.\n",
 		       g->closing_branch->cuid, PS_MIN_CYCLE (ps) - 1);
 	    }
!
 	  /* Set the stage boundaries.  If the DDG is built with closing_branch_deps,
 	     the closing_branch was scheduled and should appear in the last (ii-1)
 	     row.  Otherwise, we are free to schedule the branch, and we let nodes
--- 1056,1062 ----
 		       "SMS Branch (%d) will later be scheduled at cycle %d.\n",
 		       g->closing_branch->cuid, PS_MIN_CYCLE (ps) - 1);
 	    }
! 	
 	  /* Set the stage boundaries.  If the DDG is built with closing_branch_deps,
 	     the closing_branch was scheduled and should appear in the last (ii-1)
 	     row.  Otherwise, we are free to schedule the branch, and we let nodes
*************** sms_schedule (void)
*** 1155,1222 ****
 	  normalize_sched_times (ps);
 	  rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
 	  set_columns_for_ps (ps);

! 	  /* Generate the kernel just to be able to measure its cycles.  */
! 	  permute_partial_schedule (ps, g->closing_branch->first_note);
! 	  reg_move_replaces = generate_reg_moves (ps);

! 	  /* Get the number of cycles the new kernel expect to execute in.  */
! 	  new_cycles = kernel_number_of_cycles (BB_HEAD (g->bb), BB_END (g->bb));

! 	  /* Get back to the original loop so we can do loop versioning.  */
! 	  undo_permute_partial_schedule (ps, g->closing_branch->first_note);
! 	  if (reg_move_replaces)
! 	    undo_generate_reg_moves (ps, reg_move_replaces);

! 	  if ( new_cycles >= orig_cycles)
! 	    {
! 	      /* SMS is not profitable so undo the permutation and reg move
generation
! 	         and return the kernel to its original state.  */
! 	      if (dump_file)
! 		fprintf (dump_file, "Undoing SMS because it is not profitable.\n");

! 	    }
 	  else
! 	    {
! 	      canon_loop (loop);
!
!               /* case the BCT count is not known , Do loop-versioning */
! 	      if (count_reg && ! count_init)
! 		{
! 		  rtx comp_rtx = gen_rtx_fmt_ee (GT, VOIDmode, count_reg,
! 						 GEN_INT(stage_count));
! 		  unsigned prob = (PROB_SMS_ENOUGH_ITERATIONS
! 				   * REG_BR_PROB_BASE) / 100;
!
! 		  loop_version (loop, comp_rtx, &condition_bb,
! 				prob, prob, REG_BR_PROB_BASE - prob,
! 				true);
! 		}
!
! 	      /* Set new iteration count of loop kernel.  */
!               if (count_reg && count_init)
! 		SET_SRC (single_set (count_init)) = GEN_INT (loop_count
! 							     - stage_count + 1);
!
! 	      /* Now apply the scheduled kernel to the RTL of the loop.  */
! 	      permute_partial_schedule (ps, g->closing_branch->first_note);
!
!               /* Mark this loop as software pipelined so the later
! 	      scheduling passes doesn't touch it.  */
! 	      if (! flag_resched_modulo_sched)
! 		g->bb->flags |= BB_DISABLE_SCHEDULE;
! 	      /* The life-info is not valid any more.  */
! 	      g->bb->flags |= BB_DIRTY;
!
! 	      reg_move_replaces = generate_reg_moves (ps);
! 	      if (dump_file)
! 		print_node_sched_params (dump_file, g->num_nodes);
! 	      /* Generate prolog and epilog.  */
! 	      if (count_reg && !count_init)
! 		generate_prolog_epilog (ps, loop, count_reg);
! 	      else
! 	 	generate_prolog_epilog (ps, loop, NULL_RTX);
! 	    }
 	  free_undo_replace_buff (reg_move_replaces);
 	}

--- 1065,1110 ----
 	  normalize_sched_times (ps);
 	  rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
 	  set_columns_for_ps (ps);
+ 	
+ 	  canon_loop (loop);

!           /* case the BCT count is not known , Do loop-versioning */
! 	  if (count_reg && ! count_init)
! 	    {
! 	      rtx comp_rtx = gen_rtx_fmt_ee (GT, VOIDmode, count_reg,
! 	    				 GEN_INT(stage_count));
! 	      unsigned prob = (PROB_SMS_ENOUGH_ITERATIONS
! 	    		   * REG_BR_PROB_BASE) / 100;
!
! 	      loop_version (loop, comp_rtx, &condition_bb,
! 	    		prob, prob, REG_BR_PROB_BASE - prob,
! 	    		true);
! 	    }

! 	  /* Set new iteration count of loop kernel.  */
!           if (count_reg && count_init)
! 	    SET_SRC (single_set (count_init)) = GEN_INT (loop_count
! 	    					     - stage_count + 1);

! 	  /* Now apply the scheduled kernel to the RTL of the loop.  */
! 	  permute_partial_schedule (ps, g->closing_branch->first_note);

!           /* Mark this loop as software pipelined so the later
! 	  scheduling passes doesn't touch it.  */
! 	  if (! flag_resched_modulo_sched)
! 	    g->bb->flags |= BB_DISABLE_SCHEDULE;
! 	  /* The life-info is not valid any more.  */
! 	  g->bb->flags |= BB_DIRTY;

! 	  reg_move_replaces = generate_reg_moves (ps);
! 	  if (dump_file)
! 	    print_node_sched_params (dump_file, g->num_nodes, g);
! 	  /* Generate prolog and epilog.  */
! 	  if (count_reg && !count_init)
! 	    generate_prolog_epilog (ps, loop, count_reg);
 	  else
! 	    generate_prolog_epilog (ps, loop, NULL_RTX);
! 	
 	  free_undo_replace_buff (reg_move_replaces);
 	}

*************** get_sched_window (partial_schedule_ptr p
*** 1354,1360 ****

early_start = MAX (early_start, node_st);

! 	      if (e->data_type == MEM_DEP)
 		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
 	    }
 	}
--- 1242,1248 ----

early_start = MAX (early_start, node_st);

! 	      if (!flag_modulo_sched_allow_regmoves || e->data_type == MEM_DEP)
 		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
 	    }
 	}
*************** get_sched_window (partial_schedule_ptr p
*** 1376,1382 ****
 	      late_start = MIN (late_start,
 				SCHED_TIME (v_node) - e->latency
 				+ (e->distance * ii));
! 	      if (e->data_type == MEM_DEP)
 		end = MAX (end, SCHED_TIME (v_node) - ii + 1);
 	    }
 	}
--- 1264,1270 ----
 	      late_start = MIN (late_start,
 				SCHED_TIME (v_node) - e->latency
 				+ (e->distance * ii));
! 	      if (!flag_modulo_sched_allow_regmoves || e->data_type == MEM_DEP)
 		end = MAX (end, SCHED_TIME (v_node) - ii + 1);
 	    }
 	}
*************** get_sched_window (partial_schedule_ptr p
*** 1401,1407 ****
 	      early_start = MAX (early_start,
 				 SCHED_TIME (v_node) + e->latency
 				 - (e->distance * ii));
! 	      if (e->data_type == MEM_DEP)
 		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
 	    }
 	}
--- 1289,1295 ----
 	      early_start = MAX (early_start,
 				 SCHED_TIME (v_node) + e->latency
 				 - (e->distance * ii));
! 	      if (!flag_modulo_sched_allow_regmoves || e->data_type == MEM_DEP)
 		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
 	    }
 	}
*************** get_sched_window (partial_schedule_ptr p
*** 1414,1420 ****
 	      late_start = MIN (late_start,
 				SCHED_TIME (v_node) - e->latency
 				+ (e->distance * ii));
! 	      if (e->data_type == MEM_DEP)
 		start = MAX (start, SCHED_TIME (v_node) - ii + 1);
 	    }
 	}
--- 1302,1308 ----
 	      late_start = MIN (late_start,
 				SCHED_TIME (v_node) - e->latency
 				+ (e->distance * ii));
! 	      if (!flag_modulo_sched_allow_regmoves || e->data_type == MEM_DEP)
 		start = MAX (start, SCHED_TIME (v_node) - ii + 1);
 	    }
 	}
*************** sms_schedule_by_order (ddg_ptr g, int mi
*** 1524,1543 ****
 	    fprintf(dump_file, "Trying to schedule node %d in (%d .. %d) step %d\n",
 		    u, start, end, step);

!           /* use must_follow & must_precede bitmaps to determine order
! 	     of nodes within the cycle.  */
           sbitmap_zero (must_precede);
           sbitmap_zero (must_follow);
       	  for (e = u_node->in; e != 0; e = e->next_in)
!             if (TEST_BIT (sched_nodes, e->src->cuid)
! 	        && e->latency == (ii * e->distance)
! 		&& start == SCHED_TIME (e->src))
              SET_BIT (must_precede, e->src->cuid);

 	  for (e = u_node->out; e != 0; e = e->next_out)
!             if (TEST_BIT (sched_nodes, e->dest->cuid)
! 	        && e->latency == (ii * e->distance)
! 		&& end == SCHED_TIME (e->dest))
              SET_BIT (must_follow, e->dest->cuid);

 	  success = 0;
--- 1412,1443 ----
 	    fprintf(dump_file, "Trying to schedule node %d in (%d .. %d) step %d\n",
 		    u, start, end, step);

!           /* Use must_follow & must_precede bitmaps to determine order
! 	     of nodes within the cycle.
! 	     The modulo-scheduler may violate anti-dependences when scheduling both
! 	     the source and the sink of the dependence in the same cycle. This can
! 	     occur when we have two or more defs of the same register, and the last
! 	     use of the register is scheduled II cycles after the first def (it
! 	     cannot be scheduled later than that because of a dependence arc), so
! 	     both are assigned to the same row, but possibly in the wrong order.
! 	     Note also that for TRUE_DEP or a single def and it's uses this problem
! 	     does not arise, as register moves are used to handle live-ranges
! 	     exceeding II cycles.  */
! 	
           sbitmap_zero (must_precede);
           sbitmap_zero (must_follow);
       	  for (e = u_node->in; e != 0; e = e->next_in)
!             if ((TEST_BIT (sched_nodes, e->src->cuid)
! 		&& (e->latency == (ii * e->distance)
! 		     && start == SCHED_TIME (e->src)) )
! 		  || (e->type != TRUE_DEP))
              SET_BIT (must_precede, e->src->cuid);

 	  for (e = u_node->out; e != 0; e = e->next_out)
!             if ((TEST_BIT (sched_nodes, e->dest->cuid)
! 		&& (e->latency == (ii * e->distance)
! 		     && start == SCHED_TIME (e->src)))
! 		  || (e->type != TRUE_DEP))
              SET_BIT (must_follow, e->dest->cuid);

 	  success = 0;
*************** advance_one_cycle (void)
*** 2255,2311 ****
 		      targetm.sched.dfa_post_cycle_insn ());
 }

- /* Given the kernel of a loop (from FIRST_INSN to LAST_INSN), finds
-    the number of cycles according to DFA that the kernel fits in,
-    we use this to check if we done well with SMS after we add
-    register moves.  In some cases register moves overhead makes
-    it even worse than the original loop.  We want SMS to be performed
-    when it gives less cycles after register moves are added.  */
- static int
- kernel_number_of_cycles (rtx first_insn, rtx last_insn)
- {
-   int cycles = 0;
-   rtx insn;
-   int can_issue_more = issue_rate;

-   state_reset (curr_state);
-
-   for (insn = first_insn;
-        insn != NULL_RTX && insn != last_insn;
-        insn = NEXT_INSN (insn))
-     {
-       if (! INSN_P (insn) || GET_CODE (PATTERN (insn)) == USE)
- 	continue;
-
-       /* Check if there is room for the current insn.  */
-       if (!can_issue_more || state_dead_lock_p (curr_state))
- 	{
- 	  cycles ++;
- 	  advance_one_cycle ();
- 	  can_issue_more = issue_rate;
- 	}
-
- 	/* Update the DFA state and return with failure if the DFA found
- 	   recource conflicts.  */
-       if (state_transition (curr_state, insn) >= 0)
- 	{
- 	  cycles ++;
- 	  advance_one_cycle ();
- 	  can_issue_more = issue_rate;
- 	}
-
-       if (targetm.sched.variable_issue)
- 	can_issue_more =
- 	  targetm.sched.variable_issue (sched_dump, sched_verbose,
- 					insn, can_issue_more);
-       /* A naked CLOBBER or USE generates no instruction, so don't
- 	 let them consume issue slots.  */
-       else if (GET_CODE (PATTERN (insn)) != USE
- 	       && GET_CODE (PATTERN (insn)) != CLOBBER)
- 	can_issue_more--;
-     }
-   return cycles;
- }

 /* Checks if PS has resource conflicts according to DFA, starting from
    FROM cycle to TO cycle; returns true if there are conflicts and false
--- 2155,2161 ----
Index: common.opt
===================================================================
*** common.opt	(revision 120757)
--- common.opt	(working copy)
*************** fmodulo-sched
*** 602,607 ****
--- 602,611 ----
 Common Report Var(flag_modulo_sched)
 Perform SMS based modulo scheduling before the first scheduling pass

+ fmodulo-sched-allow-regmoves
+ Common Report Var(flag_modulo_sched_allow_regmoves)
+ Perform SMS based modulo scheduling with register moves allowed
+
 fmove-loop-invariants
 Common Report Var(flag_move_loop_invariants) Init(1)
 Move loop invariant computations out of loops
Index: ddg.c
===================================================================
*** ddg.c	(revision 120757)
--- ddg.c	(working copy)
*************** create_ddg_dependence (ddg_ptr g, ddg_no
*** 178,194 ****
      {
        /* Some interloop dependencies are relaxed:
  	 1. Every insn is output dependent on itself; ignore such deps.
! 	 2. Every true/flow dependence is an anti dependence in the
  	 opposite direction with distance 1; such register deps
! 	 will be removed by renaming if broken --- ignore them.  */
!       if (!(t == OUTPUT_DEP && src_node == dest_node)
! 	  && !(t == ANTI_DEP && dt == REG_DEP))
  	add_backarc_to_ddg (g, e);
        else
  	free (e);
      }
-   else if (t == ANTI_DEP && dt == REG_DEP)
-     free (e);  /* We can fix broken anti register deps using reg-moves.  */
    else
      add_edge_to_ddg (g, e);
  }
--- 178,196 ----
      {
        /* Some interloop dependencies are relaxed:
  	 1. Every insn is output dependent on itself; ignore such deps.
! 	 2. Assuming a single use of any register, 
! 	 If there were a single definition of every register we could
! 	 have ignored the antidependencies as
! 	 every true/flow dependence is an anti dependence in the
  	 opposite direction with distance 1; such register deps
! 	 will be removed by renaming if broken. But in some cases we do see
! 	 multiple definition, for instance in case of long long registers so
! 	 we don't remove them.*/
!       if (!(t == OUTPUT_DEP && src_node == dest_node))
  	add_backarc_to_ddg (g, e);
        else
  	free (e);
      }
    else
      add_edge_to_ddg (g, e);
  }
Index: ChangeLog
===================================================================
*** ChangeLog	(revision 120757)
--- ChangeLog	(working copy)
***************
*** 1,3 ****
--- 1,18 ----
+ 2007-01-17  Vladimir Yanovsky <yanov@il.ibm.com>
+ 	* ddg.c (create_ddg_dependence): don't ignore ANTI_DEPs
+ 	* modulo-sched.c (print_node_sched_params): add parameter and verbosity.
+ 	(calculate_maxii): remove function.
+ 	(undo_generate_reg_moves): Likewise.
+ 	(undo_permute_partial_schedule): Likewise.
+ 	(kernel_number_of_cycles): Likewise.
+ 	(sms_schedule): remove profitability checks.
+ 	(get_sched_window): prevent necessitating regmoves by limiting
+ 	the scheduling window, unless FLAG_MODULO_SCHED_ALLOW_REGMOVES==1.
+ 	(sms_schedule_by_order): use must_follow & must_precede bitmaps to
+ 	determine order of nodes within the cycle.
+ 	* common.opt (fmodulo-sched-allow-regmoves): new flag.
+ 	* testsuite/gcc.dg/sms-antideps.c: new test.
+ 
  2007-01-13  Zdenek Dvorak <dvorakz@suse.cz>
  
  	* ipa-reference.c (analyze_function): Consider also addresses taken
Index: testsuite/gcc.dg/sms-antideps.c
===================================================================
*** testsuite/gcc.dg/sms-antideps.c	(revision 0)
--- testsuite/gcc.dg/sms-antideps.c	(revision 0)
***************
*** 0 ****
--- 1,34 ----
+ /* This test is a reduced test case for a bug that caused
+  *   bootstrapping with -fmodulo-sched.
+  *     Related to a broken anti-dep that was not fixed by
+  *       reg-moves.  */
+ 
+ /* { dg-do run } */
+ /* { dg-options "-O2 -fmodulo-sched" } */
+ 
+ #include <stdlib.h>
+ 
+ unsigned long long
+ foo (long long ixi, unsigned ctr)
+ {
+    unsigned long long irslt = 1;
+     long long ix = ixi;
+ 
+      for (; ctr; ctr--)
+         {
+ 	     irslt *= ix;
+ 	        ix *= ix;
+ 		 }
+ 
+       if (irslt != 14348907)
+ 	   abort ();
+        return irslt;
+ }
+ 
+ 
+ int main ()
+ {
+    unsigned long long res;
+ 
+     res = foo (3, 4);
+ }
Index: testsuite/ChangeLog
===================================================================
*** testsuite/ChangeLog	(revision 120757)
--- testsuite/ChangeLog	(working copy)
***************
*** 1,3 ****
--- 1,6 ----
+ 2007-01-14  Vladimir Yanovsky <yanov@il.ibm.com>
+ 	* gcc.dg/sms-antideps.c: New test.
+ 
  2007-01-13  Zdenek Dvorak <dvorakz@suse.cz>
  
  	* gcc.dg/20070112-1.c: New test.
Index: modulo-sched.c
===================================================================
*** modulo-sched.c	(revision 120757)
--- modulo-sched.c	(working copy)
*************** static partial_schedule_ptr create_parti
*** 161,167 ****
  static void free_partial_schedule (partial_schedule_ptr);
  static void reset_partial_schedule (partial_schedule_ptr, int new_ii);
  void print_partial_schedule (partial_schedule_ptr, FILE *);
- static int kernel_number_of_cycles (rtx first_insn, rtx last_insn);
  static ps_insn_ptr ps_add_node_check_conflicts (partial_schedule_ptr,
  						ddg_node_ptr node, int cycle,
  						sbitmap must_precede,
--- 161,166 ----
*************** set_node_sched_params (ddg_ptr g)
*** 370,376 ****
  }
  
  static void
! print_node_sched_params (FILE * file, int num_nodes)
  {
    int i;
  
--- 369,375 ----
  }
  
  static void
! print_node_sched_params (FILE * file, int num_nodes, ddg_ptr g)
  {
    int i;
  
*************** print_node_sched_params (FILE * file, in
*** 382,388 ****
        rtx reg_move = nsp->first_reg_move;
        int j;
  
!       fprintf (file, "Node %d:\n", i);
        fprintf (file, " asap = %d:\n", nsp->asap);
        fprintf (file, " time = %d:\n", nsp->time);
        fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
--- 381,387 ----
        rtx reg_move = nsp->first_reg_move;
        int j;
  
!       fprintf (file, "Node = %d; INSN = %d\n", i,  (INSN_UID (g->nodes[i].insn)));
        fprintf (file, " asap = %d:\n", nsp->asap);
        fprintf (file, " time = %d:\n", nsp->time);
        fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
*************** print_node_sched_params (FILE * file, in
*** 395,434 ****
      }
  }
  
- /* Calculate an upper bound for II.  SMS should not schedule the loop if it
-    requires more cycles than this bound.  Currently set to the sum of the
-    longest latency edge for each node.  Reset based on experiments.  */
- static int
- calculate_maxii (ddg_ptr g)
- {
-   int i;
-   int maxii = 0;
- 
-   for (i = 0; i < g->num_nodes; i++)
-     {
-       ddg_node_ptr u = &g->nodes[i];
-       ddg_edge_ptr e;
-       int max_edge_latency = 0;
  
-       for (e = u->out; e; e = e->next_out)
- 	max_edge_latency = MAX (max_edge_latency, e->latency);
- 
-       maxii += max_edge_latency;
-     }
-   return maxii;
- }
- 
- /*
-    Breaking intra-loop register anti-dependences:
-    Each intra-loop register anti-dependence implies a cross-iteration true
-    dependence of distance 1. Therefore, we can remove such false dependencies
-    and figure out if the partial schedule broke them by checking if (for a
-    true-dependence of distance 1): SCHED_TIME (def) < SCHED_TIME (use) and
-    if so generate a register move.   The number of such moves is equal to:
-               SCHED_TIME (use) - SCHED_TIME (def)       { 0 broken
-    nreg_moves = ----------------------------------- + 1 - {   dependence.
-                             ii                          { 1 if not.
- */
  static struct undo_replace_buff_elem *
  generate_reg_moves (partial_schedule_ptr ps)
  {
--- 394,400 ----
*************** generate_reg_moves (partial_schedule_ptr
*** 536,574 ****
    return reg_move_replaces;
  }
  
- /* We call this when we want to undo the SMS schedule for a given loop.
-    One of the things that we do is to delete the register moves generated
-    for the sake of SMS; this function deletes the register move instructions
-    recorded in the undo buffer.  */
- static void
- undo_generate_reg_moves (partial_schedule_ptr ps,
- 			 struct undo_replace_buff_elem *reg_move_replaces)
- {
-   int i,j;
- 
-   for (i = 0; i < ps->g->num_nodes; i++)
-     {
-       ddg_node_ptr u = &ps->g->nodes[i];
-       rtx prev;
-       rtx crr = SCHED_FIRST_REG_MOVE (u);
- 
-       for (j = 0; j < SCHED_NREG_MOVES (u); j++)
- 	{
- 	  prev = PREV_INSN (crr);
- 	  delete_insn (crr);
- 	  crr = prev;
- 	}
-       SCHED_FIRST_REG_MOVE (u) = NULL_RTX;
-     }
- 
-   while (reg_move_replaces)
-     {
-       struct undo_replace_buff_elem *rep = reg_move_replaces;
  
-       reg_move_replaces = reg_move_replaces->next;
-       replace_rtx (rep->insn, rep->new_reg, rep->orig_reg);
-     }
- }
  
  /* Free memory allocated for the undo buffer.  */
  static void
--- 502,508 ----
*************** permute_partial_schedule (partial_schedu
*** 641,668 ****
  			    PREV_INSN (last));
  }
  
- /* As part of undoing SMS we return to the original ordering of the
-    instructions inside the loop kernel.  Given the partial schedule PS, this
-    function returns the ordering of the instruction according to their CUID
-    in the DDG (PS->G), which is the original order of the instruction before
-    performing SMS.  */
- static void
- undo_permute_partial_schedule (partial_schedule_ptr ps, rtx last)
- {
-   int i;
- 
-   for (i = 0 ; i < ps->g->num_nodes; i++)
-     if (last == ps->g->nodes[i].insn
- 	|| last == ps->g->nodes[i].first_note)
-       break;
-     else if (PREV_INSN (last) != ps->g->nodes[i].insn)
-       reorder_insns_nobb (ps->g->nodes[i].first_note, ps->g->nodes[i].insn,
- 			  PREV_INSN (last));
- }
- 
- /* Used to generate the prologue & epilogue.  Duplicate the subset of
-    nodes whose stages are between FROM_STAGE and TO_STAGE (inclusive
-    of both), together with a prefix/suffix of their reg_moves.  */
  static void
  duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
  			   int to_stage, int for_prolog)
--- 575,580 ----
*************** sms_schedule (void)
*** 1098,1104 ****
        mii = 1; /* Need to pass some estimate of mii.  */
        rec_mii = sms_order_nodes (g, mii, node_order);
        mii = MAX (res_MII (g), rec_mii);
!       maxii = (calculate_maxii (g) * SMS_MAX_II_FACTOR) / 100;
  
        if (dump_file)
  	fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
--- 1010,1016 ----
        mii = 1; /* Need to pass some estimate of mii.  */
        rec_mii = sms_order_nodes (g, mii, node_order);
        mii = MAX (res_MII (g), rec_mii);
!       maxii = (2 * mii * SMS_MAX_II_FACTOR) / 100;
  
        if (dump_file)
  	fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
*************** sms_schedule (void)
*** 1132,1139 ****
  	}
        else
  	{
- 	  int orig_cycles = kernel_number_of_cycles (BB_HEAD (g->bb), BB_END (g->bb));
- 	  int new_cycles;
  	  struct undo_replace_buff_elem *reg_move_replaces;
  
  	  if (dump_file)
--- 1044,1049 ----
*************** sms_schedule (void)
*** 1146,1152 ****
  		       "SMS Branch (%d) will later be scheduled at cycle %d.\n",
  		       g->closing_branch->cuid, PS_MIN_CYCLE (ps) - 1);
  	    }
! 
  	  /* Set the stage boundaries.  If the DDG is built with closing_branch_deps,
  	     the closing_branch was scheduled and should appear in the last (ii-1)
  	     row.  Otherwise, we are free to schedule the branch, and we let nodes
--- 1056,1062 ----
  		       "SMS Branch (%d) will later be scheduled at cycle %d.\n",
  		       g->closing_branch->cuid, PS_MIN_CYCLE (ps) - 1);
  	    }
! 	  
  	  /* Set the stage boundaries.  If the DDG is built with closing_branch_deps,
  	     the closing_branch was scheduled and should appear in the last (ii-1)
  	     row.  Otherwise, we are free to schedule the branch, and we let nodes
*************** sms_schedule (void)
*** 1155,1222 ****
  	  normalize_sched_times (ps);
  	  rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
  	  set_columns_for_ps (ps);
  
! 	  /* Generate the kernel just to be able to measure its cycles.  */
! 	  permute_partial_schedule (ps, g->closing_branch->first_note);
! 	  reg_move_replaces = generate_reg_moves (ps);
  
! 	  /* Get the number of cycles the new kernel expect to execute in.  */
! 	  new_cycles = kernel_number_of_cycles (BB_HEAD (g->bb), BB_END (g->bb));
  
! 	  /* Get back to the original loop so we can do loop versioning.  */
! 	  undo_permute_partial_schedule (ps, g->closing_branch->first_note);
! 	  if (reg_move_replaces)
! 	    undo_generate_reg_moves (ps, reg_move_replaces);
  
! 	  if ( new_cycles >= orig_cycles)
! 	    {
! 	      /* SMS is not profitable so undo the permutation and reg move generation
! 	         and return the kernel to its original state.  */
! 	      if (dump_file)
! 		fprintf (dump_file, "Undoing SMS because it is not profitable.\n");
  
! 	    }
  	  else
! 	    {
! 	      canon_loop (loop);
! 
!               /* case the BCT count is not known , Do loop-versioning */
! 	      if (count_reg && ! count_init)
! 		{
! 		  rtx comp_rtx = gen_rtx_fmt_ee (GT, VOIDmode, count_reg,
! 						 GEN_INT(stage_count));
! 		  unsigned prob = (PROB_SMS_ENOUGH_ITERATIONS
! 				   * REG_BR_PROB_BASE) / 100;
! 
! 		  loop_version (loop, comp_rtx, &condition_bb,
! 				prob, prob, REG_BR_PROB_BASE - prob,
! 				true);
! 		}
! 
! 	      /* Set new iteration count of loop kernel.  */
!               if (count_reg && count_init)
! 		SET_SRC (single_set (count_init)) = GEN_INT (loop_count
! 							     - stage_count + 1);
! 
! 	      /* Now apply the scheduled kernel to the RTL of the loop.  */
! 	      permute_partial_schedule (ps, g->closing_branch->first_note);
! 
!               /* Mark this loop as software pipelined so the later
! 	      scheduling passes doesn't touch it.  */
! 	      if (! flag_resched_modulo_sched)
! 		g->bb->flags |= BB_DISABLE_SCHEDULE;
! 	      /* The life-info is not valid any more.  */
! 	      g->bb->flags |= BB_DIRTY;
! 
! 	      reg_move_replaces = generate_reg_moves (ps);
! 	      if (dump_file)
! 		print_node_sched_params (dump_file, g->num_nodes);
! 	      /* Generate prolog and epilog.  */
! 	      if (count_reg && !count_init)
! 		generate_prolog_epilog (ps, loop, count_reg);
! 	      else
! 	 	generate_prolog_epilog (ps, loop, NULL_RTX);
! 	    }
  	  free_undo_replace_buff (reg_move_replaces);
  	}
  
--- 1065,1110 ----
  	  normalize_sched_times (ps);
  	  rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
  	  set_columns_for_ps (ps);
+ 	  
+ 	  canon_loop (loop);
  
!           /* case the BCT count is not known , Do loop-versioning */
! 	  if (count_reg && ! count_init)
! 	    {
! 	      rtx comp_rtx = gen_rtx_fmt_ee (GT, VOIDmode, count_reg,
! 	    				 GEN_INT(stage_count));
! 	      unsigned prob = (PROB_SMS_ENOUGH_ITERATIONS
! 	    		   * REG_BR_PROB_BASE) / 100;
! 
! 	      loop_version (loop, comp_rtx, &condition_bb,
! 	    		prob, prob, REG_BR_PROB_BASE - prob,
! 	    		true);
! 	    }
  
! 	  /* Set new iteration count of loop kernel.  */
!           if (count_reg && count_init)
! 	    SET_SRC (single_set (count_init)) = GEN_INT (loop_count
! 	    					     - stage_count + 1);
  
! 	  /* Now apply the scheduled kernel to the RTL of the loop.  */
! 	  permute_partial_schedule (ps, g->closing_branch->first_note);
  
!           /* Mark this loop as software pipelined so the later
! 	  scheduling passes doesn't touch it.  */
! 	  if (! flag_resched_modulo_sched)
! 	    g->bb->flags |= BB_DISABLE_SCHEDULE;
! 	  /* The life-info is not valid any more.  */
! 	  g->bb->flags |= BB_DIRTY;
  
! 	  reg_move_replaces = generate_reg_moves (ps);
! 	  if (dump_file)
! 	    print_node_sched_params (dump_file, g->num_nodes, g);
! 	  /* Generate prolog and epilog.  */
! 	  if (count_reg && !count_init)
! 	    generate_prolog_epilog (ps, loop, count_reg);
  	  else
! 	    generate_prolog_epilog (ps, loop, NULL_RTX);
! 	 
  	  free_undo_replace_buff (reg_move_replaces);
  	}
  
*************** get_sched_window (partial_schedule_ptr p
*** 1354,1360 ****
  
  	      early_start = MAX (early_start, node_st);
  
! 	      if (e->data_type == MEM_DEP)
  		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
  	    }
  	}
--- 1242,1248 ----
  
  	      early_start = MAX (early_start, node_st);
  
! 	      if (!flag_modulo_sched_allow_regmoves || e->data_type == MEM_DEP)
  		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
  	    }
  	}
*************** get_sched_window (partial_schedule_ptr p
*** 1376,1382 ****
  	      late_start = MIN (late_start,
  				SCHED_TIME (v_node) - e->latency
  				+ (e->distance * ii));
! 	      if (e->data_type == MEM_DEP)
  		end = MAX (end, SCHED_TIME (v_node) - ii + 1);
  	    }
  	}
--- 1264,1270 ----
  	      late_start = MIN (late_start,
  				SCHED_TIME (v_node) - e->latency
  				+ (e->distance * ii));
! 	      if (!flag_modulo_sched_allow_regmoves || e->data_type == MEM_DEP)
  		end = MAX (end, SCHED_TIME (v_node) - ii + 1);
  	    }
  	}
*************** get_sched_window (partial_schedule_ptr p
*** 1401,1407 ****
  	      early_start = MAX (early_start,
  				 SCHED_TIME (v_node) + e->latency
  				 - (e->distance * ii));
! 	      if (e->data_type == MEM_DEP)
  		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
  	    }
  	}
--- 1289,1295 ----
  	      early_start = MAX (early_start,
  				 SCHED_TIME (v_node) + e->latency
  				 - (e->distance * ii));
! 	      if (!flag_modulo_sched_allow_regmoves || e->data_type == MEM_DEP)
  		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
  	    }
  	}
*************** get_sched_window (partial_schedule_ptr p
*** 1414,1420 ****
  	      late_start = MIN (late_start,
  				SCHED_TIME (v_node) - e->latency
  				+ (e->distance * ii));
! 	      if (e->data_type == MEM_DEP)
  		start = MAX (start, SCHED_TIME (v_node) - ii + 1);
  	    }
  	}
--- 1302,1308 ----
  	      late_start = MIN (late_start,
  				SCHED_TIME (v_node) - e->latency
  				+ (e->distance * ii));
! 	      if (!flag_modulo_sched_allow_regmoves || e->data_type == MEM_DEP)
  		start = MAX (start, SCHED_TIME (v_node) - ii + 1);
  	    }
  	}
*************** sms_schedule_by_order (ddg_ptr g, int mi
*** 1524,1543 ****
  	    fprintf(dump_file, "Trying to schedule node %d in (%d .. %d) step %d\n",
  		    u, start, end, step);
  
!           /* use must_follow & must_precede bitmaps to determine order
! 	     of nodes within the cycle.  */
            sbitmap_zero (must_precede);
            sbitmap_zero (must_follow);
        	  for (e = u_node->in; e != 0; e = e->next_in)
!             if (TEST_BIT (sched_nodes, e->src->cuid)
! 	        && e->latency == (ii * e->distance)
! 		&& start == SCHED_TIME (e->src))
               SET_BIT (must_precede, e->src->cuid);
  
  	  for (e = u_node->out; e != 0; e = e->next_out)
!             if (TEST_BIT (sched_nodes, e->dest->cuid)
! 	        && e->latency == (ii * e->distance)
! 		&& end == SCHED_TIME (e->dest))
               SET_BIT (must_follow, e->dest->cuid);
  
  	  success = 0;
--- 1412,1443 ----
  	    fprintf(dump_file, "Trying to schedule node %d in (%d .. %d) step %d\n",
  		    u, start, end, step);
  
!           /* Use must_follow & must_precede bitmaps to determine order
! 	     of nodes within the cycle. 
! 	     The modulo-scheduler may violate anti-dependences when scheduling both
! 	     the source and the sink of the dependence in the same cycle. This can
! 	     occur when we have two or more defs of the same register, and the last
! 	     use of the register is scheduled II cycles after the first def (it
! 	     cannot be scheduled later than that because of a dependence arc), so
! 	     both are assigned to the same row, but possibly in the wrong order.
! 	     Note also that for TRUE_DEP or a single def and it's uses this problem 
! 	     does not arise, as register moves are used to handle live-ranges
! 	     exceeding II cycles.  */
! 	    
            sbitmap_zero (must_precede);
            sbitmap_zero (must_follow);
        	  for (e = u_node->in; e != 0; e = e->next_in)
!             if ((TEST_BIT (sched_nodes, e->src->cuid)
! 		&& (e->latency == (ii * e->distance)
! 		     && start == SCHED_TIME (e->src)) )
! 		  || (e->type != TRUE_DEP))
               SET_BIT (must_precede, e->src->cuid);
  
  	  for (e = u_node->out; e != 0; e = e->next_out)
!             if ((TEST_BIT (sched_nodes, e->dest->cuid)
! 		&& (e->latency == (ii * e->distance)
! 		     && start == SCHED_TIME (e->src)))
! 		  || (e->type != TRUE_DEP))
               SET_BIT (must_follow, e->dest->cuid);
  
  	  success = 0;
*************** advance_one_cycle (void)
*** 2255,2311 ****
  		      targetm.sched.dfa_post_cycle_insn ());
  }
  
- /* Given the kernel of a loop (from FIRST_INSN to LAST_INSN), finds
-    the number of cycles according to DFA that the kernel fits in,
-    we use this to check if we done well with SMS after we add
-    register moves.  In some cases register moves overhead makes
-    it even worse than the original loop.  We want SMS to be performed
-    when it gives less cycles after register moves are added.  */
- static int
- kernel_number_of_cycles (rtx first_insn, rtx last_insn)
- {
-   int cycles = 0;
-   rtx insn;
-   int can_issue_more = issue_rate;
  
-   state_reset (curr_state);
- 
-   for (insn = first_insn;
-        insn != NULL_RTX && insn != last_insn;
-        insn = NEXT_INSN (insn))
-     {
-       if (! INSN_P (insn) || GET_CODE (PATTERN (insn)) == USE)
- 	continue;
- 
-       /* Check if there is room for the current insn.  */
-       if (!can_issue_more || state_dead_lock_p (curr_state))
- 	{
- 	  cycles ++;
- 	  advance_one_cycle ();
- 	  can_issue_more = issue_rate;
- 	}
- 
- 	/* Update the DFA state and return with failure if the DFA found
- 	   recource conflicts.  */
-       if (state_transition (curr_state, insn) >= 0)
- 	{
- 	  cycles ++;
- 	  advance_one_cycle ();
- 	  can_issue_more = issue_rate;
- 	}
- 
-       if (targetm.sched.variable_issue)
- 	can_issue_more =
- 	  targetm.sched.variable_issue (sched_dump, sched_verbose,
- 					insn, can_issue_more);
-       /* A naked CLOBBER or USE generates no instruction, so don't
- 	 let them consume issue slots.  */
-       else if (GET_CODE (PATTERN (insn)) != USE
- 	       && GET_CODE (PATTERN (insn)) != CLOBBER)
- 	can_issue_more--;
-     }
-   return cycles;
- }
  
  /* Checks if PS has resource conflicts according to DFA, starting from
     FROM cycle to TO cycle; returns true if there are conflicts and false
--- 2155,2161 ----
Index: common.opt
===================================================================
*** common.opt	(revision 120757)
--- common.opt	(working copy)
*************** fmodulo-sched
*** 602,607 ****
--- 602,611 ----
  Common Report Var(flag_modulo_sched)
  Perform SMS based modulo scheduling before the first scheduling pass
  
+ fmodulo-sched-allow-regmoves
+ Common Report Var(flag_modulo_sched_allow_regmoves)
+ Perform SMS based modulo scheduling with register moves allowed
+ 
  fmove-loop-invariants
  Common Report Var(flag_move_loop_invariants) Init(1)
  Move loop invariant computations out of loops

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]