This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [SMS] Support new loop pattern


Apologies for the messed up previous e-mail.

2011/10/12 Ayal Zaks <ayal.zaks@gmail.com>:
>>> - the last jump instruction should look like: ?pc=(regF!=0)?label:pc, regF is
>
> you'd probably want to bump to next instruction if falling through,
> e.g., pc=(regF!=0)?label:pc+4
>

It is considered that program counter is increased automatically on
hardware level.
Otherwise we should add something like "pc=pc+4" in parallel to each
instruction in RTL.

>>> ?flag register;
>>> - the last instruction which sets regF should be: regF=COMPARE(regC,X), where X
>>> ?is a constant, or maybe a register, which is not changed inside a loop;
>>> - only one instruction modifies regC inside a loop (other can use regC, but not
>>> ?write), and it should simply adjust it by a constant: regC=regC+step, where
>>> ?step is a constant.
>>
>>> When doloop is succesfully scheduled by SMS, its number of
>>> iterations of loop kernel should be decreased by the number of stages in a
>>> schedule minus one, while other iterations expand to prologue and epilogue.
>>> In new supported loops such approach can't be used, because some
>>> instructions can use count register (regC). ?Instead of this,
>>> the final register value X in compare instruction regF=COMPARE(regC,X)
>>> is changed to another value Y respective to the stage this instruction
>>> is scheduled (Y = X - stage * step).
>
> making sure this does not underflow; i.e., that the number of
> iterations is no less than stage (you've addressed this towards the
> end below).
>

Yes, this situation is processed correctly.

>>
>> The main difference from doloop case is that regC can be used by some
>> instructions in loop body.
>> That's why we are unable to simply adjust regC initial value, but have
>> to keep it's value correct on each particular iteration.
>> So, we change comparison instruction accordingly.
>>
>> An example:
>> int a[100];
>> int main()
>> {
>> ?int i;
>> ?for (i = 85; i > 12; i -= 5)
>> ? ? ?a[i] = i * i;
>> ?return a[15]-225;
>> }
>> ARM assembler with "-O2 -fno-auto-inc-dec":
>> ? ? ? ?ldr ? ? r0, .L5
>> ? ? ? ?mov ? ? r3, #85
>> ? ? ? ?mov ? ? r2, r0
>> .L2:
>> ? ? ? ?mul ? ? r1, r3, r3
>> ? ? ? ?sub ? ? r3, r3, #5
>> ? ? ? ?cmp ? ? r3, #10
>> ? ? ? ?str ? ? r1, [r2, #340]
>> ? ? ? ?sub ? ? r2, r2, #20
>> ? ? ? ?bne ? ? .L2
>> ? ? ? ?ldr ? ? r0, [r0, #60]
>> ? ? ? ?sub ? ? r0, r0, #225
>> ? ? ? ?bx ? ? ?lr
>> .L5:
>> ? ? ? ?.word ? a
>>
>> Loop body is executed 15 times.
>> When compiling with SMS, it finds a schedule with ii=7, stage_count=3
>> and following times:
>> Stage ?Time ? ? ? Insn
>> 0 ? ? ? ? ?5 ? ? ?mul ? ? r1, r3, r3
>> 1 ? ? ? ? 10 ? ? sub ? ? r3, r3, #5
>> 1 ? ? ? ? 11 ? ? cmp ? ? r3, #10
>> 1 ? ? ? ? 11 ? ? str ? ? r1, [r2, #340]
>> 1 ? ? ? ? 13 ? ? bne ? ? .L2
>> 2 ? ? ? ? 16 ? ? sub ? ? r2, r2, #20
>>
>
> branch is not scheduled last?
>

Yes, branch schedule time is smaller then sub's one.
This mean that "sub r2, r2, $20" instruction from original iteration
number K will be executed after
"bne .L2" from original iteration number K.
But certainly bne remains to be the last instuction in new loop body.
Below you can see how it looks after SMS.

>> To make new schedule correct the loop body
>> should be executed 14 times and we change compare instruction:
>
> the loop itself should execute 13 times.

with i =
85, 80, 75, 70, 65
60, 55, 50, 45, 40
35, 30, 25, 20, 15
this gives total 15 iterations (15 stores to memory).
And new loop body will be executed 13 times (one store goes to
epilogue and one - to prologue).

>> regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step.
>> In our example regC is r3, X is 10, step = -5, compare instruction
>> is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15.
>>
>
> right. In general, if the compare is on stage s (starting from 0), it
> will be executed s times in the epilog, so it should exit the loop
> upon reaching Y = X - s * step.
>
>> So, after SMS it looks like:
>> ? ? ? ?ldr ? ? r0, .L5
>> ? ? ? ?mov ? ? r3, #85
>> ? ? ? ?mov ? ? r2, r0
>> ;;prologue
>> ? ? ? ?mul ? ? r1, r3, r3 ? ? ?;;from stage 0 first iteration
>> ? ? ? ?sub ? ? r3, r3, #5 ? ? ?;;3 insns from stage 1 first iteration
>> ? ? ? ?cmp ? ? r3, #10
>> ? ? ? ?str ? ? r1, [r2, #340]
>> ? ? ? ?mul ? ? r1, r3, r3 ? ? ?;;from stage 0 second iteration
>> ;;body
>> .L2:
>> ? ? ? ?sub ? ? r3, r3, #5
>> ? ? ? ?sub ? ? r2, r2, #20
>> ? ? ? ?cmp ? ? r3, #15 ? ? ? ? ;; new value to compare with is Y=15
>> ? ? ? ?str ? ? r1, [r2, #340]
>> ? ? ? ?mul ? ? r1, r3, r3
>> ? ? ? ?bne ? ? .L2
>> ;;epilogue
>> ? ? ? ?sub ? ? r2, r2, #20 ? ? ;;from stage 2 pre-last iteration
>> ? ? ? ?sub ? ? r3, r3, #5 ? ? ?;;3 insns from stage 1 last iteration
>> ? ? ? ?cmp ? ? r3, #10
>> ? ? ? ?str ? ? r1, [r2, #340]
>> ? ? ? ?sub ? ? r2, r2, #20 ? ? ;;from stage 2 last iteration
>>
>> ? ? ? ?ldr ? ? r0, [r0, #60]
>> ? ? ? ?sub ? ? r0, r0, #225
>> ? ? ? ?bx ? ? ?lr
>> .L5:
>> ? ? ? ?.word ? a
>>

Here in comments I mention why insn was copied to prolog and epilog.
Only branch is not copied at all.

>>> Testing of this appoach reveals two bugs, which do not appear while SMS was
>>> used only for doloop loops. ?Both these bugs happen due to the nature of the
>>> flag register. ?On x86_64 it is clobbered by most of arithmetic instructions.
> This should ideally be solved by a dedicated (separate) patch.
> ...
> This too should be solved by a dedicated (separate) patch, for easier digestion.

As Ayal asks, I'll continue discussion of these two bugs in two
separate e-mails, answering on this letter.

>>>
>>> One more thing to point out is number of loop iterations. When number of
>>> iterations of a loop is not known at compile time, SMS has to create two loop
>>> versions (original and scheduled), and execute scheduled one only when real
>>> number of iterations is bigger than number of stages. ?In doloop case the
>>> number of iterations simply equals to the count register value before the loop.
>>> So SMS finds its constant initialization or makes two loop versions. ?In new
>>> supported loops number of iterations value is more complex. ?It even can't be
>>> calculated as (final_reg_value-start_reg_value)/step because of examples like
>>> this:
>>>
>>> for (unsigned int x = 0x0; x != 0x6F80919A; x += 0xEDCBA987) ...;
>>>
>>> This loop has 22 iterations. ?So, i decided to use get_simple_loop_desc
>>> function which gives a structure with loop characteristics, some of them helps
>>> to find iteration number:
>>>
>>> rtx niter_expr - The number of iterations of the loop;
>>> bool const_iter - True if the loop iterates the constant number of times;
>>> unsigned HOST_WIDEST_INT niter - Number of iterations if constant;
>>>
>>> But we can use these expressions only after looking through some other fields
>>> of returned structure:
>>>
>>> bool simple_p - True if we are able to say anything about number of iterations
>>> of the loop;
>>> rtx assumptions - Assumptions under that the rest of the information is valid;
>>> rtx noloop_assumptions - Assumptions under which the loop ends before reaching
>>> the latch;
>>> rtx infinite - Condition under which the loop is infinite.
>>>
>>> I decide to allow SMS scheduling only when simple_p is true and other three
>>> fields are NULL_RTX, or when simple_p is true and
>>> flag_unsafe_loop_optimizations is set. ?One more exception is infinite
>>> condition, and the next separate patch is an attempt to process it.
>>>
>
> ok, still need to go over this rather lengthy and orthogonal (although
> it exposes the bugs above) piece.
>
> Ayal.
>
>

New version is attached, it suits current trunk.
Without fixing both bugs mentioned above, this patch brokes bootstrap on x86-64.

Together with DDG fixes the patch was succesfully regtested on ARM,
and "regstrapped" on x86-64 and IA64.

--
Roman Zhuykov
zhroma@ispras.ru
2011-12-07  Roman Zhuykov  <zhroma@ispras.ru>
	* modulo-sched.c (nondoloop_register_get): New function.
	(const_iteration_count): Rename to ...
	(search_const_init): ...this.  Add new parameter (is_const).  Always
	return register initialization rtx and set is_const to true
	only when it is constant.
	(duplicate_insns_of_cycles): Add new parameter (doloop_p).  Do not
	duplicate instructions with count_reg only when doloop_p is set.
	Update all callers.
	(generate_prolog_epilog): Add new parameters.  Correctly generate loop
	prologue for new loop pattern.
	(sms_schedule): Support new loop pattern.
---

diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
index 0ea9a4d..e62aca7 100644
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -220,7 +220,8 @@ static void set_node_sched_params (ddg_ptr);
 static partial_schedule_ptr sms_schedule_by_order (ddg_ptr, int, int, int *);
 static void permute_partial_schedule (partial_schedule_ptr, rtx);
 static void generate_prolog_epilog (partial_schedule_ptr, struct loop *,
-                                    rtx, rtx);
+                                    rtx, bool, bool, rtx, HOST_WIDEST_INT,
+                                    bool, HOST_WIDEST_INT, rtx *);
 static int calculate_stage_count (partial_schedule_ptr, int);
 static void calculate_must_precede_follow (ddg_node_ptr, int, int,
 					   int, int, sbitmap, sbitmap, sbitmap);
@@ -255,7 +256,7 @@ typedef struct node_sched_params node_sched_params;
 DEF_VEC_O (node_sched_params);
 DEF_VEC_ALLOC_O (node_sched_params, heap);
 
-/* The following three functions are copied from the current scheduler
+/* The following two functions are copied from the current scheduler
    code in order to use sched_analyze() for computing the dependencies.
    They are used when initializing the sched_info structure.  */
 static const char *
@@ -398,37 +399,164 @@ doloop_register_get (rtx head ATTRIBUTE_UNUSED, rtx tail ATTRIBUTE_UNUSED)
 #endif
 }
 
-/* Check if COUNT_REG is set to a constant in the PRE_HEADER block, so
-   that the number of iterations is a compile-time constant.  If so,
-   return the rtx that sets COUNT_REG to a constant, and set COUNT to
-   this constant.  Otherwise return 0.  */
+/* Same as previous for loop with always-the-same-step counter.  */
 static rtx
-const_iteration_count (rtx count_reg, basic_block pre_header,
-		       HOST_WIDEST_INT * count)
+nondoloop_register_get (rtx head, rtx tail, int cmp_side,
+			rtx *addsub_output, rtx *cmp_output)
+{
+  rtx insn, reg, flagreg, addsub, cmp, end;
+
+  /* Check jump instruction form */
+  insn = single_set (tail);
+  if (insn == NULL_RTX
+      || SET_DEST (insn) != pc_rtx
+      || GET_CODE (SET_SRC (insn)) != IF_THEN_ELSE
+      || GET_CODE (XEXP (SET_SRC (insn), 1)) != LABEL_REF
+      || XEXP (SET_SRC (insn), 2) != pc_rtx)
+    return NULL_RTX;
+
+  /* Check loop exit condition */
+  insn = XEXP (SET_SRC (insn), 0);
+  if (GET_CODE (insn) != NE || XEXP (insn, 1) != const0_rtx)
+    return NULL_RTX;
+
+  /* Flags register */
+  flagreg = XEXP (insn, 0);
+
+  /* Searching comparison instruction */
+  cmp = PREV_INSN (tail);
+  while (cmp != PREV_INSN (head))
+    {
+      if (INSN_P (cmp) && reg_set_p (flagreg, cmp))
+        break;
+      cmp = PREV_INSN (cmp);
+    }
+  if (cmp == PREV_INSN (head))
+    return NULL_RTX;
+
+  /* Check comparison */
+  insn = single_set (cmp);
+  if (insn == NULL_RTX
+      || ! rtx_equal_p (flagreg, SET_DEST (insn))
+      || GET_CODE (SET_SRC (insn)) != COMPARE)
+    return NULL_RTX;
+
+  /* Loop register */
+  gcc_assert (0 <= cmp_side && cmp_side <= 1);
+  reg = XEXP (SET_SRC (insn), cmp_side);
+  if (! REG_P (reg))
+    return NULL_RTX;
+
+  /* End value */
+  end = XEXP (SET_SRC (insn), 1 - cmp_side);
+  if (! REG_P (end) && ! CONST_INT_P (end))
+    return NULL_RTX;
+
+  /* Searching register add\sub instruction */
+  addsub = PREV_INSN (cmp);
+  while (addsub != PREV_INSN (head))
+    {
+      if (INSN_P (addsub) && reg_set_p (reg, addsub))
+        break;
+      addsub = PREV_INSN (addsub);
+    }
+  if (addsub == PREV_INSN (head))
+    return NULL_RTX;
+
+  /* Checking register change instruction */
+  insn = single_set (addsub);
+  if (insn == NULL_RTX || ! rtx_equal_p (reg, SET_DEST (insn)))
+    return NULL_RTX;
+  insn = SET_SRC (insn);
+  if ((GET_CODE (insn) != PLUS && GET_CODE (insn) != MINUS)
+      || ! rtx_equal_p (reg, XEXP (insn, 0))
+      || ! (CONST_INT_P (XEXP (insn, 1))))
+    return NULL_RTX;
+
+  /* No other REG and END (if reg) modifications allowed */
+  for (insn = head; insn != tail; insn = NEXT_INSN (insn))
+    {
+      if (REG_P(end) && reg_set_p (end, insn))
+        {
+          if (dump_file)
+          {
+            fprintf (dump_file, "SMS end register found ");
+            print_rtl_single (dump_file, reg);
+            fprintf (dump_file, " outside write in insn:\n");
+            print_rtl_single (dump_file, insn);
+          }
+	  return NULL_RTX;
+	}
+      if (insn != addsub && reg_set_p (reg, insn))
+        {
+          if (dump_file)
+          {
+            fprintf (dump_file, "SMS count_reg found ");
+            print_rtl_single (dump_file, reg);
+            fprintf (dump_file, " outside write in insn:\n");
+            print_rtl_single (dump_file, insn);
+          }
+          return NULL_RTX;
+        }
+    }
+
+  *addsub_output = addsub;
+  *cmp_output = cmp;
+  return reg;
+}
+
+/* Check if REG is set to a constant in the PRE_HEADER block.
+   If possible to find, return the rtx that sets REG.
+   If REG is set to a constant (probably not directly),
+   set IS_CONST to true and VALUE to that constant value.  */
+static rtx
+search_const_init (basic_block pre_header, rtx reg, bool *is_const,
+		   HOST_WIDEST_INT *value)
 {
   rtx insn;
   rtx head, tail;
 
-  if (! pre_header)
-    return NULL_RTX;
+  if (!pre_header)
+    {
+      *is_const = false;
+      return NULL_RTX;
+    }
 
   get_ebb_head_tail (pre_header, pre_header, &head, &tail);
 
   for (insn = tail; insn != PREV_INSN (head); insn = PREV_INSN (insn))
     if (NONDEBUG_INSN_P (insn) && single_set (insn) &&
-	rtx_equal_p (count_reg, SET_DEST (single_set (insn))))
+	rtx_equal_p (reg, SET_DEST (single_set (insn))))
       {
-	rtx pat = single_set (insn);
+	rtx src, pat = single_set (insn);
+	src = SET_SRC (pat);
 
-	if (CONST_INT_P (SET_SRC (pat)))
+	if (CONST_INT_P (src))
 	  {
-	    *count = INTVAL (SET_SRC (pat));
-	    return insn;
+	    *is_const = true;
+	    *value = INTVAL (src);
+	  }
+	else if (REG_P (src))
+	  { /* Check if previous insn sets SRC = constant.  */
+	    pat = single_set (PREV_INSN (insn));
+	    if (pat != NULL_RTX && rtx_equal_p (src, SET_DEST (pat))
+		&& CONST_INT_P (SET_SRC (pat)))
+	      {
+		*is_const = true;
+		*value = INTVAL (SET_SRC (pat));
+	      }
+	    else
+		*is_const = false;
 	  }
+	else
+	  *is_const = false;
 
-	return NULL_RTX;
+	return insn;
       }
+    else if (reg_set_p (reg, insn))
+      break;
 
+  *is_const = false;
   return NULL_RTX;
 }
 
@@ -1103,7 +1231,7 @@ clear:
 
 static void
 duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
-			   int to_stage, rtx count_reg)
+			   int to_stage, rtx count_reg, bool doloop_p)
 {
   int row;
   ps_insn_ptr ps_ij;
@@ -1115,14 +1243,14 @@ duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
 	int first_u, last_u;
 	rtx u_insn;
 
-        /* Do not duplicate any insn which refers to count_reg as it
-           belongs to the control part.
+        /* In doloop case do not duplicate any insn which refers
+	   to count_reg as it belongs to the control part.
            The closing branch is scheduled as well and thus should
            be ignored.
            TODO: This should be done by analyzing the control part of
            the loop.  */
 	u_insn = ps_rtl_insn (ps, u);
-        if (reg_mentioned_p (count_reg, u_insn)
+        if ((doloop_p && reg_mentioned_p (count_reg, u_insn))
             || JUMP_P (u_insn))
           continue;
 
@@ -1142,7 +1270,10 @@ duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
 /* Generate the instructions (including reg_moves) for prolog & epilog.  */
 static void
 generate_prolog_epilog (partial_schedule_ptr ps, struct loop *loop,
-                        rtx count_reg, rtx count_init)
+                        rtx count_reg, bool doloop_p, bool count_init_isconst,
+			rtx fin_reg, HOST_WIDEST_INT fin_nonconst_adjust,
+			bool create_reg, HOST_WIDEST_INT reg_val,
+			rtx *created_reg)
 {
   int i;
   int last_stage = PS_STAGE_COUNT (ps) - 1;
@@ -1151,12 +1282,12 @@ generate_prolog_epilog (partial_schedule_ptr ps, struct loop *loop,
   /* Generate the prolog, inserting its insns on the loop-entry edge.  */
   start_sequence ();
 
-  if (!count_init)
+  if (doloop_p && !count_init_isconst)
     {
-      /* Generate instructions at the beginning of the prolog to
-         adjust the loop count by STAGE_COUNT.  If loop count is constant
-         (count_init), this constant is adjusted by STAGE_COUNT in
-         generate_prolog_epilog function.  */
+      /* In doloop we generate instructions at the beginning of the prolog to
+         adjust the initial value of doloop counter by STAGE_COUNT.
+	 If loop count is constant, this adjustment is done outside this
+         function, simply correcting the source of initialization insn.  */
       rtx sub_reg = NULL_RTX;
 
       sub_reg = expand_simple_binop (GET_MODE (count_reg), MINUS,
@@ -1167,8 +1298,40 @@ generate_prolog_epilog (partial_schedule_ptr ps, struct loop *loop,
         emit_move_insn (count_reg, sub_reg);
     }
 
+  if (!doloop_p)
+    {
+      /* In non-doloop we generate instructions at the beginning of
+         the prolog to adjust the final value (with this value loop count
+	 register is compared to check whether the loop should stop).  */
+      if (fin_nonconst_adjust != 0)
+	{
+	  /* If the final value is in a register - create another register
+	     to store a shifted value.  */
+	  rtx new_reg, reg = NULL_RTX;
+	  reg = gen_reg_rtx (GET_MODE (fin_reg));
+	  new_reg = expand_simple_binop (GET_MODE (fin_reg), MINUS, fin_reg,
+					 GEN_INT (fin_nonconst_adjust),
+					 reg, 0, OPTAB_DIRECT);
+	  gcc_assert (REG_P (new_reg));
+	  if (REGNO (new_reg) != REGNO (reg))
+	    emit_move_insn (reg, new_reg);
+	  *created_reg = new_reg;
+	}
+      else if (create_reg)
+	{
+	  /* If old final value is an immediate, and the new one can't be
+	     an immediate, we create a register to store it.  If both values
+	     are immediate the adjustment is done outside this fuction,
+	     just correcting the constant value in compare intruction.  */
+	  rtx reg = NULL_RTX;
+	  reg = gen_reg_rtx (GET_MODE (count_reg));
+	  emit_move_insn (reg, GEN_INT (reg_val));
+	  *created_reg = reg;
+	}
+    }
+
   for (i = 0; i < last_stage; i++)
-    duplicate_insns_of_cycles (ps, 0, i, count_reg);
+    duplicate_insns_of_cycles (ps, 0, i, count_reg, doloop_p);
 
   /* Put the prolog on the entry edge.  */
   e = loop_preheader_edge (loop);
@@ -1182,7 +1345,7 @@ generate_prolog_epilog (partial_schedule_ptr ps, struct loop *loop,
   start_sequence ();
 
   for (i = 0; i < last_stage; i++)
-    duplicate_insns_of_cycles (ps, i + 1, last_stage, count_reg);
+    duplicate_insns_of_cycles (ps, i + 1, last_stage, count_reg, doloop_p);
 
   /* Put the epilogue on the exit edge.  */
   gcc_assert (single_exit (loop));
@@ -1460,13 +1623,30 @@ sms_schedule (void)
           continue;
         }
 
-      /* Make sure this is a doloop.  */
-      if ( !(count_reg = doloop_register_get (head, tail)))
-      {
-        if (dump_file)
-          fprintf (dump_file, "SMS doloop_register_get failed\n");
-	continue;
-      }
+      /* Is this a doloop?  */
+      if ((count_reg = doloop_register_get (head, tail)))
+        {
+	  if (dump_file)
+	    fprintf (dump_file, "SMS doloop\n");
+        }
+      else if ((count_reg = nondoloop_register_get (head, tail, 0,
+						    &insn, &insn)))
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "SMS non-doloop\n");
+	}
+      else if ((count_reg = nondoloop_register_get (head, tail, 1,
+						    &insn, &insn)))
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "SMS non-doloop with transposed cmp\n");
+	}
+      else
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "SMS imcompatible loop\n");
+	  continue;
+	}
 
       /* Don't handle BBs with calls or barriers
 	 or !single_set with the exception of instructions that include
@@ -1516,7 +1696,6 @@ sms_schedule (void)
 	    fprintf (dump_file, "SMS create_ddg failed\n");
 	  continue;
         }
-
       g_arr[loop->num] = g;
       if (dump_file)
         fprintf (dump_file, "...OK\n");
@@ -1528,14 +1707,28 @@ sms_schedule (void)
     fprintf (dump_file, "=========================\n\n");
   }
 
+  df_clear_flags (DF_LR_RUN_DCE);
+
   /* We don't want to perform SMS on new loops - created by versioning.  */
   FOR_EACH_LOOP (li, loop, 0)
     {
+      bool doloop_p, count_fin_isconst, count_init_isconst;
+      bool was_immediate = false;
+      bool prolog_create_reg = false;
+      int prolog_fin_nonconst_adjust = 0;
+      bool nonsimple_loop = false;
       rtx head, tail;
-      rtx count_reg, count_init;
-      int mii, rec_mii, stage_count, min_cycle;
-      HOST_WIDEST_INT loop_count = 0;
+      int min_cycle;
       bool opt_sc_p;
+      rtx count_reg, count_fin_reg, new_comp_reg = NULL_RTX;
+      rtx count_init_insn, count_fin_init_insn;
+      rtx add, cmp;
+      int mii, rec_mii, cmp_side = -1, cmp_stage = -1;
+      int stage_count = 0;
+      HOST_WIDEST_INT count_init_val = 0, count_fin_val = 0;
+      HOST_WIDEST_INT count_step = 0, loop_count = -1;
+      HOST_WIDEST_INT count_fin_newval = 0;
+      struct niter_desc *desc = NULL;
 
       if (! (g = g_arr[loop->num]))
         continue;
@@ -1573,32 +1766,159 @@ sms_schedule (void)
 	               (HOST_WIDEST_INT) profile_info->sum_max);
 	      fprintf (dump_file, "\n");
 	    }
-	  fprintf (dump_file, "SMS doloop\n");
 	  fprintf (dump_file, "SMS built-ddg %d\n", g->num_nodes);
           fprintf (dump_file, "SMS num-loads %d\n", g->num_loads);
           fprintf (dump_file, "SMS num-stores %d\n", g->num_stores);
 	}
 
 
-      /* In case of th loop have doloop register it gets special
-	 handling.  */
-      count_init = NULL_RTX;
-      if ((count_reg = doloop_register_get (head, tail)))
+      /* Extract count register and determine loop type.  */
+      add = NULL_RTX;
+      cmp = NULL_RTX;
+      if ((count_reg = doloop_register_get (head, tail))
+	  || (count_reg = nondoloop_register_get (head, tail, 0, &add, &cmp))
+	  || (count_reg = nondoloop_register_get (head, tail, 1, &add, &cmp)))
 	{
-	  basic_block pre_header;
+	  basic_block pre_header = loop_preheader_edge (loop)->src;
+
+	  doloop_p = (cmp == NULL_RTX);
+	  if (doloop_p)
+	    {
+	      /* Doloop finish parameters are always the same.  */
+	      count_step = -1;
+	      count_fin_isconst = true;
+	      count_fin_val = 0;
+	      count_fin_reg = NULL_RTX;
+	      count_fin_init_insn = NULL_RTX;
+	    }
+	  else
+	    {
+	      /* In other loop we need to determine counter step
+	         and finish parameters.  */
+	      rtx step, end;
+
+	      gcc_assert (single_set (add) && single_set (cmp));
+
+	      /* Extract the step.  */
+	      step = XEXP (SET_SRC (single_set (add)), 1);
+	      gcc_assert (CONST_INT_P (step));
+
+	      if (GET_CODE (SET_SRC (single_set (add))) == MINUS)
+	        count_step = - INTVAL (step);
+	      else if (GET_CODE (SET_SRC (single_set (add))) == PLUS)
+	        count_step = INTVAL (step);
+	      else
+		gcc_unreachable ();
+
+	      gcc_assert(count_step != 0);
+
+	      /* Check what operand of compare insn is a counter register.  */
+	      if (count_reg == XEXP (SET_SRC (single_set (cmp)), 0))
+		cmp_side = 0;
+	      else if (count_reg == XEXP (SET_SRC (single_set (cmp)), 1))
+		cmp_side = 1;
+	      else
+		gcc_unreachable ();
+
+	      /* Extract finish border for counter reg.  */
+	      end = XEXP (SET_SRC (single_set (cmp)), 1 - cmp_side);
 
-	  pre_header = loop_preheader_edge (loop)->src;
-	  count_init = const_iteration_count (count_reg, pre_header,
-					      &loop_count);
+	      if (CONST_INT_P (end))
+		{
+		  /* Constant finish border.  loop until (reg != const).  */
+		  count_fin_isconst = true;
+		  count_fin_val = INTVAL (end);
+		  count_fin_reg = NULL_RTX;
+		  count_fin_init_insn = NULL_RTX;
+		}
+	      else if (REG_P (end))
+		{
+		  /* Register is a border.  Loop until (reg != fin_reg).  */
+		  count_fin_reg = end;
+		  count_fin_isconst = false;
+		  /* Try to find constant initinalization of fin_reg
+		   * in preheader.  */
+		  count_fin_init_insn = search_const_init (pre_header,
+							   count_fin_reg,
+							   &count_fin_isconst,
+							   &count_fin_val);
+		}
+	      else
+		gcc_unreachable ();
+	    }
+	  /* Try to find a constant initalization of count_reg in preheader.  */
+	  count_init_insn = search_const_init (pre_header,
+					       count_reg,
+					       &count_init_isconst,
+					       &count_init_val);
+	}
+      else /* Loop is incompatible now, but it was OK on while analyzing!  */
+	gcc_assert (count_reg);
+
+
+      desc = get_simple_loop_desc (loop);
+      gcc_assert (desc);
+      /* nonsimple_loop means it's impossible to analyze the loop
+         or there are some assumptions to make the analyzis results right
+         or there is a condition of non-infinite number of iterations.
+        We want doloops to be scheduled even if analyzis shows they are
+	 nonsimple (backward compatibility).  */
+      nonsimple_loop = !desc->simple_p;
+      /* We allow scheduling loop with some assumptions or infinite condition
+	 only when unsafe_loop_optimizations flag is enabled.  */
+      if (flag_unsafe_loop_optimizations)
+	 {
+	   desc->infinite = NULL_RTX;
+	   desc->assumptions = NULL_RTX;
+	   desc->noloop_assumptions = NULL_RTX;
+	 }
+      nonsimple_loop = nonsimple_loop || (desc->assumptions != NULL_RTX)
+			|| (desc->noloop_assumptions != NULL_RTX)
+			|| (desc->infinite != NULL_RTX);
+      /* Only doloops can be nonsimple_loops for SMS.  */
+      if (nonsimple_loop && !doloop_p)
+	{
+	  free_ddg (g);
+	  continue;
+	}
+      /* Manually set some description fields in non-simple doloop.  */
+      if (nonsimple_loop)
+	{
+	  gcc_assert(doloop_p);
+	  desc->const_iter = false;
+	  desc->infinite = NULL_RTX;
 	}
-      gcc_assert (count_reg);
 
-      if (dump_file && count_init)
+      if (desc->const_iter)
+	{
+	  gcc_assert (!desc->infinite);
+	  loop_count = desc->niter;
+	  if (dump_file)
+	    fprintf (dump_file, "SMS const loop iterations = "
+		     HOST_WIDEST_INT_PRINT_DEC "\n", loop_count);
+	}
+      if (count_init_isconst && count_fin_isconst)
         {
-          fprintf (dump_file, "SMS const-doloop ");
-          fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC,
-		     loop_count);
-          fprintf (dump_file, "\n");
+	  gcc_assert (doloop_p || desc->const_iter);
+	  if (doloop_p)
+	    {
+	      if (nonsimple_loop)
+		{
+	          loop_count = count_init_val;
+		  desc->const_iter = true;
+		}
+              gcc_assert (desc->const_iter && loop_count == count_init_val);
+	    }
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "SMS const-%s ",
+		       doloop_p ? "doloop" : "loop");
+	      fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC " to "
+		       HOST_WIDEST_INT_PRINT_DEC " step "
+		       HOST_WIDEST_INT_PRINT_DEC,
+		       count_init_val, count_fin_val, count_step);
+	      fprintf (dump_file, "\n");
+	    }
         }
 
       node_order = XNEWVEC (int, g->num_nodes);
@@ -1649,7 +1969,7 @@ sms_schedule (void)
 	     1 means that there is no interleaving between iterations thus
 	     we let the scheduling passes do the job in this case.  */
 	  if (stage_count < PARAM_VALUE (PARAM_SMS_MIN_SC)
-	      || (count_init && (loop_count <= stage_count))
+	      || (desc->const_iter && (loop_count <= stage_count))
 	      || (flag_branch_probabilities && (trip_count <= stage_count)))
 	    {
 	      if (dump_file)
@@ -1709,23 +2029,24 @@ sms_schedule (void)
 	      print_partial_schedule (ps, dump_file);
 	    }
  
-          /* case the BCT count is not known , Do loop-versioning */
-	  if (count_reg && ! count_init)
-            {
-	      rtx comp_rtx = gen_rtx_fmt_ee (GT, VOIDmode, count_reg,
-	  				     GEN_INT(stage_count));
-	      unsigned prob = (PROB_SMS_ENOUGH_ITERATIONS
-			       * REG_BR_PROB_BASE) / 100;
-
-	      loop_version (loop, comp_rtx, &condition_bb,
-	  		    prob, prob, REG_BR_PROB_BASE - prob,
-			    true);
-	     }
+	  if (!desc->const_iter)
+	    {
+	      /* Loop versioning if the number of iterations is unknown.  */
+	      unsigned prob;
+	      rtx vers_cond;
+	      vers_cond = gen_rtx_fmt_ee (GT, VOIDmode, nonsimple_loop ?
+					  count_reg : desc->niter_expr,
+					  GEN_INT (stage_count));
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "\nLoop versioning condition:\n");
+		  print_rtl_single (dump_file, vers_cond);
+		}
 
-	  /* Set new iteration count of loop kernel.  */
-          if (count_reg && count_init)
-	    SET_SRC (single_set (count_init)) = GEN_INT (loop_count
-						     - stage_count + 1);
+	      prob = (PROB_SMS_ENOUGH_ITERATIONS * REG_BR_PROB_BASE) / 100;
+	      loop_version (loop, vers_cond, &condition_bb, prob,
+			    prob, REG_BR_PROB_BASE - prob, true);
+	    }
 
 	  /* Now apply the scheduled kernel to the RTL of the loop.  */
 	  permute_partial_schedule (ps, g->closing_branch->first_note);
@@ -1741,8 +2062,121 @@ sms_schedule (void)
 	  apply_reg_moves (ps);
 	  if (dump_file)
 	    print_node_sched_params (dump_file, g->num_nodes, ps);
-	  /* Generate prolog and epilog.  */
-          generate_prolog_epilog (ps, loop, count_reg, count_init);
+
+	  if (doloop_p && count_init_isconst)
+	    {
+	      /* Change counter reg initialization constant. In more complex
+	         cases this adjustment is done with adding some insns
+		 to loop prologue in generate_prolog_epilog function.  */
+	      gcc_assert (single_set (count_init_insn) != NULL_RTX);
+	      SET_SRC (single_set (count_init_insn))
+		    = GEN_INT (count_init_val - stage_count + 1);
+	      df_insn_rescan (count_init_insn);
+	    }
+
+	  if (!doloop_p)
+	    {
+	      /* Calculation of the compare insn stage in schedule.  */
+	      ps_insn_ptr crr_insn;
+	      int row, stage;
+	      cmp_stage = -1;
+	      for (row = 0; row < ps->ii; row++)
+		for (crr_insn = ps->rows[row];
+		     crr_insn;
+		     crr_insn = crr_insn->next_in_row)
+		  {
+		    stage = SCHED_STAGE (crr_insn->id);
+		    gcc_assert (0 <= stage && stage < stage_count);
+		    if (rtx_equal_p (ps_rtl_insn (ps, crr_insn->id), cmp))
+		      {
+			gcc_assert (cmp_stage == -1);
+		        cmp_stage = stage;
+		      }
+		  }
+              if (dump_file)
+		fprintf (dump_file, "cmp_stage=%d\n", cmp_stage);
+	      gcc_assert (cmp_stage >= 0);
+	    }
+
+	  /* When compare insn stage is non-zero we are to shift the final
+	     counter reg value (which counter is compared to exit loop).
+	     Final value can be an immediate or can be a register, which
+	     constant initialization we find in preheader.  */
+	  was_immediate = false;
+	  if (!doloop_p && count_fin_isconst && cmp_stage > 0)
+	    {
+              gcc_assert (0 <= cmp_side && cmp_side <= 1);
+	      /* New finish value.  */
+	      count_fin_newval = count_fin_val - count_step * cmp_stage;
+	      was_immediate = CONST_INT_P (XEXP (SET_SRC (single_set (cmp)),
+							  1 - cmp_side));
+	      if (was_immediate)
+		{
+		  /* Check whether new value also can be an immediate.
+		     For exapmle, on ARM not all values can be encoded as
+		     an immediate, so we have to load it to a register once
+		     before the loop starts.  */
+		  rtx to = GEN_INT (count_fin_newval);
+		  prolog_create_reg = rtx_cost (to, GET_CODE (to), 0, false)
+			    > rtx_cost (GEN_INT(1), CONST_INT, 0, false);
+	        }
+	      else
+		{
+		  /* A value is already in a register and we easily change
+		     initialization instruction in preheader.  */
+		  gcc_assert (count_fin_init_insn);
+		  SET_SRC (single_set (count_fin_init_insn))
+			= GEN_INT (count_fin_newval);
+		  df_insn_rescan (count_fin_init_insn);
+		}
+	    }
+
+	  /* The adjustment of finish register value.
+	     Zero means no adjustment needed or adjusment is done
+	     without additional insn in prologue.  */
+	  if (!doloop_p && !count_fin_isconst)
+	    prolog_fin_nonconst_adjust = count_step * cmp_stage;
+
+	  /* Ready to generate prolog and epilog.  */
+	  generate_prolog_epilog (ps, loop, count_reg, doloop_p,
+			          count_init_isconst, count_fin_reg,
+				  prolog_fin_nonconst_adjust,
+				  prolog_create_reg, count_fin_newval,
+				  &new_comp_reg);
+
+	  /* And only after generating prolog and epilog it is possible
+	     to modify the compare instruction (to prevent copying wrong insn
+	     form to first and last stages).  */
+	  if (!doloop_p && cmp_stage > 0)
+	    {
+              gcc_assert (0 <= cmp_side && cmp_side <= 1);
+	      if (was_immediate && !prolog_create_reg)
+		{
+		/* Easy case - just modify a constant.  */
+		  gcc_assert (new_comp_reg == NULL_RTX);
+		  XEXP (SET_SRC (single_set (cmp)), 1 - cmp_side)
+			= GEN_INT (count_fin_newval);
+		}
+	      else
+		{
+		  if (count_fin_isconst && !was_immediate)
+		    /* Value is in a register and we already changed
+		       initialization instruction in preheader.  */
+		    gcc_assert (new_comp_reg == NULL_RTX);
+		  else
+		    {
+		      /* Another case - use created by generate_prolog_epilog
+		         register, which value is initialized in prologue.  */
+		      gcc_assert (new_comp_reg != NULL_RTX);
+		      XEXP (SET_SRC (single_set (cmp)), 1 - cmp_side)
+			      = new_comp_reg;
+		    }
+		}
+	      df_insn_rescan (cmp);
+	    }
+	  else
+	    gcc_assert (new_comp_reg == NULL_RTX);
+
 	  break;
 	}
 
@@ -1752,7 +2186,9 @@ sms_schedule (void)
       free_ddg (g);
     }
 
+  df_set_flags (DF_LR_RUN_DCE);
   free (g_arr);
+  iv_analysis_done ();
 
   /* Release scheduler data, needed until now because of DFA.  */
   haifa_sched_finish ();

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]