patches to reorg.c for c4x

Herman ten Brugge Haj.Ten.Brugge@net.HCC.nl
Tue Sep 29 13:31:00 GMT 1998


Hello,

Now that the c4x target is included in egcs, we really need the patches
to reorg.c. If they are not applied, compilation fails while building
the libgcc library files. I sent these patches some time ago, but they
were not included.

I will describe the modifications I made and include the patches that solve
the problems. I generated all of these patches myself.

I started by moving some code from fill_slots_from_thread to
steal_delay_list_from_target, because that code is only needed in that routine.
Michael Hayes sent this patch a long time ago after he received it from me.
At that time I did not understand reorg.c very well and put the change
in the wrong place.

The first problem is in steal_delay_list_from_target. Consider the
following program:
	bned label	: delayed branch to label when condition is not equal
	 cmp 1,r0	: 3 delay slots; the first one modifies the
	 <empty>	  condition codes.
	 <empty>

      label:
	bned anotherlabel
	 cmp r2,r3	: another instruction that modifies the 
	 <empty>	  condition codes.
	 <empty>

The problem is that the instruction 'cmp 1,r0' modifies the condition
codes that 'bned anotherlabel' tests. We cannot steal any instructions from
the target's delay list if any instruction in our delay list modifies the
condition codes. This test is only needed when we try to steal instructions
from conditional branches.
The patch also has a side effect: when we try to steal from a conditional
branch, it will not move any instructions from that branch's delay slots
into the first delayed branch. We could do this, but we would end up
executing a lot of instructions twice. This could be improved in the future.
I do the test for HAVE_cc0 because it is also in the main for loop in
steal_delay_list_from_target. I am not sure that a delayed branch can ever
contain CC0, because the setter and user are required to remain consecutive
and in order. I don't think the test is a problem, however.

1998-09-29 Herman A.J. ten Brugge <Haj.Ten.Brugge@net.HCC.nl>
	
	* reorg.c: Fix problems for targets with more than one delay slot (c4x).


--- reorg.c.org	Tue Sep 29 17:55:09 1998
+++ reorg.c	Tue Sep 29 18:30:42 1998
@@ -61,6 +61,10 @@ Boston, MA 02111-1307, USA.  */
    we can hoist insns from the fall-through path for forward branches or
    steal insns from the target of backward branches.
 
+   The TMS320C3x and C4x have three branch delay slots.  When the three
+   slots are filled, the branch penalty is zero.  Most insns can fill the
+   delay slots except jump insns.
+
    Three techniques for filling delay slots have been implemented so far:
 
    (1) `fill_simple_delay_slots' is the simplest, most efficient way
@@ -1664,7 +1702,9 @@ steal_delay_list_from_target (insn, cond
   int total_slots_filled = *pslots_filled;
   rtx new_delay_list = 0;
   int must_annul = *pannul_p;
+  int used_annul = 0;
   int i;
+  struct resources cc_set;
 
   /* We can't do anything if there are more delay slots in SEQ than we
      can handle, or if we don't know that it will be a taken branch.
@@ -1674,7 +1714,27 @@ steal_delay_list_from_target (insn, cond
      Also, exit if the branch has more than one set, since then it is computing
      other results that can't be ignored, e.g. the HPPA mov&branch instruction.
      ??? It may be possible to move other sets into INSN in addition to
-     moving the instructions in the delay slots.  */
+     moving the instructions in the delay slots.
+
+     We can not steal the delay list if one of the instructions in the
+     current delay_list modifies the status bits and the jump is a
+     conditional jump. */
+
+  CLEAR_RESOURCE (&cc_set);
+  for (temp = delay_list; temp; temp = XEXP (temp, 1))
+    {
+      rtx trial = XEXP (temp, 0);
+
+      mark_set_resources (trial, &cc_set, 0, 1);
+      if (insn_references_resource_p (XVECEXP (seq , 0, 0), &cc_set, 0)
+#ifdef HAVE_cc0
+	  /* If TRIAL sets CC0, we can't copy it, so we can't steal this
+	     delay list.  */
+	  || find_reg_note (trial, REG_CC_USER, NULL_RTX)
+#endif
+	 )
+        return delay_list;
+    }
 
   if (XVECLEN (seq, 0) - 1 > slots_remaining
       || ! condition_dominates_p (condition, XVECEXP (seq, 0, 0))
@@ -1686,9 +1746,11 @@ steal_delay_list_from_target (insn, cond
       rtx trial = XVECEXP (seq, 0, i);
       int flags;
 
+      mark_set_resources (trial, &cc_set, 0, 1);
       if (insn_references_resource_p (trial, sets, 0)
 	  || insn_sets_resource_p (trial, needed, 0)
 	  || insn_sets_resource_p (trial, sets, 0)
+	  || insn_references_resource_p (XVECEXP (seq , 0, 0), &cc_set, 0)
 #ifdef HAVE_cc0
 	  /* If TRIAL sets CC0, we can't copy it, so we can't steal this
 	     delay list.  */
@@ -3668,8 +3761,6 @@ fill_slots_from_thread (insn, condition,
 
 		  delay_list = add_to_delay_list (temp, delay_list);
 
-		  mark_set_resources (trial, &opposite_needed, 0, 1);
-
 		  if (slots_to_fill == ++(*pslots_filled))
 		    {
 		      /* Even though we have filled all the slots, we
@@ -3747,9 +3838,7 @@ fill_slots_from_thread (insn, condition,
     {
       /* If this is the `true' thread, we will want to follow the jump,
 	 so we can only do this if we have taken everything up to here.  */
-      if (thread_if_true && trial == new_thread
-	  && ! insn_references_resource_p (XVECEXP (PATTERN (trial), 0, 0),
-					   &opposite_needed, 0))
+      if (thread_if_true && trial == new_thread)
 	delay_list
 	  = steal_delay_list_from_target (insn, condition, PATTERN (trial),
 					  delay_list, &set, &needed,


The second problem is with try_merge_delay_insns. The following code shows
the problem:
         bned label     : delayed branch to label when condition is not equal
          ldiu  r3,r0   : three delay instructions which modify r0
          lsh   -16,r0 
          addi  1,r0,r0 
  
         ldiu  r3,r0    : some other code that modifies r0
         lsh   -16,r0
        label:

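The problem was that the second 'ldiu  r3,r0' was removed. This happened
because only the resources referenced by the first instruction in the delay
slots were marked as needed when this instruction was compared. On the
fall-through path the delay slots have already incremented r0 with 'addi',
and the code that follows recomputes r0 from r3, so the second 'ldiu' must
stay. I now first call mark_referenced_resources for all instructions in
the delay list and only then check whether the extra instructions can be
removed.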

@@ -1855,7 +1932,9 @@ try_merge_delay_insns (insn, thread)
      will essentially disable this optimization.  This method is somewhat of
      a kludge, but I don't see a better way.)  */
   if (! annul_p)
-    mark_referenced_resources (next_to_match, &needed, 1);
+    for (i = 1 ; i < num_slots ; i++)
+      if (XVECEXP (PATTERN (insn), 0, i))
+        mark_referenced_resources (XVECEXP (PATTERN (insn), 0, i), &needed, 1);
 
   for (trial = thread; !stop_search_p (trial, 1); trial = next_trial)
     {
@@ -1904,8 +1983,6 @@ try_merge_delay_insns (insn, thread)
 	    break;
 
 	  next_to_match = XVECEXP (PATTERN (insn), 0, slot_number);
-	  if (! annul_p)
-	    mark_referenced_resources (next_to_match, &needed, 1);
 	}
 
       mark_set_resources (trial, &set, 0, 1);


The third problem I had was also in this function. The sequence below
shows the problem.

        label:
	 ... 		some other code

         bgtd label     : delayed branch to label when condition is greater than
          cmpi3  0,r4   : set condition codes
          nop
          nop
	 bd label	: unconditional delayed branch to label
	  addf   f7,*ar0,r0 : add and destroy condition codes
	  stf    f0,*ar0    : store result
          cmpi3  0,r4   : set condition codes

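The problem was that the second 'cmpi3 0,r4' was removed, because the
resources set and referenced by the delay-slot instructions that did not
match (here 'addf', which destroys the condition codes) were not recorded.
The patch below fixes this.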

@@ -1954,6 +2035,11 @@ try_merge_delay_insns (insn, thread)
 
 	      next_to_match = XVECEXP (PATTERN (insn), 0, slot_number);
 	    }
+	  else
+	    {
+              mark_set_resources (dtrial, &set, 0, 1);
+              mark_referenced_resources (dtrial, &needed, 1);
+	    }
 	}
     }
 


The patch below is already included in testgcc-980813. I modified the
routine check_annul_list_true_false in the way suggested by
Paul Eggert <eggert@twinsun.com> on Mon Aug 17 00:12:42 1998.

@@ -234,7 +238,7 @@ static int insn_sets_resource_p PROTO((r
 static rtx find_end_label	PROTO((void));
 static rtx emit_delay_sequence	PROTO((rtx, rtx, int));
 static rtx add_to_delay_list	PROTO((rtx, rtx));
-static void delete_from_delay_slot PROTO((rtx));
+static rtx delete_from_delay_slot PROTO((rtx));
 static void delete_scheduled_jump PROTO((rtx));
 static void note_delay_statistics PROTO((int, int));
 static rtx optimize_skip	PROTO((rtx));
@@ -1011,7 +1022,7 @@ add_to_delay_list (insn, delay_list)
 /* Delete INSN from the delay slot of the insn that it is in.  This may
    produce an insn without anything in its delay slots.  */
 
-static void
+static rtx
 delete_from_delay_slot (insn)
      rtx insn;
 {
@@ -1060,6 +1071,8 @@ delete_from_delay_slot (insn)
 
   /* Show we need to fill this insn again.  */
   obstack_ptr_grow (&unfilled_slots_obstack, trial);
+
+  return trial;
 }
 
 /* Delete INSN, a JUMP_INSN.  If it is a conditional jump, we must track down
@@ -1624,6 +1637,31 @@ redirect_with_delay_list_safe_p (jump, n
   return (li == NULL);
 }
 
+/* DELAY_LIST is a list of insns that have already been placed into delay
+   slots.  See if all of them have the same annulling status as ANNUL_TRUE_P.
+   If not, return 0; otherwise return 1.  */
+
+static int
+check_annul_list_true_false (annul_true_p, delay_list)
+     int annul_true_p;
+     rtx delay_list;
+{
+  rtx temp, trial;
+
+  if (delay_list)
+    {
+      for (temp = delay_list; temp; temp = XEXP (temp, 1))
+        {
+          rtx trial = XEXP (temp, 0);
+ 
+          if ((annul_true_p && INSN_FROM_TARGET_P (trial))
+	      || (!annul_true_p && !INSN_FROM_TARGET_P (trial)))
+	    return 0;
+        }
+    }
+  return 1;
+}
+
 
 /* INSN branches to an insn whose pattern SEQ is a SEQUENCE.  Given that
    the condition tested by INSN is CONDITION and the resources shown in
@@ -1714,9 +1776,15 @@ steal_delay_list_from_target (insn, cond
 	       || (! insn_sets_resource_p (trial, other_needed, 0)
 		   && ! may_trap_p (PATTERN (trial)))))
 	  ? eligible_for_delay (insn, total_slots_filled, trial, flags)
-	  : (must_annul = 1,
-	     eligible_for_annul_false (insn, total_slots_filled, trial, flags)))
+	  : (must_annul || (delay_list == NULL && new_delay_list == NULL))
+	     && (must_annul = 1,
+	         check_annul_list_true_false (0, delay_list)
+	         && check_annul_list_true_false (0, new_delay_list)
+	         && eligible_for_annul_false (insn, total_slots_filled,
+					      trial, flags)))
 	{
+	  if (must_annul)
+	    used_annul = 1;
 	  temp = copy_rtx (trial);
 	  INSN_FROM_TARGET_P (temp) = 1;
 	  new_delay_list = add_to_delay_list (temp, new_delay_list);
@@ -1735,7 +1803,8 @@ steal_delay_list_from_target (insn, cond
   /* Add any new insns to the delay list and update the count of the
      number of slots filled.  */
   *pslots_filled = total_slots_filled;
-  *pannul_p = must_annul;
+  if (used_annul)
+    *pannul_p = 1;
 
   if (delay_list == 0)
     return new_delay_list;
@@ -1765,6 +1834,8 @@ steal_delay_list_from_fallthrough (insn,
 {
   int i;
   int flags;
+  int must_annul = *pannul_p;
+  int used_annul = 0;
 
   flags = get_jump_flags (insn, JUMP_LABEL (insn));
 
@@ -1798,14 +1869,17 @@ steal_delay_list_from_fallthrough (insn,
 	  continue;
 	}
 
-      if (! *pannul_p
+      if (! must_annul
 	  && ((condition == const_true_rtx
 	       || (! insn_sets_resource_p (trial, other_needed, 0)
 		   && ! may_trap_p (PATTERN (trial)))))
 	  ? eligible_for_delay (insn, *pslots_filled, trial, flags)
-	  : (*pannul_p = 1,
-	     eligible_for_annul_true (insn, *pslots_filled, trial, flags)))
+	  : (must_annul || delay_list == NULL) && (must_annul = 1,
+	     check_annul_list_true_false (1, delay_list)
+	     && eligible_for_annul_true (insn, *pslots_filled, trial, flags)))
 	{
+	  if (must_annul)
+	    used_annul = 1;
 	  delete_from_delay_slot (trial);
 	  delay_list = add_to_delay_list (trial, delay_list);
 
@@ -1816,8 +1890,11 @@ steal_delay_list_from_fallthrough (insn,
 	break;
     }
 
+  if (used_annul)
+    *pannul_p = 1;
   return delay_list;
 }
+
 
 /* Try merging insns starting at THREAD which match exactly the insns in
    INSN's delay list.
@@ -1941,8 +2018,12 @@ try_merge_delay_insns (insn, thread)
 	    {
 	      if (! annul_p)
 		{
+		  rtx new;
+
 		  update_block (dtrial, thread);
-		  delete_from_delay_slot (dtrial);
+		  new = delete_from_delay_slot (dtrial);
+	          if (INSN_DELETED_P (thread))
+		    thread = new;
 		  INSN_FROM_TARGET_P (next_to_match) = 0;
 		}
 	      else
@@ -1968,8 +2054,12 @@ try_merge_delay_insns (insn, thread)
 	{
 	  if (GET_MODE (merged_insns) == SImode)
 	    {
+	      rtx new;
+
 	      update_block (XEXP (merged_insns, 0), thread);
-	      delete_from_delay_slot (XEXP (merged_insns, 0));
+	      new = delete_from_delay_slot (XEXP (merged_insns, 0));
+	      if (INSN_DELETED_P (thread))
+		thread = new;
 	    }
 	  else
 	    {
@@ -3600,9 +3690,10 @@ fill_slots_from_thread (insn, condition,
 	  /* There are two ways we can win:  If TRIAL doesn't set anything
 	     needed at the opposite thread and can't trap, or if it can
 	     go into an annulled delay slot.  */
-	  if (condition == const_true_rtx
-	      || (! insn_sets_resource_p (trial, &opposite_needed, 1)
-		  && ! may_trap_p (pat)))
+	  if (!must_annul
+	      && (condition == const_true_rtx
+	          || (! insn_sets_resource_p (trial, &opposite_needed, 1)
+		      && ! may_trap_p (pat))))
 	    {
 	      old_trial = trial;
 	      trial = try_split (pat, trial, 0);
@@ -3630,9 +3721,11 @@ fill_slots_from_thread (insn, condition,
 	      if (thread == old_trial)
 		thread = trial;
 	      pat = PATTERN (trial);
-	      if ((thread_if_true
-		   ? eligible_for_annul_false (insn, *pslots_filled, trial, flags)
-		   : eligible_for_annul_true (insn, *pslots_filled, trial, flags)))
+	      if ((must_annul || delay_list == NULL) && (thread_if_true
+		   ? check_annul_list_true_false (0, delay_list)
+		     && eligible_for_annul_false (insn, *pslots_filled, trial, flags)
+		   : check_annul_list_true_false (1, delay_list)
+		     && eligible_for_annul_true (insn, *pslots_filled, trial, flags)))
 		{
 		  rtx temp;
 

-- 
-------------------------------------------------------------------------
Herman ten Brugge			Email:	Haj.Ten.Brugge@net.HCC.nl


