This is the mail archive of the mailing list for the GCC project.

RFA: Reduce iterations of elimination bookkeeping

A few weeks ago I instrumented the top section of the main reload loop. Specifically, I was looking to see how often we have to do elimination bookkeeping.

For an x86 bootstrap, we call reload 101k times and we perform elimination bookkeeping 151k times. That is about 50k repeat iterations, so we're iterating in fewer than half of our calls to reload -- that was somewhat of a surprise.

However, there are still 50k iterations of elimination bookkeeping that are worth looking at.

We will iterate on elimination bookkeeping any time the size of the frame changes. That happens if we spill a pseudo to memory, allocate a caller-save slot, align the stack, or spill a memory address. We also iterate for a variety of other reasons, such as unexpected changes in elimination offsets, a previously eliminable register no longer being eliminable, or any spill code generation.

About 13k of the iterations occur because we allocated a stack slot for caller-save registers. Yup, that's right, 13k just because we allocated a slot for a caller-save. These iterations can be trivially avoided by allocating caller-save slots before we do the elimination bookkeeping.

About 26k iterations occur because of requested stack alignments. Yes, we allocate a slot to ensure stack alignment *after* elimination bookkeeping. Interestingly enough, we used to align prior to elimination bookkeeping, but the code was moved in response to pr29248 and pr28966. Fortunately, all that was really necessary to fix those PRs was to avoid aligning the stack if none had yet been allocated -- changing the sequencing of elimination bookkeeping & stack alignment was not necessary and obviously was making more work for reload than was necessary.
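
To illustrate the guard described above, here is a minimal sketch in C. It is a toy model, not GCC code: `toy_aligned_size`, its parameters, and the idea of a frame as a start offset plus a byte count are all assumptions made for illustration. It shows how aligning an empty frame whose start offset is not already aligned creates a frame where none was needed, and how skipping alignment for an empty frame avoids that.

```c
#include <assert.h>

/* Hypothetical toy, not GCC code.  The frame begins at START_OFFSET and
   currently holds SIZE bytes.  Returns the frame size after optionally
   padding the end of the frame up to a multiple of ALIGN bytes.  */
static int
toy_aligned_size (int start_offset, int size, int align, int guard_empty)
{
  assert (start_offset >= 0 && size >= 0 && align > 0);

  /* The guard: if nothing has been allocated yet, do not create a
     frame just to align it.  */
  if (guard_empty && size == 0)
    return 0;

  int end = start_offset + size;
  int padding = (align - end % align) % align;
  return size + padding;
}
```

With a start offset of 4 and 16-byte alignment, the unguarded call `toy_aligned_size (4, 0, 16, 0)` returns 12, so an empty frame becomes a 12-byte frame. The guarded call `toy_aligned_size (4, 0, 16, 1)` returns 0.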

With those two fixes to sequencing, we can eliminate 38k (of the 50k) iterations of elimination bookkeeping. The remaining iterations are almost exclusively due to spilling.
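
The effect of the two sequencing changes can be sketched with a toy model in C. Everything here is hypothetical (`toy_frame`, `toy_reload`, the 12-byte save area and the 16-byte alignment are made-up stand-ins, not reload's real data structures). The only thing it models is the rule that the bookkeeping pass must be repeated whenever the frame size differs from the size recorded before the pass.

```c
#include <assert.h>

/* Toy model only; none of these names exist in GCC.  */
struct toy_frame
{
  int size;            /* current frame size in bytes */
  int save_area_done;  /* nonzero once the caller-save area exists */
};

/* Allocate the caller-save area once; later calls are no-ops.  */
static void
toy_setup_save_area (struct toy_frame *f, int bytes)
{
  if (!f->save_area_done)
    {
      f->size += bytes;
      f->save_area_done = 1;
    }
}

/* Pad a non-empty frame up to a multiple of ALIGN bytes.  */
static void
toy_align (struct toy_frame *f, int align)
{
  assert (align > 0);
  if (f->size != 0 && f->size % align != 0)
    f->size += align - f->size % align;
}

/* Return the number of bookkeeping passes needed before the frame
   size is stable.  ALLOCATE_FIRST selects the new ordering.  */
static int
toy_reload (struct toy_frame f, int allocate_first)
{
  int passes = 0;

  for (;;)
    {
      if (allocate_first)
        {
          toy_setup_save_area (&f, 12);
          toy_align (&f, 16);
        }

      int starting_size = f.size;
      passes++;  /* one round of elimination bookkeeping */

      if (!allocate_first)
        {
          /* Old ordering: allocate after bookkeeping, then redo the
             bookkeeping if the frame grew.  */
          toy_setup_save_area (&f, 12);
          if (starting_size != f.size)
            continue;
          toy_align (&f, 16);
          if (starting_size != f.size)
            continue;
        }
      break;
    }
  return passes;
}
```

Starting from an 8-byte frame, the old ordering (`allocate_first` = 0) takes 3 passes: one is wasted on the save area and one on the alignment padding. The new ordering takes 1 pass. Spilling is not modelled, so this says nothing about the remaining iterations.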

From a code generation standpoint, these changes permute where objects land in the frame, so it's possible we can get minor code generation differences on targets with restricted displacements in reg+d addressing modes. We can also get some changes in cache behaviour. I would expect both effects to be neutral overall.
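
As a rough illustration of the displacement concern, the following C sketch counts how many frame slots start within reach of a limited displacement under two different slot orders. The function name, the slot sizes, and the 127-byte limit are all hypothetical, and real targets have their own addressing rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy, not GCC code.  Slots are laid out back to back from
   offset 0 in the order given.  Returns how many slots begin at an
   offset no larger than MAX_DISP, i.e. are addressable with a single
   reg+d access on an imagined target with that displacement limit.  */
static int
toy_count_reachable (const int *slot_sizes, size_t n, int max_disp)
{
  assert (max_disp >= 0);

  int offset = 0;
  int reachable = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (offset <= max_disp)
        reachable++;
      offset += slot_sizes[i];
    }
  return reachable;
}
```

With slot sizes {4, 128, 4} and a limit of 127, two slots are reachable (offsets 0 and 4). With the same slots ordered {128, 4, 4}, only one is (offset 0). The objects are the same in both cases and only their placement differs.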

From a compile-time standpoint, we're clearly doing less work, so I'd expect some minor (possibly unmeasurable) compile-time improvements.

Bootstrapped and regression tested on i686-pc-linux-gnu. I also verified pr29248 and pr28966 continue to generate the desired code.

   * reload1.c (reload): Allocate caller-save areas and conditionally
   align the stack before elimination bookkeeping.

Index: reload1.c
--- reload1.c	(revision 145487)
+++ reload1.c	(working copy)
@@ -965,11 +965,31 @@
       int did_spill;
       HOST_WIDE_INT starting_frame_size;
-      starting_frame_size = get_frame_size ();
       set_initial_elim_offsets ();
       set_initial_label_offsets ();
+      /* Set up the caller-save areas before elimination bookkeeping.
+	 This eliminates about 25% of the iterations of the elimination
+	 bookkeeping code as we no longer have to iterate the bookkeeping
+	 if CALLER_SAVE_NEEDED is true.  */
+      if (caller_save_needed)
+	setup_save_areas ();
+      /* If we have a stack frame, go ahead and align it before we
+	 handle elimination bookkeeping.  This avoids another 50% of the
+	 iterations of the bookkeeping code. 
+	 We don't align if there is no stack, as that will cause a stack
+	 frame when none is needed should STARTING_FRAME_OFFSET not be
+	 already aligned to STACK_BOUNDARY.  */
+      if (get_frame_size () && crtl->stack_alignment_needed)
+	assign_stack_local (BLKmode, 0, crtl->stack_alignment_needed);
+      /* Do this after we have set up the caller-save areas and handled
+	 the stack alignment requests.  This allows elimination bookkeeping
+	 to stabilize much more often without iterating.  */
+      starting_frame_size = get_frame_size ();
       /* For each pseudo register that has an equivalent location defined,
 	 try to eliminate any eliminable registers (such as the frame pointer)
 	 assuming initial offsets for the replacement register, which
@@ -1025,26 +1045,9 @@
-      if (caller_save_needed)
-	setup_save_areas ();
       /* If we allocated another stack slot, redo elimination bookkeeping.  */
       if (starting_frame_size != get_frame_size ())
 	continue;
-      if (starting_frame_size && crtl->stack_alignment_needed)
-	{
-	  /* If we have a stack frame, we must align it now.  The
-	     stack size may be a part of the offset computation for
-	     register elimination.  So if this changes the stack size,
-	     then repeat the elimination bookkeeping.  We don't
-	     realign when there is no stack, as that will cause a
-	     stack frame when none is needed should
-	     STARTING_FRAME_OFFSET not be already aligned to
-	     STACK_BOUNDARY.  */
-	  assign_stack_local (BLKmode, 0, crtl->stack_alignment_needed);
-	  if (starting_frame_size != get_frame_size ())
-	    continue;
-	}
       if (caller_save_needed)
