This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] More improvements to x87 stack popping


As revealed by Uros' benchmarking, although my recent improvements to
popping FP registers of the x87 stack generally improves performance,
there exist a few cases where my changes actually increase the number of
fxch instructions required.  These pathological instances are technically
regressions and have therefore been sitting on my conscience.  The patch
below contains two further refinements to redress the situation.  The
first tweak is an impressive piece of algorithmic analysis (that in
practice makes relatively little difference), and the second is an
almost mindless heuristic (that does surprisingly well, Doh!).


The first tweak is the counterintuitive result that placing as many
registers as possible in their correct locations doesn't always
minimize the number of instructions required to rearrange the stack
into its final ordering!?  The issue is the assymmetry of the x87
stack's behaviour, where all swaps have to go through the top-of-stack.

Using the same notation as the previous thread, consider the following
example, where we start with (A B c D e) and require (A B D) on output.
With my recent change to preference as many final locations as possible,
we generate the following sequence.

	; initial state (A B c D e)
	FSTP 2		(B A D e)
	FSTP 3		(A D B)		; At the end of the popping
					; phase as many registers are
					; in their correct locations
					; as possible.
	FXCH 1		(D A B)
	FXCH 2		(B A D)
	FXCH 1		(A B D)

Notice the problem is that although "A" is placed in its correct
final location during the popping phase, it has to be moved out of
the way in order to reorder the elements into the correct permutation
during the "shuffle" phase.

It turns out that we can do better by not placing the eventual top
of stack in its correct location *if* there's any shuffling to be
done afterwards.  When we know that we need to permute things, we
purposefully place the top-of-stack elsewhere during the popping
phase.

For the example above, we now generate the shorter and faster:

	; initial state (A B c D e)
	FSTP 4		(B c D A)
	FSTP 1		(B D A)
	FXCH 1		(D B A)
	FXCH 2		(A B D)



The second change is the trivial heuristic that the most recently
generated values of the previous basic block are most likely to
be the next required values in the subsequent basic block.  The
current algorithm fills unassigned stack slots from bottom to top
when popping values off of the stack, hence (A B C d e f) with no
preference becomes (C B A) where the three live values end up
on the stack in reverse order.  The surprisingly effective modification
(though marginally more effort) is to fill the unassigned stack slots
from new-top to bottom, so that (A B C d e f) becomes (A B C), where
the most recently generated values tend to remain near the top of
the stack.


Implementing these two improvements reduced the number of "fxch"
instructions required in povray when compiled with -O2 -ffast-math.
The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages, and regression tested with
a top-level "make -k check" with no new failures.

Ok for mainline?



2004-12-10  Roger Sayle  <roger@eyesopen.com>

	* reg-stack.c (change_stack): Avoid placing the new top-of-stack in
	its correct location during popping if we need to permute the stack
	afterwards.  Attempt to preserve the original stack ordering.


Index: reg-stack.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/reg-stack.c,v
retrieving revision 1.168
diff -c -3 -p -r1.168 reg-stack.c
*** reg-stack.c	30 Nov 2004 02:34:56 -0000	1.168
--- reg-stack.c	11 Dec 2004 00:27:24 -0000
*************** change_stack (rtx insn, stack old, stack
*** 2374,2386 ****
      {
        bool slots[REG_STACK_SIZE];
        int pops[REG_STACK_SIZE];
!       int next, dest;

        /* First pass to determine the free slots.  */
        for (reg = 0; reg <= new->top; reg++)
  	slots[reg] = TEST_HARD_REG_BIT (new->reg_set, old->reg[reg]);

        /* Second pass to allocate preferred slots.  */
        for (reg = old->top; reg > new->top; reg--)
  	if (TEST_HARD_REG_BIT (new->reg_set, old->reg[reg]))
  	  {
--- 2374,2387 ----
      {
        bool slots[REG_STACK_SIZE];
        int pops[REG_STACK_SIZE];
!       int next, dest, topsrc;

        /* First pass to determine the free slots.  */
        for (reg = 0; reg <= new->top; reg++)
  	slots[reg] = TEST_HARD_REG_BIT (new->reg_set, old->reg[reg]);

        /* Second pass to allocate preferred slots.  */
+       topsrc = -1;
        for (reg = old->top; reg > new->top; reg--)
  	if (TEST_HARD_REG_BIT (new->reg_set, old->reg[reg]))
  	  {
*************** change_stack (rtx insn, stack old, stack
*** 2388,2393 ****
--- 2389,2398 ----
  	    for (next = 0; next <= new->top; next++)
  	      if (!slots[next] && new->reg[next] == old->reg[reg])
  		{
+ 		  /* If this is a preference for the new top of stack, record
+ 		     the fact by remembering it's old->reg in topsrc.  */
+                   if (next == new->top)
+ 		    topsrc = reg;
  		  slots[next] = true;
  		  dest = next;
  		  break;
*************** change_stack (rtx insn, stack old, stack
*** 2397,2404 ****
  	else
  	  pops[reg] = reg;

        /* Third pass allocates remaining slots and emits pop insns.  */
!       next = 0;
        for (reg = old->top; reg > new->top; reg--)
  	{
  	  dest = pops[reg];
--- 2402,2424 ----
  	else
  	  pops[reg] = reg;

+       /* Intentionally, avoid placing the top of stack in it's correct
+ 	 location, if we still need to permute the stack below and we
+ 	 can usefully place it somewhere else.  This is the case if any
+ 	 slot is still unallocated, in which case we should place the
+ 	 top of stack there.  */
+       if (topsrc != -1)
+ 	for (reg = 0; reg < new->top; reg++)
+ 	  if (!slots[reg])
+ 	    {
+ 	      pops[topsrc] = reg;
+ 	      slots[new->top] = false;
+ 	      slots[reg] = true;
+ 	      break;
+ 	    }
+
        /* Third pass allocates remaining slots and emits pop insns.  */
!       next = new->top;
        for (reg = old->top; reg > new->top; reg--)
  	{
  	  dest = pops[reg];
*************** change_stack (rtx insn, stack old, stack
*** 2406,2432 ****
  	    {
  	      /* Find next free slot.  */
  	      while (slots[next])
! 		next++;
! 	      dest = next++;
  	    }
  	  emit_pop_insn (insn, old, FP_MODE_REG (old->reg[dest], DFmode),
  			 EMIT_BEFORE);
  	}
      }
    else
!     /* The following loop attempts to maximize the number of times we
!        pop the top of the stack, as this permits the use of the faster
!        ffreep instruction on platforms that support it.  */
!     for (reg = 0; reg <= old->top; reg++)
!       if (! TEST_HARD_REG_BIT (new->reg_set, old->reg[reg]))
! 	{
! 	  while (old->top > reg
! 		 && ! TEST_HARD_REG_BIT (new->reg_set, old->reg[old->top]))
! 	    emit_pop_insn (insn, old, FP_MODE_REG (old->reg[old->top], DFmode),
  			   EMIT_BEFORE);
! 	  emit_pop_insn (insn, old, FP_MODE_REG (old->reg[reg], DFmode),
  			 EMIT_BEFORE);
! 	}

    if (new->top == -2)
      {
--- 2426,2463 ----
  	    {
  	      /* Find next free slot.  */
  	      while (slots[next])
! 		next--;
! 	      dest = next--;
  	    }
  	  emit_pop_insn (insn, old, FP_MODE_REG (old->reg[dest], DFmode),
  			 EMIT_BEFORE);
  	}
      }
    else
!     {
!       /* The following loop attempts to maximize the number of times we
! 	 pop the top of the stack, as this permits the use of the faster
! 	 ffreep instruction on platforms that support it.  */
!       int live, next;
!
!       live = 0;
!       for (reg = 0; reg <= old->top; reg++)
!         if (TEST_HARD_REG_BIT (new->reg_set, old->reg[reg]))
!           live++;
!
!       next = live;
!       while (old->top >= live)
!         if (TEST_HARD_REG_BIT (new->reg_set, old->reg[old->top]))
! 	  {
! 	    while (TEST_HARD_REG_BIT (new->reg_set, old->reg[next]))
! 	      next--;
! 	    emit_pop_insn (insn, old, FP_MODE_REG (old->reg[next], DFmode),
  			   EMIT_BEFORE);
! 	  }
! 	else
! 	  emit_pop_insn (insn, old, FP_MODE_REG (old->reg[old->top], DFmode),
  			 EMIT_BEFORE);
!     }

    if (new->top == -2)
      {


Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]