This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



i386 frame pointer elimination patch



Hi,
This is the frame pointer elimination patch. It is basically my patch from
November, updated for the new REGNO, with some parts tweaked, indentation fixed
and comments clarified.  I hope it is now much easier to review.

The main idea is to save the clobbered registers first, before allocating the
stack frame. This avoids extra stack pointer manipulation (at least two AGI
stalls and one move in the epilogue).
Also, the stack allocation instructions now appear inside the function, so they
can be combined with other stack pointer changes by the new peep2s or by a new
pass in the future.
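
A toy model of the ordering change, for readers who want to try it outside
GCC. The condition in saved_regs_first is copied from the SAVED_REGS_FIRST
macro in the patch below; describe_prologue, append_step and the step names
are hypothetical and only mirror the order in which the patched
ix86_expand_prologue emits its pieces.

```c
#include <assert.h>
#include <string.h>

/* Same condition as the patch's SAVED_REGS_FIRST macro, with the GCC
   globals replaced by parameters so it can be tried in isolation.  */
static int
saved_regs_first (int frame_pointer_needed, int target_use_leave,
                  int optimize_size)
{
  return !frame_pointer_needed || (!target_use_leave && !optimize_size);
}

/* Hypothetical helper: append step S to BUF, which holds LEN bytes.  */
static void
append_step (char *buf, size_t len, const char *s)
{
  assert (strlen (buf) + strlen (s) < len);
  strcat (buf, s);
}

/* Describe the prologue as a list of steps, in the order the patched
   prologue expander emits them.  TSIZE is the frame size in bytes.  */
static void
describe_prologue (char *buf, size_t len, int frame_pointer_needed,
                   int regs_first, int tsize)
{
  assert (len > 0);
  buf[0] = '\0';
  if (frame_pointer_needed)
    append_step (buf, len, "push-fp set-fp ");
  if (regs_first)
    append_step (buf, len, "save-regs ");
  if (tsize != 0)
    append_step (buf, len, "alloc-frame ");
  if (!regs_first)
    append_step (buf, len, "save-regs ");
}
```

With a frame pointer and no LEAVE, the register saves come before the frame
allocation; when LEAVE is used, the old order is kept.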

The patch also fixes the misalignment of long doubles. You may verify that currently:
#include <stdio.h>
int main(void)
{
  long double a;
  printf("%i\n",((int)&a)&15);
  return 0;
}
prints 8 on i386 unless compiled with -fomit-frame-pointer.


The patch occasionally reduces stack frames by taking
cfun->stack_alignment_needed into account.
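
The padding arithmetic can be tried stand-alone with the sketch below. It
follows the new ix86_compute_frame_size from the patch, but the GCC globals
are turned into parameters, alignments are given in bytes, and the names
model_frame_size, struct frame_model, pad_to and WORD_BYTES are hypothetical.

```c
#include <assert.h>

#define WORD_BYTES 4   /* stands in for UNITS_PER_WORD on i386 */

struct frame_model { int padding1; int padding2; int tsize; };

/* Bytes needed to round OFFSET up to a multiple of ALIGN (a power of 2).  */
static int
pad_to (int offset, int align)
{
  assert (align > 0 && (align & (align - 1)) == 0);
  return ((offset + align - 1) & -align) - offset;
}

/* Model of the frame size computation: SIZE bytes of locals, NREGS saved
   registers, alignments in bytes.  */
static struct frame_model
model_frame_size (int size, int nregs, int frame_pointer_needed,
                  int regs_first, int is_leaf,
                  int stack_alignment_needed, int preferred_alignment)
{
  struct frame_model m = { 0, 0, 0 };
  int offset = frame_pointer_needed ? 8 : 4;   /* saved pc (+ saved %ebp) */
  int total_size = size;

  if (stack_alignment_needed < 4)
    stack_alignment_needed = 4;
  assert (stack_alignment_needed <= preferred_alignment);

  if (regs_first)
    offset += nregs * WORD_BYTES;
  else
    total_size += nregs * WORD_BYTES;
  total_size += offset;

  /* Align the start of the local frame.  */
  m.padding1 = pad_to (offset, stack_alignment_needed);
  total_size += m.padding1;

  /* Align the outgoing stack boundary; leaf functions skip this.  */
  if (!is_leaf)
    m.padding2 = pad_to (total_size, preferred_alignment);

  m.tsize = size + m.padding1 + m.padding2;
  return m;
}
```

For example, a leaf function with 12 bytes of locals that needs only 4-byte
alignment gets no padding at all in this model, while the same locals with a
16-byte requirement in a non-leaf function grow to a 24-byte frame.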

The debugging, profiling and stack frame unwinding algorithms are not changed,
since the frame pointer still points to the same place in the stack frame; only
the offsets of local variables are changed.
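
How the offsets move can be seen in a small stand-alone model of the
elimination offsets. The arithmetic follows the patched
ix86_initial_elimination_offset; the enum, the function name and WORD_BYTES
are hypothetical, and tsize and padding1 are passed in rather than computed.

```c
#include <assert.h>

#define WORD_BYTES 4   /* stands in for UNITS_PER_WORD on i386 */

enum reg { ARG_PTR, FRAME_PTR, HARD_FRAME_PTR, STACK_PTR };

/* Offset to add when replacing register FROM by register TO.  TSIZE is
   the frame size including both paddings, PADDING1 the padding before
   the locals, NREGS the number of saved registers.  */
static int
elimination_offset (enum reg from, enum reg to, int tsize, int padding1,
                    int nregs, int frame_pointer_needed, int regs_first)
{
  if (from == ARG_PTR && to == HARD_FRAME_PTR)
    return 8;                      /* saved pc and saved frame pointer */

  if (from == FRAME_PTR && to == HARD_FRAME_PTR)
    return -(padding1 + (regs_first ? nregs * WORD_BYTES : 0));

  /* Everything else eliminates to the stack pointer.  */
  assert (to == STACK_PTR);
  if (from == ARG_PTR)
    return tsize + nregs * WORD_BYTES + (frame_pointer_needed ? 8 : 4);

  assert (from == FRAME_PTR);
  return regs_first ? tsize - padding1
                    : tsize + nregs * WORD_BYTES - padding1;
}
```

The ARG_PTR to HARD_FRAME_PTR offset stays fixed at 8, which is why the
unwinder view is unaffected; only the FRAME_PTR offsets depend on where the
saved registers sit.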

I am getting quite notable speedups in recursive functions. The following are
the changed results in my benchmark suite:

Bell Labs Benchmark B1: Ackermann's function (recursion) (nongpl/bell-labs/b1.c)
---%    151%    123%    101% 
---%    144%    127%    106% 
Hanoi (tests/hanoi.c)
101%    107%    101%    106% 
101%    107%    101%    105% 
Quicksort (tests/qsort.c)
100%    104%    101%    104% 
100%    101%    100%     99% 
Bell Labs Benchmark B7: (integer statistics) (nongpl/bell-labs/b7.c)
---%    106%    103%    153% 
---%    106%    150%    103% 
Bell Labs benchmark B2 (tree creation) (nongpl/bell-labs/b2.c):    112% 
Bell Labs benchmark B3 (qsort for strings) (nongpl/bell-labs/b3.c):104% 
Bell Labs benchmark B4 (tty driver fragt) (nongpl/bell-labs/b4.c): 113% 
Bell Labs benchmark B5 (symbol tbl insert) (nongpl/bell-labs/b5.c):105% 
Bell Labs benchmark B6 (release buffers) (nongpl/bell-labs/b6.c):  107% 
Dhrystone (nongpl/dhry/dhry_1.c nongpl/dhry/dhry_2.c):      103% 
XaoS internal loop (tests/xaos.c):                          101% 
Bzip2 block sorting loop (tests/bzip2.c):                   101% 

With -fomit-frame-pointer the gains are even more noticeable, but I don't have
the exact results available right now.

The cost of the -mpreferred-stack-boundary code is noticeably lower, but I am
still able to measure it easily:

Bell Labs Benchmark B1: Ackermann's function (recursion) (nongpl/bell-labs/b1.c)
---%     60%     71%     93% 
---%     61%     66%     94% 
Hanoi (tests/hanoi.c)
101%     98%     99%     99% 
101%     98%     99%     98% 
Quicksort (tests/qsort.c)
 90%     96%     97%     94% 
 90%     95%     95%     92% 
Bell Labs benchmark B2 (tree creation) (nongpl/bell-labs/b2.c):     93% 
Bell Labs benchmark B3 (qsort for strings) (nongpl/bell-labs/b3.c): 94% 
Bell Labs benchmark B4 (tty driver fragt) (nongpl/bell-labs/b4.c):  86% 
Bell Labs benchmark B5 (symbol tbl insert) (nongpl/bell-labs/b5.c): 92% 
Bell Labs benchmark B6 (release buffers) (nongpl/bell-labs/b6.c):   82% 
Dhrystone (nongpl/dhry/dhry_1.c nongpl/dhry/dhry_2.c):       97% 
Palette approximation (tests/pal.c):                         99% 
Slalom benchmark (nongpl/slalom.c):                         102% 

I would like to follow up this patch with a patch that collects the
cfun->preferred_stack_boundary necessary for a given function, based on the
maximal alignment needed by the called functions (the callers would then take
into account that the necessary alignment is
MAX (cfun->stack_alignment_needed, cfun->preferred_stack_boundary)),
so that we can avoid the cost of preferred_stack_boundary at least for
recursive and static functions.
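
A rough sketch of that follow-up idea, which is not part of this patch. The
function name and the array of callee requirements are hypothetical; it only
shows the MAX computation described above, with alignments in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Boundary a function would have to maintain: the larger of its own
   stack_alignment_needed and the largest requirement of any function
   it calls.  */
static int
needed_stack_boundary (int own_alignment, const int *callee_alignment,
                       size_t ncallees)
{
  int boundary = own_alignment;
  size_t i;

  assert (ncallees == 0 || callee_alignment != NULL);
  for (i = 0; i < ncallees; i++)
    if (callee_alignment[i] > boundary)
      boundary = callee_alignment[i];
  return boundary;
}
```

A recursive function that needs 4-byte alignment and calls only itself would
stay at 4 bytes under this scheme instead of paying for the full preferred
boundary.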

Also, it often results in shorter code when the stack pointer is reliable,
because ADD is used in place of LEA. The size of the XaoS binary has changed
from 506980 to 498180.

I've tested this patch quite extensively and it seems to be reliable.

Wed Jan  19 14:25:42 CET 2000  Jan Hubicka  <jh@suse.cz>
	* i386.h (FIRST_PSEUDO_REGISTER): Set to 21.
	(FIXED_REGISTERS, CALL_USED_REGISTERS,
	 REG_ALLOC_ORDER): Add frame pointer.
	(FRAME_POINTER_REGNUM): Set to 20.
	(HARD_FRAME_POINTER_REGNUM): New macro.
	(ELIMINABLE_REGS): Eliminate ARG_POINTER and FRAME_POINTER
	to HARD_FRAME_POINTER.
	(REGNO_OK_FOR_BASE_P): Accept FRAME_POINTER_REGNUM.
	(REG_OK_FOR_INDEX_NONSTRICT_P): Likewise.
	(REG_OK_FOR_BASE_NONSTRICT_P): Likewise.
	(HI_REGISTER_NAMES): Add "frame".
	(debug_reg): Handle FRAME_POINTER_REGNUM.
	* i386.c (SAVED_REGS_FIRST): New macro.
	(AT_BP): Use hard_frame_pointer_rtx instead of frame_pointer_rtx.
	(ix86_decompose_address, memory_address_length): Likewise.
	(call_insn_operand): Handle frame_pointer_rtx.
	(ix86_can_use_return_insn_p): Likewise.
	(ix86_compute_frame_size): Make static, update prototype, new
	parameters padding1, padding2, use ix86_nsaved_regs, use
	stack_alignment_needed.
	(ix86_initial_elimination_offset): Handle FRAME_POINTER_REGNUM
	to HARD_FRAME_POINTER_REGNUM conversions.
	(ix86_expand_prologue): Handle SAVED_REGS_FIRST prologues.
	(ix86_expand_epilogue): Handle SAVED_REGS_FIRST epilogues.
	(print_reg): Abort on FRAME_POINTER_REGNUM.
	(ix86_expand_move): Moves to virtual_stack_vars_rtx must be from
	register.
	* i386-protos.h (ix86_compute_frame_size): Remove.
	(ix86_initial_elimination_offset): Declare.
	* i386.md (set_frame): New insn pattern and splitter.
	(frame_noop): Likewise.

diff -Nrc3p i386.new2/i386.c i386/i386.c
*** i386.new2/i386.c	Wed Jan 19 12:28:00 2000
--- i386/i386.c	Wed Jan 19 14:17:01 2000
*************** Boston, MA 02111-1307, USA. */
*** 41,46 ****
--- 41,56 ----
  #include "basic-block.h"
  #include "ggc.h"
  
+ /* True when we want to do pushes before allocating stack to get better
+    scheduling.
+ 
+    Saving registers first is a win in most cases, except when the LEAVE
+    instruction is used.  The macro is 0 iff we will use LEAVE.  */
+ 
+ #define SAVED_REGS_FIRST \
+   (!frame_pointer_needed || (!TARGET_USE_LEAVE && !optimize_size))
+ 
+ 
  #ifdef EXTRA_CONSTRAINT
  /* If EXTRA_CONSTRAINT is defined, then the 'S'
     constraint in REG_CLASS_FROM_LETTER will no longer work, and various
*************** const int x86_split_long_moves = m_PPRO;
*** 214,220 ****
  const int x86_promote_QImode = m_K6 | m_PENT | m_386 | m_486;
  const int x86_single_stringop = m_386;
  
! #define AT_BP(mode) (gen_rtx_MEM ((mode), frame_pointer_rtx))
  
  const char * const hi_reg_name[] = HI_REGISTER_NAMES;
  const char * const qi_reg_name[] = QI_REGISTER_NAMES;
--- 224,230 ----
  const int x86_promote_QImode = m_K6 | m_PENT | m_386 | m_486;
  const int x86_single_stringop = m_386;
  
! #define AT_BP(mode) (gen_rtx_MEM ((mode), hard_frame_pointer_rtx))
  
  const char * const hi_reg_name[] = HI_REGISTER_NAMES;
  const char * const qi_reg_name[] = QI_REGISTER_NAMES;
*************** static void ix86_init_machine_status PRO
*** 325,331 ****
  static void ix86_mark_machine_status PARAMS ((struct function *));
  static void ix86_split_to_parts PARAMS ((rtx, rtx *, enum machine_mode));
  static int ix86_safe_length_prefix PARAMS ((rtx));
! static HOST_WIDE_INT ix86_compute_frame_size PARAMS((HOST_WIDE_INT, int *));
  static int ix86_nsaved_regs PARAMS((void));
  static void ix86_emit_save_regs PARAMS((void));
  static void ix86_emit_restore_regs PARAMS((void));
--- 335,341 ----
  static void ix86_mark_machine_status PARAMS ((struct function *));
  static void ix86_split_to_parts PARAMS ((rtx, rtx *, enum machine_mode));
  static int ix86_safe_length_prefix PARAMS ((rtx));
! static HOST_WIDE_INT ix86_compute_frame_size PARAMS((HOST_WIDE_INT, int *, int *, int *));
  static int ix86_nsaved_regs PARAMS((void));
  static void ix86_emit_save_regs PARAMS((void));
  static void ix86_emit_restore_regs PARAMS((void));
*************** call_insn_operand (op, mode)
*** 1030,1035 ****
--- 1040,1046 ----
       compiler aborts when trying to eliminate them.  */
    if (GET_CODE (op) == REG
        && (op == arg_pointer_rtx
+ 	  || op == frame_pointer_rtx
  	  || (REGNO (op) >= FIRST_PSEUDO_REGISTER
  	      && REGNO (op) <= LAST_VIRTUAL_REGISTER)))
      return 0;
*************** ix86_initial_elimination_offset (from, t
*** 1569,1659 ****
       int from;
       int to;
  {
!   if (from == ARG_POINTER_REGNUM && to == FRAME_POINTER_REGNUM)
!     return 8;			/* Skip saved PC and previous frame pointer */
    else
      {
!       int nregs;
!       int poffset;
!       int offset;
!       int preferred_alignment = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
        HOST_WIDE_INT tsize = ix86_compute_frame_size (get_frame_size (),
! 						     &nregs);
  
-       offset = (tsize + nregs * UNITS_PER_WORD);
  
!       poffset = 4;
!       if (frame_pointer_needed)
! 	poffset += UNITS_PER_WORD;
! 
!       if (from == ARG_POINTER_REGNUM)
! 	offset += poffset;
        else
! 	offset -= ((poffset + preferred_alignment - 1)
! 		   & -preferred_alignment) - poffset;
!       return offset;
      }
  }
  
  /* Compute the size of local storage taking into consideration the
     desired stack alignment which is to be maintained.  Also determine
!    the number of registers saved below the local storage.  */
  
! HOST_WIDE_INT
! ix86_compute_frame_size (size, nregs_on_stack)
       HOST_WIDE_INT size;
       int *nregs_on_stack;
  {
-   int limit;
    int nregs;
!   int regno;
!   int padding;
!   int pic_reg_used = flag_pic && (current_function_uses_pic_offset_table
! 				  || current_function_uses_const_pool);
    HOST_WIDE_INT total_size;
  
!   limit = frame_pointer_needed
! 	  ? FRAME_POINTER_REGNUM : STACK_POINTER_REGNUM;
  
!   nregs = 0;
! 
!   for (regno = limit - 1; regno >= 0; regno--)
!     if ((regs_ever_live[regno] && ! call_used_regs[regno])
! 	|| (regno == PIC_OFFSET_TABLE_REGNUM && pic_reg_used))
!       nregs++;
! 
!   padding = 0;
!   total_size = size + (nregs * UNITS_PER_WORD);
  
  #ifdef PREFERRED_STACK_BOUNDARY
    {
      int offset;
      int preferred_alignment = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
  
!     offset = 4;
!     if (frame_pointer_needed)
!       offset += UNITS_PER_WORD;
  
      total_size += offset;
-     
-     padding = ((total_size + preferred_alignment - 1)
- 	       & -preferred_alignment) - total_size;
  
!     if (padding < (((offset + preferred_alignment - 1)
! 		    & -preferred_alignment) - offset))
!       padding += preferred_alignment;
! 
!     /* Don't bother aligning the stack of a leaf function
!        which doesn't allocate any stack slots.  */
!     if (size == 0 && current_function_is_leaf)
!       padding = 0;
    }
  #endif
  
    if (nregs_on_stack)
      *nregs_on_stack = nregs;
  
!   return size + padding;
  }
  
  /* Emit code to save registers in the prologue.  */
--- 1580,1713 ----
       int from;
       int to;
  {
!   int padding1;
!   int nregs;
! 
!   /* Stack grows downward:
!     
!      [arguments]
! 						<- ARG_POINTER
!      saved pc
! 
!      saved frame pointer if frame_pointer_needed
! 						<- HARD_FRAME_POINTER
!      [saved regs if SAVED_REGS_FIRST]
! 
!      [padding1]   \
! 		   |				<- FRAME_POINTER
!      [frame]	   > tsize
! 		   |
!      [padding2]   /
! 
!      [saved regs if !SAVED_REGS_FIRST]
!      						<- STACK_POINTER
!     */
! 
!   if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM)
!     /* Skip saved PC and previous frame pointer.
!        Executed only when frame_pointer_needed.  */
!     return 8;
!   else if (from == FRAME_POINTER_REGNUM
! 	   && to == HARD_FRAME_POINTER_REGNUM)
!     {
!       ix86_compute_frame_size (get_frame_size (), &nregs, &padding1, (int *)0);
!       if (SAVED_REGS_FIRST)
! 	padding1 += nregs * UNITS_PER_WORD;
!       return -padding1;
!     }
    else
      {
!       /* ARG_POINTER or FRAME_POINTER to STACK_POINTER elimination.  */
!       int frame_size = frame_pointer_needed ? 8 : 4;
        HOST_WIDE_INT tsize = ix86_compute_frame_size (get_frame_size (),
! 						     &nregs, &padding1, (int *)0);
  
  
!       if (to != STACK_POINTER_REGNUM)
! 	abort();
!       else if (from == ARG_POINTER_REGNUM)
! 	return tsize + nregs * UNITS_PER_WORD + frame_size;
!       else if (from != FRAME_POINTER_REGNUM)
! 	abort();
!       else if (SAVED_REGS_FIRST)
! 	return tsize - padding1;
        else
! 	return tsize + nregs * UNITS_PER_WORD - padding1;
      }
  }
  
+ 
  /* Compute the size of local storage taking into consideration the
     desired stack alignment which is to be maintained.  Also determine
!    the number of registers saved below the local storage.  
!  
!    RPADDING1 returns the padding before the stack frame and RPADDING2
!    returns the padding after the stack frame.
!  */
  
! static HOST_WIDE_INT
! ix86_compute_frame_size (size, nregs_on_stack, rpadding1, rpadding2)
       HOST_WIDE_INT size;
       int *nregs_on_stack;
+      int *rpadding1;
+      int *rpadding2;
  {
    int nregs;
!   int padding1 = 0;
!   int padding2 = 0;
    HOST_WIDE_INT total_size;
+   int stack_alignment_needed = cfun->stack_alignment_needed / BITS_PER_UNIT;
  
!   nregs = ix86_nsaved_regs ();
  
!   total_size = size;
  
  #ifdef PREFERRED_STACK_BOUNDARY
    {
      int offset;
      int preferred_alignment = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
  
!     offset = frame_pointer_needed ? 8 : 4;
! 
!     /* When frame is not empty we ought to have recorded the alignment.  */
!     if (size && !stack_alignment_needed)
!       abort();
! 
!     if (stack_alignment_needed < 4)
!       stack_alignment_needed = 4;
! 
!     if (stack_alignment_needed > preferred_alignment)
!       abort();
! 
!     if (SAVED_REGS_FIRST)
!       offset += nregs * UNITS_PER_WORD;
!     else
!       total_size += nregs * UNITS_PER_WORD;
  
      total_size += offset;
  
!     /* Align start of frame for local function.  */
!     padding1 = ((offset + stack_alignment_needed - 1)
! 		& -stack_alignment_needed) - offset;
!     total_size += padding1;
! 
!     /* Align stack boundary. */
!     if (!current_function_is_leaf)
!       padding2 = ((total_size + preferred_alignment - 1)
! 		  & -preferred_alignment) - total_size;
    }
  #endif
  
    if (nregs_on_stack)
      *nregs_on_stack = nregs;
  
!   if (rpadding1)
!     *rpadding1 = padding1;
! 
!   if (rpadding2)
!     *rpadding2 = padding2;
! 
!   return size + padding1 + padding2;
  }
  
  /* Emit code to save registers in the prologue.  */
*************** ix86_emit_save_regs ()
*** 1667,1673 ****
    int pic_reg_used = flag_pic && (current_function_uses_pic_offset_table
  				  || current_function_uses_const_pool);
    limit = (frame_pointer_needed
! 	   ? FRAME_POINTER_REGNUM : STACK_POINTER_REGNUM);
  
    for (regno = limit - 1; regno >= 0; regno--)
      if ((regs_ever_live[regno] && !call_used_regs[regno])
--- 1721,1727 ----
    int pic_reg_used = flag_pic && (current_function_uses_pic_offset_table
  				  || current_function_uses_const_pool);
    limit = (frame_pointer_needed
! 	   ? HARD_FRAME_POINTER_REGNUM : STACK_POINTER_REGNUM);
  
    for (regno = limit - 1; regno >= 0; regno--)
      if ((regs_ever_live[regno] && !call_used_regs[regno])
*************** ix86_emit_save_regs ()
*** 1683,1705 ****
  void
  ix86_expand_prologue ()
  {
    int pic_reg_used = flag_pic && (current_function_uses_pic_offset_table
  				  || current_function_uses_const_pool);
-   HOST_WIDE_INT tsize = ix86_compute_frame_size (get_frame_size (), (int *)0);
-   rtx insn;
  
    /* Note: AT&T enter does NOT have reversed args.  Enter is probably
       slower on all targets.  Also sdb doesn't like it.  */
  
    if (frame_pointer_needed)
      {
!       insn = emit_insn (gen_push (frame_pointer_rtx));
        RTX_FRAME_RELATED_P (insn) = 1;
  
!       insn = emit_move_insn (frame_pointer_rtx, stack_pointer_rtx);
        RTX_FRAME_RELATED_P (insn) = 1;
      }
  
    if (tsize == 0)
      ;
    else if (! TARGET_STACK_PROBE || tsize < CHECK_STACK_LIMIT)
--- 1737,1763 ----
  void
  ix86_expand_prologue ()
  {
+   HOST_WIDE_INT tsize = ix86_compute_frame_size (get_frame_size (), (int *)0, (int *)0,
+ 						 (int *)0);
+   rtx insn;
    int pic_reg_used = flag_pic && (current_function_uses_pic_offset_table
  				  || current_function_uses_const_pool);
  
    /* Note: AT&T enter does NOT have reversed args.  Enter is probably
       slower on all targets.  Also sdb doesn't like it.  */
  
    if (frame_pointer_needed)
      {
!       insn = emit_insn (gen_push (hard_frame_pointer_rtx));
        RTX_FRAME_RELATED_P (insn) = 1;
  
!       insn = emit_move_insn (hard_frame_pointer_rtx, stack_pointer_rtx);
        RTX_FRAME_RELATED_P (insn) = 1;
      }
  
+   if (SAVED_REGS_FIRST)
+     ix86_emit_save_regs ();
+ 
    if (tsize == 0)
      ;
    else if (! TARGET_STACK_PROBE || tsize < CHECK_STACK_LIMIT)
*************** ix86_expand_prologue ()
*** 1708,1714 ****
  	insn = emit_insn (gen_prologue_allocate_stack (stack_pointer_rtx,
  						       stack_pointer_rtx,
  						       GEN_INT (-tsize),
! 						       frame_pointer_rtx));
        else
          insn = emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
  				      GEN_INT (-tsize)));
--- 1766,1772 ----
  	insn = emit_insn (gen_prologue_allocate_stack (stack_pointer_rtx,
  						       stack_pointer_rtx,
  						       GEN_INT (-tsize),
! 						       hard_frame_pointer_rtx));
        else
          insn = emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
  				      GEN_INT (-tsize)));
*************** ix86_expand_prologue ()
*** 1732,1738 ****
  			     CALL_INSN_FUNCTION_USAGE (insn));
      }
  
!   ix86_emit_save_regs ();
  #ifdef SUBTARGET_PROLOGUE
    SUBTARGET_PROLOGUE;
  #endif  
--- 1790,1798 ----
  			     CALL_INSN_FUNCTION_USAGE (insn));
      }
  
!   if (!SAVED_REGS_FIRST)
!     ix86_emit_save_regs ();
! 
  #ifdef SUBTARGET_PROLOGUE
    SUBTARGET_PROLOGUE;
  #endif  
*************** ix86_emit_restore_regs ()
*** 1755,1761 ****
    int pic_reg_used = flag_pic && (current_function_uses_pic_offset_table
  				  || current_function_uses_const_pool);
    int limit = (frame_pointer_needed
! 	       ? FRAME_POINTER_REGNUM : STACK_POINTER_REGNUM);
    int regno;
  
    for (regno = 0; regno < limit; regno++)
--- 1815,1821 ----
    int pic_reg_used = flag_pic && (current_function_uses_pic_offset_table
  				  || current_function_uses_const_pool);
    int limit = (frame_pointer_needed
! 	       ? HARD_FRAME_POINTER_REGNUM : STACK_POINTER_REGNUM);
    int regno;
  
    for (regno = 0; regno < limit; regno++)
*************** ix86_expand_epilogue ()
*** 1825,1844 ****
  				  || current_function_uses_const_pool);
    int sp_valid = !frame_pointer_needed || current_function_sp_is_unchanging;
    HOST_WIDE_INT offset;
!   HOST_WIDE_INT tsize = ix86_compute_frame_size (get_frame_size (), &nregs);
  
    /* SP is often unreliable so we may have to go off the frame pointer. */
  
    offset = -(tsize + nregs * UNITS_PER_WORD);
  
    /* If we're only restoring one register and sp is not valid then
       using a move instruction to restore the register since it's
       less work than reloading sp and popping the register.  Otherwise,
       restore sp (if necessary) and pop the registers. */
  
!   if (nregs > 1 || sp_valid)
      {
!       if ( !sp_valid )
  	{
  	  rtx addr_offset;
  	  addr_offset = adj_offsettable_operand (AT_BP (QImode), offset);
--- 1885,1922 ----
  				  || current_function_uses_const_pool);
    int sp_valid = !frame_pointer_needed || current_function_sp_is_unchanging;
    HOST_WIDE_INT offset;
!   HOST_WIDE_INT tsize = ix86_compute_frame_size (get_frame_size (), &nregs, (int *)0,
! 						 (int *)0);
  
    /* SP is often unreliable so we may have to go off the frame pointer. */
  
    offset = -(tsize + nregs * UNITS_PER_WORD);
  
+   if (SAVED_REGS_FIRST)
+     {
+       if (!sp_valid)
+         {
+ 	  if (nregs)
+ 	    emit_insn (gen_rtx_SET (VOIDmode, stack_pointer_rtx,
+ 				    gen_rtx_PLUS (SImode, hard_frame_pointer_rtx,
+ 						  GEN_INT (- nregs * UNITS_PER_WORD))));
+ 	  else
+ 	    emit_insn (gen_epilogue_deallocate_stack (stack_pointer_rtx,
+ 						   hard_frame_pointer_rtx));
+ 	}
+       else if (tsize)
+ 	ix86_emit_esp_adjustment (tsize);
+       ix86_emit_restore_regs ();
+     }
+ 
    /* If we're only restoring one register and sp is not valid then
       using a move instruction to restore the register since it's
       less work than reloading sp and popping the register.  Otherwise,
       restore sp (if necessary) and pop the registers. */
  
!   else if (nregs > 1 || sp_valid)
      {
!       if (!sp_valid)
  	{
  	  rtx addr_offset;
  	  addr_offset = adj_offsettable_operand (AT_BP (QImode), offset);
*************** ix86_expand_epilogue ()
*** 1852,1858 ****
    else
      {
        limit = (frame_pointer_needed
! 	       ? FRAME_POINTER_REGNUM : STACK_POINTER_REGNUM);
        for (regno = 0; regno < limit; regno++)
  	if ((regs_ever_live[regno] && ! call_used_regs[regno])
  	    || (regno == PIC_OFFSET_TABLE_REGNUM && pic_reg_used))
--- 1930,1936 ----
    else
      {
        limit = (frame_pointer_needed
! 	       ? HARD_FRAME_POINTER_REGNUM : STACK_POINTER_REGNUM);
        for (regno = 0; regno < limit; regno++)
  	if ((regs_ever_live[regno] && ! call_used_regs[regno])
  	    || (regno == PIC_OFFSET_TABLE_REGNUM && pic_reg_used))
*************** ix86_expand_epilogue ()
*** 1866,1881 ****
    if (frame_pointer_needed)
      {
        /* If not an i386, mov & pop is faster than "leave". */
!       if (TARGET_USE_LEAVE)
! 	emit_insn (gen_leave());
        else
  	{
! 	  emit_insn (gen_epilogue_deallocate_stack (stack_pointer_rtx,
! 						    frame_pointer_rtx));
! 	  emit_insn (gen_popsi1 (frame_pointer_rtx));
  	}
      }
!   else if (tsize)
      ix86_emit_esp_adjustment (tsize);
  
  #ifdef FUNCTION_BLOCK_PROFILER_EXIT
--- 1944,1960 ----
    if (frame_pointer_needed)
      {
        /* If not an i386, mov & pop is faster than "leave". */
!       if (TARGET_USE_LEAVE || optimize_size)
! 	emit_insn (gen_leave ());
        else
  	{
! 	  if (!SAVED_REGS_FIRST)
! 	    emit_insn (gen_epilogue_deallocate_stack (stack_pointer_rtx,
! 						   hard_frame_pointer_rtx));
! 	  emit_insn (gen_popsi1 (hard_frame_pointer_rtx));
  	}
      }
!   else if (!SAVED_REGS_FIRST && tsize)
      ix86_emit_esp_adjustment (tsize);
  
  #ifdef FUNCTION_BLOCK_PROFILER_EXIT
*************** ix86_decompose_address (addr, out)
*** 2004,2010 ****
      }
  
    /* Special case: %ebp cannot be encoded as a base without a displacement.  */
!   if (base == frame_pointer_rtx && !disp)
      disp = const0_rtx;
  
    /* Special case: on K6, [%esi] makes the instruction vector decoded.
--- 2083,2089 ----
      }
  
    /* Special case: %ebp cannot be encoded as a base without a displacement.  */
!   if (base == hard_frame_pointer_rtx && !disp)
      disp = const0_rtx;
  
    /* Special case: on K6, [%esi] makes the instruction vector decoded.
*************** print_reg (x, code, file)
*** 2732,2737 ****
--- 2811,2817 ----
       FILE *file;
  {
    if (REGNO (x) == ARG_POINTER_REGNUM
+       || REGNO (x) == FRAME_POINTER_REGNUM
        || REGNO (x) == FLAGS_REG
        || REGNO (x) == FPSR_REG)
      abort ();
*************** ix86_expand_move (mode, operands)
*** 3678,3683 ****
--- 3839,3853 ----
    int strict = (reload_in_progress || reload_completed);
    rtx insn;
  
+   /* The loading of virtual_stack_vars_rtx must go through a register
+      so that the lea pattern can match.  */
+   if (operands[1] == virtual_stack_vars_rtx
+       && !register_operand (operands[0], SImode))
+     {
+       operands[1] = gen_reg_rtx (SImode);
+       emit_move_insn (operands[1], virtual_stack_vars_rtx);
+     }
+ 
    if (flag_pic && mode == Pmode && symbolic_operand (operands[1], Pmode))
      {
        /* Emit insns to move operands[1] into operands[0].  */
*************** memory_address_length (addr)
*** 5505,5511 ****
        /* Special cases: ebp and esp need the two-byte modrm form.  */
        if (addr == stack_pointer_rtx
  	  || addr == arg_pointer_rtx
! 	  || addr == frame_pointer_rtx)
  	len = 1;
      }
  
--- 5590,5597 ----
        /* Special cases: ebp and esp need the two-byte modrm form.  */
        if (addr == stack_pointer_rtx
  	  || addr == arg_pointer_rtx
! 	  || addr == frame_pointer_rtx
! 	  || addr == hard_frame_pointer_rtx)
  	len = 1;
      }
  
diff -Nrc3p i386.new2/i386.h i386/i386.h
*** i386.new2/i386.h	Wed Jan 19 12:11:31 2000
--- i386/i386.h	Wed Jan 19 00:24:58 2000
*************** extern int ix86_arch;
*** 619,625 ****
     eliminated during reloading in favor of either the stack or frame
     pointer. */
  
! #define FIRST_PSEUDO_REGISTER 20
  
  /* Number of hardware registers that go into the DWARF-2 unwind info.
     If not defined, equals FIRST_PSEUDO_REGISTER.  */
--- 619,625 ----
     eliminated during reloading in favor of either the stack or frame
     pointer. */
  
! #define FIRST_PSEUDO_REGISTER 21
  
  /* Number of hardware registers that go into the DWARF-2 unwind info.
     If not defined, equals FIRST_PSEUDO_REGISTER.  */
*************** extern int ix86_arch;
*** 631,637 ****
     On the 80386, the stack pointer is such, as is the arg pointer. */
  #define FIXED_REGISTERS \
  /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7,arg,flags,fpsr, dir*/ \
! {  0, 0, 0, 0, 0, 0, 0, 1, 0,  0,  0,  0,  0,  0,  0,  0,  1,    0,   0,   0 }
  
  /* 1 for registers not available across function calls.
     These must include the FIXED_REGISTERS and also any
--- 631,639 ----
     On the 80386, the stack pointer is such, as is the arg pointer. */
  #define FIXED_REGISTERS \
  /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7,arg,flags,fpsr, dir*/ \
! {  0, 0, 0, 0, 0, 0, 0, 1, 0,  0,  0,  0,  0,  0,  0,  0,  1,    0,   0,   0,  \
! /*frame									    */ \
!    1}
  
  /* 1 for registers not available across function calls.
     These must include the FIXED_REGISTERS and also any
*************** extern int ix86_arch;
*** 642,648 ****
  
  #define CALL_USED_REGISTERS \
  /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7,arg,flags,fpsr, dir*/ \
! {  1, 1, 1, 0, 0, 0, 0, 1, 1,  1,  1,  1,  1,  1,  1,  1,  1,    1,   1,   1 }
  
  /* Order in which to allocate registers.  Each register must be
     listed once, even those in FIXED_REGISTERS.  List frame pointer
--- 644,652 ----
  
  #define CALL_USED_REGISTERS \
  /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7,arg,flags,fpsr, dir*/ \
! {  1, 1, 1, 0, 0, 0, 0, 1, 1,  1,  1,  1,  1,  1,  1,  1,  1,    1,   1,   1,  \
! /*frame									    */ \
!    0}
  
  /* Order in which to allocate registers.  Each register must be
     listed once, even those in FIXED_REGISTERS.  List frame pointer
*************** extern int ix86_arch;
*** 665,671 ****
  
  #define REG_ALLOC_ORDER \
  /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7,arg,cc,fpsr, dir*/ \
! {  0, 1, 2, 3, 4, 5, 6, 7, 8,  9, 10, 11, 12, 13, 14, 15, 16,17,  18,  19 }
  
  /* A C statement (sans semicolon) to choose the order in which to
     allocate hard registers for pseudo-registers local to a basic
--- 669,677 ----
  
  #define REG_ALLOC_ORDER \
  /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7,arg,cc,fpsr, dir*/ \
! {  0, 1, 2, 3, 4, 5, 6, 7, 8,  9, 10, 11, 12, 13, 14, 15, 16,17,  18,  19,  \
! /*frame									 */ \
!   20}
  
  /* A C statement (sans semicolon) to choose the order in which to
     allocate hard registers for pseudo-registers local to a basic
*************** extern int ix86_arch;
*** 762,768 ****
  #define STACK_POINTER_REGNUM 7
  
  /* Base register for access to local variables of the function.  */
! #define FRAME_POINTER_REGNUM 6
  
  /* First floating point reg */
  #define FIRST_FLOAT_REG 8
--- 768,777 ----
  #define STACK_POINTER_REGNUM 7
  
  /* Base register for access to local variables of the function.  */
! #define HARD_FRAME_POINTER_REGNUM 6
! 
! /* Base register for access to local variables of the function.  */
! #define FRAME_POINTER_REGNUM 20
  
  /* First floating point reg */
  #define FIRST_FLOAT_REG 8
*************** do {								\
*** 1397,1406 ****
     pointer register.  Secondly, the argument pointer register can always be
     eliminated; it is replaced with either the stack or frame pointer. */
  
! #define ELIMINABLE_REGS				\
! {{ ARG_POINTER_REGNUM, STACK_POINTER_REGNUM},	\
!  { ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM},   \
!  { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}}
  
  /* Given FROM and TO register numbers, say whether this elimination is allowed.
     Frame pointer elimination is automatically handled.
--- 1406,1416 ----
     pointer register.  Secondly, the argument pointer register can always be
     eliminated; it is replaced with either the stack or frame pointer. */
  
! #define ELIMINABLE_REGS					\
! {{ ARG_POINTER_REGNUM, STACK_POINTER_REGNUM},		\
!  { ARG_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM},	\
!  { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM},		\
!  { FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM}}
  
  /* Given FROM and TO register numbers, say whether this elimination is allowed.
     Frame pointer elimination is automatically handled.
*************** do {								\
*** 1444,1449 ****
--- 1454,1460 ----
  #define REGNO_OK_FOR_BASE_P(REGNO) \
    ((REGNO) <= STACK_POINTER_REGNUM \
     || (REGNO) == ARG_POINTER_REGNUM \
+    || (REGNO) == FRAME_POINTER_REGNUM \
     || (unsigned) reg_renumber[REGNO] <= STACK_POINTER_REGNUM)
  
  #define REGNO_OK_FOR_SIREG_P(REGNO) ((REGNO) == 4 || reg_renumber[REGNO] == 4)
*************** do {								\
*** 1466,1476 ****
--- 1477,1489 ----
  /* Non strict versions, pseudos are ok */
  #define REG_OK_FOR_INDEX_NONSTRICT_P(X)					\
    (REGNO (X) < STACK_POINTER_REGNUM					\
+    || REGNO (X) == FRAME_POINTER_REGNUM \
     || REGNO (X) >= FIRST_PSEUDO_REGISTER)
  
  #define REG_OK_FOR_BASE_NONSTRICT_P(X)					\
    (REGNO (X) <= STACK_POINTER_REGNUM					\
     || REGNO (X) == ARG_POINTER_REGNUM					\
+    || REGNO (X) == FRAME_POINTER_REGNUM \
     || REGNO (X) >= FIRST_PSEUDO_REGISTER)
  
  #define REG_OK_FOR_STRREG_NONSTRICT_P(X)				\
*************** while (0)
*** 1701,1707 ****
  /* Define if shifts truncate the shift count
     which implies one can omit a sign-extension or zero-extension
     of a shift count.  */
! /* On i386, shifts do truncate the count.  But bit opcodes don't. */
  
  /* #define SHIFT_COUNT_TRUNCATED */
  
--- 1714,1720 ----
  /* Define if shifts truncate the shift count
     which implies one can omit a sign-extension or zero-extension
     of a shift count.  */
! /* On i386, shifts do truncate the count.  But bit opcodes don't. */
  
  /* #define SHIFT_COUNT_TRUNCATED */
  
*************** while (0)
*** 2153,2159 ****
  #define HI_REGISTER_NAMES						\
  {"ax","dx","cx","bx","si","di","bp","sp",				\
   "st","st(1)","st(2)","st(3)","st(4)","st(5)","st(6)","st(7)","",	\
!  "flags","fpsr", "dirflag" }
  
  #define REGISTER_NAMES HI_REGISTER_NAMES
  
--- 2166,2172 ----
  #define HI_REGISTER_NAMES						\
  {"ax","dx","cx","bx","si","di","bp","sp",				\
   "st","st(1)","st(2)","st(3)","st(4)","st(5)","st(6)","st(7)","",	\
!  "flags","fpsr", "dirflag", "frame" }
  
  #define REGISTER_NAMES HI_REGISTER_NAMES
  
*************** do { long l;						\
*** 2372,2377 ****
--- 2385,2392 ----
  	 { fputs ("fpsr", FILE); break; }		\
         if (REGNO (X) == ARG_POINTER_REGNUM)		\
  	 { fputs ("argp", FILE); break; }		\
+        if (REGNO (X) == FRAME_POINTER_REGNUM)		\
+ 	 { fputs ("frame", FILE); break; }		\
         if (STACK_TOP_P (X))				\
  	 { fputs ("st(0)", FILE); break; }		\
         if (FP_REG_P (X))				\
diff -Nrc3p i386.new2/i386.md i386/i386.md
*** i386.new2/i386.md	Wed Jan 19 11:26:59 2000
--- i386/i386.md	Wed Jan 19 00:12:38 2000
***************
*** 1423,1428 ****
--- 1423,1453 ----
  	    ]
  	    (const_string "*")))])
  
+ ; This insn is generated by setjmp/longjmp expanders to set
+ ; stack frame.
+ (define_insn "*set_frame"
+   [(set (reg:SI 20) (reg:SI 6))]
+   ""
+   "#")
+ 
+ (define_split
+   [(set (reg:SI 20) (reg:SI 6))]
+   "reload_completed && frame_pointer_needed"
+   [(set (reg:SI 6) (plus:SI (reg:SI 6) (match_dup 0)))]
+   "operands[0] = GEN_INT (-ix86_initial_elimination_offset (20, 6));")
+  
+ ; Such noop insn gets created when structure is passed as parameter.
+ (define_insn "*frame_noop"
+   [(set (reg:SI 20) (reg:SI 20))]
+   "!reload_completed"
+   "#")
+ 
+ (define_split
+   [(set (reg:SI 20) (reg:SI 20))]
+   "reload_completed && frame_pointer_needed"
+   [(const_int 0)]
+   "DONE;")
+  
  (define_insn "*swaphi_1"
    [(set (match_operand:HI 0 "register_operand" "+r")
  	(match_operand:HI 1 "register_operand" "+r"))
