This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] Fix store motion, revised PRE memory handling


While doing the store motion fixes, I went to revise the overall PRE
memory handling, but it appears that had already been done for load
motion; someone had just forgotten to remove all the old code.

There is a consistent pattern of

  if (load_killed_in_block_p (...))	/* the way you want to handle it */
    ...
  if (mem_*set_* crap)			/* what we used to do */
    ...

which, of course, makes the first part pointless.  The second part is
*more* conservative than the first, since it only cares whether memory
was ever set anywhere in the block.  So if you just wanted to make the
code faster, at the expense of precision in the memory handling, you
would have reversed the test order.
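
For instance, here is the pair of tests in oprs_unchanged_p before
this patch (quoted from the hunk below).  Anything the precise
load_killed_in_block_p test catches, the coarser mem_*set_* tests
right after it catch as well:

      if (load_killed_in_block_p (BLOCK_FOR_INSN (insn), INSN_CUID (insn),
				  x, avail_p))
	return 0;
      if (avail_p && mem_last_set != NEVER_SET
	  && mem_last_set >= INSN_CUID (insn))
	return 0;
      else if (! avail_p && mem_first_set != NEVER_SET
	       && mem_first_set < INSN_CUID (insn))
	return 0;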

Also, the double sets of the reg_set_bitmap block in
mark_set/mark_clobber point to the idea that someone just forgot to
remove the old code, since they come one after the other, in the same
pattern.

It's otherwise kind of hard to claim that what was there:

if (GET_CODE (dest) == REG)
  SET_BIT (reg_set_bitmap, REGNO (dest));
else if (GET_CODE (dest) == MEM)
  record_last_mem_set_info (insn);

if (GET_CODE (dest) == REG)
  SET_BIT (reg_set_bitmap, REGNO (dest));
else if (GET_CODE (dest) == MEM)
  mem_last_set = INSN_CUID (insn);

makes any sense.  If it were a single incident, I'd consider it just a
thinko.  But it's a consistent thing throughout, as I said above.
So I just removed the old, pointless code, which had been preventing
store motion.

Anyway, with this patch, we get hoisting of mems out of loops (and
other places), and sinking of stores/removal of redundant stores,
which is what we were supposed to have had all along, but didn't
really (load motion would have worked, had the old code been removed,
but it would have been restricted to a laughably small subset of
mems).
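
To illustrate the kind of thing this enables (a made-up example, not
one from the patch or the testsuite):

  void
  f (int *p, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      *p = i * i;	/* a store on every iteration */
  }

Nothing in the loop reads or aliases *p, so only the final value is
observable; store motion can now sink a single store below the loop
and keep the running value in a register.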

This patch won't be committed until a proper fix goes into df.c for
its failure to mark all the right regs as def'd in call insns (store
motion now relies on df's reg-def chains, via store_ops_ok), but I'd
still like it reviewed anyway, since that's a simple matter to fix.

Ahh, yes, lastly, store motion doesn't really depend on gcse pieces
anymore (if you wanted to move it to another file, you'd need to
clone simple_mem and a very small number of other small routines
shared between load and store motion), so it should be possible to
run it after regalloc if you really wanted to.

Tested on powerpc-unknown-linux-gnu, no regressions.  (I also hand
verified it on a number of tricky load/store loops from NULLSTONE and
other places, to make sure it wasn't doing the wrong thing.)

2001-07-16  Daniel Berlin  <dan@cgsoftware.com>

	* gcse.c: Update comment at top. 
	Update comment on mem handling.
	Include df.h for use as a dataflow analyzer.
	regvec, mem_last_set, mem_first_set, mem_set_in_block removed.
	Declaration of reg_set_info removed.
	(reg_set_info): Deleted function.
	(oprs_unchanged_p): Don't use mem_*set_* anymore.  They are
	pointless with load_killed_in_block_p (they are *more*
	conservative than it, not less, and less accurate).
	(oprs_not_set_p): Ditto.	
	(toplevel): New df_analyzer variable used by store motion.
	(mark_mem_regs): New function, analyze regs used by a mem.
	(store_ops_ok): Use dataflow analyzer results to determine if
	necessary regs are changed in the block.
	(find_moveable_store): Remove check for symbol ref, we can handle
	much more complex expressions now.  Use rtx_addr_varies_p instead
	of rtx_varies_p.
	(compute_store_table): Remove most of the code, it's unnecessary
	now that the dataflow analyzer records the info for us.
	(store_killed_after): Add parameter to say whether to do the
	store_ops_ok test, used to speed up testing when we already know
	the answer, and just want to know if the store itself was killed.
	(build_store_vector): Largely rewritten to calculate the various
	vectors properly, and somewhat optimized.
	(store_motion): Init the df_analyzer, get REG_DEF chains.
	Also handle trapping expressions (since mems almost always trap).
	(alloc_gcse_mem): Don't allocate mem_set_in_block.
	(free_gcse_mem): Don't free it, either.
	(record_last_mem_set_info): Update comment in front, remove
	mem_*set_* stuff. Note the reason we don't handle stores directly
	here. 
	(compute_hash_table): Update comments to reflect reality. Remove
	mem_*set_* references.
	(reset_opr_set_tables): Remove mem_*set_* references.
	(mark_call): Ditto.
	(mark_set): Ditto.  Also remove double sets of bitmaps for REG's.
	(mark_clobber): Ditto (on both parts, we double set here too).
	(expr_killed_p): Remove mem_set_in_block test.
	(compute_transp): Remove mem_set_in_block test.
	(simple_mem): Redefine what a simple mem is.
	(process_insert_insn): Fix bug generating an insert for a
	memory operand (emit_move_insn won't work if the memory isn't
	astoundingly simple).

Index: gcse.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/gcse.c,v
retrieving revision 1.134
diff -c -3 -p -w -B -b -d -r1.134 gcse.c
*** gcse.c	2001/07/11 16:11:47	1.134
--- gcse.c	2001/07/16 05:16:55
*************** Boston, MA 02111-1307, USA.  */
*** 24,30 ****
     - do rough calc of how many regs are needed in each block, and a rough
       calc of how many regs are available in each class and use that to
       throttle back the code in cases where RTX_COST is minimal.
-    - dead store elimination
     - a store to the same address as a load does not kill the load if the
       source of the store is also the destination of the load.  Handling this
       allows more load motion, particularly out of loops.
--- 24,29 ----
*************** Boston, MA 02111-1307, USA.  */
*** 161,168 ****
  #include "expr.h" 
  #include "ggc.h"
  #include "params.h"
- 
  #include "obstack.h"
  #define obstack_chunk_alloc gmalloc
  #define obstack_chunk_free free
  
--- 160,167 ----
  #include "expr.h" 
  #include "ggc.h"
  #include "params.h"
  #include "obstack.h"
+ #include "df.h"
  #define obstack_chunk_alloc gmalloc
  #define obstack_chunk_free free
  
*************** static rtx * modify_mem_list;
*** 497,512 ****
  
  /* This array parallels modify_mem_list, but is kept canonicalized.  */
  static rtx * canon_modify_mem_list;
- 
- /* For each block, non-zero if memory is set in that block.
-    This is computed during hash table computation and is used by
-    expr_killed_p and compute_transp.
-    ??? Handling of memory is very simple, we don't make any attempt
-    to optimize things (later).
-    ??? This can be computed by compute_sets since the information
-    doesn't change.  */
- static char *mem_set_in_block;
- 
  /* Various variables for statistics gathering.  */
  
  /* Memory used in a pass.
--- 496,501 ----
*************** static void invalidate_any_buried_refs	P
*** 687,700 ****
  static void compute_ld_motion_mems	PARAMS ((void)); 
  static void trim_ld_motion_mems		PARAMS ((void));
  static void update_ld_motion_stores	PARAMS ((struct expr *));
! static void reg_set_info		PARAMS ((rtx, rtx, void *));
! static int store_ops_ok			PARAMS ((rtx, basic_block));
  static void find_moveable_store		PARAMS ((rtx));
  static int compute_store_table		PARAMS ((void));
  static int load_kills_store		PARAMS ((rtx, rtx));
  static int find_loads			PARAMS ((rtx, rtx));
  static int store_killed_in_insn		PARAMS ((rtx, rtx));
! static int store_killed_after		PARAMS ((rtx, rtx, basic_block));
  static int store_killed_before		PARAMS ((rtx, rtx, basic_block));
  static void build_store_vectors		PARAMS ((void));
  static void insert_insn_start_bb	PARAMS ((rtx, basic_block));
--- 676,688 ----
  static void compute_ld_motion_mems	PARAMS ((void)); 
  static void trim_ld_motion_mems		PARAMS ((void));
  static void update_ld_motion_stores	PARAMS ((struct expr *));
! static int store_ops_ok			PARAMS ((rtx, basic_block, rtx, int));
  static void find_moveable_store		PARAMS ((rtx));
  static int compute_store_table		PARAMS ((void));
  static int load_kills_store		PARAMS ((rtx, rtx));
  static int find_loads			PARAMS ((rtx, rtx));
  static int store_killed_in_insn		PARAMS ((rtx, rtx));
! static int store_killed_after		PARAMS ((rtx, rtx, basic_block, int));
  static int store_killed_before		PARAMS ((rtx, rtx, basic_block));
  static void build_store_vectors		PARAMS ((void));
  static void insert_insn_start_bb	PARAMS ((rtx, basic_block));
*************** alloc_gcse_mem (f)
*** 1032,1038 ****
    /* Allocate vars to track sets of regs, memory per block.  */
    reg_set_in_block = (sbitmap *) sbitmap_vector_alloc (n_basic_blocks,
  						       max_gcse_regno);
-   mem_set_in_block = (char *) gmalloc (n_basic_blocks);
    /* Allocate array to keep a list of insns which modify memory in each
       basic block.  */
    modify_mem_list = (rtx *) gmalloc (n_basic_blocks * sizeof (rtx *));
--- 1019,1024 ----
*************** free_gcse_mem ()
*** 1052,1058 ****
    free (reg_set_bitmap);
  
    sbitmap_vector_free (reg_set_in_block);
-   free (mem_set_in_block);
    /* re-Cache any INSN_LIST nodes we have allocated.  */
    {
      int i;
--- 1038,1043 ----
*************** compute_sets (f)
*** 1317,1332 ****
  static int *reg_first_set;
  static int *reg_last_set;
  
- /* While computing "first/last set" info, this is the CUID of first/last insn
-    to set memory or -1 if not set.  `mem_last_set' is also used when
-    performing GCSE to record whether memory has been set since the beginning
-    of the block.
  
-    Note that handling of memory is very simple, we don't make any attempt
-    to optimize things (later).  */
- static int mem_first_set;
- static int mem_last_set;
- 
  /* See whether X, the source of a set, is something we want to consider for
     GCSE.  */
  
--- 1302,1308 ----
*************** oprs_unchanged_p (x, insn, avail_p)
*** 1409,1420 ****
        if (load_killed_in_block_p (BLOCK_FOR_INSN (insn), INSN_CUID (insn),
  				  x, avail_p))
  	return 0;
-       if (avail_p && mem_last_set != NEVER_SET
- 	  && mem_last_set >= INSN_CUID (insn))
- 	return 0;
-       else if (! avail_p && mem_first_set != NEVER_SET
- 	       && mem_first_set < INSN_CUID (insn))
- 	return 0;
        else
  	return oprs_unchanged_p (XEXP (x, 0), insn, avail_p);
  
--- 1385,1390 ----
*************** canon_list_insert (dest, unused1, v_insn
*** 2410,2416 ****
      alloc_INSN_LIST (dest, canon_modify_mem_list[BLOCK_NUM (insn)]);
  }
  
- /* Record memory first/last/block set information for INSN.  */
  /* Record memory modification information for INSN.  We do not actually care
     about the memory location(s) that are set, or even how they are set (consider
     a CALL_INSN).  We merely need to record which insns modify memory.  */
--- 2380,2385 ----
*************** static void
*** 2419,2429 ****
  record_last_mem_set_info (insn)
       rtx insn;
  {
!   if (mem_first_set == NEVER_SET)
!     mem_first_set = INSN_CUID (insn);
! 
!   mem_last_set = INSN_CUID (insn);
!   mem_set_in_block[BLOCK_NUM (insn)] = 1;
    modify_mem_list[BLOCK_NUM (insn)] = 
      alloc_INSN_LIST (insn, modify_mem_list[BLOCK_NUM (insn)]);
  
--- 2388,2395 ----
  record_last_mem_set_info (insn)
       rtx insn;
  {
!   /* load_killed_in_block_p will handle the case of calls clobbering
!      everything. */
    modify_mem_list[BLOCK_NUM (insn)] = 
      alloc_INSN_LIST (insn, modify_mem_list[BLOCK_NUM (insn)]);
  
*************** compute_hash_table (set_p)
*** 2486,2497 ****
  
    /* While we compute the hash table we also compute a bit array of which
       registers are set in which blocks.
-      We also compute which blocks set memory, in the absence of aliasing
-      support [which is TODO].
       ??? This isn't needed during const/copy propagation, but it's cheap to
       compute.  Later.  */
    sbitmap_vector_zero (reg_set_in_block, n_basic_blocks);
-   memset ((char *) mem_set_in_block, 0, n_basic_blocks);
  
    /* re-Cache any INSN_LIST nodes we have allocated.  */
    {
--- 2452,2460 ----
*************** compute_hash_table (set_p)
*** 2519,2532 ****
  
        /* First pass over the instructions records information used to
  	 determine when registers and memory are first and last set.
! 	 ??? The mem_set_in_block and hard-reg reg_set_in_block computation
  	 could be moved to compute_sets since they currently don't change.  */
  
        for (i = 0; i < max_gcse_regno; i++)
  	reg_first_set[i] = reg_last_set[i] = NEVER_SET;
  
-       mem_first_set = NEVER_SET;
-       mem_last_set = NEVER_SET;
  
        for (insn = BLOCK_HEAD (bb);
  	   insn && insn != NEXT_INSN (BLOCK_END (bb));
--- 2482,2493 ----
  
        /* First pass over the instructions records information used to
  	 determine when registers and memory are first and last set.
! 	 ??? hard-reg reg_set_in_block computation
  	 could be moved to compute_sets since they currently don't change.  */
  
        for (i = 0; i < max_gcse_regno; i++)
  	reg_first_set[i] = reg_last_set[i] = NEVER_SET;
  
  
        for (insn = BLOCK_HEAD (bb);
  	   insn && insn != NEXT_INSN (BLOCK_END (bb));
*************** reset_opr_set_tables ()
*** 2760,2766 ****
    /* Also keep a record of the last instruction to modify memory.
       For now this is very trivial, we only record whether any memory
       location has been modified.  */
-   mem_last_set = 0;
    {
      int i;
  
--- 2721,2726 ----
*************** oprs_not_set_p (x, insn)
*** 2807,2814 ****
        if (load_killed_in_block_p (BLOCK_FOR_INSN (insn), 
  				  INSN_CUID (insn), x, 0))
  	return 0;
-       if (mem_last_set != 0)
- 	return 0;
        else
  	return oprs_not_set_p (XEXP (x, 0), insn);
  
--- 2767,2772 ----
*************** static void
*** 2847,2853 ****
  mark_call (insn)
       rtx insn;
  {
-   mem_last_set = INSN_CUID (insn);
    if (! CONST_CALL_P (insn))
      record_last_mem_set_info (insn);
  }
--- 2805,2810 ----
*************** mark_set (pat, insn)
*** 2871,2881 ****
    else if (GET_CODE (dest) == MEM)
      record_last_mem_set_info (insn);
  
-   if (GET_CODE (dest) == REG)
-     SET_BIT (reg_set_bitmap, REGNO (dest));
-   else if (GET_CODE (dest) == MEM)
-     mem_last_set = INSN_CUID (insn);
- 
    if (GET_CODE (SET_SRC (pat)) == CALL)
      mark_call (insn);
  }
--- 2828,2833 ----
*************** mark_clobber (pat, insn)
*** 2894,2903 ****
    if (GET_CODE (clob) == REG)
      SET_BIT (reg_set_bitmap, REGNO (clob));
    else
-     mem_last_set = INSN_CUID (insn);
-   if (GET_CODE (clob) == REG)
-     SET_BIT (reg_set_bitmap, REGNO (clob));
-   else
      record_last_mem_set_info (insn);
  }
  
--- 2846,2851 ----
*************** expr_killed_p (x, bb)
*** 3149,3156 ****
      case MEM:
        if (load_killed_in_block_p (bb, get_max_uid () + 1, x, 0))
  	return 1;
-       if (mem_set_in_block[bb->index])
- 	return 1;
        else
  	return expr_killed_p (XEXP (x, 0), bb);
  
--- 3097,3102 ----
*************** compute_transp (x, indx, bmap, set_p)
*** 3853,3870 ****
  	      list_entry = XEXP (list_entry, 1);
  	    }
  	}
-       if (set_p)
- 	{
- 	  for (bb = 0; bb < n_basic_blocks; bb++)
- 	    if (mem_set_in_block[bb])
- 	      SET_BIT (bmap[bb], indx);
- 	}
-       else
- 	{
- 	  for (bb = 0; bb < n_basic_blocks; bb++)
- 	    if (mem_set_in_block[bb])
- 	      RESET_BIT (bmap[bb], indx);
- 	}
  
        x = XEXP (x, 0);
        goto repeat;
--- 3792,3797 ----
*************** process_insert_insn (expr)
*** 4639,4645 ****
  
    /* If the expression is something that's an operand, like a constant,
       just copy it to a register.  */
!   if (general_operand (exp, GET_MODE (reg)))
      emit_move_insn (reg, exp);
  
    /* Otherwise, make a new insn to compute this expression and make sure the
--- 4566,4573 ----
  
    /* If the expression is something that's an operand, like a constant,
       just copy it to a register.  */
!   if (general_operand (exp, GET_MODE (reg)) 
!       && !(memory_operand (exp, GET_MODE (reg))))
      emit_move_insn (reg, exp);
  
    /* Otherwise, make a new insn to compute this expression and make sure the
*************** simple_mem (x)
*** 6142,6148 ****
    if (GET_MODE (x) == BLKmode)
      return 0;
  
!   if (!rtx_varies_p (XEXP (x, 0), 0))
      return 1;
    
    return 0;
--- 6070,6077 ----
    if (GET_MODE (x) == BLKmode)
      return 0;
  
!   /* See comment in find_moveable_store.  */
!   if (!rtx_addr_varies_p (XEXP (x, 0), 0))
      return 1;
  
    return 0;
*************** update_ld_motion_stores (expr)
*** 6373,6409 ****
  
  /* Store motion code.  */
  
- /* This is used to communicate the target bitvector we want to use in the 
-    reg_set_info routine when called via the note_stores mechanism.  */
- static sbitmap * regvec;
- 
  /* Used in computing the reverse edge graph bit vectors.  */
  static sbitmap * st_antloc;
  
  /* Global holding the number of store expressions we are dealing with.  */
  static int num_stores;
  
! /* Checks to set if we need to mark a register set. Called from note_stores.  */
  
! static void
! reg_set_info (dest, setter, data)
!      rtx dest, setter ATTRIBUTE_UNUSED;
!      void * data ATTRIBUTE_UNUSED;
  {
!   if (GET_CODE (dest) == SUBREG)
!     dest = SUBREG_REG (dest);
  
!   if (GET_CODE (dest) == REG)
!     SET_BIT (*regvec, REGNO (dest));
  }
  
  /* Return non-zero if the register operands of expression X are killed 
!    anywhere in basic block BB.  */
  
  static int
! store_ops_ok (x, bb)
       rtx x;
       basic_block bb;
  {
    int i;
    enum rtx_code code;
--- 6302,6362 ----
  
  /* Store motion code.  */
  
  /* Used in computing the reverse edge graph bit vectors.  */
  static sbitmap * st_antloc;
  
  /* Global holding the number of store expressions we are dealing with.  */
  static int num_stores;
  
! /* Dataflow analyzer to figure out reg-def chains.  */
! struct df *df_analyzer;
  
! /* Mark which registers are used by the mem, in the sbitmap USED.  */
! static int
! mark_mem_regs (x, used)
!      rtx x;
!      sbitmap used;
  {
!   register const char *fmt;
!   int i, j;
    
!   if (GET_CODE (x) == REG)
!     {
!       if (!TEST_BIT (used, REGNO (x)))
! 	{
! 	  SET_BIT (used, REGNO (x));
! 	  return 1;
  	}
+       return 0;
+     }
+ 
+   fmt = GET_RTX_FORMAT (GET_CODE (x));
+   for (i = GET_RTX_LENGTH (GET_CODE (x)) - 1; i >= 0; i--)
+     {
+       if (fmt[i] == 'e')
+ 	{
+ 	  if (mark_mem_regs (XEXP (x, i),used))
+ 	    return 1;
+ 	}
+       else if (fmt[i] == 'E')
+ 	for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+ 	  if (mark_mem_regs (XVECEXP (x, i, j),used))
+ 	    return 1;
+     }
  
+   return 0;
+ }
+ 
+ 
  /* Return non-zero if the register operands of expression X are killed 
!    before/after insn in basic block BB.  */
  
  static int
! store_ops_ok (x, bb, insn, before)
       rtx x;
       basic_block bb;
+      rtx insn;
+      int before;
  {
    int i;
    enum rtx_code code;
*************** store_ops_ok (x, bb)
*** 6419,6428 ****
    switch (code)
      {
      case REG:
! 	/* If a reg has changed after us in this
! 	   block, the operand has been killed.  */
! 	return TEST_BIT (reg_set_in_block[bb->index], REGNO (x));
  
      case MEM:
        x = XEXP (x, 0);
        goto repeat;
--- 6372,6417 ----
    switch (code)
      {
      case REG:
! 	{
! 	  /* Okay, since the reg def chains are ordered by bb/insn
! 	     (since that's how it calculates them, and even if it didn't,
! 	     we could just sort them), we just walk until we find a def
! 	     in our BB, then walk until we find a def after/before our
! 	     insn, and if we find a reg def after/before our insn, in the
! 	     same bb, we return the appropriate value.  If there is no
! 	     such def, to prevent walking *every* reg def, we stop once
! 	     we are out of our BB again. */
! 	  struct df_link *currref;
! 	  bool thereyet=FALSE;
! 	  for (currref = df_analyzer->regs[REGNO(x)].defs;
! 	       currref;
! 	       currref = currref->next)
! 	    {
! 	      if (! (DF_REF_BB (currref->ref)  == bb))
! 		{
! 		  if (!thereyet)
! 		    continue;
! 		  else 
! 		    return 1;
! 		}
! 	      if (before)
! 		{
! 		  if (INSN_UID (DF_REF_INSN (currref->ref)) >= INSN_UID (insn))
! 		    continue;
! 		}
! 	      else
! 		{
! 		  if (INSN_UID (DF_REF_INSN (currref->ref)) < INSN_UID (insn))
! 		    continue;
! 		}
! 	      thereyet = TRUE;
! 	      if (DF_REF_TYPE (currref->ref) == DF_REF_REG_DEF)
! 		return 0;
! 	    }
! 	  return 1;
! 	}
  	
+ 	
      case MEM:
        x = XEXP (x, 0);
        goto repeat;
*************** store_ops_ok (x, bb)
*** 6466,6472 ****
  	      goto repeat;
  	    }
  	  
! 	  if (! store_ops_ok (tem, bb))
  	    return 0;
  	}
        else if (fmt[i] == 'E')
--- 6455,6461 ----
  	      goto repeat;
  	    }
  	  
! 	  if (! store_ops_ok (tem, bb, insn, before))
  	    return 0;
  	}
        else if (fmt[i] == 'E')
*************** store_ops_ok (x, bb)
*** 6475,6481 ****
  	  
  	  for (j = 0; j < XVECLEN (x, i); j++)
  	    {
! 	      if (! store_ops_ok (XVECEXP (x, i, j), bb))
  		return 0;
  	    }
  	}
--- 6464,6470 ----
  	  
  	  for (j = 0; j < XVECLEN (x, i); j++)
  	    {
! 	      if (! store_ops_ok (XVECEXP (x, i, j), bb, insn, before))
  		return 0;
  	    }
  	}
*************** find_moveable_store (insn)
*** 6501,6514 ****
    
    if (GET_CODE (dest) != MEM || MEM_VOLATILE_P (dest)
        || GET_MODE (dest) == BLKmode)
-     return;
- 
-   if (GET_CODE (XEXP (dest, 0)) != SYMBOL_REF)
        return;
! 
!   if (rtx_varies_p (XEXP (dest, 0), 0))
      return;
- 
    ptr = ldst_entry (dest);
    ptr->stores = alloc_INSN_LIST (insn, ptr->stores);
  }
--- 6490,6501 ----
    
    if (GET_CODE (dest) != MEM || MEM_VOLATILE_P (dest)
        || GET_MODE (dest) == BLKmode)
      return;
!   /* ??? Is this conservative, or just correct?  We get more
!      *candidates* without it, but I don't think we ever remove any
!      stores where the address did vary.  */
!   if (rtx_addr_varies_p (XEXP (dest, 0), 0))
      return;
    ptr = ldst_entry (dest);
    ptr->stores = alloc_INSN_LIST (insn, ptr->stores);
  }
*************** static int
*** 6520,6535 ****
  compute_store_table ()
  {
    int bb, ret;
-   unsigned regno;
    rtx insn, pat;
- 
    max_gcse_regno = max_reg_num ();
  
-   reg_set_in_block = (sbitmap *) sbitmap_vector_alloc (n_basic_blocks,
- 						       max_gcse_regno);
-   sbitmap_vector_zero (reg_set_in_block, n_basic_blocks);
    pre_ldst_mems = 0;
- 
    /* Find all the stores we care about.  */
    for (bb = 0; bb < n_basic_blocks; bb++)
      {
--- 6507,6516 ----
*************** compute_store_table ()
*** 6533,6580 ****
    /* Find all the stores we care about.  */
    for (bb = 0; bb < n_basic_blocks; bb++)
      {
-       regvec = & (reg_set_in_block[bb]);
        for (insn = BLOCK_END (bb);
  	   insn && insn != PREV_INSN (BLOCK_HEAD (bb));
  	   insn = PREV_INSN (insn))
  	{
! #ifdef NON_SAVING_SETJMP 
! 	  if (NON_SAVING_SETJMP && GET_CODE (insn) == NOTE
! 	      && NOTE_LINE_NUMBER (insn) == NOTE_INSN_SETJMP)
! 	    {
! 	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
! 		SET_BIT (reg_set_in_block[bb], regno);
! 	      continue;
! 	    }
! #endif
! 	/* Ignore anything that is not a normal insn.  */
! 	if (GET_RTX_CLASS (GET_CODE (insn)) != 'i')
  	    continue;
- 
- 	  if (GET_CODE (insn) == CALL_INSN)
- 	    {
- 	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
- 		if ((call_used_regs[regno]
- 		     && regno != STACK_POINTER_REGNUM
- #if HARD_FRAME_POINTER_REGNUM != FRAME_POINTER_REGNUM
- 		     && regno != HARD_FRAME_POINTER_REGNUM
- #endif
- #if ARG_POINTER_REGNUM != FRAME_POINTER_REGNUM
- 		     && ! (regno == ARG_POINTER_REGNUM && fixed_regs[regno])
- #endif
- #if defined (PIC_OFFSET_TABLE_REGNUM) && !defined (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED)
- 		     && ! (regno == PIC_OFFSET_TABLE_REGNUM && flag_pic)
- #endif
- 
- 		     && regno != FRAME_POINTER_REGNUM)
- 		    || global_regs[regno])
- 		SET_BIT (reg_set_in_block[bb], regno);
- 	    }
- 	  
  	  pat = PATTERN (insn);
- 	  note_stores (pat, reg_set_info, NULL);
- 	  
- 	  /* Now that we've marked regs, look for stores.  */
  	  if (GET_CODE (pat) == SET)
  	    find_moveable_store (insn);
  	}
--- 6514,6526 ----
    /* Find all the stores we care about.  */
    for (bb = 0; bb < n_basic_blocks; bb++)
      {
        for (insn = BLOCK_END (bb);
  	   insn && insn != PREV_INSN (BLOCK_HEAD (bb));
  	   insn = PREV_INSN (insn))
  	{
! 	  if (!INSN_P (insn))
  	    continue;
  	  pat = PATTERN (insn);
  	  if (GET_CODE (pat) == SET)
  	    find_moveable_store (insn);
  	}
*************** store_killed_in_insn (x, insn)
*** 6659,6667 ****
        rtx pat = PATTERN (insn);
        /* Check for memory stores to aliased objects.  */
        if (GET_CODE (SET_DEST (pat)) == MEM && !expr_equiv_p (SET_DEST (pat), x))
! 	/* pretend its a load and check for aliasing.  */
  	if (find_loads (SET_DEST (pat), x))
  	  return 1;
        return find_loads (SET_SRC (pat), x);
      }
    else
--- 6604,6613 ----
        rtx pat = PATTERN (insn);
        /* Check for memory stores to aliased objects.  */
        if (GET_CODE (SET_DEST (pat)) == MEM && !expr_equiv_p (SET_DEST (pat), x))
! 	{
  	  if (find_loads (SET_DEST (pat), x))
  	    return 1;
+ 	}
        return find_loads (SET_SRC (pat), x);
      }
    else
*************** store_killed_in_insn (x, insn)
*** 6672,6692 ****
     within basic block BB.  */
  
  static int 
! store_killed_after (x, insn, bb)
       rtx x, insn;
       basic_block bb;
  {
     rtx last = bb->end;
     
     if (insn == last)
       return 0;
! 
!   /* Check if the register operands of the store are OK in this block.
!      Note that if registers are changed ANYWHERE in the block, we'll 
!      decide we can't move it, regardless of whether it changed above 
!      or below the store. This could be improved by checking the register
!      operands while lookinng for aliasing in each insn.  */
!   if (!store_ops_ok (XEXP (x, 0), bb))
      return 1;
  
     for ( ; insn && insn != NEXT_INSN (last); insn = NEXT_INSN (insn))
--- 6618,6635 ----
   within basic block BB.  */
  
  static int 
! store_killed_after (x, insn, bb, testops)
       rtx x, insn;
       basic_block bb;
+      int testops;
  {
     rtx last = bb->end;
     
     if (insn == last)
       return 0;
!    if (testops)
!      /* Check if the register operands of the store are OK in this block.*/
!      if (!store_ops_ok (XEXP (x, 0), bb, insn, 0))
         return 1;
     
     for ( ; insn && insn != NEXT_INSN (last); insn = NEXT_INSN (insn))
*************** store_killed_before (x, insn, bb)
*** 6707,6722 ****
  
     if (insn == first)
       return store_killed_in_insn (x, insn);
!    
!   /* Check if the register operands of the store are OK in this block.
!      Note that if registers are changed ANYWHERE in the block, we'll 
!      decide we can't move it, regardless of whether it changed above 
!      or below the store. This could be improved by checking the register
!      operands while lookinng for aliasing in each insn.  */
!   if (!store_ops_ok (XEXP (x, 0), bb))
      return 1;
  
!    for ( ; insn && insn != PREV_INSN (first); insn = PREV_INSN (insn))
       if (store_killed_in_insn (x, insn))
         return 1;
     
--- 6650,6660 ----
  
     if (insn == first)
       return store_killed_in_insn (x, insn);
!    /* Check if the register operands of the store are OK in this block.*/
!    if (!store_ops_ok (XEXP (x, 0), bb, insn, 1))
       return 1;
     
!    for (insn = PREV_INSN (insn) ; insn && insn != PREV_INSN (first); insn = PREV_INSN (insn))
       if (store_killed_in_insn (x, insn))
         return 1;
     
*************** static void
*** 6733,6741 ****
  build_store_vectors () 
  {
    basic_block bb;
!   int b;
    rtx insn, st;
    struct ls_expr * ptr;
  
    /* Build the gen_vector. This is any store in the table which is not killed
       by aliasing later in its block.  */
--- 6671,6681 ----
  build_store_vectors () 
  {
    basic_block bb;
!   int b,i,j;
    rtx insn, st;
    struct ls_expr * ptr;
+   sbitmap tested, *result;
+   sbitmap used;
    
    /* Build the gen_vector. This is any store in the table which is not killed
       by aliasing later in its block.  */
*************** build_store_vectors () 
*** 6745,6750 ****
--- 6685,6703 ----
    st_antloc = (sbitmap *) sbitmap_vector_alloc (n_basic_blocks, num_stores);
    sbitmap_vector_zero (st_antloc, n_basic_blocks);
  
+   /* Computed stores, for use later by transp */
+   comp = (sbitmap *) sbitmap_vector_alloc (n_basic_blocks, num_stores);
+   sbitmap_vector_zero (comp, n_basic_blocks);
+ 
+   /* Note: In case someone needs something to optimize about store
+      motion, here's the next place to look.  We currently test one more
+      basic block per store than necessary (at least).  Since we know, at
+      the end of this for loop, whether a store was killed in one of the
+      basic blocks (We know both whether it's killed before, and killed
+      after, the insn in the bb it resides in. So unless the insn
+      consists of multiple store/loads, we know whether it was killed
+      in the entire bb), we could avoid testing it for kill and transp in
+      the next for loop. */
    for (ptr = first_ls_expr (); ptr != NULL; ptr = next_ls_expr (ptr))
      { 
        /* Put all the stores into either the antic list, or the avail list,
*************** build_store_vectors () 
*** 6756,6763 ****
  	{
  	  insn = XEXP (st, 0);
  	  bb = BLOCK_FOR_INSN (insn);
! 	  
! 	  if (!store_killed_after (ptr->pattern, insn, bb))
  	    {
  	      /* If we've already seen an availale expression in this block,
  		 we can delete the one we saw already (It occurs earlier in
--- 6709,6717 ----
  	{
  	  insn = XEXP (st, 0);
  	  bb = BLOCK_FOR_INSN (insn);
! 	  /* A store listed is always computed in the bb it points to.  */
! 	  SET_BIT (comp[bb->index], ptr->index);
! 	  if (!store_killed_after (ptr->pattern, insn, bb, 1))
  	    {
  	      /* If we've already seen an availale expression in this block,
  		 we can delete the one we saw already (It occurs earlier in
*************** build_store_vectors () 
*** 6801,6850 ****
    ae_kill = (sbitmap *) sbitmap_vector_alloc (n_basic_blocks, num_stores);
    sbitmap_vector_zero (ae_kill, n_basic_blocks);
  
    transp = (sbitmap *) sbitmap_vector_alloc (n_basic_blocks, num_stores);
!   sbitmap_vector_zero (transp, n_basic_blocks);
  
    for (ptr = first_ls_expr (); ptr != NULL; ptr = next_ls_expr (ptr))
-     for (b = 0; b < n_basic_blocks; b++)
        {
! 	if (store_killed_after (ptr->pattern, BLOCK_HEAD (b), BASIC_BLOCK (b)))
  	  {
! 	    /* The anticipatable expression is not killed if it's gen'd. */
! 	    /*
! 	      We leave this check out for now. If we have a code sequence 
! 	      in a block which looks like:
! 			ST MEMa = x
! 			L     y = MEMa
! 			ST MEMa = z
! 	      We should flag this as having an ANTIC expression, NOT
! 	      transparent, NOT killed, and AVAIL.
! 	      Unfortunately, since we haven't re-written all loads to
! 	      use the reaching reg, we'll end up doing an incorrect 
! 	      Load in the middle here if we push the store down. It happens in
! 		    gcc.c-torture/execute/960311-1.c with -O3
! 	      If we always kill it in this case, we'll sometimes do
! 	      uneccessary work, but it shouldn't actually hurt anything.
! 	    if (!TEST_BIT (ae_gen[b], ptr->index)).  */
! 	    SET_BIT (ae_kill[b], ptr->index);
  	  }
  	else
! 	  SET_BIT (transp[b], ptr->index);
        }
! 
!   /* Any block with no exits calls some non-returning function, so
!      we better mark the store killed here, or we might not store to
!      it at all.  If we knew it was abort, we wouldn't have to store,
!      but we don't know that for sure.  */
!   if (gcse_file) 
      {
!       fprintf (gcse_file, "ST_avail and ST_antic (shown under loads..)\n");
!       print_ldst_list (gcse_file);
!       dump_sbitmap_vector (gcse_file, "st_antloc", "", st_antloc, n_basic_blocks);
!       dump_sbitmap_vector (gcse_file, "st_kill", "", ae_kill, n_basic_blocks);
!       dump_sbitmap_vector (gcse_file, "Transpt", "", transp, n_basic_blocks);
!       dump_sbitmap_vector (gcse_file, "st_avloc", "", ae_gen, n_basic_blocks);
      }
  }
  
  /* Insert an instruction at the begining of a basic block, and update 
     the BLOCK_HEAD if needed.  */
--- 6756,6889 ----
    ae_kill = (sbitmap *) sbitmap_vector_alloc (n_basic_blocks, num_stores);
    sbitmap_vector_zero (ae_kill, n_basic_blocks);
  
+ 
    transp = (sbitmap *) sbitmap_vector_alloc (n_basic_blocks, num_stores);
!   sbitmap_vector_ones (transp, n_basic_blocks);
! 
!   tested = sbitmap_alloc (max_gcse_regno);
!   sbitmap_zero (tested);
!   result = sbitmap_vector_alloc (n_basic_blocks, max_gcse_regno);
!   sbitmap_vector_zero (result, n_basic_blocks);
!   used = sbitmap_alloc (max_gcse_regno);
!   sbitmap_zero (used);
  
+   /* This whole big nasty thing computes kill and transparent.
+      It's done in this nasty way because profiling showed store motion
+      taking twice as long as GCSE, with the cause being 1 million calls
+      to store_ops_ok taking 30% of the entire runtime of the
+      compiler.
+      Since most store expressions use the same registers, there's no
+      point in checking them 8 million times for the same basic blocks.
+      If they weren't okay in a BB the last time we checked, they won't
+      be okay now.  Since we check all the bb's on each iteration, we
+      don't need a vector for which registers we've tested, just the
+      results.  We then reuse the result store_ops_ok computed for a
+      given reg and bb, and if that result was a kill, we don't even
+      need to check whether the store itself was killed in the basic
+      block; it'll be in the kill set because its regs changed between
+      here and there.
+ 
+      If the whole store had no registers, we just skip store_ops_ok
+      anyway (since it's checking reg operands), and proceed to see if
+      it's okay in each bb, setting the appropriate bits.
+ 
+      With this in place, we now take almost no time at all to perform
+      store motion.  (It's not on the first page of the profile; it
+      takes less than a second.)  */
+ 
    for (ptr = first_ls_expr (); ptr != NULL; ptr = next_ls_expr (ptr))
      {
!       /* Make sure we don't have a load-only expr, which we never seem
! 	 to, but I don't think there's actually a guarantee.  */
!       if (ptr->stores != NULL)
  	{
! 	  /* First mark the regs used by the mem */
! 	  mark_mem_regs (ptr->pattern, used);
! 	  /* Now see if it had any regs */
! 	  if (sbitmap_first_set_bit (used) != -1)
! 	    {
! 	      /* For each register, see if we've tested it */
! 	      EXECUTE_IF_SET_IN_SBITMAP (used, 0, i, 
! 	      {
! 		if (TEST_BIT (tested, i))
! 		  {
! 		    /* Already tested the register, so check the
! 		       result, and if we had an okay result, check the
! 		       store itself. */
! 		    for (j = 0; j < n_basic_blocks; j++)
! 		      {
! 			if (!TEST_BIT (result[j], i) || store_killed_after (ptr->pattern, BLOCK_HEAD (j), 
! 									    BASIC_BLOCK (j), FALSE))
! 			  {
! 			    SET_BIT (ae_kill[j], ptr->index);
! 			    if (!TEST_BIT (comp[j], ptr->index))
! 			      RESET_BIT (transp[j], ptr->index);
! 			  }
! 		      }
  		  }
  		else
! 		  {
! 		    /* We haven't tested it yet, so mark it tested,
! 		       and perform the tests */
! 		    SET_BIT (tested, i);
! 		    /* Check if it's okay in each BB */
! 		    for (j = 0; j < n_basic_blocks; j++)
! 		      {
! 			if (store_ops_ok (XEXP (ptr->pattern, 0), BASIC_BLOCK (j), BLOCK_HEAD (j), 0))
! 			  {
! 			    /* RESULT is indexed by regno, not store index.  */
! 			    SET_BIT (result[j], i);
  			  }
! 			else
  			  {
! 			    /* It's not okay, so it's killed and maybe
! 			       not transparent */
! 			    SET_BIT (ae_kill[j], ptr->index);
! 			    if (!TEST_BIT (comp[j], ptr->index))
! 			      {
! 				RESET_BIT (transp[j], ptr->index);
! 			      }
! 			    continue;
! 			  }
! 			/* The ops were okay, so check the store
! 			   itself */
! 			if (store_killed_after (ptr->pattern, BLOCK_HEAD (j), BASIC_BLOCK (j), FALSE))
! 			  {
! 			    SET_BIT (ae_kill[j], ptr->index);
! 			    if (!TEST_BIT (comp[j], ptr->index))
! 			      {
! 				RESET_BIT (transp[j], ptr->index);
! 			      }
! 			  }
! 		      }
! 		  }
! 	      });
! 	      /* Reset the used list */
! 	      sbitmap_zero (used);
! 	    }
! 	  /* If it had no registers, we come here, and do the
! 	     approriate testing */
! 	  else
! 	    {
! 	      for (j = 0; j < n_basic_blocks; j++)
! 		{
! 		  if (store_killed_after (ptr->pattern, BLOCK_HEAD (j), BASIC_BLOCK (j), FALSE))
! 		    {
! 		      SET_BIT (ae_kill[j], ptr->index);
! 		      if (!TEST_BIT (comp[j], ptr->index))
! 			{
! 			  RESET_BIT (transp[j], ptr->index);
! 			}
! 		    }
! 		}
! 	    }  
  	}
      }
+   sbitmap_free (tested);
+   sbitmap_free (used);
+   sbitmap_vector_free (result);
+   sbitmap_vector_free (comp);
+ }
  
  /* Insert an instruction at the begining of a basic block, and update 
     the BLOCK_HEAD if needed.  */
*************** static void 
*** 7023,7029 ****
  free_store_memory ()
  {
    free_ldst_mems ();
-   
    if (ae_gen)
      sbitmap_vector_free (ae_gen);
    if (ae_kill)
--- 7062,7067 ----
*************** free_store_memory ()
*** 7036,7043 ****
      sbitmap_vector_free (pre_insert_map);
    if (pre_delete_map)
      sbitmap_vector_free (pre_delete_map);
-   if (reg_set_in_block)
-     sbitmap_vector_free (reg_set_in_block);
    
    ae_gen = ae_kill = transp = st_antloc = NULL;
    pre_insert_map = pre_delete_map = reg_set_in_block = NULL;
--- 7074,7079 ----
*************** store_motion ()
*** 7051,7058 ****
  {
    int x;
    struct ls_expr * ptr;
!   int update_flow = 0;
  
    if (gcse_file)
      {
        fprintf (gcse_file, "before store motion\n");
--- 7087,7096 ----
  {
    int x;
    struct ls_expr * ptr;
!   sbitmap trapping_expr;
!   int i;
  
+   int update_flow = 0;
    if (gcse_file)
      {
        fprintf (gcse_file, "before store motion\n");
*************** store_motion ()
*** 7061,7072 ****
  
  
    init_alias_analysis ();
! 
    /* Find all the stores that are live to the end of their block.  */
    num_stores = compute_store_table ();
    if (num_stores == 0)
      {
!       sbitmap_vector_free (reg_set_in_block);
        end_alias_analysis ();
        return;
      }
--- 7099,7111 ----
  
  
    init_alias_analysis ();
!   df_analyzer = df_init ();
!   df_analyse (df_analyzer, 0, DF_RD_CHAIN | DF_HARD_REGS);
    /* Find all the stores that are live to the end of their block.  */
    num_stores = compute_store_table ();
    if (num_stores == 0)
      {
!       df_finish (df_analyzer);
        end_alias_analysis ();
        return;
      }
*************** store_motion ()
*** 7075,7080 ****
--- 7114,7144 ----
    add_noreturn_fake_exit_edges ();
    build_store_vectors ();
  
+   /* Collect expressions which might trap.  */
+   trapping_expr = sbitmap_alloc (num_stores);
+   sbitmap_zero (trapping_expr);
+   for (ptr = first_ls_expr (); ptr != NULL; ptr = next_ls_expr (ptr))
+     {
+       if (may_trap_p (ptr->pattern))
+ 	SET_BIT (trapping_expr, ptr->index);
+     }
+   for (i = 0; i < n_basic_blocks; i++)
+     {
+       edge e;
+ 
+       /* If the current block is the destination of an abnormal edge, we
+ 	 kill all trapping expressions because we won't be able to properly
+ 	 place the instruction on the edge.  So make them neither
+ 	 anticipatable nor transparent.  This is fairly conservative.  */
+       for (e = BASIC_BLOCK (i)->pred; e ; e = e->pred_next)
+ 	if (e->flags & EDGE_ABNORMAL)
+ 	  {
+ 	    sbitmap_difference (st_antloc[i], st_antloc[i], trapping_expr);
+ 	    sbitmap_difference (transp[i], transp[i], trapping_expr);
+ 	    break;
+ 	  }
+     }
+ 
    edge_list = pre_edge_rev_lcm (gcse_file, num_stores, transp, ae_gen, 
  				st_antloc, ae_kill, &pre_insert_map, 
  				&pre_delete_map);
*************** store_motion ()
*** 7093,7101 ****
  
    if (update_flow)
      commit_edge_insertions ();
! 
    free_store_memory ();
    free_edge_list (edge_list);
    remove_fake_edges ();
    end_alias_analysis ();
  }
--- 7157,7166 ----
    
    if (update_flow)
      commit_edge_insertions ();
!   sbitmap_free (trapping_expr);
    free_store_memory ();
    free_edge_list (edge_list);
    remove_fake_edges ();
    end_alias_analysis ();
+   df_finish (df_analyzer);
  }

-- 
"I was watching the Superbowl with my 92 year old grandfather.
The team scored a touchdown.  They showed the instant replay.
He thought they scored another one.  I was gonna tell him, but I
figured the game *he* was watching was better.
"-Steven Wright

