[PATCH] New interference graph implementation.

Peter Bergner <bergner@vnet.ibm.com>
Thu Sep 6 18:41:00 GMT 2007


Below is a patch to convert our interference graph bit matrix to use a
compressed triangular bit matrix plus adjacency lists.  The current code uses a
square bit matrix which, besides being large (max_allocno * max_allocno bits),
contains duplicated information on either side of the diagonal.  A standard
triangular bit matrix eliminates the diagonal and the lower portion of the
square bit matrix, so there is no duplicated information.  It contains space
for (max_allocno * (max_allocno - 1)) / 2 bits, thus saving just over half the
space of a square bit matrix.  Visually, it saves the empty elements of the
figure below.

A compressed triangular bit matrix goes further and eliminates the space
in the bit matrix associated with indices representing local allocnos
(local meaning they are referenced in only one basic block) that are live
in different basic blocks (the * elements in the figure below).  The amount
of space we save, if any, depends on the specific allocno live ranges.
In the worst case, the allocnos are either all global or all local to one
basic block (fpppp is an example of this); our compressed triangular bit
matrix then simply devolves into a standard triangular bit matrix, still
saving just over half the space of the square bit matrix.

         G    G    G    L0   L0   L0   L1   L1   L2   L2   L2   L2
       -------------------------------------------------------------
    G  |    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |
       -------------------------------------------------------------
    G  |    |    | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
       -------------------------------------------------------------
    G  |    |    |    | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 |
       -------------------------------------------------------------
    L0 |    |    |    |    | 30 | 31 |  * |  * |  * |  * |  * |  * |
       -------------------------------------------------------------
    L0 |    |    |    |    |    | 32 |  * |  * |  * |  * |  * |  * |
       -------------------------------------------------------------
    L0 |    |    |    |    |    |    |  * |  * |  * |  * |  * |  * |
       -------------------------------------------------------------
    L1 |    |    |    |    |    |    |    | 33 |  * |  * |  * |  * |
       -------------------------------------------------------------
    L1 |    |    |    |    |    |    |    |    |  * |  * |  * |  * |
       -------------------------------------------------------------
    L2 |    |    |    |    |    |    |    |    |    | 34 | 35 | 36 |
       -------------------------------------------------------------
    L2 |    |    |    |    |    |    |    |    |    |    | 37 | 38 |
       -------------------------------------------------------------
    L2 |    |    |    |    |    |    |    |    |    |    |    | 39 |
       -------------------------------------------------------------
    L2 |    |    |    |    |    |    |    |    |    |    |    |    |
       -------------------------------------------------------------
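
To make the numbering concrete, here is a small standalone sketch (plain C,
not part of the patch itself) of the two triangular bit-number equations
documented in the new global.c comment in the diff below.  Equation 2 is the
one the patch actually uses, since it keeps the bits for one row of the
figure contiguous and therefore gives better cache locality for local
allocnos that are live together.

  #include <assert.h>
  #include <stdio.h>

  /* Equation 1: the standard triangular numbering, f(HIGH) + LOW.  */
  static int
  bitnum_eq1 (int low, int high)
  {
    return (high * (high - 1)) / 2 + low;
  }

  /* Equation 2: the row-major numbering the patch uses, f(LOW) + HIGH.  */
  static int
  bitnum_eq2 (int low, int high, int n)
  {
    int f_low = low * (n - low) + (low * (low - 1)) / 2 - low - 1;
    return f_low + high;
  }

  int
  main (void)
  {
    enum { n = 12 };                /* Matches the 12-allocno figure above.  */
    int seen[n * (n - 1) / 2] = { 0 };
    int low, high;

    for (low = 0; low < n; low++)
      for (high = low + 1; high < n; high++)
        {
          int b1 = bitnum_eq1 (low, high);
          int b2 = bitnum_eq2 (low, high, n);

          /* Both equations map every (LOW, HIGH) pair into the range
             0 .. n*(n-1)/2 - 1; equation 2 gives each pair a unique bit.  */
          assert (b1 >= 0 && b1 < n * (n - 1) / 2);
          assert (b2 >= 0 && b2 < n * (n - 1) / 2 && !seen[b2]);
          seen[b2] = 1;
        }

    /* The first G row of the figure: bit numbers 0 through 10.  */
    printf ("(0,1) -> %d  (0,11) -> %d\n",
            bitnum_eq2 (0, 1, n), bitnum_eq2 (0, 11, n));
    return 0;
  }

In the compressed case the patch uses the same f(LOW) + HIGH scheme, except
that f(LOW) is precalculated per allocno (partial_bitnum[]) from the actual,
shortened row sizes rather than from the closed form above.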

In addition to the compressed triangular bit matrix, I have also added
adjacency lists, which allow fast access to an allocno's neighbors in
the conflict graph.  They are implemented as short vectors of allocnos
with a next pointer to chain longer sequences together.  I had attempted
to use bitmap.c bitmaps for the adjacency lists, but found that the time
spent in global_alloc() compiling fpppp increased by 13%, so I reverted
to using my own implementation.
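
Here is a simplified standalone sketch of that representation (plain C with
malloc and illustrative names; the patch itself allocates the vectors from
an alloc_pool, see add_neighbor in the global.c diff below):

  #include <stdio.h>
  #include <stdlib.h>

  #define ADJ_VEC_LEN 30          /* Same vector length the patch uses.  */

  struct adj_vec
  {
    int neighbors[ADJ_VEC_LEN];   /* Neighbor allocno numbers.  */
    unsigned int count;           /* Slots used in this vector.  */
    struct adj_vec *next;         /* Chain of earlier, full vectors.  */
  };

  /* Append NEIGHBOR to the list headed by *HEAD, chaining in a fresh
     vector whenever the front vector is full.  */

  static void
  add_neighbor (struct adj_vec **head, int neighbor)
  {
    struct adj_vec *v = *head;

    if (v == NULL || v->count == ADJ_VEC_LEN)
      {
        struct adj_vec *nv = calloc (1, sizeof (struct adj_vec));
        nv->next = v;
        *head = v = nv;
      }
    v->neighbors[v->count++] = neighbor;
  }

  int
  main (void)
  {
    struct adj_vec *adjacency = NULL;
    struct adj_vec *v;
    unsigned int i;
    int n;

    /* Give one allocno forty neighbors, forcing a second vector to be
       chained onto the list.  */
    for (n = 1; n <= 40; n++)
      add_neighbor (&adjacency, n);

    /* Visiting the neighbors touches only the forty recorded entries,
       not all max_allocno possible allocnos as the old bit-row scan did.  */
    for (v = adjacency; v != NULL; v = v->next)
      for (i = 0; i < v->count; i++)
        printf (" %d", v->neighbors[i]);
    printf ("\n");
    return 0;
  }

In the patch, set_conflicts_p only calls add_neighbor after testing the
conflict bit, so each edge is recorded exactly once per endpoint.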

Some example memory savings I'm seeing with the patch are:

For the test case attached to http://gcc.gnu.org/PR33237, we get:

  ## max_blk:     13167
  ## max_regno:   41566
  ## max_allocno: 29072
  ## Compressed triangular bitmatrix size: 23836452 bits, 2979557 bytes
  ## Standard triangular bitmatrix size:   422576056 bits, 52822007 bytes [5.64%]
  ## Square bitmatrix size:                845181184 bits, 105647648 bytes [2.84%]

So our compressed triangular bit matrix is only 2.84% of the size of
the square bit matrix.

fpppp (as mentioned above, a worst-case scenario, but we still use just over
half the space of the square bit matrix):

  ## max_blk:     2
  ## max_regno:   4259
  ## max_allocno: 2179
  ## Compressed triangular bitmatrix size: 2372931 bits, 296617 bytes
  ## Standard triangular bitmatrix size:   2372931 bits, 296617 bytes [100.00%]
  ## Square bitmatrix size:                4748041 bits, 593506 bytes [50.38%]

twldrv:

  ## max_blk:     201
  ## max_regno:   7139
  ## max_allocno: 2043
  ## Compressed triangular bitmatrix size: 802677 bits, 100335 bytes
  ## Standard triangular bitmatrix size:   2085903 bits, 260738 bytes [38.48%]
  ## Square bitmatrix size:                4173849 bits, 521732 bytes [19.38%]

tomcatv:

  ## max_blk:     46
  ## max_regno:   931
  ## max_allocno: 343
  ## Compressed triangular bitmatrix size: 40487 bits, 5061 bytes
  ## Standard triangular bitmatrix size:   58653 bits, 7332 bytes [69.03%]
  ## Square bitmatrix size:                117649 bits, 14707 bytes [34.69%]


As for time savings, global_alloc is now 46.7% faster on the test case in
PR33237, although due to the performance problem reported in that PR, the
benefit is completely swamped by the time tree memory partitioning takes.
For the other benchmarks listed above, as well as quite a few smallish test
cases I haven't listed here, I saw no measurable difference in compile time,
so we haven't made anything slower and in some cases we make things faster.

Seongbae, can you try bootstrapping/regtesting this patch on some of
your non-ppc hardware?  Thanks.

This has passed bootstrapping and regtesting on powerpc64-linux running
the testsuite in both 32-bit and 64-bit modes.  Is this ok for mainline?


Peter


2007-09-06  Peter Bergner  <bergner@vnet.ibm.com>

	* configure.ac (--enable-checking): Add global category.
	(ENABLE_GLOBAL_CHECKING): Define if requested.
	* configure: Regenerate.
	* config.in: Regenerate.
	* doc/install.texi: Document it.
	* sparseset.h, sparseset.c: New files.
	* Makefile.in (OBJS-common): Add sparseset.o.
	(sparseset.o): New rule.
	* global.c: Comments updated.
	Include "sparseset.h".
	(INT_BITS): Define as HOST_BITS_PER_WIDEST_FAST_INT.
	(INT_TYPE): Define as HOST_WIDEST_FAST_INT.
	(adjacency_list_d): New type.
	(partial_bitnum, max_bitnum, adjacency, adjacency_pool): New global
	variables.
	(add_neighbor, adjacency_iter_init, adjacency_iter_done,
	adjacency_iter_next, regno_basic_block, conflict_p, set_conflicts_p,
	regno_compare): New functions.
	(ADJACENCY_VEC_LENGTH, CONFLICT_BITNUM, CONFLICT_BITNUM_FAST,
	FOR_EACH_CONFLICT): New defines.
	(mirror_conflicts): Removed function.
	(allocno_row_words): Removed global variable.
	(CONFLICTP, EXECUTE_IF_SET_IN_ALLOCNO_SET, SET_ALLOCNO_LIVE,
	CLEAR_ALLOCNO_LIVE): Removed defines.
	(allocnos_live): Redefine global variable as a sparseset.
	(global_alloc): Sort the allocno to regno mapping according to
	which basic blocks the regnos are referenced in.  Convert the
	conflict bit matrix to a compressed triangular bit matrix.
	(global_conflicts): Update uses of allocnos_live.
	Use the new adjacency list to visit an allocno's neighbors
	rather than iterating over all possible allocnos.
	Call set_conflicts_p to set up conflicts rather than adding
	them manually.
	(mark_reg_store): Update uses of allocnos_live.
	(dump_conflicts): Iterate over regnos rather than allocnos so
	that all dump output will be sorted by regno.
	Use the new adjacency list to visit an allocno's neighbors
	rather than iterating over all possible allocnos.

Index: doc/install.texi
===================================================================
--- doc/install.texi	(revision 128200)
+++ doc/install.texi	(working copy)
@@ -1258,8 +1258,9 @@ checks available are @samp{yes} (most co
 all), @samp{all} (all but @samp{valgrind}), @samp{release} (cheapest
 checks @samp{assert,runtime}) or @samp{none} (same as @samp{no}).
 Individual checks can be enabled with these flags @samp{assert},
-@samp{fold}, @samp{gc}, @samp{gcac} @samp{misc}, @samp{rtl},
-@samp{rtlflag}, @samp{runtime}, @samp{tree}, and @samp{valgrind}.
+@samp{fold}, @samp{gc}, @samp{gcac}, @samp{global}, @samp{misc},
+@samp{rtl}, @samp{rtlflag}, @samp{runtime}, @samp{tree},
+and @samp{valgrind}.
 
 The @samp{valgrind} check requires the external @command{valgrind}
 simulator, available from @uref{http://valgrind.org/}.  The
Index: configure
===================================================================
--- configure	(revision 128200)
+++ configure	(working copy)
@@ -867,7 +867,7 @@ Optional Features:
 			  enable expensive run-time checks.  With LIST,
 			  enable only specific categories of checks.
 			  Categories are: yes,no,all,none,release.
-			  Flags are: assert,df,fold,gc,gcac,misc,
+			  Flags are: assert,df,fold,gc,gcac,global,misc,
 			  rtlflag,rtl,runtime,tree,valgrind,types.
   --enable-mapped-location   location_t is fileline integer cookie
   --enable-coverage=LEVEL
@@ -6452,7 +6452,7 @@ do
 			ac_gc_always_collect=1 ; ac_rtl_checking=1 ;
 			ac_rtlflag_checking=1 ; ac_runtime_checking=1 ;
 			ac_tree_checking=1 ; ac_valgrind_checking= ;
-			ac_types_checking=1 ;;
+			ac_types_checking=1 ; ac_global_checking=1 ;;
 	release)	ac_assert_checking=1 ; ac_checking= ; ac_df_checking= ;
 			ac_fold_checking= ; ac_gc_checking= ;
 			ac_gc_always_collect= ; ac_rtl_checking= ;
@@ -6465,6 +6465,7 @@ do
 	fold)		ac_fold_checking=1 ;;
 	gc)		ac_gc_checking=1 ;;
 	gcac)		ac_gc_always_collect=1 ;;
+	global)		ac_global_checking=1 ;;
 	misc)		ac_checking=1 ;;
 	rtl)		ac_rtl_checking=1 ;;
 	rtlflag)	ac_rtlflag_checking=1 ;;
@@ -6562,6 +6563,13 @@ cat >>confdefs.h <<\_ACEOF
 _ACEOF
 
 fi
+if test x$ac_global_checking != x ; then
+
+cat >>confdefs.h <<\_ACEOF
+#define ENABLE_GLOBAL_CHECKING 1
+_ACEOF
+
+fi
 valgrind_path_defines=
 valgrind_command=
 
Index: config.in
===================================================================
--- config.in	(revision 128200)
+++ config.in	(working copy)
@@ -93,6 +93,13 @@
 #endif
 
 
+/* Define if you want global_alloc to verify valid accesses to several data
+   structures. This is quite expensive. */
+#ifndef USED_FOR_TARGET
+#undef ENABLE_GLOBAL_CHECKING
+#endif
+
+
 /* Define to 1 if translation of program messages to the user's native
    language is requested. */
 #ifndef USED_FOR_TARGET
Index: global.c
===================================================================
--- global.c	(revision 128200)
+++ global.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  
 #include "df.h"
 #include "vecprim.h"
 #include "dbgcnt.h"
+#include "sparseset.h"
 
 /* This pass of the compiler performs global register allocation.
    It assigns hard register numbers to all the pseudo registers
@@ -55,28 +56,40 @@ along with GCC; see the file COPYING3.  
    reg for it.  The reload pass is independent in other respects
    and it is run even when stupid register allocation is in use.
 
-   1. Assign allocation-numbers (allocnos) to the pseudo-registers
-   still needing allocations and to the pseudo-registers currently
-   allocated by local-alloc which may be spilled by reload.
-   Set up tables reg_allocno and allocno_reg to map
-   reg numbers to allocnos and vice versa.
+   1. Assign allocation-numbers (allocnos) to the pseudo-registers still
+   needing allocations and to the pseudo-registers currently allocated by
+   local-alloc which may be spilled by reload.  Set up tables reg_allocno
+   and allocno_reg to map reg numbers to allocnos and vice versa.
    max_allocno gets the number of allocnos in use.
 
-   2. Allocate a max_allocno by max_allocno conflict bit matrix and clear it.
-   Allocate a max_allocno by FIRST_PSEUDO_REGISTER conflict matrix
-   for conflicts between allocnos and explicit hard register use
-   (which includes use of pseudo-registers allocated by local_alloc).
-
-   3. For each basic block
-    walk forward through the block, recording which
-    pseudo-registers and which hardware registers are live.
-    Build the conflict matrix between the pseudo-registers
-    and another of pseudo-registers versus hardware registers.
-    Also record the preferred hardware registers
-    for each pseudo-register.
+   2. Allocate a max_allocno by max_allocno compressed triangular conflict
+   bit matrix (a triangular bit matrix with portions removed for which we
+   can guarantee there are no conflicts, example: two local pseudos that
+   live in different basic blocks) and clear it.  Allocate a max_allocno
+   by FIRST_PSEUDO_REGISTER conflict matrix for conflicts between allocnos
+   and explicit hard register use (which includes use of pseudo-registers
+   allocated by local_alloc).  Note that for triangular bit matrices, there are
+   two equations for computing the bit number for two allocnos (LOW < HIGH):
+
+     1) BITNUM = f(HIGH) + LOW, where
+	f(HIGH) = (HIGH * (HIGH - 1)) / 2
+
+     2) BITNUM = f(LOW) + HIGH, where
+	f(LOW) = LOW * (max_allocno - LOW) + (LOW * (LOW - 1)) / 2 - LOW - 1
+
+   We use the second (and less common) equation as this gives us better
+   cache locality for local allocnos that are live within the same basic
+   block.  Also note that f(HIGH) and f(LOW) can be precalculated for all
+   values of HIGH and LOW, so all that is necessary to compute the bit
+   number for two allocnos LOW and HIGH is a load followed by an addition.
+
+   3. For each basic block walk forward through the block, recording which
+   pseudo-registers and which hardware registers are live.
+   Build the conflict matrix between the pseudo-registers, and another between
+   pseudo-registers and hardware registers.  Also record the preferred
+   hardware registers for each pseudo-register.
 
-   4. Sort a table of the allocnos into order of
-   desirability of the variables.
+   4. Sort a table of the allocnos into order of desirability of the variables.
 
    5. Allocate the variables in that order; each if possible into
    a preferred register, else into another register.  */
@@ -157,48 +170,220 @@ static int *allocno_order;
    type that element has.  We use the largest integer format on the
    host machine.  */
 
-#define INT_BITS HOST_BITS_PER_WIDE_INT
-#define INT_TYPE HOST_WIDE_INT
+#define INT_BITS HOST_BITS_PER_WIDEST_FAST_INT
+#define INT_TYPE HOST_WIDEST_FAST_INT
 
-/* max_allocno by max_allocno array of bits,
-   recording whether two allocno's conflict (can't go in the same
-   hardware register).
-
-   `conflicts' is symmetric after the call to mirror_conflicts.  */
+/* max_allocno by max_allocno compressed triangular bit matrix,
+   recording whether two allocnos conflict (can't go in the same
+   hardware register).  */
 
 static INT_TYPE *conflicts;
 
-/* Number of ints required to hold max_allocno bits.
-   This is the length of a row in `conflicts'.  */
+/* Precalculated partial bit number in the bit matrix.  For two allocnos,
+   the final bit number is equal to partial_bitnum[LOW] + HIGH.  */
+
+static int *partial_bitnum;
+
+/* Size in bits of the compressed triangular bit matrix.  */
+
+static int max_bitnum;
+
+/* The pool to allocate the adjacency list elements from.  */
+
+static alloc_pool adjacency_pool;
+
+/* The maximum number of neighbors stored in the neighbors vector before
+   we have to chain in another vector.  */
+
+#define ADJACENCY_VEC_LENGTH 30
+
+/* Conflict graph adjacency list.  */
+
+typedef struct adjacency_list_d
+{
+  int neighbors[ADJACENCY_VEC_LENGTH];
+  unsigned int index;
+  struct adjacency_list_d *next;
+} adjacency_t;
+
+static adjacency_t **adjacency;
+
+/* Add NEIGHBOR to ALLOC_NO's adjacency list.  It is assumed the caller
+   has already determined that NEIGHBOR is not already neighbor by
+   checking the conflict bit matrix.  */
+
+static inline void
+add_neighbor (int alloc_no, int neighbor)
+{
+  adjacency_t *adjlist = adjacency[alloc_no];
+
+  if (adjlist == NULL || adjlist->index == ADJACENCY_VEC_LENGTH)
+    {
+      adjacency_t *new = pool_alloc (adjacency_pool);
+      new->index = 0;
+      new->next = adjlist;
+      adjlist = new;
+      adjacency[alloc_no] = adjlist;
+    }
+
+  adjlist->neighbors[adjlist->index++] = neighbor;
+}
+
+/* Iterator for adjacency lists.  */
+
+typedef struct adjacency_iterator_d
+{
+  adjacency_t *vec;
+  unsigned int idx;
+} adjacency_iter;
+
+/* Initialize a single adjacency list iterator.  */
+
+static inline int
+adjacency_iter_init (adjacency_iter *ai, int allocno1)
+{
+  ai->vec = adjacency[allocno1];
+  ai->idx = 0;
+  return ai->vec != NULL;
+}
+
+/* Test whether we have visited all of the neighbors.  */
+
+static inline int
+adjacency_iter_done (adjacency_iter *ai)
+{
+  return ai->idx > ai->vec->index;
+}
+
+/* Advance to the next neighbor in AI.  */
+
+static inline int
+adjacency_iter_next (adjacency_iter *ai)
+{
+  unsigned int idx = ai->idx;
+  int neighbor = ai->vec->neighbors[idx++];
+  if (idx >= ai->vec->index && ai->vec->next != NULL)
+    {
+      ai->vec = ai->vec->next;
+      ai->idx = 0;
+    }
+  else
+    ai->idx = idx;
+  return neighbor;
+}
+
+/* Return the one basic block regno is used in.  If regno is used
+   in more than one basic block or if it is unknown which block it
+   is used in, return 0.  */
 
-static int allocno_row_words;
+static inline int
+regno_basic_block (int regno)
+{
+  int block = REG_BASIC_BLOCK (regno);
+  if (block < 0)
+    block = 0;
+  return block;
+}
+
+/* Macro to visit all of IN_ALLOCNO's neighbors.  Neighbors are
+   returned in OUT_ALLOCNO for each iteration of the loop.  */
+
+#define FOR_EACH_CONFLICT(IN_ALLOCNO, OUT_ALLOCNO, ITER)		\
+  if (!adjacency || !adjacency_iter_init (&(ITER), (IN_ALLOCNO)))	\
+    ;									\
+  else									\
+    for ((OUT_ALLOCNO) = adjacency_iter_next (&(ITER));			\
+	 !adjacency_iter_done (&(ITER));				\
+	 (OUT_ALLOCNO) = adjacency_iter_next (&(ITER)))
+
+/* Macros to determine the bit number within the triangular bit matrix for
+   the two allocnos LOW and HIGH, with LOW strictly less than HIGH.  */
+
+#define CONFLICT_BITNUM(I, J) \
+  (((I) < (J)) ? (partial_bitnum[I] + (J)) : (partial_bitnum[J] + (I)))
+
+#define CONFLICT_BITNUM_FAST(I, I_PARTIAL_BITNUM, J) \
+  (((I) < (J)) ? ((I_PARTIAL_BITNUM) + (J)) : (partial_bitnum[J] + (I)))
+
+/* Return true if ALLOCNO1 and ALLOCNO2 conflict, meaning they cannot
+   share the same hardware register.  */
+
+static bool
+conflict_p (int allocno1, int allocno2)
+{
+  int bitnum;
+  INT_TYPE word, mask;
+
+#ifdef ENABLE_GLOBAL_CHECKING
+  int blk1, blk2;
+
+  gcc_assert (allocno1 >= 0 && allocno1 < max_allocno);
+  gcc_assert (allocno2 >= 0 && allocno2 < max_allocno);
+
+  blk1 = regno_basic_block (allocno[allocno1].reg);
+  blk2 = regno_basic_block (allocno[allocno2].reg);
+  gcc_assert (blk1 == 0 || blk2 == 0 || blk1 == blk2);
+#endif
+
+  if (allocno1 == allocno2)
+    /* By definition, an allocno does not conflict with itself.  */
+    return 0;
+
+  bitnum = CONFLICT_BITNUM (allocno1, allocno2);
+
+#ifdef ENABLE_GLOBAL_CHECKING
+  gcc_assert (bitnum >= 0 && bitnum < max_bitnum);
+#endif
 
-/* Two macros to test or store 1 in an element of `conflicts'.  */
+  word = conflicts[bitnum / INT_BITS];
+  mask = (INT_TYPE) 1 << (bitnum % INT_BITS);
+  return (word & mask) != 0;
+}
+
+/* Add conflict edges between ALLOCNO1 and all allocnos currently live.  */
+
+static void
+set_conflicts_p (int allocno1, sparseset live)
+{
+  int i;
+  int bitnum, index;
+  INT_TYPE word, mask;
+  int partial_bitnum_allocno1;
+
+#ifdef ENABLE_GLOBAL_CHECKING
+  gcc_assert (allocno1 >= 0 && allocno1 < max_allocno);
+#endif
 
-#define CONFLICTP(I, J) \
- (conflicts[(I) * allocno_row_words + (unsigned) (J) / INT_BITS]	\
-  & ((INT_TYPE) 1 << ((unsigned) (J) % INT_BITS)))
-
-/* For any allocno set in ALLOCNO_SET, set ALLOCNO to that allocno,
-   and execute CODE.  */
-#define EXECUTE_IF_SET_IN_ALLOCNO_SET(ALLOCNO_SET, ALLOCNO, CODE)	\
-do {									\
-  int i_;								\
-  int allocno_;								\
-  INT_TYPE *p_ = (ALLOCNO_SET);						\
-									\
-  for (i_ = allocno_row_words - 1, allocno_ = 0; i_ >= 0;		\
-       i_--, allocno_ += INT_BITS)					\
-    {									\
-      unsigned INT_TYPE word_ = (unsigned INT_TYPE) *p_++;		\
-									\
-      for ((ALLOCNO) = allocno_; word_; word_ >>= 1, (ALLOCNO)++)	\
-	{								\
-	  if (word_ & 1)						\
-	    {CODE;}							\
-	}								\
-    }									\
-} while (0)
+  partial_bitnum_allocno1 = partial_bitnum[allocno1];
+
+  EXECUTE_IF_SET_IN_SPARSESET (live, i)
+    {
+      /* By definition, an allocno does not conflict with itself.  */
+      if (allocno1 == i)
+	continue;
+
+#ifdef ENABLE_GLOBAL_CHECKING
+      gcc_assert (i >= 0 && i < max_allocno);
+#endif
+
+      bitnum = CONFLICT_BITNUM_FAST (allocno1, partial_bitnum_allocno1, i);
+
+#ifdef ENABLE_GLOBAL_CHECKING
+      gcc_assert (bitnum >= 0 && bitnum < max_bitnum);
+#endif
+
+      index = bitnum / INT_BITS;
+      word = conflicts[index];
+      mask = (INT_TYPE) 1 << (bitnum % INT_BITS);
+
+      if ((word & mask) == 0)
+	{
+	  conflicts[index] = word | mask;
+	  add_neighbor (allocno1, i);
+	  add_neighbor (i, allocno1);
+	}
+    }
+}
 
 /* Set of hard regs currently live (during scan of all insns).  */
 
@@ -230,20 +415,9 @@ static int local_reg_live_length[FIRST_P
 
 #define SET_REGBIT(TABLE, I, J)  SET_HARD_REG_BIT (allocno[I].TABLE, J)
 
-/* Bit mask for allocnos live at current point in the scan.  */
-
-static INT_TYPE *allocnos_live;
-
-/* Test, set or clear bit number I in allocnos_live,
-   a bit vector indexed by allocno.  */
+/* Set of allocnos live at current point in the scan.  */
 
-#define SET_ALLOCNO_LIVE(I)				\
-  (allocnos_live[(unsigned) (I) / INT_BITS]		\
-     |= ((INT_TYPE) 1 << ((unsigned) (I) % INT_BITS)))
-
-#define CLEAR_ALLOCNO_LIVE(I)				\
-  (allocnos_live[(unsigned) (I) / INT_BITS]		\
-     &= ~((INT_TYPE) 1 << ((unsigned) (I) % INT_BITS)))
+static sparseset allocnos_live;
 
 /* This is turned off because it doesn't work right for DImode.
    (And it is only used for DImode, so the other cases are worthless.)
@@ -336,8 +510,8 @@ compute_regs_asm_clobbered (char *regs_a
 static HARD_REG_SET eliminable_regset;
 
 static int allocno_compare (const void *, const void *);
+static int regno_compare (const void *, const void *);
 static void global_conflicts (void);
-static void mirror_conflicts (void);
 static void expand_preferences (void);
 static void prune_preferences (void);
 static void find_reg (int, HARD_REG_SET, int, int, int);
@@ -454,6 +628,8 @@ global_alloc (void)
 {
   int retval;
   size_t i;
+  int max_blk;
+  int *num_allocnos_per_blk;
 
   compute_regsets (&eliminable_regset, &no_global_alloc_regs);
 
@@ -496,9 +672,8 @@ global_alloc (void)
 
   reg_allocno = XNEWVEC (int, max_regno);
 
-  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-    reg_allocno[i] = -1;
-
+  /* Initially fill the reg_allocno array with regnos...  */
+  max_blk = 0;
   max_allocno = 0;
   for (i = FIRST_PSEUDO_REGISTER; i < (size_t) max_regno; i++)
     /* Note that reg_live_length[i] < 0 indicates a "constant" reg
@@ -510,28 +685,82 @@ global_alloc (void)
 	&& (! current_function_has_nonlocal_label
 	    || REG_N_CALLS_CROSSED (i) == 0))
       {
-	reg_allocno[i] = max_allocno++;
+	int blk = regno_basic_block (i);
+	reg_allocno[max_allocno++] = i;
 	gcc_assert (REG_LIVE_LENGTH (i));
+	if (blk > max_blk)
+	  max_blk = blk;
       }
-    else
-      reg_allocno[i] = -1;
 
   allocno = XCNEWVEC (struct allocno, max_allocno);
+  partial_bitnum = XNEWVEC (int, max_allocno);
+  num_allocnos_per_blk = XCNEWVEC (int, max_blk + 1);
 
-  for (i = FIRST_PSEUDO_REGISTER; i < (size_t) max_regno; i++)
-    if (reg_allocno[i] >= 0)
-      {
-	int num = reg_allocno[i];
-	allocno[num].reg = i;
-	allocno[num].size = PSEUDO_REGNO_SIZE (i);
-	allocno[num].calls_crossed += REG_N_CALLS_CROSSED (i);
-	allocno[num].throwing_calls_crossed
-	  += REG_N_THROWING_CALLS_CROSSED (i);
-	allocno[num].n_refs += REG_N_REFS (i);
-	allocno[num].freq += REG_FREQ (i);
-	if (allocno[num].live_length < REG_LIVE_LENGTH (i))
-	  allocno[num].live_length = REG_LIVE_LENGTH (i);
-      }
+  /* ...so we can sort them in the order we want them to receive
+     their allocnos.  */
+  qsort (reg_allocno, max_allocno, sizeof (int), regno_compare);
+
+  for (i = 0; i < (size_t) max_allocno; i++)
+    {
+      int regno = reg_allocno[i];
+      int blk = regno_basic_block (regno);
+#ifdef ENABLE_GLOBAL_CHECKING
+      gcc_assert (blk >= 0 && blk <= max_blk);
+#endif
+      num_allocnos_per_blk[blk]++;
+      allocno[i].reg = regno;
+      allocno[i].size = PSEUDO_REGNO_SIZE (regno);
+      allocno[i].calls_crossed += REG_N_CALLS_CROSSED (regno);
+      allocno[i].throwing_calls_crossed
+	+= REG_N_THROWING_CALLS_CROSSED (regno);
+      allocno[i].n_refs += REG_N_REFS (regno);
+      allocno[i].freq += REG_FREQ (regno);
+      if (allocno[i].live_length < REG_LIVE_LENGTH (regno))
+	allocno[i].live_length = REG_LIVE_LENGTH (regno);
+    }
+
+  /* The "global" block must contain all allocnos.  */
+  num_allocnos_per_blk[0] = max_allocno;
+
+  /* Now reinitialize the reg_allocno array in terms of the
+     optimized regno to allocno mapping we created above.  */
+  for (i = 0; i < (size_t) max_regno; i++)
+    reg_allocno[i] = -1;
+
+  max_bitnum = 0;
+  for (i = 0; i < (size_t) max_allocno; i++)
+    {
+      int regno = allocno[i].reg;
+      int blk = regno_basic_block (regno);
+      int row_size = --num_allocnos_per_blk[blk];
+      reg_allocno[regno] = (int) i;
+      partial_bitnum[i] = (row_size > 0) ? max_bitnum - ((int) i + 1) : -1;
+      max_bitnum += row_size;
+    }
+
+#ifdef ENABLE_GLOBAL_CHECKING
+  gcc_assert (max_bitnum <= ((max_allocno * (max_allocno - 1)) / 2));
+#endif
+
+  if (dump_file)
+    {
+      int dump_bits = max_bitnum;
+      int dump_bytes = (dump_bits + 7) / 8;
+      int dump_bytes_hold = dump_bytes;
+      fprintf (dump_file, "## max_blk:     %d\n", max_blk);
+      fprintf (dump_file, "## max_regno:   %d\n", max_regno);
+      fprintf (dump_file, "## max_allocno: %d\n", max_allocno);
+      fprintf (dump_file, "## Compressed triangular bitmatrix size: %d bits, %d bytes\n", 
+	       dump_bits, dump_bytes);
+      dump_bits = (max_allocno * (max_allocno - 1)) / 2;
+      dump_bytes = (dump_bits + 7) / 8;
+      fprintf (dump_file, "## Standard triangular bitmatrix size:   %d bits, %d bytes [%.2f%%]\n", 
+	       dump_bits, dump_bytes, 100.0 * ((double)dump_bytes_hold / (double)dump_bytes));
+      dump_bits = max_allocno * max_allocno;
+      dump_bytes = (dump_bits + 7) / 8;
+      fprintf (dump_file, "## Square bitmatrix size:                %d bits, %d bytes [%.2f%%]\n", 
+	       dump_bits, dump_bytes, 100.0 * ((double)dump_bytes_hold / (double)dump_bytes));
+    }
 
   /* Calculate amount of usage of each hard reg by pseudos
      allocated by local-alloc.  This is to see if we want to
@@ -572,20 +801,30 @@ global_alloc (void)
 	  fprintf (dump_file, " %d", (int)i);
       fprintf (dump_file, "\n");
     }
-  allocno_row_words = (max_allocno + INT_BITS - 1) / INT_BITS;
 
-  /* We used to use alloca here, but the size of what it would try to
-     allocate would occasionally cause it to exceed the stack limit and
-     cause unpredictable core dumps.  Some examples were > 2Mb in size.  */
-  conflicts = XCNEWVEC (INT_TYPE, max_allocno * allocno_row_words);
-
-  allocnos_live = XNEWVEC (INT_TYPE, allocno_row_words);
+  conflicts = NULL;
+  adjacency = NULL;
+  adjacency_pool = NULL;
+  allocnos_live = NULL;
 
   /* If there is work to be done (at least one reg to allocate),
      perform global conflict analysis and allocate the regs.  */
 
   if (max_allocno > 0)
     {
+
+      /* We used to use alloca here, but the size of what it would try to
+	 allocate would occasionally cause it to exceed the stack limit and
+	 cause unpredictable core dumps.  Some examples were > 2Mb in size.  */
+      conflicts = XCNEWVEC (INT_TYPE, (max_bitnum + INT_BITS - 1) / INT_BITS);
+
+      adjacency = XCNEWVEC (adjacency_t *, max_allocno);
+      adjacency_pool = create_alloc_pool ("global_alloc adjacency list pool",
+					  sizeof (adjacency_t),
+					  1024);
+
+      allocnos_live = sparseset_alloc (max_allocno);
+
       /* Make a vector that mark_reg_{store,clobber} will store in.  */
       if (!regs_set)
 	regs_set = VEC_alloc (rtx, heap, 10);
@@ -595,8 +834,6 @@ global_alloc (void)
 
       global_conflicts ();
 
-      mirror_conflicts ();
-
       /* Eliminate conflicts between pseudos and eliminable registers.  If
 	 the register is not eliminated, the pseudo won't really be able to
 	 live in the eliminable register, so the conflict doesn't matter.
@@ -669,6 +906,8 @@ global_alloc (void)
 	  }
 
       free (allocno_order);
+      free (conflicts);
+      free (allocnos_live);
     }
 
   /* Do the reloads now while the allocno data still exists, so that we can
@@ -685,13 +924,44 @@ global_alloc (void)
 
   /* Clean up.  */
   free (reg_allocno);
+  free (num_allocnos_per_blk);
+  free (partial_bitnum);
   free (allocno);
-  free (conflicts);
-  free (allocnos_live);
+  if (adjacency != NULL)
+    {
+      free_alloc_pool (adjacency_pool);
+      free (adjacency);
+    }
 
   return retval;
 }
 
+/* Sort predicate for ordering the regnos.  We want the regno to allocno
+   mapping to have the property that all "global" regnos (i.e., regnos that
+   are referenced in more than one basic block) have smaller allocno values
+   than "local" regnos (i.e., regnos referenced in only one basic block).
+   In addition, for two basic blocks "i" and "j" with i < j, all regnos
+   local to basic block i should have smaller allocno values than regnos
+   local to basic block j.
+   Returns -1 (1) if *v1p should be allocated before (after) *v2p.  */
+
+static int
+regno_compare (const void *v1p, const void *v2p)
+{
+  int regno1 = *(const int *)v1p;
+  int regno2 = *(const int *)v2p;
+  int blk1 = REG_BASIC_BLOCK (regno1);
+  int blk2 = REG_BASIC_BLOCK (regno2);
+
+  /* Prefer lower numbered basic blocks.  Note that global and unknown
+     blocks have negative values, giving them high precedence.  */
+  if (blk1 - blk2)
+    return blk1 - blk2;
+
+  /* If both regs are referenced from the same block, sort by regno.  */
+  return regno1 - regno2;
+}
+
 /* Sort predicate for ordering the allocnos.
    Returns -1 (1) if *v1 should be allocated before (after) *v2.  */
 
@@ -735,7 +1005,7 @@ global_conflicts (void)
 
   FOR_EACH_BB (b)
     {
-      memset (allocnos_live, 0, allocno_row_words * sizeof (INT_TYPE));
+      sparseset_clear (allocnos_live);
 
       /* Initialize table of registers currently live
 	 to the state at the beginning of this basic block.
@@ -761,7 +1031,7 @@ global_conflicts (void)
 	    int a = reg_allocno[i];
 	    if (a >= 0)
 	      {
-		SET_ALLOCNO_LIVE (a);
+		sparseset_set_bit (allocnos_live, a);
 		block_start_allocnos[ax++] = a;
 	      }
 	    else if ((a = reg_renumber[i]) >= 0)
@@ -825,10 +1095,10 @@ global_conflicts (void)
 	  if (e != NULL)
 	    {
 #ifdef STACK_REGS
-	      EXECUTE_IF_SET_IN_ALLOCNO_SET (allocnos_live, ax,
-					     {
-					       allocno[ax].no_stack_reg = 1;
-					     });
+	      EXECUTE_IF_SET_IN_SPARSESET (allocnos_live, ax)
+		{
+		  allocno[ax].no_stack_reg = 1;
+		}
 	      for (ax = FIRST_STACK_REG; ax <= LAST_STACK_REG; ax++)
 		record_one_conflict (ax);
 #endif
@@ -975,8 +1245,8 @@ expand_preferences (void)
 	if (REG_NOTE_KIND (link) == REG_DEAD
 	    && REG_P (XEXP (link, 0))
 	    && reg_allocno[REGNO (XEXP (link, 0))] >= 0
-	    && ! CONFLICTP (reg_allocno[REGNO (SET_DEST (set))],
-			    reg_allocno[REGNO (XEXP (link, 0))]))
+	    && ! conflict_p (reg_allocno[REGNO (SET_DEST (set))],
+			     reg_allocno[REGNO (XEXP (link, 0))]))
 	  {
 	    int a1 = reg_allocno[REGNO (SET_DEST (set))];
 	    int a2 = reg_allocno[REGNO (XEXP (link, 0))];
@@ -1049,15 +1319,15 @@ prune_preferences (void)
 	 we want to give the lower-priority allocno the first chance for
 	 these registers).  */
       HARD_REG_SET temp, temp2;
-      int allocno2;
+      unsigned allocno2;
+      adjacency_iter ai;
 
       num = allocno_order[i];
 
       CLEAR_HARD_REG_SET (temp);
       CLEAR_HARD_REG_SET (temp2);
 
-      EXECUTE_IF_SET_IN_ALLOCNO_SET (conflicts + num * allocno_row_words,
-				     allocno2,
+      FOR_EACH_CONFLICT (num, allocno2, ai)
 	{
 	  if (allocno_to_order[allocno2] > i)
 	    {
@@ -1068,7 +1338,7 @@ prune_preferences (void)
 		IOR_HARD_REG_SET (temp2,
 				  allocno[allocno2].hard_reg_full_preferences);
 	    }
-	});
+	}
 
       AND_COMPL_HARD_REG_SET (temp, allocno[num].hard_reg_full_preferences);
       IOR_HARD_REG_SET (temp, temp2);
@@ -1387,8 +1657,9 @@ find_reg (int num, HARD_REG_SET losers, 
 
   if (best_reg >= 0)
     {
-      int lim, j;
+      unsigned lim, j;
       HARD_REG_SET this_reg;
+      adjacency_iter ai;
 
       /* Yes.  Record it as the hard register of this pseudo-reg.  */
       reg_renumber[allocno[num].reg] = best_reg;
@@ -1406,11 +1677,11 @@ find_reg (int num, HARD_REG_SET losers, 
 	}
       /* For each other pseudo-reg conflicting with this one,
 	 mark it as conflicting with the hard regs this one occupies.  */
-      lim = num;
-      EXECUTE_IF_SET_IN_ALLOCNO_SET (conflicts + lim * allocno_row_words, j,
+
+      FOR_EACH_CONFLICT (num, j, ai)
 	{
 	  IOR_HARD_REG_SET (allocno[j].hard_reg_conflicts, this_reg);
-	});
+	}
     }
 }
 
@@ -1460,21 +1731,18 @@ record_one_conflict (int regno)
   if (regno < FIRST_PSEUDO_REGISTER)
     /* When a hard register becomes live,
        record conflicts with live pseudo regs.  */
-    EXECUTE_IF_SET_IN_ALLOCNO_SET (allocnos_live, j,
+    EXECUTE_IF_SET_IN_SPARSESET (allocnos_live, j)
       {
 	SET_HARD_REG_BIT (allocno[j].hard_reg_conflicts, regno);
-      });
+      }
   else
     /* When a pseudo-register becomes live,
        record conflicts first with hard regs,
        then with other pseudo regs.  */
     {
       int ialloc = reg_allocno[regno];
-      int ialloc_prod = ialloc * allocno_row_words;
-
       IOR_HARD_REG_SET (allocno[ialloc].hard_reg_conflicts, hard_regs_live);
-      for (j = allocno_row_words - 1; j >= 0; j--)
-	conflicts[ialloc_prod + j] |= allocnos_live[j];
+      set_conflicts_p (ialloc, allocnos_live);
     }
 }
 
@@ -1491,38 +1759,6 @@ record_conflicts (int *allocno_vec, int 
     IOR_HARD_REG_SET (allocno[allocno_vec[len]].hard_reg_conflicts,
                       hard_regs_live);
 }
-
-/* If CONFLICTP (i, j) is true, make sure CONFLICTP (j, i) is also true.  */
-static void
-mirror_conflicts (void)
-{
-  int i, j;
-  int rw = allocno_row_words;
-  int rwb = rw * INT_BITS;
-  INT_TYPE *p = conflicts;
-  INT_TYPE *q0 = conflicts, *q1, *q2;
-  unsigned INT_TYPE mask;
-
-  for (i = max_allocno - 1, mask = 1; i >= 0; i--, mask <<= 1)
-    {
-      if (! mask)
-	{
-	  mask = 1;
-	  q0++;
-	}
-      for (j = allocno_row_words - 1, q1 = q0; j >= 0; j--, q1 += rwb)
-	{
-	  unsigned INT_TYPE word;
-
-	  for (word = (unsigned INT_TYPE) *p++, q2 = q1; word;
-	       word >>= 1, q2 += rw)
-	    {
-	      if (word & 1)
-		*q2 |= mask;
-	    }
-	}
-    }
-}
 
 /* Handle the case where REG is set by the insn being scanned,
    during the forward scan to accumulate conflicts.
@@ -1565,7 +1801,7 @@ mark_reg_store (rtx reg, const_rtx sette
     {
       if (reg_allocno[regno] >= 0)
 	{
-	  SET_ALLOCNO_LIVE (reg_allocno[regno]);
+	  sparseset_set_bit (allocnos_live, reg_allocno[regno]);
 	  record_one_conflict (regno);
 	}
     }
@@ -1647,7 +1883,7 @@ mark_reg_death (rtx reg)
   if (regno >= FIRST_PSEUDO_REGISTER)
     {
       if (reg_allocno[regno] >= 0)
-	CLEAR_ALLOCNO_LIVE (reg_allocno[regno]);
+	sparseset_clear_bit (allocnos_live, reg_allocno[regno]);
     }
 
   /* For pseudo reg, see if it has been assigned a hardware reg.  */
@@ -1968,6 +2204,7 @@ static void
 dump_conflicts (FILE *file)
 {
   int i;
+  int regno;
   int has_preferences;
   int nregs;
   nregs = 0;
@@ -1978,46 +2215,50 @@ dump_conflicts (FILE *file)
       nregs++;
     }
   fprintf (file, ";; %d regs to allocate:", nregs);
-  for (i = 0; i < max_allocno; i++)
-    {
-      int j;
-      if (reg_renumber[allocno[allocno_order[i]].reg] >= 0)
-	continue;
-      fprintf (file, " %d", allocno[allocno_order[i]].reg);
-      for (j = 0; j < max_regno; j++)
-	if (reg_allocno[j] == allocno_order[i]
-	    && j != allocno[allocno_order[i]].reg)
-	  fprintf (file, "+%d", j);
-      if (allocno[allocno_order[i]].size != 1)
-	fprintf (file, " (%d)", allocno[allocno_order[i]].size);
-    }
+  for (regno = 0; regno < max_regno; regno++)
+    if ((i = reg_allocno[regno]) >= 0)
+      {
+	int j;
+	if (reg_renumber[allocno[allocno_order[i]].reg] >= 0)
+	  continue;
+	fprintf (file, " %d", allocno[allocno_order[i]].reg);
+	for (j = 0; j < max_regno; j++)
+	  if (reg_allocno[j] == allocno_order[i]
+	      && j != allocno[allocno_order[i]].reg)
+	    fprintf (file, "+%d", j);
+	if (allocno[allocno_order[i]].size != 1)
+	  fprintf (file, " (%d)", allocno[allocno_order[i]].size);
+      }
   fprintf (file, "\n");
 
-  for (i = 0; i < max_allocno; i++)
-    {
-      int j;
-      fprintf (file, ";; %d conflicts:", allocno[i].reg);
-      for (j = 0; j < max_allocno; j++)
-	if (CONFLICTP (j, i))
-	  fprintf (file, " %d", allocno[j].reg);
-      for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
-	if (TEST_HARD_REG_BIT (allocno[i].hard_reg_conflicts, j))
-	  fprintf (file, " %d", j);
-      fprintf (file, "\n");
-
-      has_preferences = 0;
-      for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
-	if (TEST_HARD_REG_BIT (allocno[i].hard_reg_preferences, j))
-	  has_preferences = 1;
-
-      if (! has_preferences)
-	continue;
-      fprintf (file, ";; %d preferences:", allocno[i].reg);
-      for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
-	if (TEST_HARD_REG_BIT (allocno[i].hard_reg_preferences, j))
-	  fprintf (file, " %d", j);
-      fprintf (file, "\n");
-    }
+  for (regno = 0; regno < max_regno; regno++)
+    if ((i = reg_allocno[regno]) >= 0)
+      {
+	unsigned j;
+	adjacency_iter ai;
+	fprintf (file, ";; %d conflicts:", allocno[i].reg);
+	FOR_EACH_CONFLICT (i, j, ai)
+	  {
+	    fprintf (file, " %d", allocno[j].reg);
+	  }
+	for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
+	  if (TEST_HARD_REG_BIT (allocno[i].hard_reg_conflicts, j))
+	    fprintf (file, " %d", j);
+	fprintf (file, "\n");
+
+	has_preferences = 0;
+	for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
+	  if (TEST_HARD_REG_BIT (allocno[i].hard_reg_preferences, j))
+	    has_preferences = 1;
+
+	if (!has_preferences)
+	  continue;
+	fprintf (file, ";; %d preferences:", allocno[i].reg);
+	for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
+	  if (TEST_HARD_REG_BIT (allocno[i].hard_reg_preferences, j))
+	    fprintf (file, " %d", j);
+	fprintf (file, "\n");
+      }
   fprintf (file, "\n");
 }
 
Index: configure.ac
===================================================================
--- configure.ac	(revision 128200)
+++ configure.ac	(working copy)
@@ -348,7 +348,7 @@ AC_ARG_ENABLE(checking,
 			  enable expensive run-time checks.  With LIST,
 			  enable only specific categories of checks.
 			  Categories are: yes,no,all,none,release.
-			  Flags are: assert,df,fold,gc,gcac,misc,
+			  Flags are: assert,df,fold,gc,gcac,global,misc,
 			  rtlflag,rtl,runtime,tree,valgrind,types.],
 [ac_checking_flags="${enableval}"],[
 # Determine the default checks.
@@ -379,7 +379,7 @@ do
 			ac_gc_always_collect=1 ; ac_rtl_checking=1 ;
 			ac_rtlflag_checking=1 ; ac_runtime_checking=1 ;
 			ac_tree_checking=1 ; ac_valgrind_checking= ;
-			ac_types_checking=1 ;;
+			ac_types_checking=1 ; ac_global_checking=1 ;;
 	release)	ac_assert_checking=1 ; ac_checking= ; ac_df_checking= ;
 			ac_fold_checking= ; ac_gc_checking= ;
 			ac_gc_always_collect= ; ac_rtl_checking= ;
@@ -392,6 +392,7 @@ do
 	fold)		ac_fold_checking=1 ;;
 	gc)		ac_gc_checking=1 ;;
 	gcac)		ac_gc_always_collect=1 ;;
+	global)		ac_global_checking=1 ;;
 	misc)		ac_checking=1 ;;
 	rtl)		ac_rtl_checking=1 ;;
 	rtlflag)	ac_rtlflag_checking=1 ;;
@@ -469,6 +470,11 @@ if test x$ac_fold_checking != x ; then
 [Define if you want fold checked that it never destructs its argument.
    This is quite expensive.])
 fi
+if test x$ac_global_checking != x ; then
+  AC_DEFINE(ENABLE_GLOBAL_CHECKING, 1,
+[Define if you want global_alloc to verify valid accesses to several
+   data structures.  This is quite expensive.])
+fi
 valgrind_path_defines=
 valgrind_command=
 
Index: sparseset.c
===================================================================
--- sparseset.c	(revision 0)
+++ sparseset.c	(revision 0)
@@ -0,0 +1,233 @@
+/* SparseSet implementation.
+   Copyright (C) 2007 Free Software Foundation, Inc.
+   Contributed by Peter Bergner <bergner@vnet.ibm.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 2, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING.  If not, write to the Free
+Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301, USA.  */
+
+#include "libiberty.h"
+#include "sparseset.h"
+
+
+/* Allocate and clear a SparseSet of N_ELMS elements.  */
+
+sparseset
+sparseset_alloc (SPARSESET_ELT_TYPE n_elms)
+{
+  unsigned int n_bytes = sizeof (struct sparseset_def)
+			 + ((n_elms - 1) * 2 * sizeof (SPARSESET_ELT_TYPE));
+
+  sparseset set = (sparseset) xmalloc (n_bytes);
+  set->dense = &(set->elms[0]);
+  set->sparse = &(set->elms[n_elms]);
+  set->size = n_elms;
+  sparseset_clear (set);
+  return set;
+}
+
+/* Low level routine not meant for use outside of sparseset.[ch].
+   Assumes idx1 < s->members and idx2 < s->members.  */
+
+static inline void
+sparseset_swap (sparseset s, SPARSESET_ELT_TYPE idx1, SPARSESET_ELT_TYPE idx2)
+{
+  SPARSESET_ELT_TYPE tmp = s->dense[idx2];
+  sparseset_insert_bit (s, s->dense[idx1], idx2);
+  sparseset_insert_bit (s, tmp, idx1);
+}
+
+/* Operation: S = S - {e}
+   Delete e from the set S if it is a member of S.  */
+
+void
+sparseset_clear_bit (sparseset s, SPARSESET_ELT_TYPE e)
+{
+  if (sparseset_bit_p (s, e))
+    {
+      SPARSESET_ELT_TYPE idx = s->sparse[e];
+      SPARSESET_ELT_TYPE iter = s->iter;
+      SPARSESET_ELT_TYPE mem = s->members - 1;
+
+      /* If we are iterating over this set and we want to delete a
+	 member we've already visited, then we swap the element we
+	 want to delete with the element at the current iteration
+	 index so that it plays well together with the code below
+	 that actually removes the element.  */
+      if (s->iterating && idx <= iter)
+	{
+	  if (idx < iter)
+	    {
+	      sparseset_swap (s, idx, iter);
+	      idx = iter;
+	    }
+	  s->iter_inc = 0;
+	}
+
+      /* Replace the element we want to delete with the last element
+	 in the dense array and then decrement s->members, effectively
+	 removing the element we want to delete.  */
+      sparseset_insert_bit (s, s->dense[mem], idx);
+      s->members = mem;
+    }
+}
+
+/* Operation: D = S
+   Restrictions: none.  */
+
+void
+sparseset_copy (sparseset d, sparseset s)
+{
+  SPARSESET_ELT_TYPE i;
+
+  if (d == s)
+    return;
+
+  sparseset_clear (d);
+  for (i = 0; i < s->members; i++)
+    sparseset_insert_bit (d, s->dense[i], i);
+  d->members = s->members;
+}
+
+/* Operation: D = A & B.
+   Restrictions: none.  */
+
+void
+sparseset_and (sparseset d, sparseset a, sparseset b)
+{
+  SPARSESET_ELT_TYPE e;
+
+  if (a == b)
+    {
+      if (d != a)
+	sparseset_copy (d, a);
+      return;
+    }
+
+  if (d == a || d == b)
+    {
+      sparseset s = (d == a) ? b : a;
+
+      EXECUTE_IF_SET_IN_SPARSESET (d, e)
+	if (!sparseset_bit_p (s, e))
+	  sparseset_clear_bit (d, e);
+    }
+  else
+    {
+      sparseset sml, lrg;
+
+      if (sparseset_cardinality (a) < sparseset_cardinality (b))
+	{
+	  sml = a;
+	  lrg = b;
+	}
+      else
+	{
+	  sml = b;
+	  lrg = a;
+	}
+
+      sparseset_clear (d);
+      EXECUTE_IF_SET_IN_SPARSESET (sml, e)
+	if (sparseset_bit_p (lrg, e))
+	  sparseset_set_bit (d, e);
+    }
+}
+
+/* Operation: D = A & ~B.
+   Restrictions: D != B, unless D == A == B.  */
+
+void
+sparseset_and_compl (sparseset d, sparseset a, sparseset b)
+{
+  SPARSESET_ELT_TYPE e;
+
+  if (a == b)
+    {
+      sparseset_clear (d);
+      return;
+    }
+
+  gcc_assert (d != b);
+
+  if (d == a)
+    {
+      if (sparseset_cardinality (d) < sparseset_cardinality (b))
+	{
+	  EXECUTE_IF_SET_IN_SPARSESET (d, e)
+	    if (sparseset_bit_p (b, e))
+	      sparseset_clear_bit (d, e);
+	}
+      else
+	{
+	  EXECUTE_IF_SET_IN_SPARSESET (b, e)
+	    sparseset_clear_bit (d, e);
+	}
+    }
+  else
+    {
+      sparseset_clear (d);
+      EXECUTE_IF_SET_IN_SPARSESET (a, e)
+	if (!sparseset_bit_p (b, e))
+	  sparseset_set_bit (d, e);
+    }
+}
+
+/* Operation: D = A | B.
+   Restrictions: none.  */
+
+void
+sparseset_ior (sparseset d, sparseset a, sparseset b)
+{
+  SPARSESET_ELT_TYPE e;
+
+  if (a == b)
+    sparseset_copy (d, a);
+  else if (d == b)
+    {
+      EXECUTE_IF_SET_IN_SPARSESET (a, e)
+	sparseset_set_bit (d, e);
+    }
+  else
+    {
+      if (d != a)
+        sparseset_copy (d, a);
+      EXECUTE_IF_SET_IN_SPARSESET (b, e)
+	sparseset_set_bit (d, e);
+    }
+}
+
+/* Operation: A == B
+   Restrictions: none.  */
+
+bool
+sparseset_equal_p (sparseset a, sparseset b)
+{
+  SPARSESET_ELT_TYPE e;
+
+  if (a == b)
+    return true;
+
+  if (sparseset_cardinality (a) != sparseset_cardinality (b))
+    return false;
+
+  EXECUTE_IF_SET_IN_SPARSESET (a, e)
+    if (!sparseset_bit_p (b, e))
+      return false;
+
+  return true;
+}
+
Index: sparseset.h
===================================================================
--- sparseset.h	(revision 0)
+++ sparseset.h	(revision 0)
@@ -0,0 +1,163 @@
+/* SparseSet implementation.
+   Copyright (C) 2007 Free Software Foundation, Inc.
+   Contributed by Peter Bergner <bergner@vnet.ibm.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 2, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING.  If not, write to the Free
+Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301, USA.  */
+
+#ifndef GCC_SPARSESET_H
+#define GCC_SPARSESET_H
+
+#include "system.h"
+#include <assert.h>
+
+#define SPARSESET_ELT_BITS ((unsigned) HOST_BITS_PER_WIDEST_FAST_INT)
+#define SPARSESET_ELT_TYPE unsigned int
+
+/* Data Structure used for the SparseSet representation.  */
+
+typedef struct sparseset_def
+{
+  SPARSESET_ELT_TYPE *dense;	/* Dense array.  */
+  SPARSESET_ELT_TYPE *sparse; 	/* Sparse array.  */
+  SPARSESET_ELT_TYPE members;	/* Number of elements.  */
+  SPARSESET_ELT_TYPE size;	/* Maximum number of elements.  */
+  SPARSESET_ELT_TYPE iter;	/* Iterator index.  */
+  unsigned char iter_inc;	/* Iteration increment amount.  */
+  bool iterating;
+  SPARSESET_ELT_TYPE elms[2];   /* Combined dense and sparse arrays.  */
+} *sparseset;
+
+#define sparseset_free(MAP)  free(MAP)
+extern sparseset sparseset_alloc (SPARSESET_ELT_TYPE n_elms);
+extern void sparseset_clear_bit (sparseset, SPARSESET_ELT_TYPE);
+extern void sparseset_copy (sparseset, sparseset);
+extern void sparseset_and (sparseset, sparseset, sparseset);
+extern void sparseset_and_compl (sparseset, sparseset, sparseset);
+extern void sparseset_ior (sparseset, sparseset, sparseset);
+extern bool sparseset_equal_p (sparseset, sparseset);
+
+/* Operation: S = {}
+   Clear the set of all elements.  */
+
+static inline void
+sparseset_clear (sparseset s)
+{
+  s->members = 0;
+  s->iterating = false;
+}
+
+/* Return the number of elements currently in the set.  */
+
+static inline SPARSESET_ELT_TYPE
+sparseset_cardinality (sparseset s)
+{
+  return s->members;
+}
+
+/* Return the maximum number of elements this set can hold.  */
+
+static inline SPARSESET_ELT_TYPE
+sparseset_size (sparseset s)
+{
+  return s->size;
+}
+
+/* Return true if e is a member of the set S, otherwise return false.  */
+
+static inline bool
+sparseset_bit_p (sparseset s, SPARSESET_ELT_TYPE e)
+{
+  SPARSESET_ELT_TYPE idx;
+
+  gcc_assert (e < s->size);
+
+  idx = s->sparse[e];
+
+  return idx < s->members && s->dense[idx] == e;
+}
+
+/* Low level insertion routine not meant for use outside of sparseset.[ch].
+   Assumes E is valid and not already a member of the set S.  */
+
+static inline void
+sparseset_insert_bit (sparseset s, SPARSESET_ELT_TYPE e, SPARSESET_ELT_TYPE idx)
+{
+  s->sparse[e] = idx;
+  s->dense[idx] = e;
+}
+
+/* Operation: S = S + {e}
+   Insert E into the set S, if it isn't already a member.  */
+
+static inline void
+sparseset_set_bit (sparseset s, SPARSESET_ELT_TYPE e)
+{
+  if (!sparseset_bit_p (s, e))
+    sparseset_insert_bit (s, e, s->members++);
+}
+
+/* Return and remove an arbitrary element from the set S.  */
+
+static inline SPARSESET_ELT_TYPE
+sparseset_pop (sparseset s)
+{
+  SPARSESET_ELT_TYPE mem = s->members;
+
+  gcc_assert (mem != 0);
+
+  s->members = mem - 1;
+  return s->dense[s->members];
+}
+
+static inline void
+sparseset_iter_init (sparseset s)
+{
+  s->iter = 0;
+  s->iter_inc = 1;
+  s->iterating = true;
+}
+
+static inline bool
+sparseset_iter_p (sparseset s)
+{
+  if (s->iterating && s->iter < s->members)
+    return true;
+  else
+    return s->iterating = false;
+}
+
+static inline SPARSESET_ELT_TYPE
+sparseset_iter_elm (sparseset s)
+{
+  return s->dense[s->iter];
+}
+
+static inline void
+sparseset_iter_next (sparseset s)
+{
+  s->iter += s->iter_inc;
+  s->iter_inc = 1;
+}
+
+#define EXECUTE_IF_SET_IN_SPARSESET(SPARSESET, ITER)			\
+  for (sparseset_iter_init (SPARSESET);					\
+       sparseset_iter_p (SPARSESET)					\
+       && (((ITER) = sparseset_iter_elm (SPARSESET)) || 1);		\
+       sparseset_iter_next (SPARSESET))
+
+#endif /* GCC_SPARSESET_H */
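
(A minimal usage sketch of the new sparseset API, shown for illustration
only and not part of the patch; it mirrors how global.c above uses
allocnos_live, and it compiles only inside the GCC tree since sparseset.h
pulls in system.h.)

  #include "sparseset.h"

  static void
  live_allocnos_example (void)
  {
    unsigned int i;
    sparseset live = sparseset_alloc (100);   /* Holds elements 0 .. 99.  */

    sparseset_set_bit (live, 5);
    sparseset_set_bit (live, 42);
    sparseset_clear_bit (live, 5);            /* Constant-time removal.  */

    /* Iteration visits only the current members, in time proportional to
       the cardinality; clearing an already-visited member (including the
       current one) while iterating is handled.  */
    EXECUTE_IF_SET_IN_SPARSESET (live, i)
      fprintf (stderr, " %u", i);

    sparseset_free (live);
  }

(Because sparseset_clear just resets the member count, allocnos_live can be
cleared at the start of every basic block in constant time, independent of
max_allocno.)
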
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 128200)
+++ Makefile.in	(working copy)
@@ -1122,6 +1122,7 @@ OBJS-common = \
 	sdbout.o \
 	see.o \
 	simplify-rtx.o \
+	sparseset.o \
 	sreal.o \
 	stack-ptr-mod.o \
 	stmt.o \
@@ -1760,6 +1761,7 @@ sbitmap.o: sbitmap.c $(CONFIG_H) $(SYSTE
     $(FLAGS_H) hard-reg-set.h $(BASIC_BLOCK_H) $(OBSTACK_H)
 ebitmap.o: ebitmap.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
 	$(EBITMAP_H)
+sparseset.o: sparseset.c $(SYSTEM_H) sparseset.h
 
 COLLECT2_OBJS = collect2.o tlink.o intl.o version.o
 COLLECT2_LIBS = @COLLECT2_LIBS@


