This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Patch Pings for fixes to bz33562 and bz61912.



Pinging patches #1 and #2 from the 4 part series to improve DSE. ISTM that #3 and #4 should wait for gcc-7.

Patches #1 and #2 included for reference.

--- Begin Message --- This is the first of the 4 part patchkit to address deficiencies in our DSE implementation.

This patch addresses the P2 regression 33562 which has been a low priority regression since gcc-4.3. To summarize, DSE no longer has the ability to detect an aggregate store as dead if subsequent stores are done in a piecemeal fashion.

I originally tackled this by changing how we lower complex objects. That was sufficient to address 33562, but was reasonably rejected.

This version attacks the problem by improving DSE to track stores to memory at a byte level. That allows us to determine if a series of stores completely covers an earlier store (thus making the earlier store dead).

A useful side effect of this is we can detect when parts of a store are dead and potentially rewrite the store. This patch implements that for complex object initializations. While not strictly part of 33562, it's so closely related that I felt it belongs as part of this patch.

This originally limited the size of the tracked memory space to 64 bytes. I bumped the limit after working through the CONSTRUCTOR and mem* trimming patches. The 256 byte limit is still fairly arbitrary and I wouldn't lose sleep if we throttled back to 64 or 128 bytes.

Later patches in the kit will build upon this patch. So if pieces look like skeleton code, that's because it is.

The changes since the V2 patch are:

1. Using sbitmaps rather than bitmaps.
2. Returning a tri-state from dse_classify_store (renamed from dse_possible_dead_store_p)
3. More efficient trim computation
4. Moving trimming code out of dse_classify_store
5. Refactoring code to delete dead calls/assignments
6. dse_optimize_stmt moves into the dse_dom_walker class

Not surprisingly, this patch has most of the changes based on prior feedback as it includes the raw infrastructure.

Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
	PR tree-optimization/33562
	* params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
	* sbitmap.h (bitmap_clear_range, bitmap_set_range): Prototype new
	functions.
	(bitmap_count_bits): Likewise.
	* sbitmap.c (bitmap_clear_range, bitmap_set_range): New functions.
	(bitmap_count_bits): Likewise.
	* tree-ssa-dse.c: Include params.h.
	(dse_store_status): New enum.
	(initialize_ao_ref_for_dse): New, partially extracted from
	dse_optimize_stmt.
	(valid_ao_ref_for_dse, normalize_ref): New.
	(setup_live_bytes_from_ref, compute_trims): Likewise.
	(clear_bytes_written_by, trim_complex_store): Likewise.
	(maybe_trim_partially_dead_store): Likewise.
	(maybe_trim_complex_store): Likewise.
	(dse_classify_store): Renamed from dse_possibly_dead_store_p.
	Track what bytes live from the original store.  Return tri-state
	for dead, partially dead or live.
	(dse_dom_walker): Add constructor, destructor and new private members.
	(delete_dead_call, delete_dead_assignment): New extracted from
	dse_optimize_stmt.
	(dse_optimize_stmt): Make a member of dse_dom_walker.
	Use initialize_ao_ref_for_dse.


	* gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
	* gcc.dg/tree-ssa/complex-5.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-18.c: New test.
	* gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.

diff --git a/gcc/params.def b/gcc/params.def
index 50f75a7..f367c1d 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -532,6 +532,11 @@ DEFPARAM(PARAM_AVG_LOOP_NITER,
 	 "Average number of iterations of a loop.",
 	 10, 1, 0)
 
+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+	 "dse-max-object-size",
+	 "Maximum size (in bytes) of objects tracked by dead store elimination.",
+	 256, 0, 0)
+
 DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
  	 "scev-max-expr-size",
 	 "Bound on size of expressions used in the scalar evolutions analyzer.",
diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
index 10b4347..2b66a6c 100644
--- a/gcc/sbitmap.c
+++ b/gcc/sbitmap.c
@@ -202,6 +202,39 @@ bitmap_empty_p (const_sbitmap bmap)
   return true;
 }
 
+void
+bitmap_clear_range (sbitmap bmap, unsigned int start, unsigned int count)
+{
+  for (unsigned int i = start; i < start + count; i++)
+    bitmap_clear_bit (bmap, i);
+}
+
+void
+bitmap_set_range (sbitmap bmap, unsigned int start, unsigned int count)
+{
+  for (unsigned int i = start; i < start + count; i++)
+    bitmap_set_bit (bmap, i);
+}
+
+
+unsigned int
+bitmap_count_bits (const_sbitmap bmap)
+{
+  unsigned int count = 0;
+  for (unsigned int i = 0; i < bmap->size; i++)
+    if (bmap->elms[i])
+      {
+# if HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONG
+	count += __builtin_popcountl (bmap->elms[i]);
+# elif HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONGLONG
+	count += __builtin_popcountll (bmap->elms[i]);
+# else
+	count += __builtin_popcount (bmap->elms[i]);
+# endif
+      }
+  return count;
+}
+
 
 /* Zero all elements in a bitmap.  */
 
diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
index bd734f9..aa43d0d 100644
--- a/gcc/sbitmap.h
+++ b/gcc/sbitmap.h
@@ -231,8 +231,11 @@ extern sbitmap *sbitmap_vector_alloc (unsigned int, unsigned int);
 extern sbitmap sbitmap_resize (sbitmap, unsigned int, int);
 extern void bitmap_copy (sbitmap, const_sbitmap);
 extern int bitmap_equal_p (const_sbitmap, const_sbitmap);
+extern unsigned int bitmap_count_bits (const_sbitmap);
 extern bool bitmap_empty_p (const_sbitmap);
 extern void bitmap_clear (sbitmap);
+extern void bitmap_clear_range (sbitmap, unsigned, unsigned);
+extern void bitmap_set_range (sbitmap, unsigned, unsigned);
 extern void bitmap_ones (sbitmap);
 extern void bitmap_vector_clear (sbitmap *, unsigned int);
 extern void bitmap_vector_ones (sbitmap *, unsigned int);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
index 87a2638..3155741 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
@@ -10,4 +10,4 @@ int f(void)
   return g(&t);
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
index e2cd403..e6d027f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
@@ -8,4 +8,4 @@ int f(void)
  __imag__ t = 2;
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
new file mode 100644
index 0000000..92b2df8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+int g(_Complex int*);
+int f(void)
+{
+  _Complex int t = 0;
+  int i, j;
+ __imag__ t += 2;
+  return g(&t);
+}
+
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
new file mode 100644
index 0000000..718b746
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+int g(_Complex int*);
+int f(void)
+{
+  _Complex int t = 0;
+  int i, j;
+ __real__ t += 2;
+  return g(&t);
+}
+
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
new file mode 100644
index 0000000..4e14d9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
+_Complex int t = 0;
+int f(void)
+{
+  t = 0;
+ __imag__ t = 2;
+}
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
new file mode 100644
index 0000000..d1e0b85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
+_Complex int t = 0;
+int f(void)
+{
+  t = 0;
+ __real__ t = 2;
+}
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
index 594c20c..ae48ddd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
@@ -11,4 +11,4 @@ foo ()
 }
 
 /* We should eliminate the first assignment.  */
-/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" } } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 778b363..40acd83 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "domwalk.h"
 #include "tree-cfgcleanup.h"
+#include "params.h"
 
 /* This file implements dead store elimination.
 
@@ -68,14 +69,242 @@ along with GCC; see the file COPYING3.  If not see
    remove their dead edges eventually.  */
 static bitmap need_eh_cleanup;
 
+/* Return value from dse_classify_store */
+enum dse_store_status
+{
+  DSE_STORE_LIVE,
+  DSE_STORE_MAYBE_PARTIAL_DEAD,
+  DSE_STORE_DEAD
+};
+
+/* STMT is a statement that may write into memory.  Analyze it and
+   initialize WRITE to describe how STMT affects memory.
+
+   Return TRUE if the the statement was analyzed, FALSE otherwise.
+
+   It is always safe to return FALSE.  But typically better optimziation
+   can be achieved by analyzing more statements.  */
+
+static bool
+initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
+{
+  /* It's advantageous to handle certain mem* functions.  */
+  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+    {
+      switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
+	{
+	  case BUILT_IN_MEMCPY:
+	  case BUILT_IN_MEMMOVE:
+	  case BUILT_IN_MEMSET:
+	    {
+	      tree size = NULL_TREE;
+	      if (gimple_call_num_args (stmt) == 3)
+		size = gimple_call_arg (stmt, 2);
+	      tree ptr = gimple_call_arg (stmt, 0);
+	      ao_ref_init_from_ptr_and_size (write, ptr, size);
+	      return true;
+	    }
+	  default:
+	    break;
+	}
+    }
+  else if (is_gimple_assign (stmt))
+    {
+      ao_ref_init (write, gimple_assign_lhs (stmt));
+      return true;
+    }
+  return false;
+}
+
+/* Given REF from the the alias oracle, return TRUE if it is a valid
+   memory reference for dead store elimination, false otherwise.
+
+   In particular, the reference must have a known base, known maximum
+   size, start at a byte offset and have a size that is one or more
+   bytes.   Finally the offset and size must be smaller than
+   PARAM_DSE_MAX_OBJECT_SIZE.  */
+
+static bool
+valid_ao_ref_for_dse (ao_ref *ref)
+{
+  return (ao_ref_base (ref)
+	  && ref->max_size != -1
+	  && (ref->offset % BITS_PER_UNIT) == 0
+	  && (ref->size % BITS_PER_UNIT) == 0
+	  && (ref->size / BITS_PER_UNIT) > 0);
+}
+
+/* Normalize COPY (an ao_ref) relative to REF.  Essentially when we are
+   done COPY will only refer bytes found within REF.
+
+   We have already verified that COPY intersects at least one
+   byte with REF.  */
+
+static void
+normalize_ref (ao_ref *copy, ao_ref *ref)
+{
+  /* If COPY starts before REF, then reset the beginning of
+     COPY to match REF and decrease the size of COPY by the
+     number of bytes removed from COPY.  */
+  if (copy->offset < ref->offset)
+    {
+      copy->size -= (ref->offset - copy->offset);
+      copy->offset = ref->offset;
+    }
+
+  /* If COPY extends beyond REF, chop off its size appropriately.  */
+  if (copy->offset + copy->size > ref->offset + ref->size)
+    copy->size -= (copy->offset + copy->size - (ref->offset + ref->size));
+}
+
+/* Clear any bytes written by STMT from the bitmap LIVE_BYTES.  The base
+   address written by STMT must match the one found in REF, which must
+   have its base address previously initialized.
+
+   This routine must be conservative.  If we don't know the offset or
+   actual size written, assume nothing was written.  */
+
+static void
+clear_bytes_written_by (sbitmap live_bytes, gimple *stmt, ao_ref *ref)
+{
+  ao_ref write;
+  if (!initialize_ao_ref_for_dse (stmt, &write))
+    return;
+
+  /* Verify we have the same base memory address, the write
+     has a known size and overlaps with REF.  */
+  if (valid_ao_ref_for_dse (&write)
+      && write.base == ref->base
+      && write.size == write.max_size
+      && ((write.offset < ref->offset
+	   && write.offset + write.size > ref->offset)
+	  || (write.offset >= ref->offset
+	      && write.offset < ref->offset + ref->size)))
+    {
+      normalize_ref (&write, ref);
+      bitmap_clear_range (live_bytes,
+			  (write.offset - ref->offset) / BITS_PER_UNIT,
+			  write.size / BITS_PER_UNIT);
+    }
+}
+
+/* REF is a memory write.  Extract relevant information from it and
+   initialize the LIVE_BYTES bitmap.  If successful, return TRUE.
+   Otherwise return FALSE.  */
+
+static bool
+setup_live_bytes_from_ref (ao_ref *ref, sbitmap live_bytes)
+{
+  if (valid_ao_ref_for_dse (ref)
+      && (ref->size / BITS_PER_UNIT
+	  <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
+    {
+      bitmap_clear (live_bytes);
+      bitmap_set_range (live_bytes, 0, ref->size / BITS_PER_UNIT);
+      return true;
+    }
+  return false;
+}
+
+/* Compute the number of elements that we can trim from the head and
+   tail of ORIG resulting in a bitmap that is a superset of LIVE.
+
+   Store the number of elements trimmed from the head and tail in
+   TRIM_HEAD and TRIM_TAIL.  */
+
+static void
+compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
+{
+  /* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
+     extends through ref->size.  So we know that in the original bitmap
+     bits 0..ref->size were true.  We don't actually need the bitmap, just
+     the REF to compute the trims.  */
+
+  /* Now identify how much, if any of the tail we can chop off.  */
+  *trim_tail = 0;
+  int last_orig = (ref->size / BITS_PER_UNIT) - 1;
+  int last_live = bitmap_last_set_bit (live);
+  *trim_tail = (last_orig - last_live) & ~0x1;
+
+  /* Identify how much, if any of the head we can chop off.  */
+  int first_orig = 0;
+  int first_live = bitmap_first_set_bit (live);
+  *trim_head = (first_live - first_orig) & ~0x1;
+}
+
+/* STMT initializes an object from COMPLEX_CST where one or more of the
+   bytes written may be dead stores.  REF is a representation of the
+   memory written.  LIVE is the bitmap of stores that are actually live.
+
+   Attempt to rewrite STMT so that only the real or imaginary part of
+   the object is actually stored.  */
+
+static void
+maybe_trim_complex_store (ao_ref *ref, sbitmap live, gimple *stmt)
+{
+  int trim_head, trim_tail;
+  compute_trims (ref, live, &trim_head, &trim_tail);
+
+  /* The amount of data trimmed from the head or tail must be at
+     least half the size of the object to ensure we're trimming
+     the entire real or imaginary half.  By writing things this
+     way we avoid more O(n) bitmap operations.  */
+  if (trim_tail * 2 >= ref->size / BITS_PER_UNIT)
+    {
+      /* TREE_REALPART is live */
+      tree x = TREE_REALPART (gimple_assign_rhs1 (stmt));
+      tree y = gimple_assign_lhs (stmt);
+      y = build1 (REALPART_EXPR, TREE_TYPE (x), y);
+      gimple_assign_set_lhs (stmt, y);
+      gimple_assign_set_rhs1 (stmt, x);
+    }
+  else if (trim_head * 2 >= ref->size / BITS_PER_UNIT)
+    {
+      /* TREE_IMAGPART is live */
+      tree x = TREE_IMAGPART (gimple_assign_rhs1 (stmt));
+      tree y = gimple_assign_lhs (stmt);
+      y = build1 (IMAGPART_EXPR, TREE_TYPE (x), y);
+      gimple_assign_set_lhs (stmt, y);
+      gimple_assign_set_rhs1 (stmt, x);
+    }
+
+  /* Other cases indicate parts of both the real and imag subobjects
+     are live.  We do not try to optimize those cases.  */
+}
+
+/* STMT is a memory write where one or more bytes written are dead
+   stores.  ORIG is the bitmap of bytes stored by STMT.  LIVE is the
+   bitmap of stores that are actually live.
+
+   Attempt to rewrite STMT so that it writes fewer memory locations.  Right
+   now we only support trimming at the start or end of the memory region.
+   It's not clear how much there is to be gained by trimming from the middle
+   of the region.  */
+
+static void
+maybe_trim_partially_dead_store (ao_ref *ref, sbitmap live, gimple *stmt)
+{
+  if (is_gimple_assign (stmt))
+    {
+      switch (gimple_assign_rhs_code (stmt))
+	{
+	case COMPLEX_CST:
+	  maybe_trim_complex_store (ref, live, stmt);
+	  break;
+	default:
+	  break;
+	}
+    }
+}
 
 /* A helper of dse_optimize_stmt.
    Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
    statement *USE_STMT that may prove STMT to be dead.
    Return TRUE if the above conditions are met, otherwise FALSE.  */
 
-static bool
-dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
+static dse_store_status
+dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
+		    bool byte_tracking_enabled, sbitmap live_bytes)
 {
   gimple *temp;
   unsigned cnt = 0;
@@ -97,7 +326,7 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
       /* Limit stmt walking to be linear in the number of possibly
          dead stores.  */
       if (++cnt > 256)
-	return false;
+	return DSE_STORE_LIVE;
 
       if (gimple_code (temp) == GIMPLE_PHI)
 	defvar = PHI_RESULT (temp);
@@ -135,7 +364,7 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
 		  fail = true;
 		  BREAK_FROM_IMM_USE_STMT (ui);
 		}
-	      /* Do not consider the PHI as use if it dominates the 
+	      /* Do not consider the PHI as use if it dominates the
 	         stmt defining the virtual operand we are processing,
 		 we have processed it already in this case.  */
 	      if (gimple_bb (defvar_def) != gimple_bb (use_stmt)
@@ -164,7 +393,13 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
 	}
 
       if (fail)
-	return false;
+	{
+	  /* STMT might be partially dead and we may be able to reduce
+	     how many memory locations it stores into.  */
+	  if (byte_tracking_enabled && !gimple_clobber_p (stmt))
+	    return DSE_STORE_MAYBE_PARTIAL_DEAD;
+	  return DSE_STORE_LIVE;
+	}
 
       /* If we didn't find any definition this means the store is dead
          if it isn't a store to global reachable memory.  In this case
@@ -172,20 +407,100 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
       if (!temp)
 	{
 	  if (ref_may_alias_global_p (ref))
-	    return false;
+	    return DSE_STORE_LIVE;
 
 	  temp = stmt;
 	  break;
 	}
+
+      if (byte_tracking_enabled && temp)
+	clear_bytes_written_by (live_bytes, temp, ref);
     }
-  /* Continue walking until we reach a kill.  */
-  while (!stmt_kills_ref_p (temp, ref));
+  /* Continue walking until we reach a full kill as a single statement
+     or there are no more live bytes.  */
+  while (!stmt_kills_ref_p (temp, ref)
+	 && !(byte_tracking_enabled && bitmap_empty_p (live_bytes)));
 
   *use_stmt = temp;
+  return DSE_STORE_DEAD;
+}
+
+
+class dse_dom_walker : public dom_walker
+{
+public:
+  dse_dom_walker (cdi_direction direction) : dom_walker (direction)
+  { m_live_bytes = sbitmap_alloc (PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }
+
+  ~dse_dom_walker () { sbitmap_free (m_live_bytes); }
+
+  virtual edge before_dom_children (basic_block);
 
-  return true;
+private:
+  sbitmap m_live_bytes;
+  bool m_byte_tracking_enabled;
+  void dse_optimize_stmt (gimple_stmt_iterator *);
+};
+
+/* Delete a dead store STMT, which is mem* call of some kind.  */
+static void
+delete_dead_call (gimple *stmt)
+{
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "  Deleted dead call: ");
+      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
+      fprintf (dump_file, "\n");
+    }
+
+  tree lhs = gimple_call_lhs (stmt);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (lhs)
+    {
+      tree ptr = gimple_call_arg (stmt, 0);
+      gimple *new_stmt = gimple_build_assign (lhs, ptr);
+      unlink_stmt_vdef (stmt);
+      if (gsi_replace (&gsi, new_stmt, true))
+        bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
+    }
+  else
+    {
+      /* Then we need to fix the operand of the consuming stmt.  */
+      unlink_stmt_vdef (stmt);
+
+      /* Remove the dead store.  */
+      if (gsi_remove (&gsi, true))
+	bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
+      release_defs (stmt);
+    }
 }
 
+/* Delete a dead store STMT, which is a gimple assignment. */
+
+static void
+delete_dead_assignment (gimple *stmt)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "  Deleted dead store: ");
+      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
+      fprintf (dump_file, "\n");
+    }
+
+  /* Then we need to fix the operand of the consuming stmt.  */
+  unlink_stmt_vdef (stmt);
+
+  /* Remove the dead store.  */
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  basic_block bb = gimple_bb (stmt);
+  if (gsi_remove (&gsi, true))
+    bitmap_set_bit (need_eh_cleanup, bb->index);
+
+  /* And release any SSA_NAMEs set in this statement back to the
+     SSA_NAME manager.  */
+  release_defs (stmt);
+}
 
 /* Attempt to eliminate dead stores in the statement referenced by BSI.
 
@@ -198,8 +513,8 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
    is used precisely once by a later store to the same location which
    post dominates the first store, then the first store is dead.  */
 
-static void
-dse_optimize_stmt (gimple_stmt_iterator *gsi)
+void
+dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
 
@@ -214,6 +529,10 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
     return;
 
+  ao_ref ref;
+  if (!initialize_ao_ref_for_dse (stmt, &ref))
+    return;
+
   /* We know we have virtual definitions.  We can handle assignments and
      some builtin calls.  */
   if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
@@ -225,42 +544,26 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  case BUILT_IN_MEMSET:
 	    {
 	      gimple *use_stmt;
-	      ao_ref ref;
-	      tree size = NULL_TREE;
-	      if (gimple_call_num_args (stmt) == 3)
-		size = gimple_call_arg (stmt, 2);
-	      tree ptr = gimple_call_arg (stmt, 0);
-	      ao_ref_init_from_ptr_and_size (&ref, ptr, size);
-	      if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
+	      enum dse_store_status store_status;
+	      m_byte_tracking_enabled
+		= setup_live_bytes_from_ref (&ref, m_live_bytes);
+	      store_status = dse_classify_store (&ref, stmt, &use_stmt,
+						 m_byte_tracking_enabled,
+						 m_live_bytes);
+	      if (store_status == DSE_STORE_LIVE)
 		return;
 
-	      if (dump_file && (dump_flags & TDF_DETAILS))
-		{
-		  fprintf (dump_file, "  Deleted dead call: ");
-		  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
-		  fprintf (dump_file, "\n");
-		}
-
-	      tree lhs = gimple_call_lhs (stmt);
-	      if (lhs)
+	      if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
 		{
-		  gimple *new_stmt = gimple_build_assign (lhs, ptr);
-		  unlink_stmt_vdef (stmt);
-		  if (gsi_replace (gsi, new_stmt, true))
-		    bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
+		  maybe_trim_partially_dead_store (&ref, m_live_bytes, stmt);
+		  return;
 		}
-	      else
-		{
-		  /* Then we need to fix the operand of the consuming stmt.  */
-		  unlink_stmt_vdef (stmt);
 
-		  /* Remove the dead store.  */
-		  if (gsi_remove (gsi, true))
-		    bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
-		  release_defs (stmt);
-		}
-	      break;
+	      if (store_status == DSE_STORE_DEAD)
+		delete_dead_call (stmt);
+	      return;
 	    }
+
 	  default:
 	    return;
 	}
@@ -276,10 +579,20 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	use_stmt = stmt;
       else
 	{
-	  ao_ref ref;
-	  ao_ref_init (&ref, gimple_assign_lhs (stmt));
-  	  if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
+	  m_byte_tracking_enabled
+	    = setup_live_bytes_from_ref (&ref, m_live_bytes);
+	  enum dse_store_status store_status;
+	  store_status = dse_classify_store (&ref, stmt, &use_stmt,
+					     m_byte_tracking_enabled,
+					     m_live_bytes);
+	  if (store_status == DSE_STORE_LIVE)
 	    return;
+
+	  if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
+	    {
+	      maybe_trim_partially_dead_store (&ref, m_live_bytes, stmt);
+	      return;
+	    }
 	}
 
       /* Now we know that use_stmt kills the LHS of stmt.  */
@@ -290,35 +603,10 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  && !gimple_clobber_p (use_stmt))
 	return;
 
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	{
-	  fprintf (dump_file, "  Deleted dead store: ");
-	  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
-	  fprintf (dump_file, "\n");
-	}
-
-      /* Then we need to fix the operand of the consuming stmt.  */
-      unlink_stmt_vdef (stmt);
-
-      /* Remove the dead store.  */
-      basic_block bb = gimple_bb (stmt);
-      if (gsi_remove (gsi, true))
-	bitmap_set_bit (need_eh_cleanup, bb->index);
-
-      /* And release any SSA_NAMEs set in this statement back to the
-	 SSA_NAME manager.  */
-      release_defs (stmt);
+      delete_dead_assignment (stmt);
     }
 }
 
-class dse_dom_walker : public dom_walker
-{
-public:
-  dse_dom_walker (cdi_direction direction) : dom_walker (direction) {}
-
-  virtual edge before_dom_children (basic_block);
-};
-
 edge
 dse_dom_walker::before_dom_children (basic_block bb)
 {
@@ -391,7 +679,7 @@ pass_dse::execute (function *fun)
     }
 
   BITMAP_FREE (need_eh_cleanup);
-    
+
   /* For now, just wipe the post-dominator information.  */
   free_dominance_info (CDI_POST_DOMINATORS);
   return 0;

--- End Message ---
--- Begin Message ---
This is the second patch in the kit to improve our DSE implementation.

This patch recognizes when a CONSTRUCTOR assignment could be trimmed at the head or tail because those bytes are dead.

The first implementation of this turned the CONSTRUCTOR into a memset. This version actually rewrites the RHS and LHS of the CONSTRUCTOR assignment.

You'll note that the implementation computes head and tail trim counts, then masks them to an even byte count. We might even consider masking off the two low bits in the counts. This masking keeps higher alignments on the CONSTRUCTOR remnant which helps keep things efficient when the CONSTRUCTOR results in a memset call.

This patch hits a lot statically in GCC and the testsuite. There were hundreds of hits in each.

There may be some room for tuning. Trimming shouldn't ever result in poorer performance, but it may also not result in any measurable gain (it depends on how much gets trimmed relative to the size of the CONSTRUCTOR node and how the CONSTRUCTOR node gets expanded, the processor's capabilities for merging stores internally, etc etc). I suspect the main benefit comes when the CONSTRUCTOR collapses down to some thing small that gets expanded inline, thus exposing the internals to the rest of the optimization pipeline.

We could, in theory, split the CONSTRUCTOR to pick up dead bytes in the middle of the CONSTRUCTOR. I haven't looked to see how applicable that is in real code and what the cost/benefit analysis might look like.


Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
	PR tree-optimization/61912
	PR tree-optimization/77485
	* tree-sra.h: New file.
	* ipa-cp.c: Include tree-sra.h
	* ipa-prop.h (build_ref_for_offset): Remove prototype.
	* tree-ssa-dse.c: Include expr.h and tree-sra.h.
	(compute_trims, trim_constructor_store): New functions.
	(maybe_trim_partially_dead_store): Call trim_constructor_store.
	

	* g++.dg/tree-ssa/ssa-dse-1.C: New test.
	* gcc.dg/tree-ssa/pr30375: Adjust expected output.
	* gcc.dg/tree-ssa/ssa-dse-24.c: New test.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index d3b5052..bc5ea87 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -122,6 +122,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-inline.h"
 #include "ipa-utils.h"
 #include "tree-ssa-ccp.h"
+#include "tree-sra.h"
 
 template <typename valtype> class ipcp_value;
 
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 0e75cf4..6d7b480 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -820,10 +820,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
 void ipa_release_body_info (struct ipa_func_body_info *);
 tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
 
-/* From tree-sra.c:  */
-tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
-			   gimple_stmt_iterator *, bool);
-
 /* In ipa-cp.c  */
 void ipa_cp_c_finalize (void);
 
diff --git a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
new file mode 100644
index 0000000..3f85f3a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
@@ -0,0 +1,101 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c++14 -O -fdump-tree-dse1-details" } */
+
+using uint = unsigned int;
+
+template<typename C, uint S>
+struct FixBuf
+{
+	C buf[S] = {};
+};
+
+template<typename C>
+struct OutBuf
+{
+	C*	cur;
+	C*	end;
+	C*	beg;
+
+	template<uint S>
+	constexpr
+	OutBuf(FixBuf<C, S>& b) : cur{b.buf}, end{b.buf + S}, beg{b.buf} { }
+
+	OutBuf(C* b, C* e) : cur{b}, end{e} { }
+	OutBuf(C* b, uint s) : cur{b}, end{b + s} { }
+
+	constexpr
+	OutBuf& operator<<(C v)
+	{
+		if (cur < end) {
+			*cur = v;
+		}
+		++cur;
+		return *this;
+	}
+
+	constexpr
+	OutBuf& operator<<(uint v)
+	{
+		uint q = v / 10U;
+		uint r = v % 10U;
+		if (q) {
+			*this << q;
+		}
+		*this << static_cast<C>(r + '0');
+		return *this;
+	}
+};
+
+template<bool BOS>
+struct BufOrSize
+{
+	template<typename C, uint S>
+	static constexpr auto Select(FixBuf<C, S>& fb, OutBuf<C>&)
+	{
+		return fb;
+	}
+};
+
+template<>
+struct BufOrSize<true>
+{
+	template<typename C, uint S>
+	static constexpr auto Select(FixBuf<C, S>&, OutBuf<C>& ob)
+	{
+		return ob.cur - ob.beg;
+	}
+};
+
+// if BOS=1, it will return the size of the generated data, else the data itself
+template<uint N, uint S, bool BOS = 0>
+constexpr
+auto fixbuf()
+{
+	FixBuf<char, S> fb;
+	OutBuf<char> ob{fb};
+	for (uint i = 0; i <= N; ++i) {
+		ob << i << static_cast<char>(i == N ? 0 : ' ');
+	}
+	return BufOrSize<BOS>::Select(fb, ob);
+}
+
+auto foo()
+{
+	constexpr auto x = fixbuf<13, 200>();
+	return x;
+}
+
+auto foo_sized()
+{
+	constexpr auto s = fixbuf<13, 0, 1>();
+	constexpr auto x = fixbuf<13, s>();
+	return x;
+}
+
+int main()
+{
+}
+
+
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\] \\*\\)&<retval> \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c b/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
index 0439b1c..cf0a93b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
@@ -22,4 +22,5 @@ void test_signed_msg_encoding(void)
     f();
 }
 
-/* { dg-final { scan-tree-dump-times "signInfo = {}" 1 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\] \\*\\)&signInfo \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c
new file mode 100644
index 0000000..80a35ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1" } */
+
+
+typedef unsigned int wchar_t;
+struct printf_info
+{
+  int prec;
+  int width;
+  wchar_t spec;
+  unsigned int is_long_double:1;
+  unsigned int is_short:1;
+  unsigned int is_long:1;
+  unsigned int alt:1;
+  unsigned int space:1;
+  unsigned int left:1;
+  unsigned int showsign:1;
+  unsigned int group:1;
+  unsigned int extra:1;
+  unsigned int is_char:1;
+  unsigned int wide:1;
+  unsigned int i18n:1;
+  unsigned int __pad:4;
+  unsigned short int user;
+  wchar_t pad;
+} info;
+
+void bar (struct printf_info *);
+
+void foo(int prec,
+  int width,
+  wchar_t spec,
+  unsigned int is_long_double,
+  unsigned int is_short,
+  unsigned int is_long,
+  unsigned int alt,
+  unsigned int space,
+  unsigned int left,
+  unsigned int showsign,
+  unsigned int group,
+  wchar_t pad)
+{
+    struct printf_info info = {
+        .prec = prec,
+        .width = width,
+        .spec = spec,
+        .is_long_double = is_long_double,
+        .is_short = is_short,
+        .is_long = is_long,
+        .alt = alt,
+        .space = space,
+        .left = left,
+        .showsign = showsign,
+        .group = group,
+        .pad = pad,
+        .extra = 0,
+        .wide = sizeof (char) != 1 };
+
+    bar (&info);
+}
+
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\] \\*\\)&info \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
new file mode 100644
index 0000000..6d88ece
--- /dev/null
+++ b/gcc/tree-sra.h
@@ -0,0 +1,27 @@
+/* Header file for SRA helper functions
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_TREE_SRA_H
+#define GCC_TREE_SRA_H
+
+tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
+			   gimple_stmt_iterator *, bool);
+
+
+#endif /* GCC_TREE_SRA_H */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 40acd83..4757702 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -34,6 +34,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "domwalk.h"
 #include "tree-cfgcleanup.h"
 #include "params.h"
+#include "expr.h"
+#include "tree-sra.h"
 
 /* This file implements dead store elimination.
 
@@ -272,6 +274,68 @@ maybe_trim_complex_store (ao_ref *ref, sbitmap live, gimple *stmt)
      are live.  We do not try to optimize those cases.  */
 }
 
+/* STMT initializes an object using a CONSTRUCTOR where one or more of the
+   bytes written are dead stores.  ORIG is the bitmap of bytes stored by
+   STMT.  LIVE is the bitmap of stores that are actually live.
+
+   Attempt to rewrite STMT so that only the real or imaginary part of
+   the object is actually stored.
+
+   The most common case for getting here is a CONSTRUCTOR with no elements
+   being used to zero initialize an object.  We do not try to handle other
+   cases as those would force us to fully cover the object with the
+   CONSTRUCTOR node except for the components that are dead.  */
+
+static void
+trim_constructor_store (ao_ref *ref, sbitmap live, gimple *stmt)
+{
+  tree ctor = gimple_assign_rhs1 (stmt);
+
+  HOST_WIDE_INT nz_elts, init_elts;
+  bool complete;
+  categorize_ctor_elements (ctor, &nz_elts, &init_elts, &complete);
+
+  /* This is the only case we currently handle.  It actually seems to
+     catch most cases of actual interest.  */
+  if (init_elts == 0 && !CONSTRUCTOR_NO_CLEARING (ctor))
+    {
+      int head_trim = 0;
+      int tail_trim = 0;
+      compute_trims (ref, live, &head_trim, &tail_trim);
+
+      /* Now we want to replace the constructor initializer
+	 with memset (object + head_trim, 0, size - head_trim - tail_trim).  */
+      if (head_trim || tail_trim)
+	{
+	  /* We want &lhs for the MEM_REF expression.  */
+	  tree lhs_addr = build_fold_addr_expr (gimple_assign_lhs (stmt));
+
+	  /* The number of bytes for the new constructor.  */
+	  int count = (ref->size / BITS_PER_UNIT) - head_trim - tail_trim - 1;
+
+	  /* And the new type for the CONSTRUCTOR.  Essentially it's just
+	     a char array large enough to cover the non-trimmed parts of
+	     the original CONSTRUCTOR.  Note we want explicit bounds here
+	     so that we know how many bytes to clear when expanding the
+	     CONSTRUCTOR.  */
+	  tree type = build_range_type (sizetype, size_zero_node,
+					build_int_cst (sizetype, count));
+	  type = build_array_type (char_type_node, type);
+
+
+	  /* Build a MEM_REF representing the whole accessed area, starting
+	     at the first byte not trimmed.  */
+	  tree exp = fold_build2 (MEM_REF, type, lhs_addr,
+				  build_int_cst (build_pointer_type (type),
+						 head_trim));
+
+	  /* Now update STMT with a new RHS and LHS.  */
+	  gimple_assign_set_lhs (stmt, exp);
+	  gimple_assign_set_rhs1 (stmt, build_constructor (type, NULL));
+	}
+    }
+}
+
 /* STMT is a memory write where one or more bytes written are dead
    stores.  ORIG is the bitmap of bytes stored by STMT.  LIVE is the
    bitmap of stores that are actually live.
@@ -288,6 +352,9 @@ maybe_trim_partially_dead_store (ao_ref *ref, sbitmap live, gimple *stmt)
     {
       switch (gimple_assign_rhs_code (stmt))
 	{
+	case CONSTRUCTOR:
+	  trim_constructor_store (ref, live, stmt);
+	  break;
 	case COMPLEX_CST:
 	  maybe_trim_complex_store (ref, live, stmt);
 	  break;

--- End Message ---

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]