This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix store merging (PR tree-optimization/84503)


On February 21, 2018 11:28:36 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>The long comment above the new check_no_overlap function below should
>hopefully explain the problem.  coalesce_immediate_stores is splitting
>the
>stores from the same base into groups that we are going to optimize
>individually; for each successfully merged group we emit new stores at
>the
>location of the last store from that group.  And
>coalesce_immediate_stores
>processes stores ordered primarily by increasing bit position and only
>secondarily by the original ordering in the basic block.
>On the following testcases, we have the unfortunate ordering of:
>     MEM[(long long int *)p_28] = 0;
>     MEM[(long long int *)p_28 + 8B] = 0;
>     MEM[(long long int *)p_28 + 16B] = 0;
>     MEM[(long long int *)p_28 + 24B] = 0;
>     _129 = (int) _130;
>     MEM[(int *)p_28 + 8B] = _129;
>     MEM[(int *)p_28].a = -1;
>We already have
>     MEM[(long long int *)p_28] = 0;
>     MEM[(int *)p_28].a = -1;
>in the first group (which is fine) and the following 2 stores to
>consider
>are
>     MEM[(long long int *)p_28 + 8B] = 0;
>and
>     MEM[(int *)p_28 + 8B] = _129;
>We add the first store to the group, which has the effect of emitting
>the
>merged stores at the location of former MEM[(int *)p_28].a = -1;
>store.  Then we process MEM[(int *)p_28 + 8B] = _129; (which was
>handled
>just as potential bswap possibility and as it overlaps with previous
>and can't be merged with next either, it will be its own group and thus
>reuse the original stmt; the net effect is that we've reordered the
>MEM[(long long int *)p_28 + 8B] = 0; store from before the
>MEM[(int *)p_28 + 8B] = _129; store to after it, but those two do
>alias.
>
>Instead of detecting this situation late, where we could just throw
>away the
>whole group (which would mean we couldn't merge even the
>MEM[(long long int *)p_28] = 0; and MEM[(int *)p_28].a = -1; stores)
>this patch attempts to check before calling merge_into whether it will
>not overlap with later stores we won't be able to emit in the same
>group
>and if it does, it terminates the group right away.
>
>I had to fix also the merged_store_group::merge_into width computation,
>it was mistakenly just counting sum of the bits we've added to the
>group,
>but we really need the distance between the first bit in the group and
>last
>bit, including any gaps in between, but exclusing gaps before the start
>and
>after the end (still within the bitregion).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Richard. 

>For 7.x I think we'll need something along the lines of PR82916
>instead,
>while the same testcase fails there, we only handle stores with lhs
>being
>a constant and so the problem is that we aren't terminating the chain
>when
>we should on the variable store.
>
>2018-02-21  Jakub Jelinek  <jakub@redhat.com>
>
>	PR tree-optimization/84503
>	* gimple-ssa-store-merging.c (merged_store_group::merge_into): Compute
>	width as info->bitpos + info->bitsize - start.
>	(merged_store_group::merge_overlapping): Simplify width computation.
>	(check_no_overlap): New function.
>	(imm_store_chain_info::try_coalesce_bswap): Compute expected
>	start + width and last_order of the group, fail if check_no_overlap
>	fails.
>	(imm_store_chain_info::coalesce_immediate_stores): Don't merge info
>	to group if check_no_overlap fails.
>
>	* gcc.dg/pr84503-1.c: New test.
>	* gcc.dg/pr84503-2.c: New test.
>
>--- gcc/gimple-ssa-store-merging.c.jj	2018-01-16 09:52:26.230235131
>+0100
>+++ gcc/gimple-ssa-store-merging.c	2018-02-21 20:13:45.239129599 +0100
>@@ -1891,12 +1891,11 @@ merged_store_group::do_merge (store_imme
> void
> merged_store_group::merge_into (store_immediate_info *info)
> {
>-  unsigned HOST_WIDE_INT wid = info->bitsize;
>/* Make sure we're inserting in the position we think we're inserting. 
>*/
>   gcc_assert (info->bitpos >= start + width
> 	      && info->bitregion_start <= bitregion_end);
> 
>-  width += wid;
>+  width = info->bitpos + info->bitsize - start;
>   do_merge (info);
> }
> 
>@@ -1909,7 +1908,7 @@ merged_store_group::merge_overlapping (s
> {
>   /* If the store extends the size of the group, extend the width.  */
>   if (info->bitpos + info->bitsize > start + width)
>-    width += info->bitpos + info->bitsize - (start + width);
>+    width = info->bitpos + info->bitsize - start;
> 
>   do_merge (info);
> }
>@@ -2304,6 +2303,55 @@ gather_bswap_load_refs (vec<tree> *refs,
>     }
> }
> 
>+/* Check if there are any stores in M_STORE_INFO after index I
>+   (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap
>+   a potential group ending with END that have their order
>+   smaller than LAST_ORDER.  RHS_CODE is the kind of store in the
>+   group.  Return true if there are no such stores.
>+   Consider:
>+     MEM[(long long int *)p_28] = 0;
>+     MEM[(long long int *)p_28 + 8B] = 0;
>+     MEM[(long long int *)p_28 + 16B] = 0;
>+     MEM[(long long int *)p_28 + 24B] = 0;
>+     _129 = (int) _130;
>+     MEM[(int *)p_28 + 8B] = _129;
>+     MEM[(int *)p_28].a = -1;
>+   We already have
>+     MEM[(long long int *)p_28] = 0;
>+     MEM[(int *)p_28].a = -1;
>+   stmts in the current group and need to consider if it is safe to
>+   add MEM[(long long int *)p_28 + 8B] = 0; store into the same group.
>+   There is an overlap between that store and the MEM[(int *)p_28 +
>8B] = _129;
>+   store though, so if we add the MEM[(long long int *)p_28 + 8B] = 0;
>+   into the group and merging of those 3 stores is successful, merged
>+   stmts will be emitted at the latest store from that group, i.e.
>+   LAST_ORDER, which is the MEM[(int *)p_28].a = -1; store.
>+   The MEM[(int *)p_28 + 8B] = _129; store that originally follows
>+   the MEM[(long long int *)p_28 + 8B] = 0; would now be before it,
>+   so we need to refuse merging MEM[(long long int *)p_28 + 8B] = 0;
>+   into the group.  That way it will be its own store group and will
>+   not be touched.  If RHS_CODE is INTEGER_CST and there are
>overlapping
>+   INTEGER_CST stores, those are mergeable using merge_overlapping,
>+   so don't return false for those.  */
>+
>+static bool
>+check_no_overlap (vec<store_immediate_info *> m_store_info, unsigned
>int i,
>+		  enum tree_code rhs_code, unsigned int last_order,
>+		  unsigned HOST_WIDE_INT end)
>+{
>+  unsigned int len = m_store_info.length ();
>+  for (++i; i < len; ++i)
>+    {
>+      store_immediate_info *info = m_store_info[i];
>+      if (info->bitpos >= end)
>+	break;
>+      if (info->order < last_order
>+	  && (rhs_code != INTEGER_CST || info->rhs_code != INTEGER_CST))
>+	return false;
>+    }
>+  return true;
>+}
>+
> /* Return true if m_store_info[first] and at least one following store
>  form a group which store try_size bitsize value which is byte swapped
>    from a memory load or some value, or identity from some value.
>@@ -2375,6 +2423,7 @@ imm_store_chain_info::try_coalesce_bswap
>   unsigned int last_order = merged_store->last_order;
>   gimple *first_stmt = merged_store->first_stmt;
>   gimple *last_stmt = merged_store->last_stmt;
>+  unsigned HOST_WIDE_INT end = merged_store->start +
>merged_store->width;
>   store_immediate_info *infof = m_store_info[first];
> 
>   for (unsigned int i = first; i <= last; ++i)
>@@ -2413,25 +2462,23 @@ imm_store_chain_info::try_coalesce_bswap
> 	}
>       else
> 	{
>-	  if (n.base_addr)
>+	  if (n.base_addr && n.vuse != this_n.vuse)
> 	    {
>-	      if (n.vuse != this_n.vuse)
>-		{
>-		  if (vuse_store == 0)
>-		    return false;
>-		  vuse_store = 1;
>-		}
>-	      if (info->order > last_order)
>-		{
>-		  last_order = info->order;
>-		  last_stmt = info->stmt;
>-		}
>-	      else if (info->order < first_order)
>-		{
>-		  first_order = info->order;
>-		  first_stmt = info->stmt;
>-		}
>+	      if (vuse_store == 0)
>+		return false;
>+	      vuse_store = 1;
> 	    }
>+	  if (info->order > last_order)
>+	    {
>+	      last_order = info->order;
>+	      last_stmt = info->stmt;
>+	    }
>+	  else if (info->order < first_order)
>+	    {
>+	      first_order = info->order;
>+	      first_stmt = info->stmt;
>+	    }
>+	  end = MAX (end, info->bitpos + info->bitsize);
> 
> 	  ins_stmt = perform_symbolic_merge (ins_stmt, &n, info->ins_stmt,
> 					     &this_n, &n);
>@@ -2452,6 +2499,9 @@ imm_store_chain_info::try_coalesce_bswap
>   if (n.base_addr == NULL_TREE && !is_gimple_val (n.src))
>     return false;
> 
>+  if (!check_no_overlap (m_store_info, last, LROTATE_EXPR, last_order,
>end))
>+    return false;
>+
>   /* Don't handle memory copy this way if normal non-bswap processing
>      would handle it too.  */
>   if (n.n == cmpnop && (unsigned) n.n_ops == last - first + 1)
>@@ -2633,7 +2683,13 @@ imm_store_chain_info::coalesce_immediate
> 	       : !info->ops[0].base_addr)
> 	      && (infof->ops[1].base_addr
> 		  ? compatible_load_p (merged_store, info, base_addr, 1)
>-		  : !info->ops[1].base_addr))
>+		  : !info->ops[1].base_addr)
>+	      && check_no_overlap (m_store_info, i, info->rhs_code,
>+				   MAX (merged_store->last_order,
>+					info->order),
>+				   MAX (merged_store->start
>+					+ merged_store->width,
>+					info->bitpos + info->bitsize)))
> 	    {
> 	      merged_store->merge_into (info);
> 	      continue;
>--- gcc/testsuite/gcc.dg/pr84503-1.c.jj	2018-02-21 20:36:33.474226039
>+0100
>+++ gcc/testsuite/gcc.dg/pr84503-1.c	2018-02-21 20:20:53.144165116
>+0100
>@@ -0,0 +1,68 @@
>+/* PR tree-optimization/84503 */
>+/* { dg-do run } */
>+/* { dg-options "-O3" } */
>+
>+typedef __SIZE_TYPE__ size_t;
>+typedef __UINTPTR_TYPE__ uintptr_t;
>+
>+struct S { int a; unsigned short b; int c, d, e; long f, g, h; int i,
>j; };
>+static struct S *k;
>+static size_t l = 0;
>+int m;
>+
>+static int
>+bar (void)
>+{
>+  unsigned i;
>+  int j;
>+  if (k[0].c == 0)
>+    {
>+      ++m;
>+      size_t n = l * 2;
>+      struct S *o;
>+      o = (struct S *) __builtin_realloc (k, sizeof (struct S) * n);
>+      if (!o)
>+	__builtin_exit (0);
>+      k = o;
>+      for (i = l; i < n; i++)
>+	{
>+	  void *p = (void *) &k[i];
>+	  int q = 0;
>+	  size_t r = sizeof (struct S);
>+	  if ((((uintptr_t) p) % __alignof__ (long)) == 0
>+	      && r % sizeof (long) == 0)
>+	    {
>+	      long __attribute__ ((may_alias)) *s = (long *) p;
>+	      long *t = (long *) ((char *) s + r);
>+	      while (s < t)
>+		*s++ = 0;
>+	    }
>+	  else
>+	    __builtin_memset (p, q, r);
>+	  k[i].c = i + 1;
>+	  k[i].a = -1;
>+	}
>+      k[n - 1].c = 0;
>+      k[0].c = l;
>+      l = n;
>+    }
>+  j = k[0].c;
>+  k[0].c = k[j].c;
>+  return j;
>+}
>+
>+int
>+main ()
>+{
>+  k = (struct S *) __builtin_malloc (sizeof (struct S));
>+  if (!k)
>+    __builtin_exit (0);
>+  __builtin_memset (k, '\0', sizeof (struct S));
>+  k->a = -1;
>+  l = 1;
>+  for (int i = 0; i < 15; ++i)
>+    bar ();
>+  if (m != 4)
>+    __builtin_abort ();
>+  return 0;
>+}
>--- gcc/testsuite/gcc.dg/pr84503-2.c.jj	2018-02-21 20:36:41.874226585
>+0100
>+++ gcc/testsuite/gcc.dg/pr84503-2.c	2018-02-21 20:36:59.875227757
>+0100
>@@ -0,0 +1,5 @@
>+/* PR tree-optimization/84503 */
>+/* { dg-do run } */
>+/* { dg-options "-O3 -fno-tree-vectorize -fno-ivopts" } */
>+
>+#include "pr84503-1.c"
>
>	Jakub


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]