This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Fix store merging (PR tree-optimization/84503)
- From: Richard Biener <rguenther at suse dot de>
- To: Jakub Jelinek <jakub at redhat dot com>, Jeff Law <law at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Thu, 22 Feb 2018 08:42:43 +0100
- Subject: Re: [PATCH] Fix store merging (PR tree-optimization/84503)
- Authentication-results: sourceware.org; auth=none
- References: <20180221222836.GS5867@tucnak>
On February 21, 2018 11:28:36 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>The long comment above the new check_no_overlap function below should
>hopefully explain the problem. coalesce_immediate_stores is splitting
>the stores from the same base into groups that we are going to optimize
>individually; for each successfully merged group we emit new stores at
>the location of the last store from that group. And
>coalesce_immediate_stores processes stores ordered primarily by
>increasing bit position and only secondarily by the original ordering
>in the basic block.
>On the following testcases, we have the unfortunate ordering of:
> MEM[(long long int *)p_28] = 0;
> MEM[(long long int *)p_28 + 8B] = 0;
> MEM[(long long int *)p_28 + 16B] = 0;
> MEM[(long long int *)p_28 + 24B] = 0;
> _129 = (int) _130;
> MEM[(int *)p_28 + 8B] = _129;
> MEM[(int *)p_28].a = -1;
>We already have
> MEM[(long long int *)p_28] = 0;
> MEM[(int *)p_28].a = -1;
>in the first group (which is fine) and the following 2 stores to
>consider are
> MEM[(long long int *)p_28 + 8B] = 0;
>and
> MEM[(int *)p_28 + 8B] = _129;
>We add the first store to the group, which has the effect of emitting
>the merged stores at the location of the former MEM[(int *)p_28].a = -1;
>store. Then we process MEM[(int *)p_28 + 8B] = _129; (which was handled
>just as a potential bswap possibility); as it overlaps with the previous
>store and can't be merged with the next one either, it becomes its own
>group and thus reuses the original stmt. The net effect is that we've
>reordered the MEM[(long long int *)p_28 + 8B] = 0; store from before the
>MEM[(int *)p_28 + 8B] = _129; store to after it, but those two do
>alias.
>
>Instead of detecting this situation late, where we could only throw
>away the whole group (which would mean we couldn't merge even the
>MEM[(long long int *)p_28] = 0; and MEM[(int *)p_28].a = -1; stores),
>this patch checks before calling merge_into whether the store would
>overlap with later stores we won't be able to emit in the same group,
>and if it does, it terminates the group right away.
>
>I had to fix also the merged_store_group::merge_into width computation;
>it was mistakenly just summing the bits we've added to the group, but
>we really need the distance between the first bit in the group and the
>last bit, including any gaps in between, but excluding gaps before the
>start and after the end (still within the bitregion).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
Richard.
>For 7.x I think we'll need something along the lines of PR82916
>instead; while the same testcase fails there, we only handle stores
>with lhs being a constant, and so the problem is that we aren't
>terminating the chain when we should on the variable store.
>
>2018-02-21 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/84503
> * gimple-ssa-store-merging.c (merged_store_group::merge_into): Compute
> width as info->bitpos + info->bitsize - start.
> (merged_store_group::merge_overlapping): Simplify width computation.
> (check_no_overlap): New function.
> (imm_store_chain_info::try_coalesce_bswap): Compute expected
> start + width and last_order of the group, fail if check_no_overlap
> fails.
> (imm_store_chain_info::coalesce_immediate_stores): Don't merge info
> to group if check_no_overlap fails.
>
> * gcc.dg/pr84503-1.c: New test.
> * gcc.dg/pr84503-2.c: New test.
>
>--- gcc/gimple-ssa-store-merging.c.jj 2018-01-16 09:52:26.230235131 +0100
>+++ gcc/gimple-ssa-store-merging.c 2018-02-21 20:13:45.239129599 +0100
>@@ -1891,12 +1891,11 @@ merged_store_group::do_merge (store_imme
> void
> merged_store_group::merge_into (store_immediate_info *info)
> {
>- unsigned HOST_WIDE_INT wid = info->bitsize;
> /* Make sure we're inserting in the position we think we're inserting. */
> gcc_assert (info->bitpos >= start + width
> && info->bitregion_start <= bitregion_end);
>
>- width += wid;
>+ width = info->bitpos + info->bitsize - start;
> do_merge (info);
> }
>
>@@ -1909,7 +1908,7 @@ merged_store_group::merge_overlapping (s
> {
> /* If the store extends the size of the group, extend the width. */
> if (info->bitpos + info->bitsize > start + width)
>- width += info->bitpos + info->bitsize - (start + width);
>+ width = info->bitpos + info->bitsize - start;
>
> do_merge (info);
> }
>@@ -2304,6 +2303,55 @@ gather_bswap_load_refs (vec<tree> *refs,
> }
> }
>
>+/* Check if there are any stores in M_STORE_INFO after index I
>+ (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap
>+ a potential group ending with END that have their order
>+ smaller than LAST_ORDER. RHS_CODE is the kind of store in the
>+ group. Return true if there are no such stores.
>+ Consider:
>+ MEM[(long long int *)p_28] = 0;
>+ MEM[(long long int *)p_28 + 8B] = 0;
>+ MEM[(long long int *)p_28 + 16B] = 0;
>+ MEM[(long long int *)p_28 + 24B] = 0;
>+ _129 = (int) _130;
>+ MEM[(int *)p_28 + 8B] = _129;
>+ MEM[(int *)p_28].a = -1;
>+ We already have
>+ MEM[(long long int *)p_28] = 0;
>+ MEM[(int *)p_28].a = -1;
>+ stmts in the current group and need to consider if it is safe to
>+ add MEM[(long long int *)p_28 + 8B] = 0; store into the same group.
>+ There is an overlap between that store and the MEM[(int *)p_28 + 8B] = _129;
>+ store though, so if we add the MEM[(long long int *)p_28 + 8B] = 0;
>+ into the group and merging of those 3 stores is successful, merged
>+ stmts will be emitted at the latest store from that group, i.e.
>+ LAST_ORDER, which is the MEM[(int *)p_28].a = -1; store.
>+ The MEM[(int *)p_28 + 8B] = _129; store that originally follows
>+ the MEM[(long long int *)p_28 + 8B] = 0; would now be before it,
>+ so we need to refuse merging MEM[(long long int *)p_28 + 8B] = 0;
>+ into the group. That way it will be its own store group and will
>+ not be touched. If RHS_CODE is INTEGER_CST and there are overlapping
>+ INTEGER_CST stores, those are mergeable using merge_overlapping,
>+ so don't return false for those. */
>+
>+static bool
>+check_no_overlap (vec<store_immediate_info *> m_store_info, unsigned int i,
>+ enum tree_code rhs_code, unsigned int last_order,
>+ unsigned HOST_WIDE_INT end)
>+{
>+ unsigned int len = m_store_info.length ();
>+ for (++i; i < len; ++i)
>+ {
>+ store_immediate_info *info = m_store_info[i];
>+ if (info->bitpos >= end)
>+ break;
>+ if (info->order < last_order
>+ && (rhs_code != INTEGER_CST || info->rhs_code != INTEGER_CST))
>+ return false;
>+ }
>+ return true;
>+}
>+
> /* Return true if m_store_info[first] and at least one following store
> form a group which store try_size bitsize value which is byte swapped
> from a memory load or some value, or identity from some value.
>@@ -2375,6 +2423,7 @@ imm_store_chain_info::try_coalesce_bswap
> unsigned int last_order = merged_store->last_order;
> gimple *first_stmt = merged_store->first_stmt;
> gimple *last_stmt = merged_store->last_stmt;
>+ unsigned HOST_WIDE_INT end = merged_store->start + merged_store->width;
> store_immediate_info *infof = m_store_info[first];
>
> for (unsigned int i = first; i <= last; ++i)
>@@ -2413,25 +2462,23 @@ imm_store_chain_info::try_coalesce_bswap
> }
> else
> {
>- if (n.base_addr)
>+ if (n.base_addr && n.vuse != this_n.vuse)
> {
>- if (n.vuse != this_n.vuse)
>- {
>- if (vuse_store == 0)
>- return false;
>- vuse_store = 1;
>- }
>- if (info->order > last_order)
>- {
>- last_order = info->order;
>- last_stmt = info->stmt;
>- }
>- else if (info->order < first_order)
>- {
>- first_order = info->order;
>- first_stmt = info->stmt;
>- }
>+ if (vuse_store == 0)
>+ return false;
>+ vuse_store = 1;
> }
>+ if (info->order > last_order)
>+ {
>+ last_order = info->order;
>+ last_stmt = info->stmt;
>+ }
>+ else if (info->order < first_order)
>+ {
>+ first_order = info->order;
>+ first_stmt = info->stmt;
>+ }
>+ end = MAX (end, info->bitpos + info->bitsize);
>
> ins_stmt = perform_symbolic_merge (ins_stmt, &n, info->ins_stmt,
> &this_n, &n);
>@@ -2452,6 +2499,9 @@ imm_store_chain_info::try_coalesce_bswap
> if (n.base_addr == NULL_TREE && !is_gimple_val (n.src))
> return false;
>
>+ if (!check_no_overlap (m_store_info, last, LROTATE_EXPR, last_order, end))
>+ return false;
>+
> /* Don't handle memory copy this way if normal non-bswap processing
> would handle it too. */
> if (n.n == cmpnop && (unsigned) n.n_ops == last - first + 1)
>@@ -2633,7 +2683,13 @@ imm_store_chain_info::coalesce_immediate
> : !info->ops[0].base_addr)
> && (infof->ops[1].base_addr
> ? compatible_load_p (merged_store, info, base_addr, 1)
>- : !info->ops[1].base_addr))
>+ : !info->ops[1].base_addr)
>+ && check_no_overlap (m_store_info, i, info->rhs_code,
>+ MAX (merged_store->last_order,
>+ info->order),
>+ MAX (merged_store->start
>+ + merged_store->width,
>+ info->bitpos + info->bitsize)))
> {
> merged_store->merge_into (info);
> continue;
>--- gcc/testsuite/gcc.dg/pr84503-1.c.jj 2018-02-21 20:36:33.474226039 +0100
>+++ gcc/testsuite/gcc.dg/pr84503-1.c 2018-02-21 20:20:53.144165116 +0100
>@@ -0,0 +1,68 @@
>+/* PR tree-optimization/84503 */
>+/* { dg-do run } */
>+/* { dg-options "-O3" } */
>+
>+typedef __SIZE_TYPE__ size_t;
>+typedef __UINTPTR_TYPE__ uintptr_t;
>+
>+struct S { int a; unsigned short b; int c, d, e; long f, g, h; int i, j; };
>+static struct S *k;
>+static size_t l = 0;
>+int m;
>+
>+static int
>+bar (void)
>+{
>+ unsigned i;
>+ int j;
>+ if (k[0].c == 0)
>+ {
>+ ++m;
>+ size_t n = l * 2;
>+ struct S *o;
>+ o = (struct S *) __builtin_realloc (k, sizeof (struct S) * n);
>+ if (!o)
>+ __builtin_exit (0);
>+ k = o;
>+ for (i = l; i < n; i++)
>+ {
>+ void *p = (void *) &k[i];
>+ int q = 0;
>+ size_t r = sizeof (struct S);
>+ if ((((uintptr_t) p) % __alignof__ (long)) == 0
>+ && r % sizeof (long) == 0)
>+ {
>+ long __attribute__ ((may_alias)) *s = (long *) p;
>+ long *t = (long *) ((char *) s + r);
>+ while (s < t)
>+ *s++ = 0;
>+ }
>+ else
>+ __builtin_memset (p, q, r);
>+ k[i].c = i + 1;
>+ k[i].a = -1;
>+ }
>+ k[n - 1].c = 0;
>+ k[0].c = l;
>+ l = n;
>+ }
>+ j = k[0].c;
>+ k[0].c = k[j].c;
>+ return j;
>+}
>+
>+int
>+main ()
>+{
>+ k = (struct S *) __builtin_malloc (sizeof (struct S));
>+ if (!k)
>+ __builtin_exit (0);
>+ __builtin_memset (k, '\0', sizeof (struct S));
>+ k->a = -1;
>+ l = 1;
>+ for (int i = 0; i < 15; ++i)
>+ bar ();
>+ if (m != 4)
>+ __builtin_abort ();
>+ return 0;
>+}
>--- gcc/testsuite/gcc.dg/pr84503-2.c.jj 2018-02-21 20:36:41.874226585 +0100
>+++ gcc/testsuite/gcc.dg/pr84503-2.c 2018-02-21 20:36:59.875227757 +0100
>@@ -0,0 +1,5 @@
>+/* PR tree-optimization/84503 */
>+/* { dg-do run } */
>+/* { dg-options "-O3 -fno-tree-vectorize -fno-ivopts" } */
>+
>+#include "pr84503-1.c"
>
> Jakub