This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix store merging wrong-code (PR tree-optimization/87859)
- From: Richard Biener <rguenther at suse dot de>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Mon, 5 Nov 2018 11:04:43 +0100 (CET)
- Subject: Re: [PATCH] Fix store merging wrong-code (PR tree-optimization/87859)
- References: <20181105095642.GN11625@tucnak>
On Mon, 5 Nov 2018, Jakub Jelinek wrote:
> Hi!
>
> My recent change for PR86844 broke the following testcases.
>
> The problem is that we are processing the stores in the bitpos/bitsize
> order. The change PR86844 change was about if we try to merge two
> overlapping stores of INTEGER_CST, we need to go through all other stores
> that are overlapping those and come in program order before the last of
> those; if they are INTEGER_CST stores, we can merge them, if they are other
> stores, we punt completely. If there are any stores ordered in the
> bitpos/bitsize ordering in between those (thus also overlapping), we keep
> them unmerged as is.
>
> The problem is that if we this way skip any stores, we need to ensure that
> we don't merge this store group with any statements beyond that, because
> merged store group is stored at the point of the last statement, and that
> would overwrite the skipped stores. Let's use the testcase:
> i.f.i[0] = 0; // bitpos 0, bitsize 32
> i.f.i[1] = 0; // bitpos 32, bitsize 32
> i.f.i[2] = 0; // bitpos 64, bitsize 32
> i.f.q.f7 = 1; // all other stores are bitpos the number after f, bitsize 1
> i.f.q.f2 = 1;
> i.f.q.f21 = 1;
> i.f.q.f19 = 1;
> i.f.q.f14 = 1;
> i.f.q.f5 = 1;
> i.f.q.f0 = 1;
> i.f.q.f15 = 1;
> i.f.q.f16 = 1;
> i.f.q.f6 = 1;
> i.f.q.f9 = 1;
> i.f.q.f17 = 1;
> i.f.q.f1 = 1;
> i.f.q.f8 = 1;
> i.f.q.f13 = 1;
> i.f.q.f66 = 1;
> In the bitpos/bitsize order, this is:
> i.f.i[0] = 0; // bitpos 0, bitsize 32
> i.f.q.f0 = 1;
> i.f.q.f1 = 1;
> i.f.q.f2 = 1;
> i.f.q.f5 = 1;
> i.f.q.f6 = 1;
> i.f.q.f7 = 1;
> i.f.q.f8 = 1;
> i.f.q.f9 = 1;
> i.f.q.f13 = 1;
> i.f.q.f14 = 1;
> i.f.q.f15 = 1;
> i.f.q.f16 = 1;
> i.f.q.f17 = 1;
> i.f.q.f19 = 1;
> i.f.q.f21 = 1;
> i.f.i[1] = 0; // bitpos 32, bitsize 32
> i.f.i[2] = 0; // bitpos 64, bitsize 32
> i.f.q.f66 = 1;
> and when trying to merge overlapping i.f.i[0] = 0; with i.f.q.f0 = 1;,
> we also merge in overlapping f7, f2, f21, f19, f14, f5 and skip
> overlapping, but coming after the f0 store, stores f1 .. f17 other than the
> above mentioned ones (so those stores remain in the source). Now, in this
> case the merged store group would be emitted where the f0 store is and the
> optimization would be still correct. The problem is that we keep going and
> merge that store group with i[1] and i[2] stores (that is ok, those are
> before the f0 store, so again, still at the spot of f0 store), but then also
> merge with f66 store and that makes it incorrect, the
> i.f.q.f15 = 1;
> i.f.q.f16 = 1;
> i.f.q.f6 = 1;
> i.f.q.f9 = 1;
> i.f.q.f17 = 1;
> i.f.q.f1 = 1;
> i.f.q.f8 = 1;
> i.f.q.f13 = 1;
> stores remain in the IL, but because the last store of the merged group is
> now at f66 store, that is where those bits are overwritten with the store
> of 96 bits together. The following patch fixes that by making sure that if
> we skip this way any stores (keep them in the IL), then we ensure that we
> don't grow the store group beyond that last_order point.
>
> Now, the first testcase also shows that we skip those stmts completely
> unnecessarily, while they are overlapping stores that come after the
> latter of the two overlapping stores being merged, they are still
> INTEGER_CST stores and in the past we've merged them all into a constant
> store just fine. The reason PR86844 was done is that there can be
> non-INTEGER_CST stores that prevent the merging. So, this patch also
> remembers which of the overlapping stores we would skip are INTEGER_CST
> stores and which are other stores; if we see any to be skipped INTEGER_CSTs,
> we retry the analysis with including those too; if that works out, we use
> that, if it doesn't, we revert to what we chose before (skipping those
> stores with magic to ensure we don't merge with other stores later in
> program order).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 8.3?
OK.
Thanks,
Richard.
> 2018-11-05 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/87859
> * gimple-ssa-store-merging.c (struct merged_store_group): Add
> only_constants and first_nonmergeable_order members.
> (merged_store_group::merged_store_group): Initialize them.
> (merged_store_group::do_merge): Clear only_constants member if
> adding something other than INTEGER_CST store.
> (imm_store_chain_info::coalesce_immediate_stores): Don't merge
> stores with order >= first_nonmergeable_order. Use
> merged_store->only_constants instead of always recomputing it.
> Set merged_store->first_nonmergeable_order if we've skipped any
> stores. Attempt to merge overlapping INTEGER_CST stores that
> we would otherwise skip.
>
> * gcc.dg/store_merging_24.c: New test.
> * gcc.dg/store_merging_25.c: New test.
>
> --- gcc/gimple-ssa-store-merging.c.jj 2018-11-02 15:36:30.802999681 +0100
> +++ gcc/gimple-ssa-store-merging.c 2018-11-05 10:20:25.980370963 +0100
> @@ -1429,6 +1429,8 @@ struct merged_store_group
> unsigned int first_order;
> unsigned int last_order;
> bool bit_insertion;
> + bool only_constants;
> + unsigned int first_nonmergeable_order;
>
> auto_vec<store_immediate_info *> stores;
> /* We record the first and last original statements in the sequence because
> @@ -1821,6 +1823,8 @@ merged_store_group::merged_store_group (
> val = NULL;
> mask = NULL;
> bit_insertion = false;
> + only_constants = info->rhs_code == INTEGER_CST;
> + first_nonmergeable_order = ~0U;
> unsigned HOST_WIDE_INT align_bitpos = 0;
> get_object_alignment_1 (gimple_assign_lhs (info->stmt),
> &align, &align_bitpos);
> @@ -1936,6 +1940,8 @@ merged_store_group::do_merge (store_imme
> first_order = info->order;
> first_stmt = stmt;
> }
> + if (info->rhs_code != INTEGER_CST)
> + only_constants = false;
> }
>
> /* Merge a store recorded by INFO into this merged store.
> @@ -2696,32 +2702,25 @@ imm_store_chain_info::coalesce_immediate
> }
> }
>
> + if (info->order >= merged_store->first_nonmergeable_order)
> + ;
> +
> /* |---store 1---|
> |---store 2---|
> Overlapping stores. */
> - if (IN_RANGE (info->bitpos, merged_store->start,
> - merged_store->start + merged_store->width - 1))
> + else if (IN_RANGE (info->bitpos, merged_store->start,
> + merged_store->start + merged_store->width - 1))
> {
> /* Only allow overlapping stores of constants. */
> - if (info->rhs_code == INTEGER_CST)
> + if (info->rhs_code == INTEGER_CST && merged_store->only_constants)
> {
> - bool only_constants = true;
> - store_immediate_info *infoj;
> - unsigned int j;
> - FOR_EACH_VEC_ELT (merged_store->stores, j, infoj)
> - if (infoj->rhs_code != INTEGER_CST)
> - {
> - only_constants = false;
> - break;
> - }
> unsigned int last_order
> = MAX (merged_store->last_order, info->order);
> unsigned HOST_WIDE_INT end
> = MAX (merged_store->start + merged_store->width,
> info->bitpos + info->bitsize);
> - if (only_constants
> - && check_no_overlap (m_store_info, i, INTEGER_CST,
> - last_order, end))
> + if (check_no_overlap (m_store_info, i, INTEGER_CST,
> + last_order, end))
> {
> /* check_no_overlap call above made sure there are no
> overlapping stores with non-INTEGER_CST rhs_code
> @@ -2741,36 +2740,97 @@ imm_store_chain_info::coalesce_immediate
> store_by_bitpos order it comes after the last store that
> we can't merge with them. We can merge the first 3 stores
> and keep the last store as is though. */
> - unsigned int len = m_store_info.length (), k = i;
> - for (unsigned int j = i + 1; j < len; ++j)
> + unsigned int len = m_store_info.length ();
> + unsigned int try_order = last_order;
> + unsigned int first_nonmergeable_order;
> + unsigned int k;
> + bool last_iter = false;
> + int attempts = 0;
> + do
> {
> - store_immediate_info *info2 = m_store_info[j];
> - if (info2->bitpos >= end)
> - break;
> - if (info2->order < last_order)
> + unsigned int max_order = 0;
> + unsigned first_nonmergeable_int_order = ~0U;
> + unsigned HOST_WIDE_INT this_end = end;
> + k = i;
> + first_nonmergeable_order = ~0U;
> + for (unsigned int j = i + 1; j < len; ++j)
> {
> - if (info2->rhs_code != INTEGER_CST)
> + store_immediate_info *info2 = m_store_info[j];
> + if (info2->bitpos >= this_end)
> + break;
> + if (info2->order < try_order)
> {
> - /* Normally check_no_overlap makes sure this
> - doesn't happen, but if end grows below, then
> - we need to process more stores than
> - check_no_overlap verified. Example:
> - MEM[(int *)p_5] = 0;
> - MEM[(short *)p_5 + 3B] = 1;
> - MEM[(char *)p_5 + 4B] = _9;
> - MEM[(char *)p_5 + 2B] = 2; */
> - k = 0;
> - break;
> + if (info2->rhs_code != INTEGER_CST)
> + {
> + /* Normally check_no_overlap makes sure this
> + doesn't happen, but if end grows below,
> + then we need to process more stores than
> + check_no_overlap verified. Example:
> + MEM[(int *)p_5] = 0;
> + MEM[(short *)p_5 + 3B] = 1;
> + MEM[(char *)p_5 + 4B] = _9;
> + MEM[(char *)p_5 + 2B] = 2; */
> + k = 0;
> + break;
> + }
> + k = j;
> + this_end = MAX (this_end,
> + info2->bitpos + info2->bitsize);
> }
> - k = j;
> - end = MAX (end, info2->bitpos + info2->bitsize);
> + else if (info2->rhs_code == INTEGER_CST
> + && !last_iter)
> + {
> + max_order = MAX (max_order, info2->order + 1);
> + first_nonmergeable_int_order
> + = MIN (first_nonmergeable_int_order,
> + info2->order);
> + }
> + else
> + first_nonmergeable_order
> + = MIN (first_nonmergeable_order, info2->order);
> }
> + if (k == 0)
> + {
> + if (last_order == try_order)
> + break;
> + /* If this failed, but only because we grew
> + try_order, retry with the last working one,
> + so that we merge at least something. */
> + try_order = last_order;
> + last_iter = true;
> + continue;
> + }
> + last_order = try_order;
> + /* Retry with a larger try_order to see if we could
> + merge some further INTEGER_CST stores. */
> + if (max_order
> + && (first_nonmergeable_int_order
> + < first_nonmergeable_order))
> + {
> + try_order = MIN (max_order,
> + first_nonmergeable_order);
> + try_order
> + = MIN (try_order,
> + merged_store->first_nonmergeable_order);
> + if (try_order > last_order && ++attempts < 16)
> + continue;
> + }
> + first_nonmergeable_order
> + = MIN (first_nonmergeable_order,
> + first_nonmergeable_int_order);
> + end = this_end;
> + break;
> }
> + while (1);
>
> if (k != 0)
> {
> merged_store->merge_overlapping (info);
>
> + merged_store->first_nonmergeable_order
> + = MIN (merged_store->first_nonmergeable_order,
> + first_nonmergeable_order);
> +
> for (unsigned int j = i + 1; j <= k; j++)
> {
> store_immediate_info *info2 = m_store_info[j];
> @@ -2778,7 +2838,8 @@ imm_store_chain_info::coalesce_immediate
> if (info2->order < last_order)
> {
> gcc_assert (info2->rhs_code == INTEGER_CST);
> - merged_store->merge_overlapping (info2);
> + if (info != info2)
> + merged_store->merge_overlapping (info2);
> }
> /* Other stores are kept and not merged in any
> way. */
> --- gcc/testsuite/gcc.dg/store_merging_24.c.jj 2018-11-02 15:36:59.829518310 +0100
> +++ gcc/testsuite/gcc.dg/store_merging_24.c 2018-11-02 15:36:59.829518310 +0100
> @@ -0,0 +1,75 @@
> +/* PR tree-optimization/87859 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-store-merging-details" } */
> +/* { dg-final { scan-tree-dump "New sequence of \[23] stores to replace old one of 19 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "New sequence of 1 stores to replace old one of 6 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */
> +
> +struct S {
> + union F {
> + struct T {
> +#define A(n) unsigned n:1;
> +#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) \
> + A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
> + B(f) B(f1) B(f2) B(f3) B(f4) B(f5)
> + A(f60) A(f61) A(f62) A(f63) A(f64) A(f65) A(f66)
> + } q;
> + unsigned int i[3];
> + } f;
> +};
> +
> +struct S s = {
> + .f.q.f0 = 1, .f.q.f1 = 1, .f.q.f2 = 1, .f.q.f5 = 1, .f.q.f6 = 1,
> + .f.q.f7 = 1, .f.q.f8 = 1, .f.q.f9 = 1, .f.q.f13 = 1, .f.q.f14 = 1,
> + .f.q.f15 = 1, .f.q.f16 = 1, .f.q.f17 = 1, .f.q.f19 = 1, .f.q.f21 = 1,
> + .f.q.f66 = 1
> +};
> +
> +__attribute__((noipa)) void
> +bar (unsigned *x)
> +{
> + if (x[0] != s.f.i[0] || x[1] != s.f.i[1] || x[2] != s.f.i[2])
> + __builtin_abort ();
> +}
> +
> +__attribute__((noipa)) void
> +foo (unsigned char *state)
> +{
> + struct S i;
> + i.f.i[0] = 0;
> + i.f.i[1] = 0;
> + i.f.i[2] = 0;
> + i.f.q.f7 = 1;
> + i.f.q.f2 = 1;
> + i.f.q.f21 = 1;
> + i.f.q.f19 = 1;
> + i.f.q.f14 = 1;
> + i.f.q.f5 = 1;
> + i.f.q.f0 = 1;
> + i.f.q.f15 = 1;
> + i.f.q.f16 = 1;
> + i.f.q.f6 = 1;
> + i.f.q.f9 = 1;
> + i.f.q.f17 = 1;
> + i.f.q.f1 = 1;
> + i.f.q.f8 = 1;
> + i.f.q.f13 = 1;
> + i.f.q.f66 = 1;
> + if (*state)
> + {
> + i.f.q.f37 = 1;
> + i.f.q.f38 = 1;
> + i.f.q.f39 = 1;
> + i.f.q.f40 = 1;
> + i.f.q.f41 = 1;
> + i.f.q.f36 = 1;
> + }
> + bar (i.f.i);
> +}
> +
> +int
> +main ()
> +{
> + unsigned char z = 0;
> + foo (&z);
> + return 0;
> +}
> --- gcc/testsuite/gcc.dg/store_merging_25.c.jj 2018-11-05 10:28:41.544249761 +0100
> +++ gcc/testsuite/gcc.dg/store_merging_25.c 2018-11-05 10:28:04.033864475 +0100
> @@ -0,0 +1,75 @@
> +/* PR tree-optimization/87859 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-store-merging-details" } */
> +/* { dg-final { scan-tree-dump "New sequence of \[23] stores to replace old one of 14 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "New sequence of 1 stores to replace old one of 6 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */
> +
> +struct S {
> + union F {
> + struct T {
> +#define A(n) unsigned n:1;
> +#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) \
> + A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
> + B(f) B(f1) B(f2) B(f3) B(f4) B(f5)
> + A(f60) A(f61) A(f62) A(f63) A(f64) A(f65) A(f66)
> + } q;
> + unsigned int i[3];
> + } f;
> +};
> +
> +struct S s = {
> + .f.q.f0 = 1, .f.q.f1 = 1, .f.q.f2 = 1, .f.q.f5 = 1, .f.q.f6 = 1,
> + .f.q.f7 = 1, .f.q.f8 = 1, .f.q.f9 = 1, .f.q.f13 = 1, .f.q.f14 = 1,
> + .f.q.f15 = 1, .f.q.f16 = 1, .f.q.f17 = 1, .f.q.f19 = 1, .f.q.f21 = 1,
> + .f.q.f66 = 1
> +};
> +
> +__attribute__((noipa)) void
> +bar (unsigned *x)
> +{
> + if (x[0] != s.f.i[0] || x[1] != s.f.i[1] || x[2] != s.f.i[2])
> + __builtin_abort ();
> +}
> +
> +__attribute__((noipa)) void
> +foo (unsigned char *state, unsigned char c)
> +{
> + struct S i;
> + i.f.i[0] = 0;
> + i.f.i[1] = 0;
> + i.f.i[2] = 0;
> + i.f.q.f7 = 1;
> + i.f.q.f2 = 1;
> + i.f.q.f21 = 1;
> + i.f.q.f19 = 1;
> + i.f.q.f14 = 1;
> + i.f.q.f5 = 1;
> + i.f.q.f0 = 1;
> + i.f.q.f15 = 1;
> + i.f.q.f16 = 1;
> + i.f.q.f6 = 1;
> + i.f.q.f9 = 1;
> + i.f.q.f17 = c;
> + i.f.q.f1 = 1;
> + i.f.q.f8 = 1;
> + i.f.q.f13 = 1;
> + i.f.q.f66 = 1;
> + if (*state)
> + {
> + i.f.q.f37 = 1;
> + i.f.q.f38 = 1;
> + i.f.q.f39 = 1;
> + i.f.q.f40 = 1;
> + i.f.q.f41 = 1;
> + i.f.q.f36 = 1;
> + }
> + bar (i.f.i);
> +}
> +
> +int
> +main ()
> +{
> + unsigned char z = 0;
> + foo (&z, 1);
> + return 0;
> +}
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)