This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix store merging wrong-code (PR tree-optimization/87859)


On Mon, 5 Nov 2018, Jakub Jelinek wrote:

> Hi!
> 
> My recent change for PR86844 broke the following testcases.
> 
> The problem is that we are processing the stores in the bitpos/bitsize
> order.  The change PR86844 change was about if we try to merge two
> overlapping stores of INTEGER_CST, we need to go through all other stores
> that are overlapping those and come in program order before the last of
> those; if they are INTEGER_CST stores, we can merge them, if they are other
> stores, we punt completely.  If there are any stores ordered in the
> bitpos/bitsize ordering in between those (thus also overlapping), we keep
> them unmerged as is.
> 
> The problem is that if we this way skip any stores, we need to ensure that
> we don't merge this store group with any statements beyond that, because
> merged store group is stored at the point of the last statement, and that
> would overwrite the skipped stores.  Let's use the testcase:
>   i.f.i[0] = 0; // bitpos 0, bitsize 32
>   i.f.i[1] = 0; // bitpos 32, bitsize 32
>   i.f.i[2] = 0; // bitpos 64, bitsize 32
>   i.f.q.f7 = 1; // all other stores are bitpos the number after f, bitsize 1
>   i.f.q.f2 = 1;
>   i.f.q.f21 = 1;
>   i.f.q.f19 = 1;
>   i.f.q.f14 = 1;
>   i.f.q.f5 = 1;
>   i.f.q.f0 = 1;
>   i.f.q.f15 = 1;
>   i.f.q.f16 = 1;
>   i.f.q.f6 = 1;
>   i.f.q.f9 = 1;
>   i.f.q.f17 = 1;
>   i.f.q.f1 = 1;
>   i.f.q.f8 = 1;
>   i.f.q.f13 = 1;
>   i.f.q.f66 = 1;
> In the bitpos/bitsize order, this is:
>   i.f.i[0] = 0; // bitpos 0, bitsize 32
>   i.f.q.f0 = 1;
>   i.f.q.f1 = 1;
>   i.f.q.f2 = 1;
>   i.f.q.f5 = 1;
>   i.f.q.f6 = 1;
>   i.f.q.f7 = 1;
>   i.f.q.f8 = 1;
>   i.f.q.f9 = 1;
>   i.f.q.f13 = 1;
>   i.f.q.f14 = 1;
>   i.f.q.f15 = 1;
>   i.f.q.f16 = 1;
>   i.f.q.f17 = 1;
>   i.f.q.f19 = 1;
>   i.f.q.f21 = 1;
>   i.f.i[1] = 0; // bitpos 32, bitsize 32
>   i.f.i[2] = 0; // bitpos 64, bitsize 32
>   i.f.q.f66 = 1;
> and when trying to merge overlapping i.f.i[0] = 0; with i.f.q.f0 = 1;,
> we also merge in overlapping f7, f2, f21, f19, f14, f5 and skip
> overlapping, but coming after the f0 store, stores f1 .. f17 other than the
> above mentioned ones (so those stores remain in the source).  Now, in this
> case the merged store group would be emitted where the f0 store is and the
> optimization would be still correct.  The problem is that we keep going and
> merge that store group with i[1] and i[2] stores (that is ok, those are
> before the f0 store, so again, still at the spot of f0 store), but then also
> merge with f66 store and that makes it incorrect, the
>   i.f.q.f15 = 1;
>   i.f.q.f16 = 1;
>   i.f.q.f6 = 1;
>   i.f.q.f9 = 1;
>   i.f.q.f17 = 1;
>   i.f.q.f1 = 1;
>   i.f.q.f8 = 1;
>   i.f.q.f13 = 1;
> stores remain in the IL, but because the last store of the merged group is
> now at f66 store, that is where those bits are overwritten with the store
> of 96 bits together.  The following patch fixes that by making sure that if
> we skip this way any stores (keep them in the IL), then we ensure that we
> don't grow the store group beyond that last_order point.
> 
> Now, the first testcase also shows that we skip those stmts completely
> unnecessarily, while they are overlapping stores that come after the
> latter of the two overlapping stores being merged, they are still
> INTEGER_CST stores and in the past we've merged them all into a constant
> store just fine.  The reason PR86844 was done is that there can be
> non-INTEGER_CST stores that prevent the merging.  So, this patch also
> remembers which of the overlapping stores we would skip are INTEGER_CST
> stores and which are other stores; if we see any to be skipped INTEGER_CSTs,
> we retry the analysis with including those too; if that works out, we use
> that, if it doesn't, we revert to what we chose before (skipping those
> stores with magic to ensure we don't merge with other stores later in
> program order).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 8.3?

OK.

Thanks,
Richard.

> 2018-11-05  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/87859
> 	* gimple-ssa-store-merging.c (struct merged_store_group): Add
> 	only_constants and first_nonmergeable_order members.
> 	(merged_store_group::merged_store_group): Initialize them.
> 	(merged_store_group::do_merge): Clear only_constants member if
> 	adding something other than INTEGER_CST store.
> 	(imm_store_chain_info::coalesce_immediate_stores): Don't merge
> 	stores with order >= first_nonmergeable_order.  Use
> 	merged_store->only_constants instead of always recomputing it.
> 	Set merged_store->first_nonmergeable_order if we've skipped any
> 	stores.  Attempt to merge overlapping INTEGER_CST stores that
> 	we would otherwise skip.
> 
> 	* gcc.dg/store_merging_24.c: New test.
> 	* gcc.dg/store_merging_25.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.c.jj	2018-11-02 15:36:30.802999681 +0100
> +++ gcc/gimple-ssa-store-merging.c	2018-11-05 10:20:25.980370963 +0100
> @@ -1429,6 +1429,8 @@ struct merged_store_group
>    unsigned int first_order;
>    unsigned int last_order;
>    bool bit_insertion;
> +  bool only_constants;
> +  unsigned int first_nonmergeable_order;
>  
>    auto_vec<store_immediate_info *> stores;
>    /* We record the first and last original statements in the sequence because
> @@ -1821,6 +1823,8 @@ merged_store_group::merged_store_group (
>    val = NULL;
>    mask = NULL;
>    bit_insertion = false;
> +  only_constants = info->rhs_code == INTEGER_CST;
> +  first_nonmergeable_order = ~0U;
>    unsigned HOST_WIDE_INT align_bitpos = 0;
>    get_object_alignment_1 (gimple_assign_lhs (info->stmt),
>  			  &align, &align_bitpos);
> @@ -1936,6 +1940,8 @@ merged_store_group::do_merge (store_imme
>        first_order = info->order;
>        first_stmt = stmt;
>      }
> +  if (info->rhs_code != INTEGER_CST)
> +    only_constants = false;
>  }
>  
>  /* Merge a store recorded by INFO into this merged store.
> @@ -2696,32 +2702,25 @@ imm_store_chain_info::coalesce_immediate
>  	    }
>  	}
>  
> +      if (info->order >= merged_store->first_nonmergeable_order)
> +	;
> +
>        /* |---store 1---|
>  	       |---store 2---|
>  	 Overlapping stores.  */
> -      if (IN_RANGE (info->bitpos, merged_store->start,
> -		    merged_store->start + merged_store->width - 1))
> +      else if (IN_RANGE (info->bitpos, merged_store->start,
> +			 merged_store->start + merged_store->width - 1))
>  	{
>  	  /* Only allow overlapping stores of constants.  */
> -	  if (info->rhs_code == INTEGER_CST)
> +	  if (info->rhs_code == INTEGER_CST && merged_store->only_constants)
>  	    {
> -	      bool only_constants = true;
> -	      store_immediate_info *infoj;
> -	      unsigned int j;
> -	      FOR_EACH_VEC_ELT (merged_store->stores, j, infoj)
> -		if (infoj->rhs_code != INTEGER_CST)
> -		  {
> -		    only_constants = false;
> -		    break;
> -		  }
>  	      unsigned int last_order
>  		= MAX (merged_store->last_order, info->order);
>  	      unsigned HOST_WIDE_INT end
>  		= MAX (merged_store->start + merged_store->width,
>  		       info->bitpos + info->bitsize);
> -	      if (only_constants
> -		  && check_no_overlap (m_store_info, i, INTEGER_CST,
> -				       last_order, end))
> +	      if (check_no_overlap (m_store_info, i, INTEGER_CST,
> +				    last_order, end))
>  		{
>  		  /* check_no_overlap call above made sure there are no
>  		     overlapping stores with non-INTEGER_CST rhs_code
> @@ -2741,36 +2740,97 @@ imm_store_chain_info::coalesce_immediate
>  		     store_by_bitpos order it comes after the last store that
>  		     we can't merge with them.  We can merge the first 3 stores
>  		     and keep the last store as is though.  */
> -		  unsigned int len = m_store_info.length (), k = i;
> -		  for (unsigned int j = i + 1; j < len; ++j)
> +		  unsigned int len = m_store_info.length ();
> +		  unsigned int try_order = last_order;
> +		  unsigned int first_nonmergeable_order;
> +		  unsigned int k;
> +		  bool last_iter = false;
> +		  int attempts = 0;
> +		  do
>  		    {
> -		      store_immediate_info *info2 = m_store_info[j];
> -		      if (info2->bitpos >= end)
> -			break;
> -		      if (info2->order < last_order)
> +		      unsigned int max_order = 0;
> +		      unsigned first_nonmergeable_int_order = ~0U;
> +		      unsigned HOST_WIDE_INT this_end = end;
> +		      k = i;
> +		      first_nonmergeable_order = ~0U;
> +		      for (unsigned int j = i + 1; j < len; ++j)
>  			{
> -			  if (info2->rhs_code != INTEGER_CST)
> +			  store_immediate_info *info2 = m_store_info[j];
> +			  if (info2->bitpos >= this_end)
> +			    break;
> +			  if (info2->order < try_order)
>  			    {
> -			      /* Normally check_no_overlap makes sure this
> -				 doesn't happen, but if end grows below, then
> -				 we need to process more stores than
> -				 check_no_overlap verified.  Example:
> -				    MEM[(int *)p_5] = 0;
> -				    MEM[(short *)p_5 + 3B] = 1;
> -				    MEM[(char *)p_5 + 4B] = _9;
> -				    MEM[(char *)p_5 + 2B] = 2;  */
> -			      k = 0;
> -			      break;
> +			      if (info2->rhs_code != INTEGER_CST)
> +				{
> +				  /* Normally check_no_overlap makes sure this
> +				     doesn't happen, but if end grows below,
> +				     then we need to process more stores than
> +				     check_no_overlap verified.  Example:
> +				      MEM[(int *)p_5] = 0;
> +				      MEM[(short *)p_5 + 3B] = 1;
> +				      MEM[(char *)p_5 + 4B] = _9;
> +				      MEM[(char *)p_5 + 2B] = 2;  */
> +				  k = 0;
> +				  break;
> +				}
> +			      k = j;
> +			      this_end = MAX (this_end,
> +					      info2->bitpos + info2->bitsize);
>  			    }
> -			  k = j;
> -			  end = MAX (end, info2->bitpos + info2->bitsize);
> +			  else if (info2->rhs_code == INTEGER_CST
> +				   && !last_iter)
> +			    {
> +			      max_order = MAX (max_order, info2->order + 1);
> +			      first_nonmergeable_int_order
> +				= MIN (first_nonmergeable_int_order,
> +				       info2->order);
> +			    }
> +			  else
> +			    first_nonmergeable_order
> +			      = MIN (first_nonmergeable_order, info2->order);
>  			}
> +		      if (k == 0)
> +			{
> +			  if (last_order == try_order)
> +			    break;
> +			  /* If this failed, but only because we grew
> +			     try_order, retry with the last working one,
> +			     so that we merge at least something.  */
> +			  try_order = last_order;
> +			  last_iter = true;
> +			  continue;
> +			}
> +		      last_order = try_order;
> +		      /* Retry with a larger try_order to see if we could
> +			 merge some further INTEGER_CST stores.  */
> +		      if (max_order
> +			  && (first_nonmergeable_int_order
> +			      < first_nonmergeable_order))
> +			{
> +			  try_order = MIN (max_order,
> +					   first_nonmergeable_order);
> +			  try_order
> +			    = MIN (try_order,
> +				   merged_store->first_nonmergeable_order);
> +			  if (try_order > last_order && ++attempts < 16)
> +			    continue;
> +			}
> +		      first_nonmergeable_order
> +			= MIN (first_nonmergeable_order,
> +			       first_nonmergeable_int_order);
> +		      end = this_end;
> +		      break;
>  		    }
> +		  while (1);
>  
>  		  if (k != 0)
>  		    {
>  		      merged_store->merge_overlapping (info);
>  
> +		      merged_store->first_nonmergeable_order
> +			= MIN (merged_store->first_nonmergeable_order,
> +			       first_nonmergeable_order);
> +
>  		      for (unsigned int j = i + 1; j <= k; j++)
>  			{
>  			  store_immediate_info *info2 = m_store_info[j];
> @@ -2778,7 +2838,8 @@ imm_store_chain_info::coalesce_immediate
>  			  if (info2->order < last_order)
>  			    {
>  			      gcc_assert (info2->rhs_code == INTEGER_CST);
> -			      merged_store->merge_overlapping (info2);
> +			      if (info != info2)
> +				merged_store->merge_overlapping (info2);
>  			    }
>  			  /* Other stores are kept and not merged in any
>  			     way.  */
> --- gcc/testsuite/gcc.dg/store_merging_24.c.jj	2018-11-02 15:36:59.829518310 +0100
> +++ gcc/testsuite/gcc.dg/store_merging_24.c	2018-11-02 15:36:59.829518310 +0100
> @@ -0,0 +1,75 @@
> +/* PR tree-optimization/87859 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-store-merging-details" } */
> +/* { dg-final { scan-tree-dump "New sequence of \[23] stores to replace old one of 19 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "New sequence of 1 stores to replace old one of 6 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */
> +
> +struct S {
> +  union F {
> +    struct T {
> +#define A(n) unsigned n:1;
> +#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) \
> +	     A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
> +      B(f) B(f1) B(f2) B(f3) B(f4) B(f5)
> +      A(f60) A(f61) A(f62) A(f63) A(f64) A(f65) A(f66)
> +    } q;
> +    unsigned int i[3];
> +  } f;
> +};
> +
> +struct S s = {
> +  .f.q.f0 = 1, .f.q.f1 = 1, .f.q.f2 = 1, .f.q.f5 = 1, .f.q.f6 = 1,
> +  .f.q.f7 = 1, .f.q.f8 = 1, .f.q.f9 = 1, .f.q.f13 = 1, .f.q.f14 = 1,
> +  .f.q.f15 = 1, .f.q.f16 = 1, .f.q.f17 = 1, .f.q.f19 = 1, .f.q.f21 = 1,
> +  .f.q.f66 = 1
> +};
> +
> +__attribute__((noipa)) void
> +bar (unsigned *x)
> +{
> +  if (x[0] != s.f.i[0] || x[1] != s.f.i[1] || x[2] != s.f.i[2])
> +    __builtin_abort ();
> +}
> +
> +__attribute__((noipa)) void
> +foo (unsigned char *state)
> +{
> +  struct S i;
> +  i.f.i[0] = 0;
> +  i.f.i[1] = 0;
> +  i.f.i[2] = 0;
> +  i.f.q.f7 = 1;
> +  i.f.q.f2 = 1;
> +  i.f.q.f21 = 1;
> +  i.f.q.f19 = 1;
> +  i.f.q.f14 = 1;
> +  i.f.q.f5 = 1;
> +  i.f.q.f0 = 1;
> +  i.f.q.f15 = 1;
> +  i.f.q.f16 = 1;
> +  i.f.q.f6 = 1;
> +  i.f.q.f9 = 1;
> +  i.f.q.f17 = 1;
> +  i.f.q.f1 = 1;
> +  i.f.q.f8 = 1;
> +  i.f.q.f13 = 1;
> +  i.f.q.f66 = 1;
> +  if (*state)
> +    {
> +      i.f.q.f37 = 1;
> +      i.f.q.f38 = 1;
> +      i.f.q.f39 = 1;
> +      i.f.q.f40 = 1;
> +      i.f.q.f41 = 1;
> +      i.f.q.f36 = 1;
> +    }
> +  bar (i.f.i);
> +}
> +
> +int
> +main ()
> +{
> +  unsigned char z = 0;
> +  foo (&z);
> +  return 0;
> +}
> --- gcc/testsuite/gcc.dg/store_merging_25.c.jj	2018-11-05 10:28:41.544249761 +0100
> +++ gcc/testsuite/gcc.dg/store_merging_25.c	2018-11-05 10:28:04.033864475 +0100
> @@ -0,0 +1,75 @@
> +/* PR tree-optimization/87859 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-store-merging-details" } */
> +/* { dg-final { scan-tree-dump "New sequence of \[23] stores to replace old one of 14 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "New sequence of 1 stores to replace old one of 6 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */
> +
> +struct S {
> +  union F {
> +    struct T {
> +#define A(n) unsigned n:1;
> +#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) \
> +	     A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
> +      B(f) B(f1) B(f2) B(f3) B(f4) B(f5)
> +      A(f60) A(f61) A(f62) A(f63) A(f64) A(f65) A(f66)
> +    } q;
> +    unsigned int i[3];
> +  } f;
> +};
> +
> +struct S s = {
> +  .f.q.f0 = 1, .f.q.f1 = 1, .f.q.f2 = 1, .f.q.f5 = 1, .f.q.f6 = 1,
> +  .f.q.f7 = 1, .f.q.f8 = 1, .f.q.f9 = 1, .f.q.f13 = 1, .f.q.f14 = 1,
> +  .f.q.f15 = 1, .f.q.f16 = 1, .f.q.f17 = 1, .f.q.f19 = 1, .f.q.f21 = 1,
> +  .f.q.f66 = 1
> +};
> +
> +__attribute__((noipa)) void
> +bar (unsigned *x)
> +{
> +  if (x[0] != s.f.i[0] || x[1] != s.f.i[1] || x[2] != s.f.i[2])
> +    __builtin_abort ();
> +}
> +
> +__attribute__((noipa)) void
> +foo (unsigned char *state, unsigned char c)
> +{
> +  struct S i;
> +  i.f.i[0] = 0;
> +  i.f.i[1] = 0;
> +  i.f.i[2] = 0;
> +  i.f.q.f7 = 1;
> +  i.f.q.f2 = 1;
> +  i.f.q.f21 = 1;
> +  i.f.q.f19 = 1;
> +  i.f.q.f14 = 1;
> +  i.f.q.f5 = 1;
> +  i.f.q.f0 = 1;
> +  i.f.q.f15 = 1;
> +  i.f.q.f16 = 1;
> +  i.f.q.f6 = 1;
> +  i.f.q.f9 = 1;
> +  i.f.q.f17 = c;
> +  i.f.q.f1 = 1;
> +  i.f.q.f8 = 1;
> +  i.f.q.f13 = 1;
> +  i.f.q.f66 = 1;
> +  if (*state)
> +    {
> +      i.f.q.f37 = 1;
> +      i.f.q.f38 = 1;
> +      i.f.q.f39 = 1;
> +      i.f.q.f40 = 1;
> +      i.f.q.f41 = 1;
> +      i.f.q.f36 = 1;
> +    }
> +  bar (i.f.i);
> +}
> +
> +int
> +main ()
> +{
> +  unsigned char z = 0;
> +  foo (&z, 1);
> +  return 0;
> +}
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]