
Re: [PATCH] Improve store merging to handle load+store or bitwise logicals (PR tree-optimization/78821, take 2)


On Fri, 3 Nov 2017, Jakub Jelinek wrote:

> On Fri, Nov 03, 2017 at 02:14:39PM +0100, Richard Biener wrote:
> > > +/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
> > > +   may clobber REF.  FIRST and LAST must be in the same basic block and
> > > +   have non-NULL vdef.  */
> > > +
> > > +bool
> > > +stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
> > > +{
> > > +  ao_ref r;
> > > +  ao_ref_init (&r, ref);
> > > +  unsigned int count = 0;
> > > +  tree vop = gimple_vdef (last);
> > > +  gimple *stmt;
> > > +
> > > +  gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
> > 
> > EBB would probably work as well, thus we should assert we do not
> > end up visiting a PHI in the loop?
> 
> For a general purpose routine, sure, but this one is in an anonymous namespace
> and meant for use in this pass.  And there it only checks stores from
> the same store group and any other stores intermixed between those.
> The pass at least right now is resetting all of its state at the end of
> basic blocks, so gimple_bb (first) == gimple_bb (last) is indeed always
> guaranteed.  If we ever extend it such that we don't have this guarantee,
> then this assert would fail and then of course it should be adjusted to
> handle whatever is needed.  But do we need to do that right now?

No, we don't.  Just wondered about the assert and the real limitation
of the implementation.
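
For reference, the walk itself can be done along the virtual use-def chain; a
minimal sketch of the idea (illustrative only, not necessarily the exact body
of the posted patch; stmt_may_clobber_ref_p_1, gimple_vuse and
SSA_NAME_DEF_STMT are existing GCC interfaces):

  /* Walk backwards from LAST to FIRST on the virtual operand chain,
     checking every statement except LAST itself against R (LAST is
     exclusive, FIRST inclusive).  This relies on the same-BB guarantee
     asserted above, so the walk cannot hit a PHI before reaching FIRST.
     The posted patch presumably also bounds the walk via COUNT; that is
     omitted here.  */
  stmt = last;
  do
    {
      stmt = SSA_NAME_DEF_STMT (gimple_vuse (stmt));
      if (stmt_may_clobber_ref_p_1 (stmt, &r))
        return true;
    }
  while (stmt != first);
  return false;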

> Note extending store-merging to handle groups of stores within an EBB
> doesn't look useful, since then not all stores would be unconditional.

Yes.

> What could make sense is if we have e.g. a diamond
>      |
>     bb1
>    /  \
>   bb2 bb3
>    \  /
>     bb4
>      |
> and stores are in bb1 and bb4 and no stores in bb2 or bb3 can alias
> with those.  But then we'd likely need full-blown walk_aliased_vdefs
> for this...

Yes.
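
For the record, such a cross-block check could be layered on walk_aliased_vdefs;
a very rough sketch of the shape (illustrative only: clobber_walk_data,
clobber_walker and the *_stmt variables are made-up names, the group's own
member stores are glossed over, and tree-ssa-alias.h has the exact
walk_aliased_vdefs signature):

  /* Per-walk state: stop once the earlier store group is reached and flag
     any other intervening statement that may clobber the reference.  */
  struct clobber_walk_data
  {
    gimple *stop_at;   /* Last statement of the earlier group.  */
    bool clobbered;
  };

  static bool
  clobber_walker (ao_ref *ref, tree vdef, void *data_)
  {
    clobber_walk_data *d = (clobber_walk_data *) data_;
    gimple *def_stmt = SSA_NAME_DEF_STMT (vdef);
    if (def_stmt == d->stop_at)
      return true;   /* Reached the earlier group; stop this chain.  */
    if (stmt_may_clobber_ref_p_1 (def_stmt, ref))
      d->clobbered = true;
    return d->clobbered;
  }

  /* At the use site, walking backwards from the later group; R is an
     ao_ref initialized from the load being checked.  */
  clobber_walk_data d = { earlier_group_last_stmt, false };
  walk_aliased_vdefs (&r, gimple_vuse (later_group_first_stmt),
		      clobber_walker, &d, NULL);
  if (d.clobbered)
    return false;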

> > > +  gimple *first = merged_store->first_stmt;
> > > +  gimple *last = merged_store->last_stmt;
> > > +  unsigned int i;
> > > +  store_immediate_info *infoc;
> > > +  if (info->order < merged_store->first_order)
> > > +    {
> > > +      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > > +	if (stmts_may_clobber_ref_p (info->stmt, first, infoc->ops[idx].val))
> > > +	  return false;
> > > +      first = info->stmt;
> > > +    }
> > > +  else if (info->order > merged_store->last_order)
> > > +    {
> > > +      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > > +	if (stmts_may_clobber_ref_p (last, info->stmt, infoc->ops[idx].val))
> > > +	  return false;
> > > +      last = info->stmt;
> > > +    }
> > > +  if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
> > > +    return false;
> > 
> > Can you comment on what you check in this block?  It first checks
> > all stmts (but not info->stmt itself if it is after last!?) 
> > against
> > all stores that would be added when adding 'info'.  Then it checks
> > from new first to last against the newly added stmt (again
> > excluding that stmt if it was added last).
> 
> The stmts_may_clobber_ref_p routine doesn't check aliasing on the last
> stmt, only on the first stmt and stmts in between.
> 
> Previous iterations have checked FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> that merged_store->first_stmt and stmts in between that and 
> merged_store->last_stmt don't clobber any of the infoc->ops[idx].val
> references and we want to maintain that invariant if we add another store to
> the group.  So, if we are about to extend the range between first_stmt and
> last_stmt, then we need to check all the previously recorded refs against the
> stmts newly added to the range.  Note that at this point the stores are sorted
> by bitpos, not by their order within the basic block, so it is possible that
> a store with a higher bitpos extends the range to earlier or to later stmts.
> 
> And finally the if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
> is checking the reference we are adding against the whole new range.
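
As a concrete (made-up) illustration of why the range can grow in either
direction:

  _1 = t.b;
  s.b = _1;    /* bitpos 32, earlier in the basic block */
  u.x = 1;     /* unrelated intervening store */
  _2 = t.a;
  s.a = _2;    /* bitpos 0, later in the basic block */

The stores are visited in bitpos order, so the group starts with s.a; adding
s.b then extends the stmt range backwards, and u.x = 1 has to be checked both
against the already recorded load of t.a and against the newly added load
of t.b.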
> 
> > > +  if (offset != NULL_TREE)
> > > +    {
> > > +      /* If the access is variable offset then a base decl has to be
> > > +	 address-taken to be able to emit pointer-based stores to it.
> > > +	 ???  We might be able to get away with re-using the original
> > > +	 base up to the first variable part and then wrapping that inside
> > > +	 a BIT_FIELD_REF.  */
> > 
> > Yes, that's what I'd generally recommend...  OTOH it can get quite
> > fugly but it only has to survive until RTL expansion...
> 
> This is a preexisting comment I've just moved from the
> pass_store_merging::execute method into a helper function (it grew too much,
> needed too deep indentation, and furthermore I wanted to use it for the
> loads too).  I haven't really changed anything there.
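
A rough sketch of the BIT_FIELD_REF idea from that ??? comment, purely for
illustration (build3 and bitsize_int are existing interfaces; int_type, base,
bitsize and bitpos here stand for whatever the pass would compute):

  /* Illustrative only: keep the original base up to the first variable
     part and access the merged bits through a
     BIT_FIELD_REF <base, number-of-bits, first-bit-position>
     instead of a pointer-based access.  */
  tree bfr = build3 (BIT_FIELD_REF, int_type, base,
		     bitsize_int (bitsize), bitsize_int (bitpos));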
> 
> > As extra sanity check I'd rather have that all refs share a common
> > base (operand-equal-p'ish).  But I guess that's what usually will
> > happen anyways.  The alias-ptr-type trick will be tricky to do
> > here as well (you have to go down to the base MEM_REF, wrap
> > a decl if there's no MEM_REF and adjust the offset type).
> 
> For the aliasing, I have an untested incremental patch; I need to finish
> testcases for it, then test it, and then I can post it.
> 
> > given the style of processing we can end up doing more than
> > necessary work when following ! single-use chains here, no?
> > Would it make sense to restrict this to single-uses or do we
> > handle any case of ! single-uses?  When extending things to
> > allow an intermediate swap or general permute we could handle
> > a byte/word-splat for example.
> 
> Single-use vs. multiple uses is something I've thought about, but I don't
> know whether it is better to require single-use or not (or only sometimes,
> under some condition?).  Say if we have:
>   _1 = t.a;
>   _2 = t.b;
>   _3 = t.c;
>   _4 = t.d;
>   s.a = _1;
>   s.b = _2;
>   s.c = _3;
>   s.d = _4;
>   use (_1, _2, _3, _4);
> it will likely be shorter and maybe faster if we do:
>   _1 = t.a;
>   _2 = t.b;
>   _3 = t.c;
>   _4 = t.d;
>   _5 = load[t.a...t.d];
>   store[s.a...s.d] = _5;
>   use (_1, _2, _3, _4);
> If there are just 2 stores combined together, having
>   _1 = t.a; _2 = t.b; _5 = load[t.a...t.b]; store[s.a...s.b] = _5;
>   use (_1, _2);
> is as many stmts as before.  And if there is a BIT_*_EXPR involved, we would
> be adding not just the loads, but also the bitwise ops.
> 
> So, if you want, I can at least for now add has_single_use checks
> in all the spots where it follows SSA_NAME_DEF_STMT.
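
Such a guard would look roughly like this in process_store (a sketch only;
has_single_use is the existing predicate from ssa-iterators.h):

  /* Sketch: besides requiring an SSA_NAME, also require that the stored
     value has no other uses, so the original load or bitwise op becomes
     dead once the stores are merged.  */
  else if (TREE_CODE (rhs) != SSA_NAME || !has_single_use (rhs))
    invalid = true;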
> 
> > Otherwise the patch looks good.  Quite some parts of the changes
> > seem to be due to splitting out stuff into functions and refactoring.
> 
> Indeed, the mem_valid_for_store_merging and pass_store_merging::process_store
> functions are mostly code move + reindent + tweaks.
> 
> Here are hand-made diffs (created with -upbd) between the old portions of
> pass_store_merging::execute corresponding to the above two functions and the
> new functions, in case it is of any help.
> 
> @@ -1,33 +1,36 @@
> -	      HOST_WIDE_INT bitsize, bitpos;
> +static tree
> +mem_valid_for_store_merging (tree mem, unsigned HOST_WIDE_INT *pbitsize,
> +			     unsigned HOST_WIDE_INT *pbitpos,
> +			     unsigned HOST_WIDE_INT *pbitregion_start,
> +			     unsigned HOST_WIDE_INT *pbitregion_end)
> +{
> +  HOST_WIDE_INT bitsize;
> +  HOST_WIDE_INT bitpos;
>  	      unsigned HOST_WIDE_INT bitregion_start = 0;
>  	      unsigned HOST_WIDE_INT bitregion_end = 0;
>  	      machine_mode mode;
>  	      int unsignedp = 0, reversep = 0, volatilep = 0;
> -	      tree offset, base_addr;
> -	      base_addr
> -		= get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
> +  tree offset;
> +  tree base_addr = get_inner_reference (mem, &bitsize, &bitpos, &offset, &mode,
>  				       &unsignedp, &reversep, &volatilep);
> -	      if (TREE_CODE (lhs) == COMPONENT_REF
> -		  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (lhs, 1)))
> +  *pbitsize = bitsize;
> +  if (bitsize == 0)
> +    return NULL_TREE;
> +
> +  if (TREE_CODE (mem) == COMPONENT_REF
> +      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (mem, 1)))
>  		{
> -		  get_bit_range (&bitregion_start, &bitregion_end, lhs,
> -				 &bitpos, &offset);
> +      get_bit_range (&bitregion_start, &bitregion_end, mem, &bitpos, &offset);
>  		  if (bitregion_end)
>  		    ++bitregion_end;
>  		}
> -	      if (bitsize == 0)
> -		continue;
>  
> -	      /* As a future enhancement we could handle stores with the same
> -		 base and offset.  */
> -	      bool invalid = reversep
> -			     || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
> -				  && (TREE_CODE (rhs) != INTEGER_CST))
> -			     || !rhs_valid_for_store_merging_p (rhs);
> +  if (reversep)
> +    return NULL_TREE;
>  
>  	      /* We do not want to rewrite TARGET_MEM_REFs.  */
>  	      if (TREE_CODE (base_addr) == TARGET_MEM_REF)
> -		invalid = true;
> +    return NULL_TREE;
>  	      /* In some cases get_inner_reference may return a
>  		 MEM_REF [ptr + byteoffset].  For the purposes of this pass
>  		 canonicalize the base_addr to MEM_REF [ptr] and take
> @@ -60,7 +63,7 @@
>  			}
>  		    }
>  		  else
> -		    invalid = true;
> +	return NULL_TREE;
>  		  base_addr = TREE_OPERAND (base_addr, 0);
>  		}
>  	      /* get_inner_reference returns the base object, get at its
> @@ -68,7 +71,7 @@
>  	      else
>  		{
>  		  if (bitpos < 0)
> -		    invalid = true;
> +	return NULL_TREE;
>  		  base_addr = build_fold_addr_expr (base_addr);
>  		}
>  
> @@ -78,23 +81,25 @@
>  		  bitregion_end = ROUND_UP (bitpos + bitsize, BITS_PER_UNIT);
>  		}
>  
> -	      if (! invalid
> -		  && offset != NULL_TREE)
> +  if (offset != NULL_TREE)
>  		{
> -		  /* If the access is variable offset then a base
> -		     decl has to be address-taken to be able to
> -		     emit pointer-based stores to it.
> -		     ???  We might be able to get away with
> -		     re-using the original base up to the first
> -		     variable part and then wrapping that inside
> +      /* If the access is variable offset then a base decl has to be
> +	 address-taken to be able to emit pointer-based stores to it.
> +	 ???  We might be able to get away with re-using the original
> +	 base up to the first variable part and then wrapping that inside
>  		     a BIT_FIELD_REF.  */
>  		  tree base = get_base_address (base_addr);
>  		  if (! base
> -		      || (DECL_P (base)
> -			  && ! TREE_ADDRESSABLE (base)))
> -		    invalid = true;
> -		  else
> -		    base_addr = build2 (POINTER_PLUS_EXPR,
> -					TREE_TYPE (base_addr),
> +	  || (DECL_P (base) && ! TREE_ADDRESSABLE (base)))
> +	return NULL_TREE;
> +
> +      base_addr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (base_addr),
>  					base_addr, offset);
>  		}
> +
> +  *pbitsize = bitsize;
> +  *pbitpos = bitpos;
> +  *pbitregion_start = bitregion_start;
> +  *pbitregion_end = bitregion_end;
> +  return base_addr;
> +}
> 
> ------
> 
> @@ -1,71 +1,129 @@
> +void
> +pass_store_merging::process_store (gimple *stmt)
> +{
>  	      tree lhs = gimple_assign_lhs (stmt);
>  	      tree rhs = gimple_assign_rhs1 (stmt);
> +  unsigned HOST_WIDE_INT bitsize, bitpos;
> +  unsigned HOST_WIDE_INT bitregion_start;
> +  unsigned HOST_WIDE_INT bitregion_end;
> +  tree base_addr
> +    = mem_valid_for_store_merging (lhs, &bitsize, &bitpos,
> +				   &bitregion_start, &bitregion_end);
> +  if (bitsize == 0)
> +    return;
>  
> -	      /* As a future enhancement we could handle stores with the same
> -		 base and offset.  */
> -	      bool invalid = reversep
> +  bool invalid = (base_addr == NULL_TREE
>  			     || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
> -				  && (TREE_CODE (rhs) != INTEGER_CST))
> -			     || !rhs_valid_for_store_merging_p (rhs);
> +		       && (TREE_CODE (rhs) != INTEGER_CST)));
> +  enum tree_code rhs_code = ERROR_MARK;
> +  store_operand_info ops[2];
> +  if (invalid)
> +    ;
> +  else if (rhs_valid_for_store_merging_p (rhs))
> +    {
> +      rhs_code = INTEGER_CST;
> +      ops[0].val = rhs;
> +    }
> +  else if (TREE_CODE (rhs) != SSA_NAME)
> +    invalid = true;
> +  else
> +    {
> +      gimple *def_stmt = SSA_NAME_DEF_STMT (rhs), *def_stmt1, *def_stmt2;
> +      if (!is_gimple_assign (def_stmt))
> +	invalid = true;
> +      else if (handled_load (def_stmt, &ops[0], bitsize, bitpos,
> +			     bitregion_start, bitregion_end))
> +	rhs_code = MEM_REF;
> +      else
> +	switch ((rhs_code = gimple_assign_rhs_code (def_stmt)))
> +	  {
> +	  case BIT_AND_EXPR:
> +	  case BIT_IOR_EXPR:
> +	  case BIT_XOR_EXPR:
> +	    tree rhs1, rhs2;
> +	    rhs1 = gimple_assign_rhs1 (def_stmt);
> +	    rhs2 = gimple_assign_rhs2 (def_stmt);
> +	    invalid = true;
> +	    if (TREE_CODE (rhs1) != SSA_NAME)
> +	      break;
> +	    def_stmt1 = SSA_NAME_DEF_STMT (rhs1);
> +	    if (!is_gimple_assign (def_stmt1)
> +		|| !handled_load (def_stmt1, &ops[0], bitsize, bitpos,
> +				  bitregion_start, bitregion_end))
> +	      break;
> +	    if (rhs_valid_for_store_merging_p (rhs2))
> +	      ops[1].val = rhs2;
> +	    else if (TREE_CODE (rhs2) != SSA_NAME)
> +	      break;
> +	    else
> +	      {
> +		def_stmt2 = SSA_NAME_DEF_STMT (rhs2);
> +		if (!is_gimple_assign (def_stmt2))
> +		  break;
> +		else if (!handled_load (def_stmt2, &ops[1], bitsize, bitpos,
> +					bitregion_start, bitregion_end))
> +		  break;
> +	      }
> +	    invalid = false;
> +	    break;
> +	  default:
> +	    invalid = true;
> +	    break;
> +	  }
> +    }
>  
> -	      struct imm_store_chain_info **chain_info
> -		= m_stores.get (base_addr);
> +  struct imm_store_chain_info **chain_info = NULL;
> +  if (base_addr)
> +    chain_info = m_stores.get (base_addr);
>  
> -	      if (!invalid)
> +  if (invalid)
>  		{
> +      terminate_all_aliasing_chains (chain_info, stmt);
> +      return;
> +    }
> +
>  		  store_immediate_info *info;
>  		  if (chain_info)
>  		    {
>  		      unsigned int ord = (*chain_info)->m_store_info.length ();
> -		      info = new store_immediate_info (bitsize, bitpos,
> -						       bitregion_start,
> -						       bitregion_end,
> -						       stmt, ord);
> +      info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> +				       bitregion_end, stmt, ord, rhs_code,
> +				       ops[0], ops[1]);
>  		      if (dump_file && (dump_flags & TDF_DETAILS))
>  			{
> -			  fprintf (dump_file,
> -				   "Recording immediate store from stmt:\n");
> +	  fprintf (dump_file, "Recording immediate store from stmt:\n");
>  			  print_gimple_stmt (dump_file, stmt, 0);
>  			}
>  		      (*chain_info)->m_store_info.safe_push (info);
> -		      /* If we reach the limit of stores to merge in a chain
> -			 terminate and process the chain now.  */
> +      /* If we reach the limit of stores to merge in a chain terminate and
> +	 process the chain now.  */
>  		      if ((*chain_info)->m_store_info.length ()
> -			   == (unsigned int)
> -			      PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
> +	  == (unsigned int) PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
>  			{
>  			  if (dump_file && (dump_flags & TDF_DETAILS))
>  			    fprintf (dump_file,
> -				 "Reached maximum number of statements"
> -				 " to merge:\n");
> +		     "Reached maximum number of statements to merge:\n");
>  			  terminate_and_release_chain (*chain_info);
>  			}
> -		      continue;
> +      return;
>  		    }
>  
>  		  /* Store aliases any existing chain?  */
> -		  terminate_all_aliasing_chains (chain_info, false, stmt);
> +  terminate_all_aliasing_chains (chain_info, stmt);
>  		  /* Start a new chain.  */
>  		  struct imm_store_chain_info *new_chain
>  		    = new imm_store_chain_info (m_stores_head, base_addr);
> -		  info = new store_immediate_info (bitsize, bitpos,
> -						   bitregion_start,
> -						   bitregion_end,
> -						   stmt, 0);
> +  info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> +				   bitregion_end, stmt, 0, rhs_code,
> +				   ops[0], ops[1]);
>  		  new_chain->m_store_info.safe_push (info);
>  		  m_stores.put (base_addr, new_chain);
>  		  if (dump_file && (dump_flags & TDF_DETAILS))
>  		    {
> -		      fprintf (dump_file,
> -			       "Starting new chain with statement:\n");
> +      fprintf (dump_file, "Starting new chain with statement:\n");
>  		      print_gimple_stmt (dump_file, stmt, 0);
>  		      fprintf (dump_file, "The base object is:\n");
>  		      print_generic_expr (dump_file, base_addr);
>  		      fprintf (dump_file, "\n");
>  		    }
> -		}
> -	      else
> -		terminate_all_aliasing_chains (chain_info,
> -					       offset != NULL_TREE, stmt);
> -
> -	      continue;
> +}
> 
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

