[PATCH] Improve store merging to handle load+store or bitwise logicals (PR tree-optimization/78821)

Kyrill Tkachov kyrylo.tkachov@foss.arm.com
Thu Nov 2 15:38:00 GMT 2017


Hi Jakub,

On 02/11/17 14:10, Jakub Jelinek wrote:
> Hi!
>
> The following patch improves store merging so that it doesn't just handle
> constant stores into adjacent memory, but also adjacent memory copying
> and simple bitwise logical ops where at least one argument is a load
> from adjacent memory and the other argument is such a load as well or a constant.
> The loads are limited to be either all using the same vuse, or each using
> vuse of the corresponding stores.  So examples of what can be handled are:
>    s.a = 1; s.b = 2; // we could handle this before this patch already
>    _1 = t.a; _2 = t.b; s.a = _1; s.b = _2; // copying with the same vuse
>    _1 = t.a; s.a = _1; _2 = t.b; s.b = _2; // copying with vuse of the store
>    _1 = s.a; _2 = _1 | 23; _3 = s.b; _4 = _3 | 12345; s.a = _2; s.b = _4; // | with one load and one constant
> etc.
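
A C-level example that should produce one of the copy patterns above, as a
point of reference (my own sketch, not taken from the patch testsuite):

struct S { int a, b; } s, t;

void
copy_two (void)
{
  s.a = t.a;
  s.b = t.b;
}
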
> What the patch doesn't handle yet because
> terminate_all_aliasing_chains uses:
>        /* We can't use the base object here as that does not reliably exist.
>           Build a ao_ref from the base object address (if we know the
>           minimum and maximum offset and the maximum size we could improve
>           things here).  */
>        ao_ref chain_ref;
>        ao_ref_init_from_ptr_and_size (&chain_ref, cur->base_addr, NULL_TREE);
> is e.g.
> void
> f3 (struct S *__restrict p, struct S *__restrict q)
> {
>    p->a |= q->a; p->b |= q->b; p->c |= q->c; p->d |= q->d;
>    p->e |= q->e; p->f |= q->f; p->g |= q->g; p->h |= q->h;
> }
> I'll try to improve that incrementally by preserving the underlying original
> reference and tracking minimum/maximum offsets from that.
>
> The patch also doesn't hook in the bswap infrastructure to recognize say
> struct S { char a, b, c, d; } u, v;
> void foo (void) { u.a = v.d; u.b = v.c; u.c = v.b; u.d = v.a; }
> though I wonder if it is worth it (whether there is any real-world code like
> that at all, or whether it is common enough to be worth the work).
>
> Bootstrapped/regtested on {x86_64,i686,powerpc64,powerpc64le}-linux, ok for
> trunk?
>
> I'm now doing another x86_64/i686 bootstrap/regtest to gather some
> statistics; both are still running, but the current numbers show:
> rhs_code	split_stores.length ()		orig_num_stmts
> integer_cst	135533				275746
> mem_ref		13289				27852
> bit_*_expr	36				81
> so the first row shows that already before this patch, when we decided to
> optimize constant stores we decreased the number of stmts to about 50% on
> average; memory copying triggers in roughly 10% as many cases as the
> constant stores, and the reason the bitwise logicals don't trigger much is
> probably the above mentioned ao_ref_init* missed optimization as well as
> such constructs being far less common.  In theory we could also handle
> mixed rhs codes, but I'm not sure it is worth the effort - e.g. if somebody
> does:
>    s.a = 5; s.b |= 4; s.c &= 2; s.d ^= 5;
> we could load the memory and do some |/&/^ on it.
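
FWIW, any per-byte mix of {set-constant, |=, &=, ^=} can be folded into a
single AND mask plus a single XOR mask on the loaded word (using the identity
x | c == (x & ~c) ^ c), so the lowering could stay fairly simple.  A rough C
sketch of the idea, assuming four adjacent unsigned char fields and a
little-endian target (my own illustration, not part of the patch):

void
mixed (unsigned char *p)
{
  /* s.a = 5; s.b |= 4; s.c &= 2; s.d ^= 5; as one 4-byte load/op/store.
     Per byte: set-const v -> AND 0x00, XOR v; |= c -> AND ~c, XOR c;
     &= c -> AND c, XOR 0x00; ^= c -> AND 0xff, XOR c.  */
  unsigned int x;
  __builtin_memcpy (&x, p, 4);
  x &= 0xff02fb00u;   /* bytes, LSB first: 0x00, 0xfb, 0x02, 0xff  */
  x ^= 0x05000405u;   /* bytes, LSB first: 0x05, 0x04, 0x00, 0x05  */
  __builtin_memcpy (p, &x, 4);
}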

This looks great! I have a couple of comments.
* Can you please extend the file comment in gimple-ssa-store-merging.c?
Currently it mostly describes how we merge constants together.  Once we
start accepting non-constant members we should mention that in there.

* Can we also handle BIT_NOT_EXPRs? i.e. copying memory locations but
with a unary op applied on top.  I don't know how often that comes up
though.  Maybe it would complicate store_operand_info and its handling
too much to be worth it...
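For instance, something like this (my own sketch, just to illustrate,
not from any testsuite):

struct S { unsigned char a, b, c, d; };

void
not_copy (struct S *__restrict p, struct S *__restrict q)
{
  p->a = ~q->a;
  p->b = ~q->b;
  p->c = ~q->c;
  p->d = ~q->d;
}
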
> 2017-11-02  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR tree-optimization/78821
> 	* gimple-ssa-store-merging.c (struct store_operand_info): New type.
> 	(store_operand_info::store_operand_info): New constructor.
> 	(struct store_immediate_info): Add rhs_code and ops data members.
> 	(store_immediate_info::store_immediate_info): Add rhscode, op0r
> 	and op1r arguments to the ctor, initialize corresponding data members.
> 	(struct merged_store_group): Add load_align_base and load_align
> 	data members.
> 	(merged_store_group::merged_store_group): Initialize them.
> 	(merged_store_group::do_merge): Update them.
> 	(merged_store_group::apply_stores): Pick the constant for
> 	encode_tree_to_bitpos from one of the two operands, or skip
> 	encode_tree_to_bitpos if neither operand is a constant.
> 	(class pass_store_merging): Add process_store method decl.  Remove
> 	bool argument from terminate_all_aliasing_chains method decl.
> 	(pass_store_merging::terminate_all_aliasing_chains): Remove
> 	var_offset_p argument and corresponding handling.
> 	(stmts_may_clobber_ref_p): New function.
> 	(compatible_load_p): New function.
> 	(imm_store_chain_info::coalesce_immediate_stores): Terminate group
> 	if there is overlap and rhs_code is not INTEGER_CST.  For
> 	non-overlapping stores terminate group if rhs is not mergeable.
> 	(get_alias_type_for_stmts): Change first argument from
> 	auto_vec<gimple *> & to vec<gimple *> &.  Add IS_LOAD, CLIQUEP and
> 	BASEP arguments.  If IS_LOAD is true, look at rhs1 of the stmts
> 	instead of lhs.  Compute *CLIQUEP and *BASEP in addition to the
> 	alias type.
> 	(get_location_for_stmts): Change first argument from
> 	auto_vec<gimple *> & to vec<gimple *> &.
> 	(struct split_store): Remove orig_stmts data member, add orig_stores.
> 	(split_store::split_store): Create orig_stores rather than orig_stmts.
> 	(find_constituent_stmts): Renamed to ...
> 	(find_constituent_stores): ... this.  Change second argument from
> 	vec<gimple *> * to vec<store_immediate_info *> *, push pointers
> 	to info structures rather than the statements.
> 	(split_group): Rename ALLOW_UNALIGNED argument to
> 	ALLOW_UNALIGNED_STORE, add ALLOW_UNALIGNED_LOAD argument and handle
> 	it.  Adjust find_constituent_stores caller.
> 	(imm_store_chain_info::output_merged_store): Handle rhs_code other
> 	than INTEGER_CST, adjust split_group, get_alias_type_for_stmts and
> 	get_location_for_stmts callers.  Set MR_DEPENDENCE_CLIQUE and
> 	MR_DEPENDENCE_BASE on the MEM_REFs if they are the same in all stores.
> 	(mem_valid_for_store_merging): New function.
> 	(handled_load): New function.
> 	(pass_store_merging::process_store): New method.
> 	(pass_store_merging::execute): Use process_store method.  Adjust
> 	terminate_all_aliasing_chains caller.
>
> 	* gcc.dg/store_merging_13.c: New test.
> 	* gcc.dg/store_merging_14.c: New test.
>

<snip>

> @@ -920,6 +980,107 @@ pass_store_merging::terminate_and_releas
>     return ret;
>   }
>   
> +/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
> +   may clobber REF.  FIRST and LAST must be in the same basic block and
> +   have non-NULL vdef.  */
> +
> +bool
> +stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
> +{
> +  ao_ref r;
> +  ao_ref_init (&r, ref);
> +  unsigned int count = 0;
> +  tree vop = gimple_vdef (last);
> +  gimple *stmt;
> +
> +  gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
> +  do
> +    {
> +      stmt = SSA_NAME_DEF_STMT (vop);
> +      if (stmt_may_clobber_ref_p_1 (stmt, &r))
> +	return true;
> +      if (++count > 64)
> +	return true;

Magic number 64?  I don't know if it's worth having a PARAM for it, but at
least a comment saying that we're bailing out for compile-time reasons
would be good.
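Something along these lines, perhaps (wording is only a suggestion):

      /* Bound the walk over the virtual use-def chain to keep compile
         time in check; conservatively assume a clobber if the chain is
         too long.  */
      if (++count > 64)
	return true;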

> +      vop = gimple_vuse (stmt);
> +    }
> +  while (stmt != first);
> +  return false;
> +}
> +

<snip>

> +	  tree ops[2];
> +	  for (int j = 0;
> +	       j < 1 + (split_store->orig_stores[0]->ops[1].val != NULL_TREE);
> +	       ++j)
> +	    {
> +	      store_operand_info &op = split_store->orig_stores[0]->ops[j];
> +	      if (op.base_addr)
> +		{
> +		  FOR_EACH_VEC_ELT (split_store->orig_stores, k, info)
> +		    orig_stmts.safe_push (info->ops[j].stmt);
> +
> +		  offset_type = get_alias_type_for_stmts (orig_stmts, true,
> +							  &clique, &base);
> +		  location_t load_loc = get_location_for_stmts (orig_stmts);
> +		  orig_stmts.truncate (0);
> +
> +		  unsigned HOST_WIDE_INT load_align = group->load_align[j];
> +		  unsigned HOST_WIDE_INT align_bitpos
> +		    = (try_pos * BITS_PER_UNIT
> +		       - split_store->orig_stores[0]->bitpos
> +		       + op.bitpos) & (load_align - 1);
> +		  if (align_bitpos)
> +		    load_align = least_bit_hwi (align_bitpos);
> +
> +		  tree load_int_type
> +		    = build_nonstandard_integer_type (try_size, UNSIGNED);
> +		  load_int_type
> +		    = build_aligned_type (load_int_type, load_align);
> +
> +		  unsigned HOST_WIDE_INT load_pos
> +		    = (try_pos * BITS_PER_UNIT
> +		       - split_store->orig_stores[0]->bitpos
> +		       + op.bitpos) / BITS_PER_UNIT;
> +		  ops[j] = fold_build2 (MEM_REF, load_int_type, load_addr[j],
> +					build_int_cst (offset_type, load_pos));
> +		  if (TREE_CODE (ops[j]) == MEM_REF)
> +		    {
> +		      MR_DEPENDENCE_CLIQUE (ops[j]) = clique;
> +		      MR_DEPENDENCE_BASE (ops[j]) = base;
> +		    }
> +		  if (!integer_zerop (mask))
> +		    TREE_NO_WARNING (ops[j]) = 1;

Please add a comment here explaining why we need the TREE_NO_WARNING.
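Maybe something like this (my guess at the rationale, please adjust if it
is something else):

		  /* The load may read bits that the original code never
		     read (those that get masked off afterwards), which
		     might be uninitialized, so avoid -W*uninitialized
		     warnings about them.  */
		  if (!integer_zerop (mask))
		    TREE_NO_WARNING (ops[j]) = 1;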

Thanks,
Kyrill


