[PATCH] Improve store merging to handle load+store or bitwise logicals (PR tree-optimization/78821)
Kyrill Tkachov
kyrylo.tkachov@foss.arm.com
Thu Nov 2 15:38:00 GMT 2017
Hi Jakub,
On 02/11/17 14:10, Jakub Jelinek wrote:
> Hi!
>
> The following patch improves store merging so that it doesn't handle
> just constant stores into adjacent memory, but also adjacent memory copying
> and simple bitwise logical ops where at least one argument is a load
> from adjacent memory and the other argument is either such a load or a
> constant. The loads are limited to either all use the same vuse, or to
> each use the vuse of the corresponding store. So examples of what can be
> handled are:
> s.a = 1; s.b = 2; // we could handle this before this patch already
> _1 = t.a; _2 = t.b; s.a = _1; s.b = _2; // copying with the same vuse
> _1 = t.a; s.a = _1; _2 = t.b; s.b = _2; // copying with vuse of the store
> _1 = s.a; _2 = _1 | 23; _3 = s.b; _4 = _3 | 12345; s.a = _2; s.b = _4; // | with one load and one constant
> etc.
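At the source level, the copying cases correspond to something like the
following sketch (my own example, not taken from the patch or its tests):

struct S { unsigned char a, b; } s, t;

void
copy (void)
{
  s.a = t.a;  /* _1 = t.a; s.a = _1; */
  s.b = t.b;  /* _2 = t.b; s.b = _2; the second load uses the vuse of
                 the first store, i.e. the third case above, so both
                 stores can merge into one wider copy.  */
}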
> What the patch doesn't handle yet, because
> terminate_all_aliasing_chains uses:
> /* We can't use the base object here as that does not reliably exist.
> Build a ao_ref from the base object address (if we know the
> minimum and maximum offset and the maximum size we could improve
> things here). */
> ao_ref chain_ref;
> ao_ref_init_from_ptr_and_size (&chain_ref, cur->base_addr, NULL_TREE);
> is e.g.
> void
> f3 (struct S *__restrict p, struct S *__restrict q)
> {
> p->a |= q->a; p->b |= q->b; p->c |= q->c; p->d |= q->d;
> p->e |= q->e; p->f |= q->f; p->g |= q->g; p->h |= q->h;
> }
> I'll try to improve that incrementally by preserving the underlying original
> reference and tracking minimum/maximum offsets from that.
>
> The patch also doesn't hook in the bswap infrastructure to recognize say
> struct S { char a, b, c, d; } u, v;
> void foo (void) { u.a = v.d; u.b = v.c; u.c = v.b; u.d = v.a; }
> though I wonder if it is worth it (whether there is any real-world code like
> that at all, or common enough to be worth the work on it).
>
> Bootstrapped/regtested on {x86_64,i686,powerpc64,powerpc64le}-linux, ok for
> trunk?
>
> I'm now doing another x86_64/i686 bootstrap/regtest to gather some
> statistics; both are still regtesting, but the current numbers show:
> rhs_code     split_stores.length ()   orig_num_stmts
> integer_cst                  135533           275746
> mem_ref                       13289            27852
> bit_*_expr                       36               81
> so the first row shows that already before this patch, when we decided to
> optimize constant stores, we decreased the statement count to 50% on
> average; memory copying triggers in around 10% as many cases as the
> constant stores, and the reason the bitwise logicals don't trigger much is
> probably the above mentioned ao_ref_init* missed optimization, as well as
> such constructs being far less common. In theory we could also handle
> mixed rhs codes, but I'm not sure it is worth the effort - e.g. if
> somebody does:
> s.a = 5; s.b |= 4; s.c &= 2; s.d ^= 5;
> we could load the memory and do some |/&/^ on it.
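For illustration (my own sketch, not something the patch emits): with four
adjacent unsigned char fields on a little-endian target, each of =C, |C, &C
and ^C on a byte is expressible as (old & A) ^ B, so the whole mixed
sequence could in principle become one 32-bit load/modify/store:

struct S { unsigned char a, b, c, d; } s;

void
merged (void)
{
  unsigned int x;
  __builtin_memcpy (&x, &s, sizeof x);  /* one 32-bit load of s.a..s.d */
  /* Per byte (a is the low byte): =5 -> A=0x00,B=0x05; |4 -> A=~4,B=0x04;
     &2 -> A=0x02,B=0x00; ^5 -> A=0xff,B=0x05.  */
  x = (x & 0xff02fb00u) ^ 0x05000405u;
  __builtin_memcpy (&s, &x, sizeof x);  /* one 32-bit store */
}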
This looks great! I have a couple of comments.
* Can you please extend the file comments for gimple-ssa-store-merging.c?
Currently they mostly describe how we merge constants together. Once we
start accepting non-constant operands we should mention that in there.
* Can we also handle BIT_NOT_EXPRs? i.e. copying memory locations
but with a unary op applied on top. I don't know how often that comes up,
though. Maybe it would complicate store_operand_info and its handling
too much to be worth it...
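The kind of code I have in mind is e.g. (a made-up example, using
__restrict to sidestep the aliasing issue above):

struct S { int a, b; };

void
f (struct S *__restrict p, struct S *__restrict q)
{
  p->a = ~q->a;  /* copy with a BIT_NOT_EXPR applied on top */
  p->b = ~q->b;
}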
> 2017-11-02 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/78821
> * gimple-ssa-store-merging.c (struct store_operand_info): New type.
> (store_operand_info::store_operand_info): New constructor.
> (struct store_immediate_info): Add rhs_code and ops data members.
> (store_immediate_info::store_immediate_info): Add rhscode, op0r
> and op1r arguments to the ctor, initialize corresponding data members.
> (struct merged_store_group): Add load_align_base and load_align
> data members.
> (merged_store_group::merged_store_group): Initialize them.
> (merged_store_group::do_merge): Update them.
> (merged_store_group::apply_stores): Pick the constant for
> encode_tree_to_bitpos from one of the two operands, or skip
> encode_tree_to_bitpos if neither operand is a constant.
> (class pass_store_merging): Add process_store method decl. Remove
> bool argument from terminate_all_aliasing_chains method decl.
> (pass_store_merging::terminate_all_aliasing_chains): Remove
> var_offset_p argument and corresponding handling.
> (stmts_may_clobber_ref_p): New function.
> (compatible_load_p): New function.
> (imm_store_chain_info::coalesce_immediate_stores): Terminate group
> if there is overlap and rhs_code is not INTEGER_CST. For
> non-overlapping stores terminate group if rhs is not mergeable.
> (get_alias_type_for_stmts): Change first argument from
> auto_vec<gimple *> & to vec<gimple *> &. Add IS_LOAD, CLIQUEP and
> BASEP arguments. If IS_LOAD is true, look at rhs1 of the stmts
> instead of lhs. Compute *CLIQUEP and *BASEP in addition to the
> alias type.
> (get_location_for_stmts): Change first argument from
> auto_vec<gimple *> & to vec<gimple *> &.
> (struct split_store): Remove orig_stmts data member, add orig_stores.
> (split_store::split_store): Create orig_stores rather than orig_stmts.
> (find_constituent_stmts): Renamed to ...
> (find_constituent_stores): ... this. Change second argument from
> vec<gimple *> * to vec<store_immediate_info *> *, push pointers
> to info structures rather than the statements.
> (split_group): Rename ALLOW_UNALIGNED argument to
> ALLOW_UNALIGNED_STORE, add ALLOW_UNALIGNED_LOAD argument and handle
> it. Adjust find_constituent_stores caller.
> (imm_store_chain_info::output_merged_store): Handle rhs_code other
> than INTEGER_CST, adjust split_group, get_alias_type_for_stmts and
> get_location_for_stmts callers. Set MR_DEPENDENCE_CLIQUE and
> MR_DEPENDENCE_BASE on the MEM_REFs if they are the same in all stores.
> (mem_valid_for_store_merging): New function.
> (handled_load): New function.
> (pass_store_merging::process_store): New method.
> (pass_store_merging::execute): Use process_store method. Adjust
> terminate_all_aliasing_chains caller.
>
> * gcc.dg/store_merging_13.c: New test.
> * gcc.dg/store_merging_14.c: New test.
>
<snip>
> @@ -920,6 +980,107 @@ pass_store_merging::terminate_and_releas
> return ret;
> }
>
> +/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
> + may clobber REF. FIRST and LAST must be in the same basic block and
> + have non-NULL vdef. */
> +
> +bool
> +stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
> +{
> + ao_ref r;
> + ao_ref_init (&r, ref);
> + unsigned int count = 0;
> + tree vop = gimple_vdef (last);
> + gimple *stmt;
> +
> + gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
> + do
> + {
> + stmt = SSA_NAME_DEF_STMT (vop);
> + if (stmt_may_clobber_ref_p_1 (stmt, &r))
> + return true;
> + if (++count > 64)
> + return true;
Magic number 64? I don't know if it's worth having a PARAM for it, but at
least a comment saying we're bailing out for compile-time reasons would be
good.
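Something like this would do, for example:

  /* Walking the virtual use-def chain can be arbitrarily long;
     bail out after 64 statements to limit compile time.  */
  if (++count > 64)
    return true;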
> + vop = gimple_vuse (stmt);
> + }
> + while (stmt != first);
> + return false;
> +}
> +
<snip>
> + tree ops[2];
> + for (int j = 0;
> + j < 1 + (split_store->orig_stores[0]->ops[1].val != NULL_TREE);
> + ++j)
> + {
> + store_operand_info &op = split_store->orig_stores[0]->ops[j];
> + if (op.base_addr)
> + {
> + FOR_EACH_VEC_ELT (split_store->orig_stores, k, info)
> + orig_stmts.safe_push (info->ops[j].stmt);
> +
> + offset_type = get_alias_type_for_stmts (orig_stmts, true,
> + &clique, &base);
> + location_t load_loc = get_location_for_stmts (orig_stmts);
> + orig_stmts.truncate (0);
> +
> + unsigned HOST_WIDE_INT load_align = group->load_align[j];
> + unsigned HOST_WIDE_INT align_bitpos
> + = (try_pos * BITS_PER_UNIT
> + - split_store->orig_stores[0]->bitpos
> + + op.bitpos) & (load_align - 1);
> + if (align_bitpos)
> + load_align = least_bit_hwi (align_bitpos);
> +
> + tree load_int_type
> + = build_nonstandard_integer_type (try_size, UNSIGNED);
> + load_int_type
> + = build_aligned_type (load_int_type, load_align);
> +
> + unsigned HOST_WIDE_INT load_pos
> + = (try_pos * BITS_PER_UNIT
> + - split_store->orig_stores[0]->bitpos
> + + op.bitpos) / BITS_PER_UNIT;
> + ops[j] = fold_build2 (MEM_REF, load_int_type, load_addr[j],
> + build_int_cst (offset_type, load_pos));
> + if (TREE_CODE (ops[j]) == MEM_REF)
> + {
> + MR_DEPENDENCE_CLIQUE (ops[j]) = clique;
> + MR_DEPENDENCE_BASE (ops[j]) = base;
> + }
> + if (!integer_zerop (mask))
> + TREE_NO_WARNING (ops[j]) = 1;
Please add a comment here explaining why we need TREE_NO_WARNING.
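My guess is that it's to avoid bogus uninitialized-use warnings when the
widened load also reads bits the original code never loaded, e.g. in
something like (a hypothetical illustration, not from the patch):

struct S { unsigned char x : 3, y : 4; };

void
g (struct S *p, struct S *q)
{
  p->x |= q->x;
  p->y |= q->y;  /* the two stores cover only bits 0..6, so MASK is
                    non-zero and the single byte-wide load of *q also
                    reads the uncovered bit; if that bit is never
                    initialized we don't want warnings on the
                    compiler-generated load */
}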
Thanks,
Kyrill