Re: [PATCH] Improve store merging to handle load+store or bitwise logicals (PR tree-optimization/78821, take 2)
- From: Richard Biener <rguenther at suse dot de>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, gcc-patches at gcc dot gnu dot org
- Date: Fri, 3 Nov 2017 15:09:25 +0100 (CET)
On Fri, 3 Nov 2017, Jakub Jelinek wrote:
> On Fri, Nov 03, 2017 at 02:14:39PM +0100, Richard Biener wrote:
> > > +/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
> > > + may clobber REF. FIRST and LAST must be in the same basic block and
> > > + have non-NULL vdef. */
> > > +
> > > +bool
> > > +stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
> > > +{
> > > + ao_ref r;
> > > + ao_ref_init (&r, ref);
> > > + unsigned int count = 0;
> > > + tree vop = gimple_vdef (last);
> > > + gimple *stmt;
> > > +
> > > + gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
> >
> > An EBB would probably work as well, so should we assert that we do
> > not end up visiting a PHI in the loop?
>
> For a general-purpose routine, sure, but this one is in an anonymous
> namespace and meant only for use in this pass. And there it only checks
> stores from the same store group and any other stores intermixed between
> them.
> The pass, at least right now, resets all of its state at the end of each
> basic block, so gimple_bb (first) == gimple_bb (last) is indeed always
> guaranteed. If we ever extend it such that we lose this guarantee, the
> assert will fail and can of course be adjusted then to handle whatever is
> needed. But do we need to do that right now?
No, we don't. Just wondered about the assert and the real limitation
of the implementation.
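
(The quoted excerpt stops at the assert. Presumably the loop then walks
the vuse chain back from LAST; the following is just a sketch that is
consistent with the declared locals and with the semantics you describe
below, not necessarily the patch's exact body, and the cap of 64 alias
queries is purely illustrative:

  do
    {
      stmt = SSA_NAME_DEF_STMT (vop);
      /* Skip LAST itself; FIRST and everything in between is checked.  */
      if (stmt != last && stmt_may_clobber_ref_p_1 (stmt, &r))
        return true;
      /* Bound the number of alias-oracle queries so a large block
         doesn't make this quadratic.  */
      if (++count > 64)
        return true;
      vop = gimple_vuse (stmt);
    }
  while (stmt != first);
  return false;

i.e. FIRST is checked but LAST is not.)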
> Note that extending store merging to handle groups of stores within an
> EBB doesn't look useful, since then not all of the stores would be
> unconditional.
Yes.
> What could make sense is if we have e.g. a diamond
> |
> bb1
> / \
> bb2 bb3
> \ /
> bb4
> |
> and stores are in bb1 and bb4 and no stores in bb2 or bb3 can alias
> with those. But then we'd likely need full-blown walk_aliased_vdefs
> for this...
Yes.
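
For the record, the shape would be something like (made up; s is a
non-escaping local, so the calls in bb2/bb3 can't clobber it):

  int f (void), g (void);
  struct S { int a, b; };

  int
  diamond (int flag)
  {
    struct S s;
    int x;
    s.a = 1;        /* store in bb1 */
    if (flag)
      x = f ();     /* bb2 */
    else
      x = g ();     /* bb3 */
    s.b = 2;        /* store in bb4 */
    return x + s.a + s.b;
  }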
> > > + gimple *first = merged_store->first_stmt;
> > > + gimple *last = merged_store->last_stmt;
> > > + unsigned int i;
> > > + store_immediate_info *infoc;
> > > + if (info->order < merged_store->first_order)
> > > + {
> > > + FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > > + if (stmts_may_clobber_ref_p (info->stmt, first, infoc->ops[idx].val))
> > > + return false;
> > > + first = info->stmt;
> > > + }
> > > + else if (info->order > merged_store->last_order)
> > > + {
> > > + FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > > + if (stmts_may_clobber_ref_p (last, info->stmt, infoc->ops[idx].val))
> > > + return false;
> > > + last = info->stmt;
> > > + }
> > > + if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
> > > + return false;
> >
> > Can you comment on what you check in this block? It first checks
> > all stmts (but not info->stmt itself if it is after last!?)
> > against
> > all stores that would be added when adding 'info'. Then it checks
> > from new first to last against the newly added stmt (again
> > excluding that stmt if it was added last).
>
> The stmts_may_clobber_ref_p routine doesn't check aliasing on the last
> stmt, only on the first stmt and stmts in between.
>
> Previous iterations have checked (FOR_EACH_VEC_ELT (merged_store->stores,
> i, infoc)) that merged_store->first_stmt and the stmts between it and
> merged_store->last_stmt don't clobber any of the infoc->ops[idx].val
> references, and we want to maintain that invariant if we add another
> store to the group. So, if we are about to extend the range between
> first_stmt and last_stmt, we need to check all the previously recorded
> refs against the stmts newly added to the range. Note that at this point
> the stores are sorted by bitpos, not by their order within the basic
> block, so it is possible that a store with a higher bitpos extends the
> range to earlier or to later stmts.
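> E.g. (made-up):
>   s.b = _2;  /* order 0, bitpos 32 */
>   s.a = _1;  /* order 1, bitpos 0 */
> Here the group is seeded from s.a (lower bitpos), and merging s.b,
> whose order is smaller, extends the range to an earlier first_stmt.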
>
> And finally the if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
> is checking the reference we are adding against the whole new range.
>
> > > + if (offset != NULL_TREE)
> > > + {
> > > + /* If the access is variable offset then a base decl has to be
> > > + address-taken to be able to emit pointer-based stores to it.
> > > + ??? We might be able to get away with re-using the original
> > > + base up to the first variable part and then wrapping that inside
> > > + a BIT_FIELD_REF. */
> >
> > Yes, that's what I'd generally recommend... OTOH it can get quite
> > fugly but it only has to survive until RTL expansion...
>
> This is a preexisting comment that I've just moved from the
> pass_store_merging::execute method into a helper function (the method
> grew too large, needed too much indentation, and furthermore I wanted to
> use the code for the loads too). I haven't really changed anything in it.
>
> > As an extra sanity check I'd rather verify that all refs share a
> > common base (operand-equal-p'ish). But I guess that's what will
> > usually happen anyway. The alias-ptr-type trick will be tricky to do
> > here as well (you have to go down to the base MEM_REF, wrap
> > a decl if there's no MEM_REF, and adjust the offset type).
>
> For the aliasing, I have an untested incremental patch; I need to finish
> testcases for it, then test, and then I can post it.
>
> > Given the style of processing, we can end up doing more work than
> > necessary when following ! single-use chains here, no?
> > Would it make sense to restrict this to single-uses or do we
> > handle any case of ! single-uses? When extending things to
> > allow an intermediate swap or general permute we could handle
> > a byte/word-splat for example.
>
> Single-use vs. multiple uses is something I've thought about, but I
> don't know whether it is better to require single use or not (or only
> sometimes, under some condition?). Say we have:
> _1 = t.a;
> _2 = t.b;
> _3 = t.c;
> _4 = t.d;
> s.a = _1;
> s.b = _2;
> s.c = _3;
> s.d = _4;
> use (_1, _2, _3, _4);
> it will likely be shorter and maybe faster if we do:
> _1 = t.a;
> _2 = t.b;
> _3 = t.c;
> _4 = t.d;
> _5 = load[t.a...t.d];
> store[s.a...s.d] = _5;
> use (_1, _2, _3, _4);
> If there are just 2 stores combined together, having
> _1 = t.a; _2 = t.b; _5 = load[t.a...t.b]; store[s.a...s.b] = _5;
> use (_1, _2);
> will be as many stmts as before. And if there is a BIT_*_EXPR, we'd be
> adding not just the loads, but also the bitwise ops.
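> E.g. for
> _1 = t.a; _2 = u.a; _3 = _1 ^ _2; s.a = _3;
> _4 = t.b; _5 = u.b; _6 = _4 ^ _5; s.b = _6;
> the merged form would be roughly (same pseudo-notation as above):
> _7 = load[t.a...t.b]; _8 = load[u.a...u.b];
> _9 = _7 ^ _8; store[s.a...s.b] = _9;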
>
> So, if you want, I can at least for now add has_single_use checks
> in all the spots where it follows SSA_NAME_DEF_STMT.
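> E.g. for the stored value itself that would be something like
> (untested):
> else if (TREE_CODE (rhs) != SSA_NAME || !has_single_use (rhs))
>   invalid = true;
> and similarly for rhs1/rhs2 in the BIT_*_EXPR case.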
>
> > Otherwise the patch looks good. Quite some parts of the changes
> > seem to be due to splitting out stuff into functions and refactoring.
>
> Indeed, the mem_valid_for_store_merging and pass_store_merging::process_store
> functions are mostly code move + reindent + tweaks.
>
> Here are hand-made diffs (made with -upbd) between the old portions of
> pass_store_merging::execute corresponding to the above two functions and
> those new functions, in case it is of any help.
>
> @@ -1,33 +1,36 @@
> - HOST_WIDE_INT bitsize, bitpos;
> +static tree
> +mem_valid_for_store_merging (tree mem, unsigned HOST_WIDE_INT *pbitsize,
> + unsigned HOST_WIDE_INT *pbitpos,
> + unsigned HOST_WIDE_INT *pbitregion_start,
> + unsigned HOST_WIDE_INT *pbitregion_end)
> +{
> + HOST_WIDE_INT bitsize;
> + HOST_WIDE_INT bitpos;
> unsigned HOST_WIDE_INT bitregion_start = 0;
> unsigned HOST_WIDE_INT bitregion_end = 0;
> machine_mode mode;
> int unsignedp = 0, reversep = 0, volatilep = 0;
> - tree offset, base_addr;
> - base_addr
> - = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
> + tree offset;
> + tree base_addr = get_inner_reference (mem, &bitsize, &bitpos, &offset, &mode,
> &unsignedp, &reversep, &volatilep);
> - if (TREE_CODE (lhs) == COMPONENT_REF
> - && DECL_BIT_FIELD_TYPE (TREE_OPERAND (lhs, 1)))
> + *pbitsize = bitsize;
> + if (bitsize == 0)
> + return NULL_TREE;
> +
> + if (TREE_CODE (mem) == COMPONENT_REF
> + && DECL_BIT_FIELD_TYPE (TREE_OPERAND (mem, 1)))
> {
> - get_bit_range (&bitregion_start, &bitregion_end, lhs,
> - &bitpos, &offset);
> + get_bit_range (&bitregion_start, &bitregion_end, mem, &bitpos, &offset);
> if (bitregion_end)
> ++bitregion_end;
> }
> - if (bitsize == 0)
> - continue;
>
> - /* As a future enhancement we could handle stores with the same
> - base and offset. */
> - bool invalid = reversep
> - || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
> - && (TREE_CODE (rhs) != INTEGER_CST))
> - || !rhs_valid_for_store_merging_p (rhs);
> + if (reversep)
> + return NULL_TREE;
>
> /* We do not want to rewrite TARGET_MEM_REFs. */
> if (TREE_CODE (base_addr) == TARGET_MEM_REF)
> - invalid = true;
> + return NULL_TREE;
> /* In some cases get_inner_reference may return a
> MEM_REF [ptr + byteoffset]. For the purposes of this pass
> canonicalize the base_addr to MEM_REF [ptr] and take
> @@ -60,7 +63,7 @@
> }
> }
> else
> - invalid = true;
> + return NULL_TREE;
> base_addr = TREE_OPERAND (base_addr, 0);
> }
> /* get_inner_reference returns the base object, get at its
> @@ -68,7 +71,7 @@
> else
> {
> if (bitpos < 0)
> - invalid = true;
> + return NULL_TREE;
> base_addr = build_fold_addr_expr (base_addr);
> }
>
> @@ -78,23 +81,25 @@
> bitregion_end = ROUND_UP (bitpos + bitsize, BITS_PER_UNIT);
> }
>
> - if (! invalid
> - && offset != NULL_TREE)
> + if (offset != NULL_TREE)
> {
> - /* If the access is variable offset then a base
> - decl has to be address-taken to be able to
> - emit pointer-based stores to it.
> - ??? We might be able to get away with
> - re-using the original base up to the first
> - variable part and then wrapping that inside
> + /* If the access is variable offset then a base decl has to be
> + address-taken to be able to emit pointer-based stores to it.
> + ??? We might be able to get away with re-using the original
> + base up to the first variable part and then wrapping that inside
> a BIT_FIELD_REF. */
> tree base = get_base_address (base_addr);
> if (! base
> - || (DECL_P (base)
> - && ! TREE_ADDRESSABLE (base)))
> - invalid = true;
> - else
> - base_addr = build2 (POINTER_PLUS_EXPR,
> - TREE_TYPE (base_addr),
> + || (DECL_P (base) && ! TREE_ADDRESSABLE (base)))
> + return NULL_TREE;
> +
> + base_addr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (base_addr),
> base_addr, offset);
> }
> +
> + *pbitsize = bitsize;
> + *pbitpos = bitpos;
> + *pbitregion_start = bitregion_start;
> + *pbitregion_end = bitregion_end;
> + return base_addr;
> +}
>
> ------
>
> @@ -1,71 +1,129 @@
> +void
> +pass_store_merging::process_store (gimple *stmt)
> +{
> tree lhs = gimple_assign_lhs (stmt);
> tree rhs = gimple_assign_rhs1 (stmt);
> + unsigned HOST_WIDE_INT bitsize, bitpos;
> + unsigned HOST_WIDE_INT bitregion_start;
> + unsigned HOST_WIDE_INT bitregion_end;
> + tree base_addr
> + = mem_valid_for_store_merging (lhs, &bitsize, &bitpos,
> + &bitregion_start, &bitregion_end);
> + if (bitsize == 0)
> + return;
>
> - /* As a future enhancement we could handle stores with the same
> - base and offset. */
> - bool invalid = reversep
> + bool invalid = (base_addr == NULL_TREE
> || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
> - && (TREE_CODE (rhs) != INTEGER_CST))
> - || !rhs_valid_for_store_merging_p (rhs);
> + && (TREE_CODE (rhs) != INTEGER_CST)));
> + enum tree_code rhs_code = ERROR_MARK;
> + store_operand_info ops[2];
> + if (invalid)
> + ;
> + else if (rhs_valid_for_store_merging_p (rhs))
> + {
> + rhs_code = INTEGER_CST;
> + ops[0].val = rhs;
> + }
> + else if (TREE_CODE (rhs) != SSA_NAME)
> + invalid = true;
> + else
> + {
> + gimple *def_stmt = SSA_NAME_DEF_STMT (rhs), *def_stmt1, *def_stmt2;
> + if (!is_gimple_assign (def_stmt))
> + invalid = true;
> + else if (handled_load (def_stmt, &ops[0], bitsize, bitpos,
> + bitregion_start, bitregion_end))
> + rhs_code = MEM_REF;
> + else
> + switch ((rhs_code = gimple_assign_rhs_code (def_stmt)))
> + {
> + case BIT_AND_EXPR:
> + case BIT_IOR_EXPR:
> + case BIT_XOR_EXPR:
> + tree rhs1, rhs2;
> + rhs1 = gimple_assign_rhs1 (def_stmt);
> + rhs2 = gimple_assign_rhs2 (def_stmt);
> + invalid = true;
> + if (TREE_CODE (rhs1) != SSA_NAME)
> + break;
> + def_stmt1 = SSA_NAME_DEF_STMT (rhs1);
> + if (!is_gimple_assign (def_stmt1)
> + || !handled_load (def_stmt1, &ops[0], bitsize, bitpos,
> + bitregion_start, bitregion_end))
> + break;
> + if (rhs_valid_for_store_merging_p (rhs2))
> + ops[1].val = rhs2;
> + else if (TREE_CODE (rhs2) != SSA_NAME)
> + break;
> + else
> + {
> + def_stmt2 = SSA_NAME_DEF_STMT (rhs2);
> + if (!is_gimple_assign (def_stmt2))
> + break;
> + else if (!handled_load (def_stmt2, &ops[1], bitsize, bitpos,
> + bitregion_start, bitregion_end))
> + break;
> + }
> + invalid = false;
> + break;
> + default:
> + invalid = true;
> + break;
> + }
> + }
>
> - struct imm_store_chain_info **chain_info
> - = m_stores.get (base_addr);
> + struct imm_store_chain_info **chain_info = NULL;
> + if (base_addr)
> + chain_info = m_stores.get (base_addr);
>
> - if (!invalid)
> + if (invalid)
> {
> + terminate_all_aliasing_chains (chain_info, stmt);
> + return;
> + }
> +
> store_immediate_info *info;
> if (chain_info)
> {
> unsigned int ord = (*chain_info)->m_store_info.length ();
> - info = new store_immediate_info (bitsize, bitpos,
> - bitregion_start,
> - bitregion_end,
> - stmt, ord);
> + info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> + bitregion_end, stmt, ord, rhs_code,
> + ops[0], ops[1]);
> if (dump_file && (dump_flags & TDF_DETAILS))
> {
> - fprintf (dump_file,
> - "Recording immediate store from stmt:\n");
> + fprintf (dump_file, "Recording immediate store from stmt:\n");
> print_gimple_stmt (dump_file, stmt, 0);
> }
> (*chain_info)->m_store_info.safe_push (info);
> - /* If we reach the limit of stores to merge in a chain
> - terminate and process the chain now. */
> + /* If we reach the limit of stores to merge in a chain terminate and
> + process the chain now. */
> if ((*chain_info)->m_store_info.length ()
> - == (unsigned int)
> - PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
> + == (unsigned int) PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file,
> - "Reached maximum number of statements"
> - " to merge:\n");
> + "Reached maximum number of statements to merge:\n");
> terminate_and_release_chain (*chain_info);
> }
> - continue;
> + return;
> }
>
> /* Store aliases any existing chain? */
> - terminate_all_aliasing_chains (chain_info, false, stmt);
> + terminate_all_aliasing_chains (chain_info, stmt);
> /* Start a new chain. */
> struct imm_store_chain_info *new_chain
> = new imm_store_chain_info (m_stores_head, base_addr);
> - info = new store_immediate_info (bitsize, bitpos,
> - bitregion_start,
> - bitregion_end,
> - stmt, 0);
> + info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> + bitregion_end, stmt, 0, rhs_code,
> + ops[0], ops[1]);
> new_chain->m_store_info.safe_push (info);
> m_stores.put (base_addr, new_chain);
> if (dump_file && (dump_flags & TDF_DETAILS))
> {
> - fprintf (dump_file,
> - "Starting new chain with statement:\n");
> + fprintf (dump_file, "Starting new chain with statement:\n");
> print_gimple_stmt (dump_file, stmt, 0);
> fprintf (dump_file, "The base object is:\n");
> print_generic_expr (dump_file, base_addr);
> fprintf (dump_file, "\n");
> }
> - }
> - else
> - terminate_all_aliasing_chains (chain_info,
> - offset != NULL_TREE, stmt);
> -
> - continue;
> +}
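>
> To illustrate what the pass can now handle (a made-up example in the
> spirit of PR78821, not one of the new testcases):
>
> struct S { unsigned char a, b, c, d; };
>
> void
> foo (struct S *__restrict p, struct S *__restrict q)
> {
>   p->a = q->a;
>   p->b = q->b;
>   p->c = q->c;
>   p->d = q->d;
> }
>
> The __restrict is what lets the stmts_may_clobber_ref_p checks prove
> the intermediate stores don't clobber the loads; the four byte copies
> can then be merged into a single 32-bit load and a single 32-bit store.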
>
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)