[PATCH] Improve store merging to handle load+store or bitwise logicals (PR tree-optimization/78821, take 2)
Richard Biener
rguenther@suse.de
Fri Nov 3 19:39:00 GMT 2017
On November 3, 2017 8:17:30 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Fri, Nov 03, 2017 at 03:04:18PM +0100, Jakub Jelinek wrote:
>> single-use vs. multiple uses is something I've thought about, but don't
>> know whether it is better to require single-use or not (or sometimes,
>> under some condition?). Say if we have:
>
>So, here is what I've committed in the end after bootstrapping/regtesting
>it on x86_64-linux and i686-linux, the only changes from the earlier patch
>were comments and addition of has_single_use checks.
>
>In those bootstraps/regtests, the number of integer_cst stores was, as
>expected, the same, and so was the number of bit_*_expr cases, but
>it apparently matters a lot for the memory copying (rhs_code MEM_REF).
>Without this patch new/orig stores:
>16943 35369
>and with the patch:
>12111 24911
>So, perhaps we'll need to do something smarter (approximate how many
>original loads would be kept and how many new loads/stores we'd need to add
>in order to get rid of how many original stores).
>Or allow multiple uses for the MEM_REF rhs_code only, and require a single
>use for anything else.
It would probably be interesting to look at the individual cases. But yes, it should be factored into the cost model somehow.
It possibly also increases register pressure.
Richard.
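For context, here is a minimal C sketch of the MEM_REF copy case being discussed (illustrative only; it is not taken from the patch or from the new testcases). When the loaded bytes feed only the stores, merging replaces four load/store pairs with one wider load and one wider store; when the loads have additional uses, the original loads stay live, which is where the has_single_use checks, the load/store counting idea and the register-pressure concern come in:

  /* Hypothetical example, not part of the patch.  */
  struct S { unsigned char a, b, c, d; };

  void
  copy4 (struct S *p, const struct S *q)
  {
    /* Four adjacent byte copies; with no overlap these can become a
       single 4-byte load followed by a single 4-byte store.  */
    p->a = q->a;
    p->b = q->b;
    p->c = q->c;
    p->d = q->d;
  }

  unsigned int
  copy4_and_sum (struct S *p, const struct S *q)
  {
    /* Here every loaded byte has a second use, so even if the four
       stores are merged, the byte loads cannot be removed; the merged
       wide load is added on top of them.  */
    unsigned char a = q->a, b = q->b, c = q->c, d = q->d;
    p->a = a;
    p->b = b;
    p->c = c;
    p->d = d;
    return (unsigned int) (a + b + c + d);
  }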
>2017-11-03 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/78821
> * gimple-ssa-store-merging.c: Update the file comment.
> (MAX_STORE_ALIAS_CHECKS): Define.
> (struct store_operand_info): New type.
> (store_operand_info::store_operand_info): New constructor.
> (struct store_immediate_info): Add rhs_code and ops data members.
> (store_immediate_info::store_immediate_info): Add rhscode, op0r
> and op1r arguments to the ctor, initialize corresponding data members.
> (struct merged_store_group): Add load_align_base and load_align
> data members.
> (merged_store_group::merged_store_group): Initialize them.
> (merged_store_group::do_merge): Update them.
> (merged_store_group::apply_stores): Pick the constant for
> encode_tree_to_bitpos from one of the two operands, or skip
> encode_tree_to_bitpos if neither operand is a constant.
> (class pass_store_merging): Add process_store method decl. Remove
> bool argument from terminate_all_aliasing_chains method decl.
> (pass_store_merging::terminate_all_aliasing_chains): Remove
> var_offset_p argument and corresponding handling.
> (stmts_may_clobber_ref_p): New function.
> (compatible_load_p): New function.
> (imm_store_chain_info::coalesce_immediate_stores): Terminate group
> if there is overlap and rhs_code is not INTEGER_CST. For
> non-overlapping stores terminate group if rhs is not mergeable.
> (get_alias_type_for_stmts): Change first argument from
> auto_vec<gimple *> & to vec<gimple *> &. Add IS_LOAD, CLIQUEP and
> BASEP arguments. If IS_LOAD is true, look at rhs1 of the stmts
> instead of lhs. Compute *CLIQUEP and *BASEP in addition to the
> alias type.
> (get_location_for_stmts): Change first argument from
> auto_vec<gimple *> & to vec<gimple *> &.
> (struct split_store): Remove orig_stmts data member, add orig_stores.
> (split_store::split_store): Create orig_stores rather than orig_stmts.
> (find_constituent_stmts): Renamed to ...
> (find_constituent_stores): ... this. Change second argument from
> vec<gimple *> * to vec<store_immediate_info *> *, push pointers
> to info structures rather than the statements.
> (split_group): Rename ALLOW_UNALIGNED argument to
> ALLOW_UNALIGNED_STORE, add ALLOW_UNALIGNED_LOAD argument and handle
> it. Adjust find_constituent_stores caller.
> (imm_store_chain_info::output_merged_store): Handle rhs_code other
> than INTEGER_CST, adjust split_group, get_alias_type_for_stmts and
> get_location_for_stmts callers. Set MR_DEPENDENCE_CLIQUE and
> MR_DEPENDENCE_BASE on the MEM_REFs if they are the same in all stores.
> (mem_valid_for_store_merging): New function.
> (handled_load): New function.
> (pass_store_merging::process_store): New method.
> (pass_store_merging::execute): Use process_store method. Adjust
> terminate_all_aliasing_chains caller.
>
> * gcc.dg/store_merging_13.c: New test.
> * gcc.dg/store_merging_14.c: New test.
>
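As an illustration of the kind of source the new bitwise-logical handling targets, a hand-written sketch follows; it is in the spirit of the new store_merging tests, but it is not their actual contents:

  /* Hypothetical sketch, not gcc.dg/store_merging_13.c itself.  */
  struct S { unsigned char a, b, c, d; };

  void
  xor4 (struct S *dst, const struct S *src)
  {
    /* Adjacent byte stores of loads combined with constants by a
       bitwise operation; these can be merged into one wider load,
       one wider xor with a concatenated constant and one wider store.  */
    dst->a = src->a ^ 1;
    dst->b = src->b ^ 2;
    dst->c = src->c ^ 3;
    dst->d = src->d ^ 4;
  }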
>--- gcc/gimple-ssa-store-merging.c.jj 2017-11-03 15:37:02.869561500 +0100
>+++ gcc/gimple-ssa-store-merging.c 2017-11-03 16:15:15.059282459 +0100
>@@ -19,7 +19,8 @@
> <http://www.gnu.org/licenses/>. */
>
> /* The purpose of this pass is to combine multiple memory stores of
>- constant values to consecutive memory locations into fewer wider stores.
>+ constant values, values loaded from memory or bitwise operations
>+ on those to consecutive memory locations into fewer wider stores.
> For example, if we have a sequence peforming four byte stores to
> consecutive memory locations:
> [p ] := imm1;
>@@ -29,21 +30,49 @@
>we can transform this into a single 4-byte store if the target supports it:
>[p] := imm1:imm2:imm3:imm4 //concatenated immediates according to endianness.
>
>+ Or:
>+ [p ] := [q ];
>+ [p + 1B] := [q + 1B];
>+ [p + 2B] := [q + 2B];
>+ [p + 3B] := [q + 3B];
>+ if there is no overlap can be transformed into a single 4-byte
>+ load followed by single 4-byte store.
>+
>+ Or:
>+ [p ] := [q ] ^ imm1;
>+ [p + 1B] := [q + 1B] ^ imm2;
>+ [p + 2B] := [q + 2B] ^ imm3;
>+ [p + 3B] := [q + 3B] ^ imm4;
>+ if there is no overlap can be transformed into a single 4-byte
>+ load, xored with imm1:imm2:imm3:imm4 and stored using a single 4-byte store.
>+
> The algorithm is applied to each basic block in three phases:
>
>- 1) Scan through the basic block recording constant assignments to
>+ 1) Scan through the basic block recording assignments to
>destinations that can be expressed as a store to memory of a certain size
>- at a certain bit offset. Record store chains to different bases in a
>- hash_map (m_stores) and make sure to terminate such chains when appropriate
>- (for example when when the stored values get used subsequently).
>+ at a certain bit offset from expressions we can handle. For bit-fields
>+ we also note the surrounding bit region, bits that could be stored in
>+ a read-modify-write operation when storing the bit-field. Record store
>+ chains to different bases in a hash_map (m_stores) and make sure to
>+ terminate such chains when appropriate (for example when when the stored
>+ values get used subsequently).
>These stores can be a result of structure element initializers, array stores
> etc. A store_immediate_info object is recorded for every such store.
> Record as many such assignments to a single base as possible until a
> statement that interferes with the store sequence is encountered.
>+ Each store has up to 2 operands, which can be an immediate constant
>+ or a memory load, from which the value to be stored can be computed.
>+ At most one of the operands can be a constant. The operands are recorded
>+ in store_operand_info struct.
>
>2) Analyze the chain of stores recorded in phase 1) (i.e. the vector of
> store_immediate_info objects) and coalesce contiguous stores into
>- merged_store_group objects.
>+ merged_store_group objects. For bit-fields stores, we don't need to
>+ require the stores to be contiguous, just their surrounding bit regions
>+ have to be contiguous. If the expression being stored is different
>+ between adjacent stores, such as one store storing a constant and
>+ following storing a value loaded from memory, or if the loaded memory
>+ objects are not adjacent, a new merged_store_group is created as well.
>
> For example, given the stores:
> [p ] := 0;
>@@ -134,8 +163,35 @@
> #define MAX_STORE_BITSIZE (BITS_PER_WORD)
> #define MAX_STORE_BYTES (MAX_STORE_BITSIZE / BITS_PER_UNIT)
>
>+/* Limit to bound the number of aliasing checks for loads with the same
>+ vuse as the corresponding store. */
>+#define MAX_STORE_ALIAS_CHECKS 64
>+
> namespace {
>
>+/* Struct recording one operand for the store, which is either a constant,
>+ then VAL represents the constant and all the other fields are zero,
>+ or a memory load, then VAL represents the reference, BASE_ADDR is non-NULL
>+ and the other fields also reflect the memory load. */
>+
>+struct store_operand_info
>+{
>+ tree val;
>+ tree base_addr;
>+ unsigned HOST_WIDE_INT bitsize;
>+ unsigned HOST_WIDE_INT bitpos;
>+ unsigned HOST_WIDE_INT bitregion_start;
>+ unsigned HOST_WIDE_INT bitregion_end;
>+ gimple *stmt;
>+ store_operand_info ();
>+};
>+
>+store_operand_info::store_operand_info ()
>+ : val (NULL_TREE), base_addr (NULL_TREE), bitsize (0), bitpos (0),
>+ bitregion_start (0), bitregion_end (0), stmt (NULL)
>+{
>+}
>+
>/* Struct recording the information about a single store of an immediate
> to memory. These are created in the first phase and coalesced into
> merged_store_group objects in the second phase. */
>@@ -149,9 +205,17 @@ struct store_immediate_info
> unsigned HOST_WIDE_INT bitregion_end;
> gimple *stmt;
> unsigned int order;
>+ /* INTEGER_CST for constant stores, MEM_REF for memory copy or
>+ BIT_*_EXPR for logical bitwise operation. */
>+ enum tree_code rhs_code;
>+ /* Operands. For BIT_*_EXPR rhs_code both operands are used, otherwise
>+ just the first one. */
>+ store_operand_info ops[2];
> store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
> unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
>- gimple *, unsigned int);
>+ gimple *, unsigned int, enum tree_code,
>+ const store_operand_info &,
>+ const store_operand_info &);
> };
>
> store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
>@@ -159,11 +223,22 @@ store_immediate_info::store_immediate_in
> unsigned HOST_WIDE_INT brs,
> unsigned HOST_WIDE_INT bre,
> gimple *st,
>- unsigned int ord)
>+ unsigned int ord,
>+ enum tree_code rhscode,
>+ const store_operand_info &op0r,
>+ const store_operand_info &op1r)
>: bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre),
>- stmt (st), order (ord)
>+ stmt (st), order (ord), rhs_code (rhscode)
>+#if __cplusplus >= 201103L
>+ , ops { op0r, op1r }
>+{
>+}
>+#else
> {
>+ ops[0] = op0r;
>+ ops[1] = op1r;
> }
>+#endif
>
>/* Struct representing a group of stores to contiguous memory locations.
>These are produced by the second phase (coalescing) and consumed in the
>@@ -178,8 +253,10 @@ struct merged_store_group
> /* The size of the allocated memory for val and mask. */
> unsigned HOST_WIDE_INT buf_size;
> unsigned HOST_WIDE_INT align_base;
>+ unsigned HOST_WIDE_INT load_align_base[2];
>
> unsigned int align;
>+ unsigned int load_align[2];
> unsigned int first_order;
> unsigned int last_order;
>
>@@ -576,6 +653,20 @@ merged_store_group::merged_store_group (
> get_object_alignment_1 (gimple_assign_lhs (info->stmt),
> &align, &align_bitpos);
> align_base = start - align_bitpos;
>+ for (int i = 0; i < 2; ++i)
>+ {
>+ store_operand_info &op = info->ops[i];
>+ if (op.base_addr == NULL_TREE)
>+ {
>+ load_align[i] = 0;
>+ load_align_base[i] = 0;
>+ }
>+ else
>+ {
>+ get_object_alignment_1 (op.val, &load_align[i], &align_bitpos);
>+ load_align_base[i] = op.bitpos - align_bitpos;
>+ }
>+ }
> stores.create (1);
> stores.safe_push (info);
> last_stmt = info->stmt;
>@@ -608,6 +699,19 @@ merged_store_group::do_merge (store_imme
> align = this_align;
> align_base = info->bitpos - align_bitpos;
> }
>+ for (int i = 0; i < 2; ++i)
>+ {
>+ store_operand_info &op = info->ops[i];
>+ if (!op.base_addr)
>+ continue;
>+
>+ get_object_alignment_1 (op.val, &this_align, &align_bitpos);
>+ if (this_align > load_align[i])
>+ {
>+ load_align[i] = this_align;
>+ load_align_base[i] = op.bitpos - align_bitpos;
>+ }
>+ }
>
> gimple *stmt = info->stmt;
> stores.safe_push (info);
>@@ -682,16 +786,21 @@ merged_store_group::apply_stores ()
> FOR_EACH_VEC_ELT (stores, i, info)
> {
> unsigned int pos_in_buffer = info->bitpos - bitregion_start;
>- bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt),
>- val, info->bitsize,
>- pos_in_buffer, buf_size);
>- if (dump_file && (dump_flags & TDF_DETAILS))
>+ tree cst = NULL_TREE;
>+ if (info->ops[0].val && info->ops[0].base_addr == NULL_TREE)
>+ cst = info->ops[0].val;
>+ else if (info->ops[1].val && info->ops[1].base_addr == NULL_TREE)
>+ cst = info->ops[1].val;
>+ bool ret = true;
>+ if (cst)
>+ ret = encode_tree_to_bitpos (cst, val, info->bitsize,
>+ pos_in_buffer, buf_size);
>+ if (cst && dump_file && (dump_flags & TDF_DETAILS))
> {
> if (ret)
> {
> fprintf (dump_file, "After writing ");
>- print_generic_expr (dump_file,
>- gimple_assign_rhs1 (info->stmt), 0);
>+ print_generic_expr (dump_file, cst, 0);
> fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
> " at position %d the merged region contains:\n",
> info->bitsize, pos_in_buffer);
>@@ -799,9 +908,10 @@ private:
> decisions when going out of SSA). */
> imm_store_chain_info *m_stores_head;
>
>+ void process_store (gimple *);
> bool terminate_and_process_all_chains ();
> bool terminate_all_aliasing_chains (imm_store_chain_info **,
>- bool, gimple *);
>+ gimple *);
> bool terminate_and_release_chain (imm_store_chain_info *);
> }; // class pass_store_merging
>
>@@ -831,7 +941,6 @@ pass_store_merging::terminate_and_proces
> bool
>pass_store_merging::terminate_all_aliasing_chains (imm_store_chain_info
> **chain_info,
>- bool var_offset_p,
> gimple *stmt)
> {
> bool ret = false;
>@@ -845,37 +954,21 @@ pass_store_merging::terminate_all_aliasi
> of a chain. */
> if (chain_info)
> {
>- /* We have a chain at BASE and we're writing to [BASE + <variable>].
>- This can interfere with any of the stores so terminate
>- the chain. */
>- if (var_offset_p)
>- {
>- terminate_and_release_chain (*chain_info);
>- ret = true;
>- }
>- /* Otherwise go through every store in the chain to see if it
>- aliases with any of them. */
>- else
>+ store_immediate_info *info;
>+ unsigned int i;
>+ FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
> {
>- store_immediate_info *info;
>- unsigned int i;
>- FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
>+ if (ref_maybe_used_by_stmt_p (stmt, gimple_assign_lhs (info->stmt))
>+ || stmt_may_clobber_ref_p (stmt, gimple_assign_lhs (info->stmt)))
> {
>- if (ref_maybe_used_by_stmt_p (stmt,
>- gimple_assign_lhs (info->stmt))
>- || stmt_may_clobber_ref_p (stmt,
>- gimple_assign_lhs (info->stmt)))
>+ if (dump_file && (dump_flags & TDF_DETAILS))
> {
>- if (dump_file && (dump_flags & TDF_DETAILS))
>- {
>- fprintf (dump_file,
>- "stmt causes chain termination:\n");
>- print_gimple_stmt (dump_file, stmt, 0);
>- }
>- terminate_and_release_chain (*chain_info);
>- ret = true;
>- break;
>+ fprintf (dump_file, "stmt causes chain termination:\n");
>+ print_gimple_stmt (dump_file, stmt, 0);
> }
>+ terminate_and_release_chain (*chain_info);
>+ ret = true;
>+ break;
> }
> }
> }
>@@ -920,6 +1013,125 @@ pass_store_merging::terminate_and_releas
> return ret;
> }
>
>+/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
>+ may clobber REF. FIRST and LAST must be in the same basic block and
>+ have non-NULL vdef. */
>+
>+bool
>+stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
>+{
>+ ao_ref r;
>+ ao_ref_init (&r, ref);
>+ unsigned int count = 0;
>+ tree vop = gimple_vdef (last);
>+ gimple *stmt;
>+
>+ gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
>+ do
>+ {
>+ stmt = SSA_NAME_DEF_STMT (vop);
>+ if (stmt_may_clobber_ref_p_1 (stmt, &r))
>+ return true;
>+ /* Avoid quadratic compile time by bounding the number of checks
>+ we perform. */
>+ if (++count > MAX_STORE_ALIAS_CHECKS)
>+ return true;
>+ vop = gimple_vuse (stmt);
>+ }
>+ while (stmt != first);
>+ return false;
>+}
>+
>+/* Return true if INFO->ops[IDX] is mergeable with the
>+ corresponding loads already in MERGED_STORE group.
>+ BASE_ADDR is the base address of the whole store group. */
>+
>+bool
>+compatible_load_p (merged_store_group *merged_store,
>+ store_immediate_info *info,
>+ tree base_addr, int idx)
>+{
>+ store_immediate_info *infof = merged_store->stores[0];
>+ if (!info->ops[idx].base_addr
>+ || (info->ops[idx].bitpos - infof->ops[idx].bitpos
>+ != info->bitpos - infof->bitpos)
>+ || !operand_equal_p (info->ops[idx].base_addr,
>+ infof->ops[idx].base_addr, 0))
>+ return false;
>+
>+ store_immediate_info *infol = merged_store->stores.last ();
>+ tree load_vuse = gimple_vuse (info->ops[idx].stmt);
>+ /* In this case all vuses should be the same, e.g.
>+ _1 = s.a; _2 = s.b; _3 = _1 | 1; t.a = _3; _4 = _2 | 2; t.b = _4;
>+ or
>+ _1 = s.a; _2 = s.b; t.a = _1; t.b = _2;
>+ and we can emit the coalesced load next to any of those loads. */
>+ if (gimple_vuse (infof->ops[idx].stmt) == load_vuse
>+ && gimple_vuse (infol->ops[idx].stmt) == load_vuse)
>+ return true;
>+
>+ /* Otherwise, at least for now require that the load has the same
>+ vuse as the store. See following examples. */
>+ if (gimple_vuse (info->stmt) != load_vuse)
>+ return false;
>+
>+ if (gimple_vuse (infof->stmt) != gimple_vuse (infof->ops[idx].stmt)
>+ || (infof != infol
>+ && gimple_vuse (infol->stmt) != gimple_vuse (infol->ops[idx].stmt)))
>+ return false;
>+
>+ /* If the load is from the same location as the store, already
>+ the construction of the immediate chain info guarantees no intervening
>+ stores, so no further checks are needed. Example:
>+ _1 = s.a; _2 = _1 & -7; s.a = _2; _3 = s.b; _4 = _3 & -7; s.b = _4; */
>+ if (info->ops[idx].bitpos == info->bitpos
>+ && operand_equal_p (info->ops[idx].base_addr, base_addr, 0))
>+ return true;
>+
>+ /* Otherwise, we need to punt if any of the loads can be clobbered by any
>+ of the stores in the group, or any other stores in between those.
>+ Previous calls to compatible_load_p ensured that for all the
>+ merged_store->stores IDX loads, no stmts starting with
>+ merged_store->first_stmt and ending right before merged_store->last_stmt
>+ clobbers those loads. */
>+ gimple *first = merged_store->first_stmt;
>+ gimple *last = merged_store->last_stmt;
>+ unsigned int i;
>+ store_immediate_info *infoc;
>+ /* The stores are sorted by increasing store bitpos, so if info->stmt store
>+ comes before the so far first load, we'll be changing
>+ merged_store->first_stmt. In that case we need to give up if
>+ any of the earlier processed loads clobber with the stmts in the new
>+ range. */
>+ if (info->order < merged_store->first_order)
>+ {
>+ FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
>+ if (stmts_may_clobber_ref_p (info->stmt, first, infoc->ops[idx].val))
>+ return false;
>+ first = info->stmt;
>+ }
>+ /* Similarly, we could change merged_store->last_stmt, so ensure
>+ in that case no stmts in the new range clobber any of the earlier
>+ processed loads. */
>+ else if (info->order > merged_store->last_order)
>+ {
>+ FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
>+ if (stmts_may_clobber_ref_p (last, info->stmt, infoc->ops[idx].val))
>+ return false;
>+ last = info->stmt;
>+ }
>+ /* And finally, we'd be adding a new load to the set, ensure it isn't
>+ clobbered in the new range. */
>+ if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
>+ return false;
>+
>+ /* Otherwise, we are looking for:
>+ _1 = s.a; _2 = _1 ^ 15; t.a = _2; _3 = s.b; _4 = _3 ^ 15; t.b = _4;
>+ or
>+ _1 = s.a; t.a = _1; _2 = s.b; t.b = _2; */
>+ return true;
>+}
>+
>/* Go through the candidate stores recorded in m_store_info and merge them
> into merged_store_group objects recorded into m_merged_store_groups
> representing the widened stores. Return true if coalescing was successful
>@@ -967,32 +1179,56 @@ imm_store_chain_info::coalesce_immediate
> if (IN_RANGE (start, merged_store->start,
> merged_store->start + merged_store->width - 1))
> {
>- merged_store->merge_overlapping (info);
>- continue;
>+ /* Only allow overlapping stores of constants. */
>+ if (info->rhs_code == INTEGER_CST
>+ && merged_store->stores[0]->rhs_code == INTEGER_CST)
>+ {
>+ merged_store->merge_overlapping (info);
>+ continue;
>+ }
>+ }
>+ /* |---store 1---||---store 2---|
>+ This store is consecutive to the previous one.
>+ Merge it into the current store group. There can be gaps in between
>+ the stores, but there can't be gaps in between bitregions. */
>+ else if (info->bitregion_start <= merged_store->bitregion_end
>+ && info->rhs_code == merged_store->stores[0]->rhs_code)
>+ {
>+ store_immediate_info *infof = merged_store->stores[0];
>+
>+ /* All the rhs_code ops that take 2 operands are commutative,
>+ swap the operands if it could make the operands compatible. */
>+ if (infof->ops[0].base_addr
>+ && infof->ops[1].base_addr
>+ && info->ops[0].base_addr
>+ && info->ops[1].base_addr
>+ && (info->ops[1].bitpos - infof->ops[0].bitpos
>+ == info->bitpos - infof->bitpos)
>+ && operand_equal_p (info->ops[1].base_addr,
>+ infof->ops[0].base_addr, 0))
>+ std::swap (info->ops[0], info->ops[1]);
>+ if ((!infof->ops[0].base_addr
>+ || compatible_load_p (merged_store, info, base_addr, 0))
>+ && (!infof->ops[1].base_addr
>+ || compatible_load_p (merged_store, info, base_addr, 1)))
>+ {
>+ merged_store->merge_into (info);
>+ continue;
>+ }
> }
>
> /* |---store 1---| <gap> |---store 2---|.
>- Gap between stores. Start a new group if there are any gaps
>- between bitregions. */
>- if (info->bitregion_start > merged_store->bitregion_end)
>- {
>- /* Try to apply all the stores recorded for the group to determine
>- the bitpattern they write and discard it if that fails.
>- This will also reject single-store groups. */
>- if (!merged_store->apply_stores ())
>- delete merged_store;
>- else
>- m_merged_store_groups.safe_push (merged_store);
>-
>- merged_store = new merged_store_group (info);
>+ Gap between stores or the rhs not compatible. Start a new group. */
>
>- continue;
>- }
>+ /* Try to apply all the stores recorded for the group to determine
>+ the bitpattern they write and discard it if that fails.
>+ This will also reject single-store groups. */
>+ if (!merged_store->apply_stores ())
>+ delete merged_store;
>+ else
>+ m_merged_store_groups.safe_push (merged_store);
>
>- /* |---store 1---||---store 2---|
>- This store is consecutive to the previous one.
>- Merge it into the current store group. */
>- merged_store->merge_into (info);
>+ merged_store = new merged_store_group (info);
> }
>
> /* Record or discard the last store group. */
>@@ -1014,35 +1250,57 @@ imm_store_chain_info::coalesce_immediate
> return success;
> }
>
>-/* Return the type to use for the merged stores described by STMTS.
>- This is needed to get the alias sets right. */
>+/* Return the type to use for the merged stores or loads described by STMTS.
>+ This is needed to get the alias sets right. If IS_LOAD, look for rhs,
>+ otherwise lhs. Additionally set *CLIQUEP and *BASEP to MR_DEPENDENCE_*
>+ of the MEM_REFs if any. */
>
> static tree
>-get_alias_type_for_stmts (auto_vec<gimple *> &stmts)
>+get_alias_type_for_stmts (vec<gimple *> &stmts, bool is_load,
>+ unsigned short *cliquep, unsigned short *basep)
> {
> gimple *stmt;
> unsigned int i;
>- tree lhs = gimple_assign_lhs (stmts[0]);
>- tree type = reference_alias_ptr_type (lhs);
>+ tree type = NULL_TREE;
>+ tree ret = NULL_TREE;
>+ *cliquep = 0;
>+ *basep = 0;
>
> FOR_EACH_VEC_ELT (stmts, i, stmt)
> {
>- if (i == 0)
>- continue;
>+ tree ref = is_load ? gimple_assign_rhs1 (stmt)
>+ : gimple_assign_lhs (stmt);
>+ tree type1 = reference_alias_ptr_type (ref);
>+ tree base = get_base_address (ref);
>
>- lhs = gimple_assign_lhs (stmt);
>- tree type1 = reference_alias_ptr_type (lhs);
>+ if (i == 0)
>+ {
>+ if (TREE_CODE (base) == MEM_REF)
>+ {
>+ *cliquep = MR_DEPENDENCE_CLIQUE (base);
>+ *basep = MR_DEPENDENCE_BASE (base);
>+ }
>+ ret = type = type1;
>+ continue;
>+ }
> if (!alias_ptr_types_compatible_p (type, type1))
>- return ptr_type_node;
>+ ret = ptr_type_node;
>+ if (TREE_CODE (base) != MEM_REF
>+ || *cliquep != MR_DEPENDENCE_CLIQUE (base)
>+ || *basep != MR_DEPENDENCE_BASE (base))
>+ {
>+ *cliquep = 0;
>+ *basep = 0;
>+ }
> }
>- return type;
>+ return ret;
> }
>
> /* Return the location_t information we can find among the statements
> in STMTS. */
>
> static location_t
>-get_location_for_stmts (auto_vec<gimple *> &stmts)
>+get_location_for_stmts (vec<gimple *> &stmts)
> {
> gimple *stmt;
> unsigned int i;
>@@ -1062,7 +1320,7 @@ struct split_store
> unsigned HOST_WIDE_INT bytepos;
> unsigned HOST_WIDE_INT size;
> unsigned HOST_WIDE_INT align;
>- auto_vec<gimple *> orig_stmts;
>+ auto_vec<store_immediate_info *> orig_stores;
>/* True if there is a single orig stmt covering the whole split store. */
> bool orig;
> split_store (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
>@@ -1076,21 +1334,20 @@ split_store::split_store (unsigned HOST_
> unsigned HOST_WIDE_INT al)
> : bytepos (bp), size (sz), align (al), orig (false)
> {
>- orig_stmts.create (0);
>+ orig_stores.create (0);
> }
>
>-/* Record all statements corresponding to stores in GROUP that write to
>- the region starting at BITPOS and is of size BITSIZE. Record such
>- statements in STMTS if non-NULL. The stores in GROUP must be sorted by
>- bitposition. Return INFO if there is exactly one original store
>- in the range. */
>+/* Record all stores in GROUP that write to the region starting at BITPOS and
>+ is of size BITSIZE. Record infos for such statements in STORES if
>+ non-NULL. The stores in GROUP must be sorted by bitposition. Return INFO
>+ if there is exactly one original store in the range. */
>
> static store_immediate_info *
>-find_constituent_stmts (struct merged_store_group *group,
>- vec<gimple *> *stmts,
>- unsigned int *first,
>- unsigned HOST_WIDE_INT bitpos,
>- unsigned HOST_WIDE_INT bitsize)
>+find_constituent_stores (struct merged_store_group *group,
>+ vec<store_immediate_info *> *stores,
>+ unsigned int *first,
>+ unsigned HOST_WIDE_INT bitpos,
>+ unsigned HOST_WIDE_INT bitsize)
> {
> store_immediate_info *info, *ret = NULL;
> unsigned int i;
>@@ -1119,9 +1376,9 @@ find_constituent_stmts (struct merged_st
> if (stmt_start >= end)
> return ret;
>
>- if (stmts)
>+ if (stores)
> {
>- stmts->safe_push (info->stmt);
>+ stores->safe_push (info);
> if (ret)
> {
> ret = NULL;
>@@ -1143,11 +1400,14 @@ find_constituent_stmts (struct merged_st
> This is to separate the splitting strategy from the statement
> building/emission/linking done in output_merged_store.
> Return number of new stores.
>+ If ALLOW_UNALIGNED_STORE is false, then all stores must be aligned.
>+ If ALLOW_UNALIGNED_LOAD is false, then all loads must be aligned.
> If SPLIT_STORES is NULL, it is just a dry run to count number of
> new stores. */
>
> static unsigned int
>-split_group (merged_store_group *group, bool allow_unaligned,
>+split_group (merged_store_group *group, bool allow_unaligned_store,
>+ bool allow_unaligned_load,
> vec<struct split_store *> *split_stores)
> {
> unsigned HOST_WIDE_INT pos = group->bitregion_start;
>@@ -1155,6 +1415,7 @@ split_group (merged_store_group *group,
> unsigned HOST_WIDE_INT bytepos = pos / BITS_PER_UNIT;
> unsigned HOST_WIDE_INT group_align = group->align;
> unsigned HOST_WIDE_INT align_base = group->align_base;
>+ unsigned HOST_WIDE_INT group_load_align = group_align;
>
>gcc_assert ((size % BITS_PER_UNIT == 0) && (pos % BITS_PER_UNIT == 0));
>
>@@ -1162,9 +1423,14 @@ split_group (merged_store_group *group,
> unsigned HOST_WIDE_INT try_pos = bytepos;
> group->stores.qsort (sort_by_bitpos);
>
>+ if (!allow_unaligned_load)
>+ for (int i = 0; i < 2; ++i)
>+ if (group->load_align[i])
>+ group_load_align = MIN (group_load_align, group->load_align[i]);
>+
> while (size > 0)
> {
>- if ((allow_unaligned || group_align <= BITS_PER_UNIT)
>+ if ((allow_unaligned_store || group_align <= BITS_PER_UNIT)
> && group->mask[try_pos - bytepos] == (unsigned char) ~0U)
> {
> /* Skip padding bytes. */
>@@ -1180,10 +1446,34 @@ split_group (merged_store_group *group,
> unsigned HOST_WIDE_INT align = group_align;
> if (align_bitpos)
> align = least_bit_hwi (align_bitpos);
>- if (!allow_unaligned)
>+ if (!allow_unaligned_store)
> try_size = MIN (try_size, align);
>+ if (!allow_unaligned_load)
>+ {
>+ /* If we can't do or don't want to do unaligned stores
>+ as well as loads, we need to take the loads into account
>+ as well. */
>+ unsigned HOST_WIDE_INT load_align = group_load_align;
>+ align_bitpos = (try_bitpos - align_base) & (load_align - 1);
>+ if (align_bitpos)
>+ load_align = least_bit_hwi (align_bitpos);
>+ for (int i = 0; i < 2; ++i)
>+ if (group->load_align[i])
>+ {
>+ align_bitpos = try_bitpos - group->stores[0]->bitpos;
>+ align_bitpos += group->stores[0]->ops[i].bitpos;
>+ align_bitpos -= group->load_align_base[i];
>+ align_bitpos &= (group_load_align - 1);
>+ if (align_bitpos)
>+ {
>+ unsigned HOST_WIDE_INT a = least_bit_hwi (align_bitpos);
>+ load_align = MIN (load_align, a);
>+ }
>+ }
>+ try_size = MIN (try_size, load_align);
>+ }
> store_immediate_info *info
>- = find_constituent_stmts (group, NULL, &first, try_bitpos, try_size);
>+ = find_constituent_stores (group, NULL, &first, try_bitpos, try_size);
> if (info)
> {
> /* If there is just one original statement for the range, see if
>@@ -1191,8 +1481,8 @@ split_group (merged_store_group *group,
> than try_size. */
> unsigned HOST_WIDE_INT stmt_end
> = ROUND_UP (info->bitpos + info->bitsize, BITS_PER_UNIT);
>- info = find_constituent_stmts (group, NULL, &first, try_bitpos,
>- stmt_end - try_bitpos);
>+ info = find_constituent_stores (group, NULL, &first, try_bitpos,
>+ stmt_end - try_bitpos);
> if (info && info->bitpos >= try_bitpos)
> {
> try_size = stmt_end - try_bitpos;
>@@ -1221,7 +1511,7 @@ split_group (merged_store_group *group,
> nonmasked *= BITS_PER_UNIT;
> while (nonmasked <= try_size / 2)
> try_size /= 2;
>- if (!allow_unaligned && group_align > BITS_PER_UNIT)
>+ if (!allow_unaligned_store && group_align > BITS_PER_UNIT)
> {
> /* Now look for whole padding bytes at the start of that bitsize. */
> unsigned int try_bytesize = try_size / BITS_PER_UNIT, masked;
>@@ -1252,8 +1542,8 @@ split_group (merged_store_group *group,
> {
> struct split_store *store
> = new split_store (try_pos, try_size, align);
>- info = find_constituent_stmts (group, &store->orig_stmts,
>- &first, try_bitpos, try_size);
>+ info = find_constituent_stores (group, &store->orig_stores,
>+ &first, try_bitpos, try_size);
> if (info
> && info->bitpos >= try_bitpos
> && info->bitpos + info->bitsize <= try_bitpos + try_size)
>@@ -1288,19 +1578,23 @@ imm_store_chain_info::output_merged_stor
>
> auto_vec<struct split_store *, 32> split_stores;
> split_stores.create (0);
>- bool allow_unaligned
>+ bool allow_unaligned_store
>= !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED);
>- if (allow_unaligned)
>+ bool allow_unaligned_load = allow_unaligned_store;
>+ if (allow_unaligned_store)
> {
> /* If unaligned stores are allowed, see how many stores we'd emit
> for unaligned and how many stores we'd emit for aligned stores.
> Only use unaligned stores if it allows fewer stores than aligned. */
>- unsigned aligned_cnt = split_group (group, false, NULL);
>- unsigned unaligned_cnt = split_group (group, true, NULL);
>+ unsigned aligned_cnt
>+ = split_group (group, false, allow_unaligned_load, NULL);
>+ unsigned unaligned_cnt
>+ = split_group (group, true,