[PATCH] Improve store merging to handle load+store or bitwise logicals (PR tree-optimization/78821, take 2)

Richard Biener <rguenther@suse.de>
Fri Nov 3 19:39:00 GMT 2017


On November 3, 2017 8:17:30 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Fri, Nov 03, 2017 at 03:04:18PM +0100, Jakub Jelinek wrote:
>> single-use vs. multiple uses is something I've thought about, but don't
>> know whether it is better to require single-use or not (or sometimes,
>> under some condition?).  Say if we have:
>
>So, here is what I've committed in the end after bootstrapping/regtesting
>it on x86_64-linux and i686-linux, the only changes from the earlier patch
>were comments and addition of has_single_use checks.
>
>In those bootstraps/regtests, the number of integer_cst stores was,
>as expected, the same, and so was the number of bit_*_expr cases, but
>it apparently matters a lot for the memory copying (rhs_code MEM_REF).
>Without this patch, new/orig stores:
>16943   35369
>and with the patch:
>12111   24911
>So, perhaps we'll need to do something smarter (approximate how many
>original loads would be kept and how many new loads/stores we'd need to
>add to get rid of how many original stores).
>Or allow multiple uses for the MEM_REF rhs_code only and require single
>use for anything else.

It would probably be interesting to look at the individual cases. But yes, it should be factored into the cost model somehow.
It possibly also increases register pressure.

Richard. 
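
To make the rhs_code MEM_REF case concrete, here is a minimal sketch (an
illustration only, not one of the new testcases; copy4 and the local
variable names are made up):

/* Four narrow loads feeding four narrow stores to adjacent bytes: with
   the patch this can become one 4-byte load plus one 4-byte store.  If
   any of a..d had additional uses, the original narrow loads would have
   to be kept anyway, which is what the has_single_use checks and the
   cost-model discussion above are weighing.  */
void
copy4 (char *p, const char *q)
{
  char a = q[0], b = q[1], c = q[2], d = q[3];
  p[0] = a;
  p[1] = b;
  p[2] = c;
  p[3] = d;
}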

>2017-11-03  Jakub Jelinek  <jakub@redhat.com>
>
>	PR tree-optimization/78821
>	* gimple-ssa-store-merging.c: Update the file comment.
>	(MAX_STORE_ALIAS_CHECKS): Define.
>	(struct store_operand_info): New type.
>	(store_operand_info::store_operand_info): New constructor.
>	(struct store_immediate_info): Add rhs_code and ops data members.
>	(store_immediate_info::store_immediate_info): Add rhscode, op0r
>	and op1r arguments to the ctor, initialize corresponding data members.
>	(struct merged_store_group): Add load_align_base and load_align
>	data members.
>	(merged_store_group::merged_store_group): Initialize them.
>	(merged_store_group::do_merge): Update them.
>	(merged_store_group::apply_stores): Pick the constant for
>	encode_tree_to_bitpos from one of the two operands, or skip
>	encode_tree_to_bitpos if neither operand is a constant.
>	(class pass_store_merging): Add process_store method decl.  Remove
>	bool argument from terminate_all_aliasing_chains method decl.
>	(pass_store_merging::terminate_all_aliasing_chains): Remove
>	var_offset_p argument and corresponding handling.
>	(stmts_may_clobber_ref_p): New function.
>	(compatible_load_p): New function.
>	(imm_store_chain_info::coalesce_immediate_stores): Terminate group
>	if there is overlap and rhs_code is not INTEGER_CST.  For
>	non-overlapping stores terminate group if rhs is not mergeable.
>	(get_alias_type_for_stmts): Change first argument from
>	auto_vec<gimple *> & to vec<gimple *> &.  Add IS_LOAD, CLIQUEP and
>	BASEP arguments.  If IS_LOAD is true, look at rhs1 of the stmts
>	instead of lhs.  Compute *CLIQUEP and *BASEP in addition to the
>	alias type.
>	(get_location_for_stmts): Change first argument from
>	auto_vec<gimple *> & to vec<gimple *> &.
>	(struct split_store): Remove orig_stmts data member, add orig_stores.
>	(split_store::split_store): Create orig_stores rather than orig_stmts.
>	(find_constituent_stmts): Renamed to ...
>	(find_constituent_stores): ... this.  Change second argument from
>	vec<gimple *> * to vec<store_immediate_info *> *, push pointers
>	to info structures rather than the statements.
>	(split_group): Rename ALLOW_UNALIGNED argument to
>	ALLOW_UNALIGNED_STORE, add ALLOW_UNALIGNED_LOAD argument and handle
>	it.  Adjust find_constituent_stores caller.
>	(imm_store_chain_info::output_merged_store): Handle rhs_code other
>	than INTEGER_CST, adjust split_group, get_alias_type_for_stmts and
>	get_location_for_stmts callers.  Set MR_DEPENDENCE_CLIQUE and
>	MR_DEPENDENCE_BASE on the MEM_REFs if they are the same in all stores.
>	(mem_valid_for_store_merging): New function.
>	(handled_load): New function.
>	(pass_store_merging::process_store): New method.
>	(pass_store_merging::execute): Use process_store method.  Adjust
>	terminate_all_aliasing_chains caller.
>
>	* gcc.dg/store_merging_13.c: New test.
>	* gcc.dg/store_merging_14.c: New test.
>
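For reference, a minimal sketch of the bitwise-logical shape the patch now
handles (illustrative only, not the contents of the new tests; struct S and
xor4 are made-up names):

/* Four byte-sized load ^ constant -> store sequences over adjacent
   members: mergeable into one 4-byte load, a single wide XOR with the
   concatenated constants, and one 4-byte store.  */
struct S { unsigned char a, b, c, d; };

void
xor4 (struct S *p, const struct S *q)
{
  p->a = q->a ^ 1;
  p->b = q->b ^ 2;
  p->c = q->c ^ 3;
  p->d = q->d ^ 4;
}
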
>--- gcc/gimple-ssa-store-merging.c.jj	2017-11-03 15:37:02.869561500 +0100
>+++ gcc/gimple-ssa-store-merging.c	2017-11-03 16:15:15.059282459 +0100
>@@ -19,7 +19,8 @@
>    <http://www.gnu.org/licenses/>.  */
> 
> /* The purpose of this pass is to combine multiple memory stores of
>-   constant values to consecutive memory locations into fewer wider stores.
>+   constant values, values loaded from memory or bitwise operations
>+   on those to consecutive memory locations into fewer wider stores.
>    For example, if we have a sequence peforming four byte stores to
>    consecutive memory locations:
>    [p     ] := imm1;
>@@ -29,21 +30,49 @@
>    we can transform this into a single 4-byte store if the target supports it:
>    [p] := imm1:imm2:imm3:imm4 //concatenated immediates according to endianness.
> 
>+   Or:
>+   [p     ] := [q     ];
>+   [p + 1B] := [q + 1B];
>+   [p + 2B] := [q + 2B];
>+   [p + 3B] := [q + 3B];
>+   if there is no overlap can be transformed into a single 4-byte
>+   load followed by single 4-byte store.
>+
>+   Or:
>+   [p     ] := [q     ] ^ imm1;
>+   [p + 1B] := [q + 1B] ^ imm2;
>+   [p + 2B] := [q + 2B] ^ imm3;
>+   [p + 3B] := [q + 3B] ^ imm4;
>+   if there is no overlap can be transformed into a single 4-byte
>+   load, xored with imm1:imm2:imm3:imm4 and stored using a single 4-byte store.
>+
>    The algorithm is applied to each basic block in three phases:
> 
>-   1) Scan through the basic block recording constant assignments to
>+   1) Scan through the basic block recording assignments to
>    destinations that can be expressed as a store to memory of a certain size
>-   at a certain bit offset.  Record store chains to different bases in a
>-   hash_map (m_stores) and make sure to terminate such chains when appropriate
>-   (for example when when the stored values get used subsequently).
>+   at a certain bit offset from expressions we can handle.  For bit-fields
>+   we also note the surrounding bit region, bits that could be stored in
>+   a read-modify-write operation when storing the bit-field.  Record store
>+   chains to different bases in a hash_map (m_stores) and make sure to
>+   terminate such chains when appropriate (for example when when the stored
>+   values get used subsequently).
>    These stores can be a result of structure element initializers, array stores
>    etc.  A store_immediate_info object is recorded for every such store.
>    Record as many such assignments to a single base as possible until a
>    statement that interferes with the store sequence is encountered.
>+   Each store has up to 2 operands, which can be an immediate constant
>+   or a memory load, from which the value to be stored can be computed.
>+   At most one of the operands can be a constant.  The operands are recorded
>+   in store_operand_info struct.
> 
>    2) Analyze the chain of stores recorded in phase 1) (i.e. the vector of
>    store_immediate_info objects) and coalesce contiguous stores into
>-   merged_store_group objects.
>+   merged_store_group objects.  For bit-fields stores, we don't need to
>+   require the stores to be contiguous, just their surrounding bit regions
>+   have to be contiguous.  If the expression being stored is different
>+   between adjacent stores, such as one store storing a constant and
>+   following storing a value loaded from memory, or if the loaded memory
>+   objects are not adjacent, a new merged_store_group is created as well.
> 
>    For example, given the stores:
>    [p     ] := 0;
>@@ -134,8 +163,35 @@
> #define MAX_STORE_BITSIZE (BITS_PER_WORD)
> #define MAX_STORE_BYTES (MAX_STORE_BITSIZE / BITS_PER_UNIT)
> 
>+/* Limit to bound the number of aliasing checks for loads with the same
>+   vuse as the corresponding store.  */
>+#define MAX_STORE_ALIAS_CHECKS 64
>+
> namespace {
> 
>+/* Struct recording one operand for the store, which is either a constant,
>+   then VAL represents the constant and all the other fields are zero,
>+   or a memory load, then VAL represents the reference, BASE_ADDR is non-NULL
>+   and the other fields also reflect the memory load.  */
>+
>+struct store_operand_info
>+{
>+  tree val;
>+  tree base_addr;
>+  unsigned HOST_WIDE_INT bitsize;
>+  unsigned HOST_WIDE_INT bitpos;
>+  unsigned HOST_WIDE_INT bitregion_start;
>+  unsigned HOST_WIDE_INT bitregion_end;
>+  gimple *stmt;
>+  store_operand_info ();
>+};
>+
>+store_operand_info::store_operand_info ()
>+  : val (NULL_TREE), base_addr (NULL_TREE), bitsize (0), bitpos (0),
>+    bitregion_start (0), bitregion_end (0), stmt (NULL)
>+{
>+}
>+
> /* Struct recording the information about a single store of an immediate
>    to memory.  These are created in the first phase and coalesced into
>    merged_store_group objects in the second phase.  */
>@@ -149,9 +205,17 @@ struct store_immediate_info
>   unsigned HOST_WIDE_INT bitregion_end;
>   gimple *stmt;
>   unsigned int order;
>+  /* INTEGER_CST for constant stores, MEM_REF for memory copy or
>+     BIT_*_EXPR for logical bitwise operation.  */
>+  enum tree_code rhs_code;
>+  /* Operands.  For BIT_*_EXPR rhs_code both operands are used, otherwise
>+     just the first one.  */
>+  store_operand_info ops[2];
>  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
> 			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
>-			gimple *, unsigned int);
>+			gimple *, unsigned int, enum tree_code,
>+			const store_operand_info &,
>+			const store_operand_info &);
> };
> 
> store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
>@@ -159,11 +223,22 @@ store_immediate_info::store_immediate_in
> 					    unsigned HOST_WIDE_INT brs,
> 					    unsigned HOST_WIDE_INT bre,
> 					    gimple *st,
>-					    unsigned int ord)
>+					    unsigned int ord,
>+					    enum tree_code rhscode,
>+					    const store_operand_info &op0r,
>+					    const store_operand_info &op1r)
>   : bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre),
>-    stmt (st), order (ord)
>+    stmt (st), order (ord), rhs_code (rhscode)
>+#if __cplusplus >= 201103L
>+    , ops { op0r, op1r }
>+{
>+}
>+#else
> {
>+  ops[0] = op0r;
>+  ops[1] = op1r;
> }
>+#endif
> 
> /* Struct representing a group of stores to contiguous memory locations.
>    These are produced by the second phase (coalescing) and consumed in the
>@@ -178,8 +253,10 @@ struct merged_store_group
>   /* The size of the allocated memory for val and mask.  */
>   unsigned HOST_WIDE_INT buf_size;
>   unsigned HOST_WIDE_INT align_base;
>+  unsigned HOST_WIDE_INT load_align_base[2];
> 
>   unsigned int align;
>+  unsigned int load_align[2];
>   unsigned int first_order;
>   unsigned int last_order;
> 
>@@ -576,6 +653,20 @@ merged_store_group::merged_store_group (
>   get_object_alignment_1 (gimple_assign_lhs (info->stmt),
> 			  &align, &align_bitpos);
>   align_base = start - align_bitpos;
>+  for (int i = 0; i < 2; ++i)
>+    {
>+      store_operand_info &op = info->ops[i];
>+      if (op.base_addr == NULL_TREE)
>+	{
>+	  load_align[i] = 0;
>+	  load_align_base[i] = 0;
>+	}
>+      else
>+	{
>+	  get_object_alignment_1 (op.val, &load_align[i], &align_bitpos);
>+	  load_align_base[i] = op.bitpos - align_bitpos;
>+	}
>+    }
>   stores.create (1);
>   stores.safe_push (info);
>   last_stmt = info->stmt;
>@@ -608,6 +699,19 @@ merged_store_group::do_merge (store_imme
>       align = this_align;
>       align_base = info->bitpos - align_bitpos;
>     }
>+  for (int i = 0; i < 2; ++i)
>+    {
>+      store_operand_info &op = info->ops[i];
>+      if (!op.base_addr)
>+	continue;
>+
>+      get_object_alignment_1 (op.val, &this_align, &align_bitpos);
>+      if (this_align > load_align[i])
>+	{
>+	  load_align[i] = this_align;
>+	  load_align_base[i] = op.bitpos - align_bitpos;
>+	}
>+    }
> 
>   gimple *stmt = info->stmt;
>   stores.safe_push (info);
>@@ -682,16 +786,21 @@ merged_store_group::apply_stores ()
>   FOR_EACH_VEC_ELT (stores, i, info)
>     {
>       unsigned int pos_in_buffer = info->bitpos - bitregion_start;
>-      bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt),
>-					val, info->bitsize,
>-					pos_in_buffer, buf_size);
>-      if (dump_file && (dump_flags & TDF_DETAILS))
>+      tree cst = NULL_TREE;
>+      if (info->ops[0].val && info->ops[0].base_addr == NULL_TREE)
>+	cst = info->ops[0].val;
>+      else if (info->ops[1].val && info->ops[1].base_addr == NULL_TREE)
>+	cst = info->ops[1].val;
>+      bool ret = true;
>+      if (cst)
>+	ret = encode_tree_to_bitpos (cst, val, info->bitsize,
>+				     pos_in_buffer, buf_size);
>+      if (cst && dump_file && (dump_flags & TDF_DETAILS))
> 	{
> 	  if (ret)
> 	    {
> 	      fprintf (dump_file, "After writing ");
>-	      print_generic_expr (dump_file,
>-				  gimple_assign_rhs1 (info->stmt), 0);
>+	      print_generic_expr (dump_file, cst, 0);
> 	      fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
> 			" at position %d the merged region contains:\n",
> 			info->bitsize, pos_in_buffer);
>@@ -799,9 +908,10 @@ private:
>      decisions when going out of SSA).  */
>   imm_store_chain_info *m_stores_head;
> 
>+  void process_store (gimple *);
>   bool terminate_and_process_all_chains ();
>   bool terminate_all_aliasing_chains (imm_store_chain_info **,
>-				      bool, gimple *);
>+				      gimple *);
>   bool terminate_and_release_chain (imm_store_chain_info *);
> }; // class pass_store_merging
> 
>@@ -831,7 +941,6 @@ pass_store_merging::terminate_and_proces
> bool
> pass_store_merging::terminate_all_aliasing_chains (imm_store_chain_info
> 						     **chain_info,
>-						   bool var_offset_p,
> 						   gimple *stmt)
> {
>   bool ret = false;
>@@ -845,37 +954,21 @@ pass_store_merging::terminate_all_aliasi
>      of a chain.  */
>   if (chain_info)
>     {
>-      /* We have a chain at BASE and we're writing to [BASE + <variable>].
>-	 This can interfere with any of the stores so terminate
>-	 the chain.  */
>-      if (var_offset_p)
>-	{
>-	  terminate_and_release_chain (*chain_info);
>-	  ret = true;
>-	}
>-      /* Otherwise go through every store in the chain to see if it
>-	 aliases with any of them.  */
>-      else
>+      store_immediate_info *info;
>+      unsigned int i;
>+      FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
> 	{
>-	  store_immediate_info *info;
>-	  unsigned int i;
>-	  FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
>+	  if (ref_maybe_used_by_stmt_p (stmt, gimple_assign_lhs (info->stmt))
>+	      || stmt_may_clobber_ref_p (stmt, gimple_assign_lhs (info->stmt)))
> 	    {
>-	      if (ref_maybe_used_by_stmt_p (stmt,
>-					    gimple_assign_lhs (info->stmt))
>-		  || stmt_may_clobber_ref_p (stmt,
>-					     gimple_assign_lhs (info->stmt)))
>+	      if (dump_file && (dump_flags & TDF_DETAILS))
> 		{
>-		  if (dump_file && (dump_flags & TDF_DETAILS))
>-		    {
>-		      fprintf (dump_file,
>-			       "stmt causes chain termination:\n");
>-		      print_gimple_stmt (dump_file, stmt, 0);
>-		    }
>-		  terminate_and_release_chain (*chain_info);
>-		  ret = true;
>-		  break;
>+		  fprintf (dump_file, "stmt causes chain termination:\n");
>+		  print_gimple_stmt (dump_file, stmt, 0);
> 		}
>+	      terminate_and_release_chain (*chain_info);
>+	      ret = true;
>+	      break;
> 	    }
> 	}
>     }
>@@ -920,6 +1013,125 @@ pass_store_merging::terminate_and_releas
>   return ret;
> }
> 
>+/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
>+   may clobber REF.  FIRST and LAST must be in the same basic block and
>+   have non-NULL vdef.  */
>+
>+bool
>+stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
>+{
>+  ao_ref r;
>+  ao_ref_init (&r, ref);
>+  unsigned int count = 0;
>+  tree vop = gimple_vdef (last);
>+  gimple *stmt;
>+
>+  gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
>+  do
>+    {
>+      stmt = SSA_NAME_DEF_STMT (vop);
>+      if (stmt_may_clobber_ref_p_1 (stmt, &r))
>+	return true;
>+      /* Avoid quadratic compile time by bounding the number of checks
>+	 we perform.  */
>+      if (++count > MAX_STORE_ALIAS_CHECKS)
>+	return true;
>+      vop = gimple_vuse (stmt);
>+    }
>+  while (stmt != first);
>+  return false;
>+}
>+
>+/* Return true if INFO->ops[IDX] is mergeable with the
>+   corresponding loads already in MERGED_STORE group.
>+   BASE_ADDR is the base address of the whole store group.  */
>+
>+bool
>+compatible_load_p (merged_store_group *merged_store,
>+		   store_immediate_info *info,
>+		   tree base_addr, int idx)
>+{
>+  store_immediate_info *infof = merged_store->stores[0];
>+  if (!info->ops[idx].base_addr
>+      || (info->ops[idx].bitpos - infof->ops[idx].bitpos
>+	  != info->bitpos - infof->bitpos)
>+      || !operand_equal_p (info->ops[idx].base_addr,
>+			   infof->ops[idx].base_addr, 0))
>+    return false;
>+
>+  store_immediate_info *infol = merged_store->stores.last ();
>+  tree load_vuse = gimple_vuse (info->ops[idx].stmt);
>+  /* In this case all vuses should be the same, e.g.
>+     _1 = s.a; _2 = s.b; _3 = _1 | 1; t.a = _3; _4 = _2 | 2; t.b = _4;
>+     or
>+     _1 = s.a; _2 = s.b; t.a = _1; t.b = _2;
>+     and we can emit the coalesced load next to any of those loads.  */
>+  if (gimple_vuse (infof->ops[idx].stmt) == load_vuse
>+      && gimple_vuse (infol->ops[idx].stmt) == load_vuse)
>+    return true;
>+
>+  /* Otherwise, at least for now require that the load has the same
>+     vuse as the store.  See following examples.  */
>+  if (gimple_vuse (info->stmt) != load_vuse)
>+    return false;
>+
>+  if (gimple_vuse (infof->stmt) != gimple_vuse (infof->ops[idx].stmt)
>+      || (infof != infol
>+	  && gimple_vuse (infol->stmt) != gimple_vuse (infol->ops[idx].stmt)))
>+    return false;
>+
>+  /* If the load is from the same location as the store, already
>+     the construction of the immediate chain info guarantees no intervening
>+     stores, so no further checks are needed.  Example:
>+     _1 = s.a; _2 = _1 & -7; s.a = _2; _3 = s.b; _4 = _3 & -7; s.b = _4;  */
>+  if (info->ops[idx].bitpos == info->bitpos
>+      && operand_equal_p (info->ops[idx].base_addr, base_addr, 0))
>+    return true;
>+
>+  /* Otherwise, we need to punt if any of the loads can be clobbered by any
>+     of the stores in the group, or any other stores in between those.
>+     Previous calls to compatible_load_p ensured that for all the
>+     merged_store->stores IDX loads, no stmts starting with
>+     merged_store->first_stmt and ending right before merged_store->last_stmt
>+     clobbers those loads.  */
>+  gimple *first = merged_store->first_stmt;
>+  gimple *last = merged_store->last_stmt;
>+  unsigned int i;
>+  store_immediate_info *infoc;
>+  /* The stores are sorted by increasing store bitpos, so if info->stmt store
>+     comes before the so far first load, we'll be changing
>+     merged_store->first_stmt.  In that case we need to give up if
>+     any of the earlier processed loads clobber with the stmts in the new
>+     range.  */
>+  if (info->order < merged_store->first_order)
>+    {
>+      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
>+	if (stmts_may_clobber_ref_p (info->stmt, first, infoc->ops[idx].val))
>+	  return false;
>+      first = info->stmt;
>+    }
>+  /* Similarly, we could change merged_store->last_stmt, so ensure
>+     in that case no stmts in the new range clobber any of the earlier
>+     processed loads.  */
>+  else if (info->order > merged_store->last_order)
>+    {
>+      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
>+	if (stmts_may_clobber_ref_p (last, info->stmt, infoc->ops[idx].val))
>+	  return false;
>+      last = info->stmt;
>+    }
>+  /* And finally, we'd be adding a new load to the set, ensure it isn't
>+     clobbered in the new range.  */
>+  if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
>+    return false;
>+
>+  /* Otherwise, we are looking for:
>+     _1 = s.a; _2 = _1 ^ 15; t.a = _2; _3 = s.b; _4 = _3 ^ 15; t.b = _4;
>+     or
>+     _1 = s.a; t.a = _1; _2 = s.b; t.b = _2;  */
>+  return true;
>+}
>+
> /* Go through the candidate stores recorded in m_store_info and merge them
>    into merged_store_group objects recorded into m_merged_store_groups
>    representing the widened stores.  Return true if coalescing was successful
>@@ -967,32 +1179,56 @@ imm_store_chain_info::coalesce_immediate
>       if (IN_RANGE (start, merged_store->start,
> 		    merged_store->start + merged_store->width - 1))
> 	{
>-	  merged_store->merge_overlapping (info);
>-	  continue;
>+	  /* Only allow overlapping stores of constants.  */
>+	  if (info->rhs_code == INTEGER_CST
>+	      && merged_store->stores[0]->rhs_code == INTEGER_CST)
>+	    {
>+	      merged_store->merge_overlapping (info);
>+	      continue;
>+	    }
>+	}
>+      /* |---store 1---||---store 2---|
>+	 This store is consecutive to the previous one.
>+	 Merge it into the current store group.  There can be gaps in between
>+	 the stores, but there can't be gaps in between bitregions.  */
>+      else if (info->bitregion_start <= merged_store->bitregion_end
>+	       && info->rhs_code == merged_store->stores[0]->rhs_code)
>+	{
>+	  store_immediate_info *infof = merged_store->stores[0];
>+
>+	  /* All the rhs_code ops that take 2 operands are commutative,
>+	     swap the operands if it could make the operands compatible.  */
>+	  if (infof->ops[0].base_addr
>+	      && infof->ops[1].base_addr
>+	      && info->ops[0].base_addr
>+	      && info->ops[1].base_addr
>+	      && (info->ops[1].bitpos - infof->ops[0].bitpos
>+		  == info->bitpos - infof->bitpos)
>+	      && operand_equal_p (info->ops[1].base_addr,
>+				  infof->ops[0].base_addr, 0))
>+	    std::swap (info->ops[0], info->ops[1]);
>+	  if ((!infof->ops[0].base_addr
>+	       || compatible_load_p (merged_store, info, base_addr, 0))
>+	      && (!infof->ops[1].base_addr
>+		  || compatible_load_p (merged_store, info, base_addr, 1)))
>+	    {
>+	      merged_store->merge_into (info);
>+	      continue;
>+	    }
> 	}
> 
>       /* |---store 1---| <gap> |---store 2---|.
>-	 Gap between stores.  Start a new group if there are any gaps
>-	 between bitregions.  */
>-      if (info->bitregion_start > merged_store->bitregion_end)
>-	{
>-	  /* Try to apply all the stores recorded for the group to determine
>-	     the bitpattern they write and discard it if that fails.
>-	     This will also reject single-store groups.  */
>-	  if (!merged_store->apply_stores ())
>-	    delete merged_store;
>-	  else
>-	    m_merged_store_groups.safe_push (merged_store);
>-
>-	  merged_store = new merged_store_group (info);
>+	 Gap between stores or the rhs not compatible.  Start a new group.  */
> 
>-	  continue;
>-	}
>+      /* Try to apply all the stores recorded for the group to determine
>+	 the bitpattern they write and discard it if that fails.
>+	 This will also reject single-store groups.  */
>+      if (!merged_store->apply_stores ())
>+	delete merged_store;
>+      else
>+	m_merged_store_groups.safe_push (merged_store);
> 
>-      /* |---store 1---||---store 2---|
>-	 This store is consecutive to the previous one.
>-	 Merge it into the current store group.  */
>-       merged_store->merge_into (info);
>+      merged_store = new merged_store_group (info);
>     }
> 
>   /* Record or discard the last store group.  */
>@@ -1014,35 +1250,57 @@ imm_store_chain_info::coalesce_immediate
>   return success;
> }
> 
>-/* Return the type to use for the merged stores described by STMTS.
>-   This is needed to get the alias sets right.  */
>+/* Return the type to use for the merged stores or loads described by STMTS.
>+   This is needed to get the alias sets right.  If IS_LOAD, look for rhs,
>+   otherwise lhs.  Additionally set *CLIQUEP and *BASEP to MR_DEPENDENCE_*
>+   of the MEM_REFs if any.  */
> 
> static tree
>-get_alias_type_for_stmts (auto_vec<gimple *> &stmts)
>+get_alias_type_for_stmts (vec<gimple *> &stmts, bool is_load,
>+			  unsigned short *cliquep, unsigned short *basep)
> {
>   gimple *stmt;
>   unsigned int i;
>-  tree lhs = gimple_assign_lhs (stmts[0]);
>-  tree type = reference_alias_ptr_type (lhs);
>+  tree type = NULL_TREE;
>+  tree ret = NULL_TREE;
>+  *cliquep = 0;
>+  *basep = 0;
> 
>   FOR_EACH_VEC_ELT (stmts, i, stmt)
>     {
>-      if (i == 0)
>-	continue;
>+      tree ref = is_load ? gimple_assign_rhs1 (stmt)
>+			 : gimple_assign_lhs (stmt);
>+      tree type1 = reference_alias_ptr_type (ref);
>+      tree base = get_base_address (ref);
> 
>-      lhs = gimple_assign_lhs (stmt);
>-      tree type1 = reference_alias_ptr_type (lhs);
>+      if (i == 0)
>+	{
>+	  if (TREE_CODE (base) == MEM_REF)
>+	    {
>+	      *cliquep = MR_DEPENDENCE_CLIQUE (base);
>+	      *basep = MR_DEPENDENCE_BASE (base);
>+	    }
>+	  ret = type = type1;
>+	  continue;
>+	}
>       if (!alias_ptr_types_compatible_p (type, type1))
>-	return ptr_type_node;
>+	ret = ptr_type_node;
>+      if (TREE_CODE (base) != MEM_REF
>+	  || *cliquep != MR_DEPENDENCE_CLIQUE (base)
>+	  || *basep != MR_DEPENDENCE_BASE (base))
>+	{
>+	  *cliquep = 0;
>+	  *basep = 0;
>+	}
>     }
>-  return type;
>+  return ret;
> }
> 
> /* Return the location_t information we can find among the statements
>    in STMTS.  */
> 
> static location_t
>-get_location_for_stmts (auto_vec<gimple *> &stmts)
>+get_location_for_stmts (vec<gimple *> &stmts)
> {
>   gimple *stmt;
>   unsigned int i;
>@@ -1062,7 +1320,7 @@ struct split_store
>   unsigned HOST_WIDE_INT bytepos;
>   unsigned HOST_WIDE_INT size;
>   unsigned HOST_WIDE_INT align;
>-  auto_vec<gimple *> orig_stmts;
>+  auto_vec<store_immediate_info *> orig_stores;
>   /* True if there is a single orig stmt covering the whole split store.  */
>   bool orig;
>   split_store (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
>@@ -1076,21 +1334,20 @@ split_store::split_store (unsigned HOST_
> 			  unsigned HOST_WIDE_INT al)
> 			  : bytepos (bp), size (sz), align (al), orig (false)
> {
>-  orig_stmts.create (0);
>+  orig_stores.create (0);
> }
> 
>-/* Record all statements corresponding to stores in GROUP that write to
>-   the region starting at BITPOS and is of size BITSIZE.  Record such
>-   statements in STMTS if non-NULL.  The stores in GROUP must be sorted by
>-   bitposition.  Return INFO if there is exactly one original store
>-   in the range.  */
>+/* Record all stores in GROUP that write to the region starting at BITPOS and
>+   is of size BITSIZE.  Record infos for such statements in STORES if
>+   non-NULL.  The stores in GROUP must be sorted by bitposition.  Return INFO
>+   if there is exactly one original store in the range.  */
> 
> static store_immediate_info *
>-find_constituent_stmts (struct merged_store_group *group,
>-			vec<gimple *> *stmts,
>-			unsigned int *first,
>-			unsigned HOST_WIDE_INT bitpos,
>-			unsigned HOST_WIDE_INT bitsize)
>+find_constituent_stores (struct merged_store_group *group,
>+			 vec<store_immediate_info *> *stores,
>+			 unsigned int *first,
>+			 unsigned HOST_WIDE_INT bitpos,
>+			 unsigned HOST_WIDE_INT bitsize)
> {
>   store_immediate_info *info, *ret = NULL;
>   unsigned int i;
>@@ -1119,9 +1376,9 @@ find_constituent_stmts (struct merged_st
>       if (stmt_start >= end)
> 	return ret;
> 
>-      if (stmts)
>+      if (stores)
> 	{
>-	  stmts->safe_push (info->stmt);
>+	  stores->safe_push (info);
> 	  if (ret)
> 	    {
> 	      ret = NULL;
>@@ -1143,11 +1400,14 @@ find_constituent_stmts (struct merged_st
>    This is to separate the splitting strategy from the statement
>    building/emission/linking done in output_merged_store.
>    Return number of new stores.
>+   If ALLOW_UNALIGNED_STORE is false, then all stores must be aligned.
>+   If ALLOW_UNALIGNED_LOAD is false, then all loads must be aligned.
>    If SPLIT_STORES is NULL, it is just a dry run to count number of
>    new stores.  */
> 
> static unsigned int
>-split_group (merged_store_group *group, bool allow_unaligned,
>+split_group (merged_store_group *group, bool allow_unaligned_store,
>+	     bool allow_unaligned_load,
> 	     vec<struct split_store *> *split_stores)
> {
>   unsigned HOST_WIDE_INT pos = group->bitregion_start;
>@@ -1155,6 +1415,7 @@ split_group (merged_store_group *group,
>   unsigned HOST_WIDE_INT bytepos = pos / BITS_PER_UNIT;
>   unsigned HOST_WIDE_INT group_align = group->align;
>   unsigned HOST_WIDE_INT align_base = group->align_base;
>+  unsigned HOST_WIDE_INT group_load_align = group_align;
> 
>   gcc_assert ((size % BITS_PER_UNIT == 0) && (pos % BITS_PER_UNIT == 0));
> 
>@@ -1162,9 +1423,14 @@ split_group (merged_store_group *group,
>   unsigned HOST_WIDE_INT try_pos = bytepos;
>   group->stores.qsort (sort_by_bitpos);
> 
>+  if (!allow_unaligned_load)
>+    for (int i = 0; i < 2; ++i)
>+      if (group->load_align[i])
>+	group_load_align = MIN (group_load_align, group->load_align[i]);
>+
>   while (size > 0)
>     {
>-      if ((allow_unaligned || group_align <= BITS_PER_UNIT)
>+      if ((allow_unaligned_store || group_align <= BITS_PER_UNIT)
> 	  && group->mask[try_pos - bytepos] == (unsigned char) ~0U)
> 	{
> 	  /* Skip padding bytes.  */
>@@ -1180,10 +1446,34 @@ split_group (merged_store_group *group,
>       unsigned HOST_WIDE_INT align = group_align;
>       if (align_bitpos)
> 	align = least_bit_hwi (align_bitpos);
>-      if (!allow_unaligned)
>+      if (!allow_unaligned_store)
> 	try_size = MIN (try_size, align);
>+      if (!allow_unaligned_load)
>+	{
>+	  /* If we can't do or don't want to do unaligned stores
>+	     as well as loads, we need to take the loads into account
>+	     as well.  */
>+	  unsigned HOST_WIDE_INT load_align = group_load_align;
>+	  align_bitpos = (try_bitpos - align_base) & (load_align - 1);
>+	  if (align_bitpos)
>+	    load_align = least_bit_hwi (align_bitpos);
>+	  for (int i = 0; i < 2; ++i)
>+	    if (group->load_align[i])
>+	      {
>+		align_bitpos = try_bitpos - group->stores[0]->bitpos;
>+		align_bitpos += group->stores[0]->ops[i].bitpos;
>+		align_bitpos -= group->load_align_base[i];
>+		align_bitpos &= (group_load_align - 1);
>+		if (align_bitpos)
>+		  {
>+		    unsigned HOST_WIDE_INT a = least_bit_hwi (align_bitpos);
>+		    load_align = MIN (load_align, a);
>+		  }
>+	      }
>+	  try_size = MIN (try_size, load_align);
>+	}
>       store_immediate_info *info
>-	= find_constituent_stmts (group, NULL, &first, try_bitpos, try_size);
>+	= find_constituent_stores (group, NULL, &first, try_bitpos, try_size);
>       if (info)
> 	{
> 	  /* If there is just one original statement for the range, see if
>@@ -1191,8 +1481,8 @@ split_group (merged_store_group *group,
> 	     than try_size.  */
> 	  unsigned HOST_WIDE_INT stmt_end
> 	    = ROUND_UP (info->bitpos + info->bitsize, BITS_PER_UNIT);
>-	  info = find_constituent_stmts (group, NULL, &first, try_bitpos,
>-					 stmt_end - try_bitpos);
>+	  info = find_constituent_stores (group, NULL, &first, try_bitpos,
>+					  stmt_end - try_bitpos);
> 	  if (info && info->bitpos >= try_bitpos)
> 	    {
> 	      try_size = stmt_end - try_bitpos;
>@@ -1221,7 +1511,7 @@ split_group (merged_store_group *group,
>       nonmasked *= BITS_PER_UNIT;
>       while (nonmasked <= try_size / 2)
> 	try_size /= 2;
>-      if (!allow_unaligned && group_align > BITS_PER_UNIT)
>+      if (!allow_unaligned_store && group_align > BITS_PER_UNIT)
> 	{
> 	  /* Now look for whole padding bytes at the start of that bitsize.  */
> 	  unsigned int try_bytesize = try_size / BITS_PER_UNIT, masked;
>@@ -1252,8 +1542,8 @@ split_group (merged_store_group *group,
> 	{
> 	  struct split_store *store
> 	    = new split_store (try_pos, try_size, align);
>-	  info = find_constituent_stmts (group, &store->orig_stmts,
>-	  				 &first, try_bitpos, try_size);
>+	  info = find_constituent_stores (group, &store->orig_stores,
>+					  &first, try_bitpos, try_size);
> 	  if (info
> 	      && info->bitpos >= try_bitpos
> 	      && info->bitpos + info->bitsize <= try_bitpos + try_size)
>@@ -1288,19 +1578,23 @@ imm_store_chain_info::output_merged_stor
> 
>   auto_vec<struct split_store *, 32> split_stores;
>   split_stores.create (0);
>-  bool allow_unaligned
>+  bool allow_unaligned_store
>     = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED);
>-  if (allow_unaligned)
>+  bool allow_unaligned_load = allow_unaligned_store;
>+  if (allow_unaligned_store)
>     {
>      /* If unaligned stores are allowed, see how many stores we'd emit
> 	 for unaligned and how many stores we'd emit for aligned stores.
>	 Only use unaligned stores if it allows fewer stores than aligned.  */
>-      unsigned aligned_cnt = split_group (group, false, NULL);
>-      unsigned unaligned_cnt = split_group (group, true, NULL);
>+      unsigned aligned_cnt
>+	= split_group (group, false, allow_unaligned_load, NULL);
>+      unsigned unaligned_cnt
>+	= split_group (group, true,


