This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] Fix PR46556 (poor address generation)


This patch addresses the poor code generation in PR46556 for the
following code:

struct x
{
  int a[16];
  int b[16];
  int c[16];
};

extern void foo (int, int, int);

void
f (struct x *p, unsigned int n)
{
  foo (p->a[n], p->c[n], p->b[n]);
}

Prior to the fix for PR32698, gcc calculated the offset for accessing
the array elements as:  n*4; 64+n*4; 128+n*4.

Following that fix, the offsets are calculated as:  n*4; (n+16)*4; (n
+32)*4.  This led to poor code generation on powerpc64 targets, among
others.

The poor code generation was observed not to occur in loops, since the
IVOPTS code does a good job of lowering these expressions to MEM_REFs.
It was previously suggested that perhaps a general pass to lower memory
accesses to MEM_REFs in GIMPLE would solve not only this, but other
similar problems.  I spent some time looking into various approaches to
this, and reviewing some previous attempts to do similar things.  In the
end, I've concluded that this is a bad idea in practice because of the
loss of useful aliasing information.  In particular, early lowering of
component references causes us to lose the ability to disambiguate
non-overlapping references in the same structure, and there is no simple
way to carry the necessary aliasing information along with the
replacement MEM_REFs to avoid this.  While some performance gains are
available with GIMPLE lowering of memory accesses, there are also
offsetting performance losses, and I suspect this would just be a
continuous source of bug reports into the future.
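A small example (mine, not from the PR) of the aliasing information at stake: while the accesses below remain component references, GCC knows the two fields occupy disjoint storage, so the final load can be folded to a constant; once both are lowered to raw MEM_REFs with byte offsets off the same base pointer, that structural disjointness must be carried along explicitly or the store to the second field is conservatively assumed to clobber the first.

```c
struct s { int a; int b; };

int
f (struct s *p)
{
  p->a = 1;
  p->b = 2;      /* distinct field: cannot clobber p->a */
  return p->a;   /* foldable to 1 while field disjointness is known */
}
```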

Therefore the current patch is a much simpler approach to solve the
specific problem noted in the PR.  There are two pieces to the patch:

 * The offending addressing pattern is matched in GIMPLE and transformed
into a restructured MEM_REF that distributes the multiply, so that (n
+32)*4 becomes 4*n+128 as before.  This is done during the reassociation
pass, for reasons described below.  The transformation only occurs in
non-loop blocks, since IVOPTS does a good job on such things within
loops.
 * A tweak is added to the RTL forward-propagator to avoid propagating
into memory references based on a single base register with no offset,
under certain circumstances.  This improves sharing of base registers
for accesses within the same structure and slightly lowers register
pressure.
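As a sketch of the code shape the two changes aim for (a hand-lowered equivalent, not actual compiler output): one base register holds p + n*4, and each field access becomes a constant displacement off it.

```c
struct x { int a[16]; int b[16]; int c[16]; };

/* Hypothetical hand-lowered form of the original f(): the variable
   part of the address is computed once, and the three loads differ
   only in their constant offsets (0, 128, 64).  */
int
sum3 (struct x *p, unsigned int n)
{
  char *base = (char *) p + (unsigned long) n * 4;  /* shared base */
  int a = *(int *) base;          /* p->a[n], offset 0   */
  int c = *(int *) (base + 128);  /* p->c[n], offset 128 */
  int b = *(int *) (base + 64);   /* p->b[n], offset 64  */
  return a + c + b;
}
```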

It would be possible to separate these into two patches if that's
preferred.  I chose to combine them because together they provide the
ideal code generation that the new test cases test for.

I initially implemented the pattern matcher during expand, but I found
that the expanded code for two accesses to the same structure was often
not being CSEd well.  So I moved it back into the GIMPLE phases prior to
DOM to take advantage of its CSE.  To avoid an additional complete pass
over the IL, I chose to piggyback on the reassociation pass.  This
transformation is not technically a reassociation, but it is related
enough not to be a complete wart.

One noob question about this:  It would probably be preferable to have
this transformation only take place during the second reassociation
pass, so the ARRAY_REFs are seen by earlier optimization phases.  Is
there an easy way to detect that it's the second pass without having to
generate a separate pass entry point?

One other general question about the pattern-match transformation:  Is
this an appropriate transformation for all targets, or should it be
somehow gated on available addressing modes on the target processor?

Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
performance degradations on that target for SPEC CPU2000 and CPU2006.

I'm looking for eventual approval for trunk after any comments are
resolved.  Thanks!

Bill


2011-10-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

gcc:
	
	PR rtl-optimization/46556
	* fwprop.c (fwprop_bb_aux_d): New struct.
	(MEM_PLUS_REGS): New macro.
	(record_mem_plus_reg): New function.
	(record_mem_plus_regs): Likewise.
	(single_def_use_enter_block): Record mem-plus-reg patterns.
	(build_single_def_use_links): Allocate aux storage.
	(locally_poor_mem_replacement): New function.
	(forward_propagate_and_simplify): Call
	locally_poor_mem_replacement.
	(fwprop_init): Free storage.
	* tree.h (copy_ref_info): Expose existing function.
	* tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
	* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
	(restructure_mem_ref): Likewise.
	(reassociate_bb): Look for opportunities to call
	restructure_mem_ref; clean up immediate use lists.

gcc/testsuite:
	
	PR rtl-optimization/46556
	* gcc.target/powerpc/ppc-pr46556-1.c: New testcase.
	* gcc.target/powerpc/ppc-pr46556-2.c: Likewise.
	* gcc.target/powerpc/ppc-pr46556-3.c: Likewise.
	* gcc.target/powerpc/ppc-pr46556-4.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-1.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-2.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-3.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-4.c: Likewise.


Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	(revision 179317)
+++ gcc/fwprop.c	(working copy)
@@ -131,6 +131,15 @@ static VEC(df_ref,heap) *reg_defs_stack;
 static bitmap local_md;
 static bitmap local_lr;
 
+/* Auxiliary information for each block for this pass.  */
+typedef struct GTY(()) fwprop_bb_aux_d
+{
+  /* Registers appearing in (mem (plus (reg ... patterns in this block.  */
+  bitmap mem_plus_regs;
+} fwprop_bb_aux, *fwprop_bb_aux_t;
+
+#define MEM_PLUS_REGS(BB) ((fwprop_bb_aux_t) ((BB)->aux))->mem_plus_regs
+
 /* Return the only def in USE's use-def chain, or NULL if there is
    more than one def in the chain.  */
 
@@ -212,7 +221,43 @@ process_uses (df_ref *use_rec, int top_flag)
 }
 
 
+/* Record whether EXPR in block BB is of the form (mem (plus (reg ...  */
 static void
+record_mem_plus_reg (basic_block bb, rtx expr)
+{
+  rtx addr, reg;
+
+  if (GET_CODE (expr) != MEM)
+    return;
+
+  addr = XEXP (expr, 0);
+  if (GET_CODE (addr) != PLUS)
+    return;
+
+  reg = XEXP (addr, 0);
+  if (!REG_P (reg))
+    return;
+
+  (void)bitmap_set_bit (MEM_PLUS_REGS (bb), REGNO (reg));
+}
+
+
+/* Record whether INSN in block BB contains any patterns of the form
+   (mem (plus (reg ...  */
+static void
+record_mem_plus_regs (basic_block bb, rtx insn)
+{
+  rtx insn_set = single_set (insn);
+
+  if (insn_set)
+    {
+      record_mem_plus_reg (bb, SET_DEST (insn_set));
+      record_mem_plus_reg (bb, SET_SRC (insn_set));
+    }
+}
+
+
+static void
 single_def_use_enter_block (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
 			    basic_block bb)
 {
@@ -230,6 +275,8 @@ single_def_use_enter_block (struct dom_walk_data *
   process_uses (df_get_artificial_uses (bb_index), DF_REF_AT_TOP);
   process_defs (df_get_artificial_defs (bb_index), DF_REF_AT_TOP);
 
+  MEM_PLUS_REGS (bb) = BITMAP_ALLOC (NULL);
+
   /* We don't call df_simulate_initialize_forwards, as it may overestimate
      the live registers if there are unused artificial defs.  We prefer
      liveness to be underestimated.  */
@@ -242,8 +289,15 @@ single_def_use_enter_block (struct dom_walk_data *
         process_uses (DF_INSN_UID_EQ_USES (uid), 0);
         process_defs (DF_INSN_UID_DEFS (uid), 0);
 	df_simulate_one_insn_forwards (bb, insn, local_lr);
+	record_mem_plus_regs (bb, insn);
       }
 
+  if (dump_file)
+    {
+      fprintf (dump_file, "mem_plus_regs (%d): ", bb->index);
+      dump_bitmap (dump_file, MEM_PLUS_REGS (bb));
+    }
+
   process_uses (df_get_artificial_uses (bb_index), 0);
   process_defs (df_get_artificial_defs (bb_index), 0);
 }
@@ -295,6 +349,8 @@ build_single_def_use_links (void)
   local_md = BITMAP_ALLOC (NULL);
   local_lr = BITMAP_ALLOC (NULL);
 
+  alloc_aux_for_blocks (sizeof (fwprop_bb_aux));
+
   /* Walk the dominator tree looking for single reaching definitions
      dominating the uses.  This is similar to how SSA form is built.  */
   walk_data.dom_direction = CDI_DOMINATORS;
@@ -1207,6 +1263,69 @@ forward_propagate_asm (df_ref use, rtx def_insn, r
   return true;
 }
 
+/* We are proposing to replace a USE of REG in USE_SET.  Determine
+   whether we have a situation where two storage uses of REG occur
+   in the same block as follows.  Each use can be either a memory
+   store or a memory load.
+
+     ... (mem (reg REG))
+
+     ... (mem (plus (reg REG)
+                    (...)))
+
+   The problem is that it will look profitable to propagate something
+   like
+
+     (set (reg REG)
+          (plus (reg X)
+                (reg Y)))
+
+   into the first use, but not into the second one.  This breaks a 
+   CSE opportunity and raises register pressure by extending the
+   lifetimes of X and Y.  To avoid this, don't propagate into the
+   first use when this very specific situation arises.  */
+static bool
+locally_poor_mem_replacement (df_ref use, rtx use_set, rtx reg, rtx def_set)
+{
+  rtx rhs, addend, mem, base;
+  unsigned i;
+
+  /* We're only concerned if the RHS of def_set is the sum of two
+     registers.  */
+  rhs = SET_SRC (def_set);
+
+  if (GET_CODE (rhs) != PLUS)
+    return false;
+
+  for (i = 0; i < 2; i++)
+    {
+      addend = XEXP (rhs, i);
+      if (!REG_P (addend))
+	return false;
+    }
+
+  /* The use must have a single SET and be used in addressing.  */
+  if (!use_set || !DF_REF_REG_MEM_P (use))
+    return false;
+
+  if (DF_REF_REG_MEM_STORE_P (use))
+    mem = SET_DEST (use_set);
+  else
+    mem = SET_SRC (use_set);
+
+  if (GET_CODE (mem) != MEM)
+    return false;
+
+  /* We are specifically interested in the base address.  */
+  base = XEXP (mem, 0);
+  if (reg != base)
+    return false;
+
+  /* We've found a use of the first form.  Check whether uses of the
+     second form exist in the same block.  If found, don't replace.  */
+  return bitmap_bit_p (MEM_PLUS_REGS (DF_REF_BB (use)), DF_REF_REGNO (use));
+}
+
 /* Try to replace USE with SRC (defined in DEF_INSN) and simplify the
    result.  */
 
@@ -1273,6 +1392,11 @@ forward_propagate_and_simplify (df_ref use, rtx de
       return false;
     }
 
+  /* Check for a specific unprofitable propagation into a memory
+     reference, where the propagation will break a CSE opportunity.  */
+  if (locally_poor_mem_replacement (use, use_set, reg, def_set))
+    return false;
+
   if (asm_use >= 0)
     return forward_propagate_asm (use, def_insn, def_set, reg);
 
@@ -1403,6 +1527,12 @@ fwprop_init (void)
 static void
 fwprop_done (void)
 {
+  basic_block bb;
+
+  FOR_EACH_BB (bb)
+    BITMAP_FREE (MEM_PLUS_REGS (bb));
+  free_aux_for_blocks ();
+
   loop_optimizer_finalize ();
 
   VEC_free (df_ref, heap, use_def_ref);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 179317)
+++ gcc/tree.h	(working copy)
@@ -5775,6 +5775,9 @@ tree target_for_debug_bind (tree);
 extern tree tree_mem_ref_addr (tree, tree);
 extern void copy_mem_ref_info (tree, tree);
 
+/* In tree-ssa-loop-ivopts.c.  */
+extern void copy_ref_info (tree, tree);
+
 /* In tree-vrp.c */
 extern bool ssa_name_nonnegative_p (const_tree);
 
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-assembler-times "lw.*3," 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*4,128\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*5,64\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-assembler-times "lw.*4,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*3,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-assembler-times "lw.*3,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*4,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-fwprop1" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem.\*:.\* \\(reg.\*p_1\\(D\\)->a" 1 "fwprop1" } } */
+/* { dg-final { cleanup-rtl-dump "fwprop1" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 179317)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -6280,7 +6280,7 @@ rewrite_use_nonlinear_expr (struct ivopts_data *da
 
 /* Copies the reference information from OLD_REF to NEW_REF.  */
 
-static void
+void
 copy_ref_info (tree new_ref, tree old_ref)
 {
   tree new_ptr_base = NULL_TREE;
Index: gcc/tree-ssa-reassoc.c
===================================================================
--- gcc/tree-ssa-reassoc.c	(revision 179317)
+++ gcc/tree-ssa-reassoc.c	(working copy)
@@ -2361,6 +2361,116 @@ break_up_subtract_bb (basic_block bb)
     break_up_subtract_bb (son);
 }
 
+/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
+   determine whether there is a profitable opportunity to restructure
+   address arithmetic within BASE and OFFSET.  If so, return a
+   restructured MEM_REF; otherwise return NULL_TREE.  */
+
+static tree
+restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,
+			     tree base, tree offset, HOST_WIDE_INT bitpos)
+{
+  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
+  HOST_WIDE_INT c, c1, c2, c3, c4;
+
+  /* Look for the following pattern:
+
+       base   = MEM_REF (T1, C1);
+       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+       bitpos = C4 * BITS_PER_UNIT
+
+     If found, convert it to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + (C2 * C3) + C4)
+   */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST_LOW (mult_op1);
+  c4 = bitpos / BITS_PER_UNIT;
+  c = c1 + c2 * c3 + c4;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+  
+  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2,
+			   build_int_cst (sizetype, c3));
+  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
+					true, GSI_SAME_STMT);
+  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
+				       true, GSI_SAME_STMT);
+  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
+			 build_int_cst (offset_type, c));
+  return mem_ref;
+}
+
+/* If *EXPR contains a memory reference, determine whether there is an
+   opportunity to restructure the base and offset expressions for
+   improved performance.  */
+
+static bool
+restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
+{
+  enum tree_code code = TREE_CODE (*expr);
+  tree base, offset, mem_ref;
+  HOST_WIDE_INT bitsize, bitpos;
+  enum machine_mode mode;
+  int unsignedp, volatilep;
+
+  /* Only expressions that reference memory are of interest.  Bitfield
+     references should eventually be lowered prior to this point and
+     are not dealt with.  */
+  if (! handled_component_p (*expr)
+      || code == BIT_FIELD_REF
+      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1))))
+    return false;
+
+  /* Determine whether the expression can be represented with base and
+     offset components.  */
+  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
+			      &unsignedp, &volatilep, false);
+  if (!base || !offset)
+    return false;
+
+  /* Look for a restructuring opportunity.  */
+  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
+					      offset, bitpos)) == NULL_TREE)
+    return false;
+
+  /* Found one.  Record the replacement in the dump file and complete
+     the replacement.  */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nOriginal_expr = ");
+      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\nmem_ref = ");
+      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\n\n");
+    }
+
+  copy_ref_info (mem_ref, *expr);
+  *expr = mem_ref;
+
+  return true;
+}
+
 /* Reassociate expressions in basic block BB and its post-dominator as
    children.  */
 
@@ -2369,14 +2479,30 @@ reassociate_bb (basic_block bb)
 {
   gimple_stmt_iterator gsi;
   basic_block son;
+  bool chgd_mem_ref = false;
+  bool in_loop = loop_depth (bb->loop_father) != 0;
 
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       gimple stmt = gsi_stmt (gsi);
 
-      if (is_gimple_assign (stmt)
-	  && !stmt_could_throw_p (stmt))
+      /* Look for restructuring opportunities within an expression
+	 that references memory.  We only do this for blocks not
+         contained in loops, since the ivopts machinery does a 
+         good job on loop expressions, and we don't want to interfere
+	 with other loop optimizations.  */
+      if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt))
 	{
+	  tree *lhs, *rhs;
+	  lhs = gimple_assign_lhs_ptr (stmt);
+	  chgd_mem_ref = restructure_mem_ref (lhs, &gsi) || chgd_mem_ref;
+	  rhs = gimple_assign_rhs1_ptr (stmt);
+	  chgd_mem_ref = restructure_mem_ref (rhs, &gsi) || chgd_mem_ref;
+	}
+
+      else if (is_gimple_assign (stmt)
+	       && !stmt_could_throw_p (stmt))
+	{
 	  tree lhs, rhs1, rhs2;
 	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
 
@@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb)
 	    }
 	}
     }
+  /* If memory references have been restructured, immediate uses need
+     to be cleaned up.  */
+  if (chgd_mem_ref)
+    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+      update_stmt (gsi_stmt (gsi));
+
   for (son = first_dom_son (CDI_POST_DOMINATORS, bb);
        son;
        son = next_dom_son (CDI_POST_DOMINATORS, son))


