[PATCH] Fix part of pr25505


PR25505 indicates excessive stack usage in certain C++ code.  One
problem is that we are missing NRV optimization opportunities.  For
example, in the test case included with this patch, the mere presence of
a call-clobbered variable is enough to prevent NRV optimization on a
non-call-clobbered variable.
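
In outline, the problematic pattern (condensed from the new
gcc.dg/nrv3.c test included below, so the names match) looks like:

  typedef struct { int x; void *y; } S;
  S fn1 (void);
  void fn2 (S, int);
  int *ptr;

  void foo (void)
  {
    S result;
    int i;

    ptr = &i;          /* i's address escapes, so i is call-clobbered.  */
    result = fn1 ();   /* result is never clobbered; NRV should apply.  */
    fn2 (result, i);
  }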

The function execute_return_slot_opt checks the call-clobbered
attribute of every definition visited by FOR_EACH_SSA_DEF_OPERAND.
However, this means that any variable potentially clobbered by the
function call will inhibit NRV optimization.  The attached patch
implements a more fine-grained approach when the result is a VAR_DECL,
checking only that the result itself isn't clobbered.

Unfortunately, this isn't quite enough to actually improve code
generation for pr25505.  This is because the front-end introduces a
temporary for function calls, and tree-gimple doesn't recognize some
cases where it could optimize this temporary away.

For example, consider a call to a function returning a structure like
this:

  struct S { int x; void *y; };

The C++ front-end will generate:

 result = *(struct S &) (struct S *) &TARGET_EXPR <D.1940, fn1 ()>

And tree-gimple will produce:

  D.1960 = fn1 ();
  result = D.1960;

Contrast that with a call to a function returning this structure:

  struct S { int x[1000]; };

Here the front-end produces the same thing, but tree-gimple gives us:

  result = fn1 ();

This is because tree-gimple bases its decision to retain the temporary
on whether it thinks the RHS of the TARGET_EXPR will be in memory, and
it makes this decision based only on the type of the RHS.  The second
part of the attached patch makes tree-gimple recognize that the result
of a call to a function returning an aggregate is in memory.
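
With this change, the expectation is that the small-struct case above
will gimplify just like the large-struct one, avoiding the extra
temporary:

  result = fn1 ();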

Unfortunately, these changes together only fix ~25% of the remaining
stack size issue described in pr25505 (taking us from ~10K stack usage
for the function described in the PR down to ~7.5K), but it is progress.

Tested on i686-pc-linux-gnu (all default languages) with no regressions.

OK for mainline?

- Josh

2006-08-31  Josh Conner  <jconner@apple.com>

	PR c++/25505
	* tree-nrv.c (execute_return_slot_opt): If LHS is a simple decl,
	only look at whether it is clobbered.
	* tree-gimple.c (is_gimple_mem_rhs): Recognize functions
	returning aggregates.

2006-08-31  Josh Conner  <jconner@apple.com>

	PR c++/25505
	* gcc.dg/nrv3.c: New test.


Index: gcc/tree-nrv.c
===================================================================
--- gcc/tree-nrv.c	(revision 116353)
+++ gcc/tree-nrv.c	(working copy)
@@ -264,18 +264,33 @@ execute_return_slot_opt (void)
 	    {
 	      def_operand_p def_p;
 	      ssa_op_iter op_iter;
+	      tree lhs = TREE_OPERAND (stmt, 0);
 
 	      /* We determine whether or not the LHS address escapes by
 		 asking whether it is call clobbered.  When the LHS isn't a
 		 simple decl, we need to check the VDEFs, so it's simplest
 		 to just loop through all the DEFs.  */
-	      FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+	      if (TREE_CODE (lhs) == VAR_DECL)
 		{
-		  tree def = DEF_FROM_PTR (def_p);
-		  if (TREE_CODE (def) == SSA_NAME)
-		    def = SSA_NAME_VAR (def);
-		  if (is_call_clobbered (def))
+		  subvar_t subvar;
+		  if (is_call_clobbered (lhs))
 		    goto unsafe;
+		  for (subvar = get_subvars_for_var (lhs);
+		       subvar;
+		       subvar = subvar->next)
+		    if (is_call_clobbered (subvar->var))
+		      goto unsafe;
+		}
+	      else
+		{
+	          FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+		    {
+		      tree def = DEF_FROM_PTR (def_p);
+		      if (TREE_CODE (def) == SSA_NAME)
+		        def = SSA_NAME_VAR (def);
+		      if (is_call_clobbered (def))
+		        goto unsafe;
+		    }
 		}
 
 	      /* No defs are call clobbered, so the optimization is safe.  */
Index: gcc/tree-gimple.c
===================================================================
--- gcc/tree-gimple.c	(revision 116353)
+++ gcc/tree-gimple.c	(working copy)
@@ -115,7 +115,9 @@ is_gimple_mem_rhs (tree t)
      to be stored in memory, since it's cheap and prevents erroneous
      tailcalls (PR 17526).  */
   if (is_gimple_reg_type (TREE_TYPE (t))
-      || TYPE_MODE (TREE_TYPE (t)) != BLKmode)
+      || (TYPE_MODE (TREE_TYPE (t)) != BLKmode
+	  && (TREE_CODE (t) != CALL_EXPR
+              || ! aggregate_value_p (t, t))))
     return is_gimple_val (t);
   else
     return is_gimple_formal_tmp_rhs (t);
Index: gcc/testsuite/gcc.dg/nrv3.c
===================================================================
--- gcc/testsuite/gcc.dg/nrv3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/nrv3.c	(revision 0)
@@ -0,0 +1,23 @@
+/* Verify that gimple-level NRV is occurring when values other than the
+   return slot are call-clobbered.  */
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+typedef struct { int x; void *y; } S;
+S fn1 (void);
+void fn2 (S, int);
+int *ptr;
+void foo (void)
+{
+  S result;
+  int i;
+
+  ptr = &i;
+
+  /* i is call-clobbered here, but result isn't.  */
+  result = fn1();
+  fn2 (result, i);
+}
+
+/* { dg-final { scan-tree-dump-times "return slot optimization" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
