This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] Fix part of pr25505
- From: Josh Conner <jconner at apple dot com>
- To: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 31 Aug 2006 05:53:07 -0700
- Subject: [PATCH] Fix part of pr25505
PR25505 indicates excessive stack usage in certain C++ code. One
problem is that we are missing NRV optimization opportunities. For
example, in the test case included with this patch, the mere presence of
a call-clobbered variable is enough to prevent NRV optimization on a
non-call-clobbered variable.
The function execute_return_slot_opt checks the call-clobbered
attribute across FOR_EACH_SSA_DEF_OPERAND. However, this means that any
variable potentially clobbered by the function call will inhibit NRV
optimization. The attached patch implements a more fine-grained
approach when the result is a VAR_DECL - checking only that the
result itself isn't clobbered.
Unfortunately, this isn't quite enough to actually improve code
generation for pr25505. This is because the front-end introduces a
temporary for function calls, and tree-gimple doesn't recognize some
instances in which it could optimize away this temporary.
For example, calling a function returning a structure like this:
struct S { int x; void *y; };
The C++ front-end will generate:
result = *(struct S &) (struct S *) &TARGET_EXPR <D.1940, fn1 ()>
And tree-gimple will produce:
D.1960 = fn1 ();
result = D.1960;
Contrast that with a call to a function returning this structure:
struct S { int x[1000]; };
Where the front-end produces the same thing, but tree-gimple gives us:
result = fn1 ();
This is because tree-gimple bases its decision to retain the temporary
on whether it thinks the RHS of the TARGET_EXPR will be in memory, and
it makes this decision based solely on the type of the RHS. The second
part of the attached patch makes tree-gimple recognize that the result
of a function returning an aggregate is in memory.
Unfortunately, these changes together only fix ~25% of the remaining
stack size issue described in pr25505 (taking us from ~10K stack usage
for the function described in the PR down to ~7.5K), but it is progress.
Tested on i686-pc-linux-gnu (all default languages) with no regressions.
OK for mainline?
- Josh
2006-08-31  Josh Conner  <jconner@apple.com>

	PR c++/25505
	* tree-nrv.c (execute_return_slot_opt): If LHS is a simple decl,
	only look at whether it is clobbered.
	* tree-gimple.c (is_gimple_mem_rhs): Recognize functions
	returning aggregates.

2006-08-31  Josh Conner  <jconner@apple.com>

	PR c++/25505
	* gcc.dg/nrv3.c: New test.
Index: gcc/tree-nrv.c
===================================================================
--- gcc/tree-nrv.c (revision 116353)
+++ gcc/tree-nrv.c (working copy)
@@ -264,18 +264,33 @@ execute_return_slot_opt (void)
     {
       def_operand_p def_p;
       ssa_op_iter op_iter;
+      tree lhs = TREE_OPERAND (stmt, 0);
 
       /* We determine whether or not the LHS address escapes by
         asking whether it is call clobbered.  When the LHS isn't a
         simple decl, we need to check the VDEFs, so it's simplest
         to just loop through all the DEFs.  */
-      FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+      if (TREE_CODE (lhs) == VAR_DECL)
         {
-          tree def = DEF_FROM_PTR (def_p);
-          if (TREE_CODE (def) == SSA_NAME)
-            def = SSA_NAME_VAR (def);
-          if (is_call_clobbered (def))
+          subvar_t subvar;
+          if (is_call_clobbered (lhs))
             goto unsafe;
+          for (subvar = get_subvars_for_var (lhs);
+               subvar;
+               subvar = subvar->next)
+            if (is_call_clobbered (subvar->var))
+              goto unsafe;
+        }
+      else
+        {
+          FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+            {
+              tree def = DEF_FROM_PTR (def_p);
+              if (TREE_CODE (def) == SSA_NAME)
+                def = SSA_NAME_VAR (def);
+              if (is_call_clobbered (def))
+                goto unsafe;
+            }
         }
 
       /* No defs are call clobbered, so the optimization is safe.  */
Index: gcc/tree-gimple.c
===================================================================
--- gcc/tree-gimple.c (revision 116353)
+++ gcc/tree-gimple.c (working copy)
@@ -115,7 +115,9 @@ is_gimple_mem_rhs (tree t)
      to be stored in memory, since it's cheap and prevents erroneous
      tailcalls (PR 17526).  */
   if (is_gimple_reg_type (TREE_TYPE (t))
-      || TYPE_MODE (TREE_TYPE (t)) != BLKmode)
+      || (TYPE_MODE (TREE_TYPE (t)) != BLKmode
+          && (TREE_CODE (t) != CALL_EXPR
+              || ! aggregate_value_p (t, t))))
     return is_gimple_val (t);
   else
     return is_gimple_formal_tmp_rhs (t);
Index: gcc/testsuite/gcc.dg/nrv3.c
===================================================================
--- gcc/testsuite/gcc.dg/nrv3.c (revision 0)
+++ gcc/testsuite/gcc.dg/nrv3.c (revision 0)
@@ -0,0 +1,23 @@
+/* Verify that gimple-level NRV is occurring when values other than the
+   return slot are call-clobbered.  */
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+typedef struct { int x; void *y; } S;
+S fn1 (void);
+void fn2 (S, int);
+int *ptr;
+void foo (void)
+{
+  S result;
+  int i;
+
+  ptr = &i;
+
+  /* i is call-clobbered here, but result isn't.  */
+  result = fn1();
+  fn2 (result, i);
+}
+
+/* { dg-final { scan-tree-dump-times "return slot optimization" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */