GCC Bugzilla – Bug 9997
Coelesce stack slots for disjoint scopes.
Last modified: 2004-11-15 00:02:10 UTC
Given struct big { char x[1024]; }; extern void bar(struct big *); void foo(void) { { struct big one; bar(&one); } { struct big one; bar(&one); } } we should allocate only one structure on the stack. Release: all Environment: any
State-Changed-From-To: open->analyzed State-Changed-Why: Confirmed. I think it is a usual observation that gcc often allocates far too much stack space, which often enough isn't even used -- see, for example, PRs 8935 and 8936 (in particular the latter, without optimizations). W.
From: Daniel Jacobowitz <drow@mvista.com> To: rth@redhat.com Cc: gcc-gnats@gcc.gnu.org Subject: Re: middle-end/9997: Coelesce stack slots for disjoint scopes. Date: Sat, 8 Mar 2003 12:54:07 -0500 On Sat, Mar 08, 2003 at 12:56:24AM -0000, rth@redhat.com wrote: > > >Number: 9997 > >Category: middle-end > >Synopsis: Coelesce stack slots for disjoint scopes. > >Confidential: no > >Severity: non-critical > >Priority: medium > >Responsible: unassigned > >State: open > >Class: change-request > >Submitter-Id: net > >Arrival-Date: Sat Mar 08 01:06:00 UTC 2003 > >Closed-Date: > >Last-Modified: > >Originator: Richard Henderson > >Release: all > >Organization: > >Environment: > any > >Description: > Given > > struct big { char x[1024]; }; > extern void bar(struct big *); > > void foo(void) > { > { > struct big one; > bar(&one); > } > { > struct big one; > bar(&one); > } > } > > we should allocate only one structure on the stack. Here's some interesting numbers I got for this testcase. HEAD+stack is the -fstack-reorg patch posted some time ago. Compiler Flags Stack size gcc 3.2.2 -O0 2068 gcc 3.2.2 -O2 2068 gcc HEAD+stack -O0 2072 gcc HEAD+stack -O2 2072 gcc HEAD+stack -O0 -fstack-reorg 2072 gcc HEAD+stack -O2 -fstack-reorg 2072 Here's a patch I've been playing with: gcc HEAD+patch -O0 1048 gcc HEAD+patch -O2 1044 The patch bootstrapped on i686-pc-linux-gnu, and I'm running make check now. I think it's correct; basically, when we close the SCOPE_EXPR, mark all stack slots associated with locals as dead. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer 2003-03-08 Daniel Jacobowitz <drow@mvista.com> * c-semantics.c (genrtl_scope_stmt): Call free_named_temp_slots. * expr.c (expand_expr): Update calls to assign_stack_temp_for_type. * function.c (struct temp_slot): Add DECL. (assign_stack_temp_for_type): Add DECL argument; save it in the temp_slot. (assign_stack_temp): Update call to assign_stack_temp_for_type. (assign_temp): Pass the DECL to assign_stack_temp_for_type. (free_named_stack_slots): New function. * rtl.h (assign_stack_temp_for_type): Update prototype. * tree.h (assign_stack_temp_for_type): Add prototype. Index: c-semantics.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/c-semantics.c,v retrieving revision 1.53 diff -u -p -r1.53 c-semantics.c --- c-semantics.c 22 Feb 2003 00:32:02 -0000 1.53 +++ c-semantics.c 8 Mar 2003 17:48:53 -0000 @@ -622,7 +622,11 @@ genrtl_scope_stmt (t) if (SCOPE_BEGIN_P (t)) expand_start_bindings_and_block (2 * SCOPE_NULLIFIED_P (t), block); else if (SCOPE_END_P (t)) - expand_end_bindings (NULL_TREE, !SCOPE_NULLIFIED_P (t), 0); + { + expand_end_bindings (NULL_TREE, !SCOPE_NULLIFIED_P (t), 0); + if (block) + free_named_temp_slots (block); + } } else if (!SCOPE_NULLIFIED_P (t)) { Index: expr.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/expr.c,v retrieving revision 1.510 diff -u -p -r1.510 expr.c --- expr.c 5 Mar 2003 00:12:40 -0000 1.510 +++ expr.c 8 Mar 2003 17:48:55 -0000 @@ -7863,7 +7863,8 @@ expand_expr (exp, target, tmode, modifie target = assign_stack_temp_for_type (TYPE_MODE (inner_type), - GET_MODE_SIZE (TYPE_MODE (inner_type)), 0, inner_type); + GET_MODE_SIZE (TYPE_MODE (inner_type)), 0, inner_type, + NULL_TREE); emit_move_insn (target, op0); op0 = target; @@ -7887,7 +7888,8 @@ expand_expr (exp, target, tmode, modifie = MAX (int_size_in_bytes (inner_type), (HOST_WIDE_INT) GET_MODE_SIZE (TYPE_MODE (type))); rtx new = assign_stack_temp_for_type (TYPE_MODE (type), - temp_size, 0, type); + temp_size, 0, type, + NULL_TREE); rtx new_with_op0_mode = adjust_address (new, GET_MODE (op0), 0); if (TREE_ADDRESSABLE (exp)) @@ -9189,7 +9191,8 @@ expand_expr (exp, target, tmode, modifie : int_size_in_bytes (inner_type), 1, build_qualified_type (inner_type, (TYPE_QUALS (inner_type) - | TYPE_QUAL_CONST))); + | TYPE_QUAL_CONST)), + NULL_TREE); if (TYPE_ALIGN_OK (inner_type)) abort (); Index: function.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/function.c,v retrieving revision 1.405 diff -u -p -r1.405 function.c --- function.c 4 Mar 2003 06:20:12 -0000 1.405 +++ function.c 8 Mar 2003 17:48:57 -0000 @@ -165,7 +165,8 @@ static GTY(()) varray_type sibcall_epilo Automatic variables are also assigned temporary slots, at the nesting level where they are defined. They are marked a "kept" so that - free_temp_slots will not free them. */ + free_temp_slots will not free them, although a decl is associated + with each so that it can be freed when the decl goes out of scope. */ struct temp_slot GTY(()) { @@ -201,6 +202,9 @@ struct temp_slot GTY(()) /* The size of the slot, including extra space for alignment. This info is for combine_temp_slots. */ HOST_WIDE_INT full_size; + /* The decl associated with this stack slot, for automatic variables. This + info is for free_named_temp_slots. */ + tree decl; }; /* This structure is used to record MEMs or pseudos used to replace VAR, any @@ -649,14 +653,17 @@ assign_stack_local (mode, size, align) if we are to allocate something at an inner level to be treated as a variable in the block (e.g., a SAVE_EXPR). - TYPE is the type that will be used for the stack slot. */ + TYPE is the type that will be used for the stack slot. + + DECL is the decl that will be associated with the stack slot. */ rtx -assign_stack_temp_for_type (mode, size, keep, type) +assign_stack_temp_for_type (mode, size, keep, type, decl) enum machine_mode mode; HOST_WIDE_INT size; int keep; tree type; + tree decl; { unsigned int align; struct temp_slot *p, *best_p = 0; @@ -789,6 +796,7 @@ assign_stack_temp_for_type (mode, size, p->addr_taken = 0; p->rtl_expr = seq_rtl_expr; p->type = type; + p->decl = decl; if (keep == 2) { @@ -838,7 +846,7 @@ assign_stack_temp (mode, size, keep) HOST_WIDE_INT size; int keep; { - return assign_stack_temp_for_type (mode, size, keep, NULL_TREE); + return assign_stack_temp_for_type (mode, size, keep, NULL_TREE, NULL_TREE); } /* Assign a temporary. @@ -904,7 +912,7 @@ assign_temp (type_or_decl, keep, memory_ size = 1; } - tmp = assign_stack_temp_for_type (mode, size, keep, type); + tmp = assign_stack_temp_for_type (mode, size, keep, type, decl); return tmp; } @@ -1177,6 +1185,23 @@ preserve_temp_slots (x) for (p = temp_slots; p; p = p->next) if (p->in_use && p->level == temp_slot_level && ! p->keep) p->level--; +} + +/* For every VAR_DECL in BLOCK, mark the associated temporary slot as dead + so that it can be reused. */ + +void +free_named_temp_slots (block) + tree block; +{ + struct temp_slot *p; + tree decl; + + for (decl = BLOCK_VARS (block); decl; decl = TREE_CHAIN (decl)) + if (TREE_CODE (decl) == VAR_DECL) + for (p = temp_slots; p; p = p->next) + if (p->decl == decl) + p->in_use = 0; } /* X is the result of an RTL_EXPR. If it is a temporary slot associated Index: rtl.h =================================================================== RCS file: /cvs/gcc/gcc/gcc/rtl.h,v retrieving revision 1.386 diff -u -p -r1.386 rtl.h --- rtl.h 28 Feb 2003 07:06:33 -0000 1.386 +++ rtl.h 8 Mar 2003 17:48:57 -0000 @@ -1459,7 +1459,8 @@ extern rtx assign_stack_local PARAMS (( extern rtx assign_stack_temp PARAMS ((enum machine_mode, HOST_WIDE_INT, int)); extern rtx assign_stack_temp_for_type PARAMS ((enum machine_mode, - HOST_WIDE_INT, int, tree)); + HOST_WIDE_INT, int, tree, + tree)); extern rtx assign_temp PARAMS ((tree, int, int, int)); /* In emit-rtl.c */ extern rtx emit_insn_before PARAMS ((rtx, rtx)); Index: tree.h =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree.h,v retrieving revision 1.385 diff -u -p -r1.385 tree.h --- tree.h 2 Mar 2003 21:18:15 -0000 1.385 +++ tree.h 8 Mar 2003 17:49:00 -0000 @@ -2999,6 +2999,7 @@ extern void pop_temp_slots PARAMS ((voi extern void push_temp_slots PARAMS ((void)); extern void preserve_temp_slots PARAMS ((rtx)); extern void preserve_rtl_expr_temps PARAMS ((tree)); +extern void free_named_temp_slots PARAMS ((tree)); extern int aggregate_value_p PARAMS ((tree)); extern void free_temps_for_rtl_expr PARAMS ((tree)); extern void instantiate_virtual_regs PARAMS ((tree, rtx));
From: Daniel Jacobowitz <drow@mvista.com> To: rth@redhat.com, gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org Cc: Subject: Re: middle-end/9997: Coelesce stack slots for disjoint scopes. Date: Sat, 8 Mar 2003 15:22:00 -0500 On Sat, Mar 08, 2003 at 12:54:06PM -0500, Daniel Jacobowitz wrote: > On Sat, Mar 08, 2003 at 12:56:24AM -0000, rth@redhat.com wrote: > > > > >Number: 9997 > > >Category: middle-end > > >Synopsis: Coelesce stack slots for disjoint scopes. > > >Confidential: no > > >Severity: non-critical > > >Priority: medium > > >Responsible: unassigned > > >State: open > > >Class: change-request > > >Submitter-Id: net > > >Arrival-Date: Sat Mar 08 01:06:00 UTC 2003 > > >Closed-Date: > > >Last-Modified: > > >Originator: Richard Henderson > > >Release: all > > >Organization: > > >Environment: > > any > > >Description: > > Given > > > > struct big { char x[1024]; }; > > extern void bar(struct big *); > > > > void foo(void) > > { > > { > > struct big one; > > bar(&one); > > } > > { > > struct big one; > > bar(&one); > > } > > } > > > > we should allocate only one structure on the stack. > > Here's some interesting numbers I got for this testcase. HEAD+stack is > the -fstack-reorg patch posted some time ago. > > Compiler Flags Stack size > > gcc 3.2.2 -O0 2068 > gcc 3.2.2 -O2 2068 > gcc HEAD+stack -O0 2072 > gcc HEAD+stack -O2 2072 > gcc HEAD+stack -O0 -fstack-reorg 2072 > gcc HEAD+stack -O2 -fstack-reorg 2072 > > Here's a patch I've been playing with: > gcc HEAD+patch -O0 1048 > gcc HEAD+patch -O2 1044 > > The patch bootstrapped on i686-pc-linux-gnu, and I'm running make check > now. I think it's correct; basically, when we close the SCOPE_EXPR, > mark all stack slots associated with locals as dead. It's not quite right, since it re-introduces the failure at gcc.c-torture/execute/20020320-1.c. It probably just needs to be tweaked for statement expressions though. > > -- > Daniel Jacobowitz > MontaVista Software Debian GNU/Linux Developer > > 2003-03-08 Daniel Jacobowitz <drow@mvista.com> > > * c-semantics.c (genrtl_scope_stmt): Call free_named_temp_slots. > * expr.c (expand_expr): Update calls to assign_stack_temp_for_type. > * function.c (struct temp_slot): Add DECL. > (assign_stack_temp_for_type): Add DECL argument; save it in the > temp_slot. > (assign_stack_temp): Update call to assign_stack_temp_for_type. > (assign_temp): Pass the DECL to assign_stack_temp_for_type. > (free_named_stack_slots): New function. > * rtl.h (assign_stack_temp_for_type): Update prototype. > * tree.h (assign_stack_temp_for_type): Add prototype. > > Index: c-semantics.c > =================================================================== > RCS file: /cvs/gcc/gcc/gcc/c-semantics.c,v > retrieving revision 1.53 > diff -u -p -r1.53 c-semantics.c > --- c-semantics.c 22 Feb 2003 00:32:02 -0000 1.53 > +++ c-semantics.c 8 Mar 2003 17:48:53 -0000 > @@ -622,7 +622,11 @@ genrtl_scope_stmt (t) > if (SCOPE_BEGIN_P (t)) > expand_start_bindings_and_block (2 * SCOPE_NULLIFIED_P (t), block); > else if (SCOPE_END_P (t)) > - expand_end_bindings (NULL_TREE, !SCOPE_NULLIFIED_P (t), 0); > + { > + expand_end_bindings (NULL_TREE, !SCOPE_NULLIFIED_P (t), 0); > + if (block) > + free_named_temp_slots (block); > + } > } > else if (!SCOPE_NULLIFIED_P (t)) > { > Index: expr.c > =================================================================== > RCS file: /cvs/gcc/gcc/gcc/expr.c,v > retrieving revision 1.510 > diff -u -p -r1.510 expr.c > --- expr.c 5 Mar 2003 00:12:40 -0000 1.510 > +++ expr.c 8 Mar 2003 17:48:55 -0000 > @@ -7863,7 +7863,8 @@ expand_expr (exp, target, tmode, modifie > target > = assign_stack_temp_for_type > (TYPE_MODE (inner_type), > - GET_MODE_SIZE (TYPE_MODE (inner_type)), 0, inner_type); > + GET_MODE_SIZE (TYPE_MODE (inner_type)), 0, inner_type, > + NULL_TREE); > > emit_move_insn (target, op0); > op0 = target; > @@ -7887,7 +7888,8 @@ expand_expr (exp, target, tmode, modifie > = MAX (int_size_in_bytes (inner_type), > (HOST_WIDE_INT) GET_MODE_SIZE (TYPE_MODE (type))); > rtx new = assign_stack_temp_for_type (TYPE_MODE (type), > - temp_size, 0, type); > + temp_size, 0, type, > + NULL_TREE); > rtx new_with_op0_mode = adjust_address (new, GET_MODE (op0), 0); > > if (TREE_ADDRESSABLE (exp)) > @@ -9189,7 +9191,8 @@ expand_expr (exp, target, tmode, modifie > : int_size_in_bytes (inner_type), > 1, build_qualified_type (inner_type, > (TYPE_QUALS (inner_type) > - | TYPE_QUAL_CONST))); > + | TYPE_QUAL_CONST)), > + NULL_TREE); > > if (TYPE_ALIGN_OK (inner_type)) > abort (); > Index: function.c > =================================================================== > RCS file: /cvs/gcc/gcc/gcc/function.c,v > retrieving revision 1.405 > diff -u -p -r1.405 function.c > --- function.c 4 Mar 2003 06:20:12 -0000 1.405 > +++ function.c 8 Mar 2003 17:48:57 -0000 > @@ -165,7 +165,8 @@ static GTY(()) varray_type sibcall_epilo > > Automatic variables are also assigned temporary slots, at the nesting > level where they are defined. They are marked a "kept" so that > - free_temp_slots will not free them. */ > + free_temp_slots will not free them, although a decl is associated > + with each so that it can be freed when the decl goes out of scope. */ > > struct temp_slot GTY(()) > { > @@ -201,6 +202,9 @@ struct temp_slot GTY(()) > /* The size of the slot, including extra space for alignment. This > info is for combine_temp_slots. */ > HOST_WIDE_INT full_size; > + /* The decl associated with this stack slot, for automatic variables. This > + info is for free_named_temp_slots. */ > + tree decl; > }; > > /* This structure is used to record MEMs or pseudos used to replace VAR, any > @@ -649,14 +653,17 @@ assign_stack_local (mode, size, align) > if we are to allocate something at an inner level to be treated as > a variable in the block (e.g., a SAVE_EXPR). > > - TYPE is the type that will be used for the stack slot. */ > + TYPE is the type that will be used for the stack slot. > + > + DECL is the decl that will be associated with the stack slot. */ > > rtx > -assign_stack_temp_for_type (mode, size, keep, type) > +assign_stack_temp_for_type (mode, size, keep, type, decl) > enum machine_mode mode; > HOST_WIDE_INT size; > int keep; > tree type; > + tree decl; > { > unsigned int align; > struct temp_slot *p, *best_p = 0; > @@ -789,6 +796,7 @@ assign_stack_temp_for_type (mode, size, > p->addr_taken = 0; > p->rtl_expr = seq_rtl_expr; > p->type = type; > + p->decl = decl; > > if (keep == 2) > { > @@ -838,7 +846,7 @@ assign_stack_temp (mode, size, keep) > HOST_WIDE_INT size; > int keep; > { > - return assign_stack_temp_for_type (mode, size, keep, NULL_TREE); > + return assign_stack_temp_for_type (mode, size, keep, NULL_TREE, NULL_TREE); > } > > /* Assign a temporary. > @@ -904,7 +912,7 @@ assign_temp (type_or_decl, keep, memory_ > size = 1; > } > > - tmp = assign_stack_temp_for_type (mode, size, keep, type); > + tmp = assign_stack_temp_for_type (mode, size, keep, type, decl); > return tmp; > } > > @@ -1177,6 +1185,23 @@ preserve_temp_slots (x) > for (p = temp_slots; p; p = p->next) > if (p->in_use && p->level == temp_slot_level && ! p->keep) > p->level--; > +} > + > +/* For every VAR_DECL in BLOCK, mark the associated temporary slot as dead > + so that it can be reused. */ > + > +void > +free_named_temp_slots (block) > + tree block; > +{ > + struct temp_slot *p; > + tree decl; > + > + for (decl = BLOCK_VARS (block); decl; decl = TREE_CHAIN (decl)) > + if (TREE_CODE (decl) == VAR_DECL) > + for (p = temp_slots; p; p = p->next) > + if (p->decl == decl) > + p->in_use = 0; > } > > /* X is the result of an RTL_EXPR. If it is a temporary slot associated > Index: rtl.h > =================================================================== > RCS file: /cvs/gcc/gcc/gcc/rtl.h,v > retrieving revision 1.386 > diff -u -p -r1.386 rtl.h > --- rtl.h 28 Feb 2003 07:06:33 -0000 1.386 > +++ rtl.h 8 Mar 2003 17:48:57 -0000 > @@ -1459,7 +1459,8 @@ extern rtx assign_stack_local PARAMS (( > extern rtx assign_stack_temp PARAMS ((enum machine_mode, > HOST_WIDE_INT, int)); > extern rtx assign_stack_temp_for_type PARAMS ((enum machine_mode, > - HOST_WIDE_INT, int, tree)); > + HOST_WIDE_INT, int, tree, > + tree)); > extern rtx assign_temp PARAMS ((tree, int, int, int)); > /* In emit-rtl.c */ > extern rtx emit_insn_before PARAMS ((rtx, rtx)); > Index: tree.h > =================================================================== > RCS file: /cvs/gcc/gcc/gcc/tree.h,v > retrieving revision 1.385 > diff -u -p -r1.385 tree.h > --- tree.h 2 Mar 2003 21:18:15 -0000 1.385 > +++ tree.h 8 Mar 2003 17:49:00 -0000 > @@ -2999,6 +2999,7 @@ extern void pop_temp_slots PARAMS ((voi > extern void push_temp_slots PARAMS ((void)); > extern void preserve_temp_slots PARAMS ((rtx)); > extern void preserve_rtl_expr_temps PARAMS ((tree)); > +extern void free_named_temp_slots PARAMS ((tree)); > extern int aggregate_value_p PARAMS ((tree)); > extern void free_temps_for_rtl_expr PARAMS ((tree)); > extern void instantiate_virtual_regs PARAMS ((tree, rtx)); -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer
From: Jim Wilson <wilson@tuliptree.org> To: gcc-gnats@gcc.gnu.org Cc: Subject: Re: middle-end/9997: Coelesce stack slots for disjoint scopes. Date: Sun, 09 Mar 2003 10:52:33 -0500 I believe we have tried this in the past, and ran into aliasing problems. Suppose you have two variables of different types defined in different scopes. Suppose the scheduler reorders instructions causing the scopes to overlap. Now you do an alias test, and the alias code says that two references can't possibly alias because they have different types, but they do, because they were allocated to the same stack slot. You can't rely on a test for the same address, because we may have lost address info in the RTL. We have different and better alias code now, so perhaps there is a way to address this. Maybe if we add scope nesting level info to the alias info, and then assume that locals alias if they are in different scopes with a common parent, and neither is a parent of the other. Jim
From: Jim Wilson <wilson@tuliptree.org> To: Daniel Jacobowitz <drow@mvista.com> Cc: gcc-gnats@gcc.gnu.org Subject: Re: middle-end/9997: Coelesce stack slots for disjoint scopes. Date: Sun, 09 Mar 2003 11:32:47 -0500 This patch seems more complicated than necessary. expand_end_bindings already calls pop_temp_slots. So all you have to do is arrange for locals to have the right temp_slot_level. I tried looking at this. We are calling preserve_stack_temps, and that preserves all temp slots whose address was taken. This means a local of array type won't be freed when its scope terminates. The question then is whether this is really necessary to preserve values whose address was taken. This code is used to preserve values that might be needed by an expression statement result. In this case, the result is const0_rtx. This code has a test for !MEM. This is apparently to handle the case where the result is in a REG that might contain an address. However, !MEM also includes constants, and constants can't possibly contain the address of a stack local. I think this problem would go away if we fixed this code to do nothing for constant results. The problematic code is here: /* If X is not in memory or is at a constant address, it cannot be in a temporary slot, but it can contain something whose address was taken. */ if (p == 0 && (GET_CODE (x) != MEM || CONSTANT_P (XEXP (x, 0)))) We should do nothing here if x is a constant, but there is no code for that. Jim
From: Daniel Jacobowitz <drow@mvista.com> To: Jim Wilson <wilson@tuliptree.org> Cc: gcc-gnats@gcc.gnu.org Subject: Re: middle-end/9997: Coelesce stack slots for disjoint scopes. Date: Sun, 9 Mar 2003 15:45:47 -0500 On Sun, Mar 09, 2003 at 11:32:47AM -0500, Jim Wilson wrote: > This patch seems more complicated than necessary. expand_end_bindings > already calls pop_temp_slots. So all you have to do is arrange for > locals to have the right temp_slot_level. > > I tried looking at this. We are calling preserve_stack_temps, and that > preserves all temp slots whose address was taken. This means a local of > array type won't be freed when its scope terminates. The question then > is whether this is really necessary to preserve values whose address was > taken. > > This code is used to preserve values that might be needed by an > expression statement result. In this case, the result is const0_rtx. > This code has a test for !MEM. This is apparently to handle the case > where the result is in a REG that might contain an address. However, > !MEM also includes constants, and constants can't possibly contain the > address of a stack local. I think this problem would go away if we > fixed this code to do nothing for constant results. The problematic > code is here: > /* If X is not in memory or is at a constant address, it cannot be in > a temporary slot, but it can contain something whose address was > taken. */ > if (p == 0 && (GET_CODE (x) != MEM || CONSTANT_P (XEXP (x, 0)))) > We should do nothing here if x is a constant, but there is no code for that. I think I was looking at free_temp_slots instead of pop_temp_slots, so I just assumed p->keep was causing the problem; I didn't realize that pop_temp_slots overrode keep. Thanks. Your suggestion does fix Richard's test, so I bootstrapped and tested it on i686-pc-linux-gnu. The patch is below; however, it causes one new failure: FAIL: g++.dg/opt/alias2.C execution test The interesting segment of the diff is: - movl %eax, -56(%ebp) - movl -24(%ebp), %eax - movl %eax, -44(%ebp) + movl %eax, -24(%ebp) + movl %eax, -12(%ebp) call _ZN11_Deque_baseD2Ev I'm not quite sure what's going on. 0x123 ends up in eax instead of, I think, the "this" pointer. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer 2003-03-09 Daniel Jacobowitz <drow@mvista.com> * function.c (preserve_temp_slots): If X is constant, don't preserve temp slots. Suggested by Jim Wilson. Index: function.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/function.c,v retrieving revision 1.405 diff -u -p -r1.405 function.c --- function.c 4 Mar 2003 06:20:12 -0000 1.405 +++ function.c 9 Mar 2003 16:52:29 -0000 @@ -1132,6 +1132,10 @@ preserve_temp_slots (x) return; } + /* If X is a constant, then we don't need to preserve stack slots. */ + if (CONSTANT_P (x)) + return; + /* If X is a register that is being used as a pointer, see if we have a temporary slot we know it points to. To be consistent with the code below, we really should preserve all non-kept slots
From: Jim Wilson <wilson@tuliptree.org> To: Daniel Jacobowitz <drow@mvista.com> Cc: gcc-gnats@gcc.gnu.org Subject: Re: middle-end/9997: Coelesce stack slots for disjoint scopes. Date: 09 Mar 2003 22:12:03 -0500 On Sun, 2003-03-09 at 15:45, Daniel Jacobowitz wrote: > FAIL: g++.dg/opt/alias2.C execution test I wasn't able to reproduce this with a quick stage1 build. This might take a little time to track down. I am leaving on a trip tomorrow, so I may not be able to look at this until late next week. Jim
From: Jim Wilson <wilson@tuliptree.org> To: Daniel Jacobowitz <drow@mvista.com> Cc: gcc-gnats@gcc.gnu.org Subject: Re: middle-end/9997: Coelesce stack slots for disjoint scopes. Date: 29 Apr 2003 23:12:45 -0400 --=-CssSwvW0ZmvCLU9WCSS5 Content-Type: text/plain Content-Transfer-Encoding: 7bit On Sun, 2003-03-09 at 15:45, Daniel Jacobowitz wrote: > On Sun, Mar 09, 2003 at 11:32:47AM -0500, Jim Wilson wrote: > The patch is below; however, it causes one new failure: > FAIL: g++.dg/opt/alias2.C execution test This is failing because we have two objects on the stack, which are allocated to overlapping slots, and which have overlapping lifetimes. When we try to copy fields from one to the other, we accidentally clobber a field in the first one before we read it. This is obvious using the 2003-04-14 snapshot. I get: movl -36(%ebp), %eax movl %eax, -24(%ebp) /* movl -24(%ebp), %eax deleted by optimizer as redundant */ movl %eax, -12(%ebp) Note that the objects are 12 bytes apart, and we are copying two fields which are 12 bytes apart. Thus the first copy clobbers the source of the second copy before we read it. The same problem exists in current sources, but is less obvious, because the two objects are not 12 bytes apart, and thus they still overlap, but it works by accident. movl -36(%ebp), %eax movl %eax, -40(%ebp) movl -24(%ebp), %eax movl %eax, -28(%ebp) The failure occurs because we have a TARGET_EXPR with a cleanup. In genrtl_stmt_expr_value, we call expand_start_target_temps which calls push_temp_slots. The current temp_slot_level is stored in target_temp_slot_level. Then we call expand_stmt_expr_value which evaluates the stmt expr. Along the way, we evaluate a TARGET_EXPR which creates a stack level at target_temp_slot_level, and gives it a cleanup. When we get to the end of expand_stmt_expr_value, we call free_temp_slots. Since there were no push_temp_slots calls along the way, target_temp_slot_level == temp_slot_level, and this frees the slot allocated to the TARGET_EXPR. Then when we call the cleanup (destructor), we are reading from a freed stack slot, and we have a problem. It appears that we need another set of push_temp_slot/pop_temp_slot calls here, or else we need to mark the TARGET_EXPR stack slot as kept until the end of the binding level. I have mostly convinced myself that marking it as kept is the right solution. It appears that this was broken when the C++ front end was converted to generate an entire function as a tree. This was such a large change that I don't think it makes any sense to try to track down exactly why or how this got broken. This gives me the following patch. I haven't tried testing it yet, other than against the alias2.C testcase. Since this has one part that reduces stack space usage and one part that increases stack space usage, we need to check to see what kind of effect this has on C++ performance. Hopefully it doesn't make things worse. It seems OK for the alias2.C testcase. Main uses the same stack space before and after the patch. Jim --=-CssSwvW0ZmvCLU9WCSS5 Content-Disposition: attachment; filename=tmp.file.2 Content-Transfer-Encoding: quoted-printable Content-Type: text/x-patch; name=tmp.file.2; charset=UTF-8 Index: function.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /cvs/gcc/gcc/gcc/function.c,v retrieving revision 1.421 diff -p -r1.421 function.c *** function.c 22 Apr 2003 12:09:09 -0000 1.421 --- function.c 30 Apr 2003 05:36:43 -0000 *************** assign_stack_temp_for_type (mode, size,=20 *** 796,802 **** if (keep =3D=3D 2) { p->level =3D target_temp_slot_level; ! p->keep =3D 0; } else if (keep =3D=3D 3) { --- 796,802 ---- if (keep =3D=3D 2) { p->level =3D target_temp_slot_level; ! p->keep =3D 1; } else if (keep =3D=3D 3) { *************** preserve_temp_slots (x) *** 1134,1139 **** --- 1134,1143 ---- =20 return; } +=20 + /* If X is a constant, then we don't need to preserve stack slots.=C2= =A0 */ + if (CONSTANT_P (x)) + return; =20 /* If X is a register that is being used as a pointer, see if we have a temporary slot we know it points to. To be consistent with --=-CssSwvW0ZmvCLU9WCSS5--
From: Jim Wilson <wilson@tuliptree.org> To: Daniel Jacobowitz <drow@mvista.com> Cc: gcc-gnats@gcc.gnu.org Subject: Re: middle-end/9997: Coelesce stack slots for disjoint scopes. Date: 29 Apr 2003 23:19:28 -0400 While looking at this, I noticed that the var_temp_slot_level stuff seems to be broken. In expand_expr, case SAVE_EXPR, we call assign_temp with keep == 3. This ends up using var_temp_slot_level. However, there is no place in the compiler that ever sets var_temp_slot_level. It is initialized to 0, so all SAVE_TEMP temps end up at level 0, which means they live until the end of the function. This can't be right. I suspect this was also broken by the functions-as-trees stuff. I haven't looked into this yet. We can perhaps open a new PR for this problem. Jim
This looks fixed on the tree-ssa branch.
For some reason this is no longer fixed on the tree-ssa but it could be and should be able to be fixed.
*** Bug 15400 has been marked as a duplicate of this bug. ***
Would someone please assign this? This bug can severely impact systems with limited memory.
Rather than looking just at the scopes of variables, this should properly be fixed by looking at the live ranges of variables. W.
The only issue with the patch from Daniel Jacobowitz was a problem with TARGET_EXPR, and I wasn't sure how to fix that. Meanwhile, Ulrich Weigand checked in a patch May 1 to fix an aliasing problem with TARGET_EXPRs that was apparently related to the problem I identified last year, as his patch is a superset of mine. Therefore, it may be the case that Daniel's patch is OK now. It needs retesting. The SAVE_EXPR/var_temp_slot_level stuff is still broken. Someone needs to look into that.
Subject: Re: Coelesce stack slots for disjoint scopes. On Wed, 2004-05-12 at 10:44, leblanc at skycomputers dot com wrote: > Would someone please assign this? This bug can severely impact systems with limited memory. See the patch in comment #6. It should work, but this stuff is tricky, as there is no way to prove that this patch is safe. We may just have to try it and see what happens.
Subject: Re: Coelesce stack slots for disjoint scopes. On Wed, May 12, 2004 at 10:55:00PM -0000, wilson at gcc dot gnu dot org wrote: > > ------- Additional Comments From wilson at gcc dot gnu dot org 2004-05-12 22:54 ------- > The only issue with the patch from Daniel Jacobowitz was a problem with > TARGET_EXPR, and I wasn't sure how to fix that. Meanwhile, Ulrich Weigand > checked in a patch May 1 to fix an aliasing problem with TARGET_EXPRs that was > apparently related to the problem I identified last year, as his patch is a > superset of mine. Therefore, it may be the case that Daniel's patch is OK now. > It needs retesting. > > The SAVE_EXPR/var_temp_slot_level stuff is still broken. Someone needs to look > into that. I'll try to take a look at these after the merge.
> we should allocate only one structure on the stack. > Here's some interesting numbers I got for this testcase. HEAD+stack is > the -fstack-reorg patch posted some time ago. A small clarification. -fstack-reorg doesn't affect stack height significantly. It had a slightly different objective (i.e. reorganizing frame to reduce address arithemtic for processors with limited offsets). Stack height reduction, if any, occurs due only due alignment holes elimination. (not in this case anyway) I realize that work can be used to reduce stack height as well. Myself and Rakesh Kumar plan to work on this problem shortly. If the proposed patch (for this PR) works, it is fine, otherwise we can look into this. However, it seems it will not be a bug-fixing exercise, because this is actually a design deficiency. Cannot give a commitment on timeline though.
A related but slightly different problem was exposed by a testcase from Ian Taylor. The message with the testcase is here: http://gcc.gnu.org/ml/gcc/2004-06/msg01827.html My analysis is here: http://gcc.gnu.org/ml/gcc/2004-06/msg01913.html Briefly, expand_builtin returns a value even when the ignore parameter is true. This confuses other code.
Working on it. Finally.
Subject: Bug 9997 CVSROOT: /cvs/gcc Module name: gcc Changes by: rth@gcc.gnu.org 2004-09-03 23:50:11 Modified files: gcc : ChangeLog Makefile.in cfgexpand.c Log message: PR middle-end/9997 * cfgexpand.c (LOCAL_ALIGNMENT): Provide default. (STACK_ALIGNMENT_NEEDED, FRAME_GROWS_DOWNWARD): Likewise. (struct stack_var, EOC, stack_vars, stack_vars_alloc, stack_vars_num, stack_vars_sorted, stack_vars_conflict, stack_vars_conflict_alloc, frame_phase, get_decl_align_unit, add_stack_var, triangular_index, resize_stack_vars_conflict, add_stack_var_conflict, stack_var_conflict_p, add_alias_set_conflicts, stack_var_size_cmp, union_stack_vars, partition_stack_vars, dump_stack_var_partition, expand_one_stack_var_at, expand_stack_vars, expand_one_stack_var, expand_one_static_var, expand_one_hard_reg_var, expand_one_register_var, expand_one_error_var, defer_stack_allocation, expand_one_var, expand_used_vars_for_block, clear_tree_used): New. (expand_used_vars): Rewrite. * Makefile.in (cfgexpand.o): Update dependencies. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.5243&r2=2.5244 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/Makefile.in.diff?cvsroot=gcc&r1=1.1367&r2=1.1368 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cfgexpand.c.diff?cvsroot=gcc&r1=2.18&r2=2.19
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg00440.html