Bug 39604 - [4.3/4.4/4.5 Regression] tree-ssa-sink breaks stack layout
[4.3/4.4/4.5 Regression] tree-ssa-sink breaks stack layout
Status: RESOLVED DUPLICATE of bug 39509
Product: gcc
Classification: Unclassified
Component: tree-optimization
4.5.0
: P2 major
: 4.3.5
Assigned To: Not yet assigned to anyone
: wrong-code
: 41354 (view as bug list)
Depends on:
Blocks: 39509
  Show dependency treegraph
 
Reported: 2009-03-31 22:36 UTC by Sandra Loosemore
Modified: 2011-06-01 17:34 UTC (History)
11 users (show)

See Also:
Host:
Target: arm-none-eabi
Build:
Known to work: 3.4.0
Known to fail: 4.4.0
Last reconfirmed: 2009-04-01 08:42:30


Attachments
C++ test case sink-1.C (854 bytes, text/plain)
2009-03-31 22:37 UTC, Sandra Loosemore
Details
C test-case for the same problem on x86_64 and i386. (680 bytes, text/plain)
2009-05-10 00:56 UTC, Doug Kwan
Details
type-correct version (708 bytes, text/plain)
2009-05-10 02:09 UTC, Michael Matz
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sandra Loosemore 2009-03-31 22:36:05 UTC
As reported in this thread:

http://gcc.gnu.org/ml/gcc-patches/2009-03/msg01798.html

This problem was reported by an ARM user and reproduced on arm-none-eabi, but is not target-specific.

If the attached test program is compiled with -O1, it fails by incorrectly calling a pure virtual method.  What is happening is that tree-ssa-sink is moving code from the inlined destructor for STUFF, in the first nested block, into the second nested block, where it ends up after the code for the inlined constructor for STUFF2.  Then, cfgexpand comes along and decides that STUFF and STUFF2 can share stack space because they are in disjoint lexical blocks.  Thus the sunk destructor statement for STUFF ends up trashing the vtable of STUFF2.  The test program appears to work correctly at -O2 only because -fstrict-aliasing prevents cfgexpand from assigning STUFF and STUFF2 to the same stack offset.

Per further discussion in the thread above, cfgexpand's stack layout should not be using lexical block scoping information to determine when stack variables may share storage, as GIMPLE lowering removes lexical scopes and promotes all locals to function scope, and subsequent middle-end optimizations do not preserve the lexical block structure.  Since stack variable sharing is an important optimization for some applications, some other form of lifetime analysis is needed.

Apparently PR middle-end/32327 was another incarnation of this same problem, but was closed without really addressing it.
Comment 1 Sandra Loosemore 2009-03-31 22:37:50 UTC
Created attachment 17573 [details]
C++ test case sink-1.C
Comment 2 Michael Matz 2009-04-01 11:45:21 UTC
The old stack slot sharing problem.  stack slot sharing really wants to
look at scopes to determine if stack variables can share the same space or
not (stack variables, not registers, those are all top-level).  But sometimes
tree transformations do fiddle also with stack variables (loads/stores) and
hence can move references to variables outside of their block tree.

The block tree either needs to be fixed up, or the stack slot sharing needs
to look at other information than block scopes to determine lifeness bounds.

Both of these things are probably best done in cfgexpand, and as I'm working
on expanding from SSA form I might try to do something here (though the
variables in question are exactly those which are _not_ in SSA form).
Comment 3 Jakub Jelinek 2009-04-01 14:00:32 UTC
Even for live range analysis of the vars that must go into stack the block info
needs to be used, otherwise once address of a stack var escapes, you'd have to assume it is live almost till the end of the function (at least live on any function calls that might see it through global state, or global ptr dereferences etc.).  Using the block info, you can find out that:
  {
    char buf[10];
    foo (buf);
    bar ();
  }
  baz ();
in baz () call the buf var doesn't need to be live anymore.  At least short term I think it would be best just to walk the function to be expanded, look at all gimple_block BLOCKs referenced and note which originally non-overlapping BLOCKs are indeed still non-overlapping after the tree passes and don't consider originally non-overlapping BLOCKs during stack slot sharing decisions if they are overlapping.
Comment 4 Michael Matz 2009-04-01 14:05:01 UTC
Yes, that's my thought too.  Fixing the BLOCK_VARS when references to
variables show up in a block where they weren't before.
Comment 5 rguenther@suse.de 2009-04-01 14:10:39 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] tree-ssa-sink
 breaks stack layout

On Wed, 1 Apr 2009, jakub at gcc dot gnu dot org wrote:

> ------- Comment #3 from jakub at gcc dot gnu dot org  2009-04-01 14:00 -------
> Even for live range analysis of the vars that must go into stack the block info
> needs to be used, otherwise once address of a stack var escapes, you'd have to
> assume it is live almost till the end of the function (at least live on any
> function calls that might see it through global state, or global ptr
> dereferences etc.).  Using the block info, you can find out that:
>   {
>     char buf[10];
>     foo (buf);
>     bar ();
>   }
>   baz ();
> in baz () call the buf var doesn't need to be live anymore.  At least short
> term I think it would be best just to walk the function to be expanded, look at
> all gimple_block BLOCKs referenced and note which originally non-overlapping
> BLOCKs are indeed still non-overlapping after the tree passes and don't
> consider originally non-overlapping BLOCKs during stack slot sharing decisions
> if they are overlapping.

Note that once we use more precise alias information during RTL
optimizations (via alias exports patch) avoiding stack slot sharing
solely based on type-based conflicts will no longer be working.

Richard.
Comment 6 Jakub Jelinek 2009-04-01 14:15:57 UTC
One would hope that with -fno-strict-aliasing we can still share the stack slots even with alias export patch, otherwise the kernel people are going to be extremely violent on us.
Comment 7 rguenther@suse.de 2009-04-01 14:29:03 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] tree-ssa-sink
 breaks stack layout

On Wed, 1 Apr 2009, jakub at gcc dot gnu dot org wrote:

> ------- Comment #6 from jakub at gcc dot gnu dot org  2009-04-01 14:15 -------
> One would hope that with -fno-strict-aliasing we can still share the stack
> slots even with alias export patch, otherwise the kernel people are going to be
> extremely violent on us.

I think it would be easier to fix scheduling to properly detect
the must-conflicts by noting down the stack slot partition used
in MEMs (and obviously if we end up having MEMs w/o an assigned
parition make that conflict with any other - this can then be
refined with PTA information from the alias export).

The same is true for gcse-sm - any other RTL pass that may
be a problem?

Richard.
Comment 8 Michael Matz 2009-04-01 14:47:27 UTC
One other approach is to create new aliasing conflicts once we union
two stack partitions.  That would inhibit any invalid (after sharing)
transformations by RTL transformations downstream (when they look at only such
conflicts, i.e. TBAA).  If they also look at other disambiguation
possibilities then we indeed need to make the stack partition part of the IL 
and use it in the alias queries.
Comment 9 Sandra Loosemore 2009-04-03 12:54:06 UTC
After the merge of the alias_improvements branch to trunk, the test case no longer compiles incorrectly at -O1.  Is this coincidence, or a real fix that addresses the underlying problem?
Comment 10 rguenther@suse.de 2009-04-03 13:23:37 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] tree-ssa-sink
 breaks stack layout

On Fri, 3 Apr 2009, sandra at codesourcery dot com wrote:

> ------- Comment #9 from sandra at codesourcery dot com  2009-04-03 12:54 -------
> After the merge of the alias_improvements branch to trunk, the test case no
> longer compiles incorrectly at -O1.  Is this coincidence, or a real fix that
> addresses the underlying problem?

I believe this is a coincidence.  Can you investigate what the difference 
is?

Richard.
Comment 11 Doug Kwan 2009-05-09 01:21:44 UTC
We saw this also in gcc-4.3.1 on target arm-none-eabi.
Comment 12 Doug Kwan 2009-05-10 00:56:06 UTC
Created attachment 17840 [details]
C test-case for the same problem on x86_64 and i386.

The original C++ test-case does not crash on x86_64 and i386.  I compared the generated code and found that ARM EABI requires constructors to return values.  That is not common and certainly is not the case on x86_64 and i386.  So gcc generates very different code on both targert very early on.  The C test-case was created based on the lowered C++ code in the middle-end.  It crashes with just -O2 on x86_64 with both 4.3.1 and 4.4.0; it crashes on i386 similarly with 4.3.1.  I did not test 4.4.0 on i386.

Unfortunately this does not crash on ARM.
Comment 13 Michael Matz 2009-05-10 02:09:55 UTC
Created attachment 17843 [details]
type-correct version

This is a type-corrected version of the same C testcase.  Two structs are
different types even if they contain the same members, so the casts and
accesses as 'struct a' where the object really was 'struct b' are not valid.
That can be solved by modeling the same as in C++, via proper subobjects of
base-type.

This testcase still exhibits the abort() with -O2 with 4.3.2 on i386 and
x86_64.  It doesn't fail with current trunk, but most probably for the same
coincidence since a-i merge.  I haven't checked 4.4.x.
Comment 14 Richard Biener 2009-05-10 10:00:01 UTC
Another possible solution that was mentioned is to delay stack slot allocation
until after scheduling (or possibly later code motion optimizations).  Before
that stack slots could be represented by their own pseudos (pointers to the
base of the stack slots) and conflict analysis could be done on RTL based on
life ranges of the memory pointed to by them.  A complication is how to recover
scope based lifetime rules if the stack slots escape to function calls
(but I suggested to do these merging opportunities early on the tree level,
during gimple lowering).
Comment 15 Steven Bosscher 2009-05-10 13:51:36 UTC
The late stack slot allocation idea will just cause other problems, like missing CSE of addresses.  GCC should just get the conflicts right somehow...
Comment 16 rguenther@suse.de 2009-05-10 14:14:44 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] tree-ssa-sink
 breaks stack layout

On Sun, 10 May 2009, steven at gcc dot gnu dot org wrote:

> ------- Comment #15 from steven at gcc dot gnu dot org  2009-05-10 13:51 -------
> The late stack slot allocation idea will just cause other problems, like
> missing CSE of addresses.  GCC should just get the conflicts right somehow...

"somehow..." ;)

I don't see how without making the coalescing decisions part of the IL
on RTL (and of course see if coalescing is even possible before doing it).

What about dropping all MEM_EXPRs for accesses to coalesced variables?
With the new memory-model that should be enough, without it you'd also
need to make the use alias set zero.

Richard.
Comment 17 Steven Bosscher 2009-05-10 14:22:47 UTC
How will dropping the MEM_EXPRs solve the wrong coalescing in cfgexpand?
Comment 18 rguenther@suse.de 2009-05-10 14:32:22 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] tree-ssa-sink
 breaks stack layout

On Sun, 10 May 2009, steven at gcc dot gnu dot org wrote:

> ------- Comment #17 from steven at gcc dot gnu dot org  2009-05-10 14:22 -------
> How will dropping the MEM_EXPRs solve the wrong coalescing in cfgexpand?

Not.  That was just to avoid making coalescing decisions part of the IL
for RTL.

Richard.
Comment 19 Richard Biener 2009-08-04 12:30:01 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 20 Ramana Radhakrishnan 2009-09-18 07:19:34 UTC
*** Bug 41354 has been marked as a duplicate of this bug. ***
Comment 21 Richard Biener 2009-12-16 20:53:08 UTC
In a discussion we decided that introducing a __builtin_undefined () that
we can assign to variables at the point they die would be the proper way
to fix this.
Comment 22 Andrew Pinski 2010-03-08 22:50:21 UTC
This is the same as bug 39509. 

*** This bug has been marked as a duplicate of 39509 ***
Comment 23 Sandra Loosemore 2011-06-01 17:34:56 UTC
Draft patch that addresses this bug here:

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02029.html