Bug 14455 - Structs that cannot alias are not SRA'd
Summary: Structs that cannot alias are not SRA'd
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: tree-ssa
Importance: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 23983
Blocks:
Reported: 2004-03-06 04:31 UTC by Timothy J. Wood
Modified: 2019-03-04 13:03 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-12-24 20:33:18


Attachments
Test input (1.57 KB, text/plain)
2004-03-06 04:32 UTC, Timothy J. Wood
Updated test case for the (partial) workaround (1.71 KB, text/plain)
2004-04-07 03:16 UTC, Timothy J. Wood

Description Timothy J. Wood 2004-03-06 04:31:02 UTC
Often in inner loops (no child functions called, big loop count), there will be several (many!) Altivec 
registers left unused.

The attached file demonstrates the problem, when built with tree-ssa with:

%PREFIX/bin/g++ -Winline -mdynamic-no-pic -fno-exceptions -O3 -maltivec -fstrict-aliasing -finline-functions -finline-limit=1000000000 -falign-loops=16 --param large-function-growth=1000000 --param inline-unit-growth=1000 iterator_10.cpp -S -o /tmp/iterator_10.s

In my real-world code, this is a big problem since I have code that does:

    - long computation for some answer
    - compute some address to ADD the answer to (and the address is almost never in cache)
    - load from answer address
    - start on next loop
    - add answer and old answer and store

The problem is that the compiler totally blows the approach since it immediately stores the loaded 
old answer to the stack, causing a stall waiting for the load to complete (and thus preventing any 
asynchrony with the load and the computation in the next loop).
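
The loop structure described above can be sketched in scalar C++ as follows. This is only an illustration: `long_computation`, `target_index`, and the table layout are hypothetical stand-ins, not taken from the attached test case, and no AltiVec code is involved. The point is that the load of the old value is issued before the next iteration's computation and is only consumed afterwards, so ideally the loaded value stays in a register in between instead of being spilled to the stack.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the long per-iteration computation.
static float long_computation(std::size_t i) {
    return static_cast<float>(i) * 0.5f;
}

// Hypothetical stand-in for the address computation.
static std::size_t target_index(std::size_t i, std::size_t size) {
    return (i * 7u) % size;
}

// Straightforward form: each loaded value is consumed immediately.
void accumulate_naive(std::vector<float>& table, std::size_t n) {
    if (table.empty()) return;
    for (std::size_t i = 0; i < n; ++i) {
        table[target_index(i, table.size())] += long_computation(i);
    }
}

// Hand-pipelined form: the load for iteration i is started before the
// computation for iteration i + 1 and is only used after it.
void accumulate_pipelined(std::vector<float>& table, std::size_t n) {
    if (n == 0 || table.empty()) return;
    float answer = long_computation(0);
    std::size_t idx = target_index(0, table.size());
    float old_value = table[idx];                 // load issued early
    for (std::size_t i = 1; i < n; ++i) {
        float next_answer = long_computation(i);  // can overlap the pending load
        table[idx] = old_value + answer;          // first use of the loaded value
        idx = target_index(i, table.size());
        old_value = table[idx];                   // start the next load
        answer = next_answer;
    }
    table[idx] = old_value + answer;              // drain the last iteration
}
```

Both functions perform the same additions in the same order, so they produce identical tables; only the distance between each load and its first use differs.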
Comment 1 Timothy J. Wood 2004-03-06 04:32:32 UTC
Created attachment 5874 [details]
Test input
Comment 2 Andrew Pinski 2004-03-06 04:51:17 UTC
The problem I see is that there is no store/load motion in the loop for the C.* and state.* (note that 
they really are C->* and state->* as they are references). This is caused by alias analysis not 
knowing that they cannot be the same object.
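
A minimal sketch of why possible aliasing blocks this kind of motion, assuming for illustration two references of the same hypothetical type `Pair` (the types in the attached test case differ). If the two references may name the same object, every store through one can change what a load through the other sees, so the fields have to stay in memory across iterations. With local copies the compiler can see that the objects are distinct.

```cpp
// Hypothetical type used only for this illustration.
struct Pair {
    int a;
    int b;
};

// `in` and `out` may refer to the same object, so `in.a` has to be
// re-read after every store to `out.a`.
int sum_via_refs(const Pair& in, Pair& out, int n) {
    for (int i = 0; i < n; ++i) {
        out.a += in.a;
    }
    return out.a;
}

// The loop works on local copies, which are distinct objects; the result
// is written back once after the loop.
int sum_via_locals(const Pair& in_ref, Pair& out_ref, int n) {
    Pair in = in_ref;
    Pair out = out_ref;
    for (int i = 0; i < n; ++i) {
        out.a += in.a;
    }
    out_ref = out;
    return out.a;
}
```

For distinct arguments the two functions agree. When the same object is passed for both parameters they differ (with `a == 1` and `n == 3`, the first returns 8 and the second returns 4), which is why the compiler may not treat the first form like the second unless it can prove the references do not alias.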
Comment 3 Andrew Pinski 2004-03-06 04:58:37 UTC
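Here is a workaround:
void foo(const Constants &C1, State &state1)
{
  // Copy into locals so the loop works on objects that are known to be distinct.
  Constants C = C1;
  State state = state1;
  for (int i = 0; i < 100; i++) {
    state.step(C);
  }
  // Write the final state back once, after the loop.
  state1 = state;
}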
Comment 4 Timothy J. Wood 2004-04-07 03:14:38 UTC
This workaround doesn't entirely solve the problem, I think.  The issue is that passing 'C' here 
invokes an implicit copy constructor.  If the constructor isn't defined, memcpy is used (bad for 
inlining).  If the constructor is defined, it must be defined to take 'const Constants &C' as the 
argument and we're back in a similar boat as before.  Attaching an updated example.  This code is 
definitely better, but there should be zero loads/stores in the inner loop and there still are a few.
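
The two cases described above can be told apart at compile time. The sketch below uses hypothetical stand-in types, not the `Constants` class from the attachment. A class with no user-provided copy constructor is trivially copyable, so the compiler is allowed to copy it as a block of bytes; a class with a user-provided copy constructor is not, and its constructor necessarily receives its source through a `const` reference. Whether a particular GCC version emits an actual memcpy call for the trivial case is not something this sketch shows.

```cpp
#include <type_traits>

// Hypothetical stand-in: the copy constructor is implicitly defined.
struct ImplicitCopy {
    float scale;
    float bias;
};

// Hypothetical stand-in: the copy constructor is user-provided and takes
// its source by const reference.
struct UserCopy {
    float scale;
    float bias;

    UserCopy(float s, float b) : scale(s), bias(b) {}
    UserCopy(const UserCopy& other) : scale(other.scale), bias(other.bias) {}
};

static_assert(std::is_trivially_copyable<ImplicitCopy>::value,
              "implicit copy constructor keeps the type trivially copyable");
static_assert(!std::is_trivially_copyable<UserCopy>::value,
              "user-provided copy constructor makes the type non-trivially copyable");
```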


Comment 5 Timothy J. Wood 2004-04-07 03:16:47 UTC
Created attachment 6048 [details]
Updated test case for the (partial) workaround
Comment 6 Andrew Pinski 2005-09-20 17:26:57 UTC
The other issue is that the altivec builtins are not marked, so we think they can clobber what the 
pointers point to.
Comment 7 Steven Bosscher 2010-07-21 21:09:53 UTC
The SRA rewrite for GCC 4.6 probably fixes the SRA part of this bug report (at last!).
Can someone with a powerpc box have a look?