This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[tree-ssa] Returning to alias analysis


Now that I'm done mucking up CCP, it's time to return to alias analysis, 
which accounts for roughly 2% of the cpu time for my components of cc1
test.

To recap, the single biggest problem I see with the performance of our
type based alias analysis is that it computes may-alias relationships
for load pairs.  This information is completely and totally useless.

I've got a patch which allows us to distinguish between loads and
stores so that we do not compute alias information for load-load pairs.

Here's some data to give you some idea of how much this concept can
improve the speed of our type based aliasing code.

For my components of cc1 test, the branch currently spends 13.46 seconds
computing type based alias information.  This accounts for roughly 2% of
the total compile time for the test.

With my patch alias analysis time drops to 4.3 seconds.  That's a 9.16
second improvement in alias analysis.  But wait, it gets even better,
in addition to speeding up the aliasing code we get more accurate
aliasing information which enables us to disambiguate more loads and stores.

The downside of disambiguating more loads and stores is that the SSA
rewriting, PHI node insertion and SSA optimizers may have more work to
do.  For the same test, that additional work costs .6 seconds.

Just to reiterate.  alias analysis improves by 9.16 seconds.  The following
passes in the SSA code get worse by .6 seconds.  Meaning that we have a 
net improvement of 8.5 seconds -- or just over 1.1% of the total time.  I feel
this is a good indicator of typical performance for this change.


If we look at 20001226-1.c, we get a good feel for one of the potential
downsides.

In this test we spend very little time in alias analysis before or after
my changes (.28 and .24 seconds respectively).  However, after my changes
the time for PHI node insertion explodes (.01 seconds vs 100 seconds).
Why?

This is due to my changes allowing the alias analysis code to disambiguate
all the memory references in that test (something like 32k of them).  This
in turn means that we have thousands of objects for which we'll need to
create PHI nodes (instead of just 3). Most of the PHI node insertion time
is spent computing global live information those 32k objects.

But note that the test still completes without blowing the machine up --
meaning that the semi-pruned vs fully-pruned PHI node insertion heuristics
are working properly.

I've got one additional tweak to investigate which could improve things
further..

Jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]