This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[tree-ssa] Initial implementation of flow-sensitive points-toanalysis

The whole idea of this patch is to take advantage of the SSA web to
compute flow sensitive points-to sets for each SSA pointer in the
program.  So, given this code fragment:

foo (int c)
  int a, b;

  if (c_1 > 5)
    p_2 = &b;
     p_4 = malloc ();
     if (c_1)
       p_5 = &a;
     p_6 = PHI (p_4, p_5);
     *p_6 = ...;
  p_7 = PHI (p_2, p_6);

  return *p_7

(Yes. It's contrived).  With type-based analysis we would've determined
that p can point to 'a' or 'b'.  With flow insensitive points-to
analysis, we would've come to the same conclusion.  But if we collect
points-to information on each SSA_NAME, then we will determine that p_6
may only point to 'a', while p_7 may point to 'a' and 'b'.  Collecting
points-to information on SSA_NAMEs is what gives us flow sensitivity.

This is a significant overhaul to our alias analysis code and not all I
wanted to do is finished.  However, the basic design should not change
significantly.  Alias analysis now proceeds in 3 main phases:

1- Points-to and escape analysis.

This phase walks the use-def chains in the SSA web looking for three

      * Assignments of the form P_i = &VAR
      * Assignments of the form P_i = malloc()
      * Pointers and ADDR_EXPR that escape the current function.

The concept of 'escaping' is the same one used in the Java world.  When
a pointer or an ADDR_EXPR escapes, it means that it has been exposed
outside of the current function.  So, assignment to global variables,
function arguments and returning a pointer are all escape sites.

This is where we are currently limited.  Since not everything is renamed
into SSA, we lose track of escape properties when a pointer is stashed
inside a field in a structure, for instance.  In those cases, we are
assuming that the pointer does escape.

We use escape analysis to determine whether a variable is
call-clobbered.  Simply put, if an ADDR_EXPR escapes, then the variable
is call-clobbered.  If a pointer P_i escapes, then all the variables
pointed-to by P_i (and its memory tag) also escape.

2- Compute flow-sensitive aliases

We now have two classes of memory tags.  The classic one we've been
using for TBAA, which the patch renamed to "type memory tag" (TMT), and
memory tags associated with SSA_NAMEs, called "name memory tag" (NMT). 
The basic idea is that when adding operands for an INDIRECT_REF *P_i, we
will first check whether P_i has a name tag, if it does we use it,
because that will have more precise aliasing information.  Otherwise, we
use the standard type tag.

In this phase, we go through all the pointers we found in points-to
analysis and create alias sets for the name memory tags associated with
each pointer P_i.  If P_i escapes, we mark call-clobbered the variables
it points to and its tag.

3- Compute flow-insensitive aliases

This is the exact same algorithm we had before.  We associate alias sets
to each memory tag based on the results given by alias_sets_conflict_p.


Since we only use the SSA web to derive points-to information, this
analysis is not very precise because our SSA web does not contain
information about a whole lot of things that may be used to stash
pointers away.

However, the intent was not to have perfect information.  I mostly
wanted to improve our base alias analyzer to stop it from being so
naive.  We could increase the amount of processing done, or even repeat
this pass to improve accuracy as we go along.

For instance, one of the things that happen quite often is our flow
restructuring phases will obliterate all the flow-sensitive information
(no big surprise).  So, we could probably repeat alias analysis after
things like jump threading.

In terms of compile time costs, I was expecting a significant hit, but
it didn't look too bad.  We now have potentially more variables to deal
with (the name tags), but on the other hand, our aliasing information is
more precise, so the number of virtual operands shouldn't change much. 
On the collection of .i/.ii files, I counted about 5% fewer VDEF/VUSE
operands in the .optimized dumps.

As far as compile time goes, things balanced out.  Total compile time
decreased 0.6% (from 665 secs to 661 secs).  Out of which, we spent a
grand total of 1.03 seconds doing alias analysis (including points-to,
escape, flow-sensitive and TBAA).

For DLV, compile times increased a bit.  I took a few .ii files out of
DLV and measured a 1.5% increase in compile time (from 281 secs to 285
secs).  However, we get a significant reduction in memory consumption.  

Measured by -fmem-report, maximum memory consumption goes from 323Mb to
265Mb (~18% reduction).  According to 'top', we go from a maximum of
321Mb to 300Mb (~6% reduction).

SPEC2k results are mostly positive, but within noise thresholds.  The
best results are for gzip (+2%), perlbmk (+2%) and wupwise (+7%).  Build
times for SPECint are down by about 2%.

SPECint results for peak

    Benchmark   Before   After  % diff
     164.gzip   636.93  649.94  +  2.04%
      175.vpr   427.93  423.22  -  1.10%
      176.gcc     0.00    0.00  INF
      181.mcf   413.85  419.70  +  1.41%
   186.crafty   669.03  678.89  +  1.47%
   197.parser   561.46  560.83  -  0.11%
      252.eon     0.00    0.00  INF
  253.perlbmk   757.42  773.51  +  2.12%   712.86  722.39  +  1.34%
   255.vortex   848.59  836.78  -  1.39%
    256.bzip2   526.98  525.32  -  0.31%
    300.twolf   527.53  526.81  -  0.14%
         mean   593.22  594.73  +  0.25%

SPECfp result for peak

    Benchmark   Before   After  % diff
  168.wupwise   546.11  584.59  +  7.05%
     171.swim   501.81  506.01  +  0.84%
    172.mgrid   378.88  378.70  -  0.05%
    173.applu   532.45  540.71  +  1.55%
     177.mesa   450.33  446.69  -  0.81%
   178.galgel     0.00    0.00  INF   193.67  192.66  -  0.52%
   183.equake   820.99  820.67  -  0.04%
  187.facerec     0.00    0.00  INF
     188.ammp   349.81  351.56  +  0.50%
    189.lucas     0.00    0.00  INF
    191.fma3d     0.00    0.00  INF
 200.sixtrack     0.00    0.00  INF
     301.apsi     0.00    0.00  INF
         mean   439.21  443.77  +  1.04%

So, all in all, we are doing a bit better job.  There are a few FIXMEs
in the patch that I will address before the merge.  I will also keep an
eye on any significant regressions that people may find with it.

Bootstrapped and tested on x86, x86-64, alpha and ia64.  I will be
committing the patch in a day or two, after I re-test with the new
changes that have gone in the branch.


Attachment: 20040209-pta-ssa.diff.gz
Description: GNU Zip compressed data

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]