[alias-improvements] Review of branch as of 2009-01-23

Sun Feb 1 13:55:00 GMT 2009

On Sat, Jan 31, 2009 at 11:31 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Sat, Jan 31, 2009 at 10:30 PM, Diego Novillo <dnovillo@google.com> wrote:
>> I took a snapshot of the branch as of 2009-01-23 and reviewed the
>> diff against trunk as of the latest merge point.  I saw various
>> changes to the branch come in during this time, so some of the
>> comments may be stale by now (sorry).
>>
>> Some general observations:
>>
>> - Overall, the cleanups in the branch look impressive.  There is
>>  a lot of simplified logic in the basic operand manipulation
>>  routines.  Good stuff.
>
> That probably accounts for most of the compile-time improvements I saw.
>
>> - We need to get an idea of how much codegen quality has degraded
>>  (or not).  This change is very disruptive as it shifts the
>>  responsibility from the memory SSA web to the alias oracle. We
>>  need to find the right balance.  I suspect that the memory web
>>  in the IL will always need to be very coarse.
>>
>> - What is the compile time performance profile?  We are trading
>>  slowness in one place for slowness in another.  Though now the
>>  slowness should only happen in memory passes.  I also didn't
>>  notice very many changes to the various memory optimizers
>>  (except, perhaps PRE), it mostly looked like updates for the
>>  new virtual operand interfaces.  What do you estimate needs to
>>  be changed for them to come back to parity with the existing
>>  behaviour?
>
> I am running SPEC2006, SPEC2000, Polyhedron and our set of C++
> benchmarks (tramp3d, DLV and some more) continuously on the
> branch vs. the trunk.  Overall the branch is better in compile time
> and memory usage; optimization seems to be on par, with occasional
> slower/faster cases, a few of them probably worth investigating.
> I promised to post some numbers and hope to manage that next week
> (I have to extract them first...).  You can see graphs at
> http://gcc.opensuse.org/, follow "frescobaldi" and
> "frescobaldi (alias-improvements)".

I am looking at the results now and just want to cite the overall
performance numbers.  On x86_64, SPEC2006 gives the following, where
paired values are base/peak (base: -O2,
peak: -O3 -march=native -ffast-math -funroll-loops -fpeel-loops):

                                    trunk        branch
compile-time of all FP binaries:    500s/800s    420s/740s
compile-time of all INT binaries:   180s/255s    175s/250s
GCC build time (noisy):             420s         420s
SPEC2006 INT overall score:         10.9/11.05   10.9/11.05
SPEC2006 FP overall score:          9.55/10.65   9.55/10.7

So for the above scenario we get significantly reduced benchmark
compile time (most visibly for the FP binaries) at no cost in score.
The other example is tramp3d, my favorite benchmark: with
-fprofile-generate and leafify (the worst compile-time offender and
probably a good case for long stmt walks) the branch builds it in 132s
vs. 134s on the trunk.  Runtime at -O3 is 6.9s on the branch vs. 7.9s
on the trunk, so the branch is more than 10% faster.
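
To put the figures above on a common scale, here is a minimal Python
sketch that turns each trunk/branch pair into a relative reduction.
The helper name reduction_pct and the dictionary labels are
hypothetical, chosen only for illustration; the numbers are copied
from this mail and nothing here comes from the benchmark tooling.

```python
def reduction_pct(trunk, branch):
    """Percentage by which the branch value is lower than the trunk value."""
    return 100.0 * (trunk - branch) / trunk


# (trunk, branch) pairs copied from the figures quoted in this mail.
figures = {
    "SPEC2006 FP compile time, base (s)": (500, 420),
    "SPEC2006 FP compile time, peak (s)": (800, 740),
    "SPEC2006 INT compile time, base (s)": (180, 175),
    "SPEC2006 INT compile time, peak (s)": (255, 250),
    "tramp3d compile time (s)": (134, 132),
    "tramp3d run time at -O3 (s)": (7.9, 6.9),
}

for name, (trunk, branch) in figures.items():
    pct = reduction_pct(trunk, branch)
    print(f"{name}: {pct:.1f}% lower on the branch")
```

The reduction is taken relative to the trunk value, which is the
convention assumed for the "more than 10% faster" statement.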

Polyhedron scores are (pasting from the last available logfile):

trunk:
   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      1.54     1305543     13.79       5  0.0094
      aermod     47.06     2567970     36.01       5  0.0401
         air      3.00     1363170     10.97       5  0.1328
    capacita      1.96     1376112     82.94       4  0.0326
     channel      1.13     1311637      6.45       5  0.4735
       doduc      5.95     1468235     43.35       5  0.0334
     fatigue      2.53     1405225      9.84       5  0.1173
     gas_dyn      3.51     1423748      8.19       5  0.0663
      induct      5.11     1552778     25.36       5  0.0064
       linpk      1.02     1299568     20.09       5  0.0474
        mdbx      1.94     1334296     18.19       5  0.0660
          nf      4.71     1392755     27.51       5  0.0913
     protein      5.24     1518725     51.17       5  0.0361
      rnflow      6.67     1511810     35.58       5  0.0770
    test_fpu      6.31     1485860     15.31       5  0.0476
        tfft      0.67     1321163      7.31       5  0.0754

Geometric Mean Execution Time =      19.73 seconds

branch:
   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      1.91     1300871     13.85       5  0.0083
      aermod     42.95     2547458     36.52       5  0.1237
         air      2.95     1358498     10.86       5  0.0933
    capacita      1.84     1371760     81.43       4  0.1287
     channel      1.16     1306949      6.47       5  0.3794
       doduc      5.90     1463547     44.13       5  0.0436
     fatigue      2.44     1400761     10.34       5  0.0555
     gas_dyn      3.32     1419268      8.30       5  0.1099
      induct      5.01     1548554     23.81       5  0.0420
       linpk      1.01     1294896     20.10       5  0.0077
        mdbx      1.89     1329608     18.77       5  0.0771
          nf      4.56     1388275     27.53       5  0.0161
     protein      5.12     1509173     50.23       5  0.0195
      rnflow      6.57     1507522     34.93       5  0.0223
    test_fpu      6.27     1481428     15.36       5  0.0305
        tfft      0.68     1316651      7.37       5  0.1213

Geometric Mean Execution Time =      19.76 seconds

Both runs were built with -ffast-math -funroll-loops -O3.
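
For anyone who wants to re-derive the two summary lines, the following
Python sketch recomputes the geometric mean from the "Ave Run" columns
of the two tables.  It assumes the summary is a plain unweighted
geometric mean over all 16 benchmarks; the function and variable names
are illustrative only and are not part of the Polyhedron harness.

```python
import math


def geometric_mean(values):
    """Unweighted geometric mean, computed as exp of the mean logarithm."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# "Ave Run (secs)" columns copied from the two tables, in table order
# (ac, aermod, air, capacita, channel, doduc, fatigue, gas_dyn,
#  induct, linpk, mdbx, nf, protein, rnflow, test_fpu, tfft).
trunk_runs = [13.79, 36.01, 10.97, 82.94, 6.45, 43.35, 9.84, 8.19,
              25.36, 20.09, 18.19, 27.51, 51.17, 35.58, 15.31, 7.31]
branch_runs = [13.85, 36.52, 10.86, 81.43, 6.47, 44.13, 10.34, 8.30,
               23.81, 20.10, 18.77, 27.53, 50.23, 34.93, 15.36, 7.37]

gm_trunk = geometric_mean(trunk_runs)
gm_branch = geometric_mean(branch_runs)

print(f"trunk:  {gm_trunk:.2f} s")
print(f"branch: {gm_branch:.2f} s")
print(f"branch/trunk ratio: {gm_branch / gm_trunk:.4f}")
```

Under that assumption the results should agree with the reported 19.73
and 19.76 seconds to within rounding, i.e. the two runs differ by
roughly 0.1% overall.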

Richard.