This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Please benchmark --param large-function-insns=3000 (Was Re: Some updates on tree-ssa and PR8361)


Hi,

I did some more benchmarking, and it looks like code generated by 3.4, even
with the limit changes, performs much better than code from any earlier
version, and compile times are now slightly better too.

[0]: GCC 2.95
[1]: GCC 3.0.4
[2]: GCC 3.3.2
[3]: GCC 3.4, -O3 -fno-unit-at-a-time
[4]: GCC 3.4, -O3 --param large-function-insns=1000
[5]: GCC 3.4, -O3 --param large-function-insns=3000
[6]: GCC 3.4, -O3

(run times; lower is better)
                     |     [0]      |     [1]      |     [2]      |     [3]      |     [4]      |     [5]      |     [6]     | speedup 
---------------------+--------------+--------------+--------------+--------------+--------------+--------------+-------------+ [0]->[6]
      STRATCOMP1-ALL |  2.45 (0.00) | 24.92 (0.02) |  8.31 (0.01) |  4.68 (0.00) |  2.58 (0.03) |  2.58 (0.03) | 2.58 (0.02) | (-5%)
   STRATCOMP-770.2-Q |  0.49 (0.00) |  0.57 (0.00) |  0.47 (0.00) |  1.22 (0.04) |  0.45 (0.00) |  0.45 (0.01) | 0.45 (0.00) | (8%)
               2QBF1 | 10.92 (0.04) | 13.96 (0.07) | 11.06 (0.04) | 28.68 (0.03) | 10.44 (0.11) | 10.29 (0.10) | 9.33 (0.03) | (17%)
          PRIMEIMPL2 |  7.52 (0.02) |  8.75 (0.00) |  6.27 (0.01) | 43.60 (0.05) |  6.19 (0.02) |  5.99 (0.01) | 6.00 (0.02) | (25%)
       3COL-SIMPLEX1 |  4.68 (0.00) |  4.97 (0.00) |  4.56 (0.00) | 11.13 (0.26) |  4.34 (0.01) |  4.35 (0.02) | 4.34 (0.01) | (7%)
        3COL-RANDOM1 |  6.66 (0.05) |  8.15 (0.04) |  5.95 (0.03) | 38.14 (0.16) |  6.10 (0.26) |  5.91 (0.04) | 5.86 (0.27) | (13%)
          HP-RANDOM1 |  4.93 (0.04) |  5.72 (0.03) |  5.23 (0.16) | 18.44 (0.01) |  4.63 (0.07) |  4.64 (0.15) | 4.44 (0.05) | (11%)
       HAMCYCLE-FREE |  0.80 (0.00) |  1.12 (0.00) |  1.03 (0.00) |  4.96 (0.00) |  0.75 (0.00) |  0.73 (0.01) | 0.72 (0.00) | (11%)
             DECOMP2 |  8.44 (0.01) |  9.59 (0.02) |  8.53 (0.06) | 33.91 (0.06) |  7.71 (0.01) |  7.48 (0.04) | 7.87 (0.01) | (7%)
        BW-P5-nopush |  4.45 (0.01) |  4.85 (0.01) |  4.25 (0.01) | 12.90 (0.04) |  4.27 (0.03) |  4.20 (0.02) | 4.19 (0.02) | (6%)
       BW-P5-pushbin |  3.79 (0.01) |  4.05 (0.03) |  3.44 (0.02) | 12.61 (0.02) |  3.41 (0.01) |  3.38 (0.01) | 3.40 (0.01) | (11%)
     BW-P5-nopushbin |  1.21 (0.00) |  1.31 (0.01) |  1.13 (0.00) |  4.07 (0.02) |  1.09 (0.00) |  1.09 (0.00) | 1.09 (0.00) | (11%)
        HANOI-Towers |  2.05 (0.02) |  2.19 (0.03) |  1.94 (0.01) |  6.21 (0.03) |  1.88 (0.01) |  1.81 (0.01) | 1.82 (0.03) | (12%)
              RAMSEY |  5.34 (0.03) |  5.69 (0.00) |  4.83 (0.01) | 16.69 (0.05) |  4.60 (0.02) |  4.65 (0.02) | 4.58 (0.05) | (16%)
             CRISTAL |  5.30 (0.08) |  5.91 (0.09) |  5.14 (0.03) | 12.67 (0.03) |  4.65 (0.01) |  4.68 (0.05) | 4.75 (0.12) | (11%)
           21-QUEENS |  6.35 (0.00) |  7.31 (0.00) |  5.09 (0.01) | 40.15 (0.15) |  5.09 (0.01) |  4.98 (0.01) | 4.86 (0.02) | (30%)
   MSTDir[V=13,A=40] | 12.58 (0.00) | 14.46 (0.00) |  9.14 (0.00) | 41.77 (0.02) |  9.07 (0.02) |  8.60 (0.03) | 8.60 (0.05) | (46%)
   MSTDir[V=15,A=40] | 12.62 (0.00) | 14.49 (0.01) |  9.15 (0.00) | 41.44 (0.01) |  9.00 (0.02) |  8.59 (0.04) | 8.53 (0.03) | (47%)
 MSTUndir[V=13,A=40] |  6.47 (0.00) |  7.57 (0.00) |  4.96 (0.00) | 25.48 (0.02) |  4.88 (0.01) |  4.66 (0.00) | 4.61 (0.03) | (40%)
         TIMETABLING |  7.08 (0.14) |  7.37 (0.02) |  6.30 (0.02) | 18.21 (0.20) |  5.87 (0.08) |  6.02 (0.10) | 5.90 (0.04) | (20%)
---------------------+--------------+--------------+--------------+--------------+--------------+--------------+-------------+
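
The speedup column appears to be the [0] time divided by the [6] time, minus one, expressed in percent and truncated toward zero. The mail does not state this formula; it is inferred because it reproduces the printed values. A minimal Python sketch of that arithmetic for a few rows copied from the table:

```python
import math

# (benchmark, [0] GCC 2.95 time, [6] GCC 3.4 -O3 time), copied from the table.
ROWS = [
    ("STRATCOMP1-ALL", 2.45, 2.58),
    ("STRATCOMP-770.2-Q", 0.49, 0.45),
    ("2QBF1", 10.92, 9.33),
    ("PRIMEIMPL2", 7.52, 6.00),
    ("21-QUEENS", 6.35, 4.86),
    ("MSTDir[V=13,A=40]", 12.58, 8.60),
]


def speedup_percent(t_old: float, t_new: float) -> int:
    """Speedup of t_new over t_old in percent, truncated toward zero.

    Assumption: the mail does not give the formula; old/new - 1 with
    truncation is what matches the printed speedup column.
    """
    return math.trunc((t_old / t_new - 1.0) * 100.0)


for name, t0, t6 in ROWS:
    print(f"{name:>20} | ({speedup_percent(t0, t6)}%)")
```

Truncation rather than rounding shows up on rows such as STRATCOMP-770.2-Q, where 0.49 / 0.45 is about 1.089 and the column shows 8%, not 9%.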


[0] build time is 2m42
[2] build time is 2m50
[4] build time is 2m39 (plus I think that about 9% can be saved by fixing for_each_template_parm_r)
[5] build time is 2m58 
[6] build time is 3m26

So it seems to me that we can probably set the default to 1000 and get faster
execution and compile times than both 2.95 and all of 3.3.x.
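
One way to check this proposal against the numbers is to compare configuration [4] with [0] and [2] across all twenty benchmarks. The Python sketch below copies those three columns and the build times from the mail. Averaging per-benchmark time ratios with a geometric mean is an editorial choice, not something the mail does; the sketch also lists the benchmarks where [4] is strictly slower.

```python
import math

# (benchmark, [0] GCC 2.95, [2] GCC 3.3.2, [4] 3.4 -O3 --param large-function-insns=1000),
# execution times copied from the table (lower is better).
TIMES = [
    ("STRATCOMP1-ALL", 2.45, 8.31, 2.58),
    ("STRATCOMP-770.2-Q", 0.49, 0.47, 0.45),
    ("2QBF1", 10.92, 11.06, 10.44),
    ("PRIMEIMPL2", 7.52, 6.27, 6.19),
    ("3COL-SIMPLEX1", 4.68, 4.56, 4.34),
    ("3COL-RANDOM1", 6.66, 5.95, 6.10),
    ("HP-RANDOM1", 4.93, 5.23, 4.63),
    ("HAMCYCLE-FREE", 0.80, 1.03, 0.75),
    ("DECOMP2", 8.44, 8.53, 7.71),
    ("BW-P5-nopush", 4.45, 4.25, 4.27),
    ("BW-P5-pushbin", 3.79, 3.44, 3.41),
    ("BW-P5-nopushbin", 1.21, 1.13, 1.09),
    ("HANOI-Towers", 2.05, 1.94, 1.88),
    ("RAMSEY", 5.34, 4.83, 4.60),
    ("CRISTAL", 5.30, 5.14, 4.65),
    ("21-QUEENS", 6.35, 5.09, 5.09),
    ("MSTDir[V=13,A=40]", 12.58, 9.14, 9.07),
    ("MSTDir[V=15,A=40]", 12.62, 9.15, 9.00),
    ("MSTUndir[V=13,A=40]", 6.47, 4.96, 4.88),
    ("TIMETABLING", 7.08, 6.30, 5.87),
]

# Build times from the mail, as (minutes, seconds).
BUILD = {"[0] 2.95": (2, 42), "[2] 3.3.2": (2, 50), "[4] insns=1000": (2, 39)}


def geometric_mean(values):
    """Geometric mean, a common way to average ratios such as speedups."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def to_seconds(minutes_seconds):
    minutes, seconds = minutes_seconds
    return 60 * minutes + seconds


# A ratio above 1 means configuration [4] ran faster than the older compiler.
speedup_vs_295 = geometric_mean([t0 / t4 for _, t0, _, t4 in TIMES])
speedup_vs_332 = geometric_mean([t2 / t4 for _, _, t2, t4 in TIMES])

# Benchmarks where [4] is strictly slower than the older compiler.
slower_than_295 = [name for name, t0, _, t4 in TIMES if t4 > t0]
slower_than_332 = [name for name, _, t2, t4 in TIMES if t4 > t2]

print(f"geometric-mean speedup of [4] over 2.95:  {speedup_vs_295:.3f}")
print(f"geometric-mean speedup of [4] over 3.3.2: {speedup_vs_332:.3f}")
print("slower than 2.95: ", slower_than_295)
print("slower than 3.3.2:", slower_than_332)
for label, t in BUILD.items():
    print(f"{label:>16} build: {to_seconds(t)} s")
```

Ties are not counted as slower; for example, 21-QUEENS runs in 5.09 under both [2] and [4].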

I was looking into the -fno-unit-at-a-time results, and the core of the slowdown
is that templates instantiated while finalizing a given function never get
inlined into that function (because we now do strictly in-order inlining, while
originally we inlined partly out of order when the functions were small enough).
This also explains the better compile times, and I don't think it is an important
problem, as we are aiming for -funit-at-a-time as the default anyway.
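
The ordering problem can be illustrated with a toy model. This is hypothetical and is not GCC's actual call-graph or inliner code. Functions are processed in the order their bodies are finalized. Without unit-at-a-time, a function can only inline callees whose bodies were finalized before it, and template instantiations triggered while finalizing `caller` land after it, so they are never candidates. The names `Function` and `inline_candidates` and the example unit are made up for illustration, and size limits such as large-function-insns are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """A function body in a translation unit (hypothetical toy model)."""
    name: str
    callees: list[str] = field(default_factory=list)


def inline_candidates(order: list[Function], unit_at_a_time: bool) -> dict[str, list[str]]:
    """For each function, return the callees whose bodies are available for inlining.

    `order` is the order in which bodies are finalized. In strict in-order mode,
    each function is optimized right after it is finalized, so only callees
    finalized earlier are visible. In unit-at-a-time mode, the whole unit is
    finalized first, so every callee body in the unit is visible.
    Inlining heuristics and size limits are deliberately ignored.
    """
    whole_unit = {f.name for f in order}
    finalized: set[str] = set()
    candidates: dict[str, list[str]] = {}
    for f in order:
        visible = whole_unit if unit_at_a_time else finalized
        candidates[f.name] = [c for c in f.callees if c in visible]
        finalized.add(f.name)
    return candidates


# Finalizing `caller` instantiates max<int> and swap<int>, so their bodies
# are finalized only after `caller` itself.
unit = [
    Function("helper"),
    Function("caller", ["helper", "max<int>", "swap<int>"]),
    Function("max<int>"),
    Function("swap<int>"),
]

print(inline_candidates(unit, unit_at_a_time=False)["caller"])  # ['helper']
print(inline_candidates(unit, unit_at_a_time=True)["caller"])   # ['helper', 'max<int>', 'swap<int>']
```

In this model only `helper` is visible to `caller` in the in-order mode, which mirrors the missed template inlining described above; with the whole unit available, the template instantiations become candidates as well.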

Honza

