This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Compile-time and execution-time impact of CSE - some numbers


On Tuesday 31 August 2004 15:29, Paolo Bonzini wrote:
> Hello, these are the results of a simple attempt at trimming the time
> spent in CSE passes.  Not very encouraging really, but maybe it can
> help more experienced people than me.
>
> The first thing I tried is to remove CSE1 and move EBB CSE to -O3,
> using the attached patch.  This also meant that we do not run local CSE
> at -O1 anymore.  Here are the results of this and other experiments;
> all times were taken on a Pentium 4 machine running at 1.7 GHz.
>
> As for bootstrapping, I have only timed a C-only --disable-checking
> bootstrap.  Bootstrapping times are very similar but they are not very
> representative of the effect of the patch, due to the large time spent
> compiling stage2; but compiling stage3 takes 10:22 minutes instead of
> 11:00, which is about 6% faster.
>
> I then timed combine.i files.  I ran the compiler five times, took out
> the runs with the best and worst overall time, and averaged the other
> three (the machine was very lightly loaded and has plenty of memory,
> so system time did not matter).  Times are in the following table. The
> headings are different at -O1 than for other optimization levels,
> because CSE and/or GCSE are not run there:
>
>              -O1       | -O2             | -O3
>              tot   CSE | tot   GCSE  CSE | tot   GCSE  CSE
> patched      6.74  --- | 10.07 0.43 0.35 | 15.19 1.16 0.90
> HEAD         6.99 0.16 | 10.86 0.48 1.04 | 16.00 1.10 1.59
> improvement  4.1%      |  7.3%           |  5.0%
>
> For -O2 I got run-time numbers too, which I took from a CPU-intensive
> sed benchmark (I used sed 4.1.1, compiled with IMA including the regex
> matcher), doing measurements in the same way as above for both
> compilation and the sed benchmark dc.sed.
>
> The results are in the table that follows and are for several compilers:
>
> 1) "patched" is as above
>
> 2) "HEAD, no EBB" is mainline with -fno-cse-skip-blocks
> -fno-cse-follow-jumps: the results are even worse.
>
> 3) "patched+EBB" uses the attached patch but without the hunks that move
> -fcse-skip-blocks and -fcse-follow-jumps to -O3, since it looks like CSE
> on EBBs is (still :-( ...) doing good, but CSE1 is not.
>
> 4) "HEAD, no CSE2" is a final try... let's disable CSE2 instead, and run a
> full-power CSE1 (no GCSE column in the table since the two GCSE's look
> at exactly the same things): this means moving -frerun-cse-after-loop to
> -O3 (or using HEAD's compiler with -O2 -fno-rerun-cse-after-loop).
>
>                combine.i             sed
>                tot    GCSE    CSE  | compile  GCSE    CSE     dc.sed
> -----------------------------------+---------------------------------
> HEAD           10.86  0.48   1.04  | 10.50    0.33   1.08     11.77
> -----------------------------------+---------------------------------
> patched        10.07  0.43   0.35  |  9.69    0.35   0.38     11.96
> improvement     7.3%               |  7.7%                    -1.6%
> -----------------------------------+---------------------------------
> HEAD, no EBB   10.28  0.46   0.67  |  9.99    0.31   0.67     12.03
> improvement     5.3%               |  4.8%                    -2.1%
> -----------------------------------+---------------------------------
> patched+EBB    10.31  0.46   0.66  | 10.00    0.34   0.65     11.89
> improvement     5.1%               |  4.8%                    -1.0%
> -----------------------------------+---------------------------------
> HEAD, no CSE2  10.47  0.48   0.62  | 10.05    0.33   0.76     11.85
> improvement     3.6%               |  4.3%                    -0.7%
>
> 4.1% on -O1 looks good to me, and I think we can safely lose 1-2% of
> execution time at -O1.  But for -O2 only the last two are worth running
> SPEC on.  If anybody wants to try, for the latter there's not even a
> patch to apply.  But it looks like at -O2 the RTL passes are not going
> away soon. :-(
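
The measurement procedure described in the quoted message (five runs, drop the best and the worst, average the remaining three) and the "improvement" rows of the tables can be sketched as below. This is only an illustration in Python, not the script that was actually used; the function names are made up for the example, and the only figures taken from the message are the table entries in the comments.

```python
def trimmed_mean(times):
    """Average the runs after dropping the fastest and the slowest one."""
    if len(times) < 3:
        raise ValueError("need at least three runs")
    kept = sorted(times)[1:-1]          # discard best and worst overall time
    return sum(kept) / len(kept)


def improvement(baseline, candidate):
    """Relative saving over the baseline, in percent (negative = slower)."""
    return (baseline - candidate) / baseline * 100.0


# combine.i at -O2, totals from the table: HEAD 10.86 s, patched 10.07 s
combine_gain = round(improvement(10.86, 10.07), 1)

# dc.sed run time from the table: HEAD 11.77 s, patched 11.96 s
dcsed_gain = round(improvement(11.77, 11.96), 1)
```

With these definitions a positive value means the candidate compiler (or the code it produced) is faster than HEAD, which matches the sign convention used in the tables.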

You will find that you can spend your time better on fixing the bugs
marked as "tree-optimization" and "missed-optimization".  That is still
where most of the things CSE1 and GCSE catch come from.

Also, the only *really* good way of speeding up CSE is by making it work
on extended basic blocks (i.e. kill -fcse-skip-blocks) and then teaching it
to not rescan already visited blocks (by using a scoped hash table).
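
To illustrate what a scoped hash table buys here, a minimal sketch follows. It is written in Python for brevity and is not cse.c code: the class, the function, and the tuple format for instructions are all hypothetical, and it assumes every name is assigned only once, so no invalidation of table entries is needed (real CSE has to handle that). The extended basic block is modelled as a tree of blocks; entries made in a block are visible in the blocks below it and are undone when the walk leaves that block, so each block is scanned exactly once.

```python
_MISSING = object()


class ScopedTable:
    """Hash table with nested scopes; leaving a scope undoes its insertions."""

    def __init__(self):
        self._table = {}
        self._undo = []                 # one undo log per open scope

    def push_scope(self):
        self._undo.append([])

    def pop_scope(self):
        for key, old in reversed(self._undo.pop()):
            if old is _MISSING:
                del self._table[key]
            else:
                self._table[key] = old

    def insert(self, key, value):
        self._undo[-1].append((key, self._table.get(key, _MISSING)))
        self._table[key] = value

    def lookup(self, key):
        return self._table.get(key)


def cse_ebb(block, children, insns, table=None, out=None):
    """Walk the tree of blocks once, reusing expressions seen on the path."""
    if table is None:
        table, out = ScopedTable(), {}
    table.push_scope()
    rewritten = []
    for dest, op, a, b in insns[block]:
        prev = table.lookup((op, a, b))
        if prev is None:
            table.insert((op, a, b), dest)      # first occurrence on this path
            rewritten.append((dest, op, a, b))
        else:
            rewritten.append((dest, "copy", prev, None))
    out[block] = rewritten
    for child in children.get(block, ()):
        cse_ebb(child, children, insns, table, out)
    table.pop_scope()                   # forget what only this subtree knew
    return out
```

For a block A with two successors B and C, an expression computed in A is reused in both B and C, while one first computed in B is not visible in C, because B's scope has been popped by the time C is scanned.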

Unfortunately, simply disabling -fcse-skip-blocks doesn't give much speedup
either.  But that would be the first step.  The second step would be to
make cse.c use the CFG (i.e. FOR_BB_INSNS/BB_HEAD/BB_END/etc.) instead
of relying on block notes.  Next you'd clean up the path following code
to track back only to the last visited block before following a jump.

Gr.
Steven


