This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Compile-time and execution-time impact of CSE - some numbers
- From: Steven Bosscher <stevenb at suse dot de>
- To: Paolo Bonzini <paolo dot bonzini at polimi dot it>, gcc at gcc dot gnu dot org
- Date: Tue, 31 Aug 2004 15:50:27 +0200
- Subject: Re: Compile-time and execution-time impact of CSE - some numbers
- Organization: SUSE Labs
- References: <1093958955.41347d2b7d59f@webmail.polimi.it>
On Tuesday 31 August 2004 15:29, Paolo Bonzini wrote:
> Hello, these are the results of a simple attempt at trimming the time
> spent in the CSE passes. They are not very encouraging, really, but
> maybe they can help people more experienced than me.
>
> The first thing I tried was to remove CSE1 and move EBB CSE to -O3,
> using the attached patch. This also means that local CSE no longer runs
> at -O1. Here are the results of this and other experiments; all times
> were taken on a 1.7 GHz Pentium 4.
>
> As for bootstrapping, I have only timed a C-only --disable-checking
> bootstrap. Total bootstrap times are very similar, but they do not show
> the effect of the patch well because so much of the time is spent
> compiling stage2. Compiling stage3, however, takes 10:22 minutes instead
> of 11:00, which is about 6% faster.
>
> I then timed the compilation of combine.i. I ran the compiler five
> times, discarded the runs with the best and worst overall time, and
> averaged the other three (the machine was very lightly loaded and has
> plenty of memory, so system time did not matter). Times, in seconds,
> are in the following table. The column headings at -O1 differ from
> those at the other optimization levels because GCSE is not run at -O1
> (and, with the patch, neither is CSE):
>
>               |     -O1     |        -O2         |        -O3
>               |  tot   CSE  |  tot   GCSE  CSE   |  tot   GCSE  CSE
> --------------+-------------+--------------------+------------------
> patched       |  6.74  ---  | 10.07  0.43  0.35  | 15.19  1.16  0.90
> HEAD          |  6.99  0.16 | 10.86  0.48  1.04  | 16.00  1.10  1.59
> improvement   |  4.1%       |  7.3%              |  5.0%
>
> For -O2 I also collected run-time numbers, using a CPU-intensive sed
> benchmark: sed 4.1.1, compiled with IMA (intermodule analysis) including
> the regex matcher, running the dc.sed script. Both the compilation of
> sed and the dc.sed run were measured in the same way as above.
>
> The table below shows the results for several compilers:
>
> 1) "patched" is as above
>
> 2) "HEAD, no EBB" is mainline with -fno-cse-skip-blocks
> -fno-cse-follow-jumps: the results are even worse (a smaller
> compile-time gain and a larger run-time loss than "patched").
>
> 3) "patched+EBB" uses the attached patch but without the hunks that move
> -fcse-skip-blocks and -fcse-follow-jumps to -O3, since it looks like CSE
> on EBBs is (still :-( ...) doing some good, but CSE1 is not.
>
> 4) "HEAD, no CSE2" is a final try: disable CSE2 instead and run a
> full-power CSE1. This means moving -frerun-cse-after-loop to -O3 (or
> using HEAD's compiler with -O2 -fno-rerun-cse-after-loop). The GCSE
> times are unchanged from HEAD, since GCSE sees exactly the same input
> in both compilers.
>
>                      combine.i      |            sed
>                  tot    GCSE  CSE   | compile  GCSE  CSE   dc.sed
> -----------------------------------+----------------------------
> HEAD            10.86  0.48  1.04  | 10.50    0.33  1.08  11.77
> -----------------------------------+----------------------------
> patched         10.07  0.43  0.35  |  9.69    0.35  0.38  11.96
> improvement       7.3%              |  7.7%                -1.6%
> -----------------------------------+----------------------------
> HEAD, no EBB    10.28  0.46  0.67  |  9.99    0.31  0.67  12.03
> improvement       5.3%              |  4.8%                -2.1%
> -----------------------------------+----------------------------
> patched+EBB     10.31  0.46  0.66  | 10.00    0.34  0.65  11.89
> improvement       5.1%              |  4.8%                -1.0%
> -----------------------------------+----------------------------
> HEAD, no CSE2   10.47  0.48  0.62  | 10.05    0.33  0.76  11.85
> improvement       3.6%              |  4.3%                -0.7%
>
> A 4.1% compile-time gain at -O1 looks good to me, and I think we can
> safely lose 1-2% of execution time at -O1. But at -O2 only the last two
> are worth running SPEC on. If anybody wants to try, the last one does
> not even need a patch (just -O2 -fno-rerun-cse-after-loop). But it
> looks like the RTL passes are not going away at -O2 any time soon. :-(
You would spend your time better fixing the bugs marked "tree-optimization"
and "missed-optimization". Those missed tree-level optimizations are still
where most of what CSE1 and GCSE catch comes from.
Also, the only *really* good way of speeding up CSE is to make it work
on extended basic blocks (i.e. kill -fcse-skip-blocks) and then teach it
not to rescan already visited blocks (by using a scoped hash table).
Unfortunately, simply disabling -fcse-skip-blocks doesn't give much speedup
either, but it would be the first step. The second step would be to
make cse.c use the CFG (i.e. FOR_BB_INSNS/BB_HEAD/BB_END/etc.) instead
of relying on block notes. Next you'd clean up the path-following code
so that it tracks back only to the last visited block before following a jump.
Gr.
Steven