This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
GCSE after LOOP

From: Roger Sayle <roger at eyesopen dot com>
To: <gcc at gcc dot gnu dot org>
Cc: Andreas Jaeger <aj at suse dot de>
Date: Fri, 27 Sep 2002 13:13:57 -0600 (MDT)
Subject: GCSE after LOOP
There's been some interest on the gcc mailing lists recently on
my ongoing work evaluating moving GCC's global common subexpression
pass (GCSE) after the loop optimizations pass, that performs
strength reduction, loop unrolling, etc. (LOOP).

My progress so far is summarized by the table at the bottom of
this message.  Firstly, I'd like to thank Andreas Jaeger for
patiently running SPECcpu2000 for me over the last five months.
I don't have access to SPEC myself, and so the 150+ CPU hours
of results presented in the table itself are thanks to his efforts.

Each value in the table represents the change in the SPEC score
that resulted from applying the patch.  All values were for running
SPEC with "-O2 -march=athlon" on Suse's "amdsim" farm.  Individual
tests are listed as rows, and different patches, and patch
combinations, are by column.  The use of deltas factors out the
natural movements in SPEC results over this period.  A value
"???" denotes that when the timings were run this test was
failing due to unrelated problems.  The word "fail" indicates
that a patch of mine actually introduced a regression.  A question
mark after a result, indicates an outlier probably caused by a
temporary performance regression of the branch.  Finally, an
"executive summary" is presented at the bottom as wins/draws/loses.

Now to describe the patches themselves.  For more details and
motivations, jump bypassing (Bypass) and implicit sets in PRE (Impl)
are described in http://gcc.gnu.org/ml/gcc/2002-05/msg01658.html
and the patch for aggressive jump bypassing (Aggr) has been posted
as http://gcc.gnu.org/ml/gcc-patches/2002-06/msg01392.html


The first column labelled "Orig Bypass" was the change in SPEC scores
that resulted when the initial jump bypassing implementation was added
to GCC.  To recap, the jump bypassing optimization was an improvement to
GCSE's constant propagation pass, that "bypassed" blocks beginning with
conditional jumps when the values along one of the incoming edges caused
the condition to always be constant, so the conditional branch would
always be taken or not taken.  In theory, this optimization modifies
the CFG such that we always execute fewer instructions.  This column
is used as a control in the table below representing a good improvement.
All other columns include this patch.  It also limits the performance
regression I was prepared to accept during my investigations.


Although original jump bypassing improved results overall there were
some performance degradations, which at the time I found surprising.

Shortly, after I developed to two further patches "Aggr" and "Impl"
that both should of improved the quality of jump bypassing and GCSE.
In the original/current jump bypassing implementation, I only ever
modified "taken" jumps (including unconditional jumps), but didn't
optimize fall-thru edges.  The "Aggr" patch, modifies the algorithm
to optimize both taken and fall through edges.  The "Impl" patch
modifies PRE such that more constants could be propagated into
expressions, allowing jump bypassing to effectively perform
if-then-else conversion.

In theory, both "Aggr" and "Impl" improvements to GCSE should result
in better performance because GCSE is doing a better job.  The results
on SPEC however indicated that as GCSE improved overall performance
dropped.  It was at this point that Jeff Law and I hypothesized that
the cause was that GCSE's flow control optimizations were interfering
with LOOP's ability to optimize loops.  Jump bypassing can create
multiple jumps into a loop, creating irreducible graphs.  Even if
the loops were still reducible the dramatic modifications of control
flow created havoc with GCC's loop notes.  Currently GCC uses syntax
based loop-discovery rather than CFG-based loop-discovery which relies
on NOTE insns being correctly placed in the RTL stream by front-ends.
Most importantly these notes shouldn't become corrupted prior to LOOP.

The first solution that was considered was moving the entire GCSE
pass after LOOP.  The results of this are presented in the "Move GCSE"
column.  The first noticable result of this is that SPECint and SPECfp
behave very differently.  Programs with complex control flow such as
those in SPECint benefit most from GCSE before LOOP, but heavily loop
oriented code such as the scientific numerical applications in SPECfp
prefer GCSE after LOOP.  Factor's affecting this include how much GCSE
can simplify a loop body, affecting the loop unrolling heuristics and
occassionally GCSE's ability to cprop a loop bound enabling loop
optimizations that otherwise wouldn't be possible.

Although the SPECfp results were promising, the SPECint degradations,
especially in 176.gcc which would affect compile times, were too severe.

The next step was to evaluate "Move GCSE" with both "Aggr" and "Impl".
Now that control flow optimizations didn't occur until after loop,
better branch optimizations might be able to recover the cost of the
move.  The results indicated that doing these steps after LOOP was
an improvement, but still not enough to reach their potential.  Prior
to the move the "Aggr" and "Impl" optimizations were so effective on
176.gcc, that we'd see drops in compiler bootstrap times.  i.e. the
cost of doing the extra work was completely recovered by the improved
compiler performance.  Moving GCSE "en masse" after LOOP was just
to expensive.

Moving the GCSE pass entirely is just to blunt an instrument.  Taking
a page from Muchnick's text book (infact the inside front-cover), he
recommends that GCSE be performed prior to loop optimizations, but
that control flow optimizations take place afterwards.  From GCC's
prespective, GCSE pass consists of several iterations of PRE and
cprop where jumps are not allowed to be modified, followed by a
final pass that allows constant propagation into jumps and the jump
bypassing optimization itself.  The next attempt was simply to move
this final stage of GCSE after loop.  i.e. have the main GCSE pass
prior to LOOP, but move the branch modifying transformations until
after loop.  This is listed as the "Move Bypass" column below.

Although another incremental improvement, it was still a surprise
that constant propagation into jumps has such a beneficial
effect on SPECint performance.  My most recent trial, is to add
a new ".bypass" jump bypassing pass the the compiler.  This just
duplicates the code that was moved in "Move Bypass".  Prior to
loop constant propagation is allowed by jump bypassing isn't,
and after loop both optimizations are allowed.  Unlike all earlier
patches, this duplication was the first to slow the compiler. The
net effect was a 1-2% increase in bootstrap times.  IMHO this is
an acceptable loss.

Duplicating constant propagation into jumps, and moving jump bypassing
after loop appears to be the best compromise (so far).  This produces
only minor loses on SPECint and but consistent improvements in SPECfp.

To confirm that this split now allows improvements in GCSE to not
adversely affect LOOP, the final column shows yesterday's results
of running "New Bypass" and "Aggr" together.  With this combination
SPECint has as many winners as losers, and SPECfp is universally better.


						Move
						GCSE			New
		Orig			Move	+Aggr	Move	New	Bypass
		Bypass	Aggr	Impl	GCSE	+Impl	Bypass	Bypass	+Aggr
164.gzip	+2	+3	+5	-2	-5	-3	+1	+3
175.vpr		+1	 0	+1	 0	 0	 0	 0	+1
176.gcc		+8	+2	+1	-2	-1	-1	-1	+1
181.mcf		+4	 0	-4	 0	 0	 0	 0	 0
186.crafty	-4	+4	 0	 0	-4	-9	 0	-3
197.parser	 0	 0	-1	 0	-1	+1	+1	+1
252.eon		???	-5	-23	-32	-27	-25	-18	-16
253.perlbmk	-6	+1	-1	-2	fail	-8	-6	-8
254.gap		+1	+20	-3	-2	fail	-1	+3	+3
255.vortex	 0	-11	-1	-2	fail	-3	 0	 0
256.bzip2	-1	-3	 0	+4	fail	-1	-1	-2
300.twolf	+1	-1	+1	+2	+5	+3	-1	 0
SPECint		+1	 0	-2	-2	???	-2	-1	-1
Summary		6/2/3	5/3/4	4/2/6	2/4/6	1/2/5	2/2/8	3/4/5	4/4/4

						Move
						GCSE			New
					Move	+Aggr	Move	New	Bypass
		Orig	Aggr	Impl	GCSE	+Impl	Bypass	Bypass	+Aggr
168.wupwise	 0	+1	 0	-4	-1	+7	+20	+7
171.swim	+10	-3	-1	+16	+2	+5	+5	+12
172.mgrid	 0	 0	 0	 0	+3	+1	+1	+1
173.applu	 0	 0	+1	+10	+11	+4	+3	+3
177.mesa	+1	-10	 0	-1	-4	-11	-9	+59?
179.art		 0	 0	 0	+5	+4	+4	+4	+4
183.equake	-1	 0	 0	+7	+6	+111?	+6	+6
188.ammp	 0	 0	 0	+2	+3	+1	+1	+1
200.sixtrack	???	???	???	???	+31	-2	-3	-3
301.apsi	 0	 0	 0	-1	-2	 0	+1	+1
SPECfp		+1	-1	 0	+4	+6	+13?	+3	+7?
Summary		2/6/1	1/6/2	1/7/1	5/1/3	7/0/3	7/1/2	8/1/2	9/0/1



I believe that I've now reached a reasonable point to submit patches
for the development/basic-improvements branch.  The most recent set
of patches avoids the known problems caused by jump bypassing and
overall improves GCC's results.  There are still problems, 252.eon!,
but I believe I've gone about as far as I can without seeing the
source of a routine that's experiencing problems.  Hopefully, putting
these patches in CVS, will provide broader exposure for the changes.
For example, "New Bypass" results in a 1.6% improvement in whetstone
on athlon-pc-cygwin.


I apologise to the non-x86 community that this investigation is
strongly IA-32 biased, but unfortunately besides Andreas, no-one
else offered the time or resources to evaluate SPEC2000 on other
targets.


I thought I'd share these results, and hopefully see whether the
conclusions I've reached are shared by others from the raw data.

If all goes well I'll submit the patch for "New Bypass" and resubmit
"Aggr", both against the gcc-3_4-basic-improvement-branch, to the
gcc-patches list over the weekend.  If all goes well, I'll also submit
"Impl" after further SPEC testing confirms that it too is now a win.

Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]