This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: branches compared (lja_speed, EV56)


On Mon, Apr 19, 2004 at 01:50:57PM +0200, Kurt Garloff wrote:
> lja_speed benchmarks

Update:

lja_speed benchmarks on a quad Opteron (1400MHz, 2GB); times in seconds
(average of the best two out of three runs).

			-fno-	-fnew-ra FDO	 FDO	
			new-ra			new-ra	
gcc-3.2.2 (SLES8)	 5.56	 4.32+	 5.52	 4.15+
gcc-334 CVS 20040424	 5.55	 7.85-	 5.49	 7.66-
gcc-333 hammer  0424	 5.51	 ICE!!	 5.22	 ICE!!
gcc-340 newra   0424	 6.26-	 4.27+	 6.95-	 4.46+
gcc-3.4.0 		 5.59	 7.69-	 5.82-	 9.21-
gcc-350 CVS 20040424	 4.59+	 7.05	 5.22	 8.74
gcc-350 treessa 0424	 3.94+	 5.87	 3.43+	 4.05+
gcc-350 ssa-lno 0424	 4.98	 7.61-	 4.23+	 4.96
For comparison, 32-bit:
gcc-3.2.2 (SLES8) -m32	 6.11	 4.51	 6.02	 4.49

Options:
gcc: -Wall -O2 -ffast-math -fomit-frame-pointer -fschedule-insns2 -O3 
	-frerun-loop-opt  -funroll-loops -fstrict-aliasing
FDO:    -fprofile-arcs, run, -fbranch-probabilities (gcc-3.2/3.3/newra)
	-fprofile-generate, run, -fprofile-use (gcc-3.4/3.5/ssa/lno)

Compile time (seconds, user) and text sizes (stripped):
			-fno-	-fnew-ra FDO	 FDO	
			new-ra			new-ra	
gcc-3.2.2 (SLES8)	0.28	0.29	0.27	0.31
			 6601	 6201	 6585	 6169
gcc-334 CVS 20040424	0.29	0.37	0.30	0.36
			 7141 	 8389	 7059	 8255
gcc-333 hammer  0424	0.32 	ICE!	0.29	ICE!	
			 7145		 5874
gcc-340 newra   0424	0.57	0.79	0.32	0.41
			 9810	 9746	 5544	 5729
gcc-3.4.0 		0.48	0.56	0.29	0.35
			 9381	10261	 5764	 6795
gcc-350 CVS 20040424	0.51	0.60	0.36	0.41
			 9471	 9855	 6287	 7183
gcc-350 treessa 0424	0.80	0.91	0.71	0.79
			 6861	 7325	 5501	 5901	
gcc-350 ssa-lno 0424	0.55	0.64	0.42	0.46
			 8797	 9261	 5837	 6186

Notes:
* -fnew-ra performs very well on 3.2.2-SuSE and on the new-ra
  branch; on most others it hurts.
* hammer branch has completely broken -fnew-ra.
* tree-ssa branch yields the best results, even better with FDO.
* FDO does save considerable compile time on newer versions, but
  it seems to save too much at the expense of optimization on 3.4.0,
  3.4 newra, and 3.5 mainline; they all lose compared to non-FDO
  optimization. tree-ssa and tree-ssa-lno do win with FDO.
* lno always loses against tree-ssa, and against mainline unless
  FDO is used.
* 32-bit is slower than 64-bit.
* If neither FDO nor new-ra is used, 3.2.2, 334 CVS, 333-hammer, 
  and 3.4.0 are all about the same speed. 3.5.0 improves on that.

Same benchmark on EV56 (DEC21164A), 600MHz, 768MB, linked with libffm:

			-fno-	-fnew-ra FDO	 FDO	
			new-ra			new-ra	
gcc-3.2.2 (SL81)	14.13	 9.47+	13.99	 9.35+
gcc-3.2.3		14.07	 -	 -	 -
gcc-334 CVS 20040409	13.90	 9.32+	13.77	 9.17+
gcc-333 hammer  0409	16.47-	13.74	16.48-	13.75-
gcc-34 newra    0409	17.19-	14.04-	ERR!!	ERR!!
gcc-3.4.0		12.79+	11.25	ERR!!	ERR!!
gcc-350 CVS 20040409	13.00	11.21	ERR!!	ERR!!
gcc-35 tree-ssa 0416	11.93+	11.82	14.76-	13.73-
gcc-35 ssa-lno  0409	13.07	15.03-	13.11	12.89

For comparison, linked with libcpml:
gcc-3.2.2 (SL8.1)	13.88	 9.28
gcc-334 CVS 20040409	13.85	 9.26	13.76	 9.16
ccc-6.5.9		 8.62

Options:
gcc: -Wall -O2 -ffast-math -fomit-frame-pointer -fschedule-insns2 -O3 \
	-frerun-loop-opt -mcpu=ev56 -funroll-loops -fstrict-aliasing
FDO: See above
ccc: -w0 -msg_display_tag -O2 -accept restrict_keyword -D__USE_STD_IOSTREAM \
	-fast -tune ev56 -arch ev56 -O4 -inline speed 

Compile time (seconds, user) and text sizes (stripped):
			-fno-	-fnew-ra FDO	 FDO	
			new-ra			new-ra	
gcc-3.2.2 (SL81)	1.96	2.19	2.02	2.21
			11456	11392	11392	11328
gcc-3.2.3		2.03
			11445
gcc-334 CVS 20040409	2.16	2.33	2.15	2.31
			11777	11393	11897	11337
gcc-333 hammer  0409	1.88	2.04	1.95	2.09
			 9960	 9800	10260	10004
gcc-34 new-ra   0409	2.01	2.48
			 9912	 9808
gcc-3.4.0		2.04	2.19
			11653	11565
gcc-350 CVS 20040409	3.01	3.32
			13415	13135
gcc-35 tree-ssa 0416	5.14	5.40	4.60	4.93
			11358	11334	 9822	9670
gcc-35 ssa-lno  0409	3.23	3.49	2.57	2.75
			12542	12398	 9806	9782
ccc-6.5.9		2.83
			11427

Notes:
* The effect of -fnew-ra is larger on AXP than on x86-64. Although AXP
  has more registers, the higher register demand of a RISC architecture
  seems to be the cause.
* As on x86-64, 3.2.2-SuSE does well with -fnew-ra. Unlike on x86-64,
  -fnew-ra also works well with 334 CVS, but not with the new-ra branch.
* Early 3.4 performed badly on AXP, and we see this on both the hammer
  branch and the new-ra branch.
* With the old register allocator, you can see improvements
  from 3.2 -> 3.3 -> 3.4 -> ssa.
  Both 3.5 and ssa-lno are a bit behind; the hammer and new-ra
  branches suck.
* The results vary a lot on this platform, which is bad news. The good
  news is that we get quite close to ccc if the right options are
  specified and the right compiler is used.
* tree-ssa does well again. If only -fnew-ra and FDO helped it the
  same way they help the old 3.2.2-SuSE!

Regards,
-- 
Kurt Garloff                   <kurt@garloff.de>             [Koeln, DE]
Physics:Plasma modeling <garloff@plasimo.phys.tue.nl> [TU Eindhoven, NL]
Linux: SUSE Labs (Head)        <garloff@suse.de>    [SUSE Nuernberg, DE]

Attachment: lja_speed.c
Description: Text document
