This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
On Mon, Apr 19, 2004 at 01:50:57PM +0200, Kurt Garloff wrote:
> lja_speed benchmarks

Update:

lja_speed benchmarks on a quad Opteron (1400MHz), 2GB, times in seconds (average from two best out of three).

| Compiler | -fno-new-ra | -fnew-ra | FDO | FDO new-ra |
|---|---|---|---|---|
| gcc-3.2.2 (SLES8) | 5.56 | 4.32+ | 5.52 | 4.15+ |
| gcc-334 CVS 20040424 | 5.55 | 7.85- | 5.49 | 7.66- |
| gcc-333 hammer 0424 | 5.51 | ICE!! | 5.22 | ICE!! |
| gcc-340 newra 0424 | 6.26- | 4.27+ | 6.95- | 4.46+ |
| gcc-3.4.0 | 5.59 | 7.69- | 5.82- | 9.21- |
| gcc-350 CVS 20040424 | 4.59+ | 7.05 | 5.22 | 8.74 |
| gcc-350 treessa 0424 | 3.94+ | 5.87 | 3.43+ | 4.05+ |
| gcc-350 ssa-lno 0424 | 4.98 | 7.61- | 4.23+ | 4.96 |
| For comparison, 32bit: gcc-3.2.2 (SLES8) -m32 | 6.11 | 4.51 | 6.02 | 4.49 |

Options:

gcc: -Wall -O2 -ffast-math -fomit-frame-pointer -fschedule-insns2 -O3 -frerun-loop-opt -funroll-loops -fstrict-aliasing

FDO: -fprofile-arcs, run, -fbranch-probabilities (gcc-3.2/3.3/newra); -fprofile-generate, run, -fprofile-use (gcc-3.4/3.5/ssa/lno)

Compile time (seconds, user) and text sizes (stripped):

| Compiler | Measure | -fno-new-ra | -fnew-ra | FDO | FDO new-ra |
|---|---|---|---|---|---|
| gcc-3.2.2 (SLES8) | time | 0.28 | 0.29 | 0.27 | 0.31 |
| | text size | 6601 | 6201 | 6585 | 6169 |
| gcc-334 CVS 20040424 | time | 0.29 | 0.37 | 0.30 | 0.36 |
| | text size | 7141 | 8389 | 7059 | 8255 |
| gcc-333 hammer 0424 | time | 0.32 | ICE! | 0.29 | ICE! |
| | text size | 7145 | | 5874 | |
| gcc-340 newra 0424 | time | 0.57 | 0.79 | 0.32 | 0.41 |
| | text size | 9810 | 9746 | 5544 | 5729 |
| gcc-3.4.0 | time | 0.48 | 0.56 | 0.29 | 0.35 |
| | text size | 9381 | 10261 | 5764 | 6795 |
| gcc-350 CVS 20040424 | time | 0.51 | 0.60 | 0.36 | 0.41 |
| | text size | 9471 | 9855 | 6287 | 7183 |
| gcc-350 treessa 0424 | time | 0.80 | 0.91 | 0.71 | 0.79 |
| | text size | 6861 | 7325 | 5501 | 5901 |
| gcc-350 ssa-lno 0424 | time | 0.55 | 0.64 | 0.42 | 0.46 |
| | text size | 8797 | 9261 | 5837 | 6186 |

Notes:

* -fnew-ra performs very well on 3.2.2-SuSE and on the new-ra branch; on most others it hurts.
* hammer branch has completely broken -fnew-ra.
* tree-ssa branch yields the best results, even better with FDO.
* FDO does save considerable compile time on newer versions; but it seems to save too much at the cost of optimization on 3.4.0, 3.4 newra, and 3.5 mainline; they all lose compared to non-FDO optimization. tree-ssa and tree-ssa-lno do win with FDO.
* lno loses against tree-ssa always and against mainline unless FDO is used.
* 32bit is slower than 64.
* If neither FDO nor new-ra is used, 3.2.2, 334 CVS, 333-hammer, and 3.4.0 are all about the same speed. 3.5.0 improves on that.

Same benchmark on EV56 (DEC21164A), 600MHz, 768MB, linked with libffm

| Compiler | -fno-new-ra | -fnew-ra | FDO | FDO new-ra |
|---|---|---|---|---|
| gcc-3.2.2 (SL81) | 14.13 | 9.47+ | 13.99 | 9.35+ |
| gcc-3.2.3 | 14.07 | - | - | - |
| gcc-334 CVS 20040409 | 13.90 | 9.32+ | 13.77 | 9.17+ |
| gcc-333 hammer 0409 | 16.47- | 13.74 | 16.48- | 13.75- |
| gcc-34 newra 0409 | 17.19- | 14.04- | ERR!! | ERR!! |
| gcc-3.4.0 | 12.79+ | 11.25 | ERR!! | ERR!! |
| gcc-350 CVS 20040409 | 13.00 | 11.21 | ERR!! | ERR!! |
| gcc-35 tree-ssa 0416 | 11.93+ | 11.82 | 14.76- | 13.73- |
| gcc-35 ssa-lno 0409 | 13.07 | 15.03- | 13.11 | 12.89 |

For comparison: Linked with libcpml

| Compiler | -fno-new-ra | -fnew-ra | FDO | FDO new-ra |
|---|---|---|---|---|
| gcc-3.2.2 (SL8.1) | 13.88 | 9.28 | | |
| gcc-334 CVS 20040409 | 13.85 | 9.26 | 13.76 | 9.16 |

ccc-6.5.9: 8.62

Options:

gcc: -Wall -O2 -ffast-math -fomit-frame-pointer -fschedule-insns2 -O3 -frerun-loop-opt -mcpu=ev56 -funroll-loops -fstrict-aliasing

FDO: See above

ccc: -w0 -msg_display_tag -O2 -accept restrict_keyword -D__USE_STD_IOSTREAM -fast -tune ev56 -arch ev56 -O4 -inline speed

Compile time (seconds, user) and text sizes (stripped):

| Compiler | Measure | -fno-new-ra | -fnew-ra | FDO | FDO new-ra |
|---|---|---|---|---|---|
| gcc-3.2.2 (SL81) | time | 1.96 | 2.19 | 2.02 | 2.21 |
| | text size | 11456 | 11392 | 11392 | 11328 |
| gcc-3.2.3 | time | 2.03 | | | |
| | text size | 11445 | | | |
| gcc-334 CVS 20040409 | time | 2.16 | 2.33 | 2.15 | 2.31 |
| | text size | 11777 | 11393 | 11897 | 11337 |
| gcc-333 hammer 0409 | time | 1.88 | 2.04 | 1.95 | 2.09 |
| | text size | 9960 | 9800 | 10260 | 10004 |
| gcc-34 new-ra 0409 | time | 2.01 | 2.48 | | |
| | text size | 9912 | 9808 | | |
| gcc-3.4.0 | time | 2.04 | 2.19 | | |
| | text size | 11653 | 11565 | | |
| gcc-350 CVS 20040409 | time | 3.01 | 3.32 | | |
| | text size | 13415 | 13135 | | |
| gcc-35 tree-ssa 0416 | time | 5.14 | 5.40 | 4.60 | 4.93 |
| | text size | 11358 | 11334 | 9822 | 9670 |
| gcc-35 ssa-lno 0409 | time | 3.23 | 3.49 | 2.57 | 2.75 |
| | text size | 12542 | 12398 | 9806 | 9782 |

ccc-6.5.9: compile time 2.83, text size 11427

Notes:

* The effect of -fnew-ra is larger on AXP than on x86-64. Despite more registers, the higher demand on a RISC arch seems to be causing this.
* Like x86-64, 3.2.2-SuSE does well with -fnew-ra. Unlike x86-64, it works well with 334-CVS but not with new-ra branch.
* Early 3.4 had been performing badly on AXP and we see this both on hammer branch and new-ra branch.
* With the old register allocator, you can see improvements from 3.2 -> 3.3 -> 3.4 -> ssa. Both 3.5 and ssa-lno are a bit behind; hammer branch and new-ra branches suck.
* The results vary a lot on this platform, which is bad news. The good news is that we get quite close to ccc, if the right options are specified and the right compiler is used.
* tree-ssa does well again. If only -fnew-ra and FDO would help it the same way as old 3.2.2-SuSE!

Regards,
--
Kurt Garloff <kurt@garloff.de> [Koeln, DE]
Physics:Plasma modeling <garloff@plasimo.phys.tue.nl> [TU Eindhoven, NL]
Linux: SUSE Labs (Head) <garloff@suse.de> [SUSE Nuernberg, DE]
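For readers who want to reproduce the arithmetic behind the run-time figures, the rule stated in the post (average of the two best out of three runs) can be sketched as follows. This is only an illustration, not the script used for the benchmark: the function names `mean_of_two_best` and `speedup` and the raw timings in `runs` are hypothetical, while the 5.56 s and 4.32 s values are the reported Opteron results for gcc-3.2.2 (SLES8).

```python
def mean_of_two_best(times):
    """Average the two fastest of exactly three timings (seconds)."""
    if len(times) != 3:
        raise ValueError("expected exactly three timings")
    best_two = sorted(times)[:2]  # smaller time means faster run
    return sum(best_two) / 2


def speedup(baseline, candidate):
    """Ratio above 1 means the candidate configuration ran faster."""
    return baseline / candidate


if __name__ == "__main__":
    # Hypothetical raw timings for one configuration (not from the post).
    runs = [5.60, 5.52, 5.71]
    print(f"reported time: {mean_of_two_best(runs):.2f} s")

    # Reported Opteron figures for gcc-3.2.2 (SLES8):
    # 5.56 s without -fnew-ra, 4.32 s with it.
    print(f"-fnew-ra speedup: {speedup(5.56, 4.32):.2f}x")
```

Dropping the slowest run reduces the influence of a single disturbed measurement, which may be why the rule was chosen; the post itself does not say.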
Attachment:
lja_speed.c
Description: Text document
Attachment:
pgp00000.pgp
Description: PGP signature