This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Some GCC 4.1 benchmarks (Re: Thoughts on LLVM and LTO)
>
> > Which is why i said "It's fine to say compile time performance of the
> > middle end portions we may replace should be same or better".
> >
> > And if you were to look right now, it's actually significantly better in
> > some cases :(
>
> Can you prove this assertion?
>
> Here is some data:
> http://people.redhat.com/dnovillo/spec2000.i686/gcc/global-build-secs_elapsed.html
>
> And some more
> http://llvm.cs.uiuc.edu/testresults/X86/2005-11-01.html
>
> I'm not sure about accuracy, or versions of LLVM used, etc.
>
> Although promising on some things (as Diego said), LLVM execute and
> compile performance is a mixed bag.
>
> It would probably be interesting to run SPEC or something else with icc
> IPO enabled, LLVM IPO enabled, and whatever gcc IMA support is
> available, to do a true comparison of where things stand. More data
> would be interesting.
I might try to produce somewhat more useful charts, but I've recently done
some testing of GCC 4.1 on SPEC and on some C++ testcases, mostly
looking for regressions in the GCC 4.1 release. I didn't test LLVM, but
I did some ICC comparison and tested both with and without our current
IMA, so it gives a rough idea.
I should note that the comparison to ICC is not quite fair, since ICC lacks
tuning for the Opteron I tested on, but I would say that we are in the same
performance camp on SPECint with IMA (IMA contributes 3.3% to the result)
despite the fact that GCC's IMA and IPA are very primitive. This may
just be proof that SPECint is not the best testcase for evaluating future
IPA implementations. I also collected some C++ results, which are a lot
wilder. It would be really interesting to see how much benefit one can get
when compiling a full-blown application, and how large a codebase one can
hope to compile with LTO (i.e. GCC/kernel/mozilla/OOo/... ;).
I am not quite sure how much of the SPECfp loss can be attributed to IMA,
since I would expect more of it to come from Fortran tuning. The only
regressing C benchmark is ART, which indeed needs whole-program
optimization to allow data structure layout changes. Obviously we made
some notable progress on Fortran performance between 4.0 and 4.1, and
none of that is IPA related.
I am also adding some scores for C++ testcases: tramp3d, which is a single
file, and Gerald's application, which I didn't actually manage to merge
into a single file, but I combined the files that appear hot in coverage.
Concerning compile time at -O2: the hammer branch needs 185s, 4.0 192s, and
4.1 205s. With IPA and no FDO, 4.0 needs 193s when patched with Andrew's
faster type-merging patch, and 4.1 needs 218s. I didn't record ICC
compilation times, but this clearly shows that we are again making
compile-time problems worse overall with 4.1. It also shows that IPA is
cheap right now, but only because it is so primitive. It is also cheap only
as long as you fit in memory (you need over 512MB of memory to build SPEC
with IMA on GCC, which is far from acceptable).
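To put the compile-time numbers above in relative terms, here is a minimal Python sketch that turns the quoted wall-clock seconds into percentage slowdowns. The helper name `pct_slower` and the dictionary layout are only my illustration, not part of any benchmark harness; the timings are the ones quoted above.

```python
# Wall-clock seconds to build SPEC, as quoted in the message above.
O2_SECONDS = {"3.3-hammer": 185, "4.0": 192, "4.1": 205}
IPA_SECONDS = {"4.0": 193, "4.1": 218}  # with IPA, no FDO


def pct_slower(baseline: float, measured: float) -> float:
    """Return how much slower `measured` is than `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


if __name__ == "__main__":
    for version in ("4.0", "4.1"):
        vs_hammer = pct_slower(O2_SECONDS["3.3-hammer"], O2_SECONDS[version])
        ipa_cost = pct_slower(O2_SECONDS[version], IPA_SECONDS[version])
        print(f"GCC {version}: {vs_hammer:.1f}% slower than hammer at -O2, "
              f"IPA adds {ipa_cost:.1f}%")
```

With these figures, 4.1 comes out roughly 11% slower than the hammer branch at -O2, and IPA adds about 6% on top of that for 4.1.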
Also note that eon and the Fortran files are not compiled with IMA in the
GCC tests.
-O2, no IMA on both compilers:
           GCC-3.3-hammer    GCC 4.0    GCC 4.1    ICC-9.0
gzip                 1162       1181       1199       1151
vpr                   859        853        824        854
gcc                  1057       1035       1028        963
mcf                   540        540        541        543
crafty               2100       2041       2025       2106
parser                776        790        783        778
eon                  1793       1874       1952  (failed, substituted as 783 for geomavg)
perlbmk              1407       1453       1438       1503
gap                  1095       1152       1156       1071
vortex               1689       1663       1666       1618
bzip2                1009       1011       1000        997
twolf                 843        858        852        823
geomavg            1114.8    1124.95    1122.76       1102
           GCC-3.3-hammer    GCC 4.0    GCC 4.1    ICC-9.0
wupwise              1218       1079       1304       1278
swim                 1038       1065       1070       1064
mgrid                 784        728        906        909
applu                 772        822        840        884
mesa                 1536       1609       1536       1486
galgel                  -        803        830          -
art                   730        739        735        747
equake               1102       1085       1069       1055
facerec                 -        905        914       1393
ammp                  967        993       1008        985
lucas                   -       1106       1113       1264
fma3d                   -        976        978       1154
sixtrac               582        591        618        647
apsi                  810        922       1004        948
geomavg                 -        933        971       1016
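The geomavg rows are geometric means of the per-benchmark scores. As a rough cross-check, the sketch below recomputes the figure for the GCC 4.1 SPECint column at -O2 without IMA. The function name `geomean` is my own choice for illustration; the scores are copied from the SPECint table above.

```python
import math

# GCC 4.1 SPECint scores at -O2, no IMA, copied from the table above
# (gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf).
GCC41_SPECINT = [1199, 824, 1028, 541, 2025, 783, 1952, 1438, 1156, 1666, 1000, 852]


def geomean(scores):
    """Geometric mean, computed via logarithms to avoid overflow on long lists."""
    if not scores:
        raise ValueError("need at least one score")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


if __name__ == "__main__":
    print(f"geomavg = {geomean(GCC41_SPECINT):.2f}")
```

This should land close to the 1122.76 reported in the table; any small difference would come from rounding of the individual scores.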
-O2 -static --combine -fwhole-program -fipa-cp
versus ICC -xW -O3 -ipo -vec_report3
profile feedback is used on both compilers.
           GCC-3.3-hammer    GCC 4.0    GCC 4.1    ICC-9.0
gzip                 1269       1299       1264       1337
vpr                   890        864        885        869
gcc                  1112       1095       1175       1023
mcf                   539        536        538        546
crafty               2055       2034       2236       2301
parser                960        975        993        851
eon                  2081       1928       2192       2150
perlbmk              1621       1574       1697       1652
gap                  1117       1181       1223       1224
vortex               1683       2038       2173       2421
bzip2                1058       1022       1085       1087
twolf                 842        877        877        849
geomavg           1183.41    1195.84    1251.55    1232.97
           GCC-3.3-hammer    GCC 4.0    GCC 4.1    ICC-9.0
wupwise                 -       1305       1401       1678
swim                    -       1065       1293       1360
mgrid                   -        758        884        973
applu                   -        857        918       1060
mesa                 1756       1751       1756       1759
galgel                  -        818        848       1790
art                   724        734        735       1414
equake               1088       1101       1108       1308
facerec                 -        974       1110       1467
ammp                 1008       1034       1063        967
lucas                   -       1111       1104       1261
fma3d                   -        976       1215       1238
sixtrac                 -        643        702        653
apsi                    -        940        988        958
geomavg                 -     973.82    1049.12    1234.02
Tramp3d, iterations per second with and without FDO:
GCC 3.3-hammer     0.36
GCC 4.0            0.45
GCC 4.1            0.56
GCC 4.1 flatten    0.62
GCC 4.1 profile    0.07
GCC 4.1 FDO        0.81
GCC 4.1 profile    0.08
4.1 FDO flatten    0.89
ICC 9.0            0.14
DLV, speedup in percent relative to the GCC 3.3 hammer branch:
GCC 4.0 GCC 4.1 GCC-4.1 profile ICC 9.0
STRATCOMP1-ALL 284 287.1 242.86 18.52
STRATCOMP-770.2-6.25 0 13.33 -10.53
2QBF1 -5.47 -5.87 6.83 -15.23
PRIMEIMPL2 3.09 5.26 12.36 -23.95
3COL-SIMPLEX1 -1.78 -7.78 2.47 9.21
3COL-RANDOM1 -3.88 -0.84 0.21 -20.84
HP-RANDOM1 -26.72 -13.83 -12.45 -9.94
HAMCYCLE-FREE -1.89 -3.7 0 -17.46
DECOMP2 -6.84 -12.2 -12.35 -11.27
BW-P5-nopush -6.29 -4.07 -2.75 -5.98
BW-P5-pushbin -5.28 -1.95 -0.4 -13.75
BW-P5-nopushbin -6.49 -2.7 0 -8.86
HANOI-Towers -6.79 -2.58 0 -21.35
RAMSEY 5.41 -3.7 9.86 -5.65
CRISTAL -17.21 -20.12 -13.53 -8.91
21-QUEENS -1.71 -2.55 4.24 -34.48
MSTDir[V=13] 2.06 0.2 6 -31.72
MSTDir[V=15] 1.84 1.01 6.87 -32.15
MSTUndir[V=13] -4.08 -4.08 2.92 -29.5
TIMETABLING 2.65 0.74 7.97 -31.91
AVG 2.71 2.6 7.74 -16.31