This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
> I did some measurement (64bit).
>
> Experiment 1:
>
> -O2 -funroll-loops vs -O2
>
> It improves performance (geomean) by 0.56%, not too much:
> Benchmark     -O2    -O2 -funroll-loops  Change
> 164.gzip      1324   1331                 0.56%
> 175.vpr       1694   1605                -5.24%
> 176.gcc       2293   2350                 2.47%
> 181.mcf       1772   1788                 0.90%
> 186.crafty    2320   2326                 0.26%
> 197.parser    1166   1162                -0.32%
> 252.eon       2443   2529                 3.50%
> 253.perlbmk   2410   2460                 2.07%
> 254.gap       1987   2019                 1.58%
> 255.vortex    2392   2406                 0.58%
> 256.bzip2     1719   1715                -0.25%
> 300.twolf     2288   2308                 0.88%
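For anyone who wants to cross-check the quoted geomean, here is a minimal sketch in C. It takes the per-benchmark score ratios from Experiment 1 and computes their geometric mean, which should come out close to the quoted 0.56%. The function names geomean and unroll_gain_percent are made up for this illustration, and the n-th root is found with Newton's iteration only so that the sketch does not depend on libm.

```c
#include <stddef.h>

/* Geometric mean of n positive ratios: the n-th root of their product.
   Newton's iteration is used for the root so the sketch needs no libm. */
double geomean(const double *ratios, size_t n)
{
    if (n == 0)
        return 1.0;                 /* convention for an empty set (assumption) */

    double product = 1.0;
    for (size_t i = 0; i < n; i++)
        product *= ratios[i];

    double x = 1.0;                 /* the ratios are close to 1, so start there */
    for (int iter = 0; iter < 100; iter++) {
        double xpow = 1.0;          /* x^(n-1) */
        for (size_t i = 0; i + 1 < n; i++)
            xpow *= x;
        /* Newton step for f(x) = x^n - product */
        x = ((double)(n - 1) * x + product / xpow) / (double)n;
    }
    return x;
}

/* Geomean gain in percent for Experiment 1, using the quoted scores
   (hypothetical helper, not part of any benchmark harness). */
double unroll_gain_percent(void)
{
    static const double o2[12] = {
        1324, 1694, 2293, 1772, 2320, 1166,
        2443, 2410, 1987, 2392, 1719, 2288
    };
    static const double unrolled[12] = {
        1331, 1605, 2350, 1788, 2326, 1162,
        2529, 2460, 2019, 2406, 1715, 2308
    };
    double ratios[12];

    for (size_t i = 0; i < 12; i++)
        ratios[i] = unrolled[i] / o2[i];
    return (geomean(ratios, 12) - 1.0) * 100.0;
}
```

The geomean weights every benchmark equally, so the one large regression (175.vpr) cancels most of the gains on 176.gcc, 252.eon and 253.perlbmk.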
Can you also try -funroll-all-loops? For fairly small programs like
SPEC2000, -funroll-all-loops is often a win, because we can work out the
number of iterations for only a few loops.
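To make the distinction concrete, here is a small sketch in C. The three functions are hypothetical and not taken from SPEC2000. As documented, -funroll-loops only unrolls loops whose iteration count can be determined at compile time or on entry to the loop, while -funroll-all-loops also unrolls loops like the last one, where the count is not known on entry.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Iteration count is the compile-time constant 8:
   a candidate for -funroll-loops. */
int sum8(const int *a)
{
    int s = 0;
    for (int i = 0; i < 8; i++)
        s += a[i];
    return s;
}

/* Iteration count n is not a constant, but it is known on entry
   to the loop: still a candidate for -funroll-loops. */
int sum_n(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Iteration count depends on the data being walked and cannot be
   computed on entry: only -funroll-all-loops considers this loop. */
int sum_list(const struct node *head)
{
    int s = 0;
    for (; head != NULL; head = head->next)
        s += head->value;
    return s;
}
```

Whether unrolling the third kind of loop helps depends on the program; the GCC manual notes that -funroll-all-loops usually makes programs run more slowly, so it is worth measuring.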
>
> Experiment 3: O2 lto vs O2: geomean 0.72%
> Benchmark     -O2    -O2 -flto  Change
> 164.gzip      1324   1317       -0.53%
> 175.vpr       1694   1697        0.18%
> 176.gcc       2293   2291       -0.08%
> 181.mcf       1772   1760       -0.65%
> 186.crafty    2320   2245       -3.26%
> 197.parser    1166   1163       -0.29%
> 252.eon       2443   2576        5.44%
> 253.perlbmk   2410   2433        0.93%
> 254.gap       1987   1995        0.36%
> 255.vortex    2392   2588        8.19%
> 256.bzip2     1719   1729        0.56%
> 300.twolf     2288   2248       -1.77%
You need -O3 -fwhole-program -flto for reasonable cross-module inlining to happen.
-fwhole-program is quite essential to get a reasonable win from LTO (without profile feedback).
At least our nightly tester then gets quite nice improvements on a few SPEC2000 benchmarks;
see also my GCC Summit slides.
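As a rough illustration of why -fwhole-program matters, consider the sketch below. The function names are hypothetical, and the two functions are assumed to live in different source files of a program whose main() is defined elsewhere. Without -fwhole-program, square has external linkage, so the compiler has to assume unknown callers may exist and keep an out-of-line copy. With -fwhole-program, every public function except main (and those marked with the externally_visible attribute) is treated as static, so the inliner can fold square into its only caller and drop the standalone copy.

```c
/* Imagine this function in its own file, e.g. a hypothetical util.c.
   It is public, so by default it must be kept for unknown callers. */
int square(int x)
{
    return x * x;
}

/* Imagine this function in a second file, e.g. a hypothetical sum.c.
   With -O3 -flto -fwhole-program the call to square() can be inlined
   across the module boundary. */
int sum_of_squares(int n)
{
    int s = 0;
    for (int i = 1; i <= n; i++)
        s += square(i);
    return s;
}
```

A build line for such a program would use the flags named above, for example: gcc -O3 -flto -fwhole-program util.c sum.c main.c (file names are placeholders).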
Honza