This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: IVOPT improvement patch
- From: Sebastian Pop <sebpop at gmail dot com>
- To: Xinliang David Li <davidxl at google dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>
- Date: Mon, 26 Jul 2010 11:31:16 -0500
- Subject: Re: IVOPT improvement patch
- References: <20100525235926.GA3326@kam.mff.cuni.cz> <AANLkTikGZ-LQGCIlr4K1GIhILtRsTI2MNCPMP5cazPkG@mail.gmail.com> <20100527075632.GA12991@kam.mff.cuni.cz> <AANLkTimY4icwv57CMowcV7UtOXGU2Yqfmg5Z0nwsRSDe@mail.gmail.com> <20100528085052.GA3423@kam.mff.cuni.cz> <AANLkTikNxYZLO3egDnljShqbfFydKrZU70xw8aG6Mfbo@mail.gmail.com> <20100529152243.GA18706@kam.mff.cuni.cz> <AANLkTinXyF9GhrcZxjohiP3ypcfpqgicv4KnegvYWwYR@mail.gmail.com> <20100529191446.GA3996@kam.mff.cuni.cz> <AANLkTimJoc5EeFxvTo731fpE4rGYdLgbOI5l0q9GbG33@mail.gmail.com> <20100604105451.GB5105@kam.mff.cuni.cz> <AANLkTi=iM78LxLm=2R7QxObC0NE0ZkG3YatCnwXHaSpM@mail.gmail.com>
On Wed, Jul 21, 2010 at 02:27, Xinliang David Li <davidxl@google.com> wrote:
> The perf measurement was done on my Intel Core 2 box with options -O2
> -ffast-math -mfpmath=sse
>
> 1. SPEC06
>
> m32
> ---------
>
> bwaves:     +14.7%
> calculix:   +12.8%
> wrf:         +5.7%
> GemsFDTD:    +3.8%
> cactusADM:   +3.6%
> leslie3d:    +3.0%
> povray:      +1.2%
> zeusmp:      +1.8%
> xalancbmk:   +1%
> mcf:         +5.3%
>
> a) I also verified the large improvements for bwaves and calculix on
> an Opteron box; they are reproducible.
> b) There is more room for improvement that I did not pursue further. For
> instance, while fixing perf regressions, I noticed that the speedup of
> cactusADM can be up to +14%, wrf up to 9%, and dealII up to +8%.
>
> m64
> -------
> calculix:   +8.1%
> bwaves:     +2.1%
> povray:     +1.1%
> wrf:        +1.4%
> gromacs:    +1.0%
> xalancbmk:  +1.2%
> h264ref:    +1.4%
>
> SPEC06 degradations:
>
> gamess:  -6% (32bit and 64bit)
> bzip2:   -3% (32bit only)
>
> Investigation of the gamess degradation shows that the performance
> difference comes from how IVOPT treats the innermost loop
> (in a 3-deep loop nest) in function twotff_. With the IVOPT patch,
> the inner loop has only 3 ivs and is tighter than the loop
> without the patch, in which 6 ivs are generated. Profile data shows
> that the number of instructions retired dropped considerably with the
> IVOPT patch, while the unhalted CPU cycles increased on Core 2.
> However, when running the program on an Opteron box, the patched
> version is actually ~5% faster.
>
Here are the CPU2k6 results on AMD Phenom(tm) 9950 Quad-Core.
Old: Gcc 4.6.0 revision 162423
New: Gcc 4.6.0 revision 162423 + this patch.
Flags: -O3 -funroll-loops -fpeel-loops -ffast-math -march=native
Each number is the run-time change in percent: (old - new) / old * 100
(positive is better, i.e. the patched compiler produced faster code).
400.perlbench -2.14%
401.bzip2 0.33%
403.gcc 0.86%
429.mcf 6.06%
445.gobmk 3.20%
456.hmmer 1.14%
458.sjeng 0.70%
462.libquantum -0.13%
464.h264ref 2.73%
471.omnetpp 0.69%
473.astar 0.12%
483.xalancbmk -1.28%
410.bwaves 5.71%
416.gamess 3.10%
433.milc 0.29%
434.zeusmp 1.86%
435.gromacs -0.18%
436.cactusADM 1.83%
437.leslie3d -0.61%
444.namd 0.14%
447.dealII 0.81%
450.soplex 0.61%
453.povray -3.37%
454.calculix 5.79%
459.GemsFDTD 0.74%
465.tonto 1.01%
470.lbm 0.35%
481.wrf 0.78%
482.sphinx3 0.00%
Overall, it looks like a good improvement on AMD processors as well.
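
For anyone who wants to reproduce the percentages, here is a minimal
Python sketch of the formula given above, (old - new) / old * 100. The
function name and the sample run times are hypothetical and only
illustrate the sign convention; they are not taken from the measurements
in this mail.

```python
def runtime_improvement(old_seconds: float, new_seconds: float) -> float:
    """Return (old - new) / old * 100; positive means the new build ran faster."""
    if old_seconds <= 0:
        raise ValueError("old run time must be positive")
    return (old_seconds - new_seconds) / old_seconds * 100.0


# Hypothetical run times in seconds (illustration only, not measured data).
print(f"{runtime_improvement(500.0, 470.0):.2f}%")  # new build is faster: 6.00%
print(f"{runtime_improvement(500.0, 510.0):.2f}%")  # new build is slower: -2.00%
```

Since the change is normalized by the old run time, a result of 0.00%
means the run time did not change between the two compilers.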
> Ok to checkin the patch with the above performance impact ? (I may
> find time to look at the regressions later after the checkin).
>
Note that your patch still contains formatting errors. Please use this
script to check the patch and correct the warnings:
http://gcc.gnu.org/viewcvs/trunk/contrib/check_GNU_style.sh
Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools