This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: IVOPT improvement patch


On Wed, Jul 21, 2010 at 02:27, Xinliang David Li <davidxl@google.com> wrote:
> The perf measurement was done on my Intel core-2 box with option -O2
> -ffast-math -mfpmath=sse
>
> 1. SPEC06
>
> m32
> ---------
>
> bwaves:     +14.7%
> calculix:   +12.8%
> wrf:        +5.7%
> GemsFDTD:   +3.8%
> cactusADM:  +3.6%
> leslie3d:   +3.0%
> povray:     +1.2%
> zeusmp:     +1.8%
> xalancbmk:  +1%
> mcf:        +5.3%
>
> a) I also verified the large improvements from bwaves and calculix on
> an opteron box -- they are reproducible
> b) There is more room for improvement that I did not pursue further --
> for instance, in the process of fixing perf regressions, I noticed the
> speedup of cactusADM can be up to +14%, wrf up to 9%, and dealII up to +8%.
>
> m64
> -------
> calculix:   +8.1%
> bwaves:     +2.1%
> povray:     +1.1%
> wrf:        +1.4%
> gromacs:    +1.0%
> xalancbmk:  +1.2%
> h264ref:    +1.4%
>
> SPEC06 degradations:
>
> gamess:  -6% (32bit and 64bit)
> bzip2:   -3% (32bit only)
>
> Investigation of the gamess degradation shows that the performance
> difference comes from the difference in IVOPT on the innermost loop
> (in a 3-deep loop nest) in function twotff_.  With the IVOPT patch,
> the inner loop has only 3 ivs and is tighter than the loop
> without the patch, in which 6 ivs are generated.  Profile data shows
> that the number of instructions retired is much lower with the
> IVOPT patch, while the unhalted CPU cycles increased on core-2.
> However, when running the program on an opteron box, the patched
> version is actually ~5% faster.
>
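
As background for the iv counts quoted above, here is a minimal sketch of
two ways a loop can be expressed in terms of induction variables.  It is a
hypothetical example, not the twotff_ loop from gamess, and the function
names are made up for illustration.  Both forms compute the same result;
which one the compiler ends up with depends on its cost model and the
target.

```c
#include <stddef.h>

/* Indexed form: one index variable i, with each access conceptually
   computing base + i * sizeof (double).  */
double
dot_indexed (const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Pointer form: two pointer induction variables replace the index,
   and the exit test compares a pointer against an end address.  */
double
dot_pointers (const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  const double *end = a + n;
  for (; a != end; a++, b++)
    sum += *a * *b;
  return sum;
}
```

Fewer ivs usually means lower register pressure and fewer update
instructions, but as the gamess numbers show, that does not always
translate into fewer cycles on a given microarchitecture.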

Here are the SPEC CPU2006 results on an AMD Phenom(tm) 9950 Quad-Core.

Old: GCC 4.6.0 revision 162423
New: GCC 4.6.0 revision 162423 + this patch.
Flags: -O3 -funroll-loops -fpeel-loops -ffast-math -march=native

Each number is the percentage reduction in run time:
(old - new) / old * 100 (positive is better)

400.perlbench	-2.14%
401.bzip2	0.33%
403.gcc	0.86%
429.mcf	6.06%
445.gobmk	3.20%
456.hmmer	1.14%
458.sjeng	0.70%
462.libquantum	-0.13%
464.h264ref	2.73%
471.omnetpp	0.69%
473.astar	0.12%
483.xalancbmk	-1.28%
410.bwaves	5.71%
416.gamess	3.10%
433.milc	0.29%
434.zeusmp	1.86%
435.gromacs	-0.18%
436.cactusADM	1.83%
437.leslie3d	-0.61%
444.namd	0.14%
447.dealII	0.81%
450.soplex	0.61%
453.povray	-3.37%
454.calculix	5.79%
459.GemsFDTD	0.74%
465.tonto	1.01%
470.lbm	0.35%
481.wrf	0.78%
482.sphinx3	0.00%

Overall, the patch looks like a good improvement on AMD processors as well.
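
For anyone reproducing the table, the run-time percentage can be computed
with a small helper along these lines.  This is only a sketch: the function
name is hypothetical, and any timings passed to it are the caller's own
measurements, not values from the runs above.

```c
/* Run-time improvement in percent: (old - new) / old * 100.
   A positive result means the new build ran faster.
   old_seconds must be nonzero.  */
double
runtime_improvement_pct (double old_seconds, double new_seconds)
{
  return (old_seconds - new_seconds) / old_seconds * 100.0;
}
```

With made-up timings of 100 s for the old build and 94 s for the new one,
the helper returns roughly 6, i.e. a 6% reduction in run time.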

> OK to check in the patch with the above performance impact? (I may
> find time to look at the regressions later, after the check-in.)
>

Note that your patch still contains formatting errors.  Please use this
script to check the patch and correct the warnings:
http://gcc.gnu.org/viewcvs/trunk/contrib/check_GNU_style.sh

Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

