This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: IVOPT improvement patch


Thanks, Sebastian, for testing it out. I also asked Pat to help test
the patch again on powerpc. I will first split off the unrelated
patches and submit them separately (e.g., multiple-exit loop handling).

David

On Mon, Jul 26, 2010 at 9:31 AM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Wed, Jul 21, 2010 at 02:27, Xinliang David Li <davidxl@google.com> wrote:
>> The perf measurement was done on my Intel Core 2 box with options -O2
>> -ffast-math -mfpmath=sse
>>
>> 1. SPEC06
>>
>> m32
>> ---------
>>
>> bwaves:     +14.7%
>> calculix:   +12.8%
>> wrf:        +5.7%
>> GemsFDTD:   +3.8%
>> cactusADM:  +3.6%
>> leslie3d:   +3.0%
>> povray:     +1.2%
>> zeusmp:     +1.8%
>> xalancbmk:  +1%
>> mcf:        +5.3%
>>
>> a) I also verified the large improvements from bwaves and calculix on
>> an Opteron box -- they are reproducible.
>> b) There is more room for improvement that I did not pursue further --
>> for instance, while fixing perf regressions, I noticed the speedup of
>> cactusADM can be up to +14%, wrf up to 9%, and dealII up to +8%.
>>
>> m64
>> -------
>> calculix:   +8.1%
>> bwaves:     +2.1%
>> povray:     +1.1%
>> wrf:        +1.4%
>> gromacs:    +1.0%
>> xalancbmk:  +1.2%
>> h264ref:    +1.4%
>>
>> SPEC06 degradations:
>>
>> gamess:  -6% (32bit and 64bit)
>> bzip2:   -3% (32bit only)
>>
>> Investigation of the gamess degradation shows that the performance
>> difference comes from the difference in IVOPT on the innermost loop
>> (in a 3-deep loop nest) in function twotff_.  With the IVOPT patch,
>> the inner loop has only 3 ivs and is tighter than the loop
>> without the patch, in which 6 ivs are generated.  Profile data shows
>> that the number of instructions retired dropped considerably with the
>> IVOPT patch, while the unhalted CPU cycles increased on Core 2.
>> However, when running the program on an Opteron box, the patched
>> version is actually ~5% faster.
>>
>
> Here are the CPU2k6 results on AMD Phenom(tm) 9950 Quad-Core.
>
> Old: GCC 4.6.0 revision 162423
> New: GCC 4.6.0 revision 162423 + this patch.
> Flags: -O3 -funroll-loops -fpeel-loops -ffast-math -march=native
>
> Each number is the percentage reduction in run time:
> (old - new) / old * 100 (positive is better)
>
> 400.perlbench    -2.14%
> 401.bzip2         0.33%
> 403.gcc           0.86%
> 429.mcf           6.06%
> 445.gobmk         3.20%
> 456.hmmer         1.14%
> 458.sjeng         0.70%
> 462.libquantum   -0.13%
> 464.h264ref       2.73%
> 471.omnetpp       0.69%
> 473.astar         0.12%
> 483.xalancbmk    -1.28%
> 410.bwaves        5.71%
> 416.gamess        3.10%
> 433.milc          0.29%
> 434.zeusmp        1.86%
> 435.gromacs      -0.18%
> 436.cactusADM     1.83%
> 437.leslie3d     -0.61%
> 444.namd          0.14%
> 447.dealII        0.81%
> 450.soplex        0.61%
> 453.povray       -3.37%
> 454.calculix      5.79%
> 459.GemsFDTD      0.74%
> 465.tonto         1.01%
> 470.lbm           0.35%
> 481.wrf           0.78%
> 482.sphinx3       0.00%
>
> Overall, it looks like a good improvement on AMD processors as well.
>
>> OK to check in the patch with the above performance impact? (I may
>> find time to look at the regressions later, after the check-in.)
>>
>
> Note that your patch still contains formatting errors.  Please use this
> script to check the patch and correct the warnings:
> http://gcc.gnu.org/viewcvs/trunk/contrib/check_GNU_style.sh
>
> Thanks,
> Sebastian Pop
> --
> AMD / Open Source Compiler Engineering / GNU Tools
>
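
For reference, the run-time percentage defined in the quoted results,
(old - new) / old * 100, could be computed as in the sketch below. This
is only an illustration: the function name and the sample run times are
hypothetical and are not taken from the measurements in this thread.

```python
def runtime_improvement_percent(old_seconds: float, new_seconds: float) -> float:
    """Return (old - new) / old * 100.

    A positive result means the new build ran faster than the old one;
    a negative result means it ran slower.
    """
    if old_seconds <= 0:
        raise ValueError("old run time must be positive")
    return (old_seconds - new_seconds) / old_seconds * 100.0


# Hypothetical run times in seconds (assumed values, not SPEC results).
print(f"{runtime_improvement_percent(500.0, 470.0):.2f}%")  # new build is faster
print(f"{runtime_improvement_percent(500.0, 510.0):.2f}%")  # new build is slower
```

With this convention a regression shows up as a negative number, which
matches the "positive is better" note in the quoted table.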

