This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: IVOPT improvement patch


Sorry for the delay on this patch.

I rewrote patch-3 (handling of pseudo invariants). The new
implementation uses the cost_pair to store the invariant id, and it
also tracks common invariants (those that can be CSEed) so that the
register pressure increase is not overcounted. There is also more
tuning of the heuristics that determine whether an invariant
expression can be created (mainly driven by SPEC performance and bug
fixes, such as fixing regressions in hmmer, sixtrack and tonto).
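
To illustrate the idea of not overcounting common invariants, here is
a minimal sketch. It is not code from the patch: the struct name
cost_pair_sketch, the field inv_expr_id, the MAX_INV_ID cap and the
function count_distinct_invariants are all hypothetical, and the
sketch assumes that an id of 0 means "no invariant needed" and that
pairs sharing an id end up sharing one register after CSE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on invariant ids, for this sketch only.  */
#define MAX_INV_ID 64

/* Hypothetical stand-in for a (use, candidate) cost pair.
   inv_expr_id is 0 when the pair needs no pseudo invariant.  */
struct cost_pair_sketch
{
  int cost;
  int inv_expr_id;
};

/* Count the distinct invariant ids required by the selected pairs.
   Pairs that share an id are assumed to be CSEed into one register,
   so each id contributes to register pressure only once.  */
unsigned
count_distinct_invariants (const struct cost_pair_sketch *pairs, size_t n)
{
  bool seen[MAX_INV_ID + 1] = { false };
  unsigned count = 0;

  for (size_t i = 0; i < n; i++)
    {
      int id = pairs[i].inv_expr_id;

      /* Skip pairs with no invariant or an out-of-range id.  */
      if (id <= 0 || id > MAX_INV_ID)
        continue;

      if (!seen[id])
        {
          seen[id] = true;
          count++;
        }
    }

  return count;
}
```

With this kind of bookkeeping, two uses that need the same invariant
expression add one register to the pressure estimate rather than two.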

The patch has gone through a lot of performance testing with spec06 and
spec2k. I have fixed many performance regressions, but some regressions
remain, possibly caused by uArch-related issues (see below).

The perf measurements were done on my Intel Core 2 box with the options
-O2 -ffast-math -mfpmath=sse.

1. SPEC06

m32
---------

bwaves:     +14.7%
calculix:   +12.8%
wrf:        +5.7%
GemsFDTD:   +3.8%
cactusADM:  +3.6%
leslie3d:   +3.0%
povray:     +1.2%
zeusmp:     +1.8%
xalancbmk:  +1%
mcf:        +5.3%

a) I also verified the large improvements in bwaves and calculix on an
Opteron box -- they are reproducible.
b) There is more room for improvement that I did not pursue further -- for
instance, in the process of fixing perf regressions, I noticed that the
speedup of cactusADM can be up to +14%, wrf up to 9%, and dealII up to +8%.

m64
-------
calculix:   +8.1%
bwaves:     +2.1%
povray:     +1.1%
wrf:        +1.4%
gromacs:    +1.0%
xalancbmk:  +1.2%
h264ref:    +1.4%

SPEC06 degradations:

gamess:   -6% (32bit and 64bit)
bzip2:    -3% (32bit only)

Investigation of the gamess degradation shows that the performance
difference comes from how IVOPT handles the innermost loop (in a
3-deep loop nest) in function twotff_. With the IVOPT patch, the
inner loop has only 3 ivs and is tighter than the loop without the
patch, in which 6 ivs are generated. Profile data shows that the
number of instructions retired is reduced a lot with the IVOPT patch,
while the unhalted CPU cycles increase on Core 2. However, when
running the program on an Opteron box, the patched version is
actually ~5% faster.


2. SPEC2k

m32
------

perlbmk:  +7.8%
bzip2:    +1.4%
mgrid:    +2.7%
mesa:     +2.2%
facerec:  +2.5%
apsi:     +2.7%
gap:      +2.0%

m64
------
gzip:     +2.0%
perlbmk:  +2.1%
wupwise:  +8.0%
mgrid:    +2.6%
applu:    +2.5%

Degradations:

applu: -2.5% (m32)
mesa: -2.3% (m64)

OK to check in the patch with the above performance impact? (I may
find time to look at the regressions later, after the check-in.)

Thanks,

David


On Fri, Jun 4, 2010 at 3:54 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> patch-1 ok for this revision?
>
> yes, modulo the standard formalities (missing changelog, information about
> testing). Also, for the final submission, please split off the trivial
> changes (formatting, comments, new debug dumps, ...) to a separate patch. Furthermore,
> the avg. # of iterations part and the iv. elimination changes should be
> separate patches (this will make it easier to find the source of the problems,
> should any arise later),
>
> Zdenek
>

Attachment: ivopts_latest7.p
Description: Binary data

