This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Sorry for the delay on this patch. I rewrote patch-3 (handling of pseudo invariants). The new implementation uses the cost_pair to store the invariant id, and it also tracks common invariants (those that can be CSEed) so that the register pressure increase is not over-counted. There is also more tuning of the heuristics that determine whether an invariant expression can be created (mainly driven by SPEC performance and bug fixes, such as fixing regressions in hmmer, sixtrack and tonto).

The patch has gone through a lot of performance testing with spec06 and spec2k. I have fixed many performance regressions, but some regressions still exist, possibly caused by uArch-related issues (see below).

The perf measurement was done on my Intel core-2 box with options -O2 -ffast-math -mfpmath=sse

1. SPEC06

m32
---------
bwaves: +14.7%
calculix: +12.8%
wrf: +5.7%
GemsFDTD: +3.8%
cactusADM: +3.6%
leslie3d: +3.0%
povray: +1.2%
zeusmp: +1.8%
xalancbmk: +1%
mcf: +5.3%

a) I also verified the large improvements in bwaves and calculix on an opteron box -- they are reproducible.
b) There is more room for improvement that I did not pursue further -- for instance, in the process of fixing perf regressions, I noticed that the speedup of cactusADM can be up to +14%, wrf up to +9%, and dealII up to +8%.

m64
-------
calculix: +8.1%
bwaves: +2.1%
povray: +1.1%
wrf: +1.4%
gromacs: +1.0%
xalancbmk: +1.2%
h264ref: +1.4%

SPEC06 degradations:

gamess: -6% (32bit and 64bit)
bzip2: -3% (32bit only)

Investigation of the gamess degradation shows that the performance difference comes from a difference in IVOPT on the innermost loop (in a 3-deep loop nest) in function twotff_. With the IVOPT patch, the inner loop has only 3 ivs and is tighter than the loop without the patch, in which 6 ivs are generated. Profile data shows that the number of instructions retired dropped substantially with the IVOPT patch, while the unhalted CPU cycles increased on core-2. However, when running the program on an opteron box, the patched version is actually ~5% faster.

2. SPEC2k

m32
------
perlbmk: +7.8%
bzip2: +1.4%
mgrid: +2.7%
mesa: +2.2%
facerec: +2.5%
apsi: +2.7%
gap: +2.0%

m64
------
gzip: 2.0%
perlbmk: 2.1%
wupwise: 8.0%
mgrid: 2.6%
applu: 2.5%

Degradations:

applu: -2.5% (m32)
mesa: -2.3% (m64)

OK to check in the patch with the above performance impact? (I may find time to look at the regressions later, after the checkin.)

Thanks,

David

On Fri, Jun 4, 2010 at 3:54 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> patch-1 ok for this revision?
>
> yes, modulo the standard formalities (missing changelog, information about
> testing).  Also, for the final submission, please split off the trivial
> changes (formatting, comments, new debug dumps, ...) to a separate patch.  Furthermore,
> the avg. # of iterations part and the iv. elimination changes should be
> separate patches (this will make it easier to find the source of the problems,
> should any arise later),
>
> Zdenek
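
To illustrate the point in the first paragraph about not over-counting register pressure when several uses share a common invariant, here is a minimal sketch. It is not the code from the attached patch: the struct name `cost_pair_sketch`, the function `count_distinct_invariants`, the bound `MAX_INV_ID`, and the convention that id 0 means "no invariant" are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on invariant ids, for this sketch only. */
#define MAX_INV_ID 64

/* Hypothetical stand-in for a cost pair that records which invariant
   expression (if any) the chosen candidate needs.  Id 0 means "none". */
struct cost_pair_sketch {
    int cost;
    int inv_expr_id;
};

/* Count how many distinct invariant expressions the selected pairs need.
   An invariant shared by several pairs can be CSEed, so it is counted
   once rather than once per use.  Ids outside 1..MAX_INV_ID are ignored. */
static int count_distinct_invariants(const struct cost_pair_sketch *pairs,
                                     size_t n)
{
    bool seen[MAX_INV_ID + 1] = { false };
    int distinct = 0;

    for (size_t i = 0; i < n; i++) {
        int id = pairs[i].inv_expr_id;
        if (id <= 0 || id > MAX_INV_ID)
            continue;
        if (!seen[id]) {
            seen[id] = true;
            distinct++;
        }
    }
    return distinct;
}
```

With four pairs whose ids are 1, 1, 0 and 2, the function returns 2, so a register pressure estimate built on it would charge for two extra registers rather than three.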
Attachment:
ivopts_latest7.p
Description: Binary data