[committed] bump branch_cost to 5 for Athlon/K8
Jan Hubicka
jh@suse.cz
Mon Sep 27 17:06:00 GMT 2004
Hi,
while looking for the reason for recent slowdowns in the gzip benchmark
I noticed that we no longer ifconvert:
int f(int);

int q(int a)
{
  int limit = a > 1000 ? a - 1000 : 0;
  return f(limit);
}
directly. This changed between 3.2 and 3.3, when we switched ifcvt to
costs based on rtx_cost. Branch cost is set to 2, so we only ifconvert when
both the then and else edges are trivial set instructions.
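For readers less familiar with the transformation, here is a minimal sketch of what if-converting the testcase amounts to, written back in C. It is only an illustration: f() is a hypothetical stand-in for the external function, the names q_branchy and q_converted are made up, and the mask-based select stands in for whatever conditional-move sequence the backend would actually emit.

```c
/* Hypothetical stand-in for the external f() used by the testcase. */
static int f(int limit)
{
    return limit;
}

/* Branchy form, equivalent to the testcase as written. */
int q_branchy(int a)
{
    int limit;

    if (a > 1000)
        limit = a - 1000;
    else
        limit = 0;
    return f(limit);
}

/* If-converted form: the "then" value is computed unconditionally and
   the result is selected without a branch.  */
int q_converted(int a)
{
    /* Unsigned arithmetic, so the speculated subtraction cannot
       overflow in C even when a <= 1000.  */
    unsigned int then_val = (unsigned int) a - 1000u;
    /* All ones when a > 1000, zero otherwise.  */
    unsigned int mask = 0u - (unsigned int) (a > 1000);

    return f((int) (then_val & mask));
}
```

Both functions return the same value for every input; the converted one pays for the subtraction on both paths, which is why the decision depends on how expensive a branch is assumed to be.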
In 3.4 we still succeeded in doing the ifconversion by first matching IF-CASE
1, which lifts the else edge out and so makes the code appear cheaper to the
subsequent ifcvt pass. Now we manage to confuse ourselves and do not cascade
ifcvt: by the time ifcvt is executed after combine, the code no longer looks
so easy (and if we are going to match this, we ought to do that before
combine, of course).
I experimented with bumping this value up and, unlike in the original tests
from the 3.1 period, I now get consistent improvements up to a value of 5. (My
recollection is that in 3.1 times higher values of the branch cost didn't
improve or degrade performance much in my initial tests, and I chose 2 just
because it produced the smallest code size.) Code size is also best with the
changed default, so I am going to commit the change as obvious. It would be
interesting to do similar testing on Pentium3/4 targets too.
The gzip and gcc regressions relative to 3.4 are not fully solved yet, however.
We still execute about 10% more branch instructions on SPEC benchmarks, partly
because of lost ifcvt opportunities. For the gzip/gcc case I found two
instances of this testcase:
int f(int, int);

int q(int a)
{
  int limit = a > 1000 ? a - 1000 : 0;
  return f(limit, a - 1000);
}
that is no longer convertible after PRE inserts the computation of a-1000 in
the then edge of the conditional. Another failure is due to the old loop
optimizer moving code out of a loop right into the middle of an if/then
construct, so we no longer have a fallthru.
The last failure I found is due to dominator opts noticeably increasing
register pressure. Disabling these three optimizers brings performance back to
3.4 scores.
Honza
64bit:
Size of binaries:
164.gzip: Base: 62779 bytes
164.gzip: Peak: 62779 bytes
175.vpr: Base: 169165 bytes
175.vpr: Peak: 169101 bytes
176.gcc: Base: 1628744 bytes
176.gcc: Peak: 1627432 bytes
181.mcf: Base: 25563 bytes
181.mcf: Peak: 25595 bytes
186.crafty: Base: 226304 bytes
186.crafty: Peak: 225920 bytes
197.parser: Base: 147740 bytes
197.parser: Peak: 147740 bytes
252.eon: Base: 591362 bytes
252.eon: Peak: 591362 bytes
253.perlbmk: Base: 680088 bytes
253.perlbmk: Peak: 679992 bytes
254.gap: Base: 548526 bytes
254.gap: Peak: 547950 bytes
255.vortex: Base: 664532 bytes
255.vortex: Peak: 664660 bytes
256.bzip2: Base: 53926 bytes
256.bzip2: Peak: 53926 bytes
300.twolf: Base: 229950 bytes
300.twolf: Peak: 229214 bytes
=============================
Total: Base: 5028679 bytes
Total: Peak: 5025671 bytes
Benchmark     Base ref (s)  Base run (s)  Base ratio  Peak ref (s)  Peak run (s)  Peak ratio
164.gzip              1400           193        725*          1400           182        771*
175.vpr               1400           187        748*          1400           188        746*
176.gcc               1100           116        951*          1100           115        957*
181.mcf               1800           441        408*          1800           439        410*
186.crafty            1000          74.0       1351*          1000          75.0       1334*
197.parser            1800           327        551*          1800           327        551*
252.eon               1300           103       1267*          1300           101       1283*
253.perlbmk           1800           178       1009*          1800           178       1012*
254.gap               1100           144        765*          1100           144        766*
255.vortex            1900           169       1125*          1900           163       1165*
256.bzip2             1500           195        768*          1500           196        766*
300.twolf             3000           350        857*          3000           367        826*
Est. SPECint_base2000 834
Est. SPECint2000 839
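As a reading aid for the result rows, the sketch below recomputes the numbers under the usual SPEC CPU2000 definitions, which I am assuming apply here: a per-benchmark ratio is 100 times the reference time divided by the run time, and the overall score is the geometric mean of the ratios. The function names and the base64_ratios array are made up for this example, and the geometric mean is found by bisection only to keep the code free of libm. The run times in the report are rounded, so recomputed ratios can differ from the printed ones by a point or two.

```c
#include <stddef.h>

/* Ratio for one benchmark: 100 * reference time / measured run time. */
double spec_ratio(double ref_seconds, double run_seconds)
{
    return 100.0 * ref_seconds / run_seconds;
}

/* Geometric mean of n positive values (n >= 1), found by bisection:
   the product of v[i] / g is above 1 exactly when g is below the
   geometric mean.  */
double geometric_mean(const double *v, size_t n)
{
    double lo = v[0], hi = v[0];
    size_t i;
    int iter;

    /* The geometric mean lies between the smallest and largest value. */
    for (i = 1; i < n; i++) {
        if (v[i] < lo)
            lo = v[i];
        if (v[i] > hi)
            hi = v[i];
    }

    for (iter = 0; iter < 100; iter++) {
        double mid = 0.5 * (lo + hi);
        double r = 1.0;

        for (i = 0; i < n; i++)
            r *= v[i] / mid;
        if (r > 1.0)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Base ratios from the 64bit results, in benchmark order. */
static const double base64_ratios[12] = {
    725, 748, 951, 408, 1351, 551, 1267, 1009, 765, 1125, 768, 857
};
```

Calling geometric_mean(base64_ratios, 12) gives a value that rounds to the reported base estimate of 834, and spec_ratio(1000, 74.0) reproduces the 1351 shown for 186.crafty.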
32bit:
Size of binaries:
164.gzip: Base: 52199 bytes
164.gzip: Peak: 52103 bytes
175.vpr: Base: 156320 bytes
175.vpr: Peak: 156032 bytes
176.gcc: Base: 1438922 bytes
176.gcc: Peak: 1436618 bytes
181.mcf: Base: 21683 bytes
181.mcf: Peak: 21683 bytes
186.crafty: Base: 260398 bytes
186.crafty: Peak: 260462 bytes
197.parser: Base: 121552 bytes
197.parser: Peak: 121552 bytes
252.eon: Base: 526578 bytes
252.eon: Peak: 526546 bytes
253.perlbmk: Base: 580068 bytes
253.perlbmk: Peak: 579748 bytes
254.gap: Base: 482536 bytes
254.gap: Peak: 482536 bytes
255.vortex: Base: 649596 bytes
255.vortex: Peak: 649596 bytes
256.bzip2: Base: 45092 bytes
256.bzip2: Peak: 45092 bytes
300.twolf: Base: 226288 bytes
300.twolf: Peak: 226288 bytes
=============================
Total: Base: 4561232 bytes
Total: Peak: 4558256 bytes
Benchmark     Base ref (s)  Base run (s)  Base ratio  Peak ref (s)  Peak run (s)  Peak ratio
164.gzip              1400           210        666*          1400           202        693*
175.vpr               1400           197        710*          1400           199        704*
176.gcc               1100           130        843*          1100           131        840*
181.mcf               1800           342        526*          1800           340        530*
186.crafty            1000           113        887*          1000           113        885*
197.parser            1800           291        619*          1800           290        620*
252.eon               1300           136        956*          1300           136        953*
253.perlbmk           1800           187        962*          1800           189        955*
254.gap               1100           143        771*          1100           144        766*
255.vortex            1900           179       1062*          1900           178       1068*
256.bzip2             1500           231        648*          1500           234        642*
300.twolf             3000           316        948*          3000           314        954*
Est. SPECint_base2000 783
Est. SPECint2000 784
Bootstrapped/regtested on x86_64-linux; going to commit it as obvious.
2004-09-27 Jan Hubicka <jh@suse.cz>
* i386.c (athlon_cost, k8_cost): Set BRANCH_COST to 5.
Index: config/i386/i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.730
diff -c -3 -p -r1.730 i386.c
*** config/i386/i386.c 23 Sep 2004 14:34:24 -0000 1.730
--- config/i386/i386.c 26 Sep 2004 22:28:54 -0000
*************** struct processor_costs athlon_cost = {
*** 362,368 ****
5, /* MMX or SSE register to integer */
64, /* size of prefetch block */
6, /* number of parallel prefetches */
! 2, /* Branch cost */
4, /* cost of FADD and FSUB insns. */
4, /* cost of FMUL instruction. */
24, /* cost of FDIV instruction. */
--- 362,368 ----
5, /* MMX or SSE register to integer */
64, /* size of prefetch block */
6, /* number of parallel prefetches */
! 5, /* Branch cost */
4, /* cost of FADD and FSUB insns. */
4, /* cost of FMUL instruction. */
24, /* cost of FDIV instruction. */
*************** struct processor_costs k8_cost = {
*** 406,412 ****
5, /* MMX or SSE register to integer */
64, /* size of prefetch block */
6, /* number of parallel prefetches */
! 2, /* Branch cost */
4, /* cost of FADD and FSUB insns. */
4, /* cost of FMUL instruction. */
19, /* cost of FDIV instruction. */
--- 406,412 ----
5, /* MMX or SSE register to integer */
64, /* size of prefetch block */
6, /* number of parallel prefetches */
! 5, /* Branch cost */
4, /* cost of FADD and FSUB insns. */
4, /* cost of FMUL instruction. */
19, /* cost of FDIV instruction. */
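To make the effect of the changed table entry concrete, here is a toy model of how a per-processor branch cost can act as a budget for if-conversion. Everything in it is an assumption made for illustration: the struct, the TOY_COSTS_N_INSNS scale and the comparison in toy_worth_converting are hypothetical and much simpler than the real struct processor_costs and the checks in ifcvt.

```c
/* Toy model only: hypothetical names, not the layout of GCC's
   struct processor_costs.  */
struct toy_costs {
    const char *name;
    int branch_cost;    /* relative cost of a branch, in instructions */
};

/* Branch cost before and after the patch above. */
static const struct toy_costs toy_k8_old = { "k8 (before)", 2 };
static const struct toy_costs toy_k8_new = { "k8 (after)", 5 };

/* Assumed cost scale: one simple instruction costs 4 units. */
#define TOY_COSTS_N_INSNS(n) ((n) * 4)

/* Assumed rule: executing an arm unconditionally is acceptable when
   its estimated cost does not exceed the branch budget.  */
int toy_worth_converting(const struct toy_costs *c, int arm_cost)
{
    return arm_cost <= TOY_COSTS_N_INSNS(c->branch_cost);
}
```

Under this toy rule an arm estimated at three instructions is rejected with a branch cost of 2 and accepted with a branch cost of 5, which is the direction of the change the patch is after.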