[committed] bump branch_cost to 5 for Athlon/K8

Mon Sep 27 17:06:00 GMT 2004

Hi,
while looking for the reason for recent slowdowns in the gzip benchmark
I noticed that we no longer ifconvert:

int f(int);

int
q(int a)
{
	int limit = a>1000?a-1000:0;
	return f(limit);
}

directly.  This changed between 3.2 and 3.3, when we changed ifcvt to use
costs based on rtx_cost.  Branch cost is set to 2, so we only ifconvert if both
the if and the else edge consist of a trivial set instruction.
In 3.4 we still succeeded in doing the if-conversion by first matching IF-CASE
1, which lifts the else edge out and so makes the code appear cheaper to a
subsequent ifcvt pass.  Now we manage to confuse ourselves and no longer
cascade ifcvt: by the time ifcvt is executed after combine, the code no longer
looks so easy (and if we are going to match this, we ought to do that before
combine, of course).
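
To make it concrete what if-conversion does to the testcase above, here is a
source-level sketch of the two shapes.  The names q_branch and q_select and the
body of f are made up for illustration; the real transformation happens on RTL,
not on C source, and whether the select ends up as a conditional move depends
on the target.

```c
/* Hypothetical stand-in for the external function called by the testcase. */
static int f(int limit) { return limit; }

/* Subtract 1000 with wrap-around, as a machine subtract would, so that
   computing it speculatively never triggers C signed-overflow UB. */
static int sub1000(int a) { return (int)((unsigned)a - 1000u); }

/* Branchy shape: one conditional jump, each arm is a single set. */
int q_branch(int a)
{
	int limit;

	if (a > 1000)
		limit = sub1000(a);
	else
		limit = 0;
	return f(limit);
}

/* If-converted shape: both arms are computed unconditionally and the
   result is selected afterwards, so no jump is needed. */
int q_select(int a)
{
	int then_val = sub1000(a);	/* executed even when a <= 1000 */
	int else_val = 0;
	int limit = (a > 1000) ? then_val : else_val;

	return f(limit);
}
```

Both functions return the same value for every input; the second one trades
the branch for an extra subtract that is always executed, which is the
trade-off the branch cost is meant to price.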

I experimented with bumping this value up and, unlike in the original tests in
the 3.1 period, I now get consistent improvements up to a value of 5.  (My
recollection is that in 3.1 times higher values of the branch cost didn't
improve or degrade performance much in my initial tests, and I chose 2 just
because it produced the smallest code size.)  The code size is also best with
the changed default, so I am going to commit the change as obvious.  It would
be interesting to do similar testing on Pentium3/4 targets too.

The gzip and gcc regressions relative to 3.4 are not fully solved yet, however.
We still execute about 10% more branch instructions on SPEC benchmarks, partly
because of lost ifcvt opportunities.  For the gzip/gcc case I found two
instances of this testcase:
int f(int, int);

int
q(int a)
{
	int limit = a>1000?a-1000:0;
	return f(limit,a-1000);
}
that is no longer convertible after PRE inserts the computation of a-1000 in
the then edge of the conditional.  Another failure is due to the old loop
optimizer moving code out of a loop right into the middle of an if/then
construct, so we no longer have a fallthru.
The last failure I found is due to dominator opts increasing register pressure
noticeably.  Disabling these three optimizers brings performance back to 3.4
scores.
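
The PRE case can be sketched at the source level as follows.  This is only an
approximation of what happens on RTL: the function names, the body of f and the
temporary t are made up, and which arm receives the inserted computation
depends on how the branch is laid out.

```c
/* Hypothetical stand-in for the external two-argument function. */
static int f(int limit, int delta) { return limit + 2 * delta; }

/* Subtract 1000 with wrap-around, avoiding C signed-overflow UB. */
static int sub1000(int a) { return (int)((unsigned)a - 1000u); }

/* The testcase as written: a-1000 is computed on one path of the
   conditional and then again after the join. */
int q_source(int a)
{
	int limit = a > 1000 ? sub1000(a) : 0;

	return f(limit, sub1000(a));
}

/* Shape after a PRE-like transformation: the second a-1000 is removed
   by making the value available on both paths.  Each arm now sets two
   values, so it is no longer a single trivial set. */
int q_after_pre(int a)
{
	int limit, t;

	if (a > 1000) {
		t = sub1000(a);
		limit = t;
	} else {
		t = sub1000(a);
		limit = 0;
	}
	return f(limit, t);
}

/* Shape that would still be easy to if-convert: compute a-1000 once
   up front, then select between it and 0. */
int q_wanted(int a)
{
	int t = sub1000(a);
	int limit = (a > 1000) ? t : 0;

	return f(limit, t);
}
```

All three functions compute the same result; only the middle one has the
extra work sitting inside the arms of the conditional.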

Honza

64bit:

Size of binaries:
 164.gzip: Base: 62779 bytes
 164.gzip: Peak: 62779 bytes
 175.vpr: Base: 169165 bytes
 175.vpr: Peak: 169101 bytes
 176.gcc: Base: 1628744 bytes
 176.gcc: Peak: 1627432 bytes
 181.mcf: Base: 25563 bytes
 181.mcf: Peak: 25595 bytes
 186.crafty: Base: 226304 bytes
 186.crafty: Peak: 225920 bytes
 197.parser: Base: 147740 bytes
 197.parser: Peak: 147740 bytes
 252.eon: Base: 591362 bytes
 252.eon: Peak: 591362 bytes
 253.perlbmk: Base: 680088 bytes
 253.perlbmk: Peak: 679992 bytes
 254.gap: Base: 548526 bytes
 254.gap: Peak: 547950 bytes
 255.vortex: Base: 664532 bytes
 255.vortex: Peak: 664660 bytes
 256.bzip2: Base: 53926 bytes
 256.bzip2: Peak: 53926 bytes
 300.twolf: Base: 229950 bytes
 300.twolf: Peak: 229214 bytes
 =============================
 Total: Base: 5028679 bytes
 Total: Peak: 5025671 bytes

                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   164.gzip          1400     193         725*     1400     182         771*
   175.vpr           1400     187         748*     1400     188         746*
   176.gcc           1100     116         951*     1100     115         957*
   181.mcf           1800     441         408*     1800     439         410*
   186.crafty        1000      74.0      1351*     1000      75.0      1334*
   197.parser        1800     327         551*     1800     327         551*
   252.eon           1300     103        1267*     1300     101        1283*
   253.perlbmk       1800     178        1009*     1800     178        1012*
   254.gap           1100     144         765*     1100     144         766*
   255.vortex        1900     169        1125*     1900     163        1165*
   256.bzip2         1500     195         768*     1500     196         766*
   300.twolf         3000     350         857*     3000     367         826*
   Est. SPECint_base2000                  834
   Est. SPECint2000                                                     839

32bit:

Size of binaries:
 164.gzip: Base: 52199 bytes
 164.gzip: Peak: 52103 bytes
 175.vpr: Base: 156320 bytes
 175.vpr: Peak: 156032 bytes
 176.gcc: Base: 1438922 bytes
 176.gcc: Peak: 1436618 bytes
 181.mcf: Base: 21683 bytes
 181.mcf: Peak: 21683 bytes
 186.crafty: Base: 260398 bytes
 186.crafty: Peak: 260462 bytes
 197.parser: Base: 121552 bytes
 197.parser: Peak: 121552 bytes
 252.eon: Base: 526578 bytes
 252.eon: Peak: 526546 bytes
 253.perlbmk: Base: 580068 bytes
 253.perlbmk: Peak: 579748 bytes
 254.gap: Base: 482536 bytes
 254.gap: Peak: 482536 bytes
 255.vortex: Base: 649596 bytes
 255.vortex: Peak: 649596 bytes
 256.bzip2: Base: 45092 bytes
 256.bzip2: Peak: 45092 bytes
 300.twolf: Base: 226288 bytes
 300.twolf: Peak: 226288 bytes
 =============================
 Total: Base: 4561232 bytes
 Total: Peak: 4558256 bytes

                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   164.gzip          1400       210       666*     1400       202       693*
   175.vpr           1400       197       710*     1400       199       704*
   176.gcc           1100       130       843*     1100       131       840*
   181.mcf           1800       342       526*     1800       340       530*
   186.crafty        1000       113       887*     1000       113       885*
   197.parser        1800       291       619*     1800       290       620*
   252.eon           1300       136       956*     1300       136       953*
   253.perlbmk       1800       187       962*     1800       189       955*
   254.gap           1100       143       771*     1100       144       766*
   255.vortex        1900       179      1062*     1900       178      1068*
   256.bzip2         1500       231       648*     1500       234       642*
   300.twolf         3000       316       948*     3000       314       954*
   Est. SPECint_base2000                  783
   Est. SPECint2000                                                     784
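
For reading the tables, the relative difference between the two ratio columns
can be computed with a one-line helper.  The function name pct_change is made
up; the input figures in the note below are copied from the tables above.

```c
/* Relative difference of the peak ratio over the base ratio, in percent. */
double pct_change(double base, double peak)
{
	return (peak - base) / base * 100.0;
}
```

With the numbers above, pct_change(725, 771) for 64bit gzip is about 6.3,
pct_change(834, 839) for the 64bit SPECint estimate is about 0.6, and
pct_change(783, 784) for the 32bit estimate is about 0.1.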

Bootstrapped/regtested on x86_64-linux; going to commit it as obvious.
2004-09-27  Jan Hubicka  <jh@suse.cz>

	* i386.c (athlon_cost, k8_cost): Set BRANCH_COST to 5.

Index: config/i386/i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.730
diff -c -3 -p -r1.730 i386.c
*** config/i386/i386.c	23 Sep 2004 14:34:24 -0000	1.730
--- config/i386/i386.c	26 Sep 2004 22:28:54 -0000
*************** struct processor_costs athlon_cost = {
*** 362,368 ****
    5,					/* MMX or SSE register to integer */
    64,					/* size of prefetch block */
    6,					/* number of parallel prefetches */
!   2,					/* Branch cost */
    4,					/* cost of FADD and FSUB insns.  */
    4,					/* cost of FMUL instruction.  */
    24,					/* cost of FDIV instruction.  */
--- 362,368 ----
    5,					/* MMX or SSE register to integer */
    64,					/* size of prefetch block */
    6,					/* number of parallel prefetches */
!   5,					/* Branch cost */
    4,					/* cost of FADD and FSUB insns.  */
    4,					/* cost of FMUL instruction.  */
    24,					/* cost of FDIV instruction.  */
*************** struct processor_costs k8_cost = {
*** 406,412 ****
    5,					/* MMX or SSE register to integer */
    64,					/* size of prefetch block */
    6,					/* number of parallel prefetches */
!   2,					/* Branch cost */
    4,					/* cost of FADD and FSUB insns.  */
    4,					/* cost of FMUL instruction.  */
    19,					/* cost of FDIV instruction.  */
--- 406,412 ----
    5,					/* MMX or SSE register to integer */
    64,					/* size of prefetch block */
    6,					/* number of parallel prefetches */
!   5,					/* Branch cost */
    4,					/* cost of FADD and FSUB insns.  */
    4,					/* cost of FMUL instruction.  */
    19,					/* cost of FDIV instruction.  */