This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Strange conditional jumps on the POWER4


On Sat, 5 Apr 2003, Zack Weinberg wrote:
> Roger Sayle <roger at www dot eyesopen dot com> writes:
> > I've come across some very strange behaviour when benchmarking
> > GCC on an AIX power4 box.
>
> It would help immensely if you would post assembly language snippets.

My apologies.  As I mentioned previously, I wasn't sure whether this
was a register allocation issue, i.e. whether my timing loops affect
the slowdown.  However, here are some snippets:

#include <stdio.h>
#include <time.h>

int foo(int j)
{
  clock_t t1, t2;
  unsigned int i;

  t1 = clock();
  for (i=0; i<100000000; i++)
  {
    if (j == 1) j = 0;
    else        j = 1;
  }
  t2 = clock();
  /* clock_t need not be int, so cast explicitly for printf */
  printf("ticks = %ld\n", (long)((t2 - t1) / 1000));
  return j;
}

int main(void)
{
  foo(0);
  return 0;
}


This generates the following loop body with both gcc-3.4 "-O2" and
gcc-3.2 "-O2" (600 ticks):

L..10:
        xori 0,31,1
        addic 9,0,-1
        subfe 31,9,0
        bdnz L..10
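This if-converted body computes the new j (kept in r31) without any branch.  A minimal C sketch of what the three instructions do, assuming the usual 32-bit PowerPC semantics (addic sets the carry bit CA, and subfe rD,rA,rB computes ~rA + rB + CA); the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Carry (XER[CA]) produced by the 32-bit addition a + b. */
static uint32_t carry_out(uint32_t a, uint32_t b)
{
    return ((uint64_t)a + b) > UINT32_MAX;
}

/* Hypothetical emulation of the loop body above for one value of j (r31). */
static uint32_t ppc_toggle(uint32_t j)
{
    uint32_t r0 = j ^ 1u;                      /* xori  0,31,1            */
    uint32_t r9 = r0 + 0xFFFFFFFFu;            /* addic 9,0,-1 (r0 - 1)   */
    uint32_t ca = carry_out(r0, 0xFFFFFFFFu);  /* CA = 1 iff r0 != 0      */
    return ~r9 + r0 + ca;                      /* subfe 31,9,0            */
}

/* The branchy C source: if (j == 1) j = 0; else j = 1; */
static uint32_t source_toggle(uint32_t j)
{
    return (j == 1u) ? 0u : 1u;
}

/* Checks that both versions agree for every j in [lo, hi]. */
static void check_range(uint32_t lo, uint32_t hi)
{
    for (uint64_t j = lo; j <= hi; j++)
        assert(ppc_toggle((uint32_t)j) == source_toggle((uint32_t)j));
}
```

Since ~(r0 - 1) equals -r0 modulo 2^32, the subfe result reduces to the carry bit alone, which is 1 exactly when j != 1: the same value the if/else produces.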


Changing the condition to (j > 1), gcc-3.2 "-O2" generates (300
ticks):

L..11:
        cmpwi 0,31,1
        li 31,0
        bgt- 0,L..4
        li 31,1
L..4:
        bdnz L..11

but the same code, (j > 1), compiled with gcc-3.4 "-O2" gives (600 ticks):

L..10:
        cmpwi 7,31,1
        li 31,1
        ble- 7,L..4
        li 31,0
L..4:
        bdnz L..10


However, with (j == 1) and "gcc-3.4 -O2 -fno-if-conversion", we get
(1200 ticks):

L..10:
        cmpwi 7,31,1
        li 31,1
        beq- 7,L..12
        bdnz L..10
L..13:

	...

L..12:
        li 31,0
        bdnz L..10
        b L..13



So it looks as though the problem is not a register stall or a
scheduling problem, but one of branch prediction and basic block
reordering.  If you're lucky, the compiler gets the branch
probabilities right and the branch always goes the same way, so the
loop runs in 300 ticks.  If you're unlucky, the compiler gets the
probabilities wrong and the branch alternates between taken and
not-taken, so the loop takes 1200 ticks.

Hence if-conversion hedges its bets and always uses the 600-tick
straight-line sequence.
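The taken/not-taken argument can be illustrated with a toy model: a single textbook 2-bit saturating counter.  This is an assumption for illustration only, not the POWER4's actual (more elaborate) predictor.  The pattern functions are hypothetical; they mimic the branch in the (j > 1) loop, where j settles at 1 and the branch is never taken, and in the -fno-if-conversion (j == 1) loop, where j toggles 0,1,0,... and the branch alternates, starting not-taken.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: counts mispredictions of one 2-bit saturating counter
   over n branch outcomes supplied by taken(i). */
static unsigned mispredicts(int (*taken)(unsigned), unsigned n)
{
    unsigned state = 1;   /* 0,1: predict not-taken; 2,3: predict taken */
    unsigned misses = 0;

    assert(taken != NULL);
    for (unsigned i = 0; i < n; i++) {
        int predict_taken = (state >= 2);
        int t = (taken(i) != 0);
        if (predict_taken != t)
            misses++;
        /* Saturating update toward the actual outcome. */
        if (t && state < 3)
            state++;
        else if (!t && state > 0)
            state--;
    }
    return misses;
}

/* (j > 1) loop: j settles at 1, so the branch is never taken. */
static int never_taken(unsigned i) { (void)i; return 0; }

/* (j == 1) loop: j starts at 0 and toggles, so the branch alternates. */
static int alternating(unsigned i) { return (int)(i & 1u); }
```

Starting from the weakly-not-taken state, mispredicts(never_taken, 1000) returns 0 while mispredicts(alternating, 1000) returns 500: the counter never settles on the alternating branch, so every taken outcome is mispredicted.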

We were just lucky that gcc-3.2 got it right, and the "safer" code
generated by gcc-3.4 is simply twice as slow.  Had gcc-3.2 got the
branch prediction wrong, we'd instead see gcc-3.4 doubling performance.
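To put the tick counts on a per-iteration scale, here is a small sketch.  It assumes CLOCKS_PER_SEC is 1,000,000 on the test machine (the value XSI-conformant systems must use; the message does not state it) and the 100,000,000 iterations of the loop in foo(); the helper name is hypothetical.

```c
#include <assert.h>

/* Assumption: clock() runs at 1,000,000 ticks per second here. */
#define ASSUMED_CLOCKS_PER_SEC 1000000.0
/* Iteration count of the benchmark loop in foo(). */
#define LOOP_ITERATIONS 100000000.0

/* Hypothetical helper: converts the printed "ticks" value, which is
   (t2 - t1) / 1000 clock() ticks, into nanoseconds per iteration. */
static double ns_per_iteration(double printed_ticks)
{
    assert(printed_ticks >= 0.0);
    /* printed value -> clock ticks -> nanoseconds, per iteration */
    return printed_ticks * 1000.0 * 1e9
           / (ASSUMED_CLOCKS_PER_SEC * LOOP_ITERATIONS);
}
```

Under that assumption the 300, 600 and 1200 tick runs correspond to roughly 3, 6 and 12 ns per loop iteration.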


Sorry if I've wasted your time.  I hope the above analysis is
interesting.  Obviously the POWER4 has a significant branch
misprediction penalty.

Roger
--

