This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: PA8000 performance oddity



  In message <199905211645.MAA03952@wmtl249c.us.nortel.com> you write:
  > 
  > I was looking at a test case someone had posted originally showing
  > poor g++ (vs gcc) performance and began looking at the performance
  > difference between aCC and egcs.  On this particular piece of code,
  > aCC wins by a sizeable margin.
  > 
  > In attempting to understand what is going on, I started looking at
  > register allocation in the doSth subroutine.  egcs heavily favors
  > reusing registers even when others are available while aCC apparently
  > tries to spread things out among registers when it can (i.e. b = a + b 
  > vs. c = a + b)
Yes.  Register renaming.  It can avoid false dependency stalls introduced by
register allocation in some circumstances.  GCC doesn't support this yet,
but it does have some heuristics in the register allocator to try to avoid
creating false dependency stalls.
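
To make the "b = a + b vs. c = a + b" point concrete, here is a toy sketch that counts false (write-after-read and write-after-write) dependencies in a straight-line sequence. The function name, the register names, and the instruction tuples are all made up for illustration; this is not how GCC or aCC models dependencies, just a minimal picture of why reusing a register can tie otherwise independent instructions together.

```python
def false_dependencies(insns):
    """Count WAR and WAW pairs in a list of (dest, src1, src2) tuples.

    Toy model only: every later write to a register that an earlier
    instruction reads (WAR) or writes (WAW) counts as one false dependency.
    """
    count = 0
    for i, (dest_i, *srcs_i) in enumerate(insns):
        for dest_j, *_ in insns[i + 1:]:
            if dest_j in srcs_i:    # later write clobbers an earlier source (WAR)
                count += 1
            if dest_j == dest_i:    # later write reuses an earlier destination (WAW)
                count += 1
    return count


# Hypothetical sequence that reuses r1 for two unrelated values.
reuse = [("r1", "r2", "r3"), ("r4", "r1", "r5"),
         ("r1", "r6", "r7"), ("r8", "r1", "r9")]

# Same computation with the second value placed in a fresh register, r10.
spread = [("r1", "r2", "r3"), ("r4", "r1", "r5"),
          ("r10", "r6", "r7"), ("r8", "r10", "r9")]

print(false_dependencies(reuse), false_dependencies(spread))
```

In the second sequence the third instruction no longer has to wait on the first two, which is the effect renaming (or a spread-out allocation) is after.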

[ ... ]

  > I've included the complete assembly for anyone that cares to try it.
  > It is followed by a patch file to make the register change.  Link
  > using the 990517 snapshot and a recent binutils snapshot which has the
  > pa2.0 support.
  > 
  > Does anyone have a clue why this would happen?  Am I missing something
  > very obvious?
The most obvious thing I see is we are not taking advantage of the larger
displacement allowed in reg+disp addresses for FP insns.

In PA1.0/PA1.1 an FP load/store is only allowed a 5-bit displacement.  Thus
you see all those "ldo disp(base),temp; fldds 0(temp),fptarget" sequences.

In PA2.0 FP loads/stores can have an aligned 16-bit offset, which would
allow most of the ldo instructions to disappear.  This (of course) requires
more GAS work to support the larger displacements.

I did some fooling around with this stuff a while back and it was worth a
few more percent across specfp.
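
As a rough sketch of the displacement argument, the check below decides whether an FP doubleword access needs a separate ldo to form its address. The helper names are hypothetical, and the exact ranges are assumptions on my part: the 5-bit field is treated as signed (-16..15), and the PA2.0 long form is treated as a signed 16-bit offset that must be a multiple of the 8-byte access size.

```python
def fits_short_disp(disp):
    # Assumption: the 5-bit displacement field is signed, so -16..15.
    return -16 <= disp <= 15


def fits_long_disp(disp, access_size=8):
    # Assumption: signed 16-bit offset, aligned to the access size
    # (8 bytes for a doubleword FP load/store).
    return -32768 <= disp <= 32767 and disp % access_size == 0


def needs_ldo(disp, pa2=False):
    """Return True if the address must first be formed with a separate ldo."""
    if fits_short_disp(disp):
        return False
    return not (pa2 and fits_long_disp(disp))


# A typical stack-slot offset such as 64 needs an ldo before PA2.0 only.
print(needs_ldo(64), needs_ldo(64, pa2=True))
```

Under these assumptions, most frame offsets for doubles fall inside the long form, which is why the extra ldo instructions would largely go away.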

It may also be the case that using targeted FP compares rather than queued
FP compares would help.  I've never done any experiments to see what the
benefit is, but presumably PA, MIPS & others that added multiple FP condition
code regs did so for a reason :-)

Jeff

