This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: (really Fortran patches)



  In message <9710231737.AA22320@moene.indiv.nluug.nl> you write:

  > Well, there is another difference between the loops you showed.   
  > The one produced without your `use' patch has this instruction:
  > 
  >         fmpysub,dbl %fr22,%fr24,%fr22,%fr25,%fr23
Yup.  I'd have to look closely at the code, but this is probably
a one cycle difference on a relatively modern PA.

The USE variant didn't use fmpysub because it couldn't find an independent 
fmpy and fsub to issue together -- fmpyadd/fmpysub was a poor man's
way to increase FP performance back in 1991.  It's still useful on
most PAs, except the PA8000 based machines.

Why didn't it find independent ones in the USE patch version?  Because
the scheduler wasn't able to reorder move instructions in such a
way as to force more registers to be used (and thus make it more
likely that later passes can find independent operations to combine
into an fmpyadd or fmpysub pattern).

I took a peek at another loop this morning, and it's got the same
fundamental problem -- the scheduler isn't able to move the loads
around enough.  After hand scheduling that one loop pair, half of the
overall tomcatv slowdown disappears.

The basic problem appears to be that the alias code gets confused
when a particular register is several sets removed from the original
base reg.

We exposed a similar problem a couple years ago with the static
combination code -- the trick is to recursively continue to
look for the base register instead of a one or two level search.

x = symbol_ref

y = x + index

z = y + index

etc.

To get the basereg for z, you need to recurse back to the symbol_ref
instead of stopping at y.
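
As a rough illustration of that lookup (a toy model in Python, not GCC's
actual alias code), the chain above can be walked like this. The names
`find_base`, `defs`, and `max_depth`, and the symbol name "a", are all
made up for the example; `max_depth` only exists to mimic a search that
gives up after one or two levels.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding of the sets shown above:
#   ("symbol_ref", name)       -> the register holds a symbol address
#   ("plus", src_reg, addend)  -> the register is src_reg plus something
Def = Tuple[str, ...]


def find_base(reg: str, defs: Dict[str, Def],
              max_depth: Optional[int] = None) -> Optional[str]:
    """Follow the chain of sets back from reg to its symbol_ref.

    Returns the symbol name, or None if the chain ends somewhere
    unknown or the depth limit is hit first.
    """
    depth = 0
    seen = set()  # guard against cycles in the toy table
    while reg in defs and reg not in seen:
        seen.add(reg)
        d = defs[reg]
        if d[0] == "symbol_ref":
            return d[1]
        if d[0] != "plus":
            return None
        if max_depth is not None and depth >= max_depth:
            return None  # a limited search stops here
        reg = d[1]  # step back to the register this one was computed from
        depth += 1
    return None


defs = {
    "x": ("symbol_ref", "a"),
    "y": ("plus", "x", "index"),
    "z": ("plus", "y", "index"),
}

print(find_base("z", defs))               # full walk back to the symbol
print(find_base("z", defs, max_depth=1))  # one-level search stops at y
```

With the depth limit in place the search for z stops at y and never
learns that z is based on the same symbol as x, which is the kind of
lost information described above.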

We may also be losing REGNO_POINTER_FLAG for some of the pseudos
created by loop -- which would have similar effects -- I'll have
to look at this further too.


  > BTW, does HP really palm off this PA as a *R*ISC architecture ?   
Yup.

  > With a five operand instruction ?
Yup.  Plus we have more general auto_inc_dec addressing than the
m68k, base + scaled index addressing, base + [scaled] index with
base register modification, etc.

  > `movc5' hiding somewhere, or `index',
It doesn't have movc5, but it can be easily synthesized with a
2 instruction sequence.

The PA has some ciscy characteristics, but it's still got many
riscy characteristics (load/store architecture, fixed instruction
length, instructions execute in a single cycle, etc).

Often I look at it as a risc box with ciscy address modes for
load/store operations.

jeff

