This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Why performance so poor on Alpha?


Martin Kahlert wrote:

> I wrote:

> > First of all you have to generate different pseudo registers for
> > floating temporaries for every loop body copy while unrolling, keeping
> > in mind the liveness properties of the floating point values involved.
> >
> > Once you have solved that problem, you have to prevent local-alloc.c
> > from assigning these pseudos to the minimal number of registers
> > necessary (which it can determine because it also knows which value is
> > live over which range of instructions) - nicely undoing all the hard
> > work you did above.

> Does this mean that even if I wrote your pseudo code in Fortran by hand,
> local-alloc.c would undo this change, too? (Sorry, I don't have my
> Alpha handy now)

Yep, I think so, but can't check it now (no access to an Alpha).  This
would be an interesting experiment to try, though.
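
As a rough illustration of what "writing it by hand" could look like
(this is not the original pseudo code, which is not reproduced in this
message), here is a sketch in C rather than Fortran.  The function names
and the 4x unroll factor are assumptions made for the example.  The first
variant reuses one floating temporary in every loop body copy; the second
gives each copy its own temporary, so that the four multiplies are
independent at the source level.  Whether the register allocator keeps
them in separate registers is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

/* Unrolled 4x; every loop body copy reuses the same temporary t. */
void scale_add_shared_temp(double *restrict y, const double *restrict x,
                           double a, size_t n)
{
    assert(n % 4 == 0);  /* sketch only: no remainder loop */
    for (size_t i = 0; i < n; i += 4) {
        double t;
        t = a * x[i];     y[i]     += t;
        t = a * x[i + 1]; y[i + 1] += t;
        t = a * x[i + 2]; y[i + 2] += t;
        t = a * x[i + 3]; y[i + 3] += t;
    }
}

/* Unrolled 4x; each loop body copy has its own temporary. */
void scale_add_distinct_temps(double *restrict y, const double *restrict x,
                              double a, size_t n)
{
    assert(n % 4 == 0);  /* sketch only: no remainder loop */
    for (size_t i = 0; i < n; i += 4) {
        double t0 = a * x[i];
        double t1 = a * x[i + 1];
        double t2 = a * x[i + 2];
        double t3 = a * x[i + 3];
        y[i]     += t0;
        y[i + 1] += t1;
        y[i + 2] += t2;
        y[i + 3] += t3;
    }
}
```

Both functions compute the same result; they differ only in how many
values are named as live at the same time within one iteration.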

Another thing I forgot to write is that the ev56 implementation is
particularly sensitive to these differences.  The newer ev6 with its
out-of-order instruction scheduling has fewer problems with the code as
g77 generates it.

For instance, here are some STREAM benchmark results Greg Lindahl sent
me earlier this year (numbers for an XP1000):

g77 -O2 -funroll-loops:

Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:        655.3624      0.0493      0.0488      0.0508
Scale:       555.3933      0.0577      0.0576      0.0586
Add:         599.4164      0.0801      0.0801      0.0801
Triad:       558.5424      0.0869      0.0859      0.0879

Compaq's compiler -O5 -tune=ev6:

Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:        725.3111      0.0444      0.0441      0.0463
Scale:       675.5626      0.0474      0.0474      0.0475
Add:         736.6600      0.0652      0.0652      0.0653
Triad:       692.9703      0.0693      0.0693      0.0694

STREAM also suffers from this "tiny loops must be unrolled" syndrome -
as you can see there's still a marked difference, but we're not talking
about a factor of 4-5.
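
To put a number on "marked difference", the rates from the two tables
above can be divided kernel by kernel.  The following C sketch does just
that; the array and function names are made up for the example, and the
figures are copied from the tables.  The ratios come out at roughly 1.11
for Copy and about 1.22 to 1.24 for the other three kernels.

```c
#include <assert.h>
#include <stddef.h>

#define NKERNELS 4  /* Copy, Scale, Add, Triad, in table order */

/* Rates in MB/s, copied from the two tables in this message. */
static const double g77_rate[NKERNELS] =
    {655.3624, 555.3933, 599.4164, 558.5424};
static const double compaq_rate[NKERNELS] =
    {725.3111, 675.5626, 736.6600, 692.9703};

/* Ratio of the Compaq rate to the g77 rate for kernel k. */
double speedup(size_t k)
{
    assert(k < NKERNELS);
    return compaq_rate[k] / g77_rate[k];
}

/* Largest ratio over the four kernels. */
double max_speedup(void)
{
    double best = speedup(0);
    for (size_t k = 1; k < NKERNELS; k++) {
        double s = speedup(k);
        if (s > best)
            best = s;
    }
    return best;
}
```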

> So, perhaps some sort of modification of gas would help?
> I mean rewriting the asm that gcc outputs so that it uses more
> registers. On a load/store architecture like the Alpha, with its clean
> asm, lifetime analysis of registers should be fairly easy, so this
> could be done more easily there. Perhaps as some sort of peephole
> optimization?

I wouldn't do this in gas - for this to work in the general case you
would have to reconstruct flow graphs, and an assembler is not equipped
to do that sort of thing.  No, we really should solve this in the
compiler - I just don't know how, yet.
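
For what it is worth, the easy part of such a lifetime analysis can be
sketched in a few lines of C.  Everything here is hypothetical: the
instruction record, the register count and the function name are
invented for the example and have nothing to do with gas or gcc
internals.  The sketch scans a single basic block backward and counts
how many registers are live at once.  Note that it needs to be told
which registers are live on exit from the block, and that is precisely
the information that only a flow graph can supply.

```c
#include <assert.h>
#include <stddef.h>

#define NREGS 8     /* hypothetical register count for the sketch */
#define NONE  (-1)  /* marks an unused operand slot */

/* One instruction of a load/store machine: dst = op(src1, src2). */
struct insn {
    int dst;   /* register written, or NONE */
    int src1;  /* register read, or NONE */
    int src2;  /* register read, or NONE */
};

/*
 * Backward scan over ONE basic block (no branches, no labels).
 * live_out[r] is nonzero if register r is still needed after the block.
 * Returns the largest number of registers live at any point in the block.
 */
int max_live_in_block(const struct insn *code, size_t n, const int *live_out)
{
    int live[NREGS];
    int count = 0;

    for (int r = 0; r < NREGS; r++) {
        live[r] = (live_out[r] != 0);
        count += live[r];
    }
    int max = count;

    for (size_t i = n; i-- > 0; ) {
        const struct insn *p = &code[i];
        assert(p->dst < NREGS && p->src1 < NREGS && p->src2 < NREGS);

        /* Scanning backward, a definition ends the live range. */
        if (p->dst != NONE && live[p->dst]) {
            live[p->dst] = 0;
            count--;
        }
        /* A use makes the source live before this instruction. */
        if (p->src1 != NONE && !live[p->src1]) {
            live[p->src1] = 1;
            count++;
        }
        if (p->src2 != NONE && !live[p->src2]) {
            live[p->src2] = 1;
            count++;
        }
        if (count > max)
            max = count;
    }
    return max;
}
```

For a block that loads two values, adds them and stores the sum, this
reports two registers live at once.  As soon as the code contains a
branch or a loop back edge, the live_out sets of the blocks depend on
each other, and computing them is the flow graph problem again.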

> Thanks for your good explanation.

I'm glad it was clear.

> The early bird gets the worm. If you want something else for
> breakfast, get up later.

:-)

-- 
Toon Moene (toon@moene.indiv.nluug.nl)
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Phone: +31 346 214290; Fax: +31 346 214286
GNU Fortran: http://gcc.gnu.org/onlinedocs/g77_news.html

