This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: g77 performance on ALPHA


In article <37CAF304.B65C1506@moene.indiv.nluug.nl>,
Toon Moene  <toon@moene.indiv.nluug.nl> wrote:
>
>In addition to the fact that "their code":
>
>1. Doesn't have "nop"s.
>2. Misses some extraneous instructions updating the loop counter.
>3. Uses lda's instead of addq's to update addresses.
>4. Uses a more efficient scheme to deal with the "extra" loop bodies
>   when unrolling (something on my at-least-a-year-old to-do list).

None of those looks like it would make a noticeable difference.

>it also seems to have a different strategy to schedule this code.

...but this one does. Doing ld+ld+ld+ld -> st+st+st+st (all the loads,
then all the stores) is generally a lot more efficient, provided you have
the registers (which it does), than doing ld->st + ld->st + ld->st...,
and it gives the hardware much more room to optimize things. (I suspect
that the 21264 doesn't speculate stores past loads much, although who
knows; they could check for aliases in hardware.)

It looks like the Compaq compiler knows the stores cannot alias the
loads, while g77 thinks (or tells the instruction scheduler) that the
stores could alias, so they never get moved down past the later loads.

The right schedule should fall out pretty much automatically from the
load->use and gen->store delays. The fact that gcc emits nops is a dead
giveaway that it's not scheduling very well, and it does look like it
thinks the stores interfere with the loads.

		Linus
