This is the mail archive of the
mailing list for the GCC project.
Re: Transformations to increase parallelism
On Wed, Jul 23, 2003 at 06:27:21PM +0300, Ayal Zaks wrote:
> In response to: http://gcc.gnu.org/ml/gcc/2003-07/msg01606.html
> >On Tue, 22 Jul 2003, Dorit Naishlos wrote:
> >> Other compiler stages may be able to generate better code w/o these
> >> forms (we encountered such a situation when trying to optimize
> >> during combine); in fact, it may even be beneficial if this decision
> >> take place as late as possible (possibly even after sched2...?).
> >Can you give an example of this? Is this a power4-specific problem?
> Yes, and possibly yes again.
> In general, instead of generating a series of pairwise dependent insns:
> load_inc r2,4(r1)
> load_inc r3,4(r1)
> load_inc r4,4(r1)
> we prefer to generate:
> load r2,4(r1)
> load r3,8(r1)
> load_inc r4,12(r1)
> because on power4 (1) load_inc is more expensive than load in terms of
> resource utilization, and (2) removing data-dependencies allows faster
> time to start (out-of-order) execution.
> I think we ran across such redundant pre-increment modes compiling
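To make the two quoted sequences concrete, here is a small Python model of them. It is only a sketch: the update-form semantics for load_inc (add the displacement to the base register, then load from the new address) are an assumption modelled on PowerPC load-with-update instructions, and the MEM contents and the helper names are hypothetical.

```python
# Hypothetical memory image: byte address -> word value.
MEM = {4: 11, 8: 12, 12: 13}

def load(regs, dst, disp, base="r1"):
    """Plain load: dst = MEM[base + disp]; the base register is unchanged."""
    regs[dst] = MEM[regs[base] + disp]

def load_inc(regs, dst, disp, base="r1"):
    """Assumed update form: base += disp, then dst = MEM[base]."""
    regs[base] += disp
    regs[dst] = MEM[regs[base]]

def chained(r1=0):
    """Three load_inc insns; each one needs the r1 written by the previous one."""
    regs = {"r1": r1}
    load_inc(regs, "r2", 4)
    load_inc(regs, "r3", 4)
    load_inc(regs, "r4", 4)
    return regs

def independent(r1=0):
    """Two plain loads off the original r1, then one load_inc for the update."""
    regs = {"r1": r1}
    load(regs, "r2", 4)
    load(regs, "r3", 8)
    load_inc(regs, "r4", 12)
    return regs

if __name__ == "__main__":
    print("chained:    ", chained())
    print("independent:", independent())
```

Both functions leave the same values in r1 through r4. The difference is that in the second form only the last insn writes r1, so the first two loads do not have to wait on each other.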
I remember trying to track down a bug that only showed up on some PPCs if you
did 8 load_inc's in a row: there weren't enough writeback slots for both the
register and the incoming value.
On the other hand, some machines have a really small index field, so if you do:
you might have a better chance of compiling the code as is, instead of:
One set of machines with this limitation was the early HP PA machines, whose
floating point instructions had really small offsets, so it was problematic to
build the frame without autoinc when you were saving a lot of FP registers.
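A rough way to see the offset problem is to check which displacements fit in a small signed field. The sketch below is illustrative only: the 5-bit field width, the 8-byte register size, and the count of 10 registers are assumed values chosen for the example, not figures taken from any HP PA manual, and all function names are hypothetical.

```python
def fits_signed(value, bits):
    """True if value fits in a two's-complement field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def offsets_fixed_base(n_regs, reg_size):
    """Displacements needed when every store uses the same base register."""
    return [i * reg_size for i in range(n_regs)]

def offsets_autoinc(n_regs, reg_size):
    """With autoincrement stores, each insn only encodes the step size."""
    return [reg_size] * n_regs

def all_fit(offsets, bits):
    """True if every displacement is encodable in a signed field of that width."""
    return all(fits_signed(o, bits) for o in offsets)

if __name__ == "__main__":
    bits, n_regs, reg_size = 5, 10, 8   # assumed values, see lead-in
    print("fixed base fits:", all_fit(offsets_fixed_base(n_regs, reg_size), bits))
    print("autoinc fits:   ", all_fit(offsets_autoinc(n_regs, reg_size), bits))
```

With these assumed numbers the fixed-base offsets grow to 72 and overflow the field after the second register, while the autoinc form only ever needs to encode the constant step of 8.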