This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: GCC scheduler tuning

From: Vladimir Makarov <vmakarov at redhat dot com>
To: David Edelsohn <dje at watson dot ibm dot com>
Cc: Richard Henderson <rth at redhat dot com>, Jan Hubicka <jh at suse dot cz>, gcc-patches at gcc dot gnu dot org
Date: Tue, 11 Mar 2003 14:52:21 -0500
Subject: Re: GCC scheduler tuning
References: <20030304004929.GL12472@redhat.com> <200303111922.OAA26546@makai.watson.ibm.com>

David Edelsohn wrote:
> 
>         We're still working on the heuristics to improve ifcvt rtx_cost to
> reduce overspeculation, but I want to provide more information about one
> cause of another overspeculation problem to get some feedback / comment.
> 
>         In our tests, the performance degradation of the first scheduling
> pass is not due to increased register pressure, but due to the
> pre-register allocation scheduler not seeing an instruction stream which
> matches the target architecture.  Specifically, the instruction stream
> contains many unnecessary register-to-register moves and omits moves which
> will be inserted by reload.  The first global scheduling pass re-arranges
> (and speculates) insns which never will appear (or never should appear) in
> the final output to match the dispatch, function units, and latencies of
> the real processor.
> 
>         Other production compilers handle this problem by implementing an
> early register coalescing phase before the first global scheduler pass.
> This is good on architectures with many registers, but bad on
> register-starved architectures.  On register starved architectures, one
> wants to insert many *more* register copies to split live ranges and allow
> the register allocator to coalesce/uncoalesce them as necessary, however,
> coalescing, scheduling, then uncoalescing may not be efficient.
> 
>         Any comments about a pre-scheduler coalescing phase for
> non-register-starved architectures, such as x86?  A cost-based, live-range
> splitter would be necessary for the coalescing phase to be beneficial
> everywhere.

Register allocation and the first insn scheduling pass work against each
other in many ways.  Running insn scheduling twice (before and after
register allocation) is a classical approach dating from the time when
insn schedulers were primitive and did not do complex transformations.

In my view, it is time to move to a single insn scheduling pass after
register allocation, because a good insn scheduler should do register
renaming/forward substitution on the fly in any case.  That lets it
remove the additional dependencies created by register allocation.
Avoiding those dependencies was the reason for scheduling before
register allocation in the classical approach.
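
A minimal sketch of the renaming idea, assuming a toy straight-line
block in which each insn is a (destination, sources) pair.  The helper
names `dependencies` and `rename` are hypothetical and are not GCC
functions; a real pass would also have to respect live-out values and
the finite register file, which this sketch ignores.

```python
from itertools import count


def dependencies(insns):
    """Classify dependencies in a straight-line block of (dest, srcs) insns."""
    deps = {"true": set(), "anti": set(), "output": set()}
    last_def = {}  # register -> index of the most recent insn writing it
    readers = {}   # register -> insns that read it since that write
    for j, (dest, srcs) in enumerate(insns):
        for src in srcs:
            if src in last_def:
                deps["true"].add((last_def[src], j))    # read after write
            readers.setdefault(src, []).append(j)
        for i in readers.get(dest, []):
            if i != j:
                deps["anti"].add((i, j))                # write after read
        if dest in last_def:
            deps["output"].add((last_def[dest], j))     # write after write
        last_def[dest] = j
        readers[dest] = []
    return deps


def rename(insns):
    """Give every definition a fresh name and rewrite later uses to match."""
    fresh = (f"t{n}" for n in count())
    current = {}  # allocated register -> its current fresh name
    renamed = []
    for dest, srcs in insns:
        new_srcs = [current.get(s, s) for s in srcs]  # inputs keep their names
        current[dest] = next(fresh)
        renamed.append((current[dest], new_srcs))
    return renamed


# Hypothetical block after register allocation: r1 is reused by insn 2.
allocated = [
    ("r1", ["a", "b"]),    # 0
    ("r2", ["r1"]),        # 1
    ("r1", ["c", "d"]),    # 2: reuse of r1 adds anti and output dependencies
    ("r3", ["r1", "r2"]),  # 3
]

before = dependencies(allocated)
after = dependencies(rename(allocated))
```

In this toy block the true dependencies are the same before and after
renaming, while the anti and output dependencies caused by reusing r1
disappear, so insn 2 no longer has to wait for insns 0 and 1.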

But the current code cannot easily be modified for this.  So we have to
use temporary solutions, which create new contradictions.

Vlad

References:
- Re: PowerPC performance tuning
  - From: Richard Henderson
- GCC scheduler tuning
  - From: David Edelsohn
