This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



GCC scheduler tuning


	We're still working on heuristics that improve ifcvt rtx_cost to
reduce overspeculation, but I want to provide more information about one
cause of a separate overspeculation problem to get some feedback and
comments.

	In our tests, the performance degradation caused by the first
scheduling pass is not due to increased register pressure.  It is due to
the pre-register-allocation scheduler not seeing an instruction stream
that matches the target architecture.  Specifically, the instruction stream
contains many unnecessary register-to-register moves and omits moves that
reload will insert later.  The first global scheduling pass re-arranges
(and speculates) insns that never will appear (or never should appear) in
the final output, trying to match the dispatch, function units, and
latencies of the real processor.
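
To make the effect concrete, here is a toy model (not GCC code) of how
extra register-to-register moves change the dependence chain the scheduler
sees.  The instruction tuples, register names, and latencies are all
made up for illustration, and the sketch assumes every pseudo is defined
exactly once.

```python
def critical_path(insns):
    """Length in cycles of the longest dependence chain in the stream.

    Each insn is (op, dest, sources, latency).  A source that is never
    written in the stream is treated as ready at cycle 0.
    """
    ready = {}  # register -> cycle at which its value becomes available
    finish = 0
    for _op, dest, sources, latency in insns:
        start = max((ready.get(s, 0) for s in sources), default=0)
        ready[dest] = start + latency
        finish = max(finish, ready[dest])
    return finish


def propagate_copies(insns):
    """Drop "mov" insns and rename later uses of each copy to its source.

    Assumes single-definition pseudos, so an alias never goes stale.
    """
    alias = {}
    out = []
    for op, dest, sources, latency in insns:
        sources = tuple(alias.get(s, s) for s in sources)
        if op == "mov":
            alias[dest] = sources[0]
            continue
        out.append((op, dest, sources, latency))
    return out


# Hypothetical pre-RA stream with two moves a later pass would remove.
pre_ra = [
    ("load", "r1", ("base",), 3),
    ("mov", "r2", ("r1",), 1),
    ("add", "r3", ("r2", "r2"), 1),
    ("mov", "r4", ("r3",), 1),
    ("mul", "r5", ("r4", "r4"), 4),
]

with_moves = critical_path(pre_ra)
without_moves = critical_path(propagate_copies(pre_ra))
```

In this model the chain is two cycles longer with the moves present, so
a scheduler working from the pre-RA stream is optimizing for a critical
path that does not exist in the final code.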

	Other production compilers handle this problem by implementing an
early register coalescing phase before the first global scheduler pass.
This is good on architectures with many registers, but bad on
register-starved architectures.  On register-starved architectures, one
wants to insert many *more* register copies to split live ranges and allow
the register allocator to coalesce/uncoalesce them as necessary.  However,
coalescing, scheduling, then uncoalescing may not be efficient.
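
As a rough sketch of what such an early coalescing phase does, the toy
code below merges the source and destination of a copy when their live
ranges meet only at the copy itself.  It is not how any particular
compiler implements it: the half-open interval representation, the
`coalesce` helper, and the example ranges are assumptions for
illustration.

```python
def coalesce(ranges, copies):
    """Conservatively coalesce register copies.

    ranges: dict mapping register -> (start, end), live on [start, end).
    copies: list of (index, dst, src), a copy "dst = src" at that index.
    Returns (new_ranges, rep), where rep maps each removed register to
    the register it was merged into.
    """
    ranges = dict(ranges)
    rep = {}

    def find(reg):
        # Follow the chain of earlier merges to the surviving register.
        while reg in rep:
            reg = rep[reg]
        return reg

    for index, dst, src in copies:
        d, s = find(dst), find(src)
        if d == s:
            continue
        s_start, s_end = ranges[s]
        d_start, d_end = ranges[d]
        # Safe only if src dies at the copy and dst is born there, so
        # the two ranges never hold different live values at once.
        if s_end <= index <= d_start:
            ranges[s] = (s_start, d_end)
            del ranges[d]
            rep[d] = s
    return ranges, rep


# Hypothetical live ranges over instruction indices.
ranges = {"a": (0, 2), "b": (2, 5), "c": (1, 4), "d": (3, 6)}
copies = [(2, "b", "a"), (3, "d", "c")]

merged, rep = coalesce(ranges, copies)
```

Here "b = a" is coalesced because "a" dies at the copy, while "d = c" is
kept because "c" is still live afterwards.  Note that the merged range
for "a" now spans indices 0 to 5, which is the cost on register-starved
targets: one long range where the allocator previously had two short
ones to place independently.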

	Any comments about a pre-scheduler coalescing phase for
non-register-starved architectures (targets with large register files,
unlike 32-bit x86)?  A cost-based live-range splitter would be necessary
for the coalescing phase to be beneficial everywhere.
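
One very simple shape such a cost-based decision could take is sketched
below: find the point of maximum register pressure and, if it exceeds
the number of available registers, nominate the longest live ranges
there for splitting.  The cost model (range length as a proxy for
benefit), the `split_candidates` name, and the example data are all
hypothetical; a real splitter would need execution frequencies and spill
costs.

```python
def live_at(ranges, point):
    """Registers whose half-open range [start, end) covers `point`."""
    return [r for r, (start, end) in ranges.items() if start <= point < end]


def split_candidates(ranges, num_regs):
    """Pick live ranges to split so that peak pressure fits num_regs.

    Returns an empty list when the target has enough registers, i.e.
    when coalescing alone is already fine.
    """
    first = min(start for start, _ in ranges.values())
    last = max(end for _, end in ranges.values())
    # Program point with the most simultaneously live registers.
    worst = max(range(first, last), key=lambda p: len(live_at(ranges, p)))
    live = live_at(ranges, worst)
    excess = len(live) - num_regs
    if excess <= 0:
        return []
    # Hypothetical cost model: prefer splitting the longest ranges.
    by_length = sorted(
        live, key=lambda r: ranges[r][1] - ranges[r][0], reverse=True
    )
    return by_length[:excess]


# Hypothetical post-coalescing live ranges.
ranges = {"a": (0, 5), "c": (1, 4), "d": (3, 6)}

starved = split_candidates(ranges, num_regs=2)
roomy = split_candidates(ranges, num_regs=32)
```

With two registers the sketch nominates "a" for splitting; with 32 it
leaves the coalesced ranges alone.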

David

