Re: [PATCH] Improved target tuning in simplify-rtx.c


On Tue, 15 Jun 2004 tm_gccmail@kloo.net wrote:
> One small metaproblem is that some ports have very sloppy RTX_COSTS
> which will lead to some pessimization.

Metaproblem is a good description of the issue.  When comparing GCC
against many of the commercial compilers it competes against, it's
clear that its poorly parameterized rtx_costs are an Achilles' heel.
Most "native" compilers target only a single processor architecture,
and are written by groups closely associated with the relevant
semiconductor vendors:  Intel (and Microsoft) compilers are well
tuned for IA-32, IBM's xlc and Motorola's Metrowerks compilers for the
powerpc/rs6000, SGI's MIPSPro compilers for the MIPS architecture,
Sun's Forte compilers for the SPARC architecture, Digital/Compaq/HP's
GEM compilers for the Alpha, HP's compilers for PA-RISC, etc...
Many of these manufacturers don't even publicly provide the accurate
timing information necessary to parameterize optimizing compilers.
GCC therefore currently relies far more heavily on machine independent
transformations to remain competitive.

Though there's little I can do about it, I'd really like to see the
quality of RTX_COSTS improved.  It might come as a shock to some that
the rs6000 backend doesn't provide timings for floating-point addition
and multiplication (important for computational targets) and that the
ARM backend doesn't provide optimize_size costs (important for embedded
targets).  I'd be shocked myself if the IBM and ARM Ltd compilers didn't
have values for these parameters.
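
To make the gap concrete, here's a minimal sketch of the kind of case
a backend could add, written against the TARGET_RTX_COSTS hook
signature.  The cycle counts are placeholders chosen purely for
illustration, not real rs6000 or ARM timings, and example_rtx_costs
is an invented name:

static bool
example_rtx_costs (rtx x, int code, int outer_code ATTRIBUTE_UNUSED,
                   int *total)
{
  switch (code)
    {
    case PLUS:
    case MINUS:
      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
        {
          /* Placeholder latency: pretend an FP add takes 4 cycles.  */
          *total = COSTS_N_INSNS (4);
          return true;
        }
      break;

    case MULT:
      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
        {
          /* Placeholder latency: pretend an FP multiply takes 5.  */
          *total = COSTS_N_INSNS (5);
          return true;
        }
      break;

    default:
      break;
    }

  /* Returning false falls back to the generic costing.  */
  return false;
}

An optimize_size path in the same switch, charging by likely encoding
size rather than latency, is exactly what's missing on the ARM side.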


In an ideal world, GCC would eventually make use of RTX_COSTS in combine,
only combining instructions if the backend considered their "combination"
to be a win.  Clearly, this would require RTX_COSTS to return a reasonable
value for nearly every insn pattern defined in its machine description.
Unfortunately, this is beyond even the best-parameterized of GCC's
current backends.
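
The comparison itself would be straightforward; the hard part is
trusting the numbers.  A hedged sketch of such a check, where
insn_cost_sketch and combination_profitable_p are invented names and
rtx_cost is the existing query into RTX_COSTS:

/* Invented helper: cost an insn by querying RTX_COSTS on the
   source of its single SET; zero means "couldn't cost it".  */
static int
insn_cost_sketch (rtx insn)
{
  rtx set = single_set (insn);
  if (!set)
    return 0;
  return rtx_cost (SET_SRC (set), SET);
}

/* Accept a candidate combination only when the replacement is
   no more expensive than the two insns it replaces.  */
static bool
combination_profitable_p (rtx i1, rtx i2, rtx newinsn)
{
  int old_cost = insn_cost_sketch (i1) + insn_cost_sketch (i2);
  int new_cost = insn_cost_sketch (newinsn);

  /* If either side couldn't be costed, preserve combine's
     current behavior and accept the combination anyway.  */
  if (old_cost == 0 || new_cost == 0)
    return true;

  return new_cost <= old_cost;
}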


One way forward may be to provide backend debugging tools to support
maintainers.  Perhaps a gencosts program that reports a table of
RTX_COSTS values for each define_insn, which could then be checked
for discrepancies.  And perhaps even an automatic parameterization
tool that attempts to time, or measure the size of, each pattern
in the .md file.
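
To illustrate, here's a toy, self-contained sketch of the report such
a gencosts might emit.  The insn names and cost values are made up; a
real tool would walk the define_insns in the .md file and query the
target's rtx_costs rather than a hard-coded table:

#include <stdio.h>

/* Toy stand-in for the proposed gencosts report.  The table below
   is hard-coded purely to show the shape of the output; a real
   tool would read the machine description and query the target.  */

struct cost_row
{
  const char *insn_name;   /* define_insn name from the .md file  */
  int speed_cost;          /* cost when optimizing for speed      */
  int size_cost;           /* cost when optimizing for size       */
};

static const struct cost_row table[] = {
  { "addsi3",   4,  4 },
  { "mulsi3",  20,  4 },
  { "divsi3",   0,  0 },  /* never cased in RTX_COSTS: suspicious  */
};

int
main (void)
{
  unsigned int i;

  printf ("%-12s %8s %8s\n", "define_insn", "speed", "size");
  for (i = 0; i < sizeof table / sizeof table[0]; i++)
    {
      printf ("%-12s %8d %8d", table[i].insn_name,
              table[i].speed_cost, table[i].size_cost);
      /* A zero cost usually means the backend fell through to a
         default and never considered the pattern at all.  */
      if (table[i].speed_cost == 0 || table[i].size_cost == 0)
        printf ("   <-- check RTX_COSTS");
      printf ("\n");
    }
  return 0;
}

Rows where the backend never cased the pattern, and so fell through
to a zero or default cost, are exactly the discrepancies worth
flagging.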

As mentioned by Ian Lance Taylor, many of these cost values are
heuristic in nature.  The number of "cycles" taken to execute a given
instruction depends heavily upon the instructions around it and the
state of the pipeline/scheduling group, contents of primary and
secondary caches, position within the cache line, register bypasses,
hazards, etc...  Even when accurate microcode cycle timings are
available, these may not be the most appropriate values for use in
RTX_COSTS.  In the longer term, I suspect that Robert Scott Ladd's
proposal of using genetic algorithms (GAs) to refine RTX_COSTS will
eventually become the way that new hardware is parameterized for
future optimizing compilers.
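
For the curious, the shape of such a GA is simple even if the fitness
evaluation is expensive.  In this toy, self-contained sketch, each
genome is a vector of candidate costs, and fitness() is a stand-in
for "rebuild the compiler with these costs, run a benchmark suite,
and return its score":

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NCOSTS 8    /* number of tunable cost parameters  */
#define POP    16   /* population size                    */
#define GENS   50   /* generations to run                 */

typedef struct
{
  int costs[NCOSTS];
  double score;
} genome;

/* Stand-in fitness: pretend the ideal cost for parameter i is
   i + 1 and score by negated squared error.  A real harness
   would rebuild the compiler and time a benchmark suite here.  */
static double
fitness (const genome *g)
{
  double err = 0.0;
  int i;

  for (i = 0; i < NCOSTS; i++)
    {
      double d = (double) (g->costs[i] - (i + 1));
      err += d * d;
    }
  return -err;
}

/* Nudge one randomly chosen cost by -1, 0, or +1, keeping it
   positive.  */
static void
mutate (genome *g)
{
  int i = rand () % NCOSTS;
  g->costs[i] += (rand () % 3) - 1;
  if (g->costs[i] < 1)
    g->costs[i] = 1;
}

int
main (void)
{
  genome pop[POP];
  int p, i, gen, best = 0;

  srand ((unsigned int) time (NULL));

  /* Random initial population.  */
  for (p = 0; p < POP; p++)
    for (i = 0; i < NCOSTS; i++)
      pop[p].costs[i] = 1 + rand () % 16;

  for (gen = 0; gen < GENS; gen++)
    {
      /* Evaluate everyone and find the champion.  */
      best = 0;
      for (p = 0; p < POP; p++)
        {
          pop[p].score = fitness (&pop[p]);
          if (pop[p].score > pop[best].score)
            best = p;
        }

      /* Elitist step: replace everyone else with a mutated copy
         of the champion.  */
      for (p = 0; p < POP; p++)
        if (p != best)
          {
            pop[p] = pop[best];
            mutate (&pop[p]);
          }
    }

  printf ("evolved costs:");
  for (i = 0; i < NCOSTS; i++)
    printf (" %d", pop[best].costs[i]);
  printf ("\n");
  return 0;
}

The expensive part, of course, is that in the real setting every
fitness evaluation is a full compiler rebuild plus benchmark run.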

Roger
--

