This is the mail archive of the gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH]: Add pipeline description for Mips R10K-series
- From: Roger Sayle <roger at eyesopen dot com>
- To: Kumba <kumba at gentoo dot org>
- Cc: Ralf Baechle <ralf at linux-mips dot org>, <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 21 Dec 2005 09:23:11 -0700 (MST)
- Subject: Re: [PATCH]: Add pipeline description for Mips R10K-series
On Wed, 21 Dec 2005, Kumba wrote:
> * Not even close to knowing what to do for COSTS_N_INSNS or how to
> calculate these values correctly. Is there a known formula out
> there that simply needs values plugged in to get an average value?
Although I'm not much help with pipeline descriptions, I do understand
how rtx_costs should work. These values are used by the middle-end to
determine which of several possible code sequences should be generated.
For example, combine will replace a sequence of instructions with an
equivalent one if the rtx_costs are lower. All comparisons are done
within units of COSTS_N_INSNS(1) which is defined to be the "cost" of a
single fast ALU operation such as a word_mode addition. When optimizing
for size, these values should be the size of an instruction or sequence
relative to an addition; otherwise they reflect "ideal typical-case"
latency relative to an addition. Given that most current CPUs take a
single cycle to perform an integer addition, COSTS_N_INSNS(x) can be
thought of as roughly "x" cycles.
The difficulty starts as soon as you start thinking about modelling
the pipeline :-(. Normally, for multiple issue machines with multiple
resources, the "ideal" assumption is used, so that rtx_costs assumes
that all necessary functional units are available, memories are in
cache, following and preceding instructions cause no conflicts etc...
and that the GCC scheduler and DFA model will make it so. Harder to
parameterize are data-dependent instructions whose latencies depend
upon their operand values. Although using the worst case is reasonable,
better code can often be generated for some notion of an average or
typical value: for multiplications, for example, assuming half the
operand bits are set. Ultimately, it's more of an art than a science.
Of course, the above describes an "ideal world". In practice, most
backends are very poorly parameterized by rtx_costs, with many falling
back to the machine-independent defaults and/or ignoring optimize_size
and/or ignoring the machine mode and even FP vs. integer, etc....
It's a tribute to GCC that it does so well knowing so little about the
performance characteristics of the target it's compiling for! Of course,
it's a chicken-and-egg situation: the optimizers in turn choose not to
rely too heavily on often-suspect rtx_costs, so having highly accurate
values is unlikely to be a significant benefit :-)
If you're interested, I've an experimental "genrtxcosts" program that
empirically estimates rtx_costs based on timing GCC-generated code
sequences, but so far its utility has mainly been as a nanobenchmark
identifying code sequences that we generate poorly on some platforms.
It might even be useful for tweaking GCC to optimize code to run on