- From: Josh Conner <jconner@apple.com>
- To: gcc@gcc.gnu.org
- Date: Tue, 12 Jul 2005 10:56:04 -0700
- Subject: RFC: improving estimate_num_insns
In looking at the function inliner, I did some analysis of the
accuracy of estimate_num_insns (from tree-inline.c), as this
measurement is critical to the inlining heuristics. Here is an
overview of what I found on the functions in two sample files from
the FAAD2 AAC library (measured at -O3 with inlining disabled):
syntax.c
                avg.    avg.    std.
  target        ratio   error   deviation
  arm-none      2.88    57%     2.37
  darwin-ppc    3.17    55%     2.50
  linux-x86     2.72    55%     2.06
huffman.c
                avg.    avg.    std.
  target        ratio   error   deviation
  arm-none      3.27    56%     2.37
  darwin-ppc    4.19    53%     2.88
  linux-x86     3.42    56%     2.39
  fn. differential = actual bytes / estimated insns
  avg. ratio       = average of fn. differential over all functions
  fn. error        = max (fn. differential, avg. ratio)
                     / min (fn. differential, avg. ratio)
  avg. error       = average of fn. error over all functions
  std. deviation   = standard deviation of fn. differential across
                     all functions
Note that by choosing these metrics, my intent was to isolate the
consistency of the differential from the magnitude of the
differential. In other words, the goal was to have estimates that
were a consistent ratio to the actual results, whatever that ratio
may be.
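To make the bookkeeping concrete, here is a small standalone program
computing these statistics. This is my own illustration, not code
from the patch: the sample numbers are made up, and I'm reading the
avg. error percentages above as (average fn. error - 1) expressed as
a percent.

#include <math.h>
#include <stdio.h>

int
main (void)
{
  /* fn. differential = actual bytes / estimated insns, per function.
     These values are made-up sample data, not measurements from
     syntax.c or huffman.c.  */
  double diff[] = { 2.1, 3.4, 2.9, 3.8 };
  int n = sizeof diff / sizeof diff[0];
  double sum = 0.0, err_sum = 0.0, var = 0.0, avg_ratio;
  int i;

  for (i = 0; i < n; i++)
    sum += diff[i];
  avg_ratio = sum / n;                      /* avg. ratio */

  for (i = 0; i < n; i++)
    {
      double hi = diff[i] > avg_ratio ? diff[i] : avg_ratio;
      double lo = diff[i] < avg_ratio ? diff[i] : avg_ratio;
      err_sum += hi / lo;                   /* fn. error, always >= 1 */
      var += (diff[i] - avg_ratio) * (diff[i] - avg_ratio);
    }

  printf ("avg. ratio %.2f  avg. error %.0f%%  std. deviation %.2f\n",
          avg_ratio, (err_sum / n - 1.0) * 100.0, sqrt (var / n));
  return 0;
}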
Thinking that there might be room for improvement here, I tried the
same experiment with a few adjustments to estimate_num_insns (a
simplified sketch of the resulting weights follows the list):
- Instead of ignoring constants, assign them a weight of 2
instructions (one for the constant itself and one to load it)
- Instead of ignoring dereferences, assign a weight of 2 to an array
reference and 1 to a pointer dereference
- Instead of ignoring case labels, assign them a weight of 2
instructions (to represent the cost of the control logic to get there)
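The shape of the change is roughly the following. This is a
simplified stand-in for the per-node cost switch, not the attached
patch or GCC internals; the enum, names, and main are my own
illustration.

#include <stdio.h>

/* Stand-in node kinds for the TREE_CODE cases that currently
   contribute nothing to the estimate.  */
enum node_kind { NK_CONSTANT, NK_ARRAY_REF, NK_POINTER_DEREF,
                 NK_CASE_LABEL, NK_OTHER };

static int
node_weight (enum node_kind kind)
{
  switch (kind)
    {
    case NK_CONSTANT:
      return 2;   /* was 0: the constant itself plus the load */
    case NK_ARRAY_REF:
      return 2;   /* was 0: address arithmetic plus the access */
    case NK_POINTER_DEREF:
      return 1;   /* was 0: a single memory access */
    case NK_CASE_LABEL:
      return 2;   /* was 0: control logic to reach the label */
    default:
      return 1;
    }
}

int
main (void)
{
  printf ("constant costs %d insns\n", node_weight (NK_CONSTANT));
  return 0;
}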
With these modifications (tree-estimate.patch), I see the following
results:
syntax.c
                avg.    avg.    std.
  target        ratio   error   deviation
  arm-none      1.68    29%     0.67
  darwin-ppc    1.84    29%     0.69
  linux-x86     1.62    26%     0.55
huffman.c
                avg.    avg.    std.
  target        ratio   error   deviation
  arm-none      1.75    26%     0.52
  darwin-ppc    2.31    24%     0.69
  linux-x86     1.95    25%     0.67
This appears to be a significant improvement in the accuracy of
estimate_num_insns. However, note that since the avg. ratio has
decreased significantly (estimates have gone up because we're now
counting things that were ignored before), any thresholds that are
based on instruction counts have effectively decreased by ~25-50%.
To compensate, I bumped up each constant that is based on
instruction estimates by 50% (see inl-costs.patch).
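As a quick sanity check on that figure: on syntax.c for arm-none the
avg. ratio fell from 2.88 to 1.68, i.e. estimates grew by a factor of
about 2.88 / 1.68 = 1.7, so a limit that used to admit functions
estimated at N insns now effectively cuts off near N / 1.7; scaling
the limits by 1.5 recovers most (though not quite all) of their old
reach.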
With both of these changes in place, I ran some simple SPEC2000
integer benchmarks (not fine-tuned, not reportable) and saw the
following impact on performance (positive is better):
             darwin-ppc   linux-x86   linux-arm*
  gzip         -4.7%       +11.7%       +1.3%
  vpr          -0.7%        -1.1%       -----
  mcf          -0.3%        +1.1%       -----
  crafty       -0.8%        +1.8%       -----
  parser       -----        -----       -----
  eon          -4.4%        +0.3%       -----
  perlbmk      +0.4%         0.0%       -1.1%
  gap          -----        +1.0%       -2.9%
  vortex       +4.1%         0.0%        0.0%
  bzip2        +1.2%        +1.1%       -1.3%
  twolf        -0.5%        -2.9%       -1.2%

  ----- = unable to obtain a valid result

* linux-arm results are based on a subset of the SPEC data and
command-line options, for use on a 32MB embedded board -- not even
close to the full SPEC suite.
For what it's worth, code size is equal to or smaller for all
benchmarks across all platforms.
So, here are the open issues I see at this point:
1. This change to estimate_num_insns appears to generate a much
better estimate of actual code size, but the benchmark results are
ambiguous. Is this patch worth considering as-is?
2. Increasing the instruction weights makes the instruction-based
parameters (e.g., --param max-inline-insns-auto) effectively lower.
However, changing these constants/defaults as in the second patch is
a semantic change for anyone who sets these values on the command
line. Is that change acceptable?
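(For concreteness: someone who passes, say, --param
max-inline-insns-auto=100 today -- the value here is hypothetical --
would need to scale it up by the same ~50% to get roughly the same
inlining behavior under the new weights.)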
I do realize that this area will likely be best served eventually by
target-dependent routines (and constants); however, I also see a
significant benefit to all targets in fixing the default
implementation first.
Thoughts? Advice?
Thanks -
Josh
~~~~
Josh Conner
Attachment: tree-estimate.patch
Attachment: inl-costs.patch