
RFC: improving estimate_num_insns


In looking at the function inliner, I did some analysis of the correctness of estimate_num_insns (from tree-inline.c), as this measurement is critical to the inlining heuristics. Here is the overview of what I found on the functions in two sample files from the FAAD2 AAC library (measured at -O3 with inlining disabled):

syntax.c
target        avg. ratio   avg. error   std. deviation
arm-none      2.88         57%          2.37
darwin-ppc    3.17         55%          2.50
linux-x86     2.72         55%          2.06

huffman.c
target        avg. ratio   avg. error   std. deviation
arm-none      3.27         56%          2.37
darwin-ppc    4.19         53%          2.88
linux-x86     3.42         56%          2.39

fn. differential = actual bytes / estimated insns
avg. ratio = average (fn. differential) of all functions
fn. error = max (fn. differential, avg. ratio) / min (fn. differential, avg. ratio)
avg. error = average (fn. error) of all functions
std. deviation = standard deviation (fn. differential) across all functions
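
To make these definitions concrete, here is a small self-contained sketch (my own illustration, not code from either patch) of computing the metrics over per-function measurements:

  #include <math.h>
  #include <stddef.h>

  /* Per-function differential: actual bytes / estimated insns.  */
  static double fn_differential (double actual_bytes, double estimated_insns)
  {
    return actual_bytes / estimated_insns;
  }

  /* avg. ratio: mean of the differentials across all functions.  */
  static double avg_ratio (const double *diff, size_t n)
  {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
      sum += diff[i];
    return sum / n;
  }

  /* fn. error: max (diff, avg) / min (diff, avg) -- always >= 1.  */
  static double fn_error (double diff, double avg)
  {
    return diff > avg ? diff / avg : avg / diff;
  }

  /* std. deviation of the differentials across all functions.  */
  static double std_deviation (const double *diff, size_t n, double avg)
  {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
      sum += (diff[i] - avg) * (diff[i] - avg);
    return sqrt (sum / n);	/* link with -lm */
  }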


Note that by choosing these metrics, my intent was to isolate the consistency of the differential from the magnitude of the differential. In other words, the goal was to have estimates that were a consistent ratio to the actual results, whatever that ratio may be.

Thinking there might be room for improvement here, I tried the same experiment with a few adjustments to estimate_num_insns (sketched in code after this list):
- Instead of ignoring constants, assign them a weight of 2 instructions (one for the constant itself and one to load it)
- Instead of ignoring dereferences, assign a weight of 2 to an array reference and 1 to a pointer dereference
- Instead of ignoring case labels, assign them a weight of 2 instructions (representing the cost of the control logic needed to reach them)
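
Concretely, the new weights might be applied like this inside the walk_tree callback that estimate_num_insns uses (a rough sketch of the idea, not the actual tree-estimate.patch; the tree codes are GCC's, but the surrounding function is simplified):

  static tree
  estimate_num_insns_1 (tree *tp, int *walk_subtrees, void *data)
  {
    int *count = (int *) data;

    switch (TREE_CODE (*tp))
      {
      /* Previously ignored: constants now cost 2 (one for the
         constant itself, one to load it).  */
      case INTEGER_CST:
      case REAL_CST:
        *count += 2;
        break;

      /* Previously ignored: an array reference costs 2, a plain
         pointer dereference costs 1.  */
      case ARRAY_REF:
        *count += 2;
        break;
      case INDIRECT_REF:
        *count += 1;
        break;

      /* Previously ignored: a case label costs 2 for the control
         logic needed to reach it.  */
      case CASE_LABEL_EXPR:
        *count += 2;
        break;

      default:
        /* ... existing costs for other tree codes ...  */
        break;
      }
    return NULL_TREE;
  }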


With these modifications (tree-estimate.patch), I see the following results:

syntax.c
target        avg. ratio   avg. error   std. deviation
arm-none      1.68         29%          0.67
darwin-ppc    1.84         29%          0.69
linux-x86     1.62         26%          0.55

huffman.c
target        avg. ratio   avg. error   std. deviation
arm-none      1.75         26%          0.52
darwin-ppc    2.31         24%          0.69
linux-x86     1.95         25%          0.67

This appears to be a significant improvement in the accuracy of estimate_num_insns. However, note that because the avg. ratio has decreased significantly (estimates have gone up, since we're now counting things that were previously ignored), any heuristics based on instruction counts have effectively decreased by ~25-50%. To compensate, I bumped up each constant that is based on instruction estimates by 50% (see inl-costs.patch; a sketch follows).
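
For illustration, the compensation amounts to scaling defaults like this in params.def (the DEFPARAM macro and parameter name are real; the numbers here are made up, since the point is the 50% scaling rather than the specific values):

  /* Hypothetical before/after: if the old default were 100, bump it
     to 150 so the same functions remain inlinable now that each
     statement counts for more estimated insns.  */
  DEFPARAM (PARAM_MAX_INLINE_INSNS_AUTO,
            "max-inline-insns-auto",
            "The maximum number of instructions when automatically inlining",
            150, 0, 0)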

With both of these changes in place, I ran some simple SPEC2000 integer benchmarks (not fine-tuned, not reportable) and saw the following impact on performance (positive is better):

          darwin-ppc   linux-x86  linux-arm*
gzip      -4.7%        +11.7%     +1.3%
vpr       -0.7%        -1.1%      -----
mcf       -0.3%        +1.1%      -----
crafty    -0.8%        +1.8%      -----
parser    -----        -----      -----
eon       -4.4%        +0.3%      -----
perlbmk   +0.4%        0.0%       -1.1%
gap       -----        +1.0%      -2.9%
vortex    +4.1%        0.0%       0.0%
bzip2     +1.2%        +1.1%      -1.3%
twolf     -0.5%        -2.9%      -1.2%

----- = unable to obtain a valid result
* linux-arm results are based on a subset of SPEC data and command-line options suitable for a 32MB embedded board -- not even close to the full SPEC suite.


For what it's worth, code size is equal to or smaller for all benchmarks across all platforms.

So, here are the open issues I see at this point:
1. It appears that this change to estimate_num_insns generates a much better estimate of actual code size. However, the benchmark results are ambiguous. Is this patch worth considering as-is?
2. Increasing instruction weights makes the instruction-based limits (e.g., --param max-inline-insns-auto) effectively lower. However, changing these constants/defaults as in the second patch is a semantic change for anyone who sets these values on the command line. Is that change acceptable?
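
For example (value purely illustrative), someone who has tuned their build with:

  gcc -O3 --param max-inline-insns-auto=100 file.c

would get noticeably less inlining after the second patch than before it, since each statement now contributes more to the estimate; they would have to scale their hand-tuned value up by roughly 50% to preserve the old behavior.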


I do realize that this area will likely eventually be best served by target-dependent routines (and constants); however, I also see a significant benefit to all targets in fixing the default implementation first.

Thoughts? Advice?

Thanks -

Josh
~~~~
Josh Conner


Attachment: tree-estimate.patch
Description: Binary data

Attachment: inl-costs.patch
Description: Binary data


