
Re: RFC: improving estimate_num_insns



On Jul 12, 2005, at 1:07 PM, Steven Bosscher wrote:


> You don't say what compiler you used for these measurements. I suppose
> you used mainline?

Yes, I am working with the mainline.


> I think you should look at a lot more code than this.

OK - I stopped because I was seeing fairly consistent results. However, since both of these files were from the same source base, I can see how this may not be representative. I'll try some more examples from different source code, including C++.


> > Thinking that there may be room for improvement on this, I tried this
> > same experiment with a couple of adjustments to estimate_num_insns:
> > - Instead of ignoring constants, assign them a weight of 2
> > instructions (one for the constant itself and one to load into memory)



> Why the load into memory? Do you mean constants that must be loaded
> from the constant pool? I guess the cost for the constant itself depends
> on whether it is a legitimate constant or not, and how it is loaded, and
> this is all very target specific. But a cost greater than 0 probably
> makes sense.

Good point - I was thinking of constant pools when I wrote that, even though that isn't always the case.
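
To make concrete what I was picturing (illustration only, not from the patch, and of course target-dependent):

  /* A non-trivial constant is rarely free: it has to be materialized,
     often via a constant-pool load that may or may not fold into the
     using instruction.  Either way the cost is more than the zero we
     currently charge.  */
  double
  scale (double x)
  {
    return x * 3.14159265358979;  /* pool load + multiply on many targets */
  }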


> It would be nice if you retain the comment about constants in the
> existing code somehow.  The cost of constants is not trivial, and you
> should explain your choice better in a comment.

I'm not sure which comment you mean - the one from my email, or the one that was originally in the source code?


> > - Instead of ignoring case labels, assign them a weight of 2
> > instructions (to represent the cost of control logic to get there)


> This depends on how the switch statement is expanded, e.g. to a binary
> decision tree, or to a table jump, or to something smaller than that.
> So again (like everything else, sadly) this is highly target-specific
> and even context-specific. I'd think a cost of 2 is too pessimistic in
> most cases.

Do you mean that 2 is too high? I actually got (slightly) better statistical results with a cost of 3, but I had the same reaction (that it was too pessimistic), which is why I settled on 2.


> You could look at the code in stmt.c to see how switch statements are
> expanded. Maybe there is a cheap test you can do to make CASE_LABELs
> free for very small switch statements (i.e. the ones which should never
> hold you back from inlining the function containing them ;-).

I'll look into this.
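
For my own reference while I do, here is the kind of contrast I expect to find (toy example, not from any benchmark):

  /* A small, dense switch like this usually expands to a table jump or a
     short compare-and-branch sequence, so charging every CASE_LABEL two
     insns probably overshoots; a big, sparse switch becomes a deeper
     decision tree where two per case is closer to the truth.  Which form
     stmt.c picks depends on the target and the shape of the cases, which
     is what makes any single constant awkward.  */
  int
  classify (int c)
  {
    switch (c)
      {
      case 0:  return 1;
      case 1:  return 2;
      case 2:  return 4;
      case 3:  return 8;
      default: return 0;
      }
  }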


> > For what it's worth, code size is equal to or smaller for all
> > benchmarks across all platforms.


> What about the compile time?

Oh no! I didn't measure this. I will have a look.


> > So, here are the open issues I see at this point:
> > 1. It appears that this change to estimate_num_insns generates
> > a much better estimate of actual code size.  However, the benchmark
> > results are ambiguous.  Is this patch worth considering as-is?


> I would say you'd at least have to look into the ppc gzip and eon
> regressions before this is acceptable. But it is not my decision to
> make, of course.

Makes sense. I'll see what kind of time I can put into this.


> > 2. Increasing instruction weights makes the instruction-based limits
> > (e.g., --param max-inline-insns-auto) effectively lower.
> > However, changing these constants/defaults as in the second patch
> > will cause a semantic change for anyone who is setting these values
> > on the command line.  Is that change acceptable?


> This has constantly changed from one release to the next since GCC 3.3,
> so I don't think this should be a problem.

Whew...
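
To spell out what "effectively lower" means with made-up numbers: if the new weights push a function whose estimate used to be 80 insns up to around 120, then an existing --param max-inline-insns-auto=100 flips from accepting that function to rejecting it, even though nothing about the code changed. That is why the second patch adjusts the defaults alongside the new weights.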


> > Thoughts? Advice?


> ...on to advice then.


> First of all, I think you should handle ARRAY_RANGE_REF and ARRAY_REF
> the same. And you probably should make BIT_FIELD_REF more expensive,
> and maybe make its cost depend on which bit you're addressing (you can
> see that in operands 1 and 2 of the BIT_FIELD_REF). Its current cost
> of just 1 is probably too optimistic, just like the other cases you are
> trying to address with this patch.

Great - thanks for the suggestions, I'll try these changes as well and see if I get even better correlation.
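
On BIT_FIELD_REF: agreed that a cost of 1 looks optimistic. Pulling an awkwardly placed run of bits out of a word is essentially what a C bit-field read does, and that is rarely one instruction (toy example, not from the patch):

  /* Reading a field that is not byte-aligned generally needs a load plus
     a shift and a mask, and a store needs a read-modify-write sequence,
     so several insns on most targets.  */
  struct flags
  {
    unsigned a : 3;
    unsigned b : 7;
    unsigned c : 22;
  };

  unsigned
  get_b (const struct flags *p)
  {
    return p->b;   /* load + shift + mask on most targets */
  }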


> Second, you may want to add a target hook to return the cost of target
> builtins. Even builtins that expand to just one instruction are now
> counted as 16 insns plus the cost of the arguments list. This badly
> hurts when you use e.g. SSE intrinsics. It's probably not an issue for
> the benchmark you looked at now, but maybe you want to look into it
> anyway, while you're at it.

OK, I'll look into this.
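
That one I can already see mattering. Something like this (illustrative, assumes SSE2 and <emmintrin.h>) is roughly two instructions of real work, but with each builtin counted as 16 insns plus its arguments the helper looks enormous to the inliner:

  #include <emmintrin.h>

  /* Each intrinsic here typically expands to a single SSE2 instruction,
     yet the current estimate treats the underlying builtins like
     ordinary calls.  */
  static inline __m128d
  scaled_add (__m128d x, __m128d y, __m128d s)
  {
    return _mm_add_pd (_mm_mul_pd (x, s), y);
  }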


> Third,

> > (measured at -O3 with inlining disabled):

> Then why not just -O2 with inlining disabled?  Now you have enabled
> loop unswitching, which is known to sometimes significantly increase
> code size.

Well, I thought that inlining estimates were most likely to be relevant at -O3, so I wanted to include as many of the other optimizations as are likely to be performed alongside inlining, even though this may make it more difficult to get accurate estimates.
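
I can redo the size comparison both ways, something along these lines:

  gcc -O3 -fno-inline -fno-inline-functions -c foo.c    (close to what I measured)
  gcc -O2 -fno-inline -c foo.c                          (no unswitching in the picture)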


> Fourth, look at _much_ more code than this. I would especially look at
> a lot more C++ code, which is where our inliner heuristics can really
> dramatically improve or destroy performance.

OK.


> See Richard Guenther's
> inline heuristics tweaks from earlier this year, in the thread starting
> here: http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01936.html.

I remember reading this when it was first posted, but looking at it again after my experiments I see its additional relevance, especially the decision to assign a function call a cost of 16, which I quickly identified as hurting our size estimates but which also improves inlining performance!


> Finally, you've apparently used grep to find all the places where
> PARAM_MAX_INLINE_INSNS_SINGLE and its friends are used, but you have
> missed the ones in opts.c and maybe elsewhere.

Hmmm - I looked for all of the places where estimate_num_insns was called. I still don't see anything in opts.c -- can you give me a little more of a hint?


> Good luck, and thanks for working on this difficult stuff.

Thank you for the helpful comments!


- Josh

