
Re: RFC: improving estimate_num_insns



On Jul 12, 2005, at 1:07 PM, Steven Bosscher wrote:


> You don't say what compiler you used for these measurements. I suppose
> you used mainline?

Yes, I am working with the mainline.


> I think you should look at a lot more code than this.

OK - I stopped because I was seeing fairly consistent results. However, since both of these files were from the same source base, I can see how this may not be representative. I'll try some more examples from different source code, including C++.


> > Thinking that there may be room for improvement on this, I tried this
> > same experiment with a couple of adjustments to estimate_num_insns:
> > - Instead of ignoring constants, assign them a weight of 2
> > instructions (one for the constant itself and one to load into memory)



> Why the load into memory? Do you mean constants that must be loaded
> from the constant pool? I guess the cost for the constant itself depends
> on whether it is a legitimate constant or not, and how it is loaded, and
> this is all very target specific. But a cost greater than 0 probably
> makes sense.

Good point - I was thinking of constant pools when I wrote that, even though that isn't always the case.
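
To make concrete what I was picturing (illustration only, not from the patch, and of course target-dependent):

  /* A non-trivial constant is rarely free: it has to be materialized,
     often via a constant-pool load that may or may not fold into the
     using instruction.  Either way the cost is more than the zero we
     currently charge.  */
  double
  scale (double x)
  {
    return x * 3.14159265358979;  /* pool load + multiply on many targets */
  }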


> It would be nice if you retain the comment about constants in the
> existing code somehow.  The cost of constants is not trivial, and you
> should explain your choice better in a comment.

I'm not sure which comment you mean - the one from my email, or the one that was originally in the source code?


> > - Instead of ignoring case labels, assign them a weight of 2
> > instructions (to represent the cost of control logic to get there)


> This depends on how the switch statement is expanded, e.g. to a binary
> decision tree, or to a table jump, or to something smaller than that.
> So again (like everything else, sadly) this is highly target-specific
> and even context-specific. I'd think a cost of 2 is too pessimistic in
> most cases.

Do you mean that 2 is too high? I actually got (slightly) better statistical results with a cost of 3, but I had the same reaction (that it was too pessimistic), which is why I settled on 2.


> You could look at the code in stmt.c to see how switch statements are
> expanded. Maybe there is a cheap test you can do to make CASE_LABELs
> free for very small switch statements (i.e. the ones which should never
> hold you back from inlining the function containing them ;-).

I'll look into this.
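
For my own reference while I do, here is the kind of contrast I expect to find (toy example, not from any benchmark):

  /* A small, dense switch like this usually expands to a table jump or a
     short compare-and-branch sequence, so charging every CASE_LABEL two
     insns probably overshoots; a big, sparse switch becomes a deeper
     decision tree where two per case is closer to the truth.  Which form
     stmt.c picks depends on the target and the shape of the cases, which
     is what makes any single constant awkward.  */
  int
  classify (int c)
  {
    switch (c)
      {
      case 0:  return 1;
      case 1:  return 2;
      case 2:  return 4;
      case 3:  return 8;
      default: return 0;
      }
  }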


> > For what it's worth, code size is equal to or smaller for all
> > benchmarks across all platforms.


> What about the compile time?

Oh no! I didn't measure this. I will have a look.


> > So, here are the open issues I see at this point:
> > 1. It appears that this change to estimate_num_insns generates
> > a much better estimate of actual code size.  However, the benchmark
> > results are ambiguous.  Is this patch worth considering as-is?


> I would say you'd at least have to look into the ppc gzip and eon
> regressions before this is acceptable. But it is not my decision to
> make, of course.

Makes sense. I'll see what kind of time I can put into this.


> > 2. Increasing instruction weights makes the instruction-based limits
> > (e.g., --param max-inline-insns-auto) effectively lower.
> > However, changing these constants/defaults as in the second patch
> > will cause a semantic change for anyone who is setting these values
> > on the command line.  Is that change acceptable?


> This has constantly changed from one release to the next since GCC 3.3,
> so I don't think this should be a problem.

Whew...
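
To spell out what "effectively lower" means with made-up numbers: if the new weights push a function whose estimate used to be 80 insns up to around 120, then an existing --param max-inline-insns-auto=100 flips from accepting that function to rejecting it, even though nothing about the code changed. That is why the second patch adjusts the defaults alongside the new weights.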


> > Thoughts? Advice?


> ...on to advice then.


> First of all, I think you should handle ARRAY_RANGE_REF and ARRAY_REF
> the same. And you probably should make BIT_FIELD_REF more expensive,
> and maybe make its cost depend on which bit you're addressing (you can
> see that in operands 1 and 2 of the BIT_FIELD_REF). Its current cost
> of just 1 is probably too optimistic, just like the other cases you are
> trying to address with this patch.

Great - thanks for the suggestions, I'll try these changes as well and see if I get even better correlation.
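
On BIT_FIELD_REF: agreed that a cost of 1 looks optimistic. Pulling an awkwardly placed run of bits out of a word is essentially what a C bit-field read does, and that is rarely one instruction (toy example, not from the patch):

  /* Reading a field that is not byte-aligned generally needs a load plus
     a shift and a mask, and a store needs a read-modify-write sequence,
     so several insns on most targets.  */
  struct flags
  {
    unsigned a : 3;
    unsigned b : 7;
    unsigned c : 22;
  };

  unsigned
  get_b (const struct flags *p)
  {
    return p->b;   /* load + shift + mask on most targets */
  }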


> Second, you may want to add a target hook to return the cost of target
> builtins. Even builtins that expand to just one instruction are now
> counted as 16 insns plus the cost of the arguments list. This badly
> hurts when you use e.g. SSE intrinsics. It's probably not an issue for
> the benchmark you looked at now, but maybe you want to look into it
> anyway, while you're at it.

OK, I'll look into this.
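
That one I can already see mattering. Something like this (illustrative, assumes SSE2 and <emmintrin.h>) is roughly two instructions of real work, but with each builtin counted as 16 insns plus its arguments the helper looks enormous to the inliner:

  #include <emmintrin.h>

  /* Each intrinsic here typically expands to a single SSE2 instruction,
     yet the current estimate treats the underlying builtins like
     ordinary calls.  */
  static inline __m128d
  scaled_add (__m128d x, __m128d y, __m128d s)
  {
    return _mm_add_pd (_mm_mul_pd (x, s), y);
  }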


> Third,

> > (measured at -O3 with inlining disabled):

> Then why not just -O2 with inlining disabled?  Now you have enabled
> loop unswitching, which is known to sometimes significantly increase
> code size.

Well, I thought that inlining estimates were most likely to be relevant at -O3, so I wanted to include as many of the other optimizations as are likely to be performed alongside inlining, even though this may make it more difficult to get accurate estimates.
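
I can redo the size comparison both ways, something along these lines:

  gcc -O3 -fno-inline -fno-inline-functions -c foo.c    (close to what I measured)
  gcc -O2 -fno-inline -c foo.c                          (no unswitching in the picture)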


> Fourth, look at _much_ more code than this. I would especially look at
> a lot more C++ code, which is where our inliner heuristics can really
> dramatically improve or destroy performance.

OK.


> See Richard Guenther's
> inline heuristics tweaks from earlier this year, in the thread starting
> here: http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01936.html.

I remember reading this when it was first posted, but looking at it again after my experiments I see its additional relevance, especially the decision to assign a function call a cost of 16, which I quickly identified as hurting our size estimates but which also improves inlining performance!


> Finally, you've apparently used grep to find all the places where
> PARAM_MAX_INLINE_INSNS_SINGLE and its friends are used, but you have
> missed the ones in opts.c and maybe elsewhere.

Hmmm - I looked for all of the places where estimate_num_insns was called. I still don't see anything in opts.c -- can you give me a little more of a hint?


> Good luck, and thanks for working on this difficult stuff.

Thank you for the helpful comments!


- Josh

