This is the mail archive of the
mailing list for the GCC project.
Re: Suboptimal bb ordering with -Os on arm
On Fri, Nov 11, 2016 at 02:16:18AM +0100, Nicolai Stange wrote:
> >> From the discussion on gcc-patches of what is now the aforementioned
> >> r228318 ("bb-reorder: Add -freorder-blocks-algorithm= and wire it up"),
> >> it is not clear to me whether this change can actually reduce code size
> >> beyond those 0.1% given there for -Os.
> > There is r228692 as well.
> Ok, summarizing, that changelog says that the simple algorithm
> potentially produced even bigger code with -Os than stc did. From that
> commit on, this remains true only on x86 and mn10300. Right?
x86 and mn10300 use STC at -Os by default.
> >> So first question:
> >> Do you guys know of any code where there are more significant code size
> >> savings achieved?
> > For -O2 it is ~15%, which matters a lot for targets where STC isn't faster
> > at all (targets without cache / with tiny cache / with only cache memory).
> If I understand you correctly, this means that there is a use case for
> having -O2 -freorder-blocks-algorithm=simple, right?
Yes, that is why I wrote this code at all :-)
(And then it turned out to be actually *bigger* at -Os, so I fixed that).
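
For anyone who wants to try the combinations discussed here, a minimal sketch of the command lines follows. Only the -O2/-Os levels and the -freorder-blocks-algorithm= values (simple, stc) come from this thread; the helper name gcc_command, the file names, and the default compiler name are assumptions for illustration. For arm you would presumably pass your own cross-compiler as `compiler`.

```python
import shlex

# The two values of -freorder-blocks-algorithm= discussed in this thread.
ALGORITHMS = ("simple", "stc")


def gcc_command(opt_level, algorithm, source="test.c", output=None, compiler="gcc"):
    """Build (but do not run) a compile command for one opt level / algorithm pair.

    The source and output file names and the compiler default are
    hypothetical placeholders, not anything taken from the thread.
    """
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown block reordering algorithm: {algorithm}")
    if output is None:
        # e.g. "test-O2-simple.o"
        output = f"test-{opt_level.lstrip('-')}-{algorithm}.o"
    return [
        compiler,
        opt_level,
        f"-freorder-blocks-algorithm={algorithm}",
        "-c",
        source,
        "-o",
        output,
    ]


if __name__ == "__main__":
    # Print all four combinations so they can be pasted into a shell.
    for opt in ("-O2", "-Os"):
        for algo in ALGORITHMS:
            print(shlex.join(gcc_command(opt, algo)))
```

The resulting object files can then be compared with whatever size tool you normally use; the sketch deliberately stops at printing the commands.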
> >> And second question:
> >> If that isn't the case, would it possibly make sense to partly revert
> >> gcc's behaviour and set -freorder-blocks-algorithm=stc at -Os?
> > -Os does many other things that are slower but smaller as well.
> Sure. Let me restate my original question: assume for a moment that it
> is true that -Os with simple never produces code more than 0.1% smaller
> than what is created by -Os with stc. I haven't got any idea what the "other
> things" are able to achieve w.r.t. code size savings, but to me, 0.1%
> doesn't appear to be that huge. Don't get me wrong: I *really* can't
> judge whether 0.1% is a significant improvement or not. I'm just
> assuming that it's not. With this assumption, the question of whether
> those saved 0.1% are really worth the significantly decreased
> performance encountered in some situations seemed just natural...
It all depends on the tradeoff you want. There are many knobs you can
turn -- for example the inlining params, which have quite some effect.
-Os is mostly -O2 except those things that increase code size.
What is the tradeoff in your case? What is a realistic number for the
slowdown of your kernel? Do you see hotspots in there that should be
handled better anyhow? Etc.
> No, I want small, possibly at the cost of performance to the extent of
> what's sensible. What sensible actually is is what my question is about.
It is different for every use case I'm afraid.
> So, summarizing, I'm not asking whether I should use -O2 or -Os or
> whatever, but whether the current behaviour I'm seeing with -Os is
> intended/expected quantitatively.
With simple you get smaller code than with STC, so -Os uses simple.
If that is ridiculously slower then you won't hear me complaining if
you propose defaulting it the other way; but you haven't shown any
convincing numbers yet?
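
To put numbers like the 0.1% and ~15% mentioned above on the same footing, the size side of the tradeoff can be expressed as a relative saving. The sketch below is only the arithmetic; the function name percent_smaller and the byte counts in the example are hypothetical and are not measurements from this thread.

```python
def percent_smaller(size_simple, size_stc):
    """Return how much smaller the 'simple' build is than the 'stc' build, in percent.

    Both arguments are code sizes in bytes (for example the text size of
    each build). A negative result means 'simple' came out bigger.
    """
    if size_stc <= 0:
        raise ValueError("size_stc must be positive")
    return 100.0 * (size_stc - size_simple) / size_stc


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the two magnitudes
    # that come up in the discussion.
    examples = (
        ("small saving", 99_900, 100_000),
        ("large saving", 85_000, 100_000),
    )
    for label, simple, stc in examples:
        print(f"{label}: simple is {percent_smaller(simple, stc):.1f}% smaller than stc")
```

A matching slowdown figure from a realistic workload would be needed alongside this before the two can be weighed against each other.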