This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Suboptimal bb ordering with -Os on arm


Hi Segher,

thanks for your prompt reply!


Segher Boessenkool <segher@kernel.crashing.org> writes:

> On Fri, Nov 11, 2016 at 12:03:44AM +0100, Nicolai Stange wrote:
>> in the course of doing some benchmarks on arm with -Os, I noticed that
>> some list traversal code became significantly slower since gcc 5.3 when
>> instruction caches are cold.
>
> But is it smaller?  This tiny example function is not, but on average?

The Linux kernel's .text for my config at hand is ~0.1% smaller
with simple than with stc.
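
A relative difference like this can be derived from the "text" column that binutils `size` prints for each build. The following is only a minimal sketch, assuming Berkeley-format `size` output; the helper names and the two sample outputs are hypothetical, and the numbers are made up for illustration, not measurements from this thread:

```python
def text_size(size_output: str) -> int:
    """Return the 'text' column from Berkeley-format `size` output."""
    lines = [ln for ln in size_output.splitlines() if ln.strip()]
    header = lines[0].split()   # e.g. text, data, bss, dec, hex, filename
    values = lines[1].split()   # first (and only) object in this sketch
    return int(values[header.index("text")])


def relative_saving(size_simple: int, size_stc: int) -> float:
    """Percentage by which the simple build is smaller than the stc build."""
    return 100.0 * (size_stc - size_simple) / size_stc


# Hypothetical `size` output for two builds; all numbers are made up.
SIMPLE_BUILD = (
    "   text\t   data\t    bss\t    dec\t    hex\tfilename\n"
    " 999000\t  50000\t  20000\t1069000\t 104fc8\tvmlinux-simple\n"
)
STC_BUILD = (
    "   text\t   data\t    bss\t    dec\t    hex\tfilename\n"
    "1000000\t  50000\t  20000\t1070000\t 1053b0\tvmlinux-stc\n"
)

saving = relative_saving(text_size(SIMPLE_BUILD), text_size(STC_BUILD))
print(f"simple is {saving:.2f}% smaller than stc")
```

A positive result means the simple build has the smaller .text; a negative one means stc produced the smaller code.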

I gave this tiny example only to demonstrate the bb ordering issue I was
talking about. Of course, it's made up. So in particular it was not
meant to show anything related to code size.


> If you care about speed instead of size, you should not use -Os.

Indeed.


>> That being said, I could certainly go and submit a patch to the Linux
>> kernel setting -freorder-blocks-algorithm=stc for the -Os case.
>
> Or do not set CONFIG_CC_OPTIMIZE_FOR_SIZE in your kernel config.

Yes, of course.


>> From the discussion on gcc-patches [1] of what is now the aforementioned
>> r228318 ("bb-reorder: Add -freorder-blocks-algorithm= and wire it up"),
>> it is not clear to me whether this change can actually reduce code size
>> beyond those 0.1% given there for -Os.
>
> There is r228692 as well.

Ok, summarizing, that changelog says that the simple algorithm
potentially produced even bigger code with -Os than stc did. From that
commit on, this remains true only on x86 and mn10300. Right?


>> So first question:
>> Do you guys know of any code where there are more significant code size
>> savings achieved?
>
> For -O2 it is ~15%, which matters a lot for targets where STC isn't faster
> at all (targets without cache / with tiny cache / with only cache memory).

If I understand you correctly, this means that there is a use case for
having -O2 -freorder-blocks-algorithm=simple, right?

My question is about whether switching the default algorithm for -Os
might make sense, cf. below.
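
For comparing the combinations under discussion, the block reordering algorithm can be chosen explicitly with -freorder-blocks-algorithm=simple or -freorder-blocks-algorithm=stc, independently of the optimization level. The following is only a sketch that prints the four command lines; the compiler name `gcc`, the source file `list.c`, and the output naming scheme are assumptions made for illustration (a cross build for arm would use the appropriate cross compiler instead):

```python
ALGORITHMS = ("simple", "stc")


def gcc_command(opt_level: str, algorithm: str,
                cc: str = "gcc", source: str = "list.c") -> list[str]:
    """Build a compile-only command line for one level/algorithm pair.

    `cc` and `source` are placeholders; substitute the real compiler
    and source file.
    """
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown reorder-blocks algorithm: {algorithm}")
    output = f"list{opt_level}-{algorithm}.o"   # e.g. list-Os-stc.o
    return [cc, opt_level,
            f"-freorder-blocks-algorithm={algorithm}",
            "-c", source, "-o", output]


# Print one command per combination of optimization level and algorithm.
for opt_level in ("-Os", "-O2"):
    for algorithm in ALGORITHMS:
        print(" ".join(gcc_command(opt_level, algorithm)))
```

Running `size` on the resulting objects then gives the numbers needed to compare code size per combination.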


>> And second question:
>> If that isn't the case, would it possibly make sense to partly revert
>> gcc's behaviour and set -freorder-blocks-algorithm=stc at -Os?
>
> -Os does many other things that are slower but smaller as well.

Sure. Let me restate my original question: assume for a moment that it
is true that -Os with simple never produces code that is more than 0.1%
smaller than what -Os with stc creates. I have no idea what the "other
things" are able to achieve w.r.t. code size savings, but to me, 0.1%
doesn't appear to be that huge. Don't get me wrong: I *really* can't
judge whether 0.1% is a significant improvement or not. I'm just
assuming that it's not. With this assumption, the question of whether
those saved 0.1% are really worth the significantly decreased
performance encountered in some situations seemed only natural...


> There is no way to ask for somewhat fast and somewhat small at the
> same time, which seems to be what you want?

No, I want small, possibly at the cost of performance, to the extent
that is sensible. What "sensible" actually means is what my question is
about.

Example: A (hypothetical) code size saving of 0.00000000001% at the cost
of 10000000000x slower code certainly isn't sensible. But 0.1% at the
cost of some additional 0.5us here and there -- no clue.

So, summarizing, I'm not asking whether I should use -O2 or -Os or
whatever, but whether the current behaviour I'm seeing with -Os is
intended/expected quantitatively.


Thank you!

Nicolai

