Would it make sense to use __attribute__ ((noinline)) for functions like _M_realloc_insert?

Mon Aug 24 09:19:52 GMT 2020

> Could you share which version of gcc you are testing with, what flags you are using, and ideally even some code to reproduce this? With current gcc, I don't see that happening very often.

I stumbled upon this while looking at the generated code with godbolt.org. I tried GCC 9.2 and 10.2. Flags were -std=c++17 -O2, architecture was amd64.

The code where I saw this was

https://godbolt.org/z/oG9EMM
// -------------------------------------------------

#include <vector>

struct Foo {
    Foo(int a, int b) : a(a), b(b) {}
    int a;
    int b;
};

class Bar {
public:
    void test(int a, int b);

private:
    std::vector<Foo> m_fooVec;
};

void Bar::test(int a, int b) {
    m_fooVec.emplace_back(a, b);
}

// -------------------------------------------------

> Ideally, the inliner would already be clever enough not to inline the function except in special cases.

After writing this mail, I found out that it only happens if the function is called just once in the translation unit. So I think it's not that big of a problem. I wish we could use LTCG/LTO with our project. Maybe I should give that a try and see how badly it affects our build times. But since our link times are already kind of high, I don't have high hopes.

Maybe it would also be worth grepping over our code to see where we have push_back/emplace_back/insert in header files. Those would have a high probability of creating just one call to a particular specialization of _M_realloc_insert per translation unit, but in multiple translation units.

> Does profile-guided optimization help with your application?

I guess it probably would, if we were able to do it right. It's a huge product, though, and I don't think we'll be able to use PGO in the near future (or at all). As far as I understand, PGO relies heavily on the profiling workload covering all common use cases in a realistic fashion, and that's something that we simply cannot do.

Anyway, I understand your reasoning. When I wrote my mail I didn't know yet that the behavior I was seeing was only triggered by the more aggressive inlining for functions that are called only once per TU. And I didn't think of some of the things you mentioned.

Just out of curiosity, going back to...

> If someone inserts an element in a newly created vector, it does make sense to inline _M_realloc_insert, especially if we want any hope of making some small local vectors use the stack, or removing some unused small vectors.

Is that something that we can realistically expect in the next few years? I've seen manual new/delete calls removed by the optimizer in very simple examples, but I've never seen that happen with any kind of container. Seems like as soon as std::allocator is involved, the calls will not be optimized out.
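To make the comparison concrete, here is a minimal sketch of the two cases meant above. The function names sum_with_new and sum_with_vector are hypothetical and exist only for illustration. With -O2, a compiler may reduce the first function to a plain addition; whether the second is simplified in the same way depends on the compiler and its version, so the generated code should be checked (for example on godbolt.org) rather than assumed.

```cpp
#include <cassert>
#include <vector>

// Manual new/delete: the allocation never escapes the function, so an
// optimizer may remove it entirely. That is permitted, not guaranteed.
int sum_with_new(int a, int b) {
    int* p = new int(a);
    int result = *p + b;
    delete p;
    return result;
}

// Same logic through a container: the allocation goes through
// std::allocator and the vector's growth path.
int sum_with_vector(int a, int b) {
    std::vector<int> v;
    v.push_back(a);
    assert(v.size() == 1);  // sanity check only; compiled out with NDEBUG
    return v[0] + b;
}
```

Both functions return the same value; the only interesting difference is in the generated assembly.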

Regards,
Paul

-----Original Message-----
From: Marc Glisse <marc.glisse@inria.fr>
Sent: Sunday, 23 August 2020 21:55
To: Groke, Paul <paul.groke@dynatrace.com>
Cc: libstdc++@gcc.gnu.org
Subject: Re: Would it make sense to use __attribute__ ((noinline)) for functions like _M_realloc_insert?

On Fri, 21 Aug 2020, Groke, Paul via Libstdc++ wrote:

> I've recently noticed that GCC is inlining vector::_M_realloc_insert
> into some of my functions that call vector::emplace_back.

Could you share which version of gcc you are testing with, what flags you are using, and ideally even some code to reproduce this? With current gcc, I don't see that happening very often.

> IMO that doesn't make a lot of sense - reallocation is usually so
> expensive that an extra function call doesn't really matter. And it
> bloats the code, which has two drawbacks: the binary gets bigger and
> the fast path gets slower (less efficient pre-fetching/more cache
> thrashing).
>
> What do you think about marking such functions __attribute__ ((noinline))?

That's a possibility, but I'd rather avoid it if possible. If someone inserts an element in a newly created vector, it does make sense to inline _M_realloc_insert, especially if we want any hope of making some small local vectors use the stack, or removing some unused small vectors.

Ideally, the inliner would already be clever enough not to inline the function except in special cases. It is rather large, calls other functions, is called with low probability (gcc guesses it as 17% on a simple example; using profile-guided optimization would lower that number if the vectors are usually large), etc. This code isn't doing anything unusual (not like std::function or std::any), so if the inliner behaves badly there, maybe a heuristic needs some tweaking.

> In our own code, we've seen measurable improvements (size & execution speed) by splitting the "slow path" out into helper functions and making those helper functions noinline.

Yes, that's a common strategy, and it would indeed be good to make sure that gcc does the right thing for std::vector.
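For reference, the strategy being discussed can be sketched as follows. This is an illustration only, not libstdc++ code: IntBuffer and grow are hypothetical names, and the doubling growth policy with an initial capacity of 4 is an assumption made for the example. The fast path stays small enough to inline at every call site, while the reallocating slow path is forced out of line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical minimal growable buffer of int (illustration only).
class IntBuffer {
public:
    IntBuffer() = default;
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;
    ~IntBuffer() { std::free(m_data); }

    // Fast path: one compare and one store when there is spare capacity.
    void push_back(int value) {
        if (m_size == m_capacity)
            grow();  // rare slow path, kept out of line
        m_data[m_size++] = value;
    }

    std::size_t size() const { return m_size; }
    std::size_t capacity() const { return m_capacity; }

    int operator[](std::size_t i) const {
        assert(i < m_size);
        return m_data[i];
    }

private:
    // Slow path: reallocation costs far more than a call, so inlining
    // it would only enlarge every caller of push_back.
    __attribute__((noinline)) void grow() {
        // Assumed policy: start at 4 elements, then double.
        std::size_t newCapacity = m_capacity ? m_capacity * 2 : 4;
        void* p = std::realloc(m_data, newCapacity * sizeof(int));
        if (!p)
            throw std::bad_alloc();
        m_data = static_cast<int*>(p);
        m_capacity = newCapacity;
    }

    int* m_data = nullptr;
    std::size_t m_size = 0;
    std::size_t m_capacity = 0;
};
```

Using realloc is only valid here because int is trivially copyable; a generic container would have to move or copy elements one by one.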

> (I've also noticed that there's no __builtin_expect for the fast path,
> but that's a different topic and IMO far less important.)

Inlining takes probabilities into account, so this may be strongly related. However, gcc already guesses that the fast path is the most likely one, and it isn't clear that making the probability of the slow path too low is a good idea.
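As an illustration of the kind of hint under discussion, here is a small sketch using the GCC/Clang builtin __builtin_expect. SmallLog, append and handle_full are hypothetical names, and the fixed capacity of 8 is an assumption for the example. The hint only tells the compiler which branch to treat as unlikely; it does not change the program's behavior.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity log (illustration only).
struct SmallLog {
    static constexpr std::size_t kCapacity = 8;  // assumed size
    int entries[kCapacity];
    std::size_t count = 0;
    std::size_t dropped = 0;
};

// Cold path: kept out of line so it does not enlarge callers.
__attribute__((noinline)) void handle_full(SmallLog& log) {
    ++log.dropped;
}

inline void append(SmallLog& log, int value) {
    // Second argument 0 means the condition is expected to be false,
    // i.e. the "log is full" branch is marked as unlikely.
    if (__builtin_expect(log.count == SmallLog::kCapacity, 0)) {
        handle_full(log);
        return;
    }
    assert(log.count < SmallLog::kCapacity);
    log.entries[log.count++] = value;
}
```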

Does profile-guided optimization help with your application?

--
Marc Glisse
