Bug 94960 - extern template prevents inlining of standard library objects
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 9.1.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2020-05-05 20:09 UTC by krzysio.kurek
Modified: 2022-02-18 00:53 UTC
CC List: 6 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-05-05 00:00:00


Description krzysio.kurek 2020-05-05 20:09:58 UTC
Consider this example:
#include <string>

void foo()
{
  std::string(1, 0);
}
(https://godbolt.org/z/AlkBBJ)
This function creates a string using the `basic_string(size_t, CharT)` constructor and then discards it. The constructor calls _M_construct internally, which is defined out of line and is not marked `inline`. Because of this, once the compiler has seen `extern template class basic_string<char>;`, it does not look for the definition of _M_construct and generates a call to it instead. As a result, foo() fully constructs a string object and then destroys it, since the body of _M_construct is not available within the translation unit.

This problem applies to every member function that is defined out of line and not marked `inline`, in any class template that has an extern template declaration.
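A reduced sketch of the pattern described above, without <string>. The names Buffer, fill and make_and_drop are hypothetical stand-ins (fill plays the role of _M_construct), and this is not the libstdc++ code:

```cpp
#include <cstddef>

// Hypothetical stand-in for basic_string; not the libstdc++ implementation.
template <typename T>
class Buffer {
public:
    // Defined inside the class, so implicitly inline.
    Buffer(std::size_t n, T value) : size_(n < 16 ? n : 16) { fill(value); }
    std::size_t size() const { return size_; }

private:
    void fill(T value);  // defined out of line below, NOT marked inline
    T data_[16];
    std::size_t size_;
};

// Out-of-line, non-inline member: plays the role of _M_construct.
template <typename T>
void Buffer<T>::fill(T value) {
    for (std::size_t i = 0; i < size_; ++i)
        data_[i] = value;
}

// Explicit instantiation declaration, as a library header would provide.
extern template class Buffer<char>;

// Caller comparable to foo() in the report: builds an object and drops it.
std::size_t make_and_drop() {
    Buffer<char> b(4, 'x');
    return b.size();
}

// Explicit instantiation definition. In a real library this line would live
// in a separate source file; it is kept here only so the sketch links.
template class Buffer<char>;
```

To observe the effect from the report, the last line would have to move to its own source file. With it in the same translation unit, the body of fill is available to the optimizer anyway.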
Comment 1 Andrew Pinski 2020-05-05 21:02:33 UTC
g:1a289fa36294627c252492e4c18d7877a7c80dc1 changed that.
Comment 2 Jonathan Wakely 2020-05-05 21:08:31 UTC
Please provide complete testcases, not just URLs, as required by https://gcc.gnu.org/bugs

#include <string>

int main()
{
    std::string(size_t(0), 0);
}


I still think it's wrong for GCC to treat the 'inline' specifier as an inlining hint. The compiler should be a better judge of inlining decisions than the developer.

(In reply to Andrew Pinski from comment #1)
> g:1a289fa36294627c252492e4c18d7877a7c80dc1 changed that.

Well that commit just meant that the explicit instantiations are declared for C++17 as well, where previously they were only declared for < C++17. It didn't add the explicit instantiations.
Comment 3 Erich Keane 2020-05-06 00:19:53 UTC
(In reply to Jonathan Wakely from comment #2)
> Please provide complete testcases, not just URLs, as required by
> https://gcc.gnu.org/bugs
> 
> #include <string>
> 
> int main()
> {
>     std::string(size_t(0), 0);
> }
> 
> 
> I still think it's wrong for GCC to treat the 'inline' specifier as an
> inlining hint. The compiler should be a better judge of inlining decisions
> than the developer.
> 
> (In reply to Andrew Pinski from comment #1)
> > g:1a289fa36294627c252492e4c18d7877a7c80dc1 changed that.
> 
> Well that commit just meant that the explicit instantiations are declared
> for C++17 as well, where previously they were only declared for < C++17. It
> didn't add the explicit instantiations.

Hi Jon!
I helped the submitter in #llvm debug this a little, so I perhaps have a better understanding of his issue:

As you know, "extern template" is a hint to the compiler that we don't need to emit the template as a way to save on compile time.

Both GCC and clang will NOT instantiate these templates in O0 mode.  However, in O1+ modes, both will actually still instantiate the templates in the frontend, BUT only for 'inline' functions.  Basically, we're using 'inline' as a heuristic that there is benefit in sending these functions to the optimizer (basically, sacrificing the compile time gained by 'extern template' in exchange for a better inlining experience).

In the submitter's case, the std::string constructor calls "_M_construct".  The constructor is inlined, but _M_construct is not, since it never gets to the optimizer.

libc++ uses an __init function to do the same thing as _M_construct, however IT is marked inline, and thus doesn't have the problem.

I believe the submitter wants to have you mark more of the functions in extern-templated classes 'inline' so that it matches the heuristic better.

I don't think that there is a good way to change the compiler itself without making 'extern template' absolutely meaningless.
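A minimal sketch of what marking the helper 'inline' looks like, assuming the heuristic works as described in this comment. Text, init, filled_length and first_char are hypothetical names; this is neither the libc++ nor the libstdc++ source:

```cpp
#include <cstddef>

// Hypothetical string-like class; init plays the role of libc++'s __init.
template <typename CharT>
class Text {
public:
    Text(std::size_t n, CharT c) : len_(n < 15 ? n : 15) { init(c); }
    std::size_t length() const { return len_; }
    CharT at(std::size_t i) const { return buf_[i]; }

private:
    void init(CharT c);
    CharT buf_[16];
    std::size_t len_;
};

// Still defined out of line, but declared 'inline'. Under the heuristic
// described above, the frontend keeps instantiating it at -O1 and higher
// despite the extern template declaration below.
template <typename CharT>
inline void Text<CharT>::init(CharT c) {
    for (std::size_t i = 0; i < len_; ++i)
        buf_[i] = c;
    buf_[len_] = CharT();  // terminator, len_ is at most 15
}

extern template class Text<char>;

std::size_t filled_length() {
    Text<char> t(3, 'a');
    return t.length();
}

char first_char() {
    Text<char> t(3, 'a');
    return t.at(0);
}

// Normally provided by the library in another translation unit.
template class Text<char>;
```

The only difference from a non-inline helper is the keyword on the out-of-line definition. Whether the call is actually inlined is still up to the optimizer.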
Comment 4 Richard Biener 2020-05-06 07:05:03 UTC
I guess the C++ FE could honor -finline-functions and consider all functions having the 'inline' hint in that case.  I'm not sure how wide-spread
explicit instantiations are and what compile-time (and size?) hit we get
when doing the instantiations always.

That is, is the middle-end smart enough to not emit out-of-line instances
for the inline instantiated extern template parts?  Off the top of my head
I'm not aware of any middle-end flagging of this?
Comment 5 Jonathan Wakely 2020-05-06 09:11:09 UTC
(In reply to Erich Keane from comment #3)
> As you know, "extern template" is a hint to the compiler that we don't need
> to emit the template as a way to save on compile time.
> 
> Both GCC and clang will NOT instantiate these templates in O0 mode. 
> However, in O1+ modes, both will actually still instantiate the templates in
> the frontend, BUT only for 'inline' functions.  Basically, we're using
> 'inline' as a heuristic that there is benefit in sending these functions to
> the optimizer (basically, sacrificing the compile time gained by 'extern
> template' in exchange for a better inlining experience).

Hmm, I've seen different behaviours for clang and g++ in this respect, with clang inlining a lot more of std::string's members. So I'm surprised they use the same heuristic.

Do they both instantiate the function templates marked 'inline' even at -O1? Presumably not at -O0.

> In the submitter's case, the std::string constructor calls "_M_construct". 
> The constructor is inlined, but _M_construct is not, since it never gets to
> the optimizer.
> 
> libc++ uses an __init function to do the same thing as _M_construct, however
> IT is marked inline, and thus doesn't have the problem.
> 
> I believe the submitter wants to have you mark more of the functions in
> extern-templated classes 'inline' so that it matches the heuristic better.

And that's what I don't want to do. I think it's wrong for the human to say "inline this!" because humans are stupid (well, I am anyway). And I don't want to have to examine the GIMPLE/asm again for every new GCC release to decide whether 'inline' is still in the right places (and whether the answer should be different for every different version of Clang or ICC!)

And when I say "I don't want to" I mean "I am never ever going to".

> I don't think that there is a good way to change the compiler itself without
> making 'extern template' absolutely meaningless.

I absolutely disagree.

It would still give a reduction in object file size for cases where the compiler decides not to inline, and still make compilation much faster for -O0 and -O1.

One property of -O2 and -O3 is that we try to optimize aggressively even if that takes a long time to compile. So we could instantiate things that have an explicit instantiation declaration (thus doing "redundant" work) to see if inlining them would be beneficial. That would take longer to compile, but might produce faster code. If the heuristics decide the instantiation ends up too big to inline, it could just discard it (because we know there's a definition elsewhere).

If the only way to get that is to mark every function as 'inline' (and then "trick" the compiler into doing all that extra work even at -O1?) then we might as well add 'inline' to every single function template in <string> and <istream>, <ostream>, <streambuf> etc. so they're all potential candidates for inlining.

And if we have to mark every single function as 'inline' then maybe the compiler shouldn't be using it as a hint.
Comment 6 Erich Keane 2020-05-06 13:01:28 UTC
(In reply to Jonathan Wakely from comment #5)
> (In reply to Erich Keane from comment #3)
> > As you know, "extern template" is a hint to the compiler that we don't need
> > to emit the template as a way to save on compile time.
> > 
> > Both GCC and clang will NOT instantiate these templates in O0 mode. 
> > However, in O1+ modes, both will actually still instantiate the templates in
> > the frontend, BUT only for 'inline' functions.  Basically, we're using
> > 'inline' as a heuristic that there is benefit in sending these functions to
> > the optimizer (basically, sacrificing the compile time gained by 'extern
> > template' in exchange for a better inlining experience).
> 
> Hmm, I've seen different behaviours for clang and g++ in this respect, with
> clang inlining a lot more of std::string's members. So I'm surprised they
> use the same heuristic.
> 
> Do they both instantiate the function templates marked 'inline' even at -O1?
> Presumably not at -O0.

My understanding of Clang is based on a brief debugging session. My understanding of GCC's behavior here is a brief amount of time messing around on godbolt. I could very well be incorrect.


> 
> > In the submitter's case, the std::string constructor calls "_M_construct". 
> > The constructor is inlined, but _M_construct is not, since it never gets to
> > the optimizer.
> > 
> > libc++ uses an __init function to do the same thing as _M_construct, however
> > IT is marked inline, and thus doesn't have the problem.
> > 
> > I believe the submitter wants to have you mark more of the functions in
> > extern-templated classes 'inline' so that it matches the heuristic better.
> 
> And that's what I don't want to do. I think it's wrong for the human to say
> "inline this!" because humans are stupid (well, I am anyway). And I don't
> want to have to examine the GIMPLE/asm again for every new GCC release to
> decide whether 'inline' is still in the right places (and whether the answer
> should be different for every different version of Clang or ICC!)
> 
> And when I say "I don't want to" I mean "I am never ever going to".
> 
> > I don't think that there is a good way to change the compiler itself without
> > making 'extern template' absolutely meaningless.
> 
> I absolutely disagree.
> 
> It would still give a reduction in object file size for cases where the
> compiler decides not to inline, and still make compilation much faster for
> -O0 and -O1.

That is fair, I guess it would slightly reduce 'link' time because of that. I doubt people would be willing to put up with the STL compiling that much slower though (which seems to be the major user of this feature in my experience).
 
> One property of -O2 and -O3 is that we try to optimize aggressively even if
> that takes a long time to compile. So we could instantiate things that have
> an explicit instantiation declaration (thus doing "redundant" work) to see
> if inlining them would be beneficial. That would take longer to compile, but
> might produce faster code. If the heuristics decide the instantiation ends
> up too big to inline, it could just discard it (because we know there's a
> definition elsewhere).

That is essentially what the frontends DO, except only with the 'inline' functions.  If the inliner chooses to not inline it, it gets thrown out (since we've marked it 'available externally').
 
> If the only way to get that is to mark every function as 'inline' (and then
> "trick" the compiler into doing all that extra work even at -O1?) then we
> might as well add 'inline' to every single function template in <string> and
> <istream>, <ostream>, <streambuf> etc. so they're all potential candidates
> for inlining.
> 
> And if we have to mark every single function as 'inline' then maybe the
> compiler shouldn't be using it as a hint.

I don't think the idea is to mark EVERY function 'inline', simply ones that are pretty tiny and really good candidates for inlining.
Comment 7 Jason Merrill 2022-02-17 17:01:49 UTC
C++17 and below said:

Except for inline functions and variables, declarations with types deduced from their initializer or return value (10.1.7.4), const variables of literal types, variables of reference types, and class template specializations, explicit instantiation declarations have the effect of suppressing the implicit instantiation of the entity to which they refer. [ Note: The intent is that an inline function that is the subject of an explicit instantiation declaration will still be implicitly instantiated when odr-used (6.2) so that the body can be considered for inlining, but that no out-of-line copy of the inline function would be generated in the translation unit. — end note ]

This wording was changed in C++20 by P1815, a modules paper, but I believe the replacement wording still says that a function is not implicitly instantiated after an explicit instantiation declaration unless it is inline or has a deduced return type.  And if it isn't instantiated, it can't be inlined.

So if you want an *explicit instantiation declaration* to still be considered for inlining, you need to declare it inline.  Most templates don't need to be declared inline, only those that have implicit instantiations suppressed by 'extern template'.

So, I think the string case specifically is a library issue.
Comment 8 Jonathan Wakely 2022-02-17 23:54:45 UTC
(In reply to Erich Keane from comment #6)
> (In reply to Jonathan Wakely from comment #5)
> > And if we have to mark every single function as 'inline' then maybe the
> > compiler shouldn't be using it as a hint.
> 
> I don't think the idea is to mark EVERY function 'inline', simply ones that
> are pretty tiny and really good candidates for inlining.

But that's exactly what we do. _M_construct isn't tiny, it has two loops (and until quite recently, a try-catch block, but that's been replaced). There are some functions in <bits/basic_string.tcc> which are probably small enough to be marked 'inline', so I should review those. Not for GCC 12 though.

But in C++20 every function is 'constexpr' now, so every function is inline anyway, right? Even the large functions that aren't good candidates for inlining (see also PR 93008). So the 'inline' keyword has lost all meaning in <string> now.
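A minimal sketch of the "constexpr implies inline" point, assuming a hypothetical Scaled template rather than anything from <string>. The member is defined out of line and never spelled 'inline', yet it is a constexpr function and therefore implicitly inline:

```cpp
// Hypothetical aggregate with a constexpr member defined out of line.
template <typename T>
struct Scaled {
    T value;
    constexpr T doubled() const;
};

// constexpr functions are implicitly inline, so this definition is not
// subject to the suppression that 'extern template' applies to non-inline
// members.
template <typename T>
constexpr T Scaled<T>::doubled() const { return value + value; }

extern template struct Scaled<int>;

// Constant evaluation needs the body, so the member is instantiated here
// even though an explicit instantiation declaration is in effect.
static_assert(Scaled<int>{21}.doubled() == 42, "usable at compile time");

int doubled_at_runtime(int v) {
    return Scaled<int>{v}.doubled();
}

// Normally provided by the library in another translation unit.
template struct Scaled<int>;
```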
Comment 9 Erich Keane 2022-02-18 00:53:51 UTC
> But in C++20 every function is 'constexpr' now, so every function is inline
> anyway, right? Even the large functions that aren't good candidates for
> inlining (see also PR 93008). So the 'inline' keyword has lost all meaning
> in <string> now.

Do you mean 'every function in std::string'?  If so, you'd know better than I. In a general case, every function is NOT 'constexpr', and that didn't pass EWG.