Bug 107500 - [12 Regression] Useless atexit entry for ~constant_init in eh_globals.cc
Summary: [12 Regression] Useless atexit entry for ~constant_init in eh_globals.cc
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 12.2.0
: P3 normal
Target Milestone: 12.4
Assignee: Jonathan Wakely
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2022-11-01 17:53 UTC by R. Diez
Modified: 2024-03-18 14:03 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 12.1.0, 13.0
Known to fail: 12.2.0
Last reconfirmed: 2022-11-02 00:00:00


Attachments
Simplify lifetime of eh_globals variable (810 bytes, patch)
2022-11-03 11:55 UTC, Jonathan Wakely
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description R. Diez 2022-11-01 17:53:15 UTC
I have this embedded firmware project of mine, which uses Newlib:

https://github.com/rdiez/DebugDue

It is the template for other similar bare-metal projects I have. Even though some targets have very little SRAM (like 16 KiB), I am still using C++ and exceptions. The project above documents how I configure GCC to that effect.

Up until GCC 11.2, I have been doing this check:

  if ( _GLOBAL_ATEXIT != nullptr )
  {
    Panic( "Unexpected entries in atexit table." );
  }

On devices with very little RAM and without an operating system, initialisation and destruction can be tricky. With the check above, I am making sure that nothing unexpected comes up in that respect. I am initialising all static objects manually anyway, to prevent any ordering surprises (the initialisation order of C++ static objects can be problematic too).

The check above fails with GCC 12.2. Apparently, a destructor called constant_init::~constant_init() gets added to the atexit table on start-up.

Because of the way that Newlib works, that wastes 400 bytes of SRAM, which corresponds to sizeof( _global_atexit0 ). The structure has room for 32 atexit calls (because of some ANSI conformance), but we are only using 1 entry.

The interesting thing is that the destructor is supposed to be empty, see:

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/eh_globals.cc

~constant_init() { /* do nothing, union member is not destroyed */ }

GCC generates the following code for that empty destructor:

0008da68 <(anonymous namespace)::constant_init::~constant_init()>:
  8da68:  b580  push  {r7, lr}
  8da6a:  af00  add   r7, sp, #0
  8da6c:  bd80  pop   {r7, pc}

That does not make any sense. Is there a way to prevent GCC from registering such an empty atexit function? Failing that, is there a way to prevent GCC from registering a particular atexit function, even if it is not empty?

I find surprising that GCC emits such code. My project is building its own GCC/Newlib toolchain with optimisation level "-Os", so I would expect at least the "add r7, sp, #0" to be optimised away.
Comment 1 Jonathan Wakely 2022-11-02 12:07:09 UTC
(In reply to R. Diez from comment #0)
> The check above fails with GCC 12.2. Apparently, a destructor called
> constant_init::~constant_init() gets added to the atexit table on start-up.

This was done to make the global object in that file immortal. The constant_init wrapper gets destroyed, but its subobject doesn't, so code that refers to it later is ... slightly less undefined.

> Because of the way that Newlib works, that wastes 400 bytes of SRAM, which
> corresponds to sizeof( _global_atexit0 ). The structure has room for 32
> atexit calls (because of some ANSI conformance), but we are only using 1
> entry.
> 
> The interesting thing is that the destructor is supposed to be empty, see:
> 
> https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/
> eh_globals.cc
> 
> ~constant_init() { /* do nothing, union member is not destroyed */ }
> 
> GCC generates the following code for that empty destructor:
> 
> 0008da68 <(anonymous namespace)::constant_init::~constant_init()>:
>   8da68:  b580  push  {r7, lr}
>   8da6a:  af00  add   r7, sp, #0
>   8da6c:  bd80  pop   {r7, pc}
> 
> That does not make any sense. Is there a way to prevent GCC from registering
> such an empty atexit function? Failing that, is there a way to prevent GCC
> from registering a particular atexit function, even if it is not empty?

If we had clang's no_destroy attribute then we could use that on those constant_init wrappers:
https://clang.llvm.org/docs/AttributeReference.html#no-destroy


> I find surprising that GCC emits such code. My project is building its own
> GCC/Newlib toolchain with optimisation level "-Os", so I would expect at
> least the "add r7, sp, #0" to be optimised away.

Agreed, I don't think we should be emitting cleanup if it's a no-op. But I don't know if the compiler is able to tell it's a no-op at the point where it emits that code. The attribute would tell the compiler it's OK to do that.
Comment 2 Jonathan Wakely 2022-11-02 12:08:09 UTC
I don't think this is a libstdc++ bug. We really do want that no-op destructor to be in the library. It just shouldn't generate any code, but that's a compiler problem.
Comment 3 Jonathan Wakely 2022-11-02 12:13:38 UTC
N.B. making the dtor constexpr doesn't change anything. Unsurprising, since that just says it's usable in constant expressions, it doesn't mean the destructor is trivial and can be elided. Worth a try though.
Comment 4 R. Diez 2022-11-02 12:46:06 UTC
The 'constant_init' wrapper with the 'union' inside is a contrived hack, isn't it? We may as well use a different hack then.

How about a combination of '__attribute__ constructor' and 'placement new' like this?

uint8_t some_buffer[ sizeof( __cxa_eh_globals ) ];

// All objects with an init_priority attribute are constructed before any
// object with no init_priority attribute.
#define SOME_INIT_PRIORITY  200  // Priority range [101, 65535].

static __attribute__ ((constructor (SOME_INIT_PRIORITY))) void MyHackForInitWithoutAtExitDestructor ( void ) throw()
{
  // Placement new.
  new ( some_buffer ) __cxa_eh_globals();
}

You would then need a 'get_eh_globals()' wrapper to return a pointer or a reference to a '__cxa_eh_globals' object from 'some_buffer', by doing a type cast. Everybody should then use the wrapper to access that singleton object.
Comment 5 R. Diez 2022-11-02 13:01:15 UTC
I know very little about GCC, but it is a very smart compiler, so I am having a hard time understanding how GCC could miss so many optimisations. After all, even when compiling with little optimisation, GCC seems able to discard unused code rather well.

In my project, I am building my own toolchain with this makefile:

https://github.com/rdiez/DebugDue/blob/master/Toolchain/Makefile

I haven't verified it this time around, but I believe that Newlib is being built with '-Os' optimisation. Search for COMMON_CFLAGS_FOR_TARGET in that makefile, which eventually gets passed to Newlib in CFLAGS_FOR_TARGET.

First of all, GCC seems unable to generate an empty routine or destructor, or at least flag it as being effectively empty. The caller should then realise that it is empty, so it is not worth generating an atexit call for it.

Secondly, I am not ARM Thumb assembly expert either, but shouldn't "add r7, sp, #0" be optimised away? After all, nobody is really using R7 in that routine.

And finally, what is the point of generating a function prolog and epilog which saves and restores context from the stack? If the routine is pretty much empty, but cannot be really empty, shouldn't some kind of RET instruction suffice?
Comment 6 Jonathan Wakely 2022-11-02 13:04:51 UTC
You can't use placement new in a constexpr constructor, so it can't be constinit, which means it is susceptible to the static initialization order fiasco. init priority attributes are also a hack, and less portable.

The whole point of this is to ensure the global is accessible *before* any user code runs, and still accessible *after* static destructors run. Saving a few bytes is less important than correctness.

If you really need to avoid it, you can provide a dummy atexit that doesn't register the destructor, or wait for a possible compiler improvement to optimize it away. I'm not going to change libstdc++ to stop doing this.
Comment 7 Jonathan Wakely 2022-11-02 13:09:42 UTC
(In reply to R. Diez from comment #5)
> First of all, GCC seems unable to generate an empty routine or destructor,
> or at least flag it as being effectively empty. The caller should then
> realise that it is empty, so it is not worth generating an atexit call for
> it.

The object has a user-provided destructor, which is necessary for the immortalization trick to work. Being user-provided means it's non-trivial and that means the destructor must be run. To destroy a gobal object, atexit it used.

Now if the compiler can tell that the destructor actually has no side effects, it's unobservable whether the atexit call ever happens. But that takes non-trivial analysis of the destructor body and hard-coded knowledge that atexit has no *other* observable side effects except registering the destructor This all requires special case handling in the compiler, not just "turn on -Os and all the code gets removed".
Comment 8 R. Diez 2022-11-02 13:29:14 UTC
Why does this 'eh_globals' object have to use a constexpr constructor?

How does the current code avoid the "static initialization order fiasco"? If the user defines his/her own static C++ objects, how is it guaranteed now that 'eh_globals' is initialised before all other user code?

Isn't using the "__attribute__ constructor" trick safer anyway? With it, you can document what priority levels libstdc++ uses. The user may even want to run a few routines before libstdc++ initialises. Flexibility in the initialisation order is often important in embedded environments.

Portability is not really an issue. You can just "#ifdef GCC" around the "better" hack. Is GCC not using "__attribute__ constructor" internally anyway to implement such static constructors? So anybody using C++ with GCC must support that mechanism already.

And about saving a few bytes, 400 bytes is no small amount in tightly-embedded environments. But it is not just the amount of memory. As I mentioned, my code is checking that nothing unexpected registers an atexit() destructor. If libstdc++ does that on start-up, it becomes hard to tell whether something unexpected has been added recently.

I can surely put up with yet another little annoyance with this new GCC version. But bear in mind that flexibility and attention to detail in the embedded world is one of GCC's few remaining bastions. If GCC starts dropping the ball here too, even more people will consider moving to clang.
Comment 9 R. Diez 2022-11-02 13:36:48 UTC
> [...]
> not just "turn on -Os and all the code gets removed".

I am sure that the solution is not as trivial as "turn on -Os". But, as an outsider, it is hard to believe that it "takes non-trivial analysis of the destructor body". The destructor is empty!

I am not talking about the GCC optimiser realising afterwards that the code is generating an atexit() entry that does nothing. I am saying that GCC should not generate so much code for an empty function for starters, and that GCC should not generate the destructor registration at all if the destructor is empty. I would imagine that those steps come before the optimiser gets to see the generated code.
Comment 10 Jonathan Wakely 2022-11-02 14:17:00 UTC
(In reply to R. Diez from comment #8)
> Why does this 'eh_globals' object have to use a constexpr constructor?

So it can be constinit.
 
> How does the current code avoid the "static initialization order fiasco"? If
> the user defines his/her own static C++ objects, how is it guaranteed now
> that 'eh_globals' is initialised before all other user code?

Because that's what constinit means.
https://en.cppreference.com/w/cpp/language/constinit

> Isn't using the "__attribute__ constructor" trick safer anyway? With it, you
> can document what priority levels libstdc++ uses. The user may even want to
> run a few routines before libstdc++ initialises. Flexibility in the
> initialisation order is often important in embedded environments.
> 
> Portability is not really an issue. You can just "#ifdef GCC" around the
> "better" hack. Is GCC not using "__attribute__ constructor" internally
> anyway to implement such static constructors? So anybody using C++ with GCC
> must support that mechanism already.

No, it doesn't work on all targets supported by GCC and libstdc++.

> And about saving a few bytes, 400 bytes is no small amount in
> tightly-embedded environments. But it is not just the amount of memory. As I
> mentioned, my code is checking that nothing unexpected registers an atexit()
> destructor. If libstdc++ does that on start-up, it becomes hard to tell
> whether something unexpected has been added recently.
> 
> I can surely put up with yet another little annoyance with this new GCC
> version. But bear in mind that flexibility and attention to detail in the
> embedded world is one of GCC's few remaining bastions. If GCC starts
> dropping the ball here too, even more people will consider moving to clang.

The "if you don't fix this people will switch to clang" threat is not as motivating as you might think. It gets tedious after the thousandth time you hear it.

I've confirmed the bug as a missed-optimization, and suggested ways the compiler might be able to solve it. I am not going to reintroduce race conditions in libstdc++ to work around your needs here.

If you think GCC (and libstdc++ in particular) doesn't care about embedded then you're not paying attention. The changes to eh_globals.cc were introduced for PR105880 specifically to solve a bug on embedded targets using newlib. And there have been more than a dozen commits in the past month making huge parts of libstdc++ more usable on bare metal.

If you think the codegen for empty functions can be improved, please file a separate bug as that's not the same topic as optimizing away atexit calls for empty destructors.
Comment 11 Jonathan Wakely 2022-11-02 14:38:41 UTC
Reduced:

extern "C" int puts(const char*);

struct A
{
  constexpr A()  { }
  ~A() { puts("bye"); }
};

namespace
{
  struct constant_init
  {
    union {
      A obj;
    };
    constexpr constant_init() : obj() { }

    ~constant_init() { /* do nothing, union member is not destroyed */ }
  };


  // Single-threaded fallback buffer.
  constinit constant_init global;
}

extern "C" A* get() { return &global.obj; }

With -std=c++20 -Os GCC emits a call to atexit:

(anonymous namespace)::constant_init::~constant_init() [base object destructor]:
        ret
get:
        mov     eax, OFFSET FLAT:_ZN12_GLOBAL__N_16globalE
        ret
_GLOBAL__sub_I_get:
        mov     edx, OFFSET FLAT:__dso_handle
        mov     esi, OFFSET FLAT:_ZN12_GLOBAL__N_16globalE
        mov     edi, OFFSET FLAT:_ZN12_GLOBAL__N_113constant_initD1Ev
        jmp     __cxa_atexit


With the same options, Clang doesn't:

get:                                    # @get
        lea     rax, [rip + (anonymous namespace)::global]
        ret
Comment 12 Andrew Pinski 2022-11-02 15:47:08 UTC
(In reply to Jonathan Wakely from comment #11)
> Reduced:

And that is a dup of bug 19661.

*** This bug has been marked as a duplicate of bug 19661 ***
Comment 13 R. Diez 2022-11-02 16:12:20 UTC
From your comments about "constexpr constructor" and "constinit", I gather that the "eh_globals" singleton is guaranteed then to be initialised very early, earlier than anything else that might use it, right? But that does not guarantee that it will be destroyed later than anything that wants to use it, is that correct? That is why we need the hack, to make it outlive all potential users.

I am now trying to understand the implications of not destroying "__cxa_eh_globals obj" inside the "eh_globals" singleton, at least in the case of single-threaded (bare metal) embedded software. Hopefully, I can learn a little more along the way about how C++ exception handling works.

As far as I can see, "struct __cxa_eh_globals" in unwind-cxx.h has 1 or 2 pointers and no destructor:

struct __cxa_eh_globals
{
  __cxa_exception *caughtExceptions;
  unsigned int uncaughtExceptions;
#ifdef __ARM_EABI_UNWINDER__
  __cxa_exception* propagatingExceptions;
#endif
};

Therefore, destroying this object should have real no effect. I wonder why there was a problem to fix in 'eh_globals' then.

I am guessing that some static analyser, or checker instrumentation, may now complain that static object 'eh_globals', or at least its member 'obj', has not been properly destroyed upon termination. Normally, that would mean a risk of leaking some memory, but I am guessing that the last thread can never have any such 'caughtExceptions' or 'propagatingExceptions' left upon termination, right?

So, theoretically, instead of leaving the destructor for the singleton empty, we could add asserts that those 2 pointers are nullptr. Or am I missing something?

This all feels iffy. If I understand this correctly, it is impossible for GCC to guarantee the correct construction and destruction order of such global objects, and that is why we are hacking our way out. The reason is mainly, that not all targets support "__attribute__ constructor", so there is no way to implement a deterministic initialisation and destruction order for everybody. Is that right?
Comment 14 Jonathan Wakely 2022-11-02 17:16:22 UTC
Not FIXED, nothing was fixed here, it's a dup.

*** This bug has been marked as a duplicate of bug 19661 ***
Comment 15 Jonathan Wakely 2022-11-02 17:46:15 UTC
(In reply to R. Diez from comment #13)
> From your comments about "constexpr constructor" and "constinit", I gather
> that the "eh_globals" singleton is guaranteed then to be initialised very
> early, earlier than anything else that might use it, right? But that does

At compile time, before any code executes.

> not guarantee that it will be destroyed later than anything that wants to
> use it, is that correct? That is why we need the hack, to make it outlive
> all potential users.

Yes, the destruction is dynamic, and so not ordered relative to destructors in other translation units.

> Therefore, destroying this object should have real no effect. I wonder why
> there was a problem to fix in 'eh_globals' then.

Read the bug. There was another global with a non-trivial destructor, which had problems due to destructor ordering.

Without using the constant_init type to make the eh_globals variable immortal, its destructor would still logically "run" at some point before or after destruction of the other globals in that translation unit. That is true even if the destructor is trivial and no code actually "runs". The current solution avoids that, because its destructor never runs. Its lifetime would end when the storage is reused, but it's a global so that doesn't happen. The object is effectively immortal, and we don't run foul of optimizers making assumptions about object lifetime.

> This all feels iffy. If I understand this correctly, it is impossible for
> GCC to guarantee the correct construction and destruction order of such
> global objects, and that is why we are hacking our way out. The reason is
> mainly, that not all targets support "__attribute__ constructor", so there
> is no way to implement a deterministic initialisation and destruction order
> for everybody. Is that right?

I don't consider this a hack, unlike using init priority.

Constant initialization is better than dynamic initialization where possible, so I see no reason to forego constant initialization for dynamic initialization that then needs init priority to make it work reliably. Constant init is always reliable.
Comment 16 R. Diez 2022-11-03 07:57:40 UTC
I am slowly arriving at a different conclusion.

"struct __cxa_eh_globals" has neither a constructor nor a destructor. Its members are pointers or integers, so GCC will not have automatically generated any constructor or destructor for this structure.

Therefore, variable "static __cxa_eh_globals eh_globals" in the past was already initialised before any users (probably because static memory being zeroed on start-up). The situation has not changed now.

This static variable already outlived anything else, as there was no destructor changing anything on termination.

Your patch introduced wrapper "struct constant_init" around it, which makes GCC generate a constructor for the wrapper solely to register an empty destructor with atexit(). Otherwise, the wrapper does nothing else useful.

Your patch also changes '__eh_globals_init::_M_init' (a different global object) to a static member. Is that not enough to fix the original problem?
Comment 17 Jonathan Wakely 2022-11-03 11:52:10 UTC
(In reply to R. Diez from comment #16)
> Therefore, variable "static __cxa_eh_globals eh_globals" in the past was
> already initialised before any users (probably because static memory being
> zeroed on start-up). The situation has not changed now.

Now it's enforced by the compiler and will fail to compile if for some reason it can't be statically initialized.

> This static variable already outlived anything else, as there was no
> destructor changing anything on termination.

Semantically, no, that's wrong. The C++ standard still says its lifetime ends, and the compiler can optimize based on that. It's storage is not reused, but its lifetime does end.

But it also says the lifetime of the wrapper ends, so accessing eh_globals.obj is still semantically undefined.

> Your patch introduced wrapper "struct constant_init" around it, which makes
> GCC generate a constructor for the wrapper solely to register an empty
> destructor with atexit(). Otherwise, the wrapper does nothing else useful.
> 
> Your patch also changes '__eh_globals_init::_M_init' (a different global
> object) to a static member. Is that not enough to fix the original problem?

Yes, probably.
Comment 18 Jonathan Wakely 2022-11-03 11:55:11 UTC
Created attachment 53823 [details]
Simplify lifetime of eh_globals variable

Please test this.
Comment 19 Sebastian Huber 2022-11-04 09:42:02 UTC
I just have a side note. Enabling the thread-local storage may be an option for targets with limited memory resources. Newlib has the configuration option --enable-newlib-reent-thread-local to use thread-local objects instead of one big struct _reent.
Comment 20 R. Diez 2022-11-04 12:43:11 UTC
I had to modify the patch slightly. I guess that union member "unsigned char unused;" was removed after GCC 12.2 was released.

But otherwise, the patch does work, at least in my bare-metal scenario. The atexit entry is no longer being generated, and I haven't seen any other side-effects yet.
Comment 21 GCC Commits 2022-11-04 14:05:56 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:418999fe382c608facf57f96b53a9cb12d2fdd20

commit r13-3685-g418999fe382c608facf57f96b53a9cb12d2fdd20
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Nov 3 11:48:57 2022 +0000

    libstdc++: Simplify lifetime of eh_globals variable [PR107500]
    
    Since this is a trivial type, we probably don't need to do anything to
    ensure it's still accessible after other static dtors.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/107500
            * libsupc++/eh_globals.cc (eh_globals): Remove immortalizing
            wrapper.
            (__cxxabiv1::__cxa_get_globals_fast): Adjust.
            (__cxxabiv1::__cxa_get_globals): Adjust.
Comment 22 Jonathan Wakely 2022-11-04 14:15:44 UTC
Fixed for GCC 13 then. Backport to GCC 12 to be decided later.
Comment 23 R. Diez 2022-11-04 14:58:10 UTC
Many thanks for the fix.

If you backport it to GCC 12.x, I won't be able to complain so much. ;-)
Comment 24 R. Diez 2022-11-04 20:40:38 UTC
In case somebody else wants to patch their GCC 12.2, here is the slightly-modified patch for convenience:

https://github.com/rdiez/DebugDue/blob/master/Toolchain/Patches/Gcc12EhGlobalsAtexit.patch
Comment 25 Richard Biener 2023-05-08 12:25:50 UTC
GCC 12.3 is being released, retargeting bugs to GCC 12.4.
Comment 26 Jonathan Wakely 2024-03-16 10:25:14 UTC
I'm not planning to backport this.
Comment 27 Jonathan Wakely 2024-03-16 10:29:59 UTC
Strike that - this was a regression in 12.2 compared to 12.1 (bit wasn't previously marked as such) so should be backported.
Comment 28 GCC Commits 2024-03-18 14:03:27 UTC
The releases/gcc-12 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:d9076dafa12c93e96b349035fb59050151403866

commit r12-10222-gd9076dafa12c93e96b349035fb59050151403866
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Nov 3 11:48:57 2022 +0000

    libstdc++: Simplify lifetime of eh_globals variable [PR107500]
    
    Since this is a trivial type, we probably don't need to do anything to
    ensure it's still accessible after other static dtors.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/107500
            * libsupc++/eh_globals.cc (eh_globals): Remove immortalizing
            wrapper.
            (__cxxabiv1::__cxa_get_globals_fast): Adjust.
            (__cxxabiv1::__cxa_get_globals): Adjust.
    
    (cherry picked from commit 418999fe382c608facf57f96b53a9cb12d2fdd20)