When some object having empty destructor is declared as static in the local context of some function g++ still inserts <atexit> call with the empty handler like <__tcf_0> below. It may seem like minor issue but redundant instructions increase chances for I-cache misses and worthen processor pipeline. Yuri 08048800 <__tcf_0>: 8048800: 55 push %ebp 8048801: 89 e5 mov %esp,%ebp 8048803: 5d pop %ebp 8048804: c3 ret --testcase-- begin 644 case-empty-atexit.tgz M'XL(`.1G^4$``^V6VVJ#0!"&O=ZG&-*;))!D/:Q>)!1"GV31:2.83=`U&$KZ M[!U-BI)#<U%)"YT/85?\]Z#C/[.Q+G""ZZW=3[3%*K4SIW>D#&2D%+5*JLBG M5DHW4$U[PG&E&X5>Z(>^YTC7#X/(`=7_5BXI"ZMS`&=?YNEWN@1WF&VVC]C2 M(XDOXE]-7WI>0[I2AD%P,_X4^:_XUW\`Z0//"QV0/>_C*O\\_D(\I2;.R@1A M4$U7`R&PLI@;V&W2!,H"AZ.Y$/2-;!K#$C3=B+5.S7`$[P(@WI06%@L8H*%1 M@[J+)LGF].@TMJ/)4.^PHSF(WWY[YIK_5SVO<<__7NM_GZ2-_U7$_G\$'?\O MTDUA<]3K9U$6J7D#H]=8;'6,4-BDR0)Y&5O*`K7SEW4&:+V]I*XY"C!I/0X' MDC:IY-6<#Z`$<:[\.,XZ&W=D"5Y..YZ1_##G_/%CKOA_0H'I]0QPM_Y+U=;_ M*"*]HHO]_PANE?]CJ6\/`8WGN\<``#VM/<UEG&$8AF$8AF$8AF$8AF$8AF$8 -AF'^#)_;GM5=`"@````` ` end
Confirmed, although I consider this to be a rather minor point since the code is actually run only once. Here's a small test: -------------------- struct A { A(); ~A() {} }; void foo () { static A a; } ------------------------- g/x> /home/bangerth/bin/gcc-4.0-pre/bin/c++ -S -O3 x.cc g/x> cat x.s | c++filt > x.ss .... foo(): .LFB5: pushl %ebp .LCFI2: movl %esp, %ebp .LCFI3: pushl %ebx .LCFI4: subl $20, %esp .LCFI5: cmpb $0, guard variable for foo()::a je .L12 .L9: addl $20, %esp popl %ebx popl %ebp ret .p2align 4,,7 .L12: movl guard variable for foo()::a, (%esp) call __cxa_guard_acquire testl %eax, %eax je .L9 movl foo()::a, (%esp) .LEHB0: call A::A() .LEHE0: movl guard variable for foo()::a, (%esp) call __cxa_guard_release movl $__tcf_0, (%esp) call atexit addl $20, %esp popl %ebx popl %ebp ret ... W.
This is related to PR 17736 which is for empty global variables with empty constructors.
(In reply to comment #1) > Confirmed, although I consider this to be a rather minor point since > the code is actually run only once. Here's a small test: I agree, but it bloats the code, therefore reducing potential for inlining and pipeline trashing.
But it is still an enhancement because any missed optimization that does not happen before is. Also this is a middle-end bug since the middle-end and not the front-end figure out that the function is empty and remove the call to atexit.
atexit only takes a pointer to a function to be run on exit of the program. The fact that this is an empty function is unbeknownst to it, and probably the code in the middle-end that has to deal with that. I therefore believe that the front-end has to take care of the fact that we have may run an empty function; it would also logically make sense to guard the call to atexit with a check whether the destructor is empty or not. I'll move this bug back to the C++ component. W.
(In reply to comment #5) > I'll move this bug back to the C++ component. Consider the following C++ code: struct Foo { ~Foo() {} int i; }; struct A { Foo foo[2]; }; void foo () { static A a; } We don't know the deconstructor for A is empty until we remove the empry loop and do optimizations on it. So this is again a job for the midde-end/tree-optimizations. We can then mark the function as empty and then have a pass (or really fold or something like that) remove calls to atexit (and similar functions) which just references empty functions. This is the correct way of doing this.
Why do you think the front-end doesn't know that the destructor is empty? It sees its definition, and it knows about potential destructors of member objects and base classes, so it should have all the information. That being said, I don't consider this to be such an important case as to pour a lot of passion into this discussion :-) W.
Hmm, the easy way to to have a pass which finds all the atexit calls and sees if the function which is passed is a const function and if it is, then just remove the atexit call.
(In reply to comment #7) > Why do you think the front-end doesn't know that the destructor is empty? Because it is non trivial. Try it with a more complex case where you have multiple layer destructors and inlining, you will see that the front-end does not know if it is empty until way after inlining. -- Pinski
Any progress? Multiple compilers already eliminate the atexit call. Moreover, some of the compilers even eliminate the guard variable after that https://godbolt.org/z/dbdfMrroa Note that the atexit elimination would benefit the libstdc++, as the latter now uses a bunch of constant_init instances that have empty destructor in libstdc++-v3/src/c++17/memory_resource.cc and libstdc++-v3/src/c++11/system_error.cc . It would be possible to eliminate the atexit calls for those cases and speedup startup times https://godbolt.org/z/MKaWKevzq
*** Bug 107500 has been marked as a duplicate of this bug. ***
Reduced from the code in libstdc++: extern "C" int puts(const char*); struct A { constexpr A() { } ~A() { puts("bye"); } }; namespace { struct constant_init { union { A obj; }; constexpr constant_init() : obj() { } ~constant_init() { /* do nothing, union member is not destroyed */ } }; // Single-threaded fallback buffer. constinit constant_init global; } extern "C" A* get() { return &global.obj; } With -std=c++20 -Os GCC emits a call to atexit: (anonymous namespace)::constant_init::~constant_init() [base object destructor]: ret get: mov eax, OFFSET FLAT:_ZN12_GLOBAL__N_16globalE ret _GLOBAL__sub_I_get: mov edx, OFFSET FLAT:__dso_handle mov esi, OFFSET FLAT:_ZN12_GLOBAL__N_16globalE mov edi, OFFSET FLAT:_ZN12_GLOBAL__N_113constant_initD1Ev jmp __cxa_atexit With the same options, Clang doesn't: get: # @get lea rax, [rip + (anonymous namespace)::global] ret
I am testing a (small) patch for this. Basically DCE can remove the calls to __cxa_atexit .
The patch is able to handle all 3 testcases here even: For the libstdc++ one: t3.cc.123t.dce2:Deleting : __cxxabiv1::__cxa_atexit (__dt_comp , &global, &__dso_handle);
So what is interesting is the way LLVM implements this as an IPO pass as how I described in comment #8, 6 years after I wrote that. Well they used "empty" rather than const . Note using it as an IPO pass I think is not wrong just having a specialized pass here where this is the same as DCE so just have DCE handle it instead seems better; plus with the availability of having pure/const flags with LTO and not having to load in the function seems much better really.
(In reply to Andrew Pinski from comment #16) > So what is interesting is the way LLVM implements this as an IPO pass as how > I described in comment #8, 6 years after I wrote that. Well they used > "empty" rather than const . Note using it as an IPO pass I think is not > wrong just having a specialized pass here where this is the same as DCE so > just have DCE handle it instead seems better; plus with the availability of > having pure/const flags with LTO and not having to load in the function > seems much better really. Yes, LLVM recognizes __cxa_atexit as a library function and removes it in GlobalOpt (part of IPO) since 2011 (https://github.com/llvm/llvm-project/commit/ee6bc70d2f1c2434ca9ca8092216bdeab322c7e5), likely because GlobalOpt is already doing global variable optimizations (e.g. removal if dead, constant folding). Technically __cxa_atexit removal can be moved elsewhere. There are two GlobalOpt passes in the optimization pipeline (see -mllvm -print-changed=cdiff output), one before the inliner and one after. SROA ... GlobalOpt ... buildInlinerPipeline ... GlobalOpt For empty functions like `~constant_init() {}`, SROA deletes unneeded IR instructions (spill of the "this" pointer) and actually makes the IR function empty. Then GlobalOpt removes __cxa_atexit call sites. For `static void empty() {} ~constant_init() { empty(); }`, the inliner removes the `empty()` and makes the IR function empty. Then GlobalOpt removes __cxa_atexit call sites.
Created attachment 57987 [details] Patch which I will be submitting once GCC 15 stage 1 opens up
(In reply to Andrew Pinski from comment #18) > Created attachment 57987 [details] > Patch which I will be submitting once GCC 15 stage 1 opens up I forgot to add the testcase from comment #12. the patch which I am going to submit will have it.