Bug 19661 - unnecessary atexit calls emitted for static objects with empty destructors
Summary: unnecessary atexit calls emitted for static objects with empty destructors
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 3.4.2
: P2 enhancement
Target Milestone: ---
Assignee: Andrew Pinski
URL:
Keywords: missed-optimization
Depends on:
Blocks: 84411
  Show dependency treegraph
 
Reported: 2005-01-27 22:24 UTC by Yuri
Modified: 2024-04-19 00:59 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-11-02 00:00:00


Attachments
Patch which I will be submitting once GCC 15 stage 1 opens up (2.17 KB, patch)
2024-04-19 00:39 UTC, Andrew Pinski
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yuri 2005-01-27 22:24:29 UTC
When some object having empty destructor is declared as static in the local
context of some function g++ still inserts <atexit> call with the empty handler
like <__tcf_0> below.

It may seem like minor issue but redundant instructions increase chances for
I-cache misses and worthen processor pipeline.

Yuri


08048800 <__tcf_0>:
 8048800:       55                      push   %ebp
 8048801:       89 e5                   mov    %esp,%ebp
 8048803:       5d                      pop    %ebp
 8048804:       c3                      ret


--testcase--
begin 644 case-empty-atexit.tgz
M'XL(`.1G^4$``^V6VVJ#0!"&O=ZG&-*;))!D/:Q>)!1"GV31:2.83=`U&$KZ
M[!U-BI)#<U%)"YT/85?\]Z#C/[.Q+G""ZZW=3[3%*K4SIW>D#&2D%+5*JLBG
M5DHW4$U[PG&E&X5>Z(>^YTC7#X/(`=7_5BXI"ZMS`&=?YNEWN@1WF&VVC]C2
M(XDOXE]-7WI>0[I2AD%P,_X4^:_XUW\`Z0//"QV0/>_C*O\\_D(\I2;.R@1A
M4$U7`R&PLI@;V&W2!,H"AZ.Y$/2-;!K#$C3=B+5.S7`$[P(@WI06%@L8H*%1
M@[J+)LGF].@TMJ/)4.^PHSF(WWY[YIK_5SVO<<__7NM_GZ2-_U7$_G\$'?\O
MTDUA<]3K9U$6J7D#H]=8;'6,4-BDR0)Y&5O*`K7SEW4&:+V]I*XY"C!I/0X'
MDC:IY-6<#Z`$<:[\.,XZ&W=D"5Y..YZ1_##G_/%CKOA_0H'I]0QPM_Y+U=;_
M*"*]HHO]_PANE?]CJ6\/`8WGN\<``#VM/<UEG&$8AF$8AF$8AF$8AF$8AF$8
-AF'^#)_;GM5=`"@`````
`
end
Comment 1 Wolfgang Bangerth 2005-01-28 01:08:27 UTC
Confirmed, although I consider this to be a rather minor point since 
the code is actually run only once. Here's a small test: 
-------------------- 
struct A { 
    A(); 
    ~A() {} 
}; 
 
void foo () { 
  static A a; 
} 
------------------------- 
 
g/x> /home/bangerth/bin/gcc-4.0-pre/bin/c++ -S -O3 x.cc 
g/x> cat x.s | c++filt > x.ss 
 
.... 
foo(): 
.LFB5: 
	pushl	%ebp 
.LCFI2: 
	movl	%esp, %ebp 
.LCFI3: 
	pushl	%ebx 
.LCFI4: 
	subl	$20, %esp 
.LCFI5: 
	cmpb	$0, guard variable for foo()::a 
	je	.L12 
.L9: 
	addl	$20, %esp 
	popl	%ebx 
	popl	%ebp 
	ret 
	.p2align 4,,7 
.L12: 
	movl	guard variable for foo()::a, (%esp) 
	call	__cxa_guard_acquire 
	testl	%eax, %eax 
	je	.L9 
	movl	foo()::a, (%esp) 
.LEHB0: 
	call	A::A() 
.LEHE0: 
	movl	guard variable for foo()::a, (%esp) 
	call	__cxa_guard_release 
	movl	$__tcf_0, (%esp) 
	call	atexit 
	addl	$20, %esp 
	popl	%ebx 
	popl	%ebp 
	ret 
... 
 
W. 
Comment 2 Andrew Pinski 2005-01-28 01:22:51 UTC
This is related to PR 17736 which is for empty global variables with empty constructors.
Comment 3 Yuri 2005-01-28 01:26:49 UTC
(In reply to comment #1)
> Confirmed, although I consider this to be a rather minor point since 
> the code is actually run only once. Here's a small test: 

I agree, but it bloats the code, therefore reducing potential for inlining and
pipeline trashing.
Comment 4 Andrew Pinski 2005-01-28 01:45:58 UTC
But it is still an enhancement because any missed optimization that does not happen before is.
Also this is a middle-end bug since the middle-end and not the front-end figure out that the function 
is empty and remove the call to atexit.
Comment 5 Wolfgang Bangerth 2005-01-28 17:01:53 UTC
atexit only takes a pointer to a function to be run on exit of the 
program. The fact that this is an empty function is unbeknownst to 
it, and probably the code in the middle-end that has to deal with 
that. I therefore believe that the front-end has to take care of the 
fact that we have may run an empty function; it would also logically 
make sense to guard the call to atexit with a check whether the 
destructor is empty or not. 
 
I'll move this bug back to the C++ component. 
 
W. 
Comment 6 Andrew Pinski 2005-01-28 18:53:45 UTC
(In reply to comment #5)
> I'll move this bug back to the C++ component. 
Consider the following C++ code:
struct Foo { ~Foo() {} int i; };
struct A { Foo foo[2]; };
void foo () { 
  static A a; 
} 

We don't know the deconstructor for A is empty until we remove the empry loop and do optimizations 
on it.  So this is again a job for the midde-end/tree-optimizations.  We can then mark the function as 
empty and then have a pass (or really fold or something like that) remove calls to atexit (and similar 
functions) which just references empty functions.  This is the correct way of doing this.
Comment 7 Wolfgang Bangerth 2005-01-28 19:11:31 UTC
Why do you think the front-end doesn't know that the destructor is empty? 
It sees its definition, and it knows about potential destructors of 
member objects and base classes, so it should have all the information. 
That being said, I don't consider this to be such an important case as 
to pour a lot of passion into this discussion :-) 
 
W. 
Comment 8 Andrew Pinski 2005-09-24 17:42:54 UTC
Hmm, the easy way to to have a pass which finds all the atexit calls and sees if the function which is 
passed is a const function and if it is, then just remove the atexit call.
Comment 9 Andrew Pinski 2008-02-20 21:52:35 UTC
(In reply to comment #7)
> Why do you think the front-end doesn't know that the destructor is empty? 

Because it is non trivial.  Try it with a more complex case where you have multiple layer destructors and inlining, you will see that the front-end does not know if it is empty until way after inlining.

-- Pinski

Comment 10 Antony Polukhin 2021-09-24 07:41:50 UTC
Any progress?

Multiple compilers already eliminate the atexit call. Moreover, some of the compilers even eliminate the guard variable after that  https://godbolt.org/z/dbdfMrroa

Note that the atexit elimination would benefit the libstdc++, as the latter now uses a bunch of constant_init instances that have empty destructor
in libstdc++-v3/src/c++17/memory_resource.cc and libstdc++-v3/src/c++11/system_error.cc . It would be possible to eliminate the atexit calls for those cases and speedup startup times https://godbolt.org/z/MKaWKevzq
Comment 11 Andrew Pinski 2022-11-02 15:47:08 UTC
*** Bug 107500 has been marked as a duplicate of this bug. ***
Comment 12 Jonathan Wakely 2022-11-02 15:53:14 UTC
Reduced from the code in libstdc++:

extern "C" int puts(const char*);

struct A
{
  constexpr A()  { }
  ~A() { puts("bye"); }
};

namespace
{
  struct constant_init
  {
    union {
      A obj;
    };
    constexpr constant_init() : obj() { }

    ~constant_init() { /* do nothing, union member is not destroyed */ }
  };


  // Single-threaded fallback buffer.
  constinit constant_init global;
}

extern "C" A* get() { return &global.obj; }

With -std=c++20 -Os GCC emits a call to atexit:

(anonymous namespace)::constant_init::~constant_init() [base object destructor]:
        ret
get:
        mov     eax, OFFSET FLAT:_ZN12_GLOBAL__N_16globalE
        ret
_GLOBAL__sub_I_get:
        mov     edx, OFFSET FLAT:__dso_handle
        mov     esi, OFFSET FLAT:_ZN12_GLOBAL__N_16globalE
        mov     edi, OFFSET FLAT:_ZN12_GLOBAL__N_113constant_initD1Ev
        jmp     __cxa_atexit


With the same options, Clang doesn't:

get:                                    # @get
        lea     rax, [rip + (anonymous namespace)::global]
        ret
Comment 13 Jonathan Wakely 2022-11-02 17:16:22 UTC
*** Bug 107500 has been marked as a duplicate of this bug. ***
Comment 14 Andrew Pinski 2024-03-15 22:56:29 UTC
I am testing a (small) patch for this. Basically DCE can remove the calls to __cxa_atexit .
Comment 15 Andrew Pinski 2024-03-15 23:40:38 UTC
The patch is able to handle all 3 testcases here even:
For the libstdc++ one:
t3.cc.123t.dce2:Deleting : __cxxabiv1::__cxa_atexit (__dt_comp , &global, &__dso_handle);
Comment 16 Andrew Pinski 2024-03-16 02:17:39 UTC
So what is interesting is the way LLVM implements this as an IPO pass as how I described in comment #8, 6 years after I wrote that. Well they used "empty" rather than const . Note using it as an IPO pass I think is not wrong just having a specialized pass here where this is the same as DCE so just have DCE handle it instead seems better; plus with the availability of having pure/const flags with LTO and not having to load in the function seems much better really.
Comment 17 Fangrui Song 2024-03-16 22:47:01 UTC
(In reply to Andrew Pinski from comment #16)
> So what is interesting is the way LLVM implements this as an IPO pass as how
> I described in comment #8, 6 years after I wrote that. Well they used
> "empty" rather than const . Note using it as an IPO pass I think is not
> wrong just having a specialized pass here where this is the same as DCE so
> just have DCE handle it instead seems better; plus with the availability of
> having pure/const flags with LTO and not having to load in the function
> seems much better really.

Yes, LLVM recognizes __cxa_atexit as a library function and removes it in GlobalOpt (part of IPO) since 2011
(https://github.com/llvm/llvm-project/commit/ee6bc70d2f1c2434ca9ca8092216bdeab322c7e5), likely because
GlobalOpt is already doing global variable optimizations (e.g. removal if dead, constant folding). Technically __cxa_atexit removal can be moved elsewhere.

There are two GlobalOpt passes in the optimization pipeline (see -mllvm -print-changed=cdiff output), one before the inliner and one after.

  SROA
  ...
  GlobalOpt
  ...
  buildInlinerPipeline
  ...
  GlobalOpt

For empty functions like `~constant_init() {}`, SROA deletes unneeded IR instructions (spill of the "this" pointer) and actually makes the IR function empty. Then GlobalOpt removes __cxa_atexit call sites.

For `static void empty() {} ~constant_init() { empty(); }`, the inliner removes the `empty()` and makes the IR function empty. Then GlobalOpt removes __cxa_atexit call sites.
Comment 18 Andrew Pinski 2024-04-19 00:39:36 UTC
Created attachment 57987 [details]
Patch which I will be submitting once GCC 15 stage 1 opens up
Comment 19 Andrew Pinski 2024-04-19 00:59:01 UTC
(In reply to Andrew Pinski from comment #18)
> Created attachment 57987 [details]
> Patch which I will be submitting once GCC 15 stage 1 opens up

I forgot to add the testcase from comment #12. the patch which I am going to submit will have it.