The noreturn attribute on a function prevents the sibling call optimization from being done, so more code is executed than needed.

Release: GNU C version 3.4 20030516 and GNU C version 3.4 20030517

Environment: powerpc-apple-darwin6.6 and i686-pc-linux-gnu

How-To-Repeat:
cat >tt.c <<EOF
void temp() { abort(); }
void temp2() __attribute__((__noreturn__));
void temp1() { temp2(); }
void temp3();
void temp4() { temp3(); }
EOF
gcc -O3 -fomit-frame-pointer tt.c -S

Look at the asm and see that temp4 has the optimization but temp1 does not.

Fix: if the attribute is on a function, treat the call like a return statement as well, so that the sibling call optimization will always work with it.
Working as designed.
(1) It often takes more insns to pop the stack frame than to make the call.
(2) You get a proper backtrace from abort.
(3) http://gcc.gnu.org/ml/gcc-patches/2000-10/msg00180.html
*** Bug 33083 has been marked as a duplicate of this bug. ***
Some comments from the outside:
(1) It should be possible to change the emitted call into a jmp without adding the function epilogue.
(2) A proper backtrace is not generated for optimized sibling calls either.
(3) http://gcc.gnu.org/ml/gcc-patches/2000-10/msg00180.html explains that it is hard to implement.
(4) abort produces a proper backtrace if it is not a noreturn function.
(5) This optimization could be made configurable by something like -foptimize-noreturns.
*** Bug 34589 has been marked as a duplicate of this bug. ***
*** Bug 56165 has been marked as a duplicate of this bug. ***
In this case, perhaps sibling call optimization is the wrong thing here. The caller of a noreturn function shouldn't pop the stack, but it also shouldn't save registers (and, if it doesn't need to save registers, it shouldn't create a stack frame in the first place).
*** Bug 58152 has been marked as a duplicate of this bug. ***
*** Bug 67327 has been marked as a duplicate of this bug. ***
(In reply to Richard Henderson from comment #2)
> Working as designed.
> (1) It often takes more insns to pop the stack frame than to make the call.
> (2) You get a proper backtrace from abort.
> (3) http://gcc.gnu.org/ml/gcc-patches/2000-10/msg00180.html

Glibc has been using

ENTRY (__memmove_chk)
        movl    12(%esp), %eax
        cmpl    %eax, 16(%esp)
        jb      __chk_fail
        jmp     memmove
END (__memmove_chk)

since 2004. #1 and #2 shouldn't be the reason not to optimize. I am using:

/* Due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837
   noreturn attribute disable tail call optimization. Removes noreturn
   attribute to enable tail call optimization. */
extern void *chk_fail (void) __asm__ ("__chk_fail") attribute_hidden;

to work around this.
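A minimal sketch of the workaround described in the comment above, written as a standalone file. All names here (demo_chk_fail, demo_chk_fail_tail, demo_memmove_chk) are hypothetical and are not the glibc ones, and the sketch assumes GCC or a compatible compiler on an ELF-style target where the assembler symbol has the same spelling as the C name. Whether the compiler then emits a jump instead of a call still depends on the target and the optimization level.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical failure handler; it really never returns. */
__attribute__((noreturn)) void demo_chk_fail(void);

/* Second declaration of the same symbol under another C name, this time
   without noreturn, so that a call to it looks like an ordinary tail call.
   Assumption: the assembler symbol is spelled exactly like the C name. */
extern void *demo_chk_fail_tail(void) __asm__("demo_chk_fail");

void demo_chk_fail(void)
{
    abort();
}

/* Hypothetical checked memmove: copy n bytes into a buffer of dstlen bytes. */
void *demo_memmove_chk(void *dst, const void *src, size_t n, size_t dstlen)
{
    if (dstlen < n)
        return demo_chk_fail_tail();   /* candidate for a jump, not a call */
    return memmove(dst, src, n);
}
```

The second declaration gives the handler a return type it does not really have, which is formally outside what ISO C defines. The trick relies on the handler never returning, so it is best kept to internal code where that is guaranteed.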
*** Bug 68677 has been marked as a duplicate of this bug. ***
*** Bug 111786 has been marked as a duplicate of this bug. ***
May I suggest we just add something like __attribute__((trace)) for the special abort case? Noreturn was added for code optimization after all, not for backtracing.
(In reply to gooncreeper from comment #15)
> May I suggest we just add something like __attribute__((trace)) for the
> special abort case? Noreturn was added for code optimization after all, not
> for backtracing.

It will break any attempts to debug an abort until the libc headers are updated to use __attribute__((trace)).

Note that in GCC noreturn has been added far before the WG14 _Noreturn paper (even this ticket predates the WG14 paper), so the rationale in the paper may not apply.

In practice most _Noreturn functions are abort, exit, ..., i.e. they are only executed one time so optimizing against a cold path does not help much. I don't think it's a good idea to encourage people to construct some fancy code by a recursive _Noreturn function (why not just use a loop?!) And if you must write such fancy code anyway IMO musttail attribute (PR83324) will be a better solution.
(In reply to Xi Ruoyao from comment #16)
> (In reply to gooncreeper from comment #15)
> > May I suggest we just add something like __attribute__((trace)) for the
> > special abort case? Noreturn was added for code optimization after all, not
> > for backtracing.
>
> It will break any attempts to debug an abort until the libc headers are
> updated to use __attribute__((trace)).

"any attempts"? We could simply use the gdb debugger and ignore the backtrace. In comparison, the backtrace is a rather restricted debugging instrument. If there are applications that really depend on GCC's backtrace, this should be the reason to keep the current behaviour.

> Note that in GCC noreturn has been added far before the WG14 _Noreturn paper
> (even this ticket predates the WG14 paper), so the rationale in the paper
> may not apply.

Backtracing functionality is highly platform dependent, so it is no surprise that the C standard cannot guarantee anything about it.

> In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> only executed one time so optimizing against a cold path does not help much.
> I don't think it's a good idea to encourage people to construct some fancy
> code by a recursive _Noreturn function (why not just use a loop?!)

... and why not just if and goto? Because it is considered good programming practice to structure source code into functions (not too long) and loops. If a function gets too big, GCC might not optimize it well.

> And if
> you must write such fancy code anyway IMO musttail attribute (PR83324) will
> be a better solution.

I agree.
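For readers unfamiliar with the musttail attribute mentioned above, here is a small sketch of how it is typically written on a return statement. The MUSTTAIL macro and the sum_to function are hypothetical names made up for this illustration. The sketch assumes a compiler that provides __has_attribute and, where the attribute is reported as available, accepts the GNU-style spelling in front of return. On compilers without the attribute the macro expands to nothing, and the call is then only a tail call if the optimizer chooses to make it one.

```c
/* MUSTTAIL expands to the musttail statement attribute when the compiler
   reports support for it, and to nothing otherwise.
   Assumption: the compiler provides __has_attribute. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical example: sum 1..n in accumulator-passing style, so that the
   recursive call is the last thing the function does. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    MUSTTAIL return sum_to(n - 1, acc + n);
}
```

Calling sum_to(n, 0) gives the sum of 1..n. With the attribute in effect the compiler has to either reuse the caller's frame for the recursive call or reject the code, which is the stack-space guarantee the attribute is meant to provide.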
On another thought: I think something like -fignore-backtrace could be a reasonable optimization flag (enabled by default for -O4). By ignoring the backtrace we could do other optimizations for size and speed, like the ones in this ticket and its duplicates. There are use cases for that, see some of the duplicate tickets. For example in PR56165, they didn't want to support any debugging at all. And even if you want debugging, you might want to disregard backtraces and use a more sophisticated debugging device. This is independent of the musttail attribute; with -fignore-backtrace we would leave GCC more freedom to optimize.
(In reply to Xi Ruoyao from comment #16)
> In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> only executed one time so optimizing against a cold path does not help much.
> I don't think it's a good idea to encourage people to construct some fancy
> code by a recursive _Noreturn function (why not just use a loop?!) And if
> you must write such fancy code anyway IMO musttail attribute (PR83324) will
> be a better solution.

There's also longjmp, which may not be all that super cold and may be executed multiple times. And while yeah, nobody will notice a single call vs jmp time save against a process spawn/exit, for a longjmp wrapper, it'll make it a few % faster (as would utilizing _Noreturn attributes for better register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097, which would also save a bit of codesize too). Tail calls can also save a bit of codesize if the target is near.
(In reply to Petr Skocik from comment #19)
> (In reply to Xi Ruoyao from comment #16)
> > In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> > only executed one time so optimizing against a cold path does not help much.
> > I don't think it's a good idea to encourage people to construct some fancy
> > code by a recursive _Noreturn function (why not just use a loop?!) And if
> > you must write such fancy code anyway IMO musttail attribute (PR83324) will
> > be a better solution.
>
> There's also longjmp, which may not be all that super cold and may be
> executed multiple times. And while yeah, nobody will notice a single call vs
> jmp time save against a process spawn/exit, for a longjmp wrapper, it'll
> make it a few % faster (as would utilizing _Noreturn attributes for better
> register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097,
> which would also save a bit of codesize too). Tail calls can also save a bit
> of codesize if the target is near.

Just to emphasize, tail call optimization is not just for speed. It is essential to avoid wasting stack space. In particular, it should _not_ be necessary to replace all recursion with loops, as Xi Ruoyao suggests, just to avoid potential stack overflows.

Ah, and I also think that recursion in C is not fancy (anymore), since everyone expects the compiler to do sibcall or similar optimizations. Noreturn functions are the exception to that. So it would indeed be consistent to do sibcall optimization for noreturn functions, too!

Personally, I would be satisfied with the new musttail attribute to enforce tail calls whenever necessary (given that this will be available for C, not C++ only). But speed-wise, musttail might not have the desired effect. It is meant for preserving stack space.
---
Following Petr Skocik, I quick-tested on my computer:

===== longjmp_wrapper.c =====================
#include <setjmp.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val) {
    longjmp(env, val);
}

===== longjmp_main.c ========================
#include <setjmp.h>
#include <limits.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val);

int main(void) {
    jmp_buf env;
    for (int i = 0; i < INT_MAX; i++) {
        if (setjmp(env) == 0) {
            longjmp_wrapper(env, 1);
        }
    }
}
=============================================

After compiling with

$ gcc -O3 -m32 -c -S longjmp_wrapper.c -o longjmp_wrapper.S

I copied and manually modified the generated longjmp_wrapper.S as follows:

9,15c9
<       subl    $20, %esp
<       .cfi_def_cfa_offset 24
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 28
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 32
<       call    longjmp
---
>       jmp     longjmp

Then I compiled both versions with longjmp_main.c, again with -m32. Measured with "time", the sibcall version and the unmodified version took around 23.5 sec and 24.5 sec respectively on my computer. So around 4 % improvement for 32 bit x86. For 64 bit x86, both took around 18 secs without a noticeable speed difference (perhaps because both arguments are passed in registers instead of on the stack under the 64 bit calling conventions).