10837 – noreturn attribute causes no sibling calling optimization

Status:	RESOLVED WONTFIX

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization
Version:	3.4.0

Importance:	P3 normal
Target Milestone:	3.4.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Duplicates (7):	33083 34589 56165 58152 67327 68677 111786
Depends on:
Blocks:

Reported:	2003-05-17 17:16 UTC by Andrew Pinski
Modified:	2024-03-05 03:06 UTC
CC List:	14 users

See Also:	111786 38534 83324
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description Andrew Pinski 2003-05-17 17:16:00 UTC

The noreturn attribute on a function prevents sibling call optimization of calls to it, so more code is executed than needed.

Release:
GNU C version 3.4 20030516 and GNU C version 3.4 20030517

Environment:
powerpc-apple-darwin6.6 and i686-pc-linux-gnu

How-To-Repeat:
cat >tt.c <<EOF
void temp()
{
                abort();
}

void temp2() __attribute__((__noreturn__));
void temp1()
{
        temp2();
}

void temp3();

void temp4()
{
        temp3();
}
EOF
gcc -O3 -fomit-frame-pointer tt.c -S
Look at the generated asm: temp4 gets the optimization but temp1 does not.

Comment 1 Andrew Pinski 2003-05-17 17:16:00 UTC

Fix:
If the attribute is on a function, also treat the call to it like a return statement, so that sibling call optimization always works for that call.

Comment 2 Richard Henderson 2003-05-23 15:28:22 UTC

Working as designed.
(1) It often takes more insns to pop the stack frame than to make the call.
(2) You get a proper backtrace from abort.
(3) http://gcc.gnu.org/ml/gcc-patches/2000-10/msg00180.html

Comment 3 Andrew Pinski 2007-08-16 10:38:55 UTC

*** Bug 33083 has been marked as a duplicate of this bug. ***

Comment 4 Bernhard Kauer 2007-08-28 19:07:18 UTC

Some comments from the outside:
(1) It should be possible to replace the emitted call with a jmp, without adding the function epilogue.
(2) A proper backtrace is not generated for sibling-call-optimized functions either.
(3) http://gcc.gnu.org/ml/gcc-patches/2000-10/msg00180.html explains that it is hard to implement.

Comment 5 Bernhard Kauer 2007-08-28 19:27:38 UTC

(4) abort produces a proper backtrace if it is not a noreturn function.
(5) This optimization could be made configurable by something like -foptimize-noreturns.

Comment 6 Andrew Pinski 2007-12-26 09:36:14 UTC

*** Bug 34589 has been marked as a duplicate of this bug. ***

Comment 7 Andrew Pinski 2013-01-31 21:46:42 UTC

*** Bug 56165 has been marked as a duplicate of this bug. ***

Comment 8 Andrew Pinski 2013-02-03 02:18:59 UTC

*** Bug 56165 has been marked as a duplicate of this bug. ***

Comment 9 Andy Lutomirski 2013-08-10 00:16:54 UTC

In this case, perhaps sibling call optimization is the wrong thing here.  The caller of a noreturn function shouldn't pop the stack, but it also shouldn't save registers (and, if it doesn't need to save registers, it shouldn't create a stack frame in the first place).

Comment 10 Andrew Pinski 2013-08-14 15:38:33 UTC

*** Bug 58152 has been marked as a duplicate of this bug. ***

Comment 11 Andrew Pinski 2015-08-23 13:41:10 UTC

*** Bug 67327 has been marked as a duplicate of this bug. ***

Comment 12 H.J. Lu 2015-08-25 11:25:59 UTC

(In reply to Richard Henderson from comment #2)
> Working as designed.
> (1) It often takes more insns to pop the stack frame than to make the call.
> (2) You get a proper backtrace from abort.
> (3) http://gcc.gnu.org/ml/gcc-patches/2000-10/msg00180.html

Glibc has been using

ENTRY (__memmove_chk)
	movl	12(%esp), %eax
	cmpl	%eax, 16(%esp)
	jb	__chk_fail
	jmp	memmove
END (__memmove_chk)

since 2004. #1 and #2 shouldn't be the reason not to optimize.  I
am using:

/* Due to
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837
   noreturn attribute disable tail call optimization.  Removes noreturn
   attribute to enable tail call optimization.  */
extern void *chk_fail (void) __asm__ ("__chk_fail") attribute_hidden;

to work around this.
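
A minimal C sketch of the same workaround pattern, for illustration only. The names my_chk_fail and my_memmove_chk are hypothetical and are not glibc's; the point is just that the failure handler is declared without noreturn, so the call in tail position is an ordinary sibling call candidate. In a real setup the handler would normally live in another translation unit; it is defined here only so the example links, and the compiler may then work out by itself that it never returns.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical failure handler, deliberately declared WITHOUT
   __attribute__((noreturn)) and with the same return type as the
   caller, so that calling it in tail position can become a jump. */
void *my_chk_fail(void);

/* Hypothetical checked memmove: fails if the destination buffer
   (dstlen bytes) is smaller than the requested copy (len bytes). */
void *my_memmove_chk(void *dst, const void *src, size_t len, size_t dstlen)
{
    if (dstlen < len)
        return my_chk_fail();       /* tail position: sibling call candidate */
    return memmove(dst, src, len);  /* tail position: sibling call candidate */
}

/* Defined here only to keep the sketch self-contained. */
void *my_chk_fail(void)
{
    abort();
}
```

Whether a jmp is actually emitted depends on the target, the optimization level, and the calling convention, so the generated asm should be checked with -S.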

Comment 13 Andrew Pinski 2015-12-03 17:54:56 UTC

*** Bug 68677 has been marked as a duplicate of this bug. ***

Comment 14 Xi Ruoyao 2023-10-12 14:28:05 UTC

*** Bug 111786 has been marked as a duplicate of this bug. ***

Comment 15 gooncreeper 2024-02-11 19:51:39 UTC

May I suggest we just add something like __attribute__((trace)) for the special abort case? Noreturn was added for code optimization after all, not for backtracing.

Comment 16 Xi Ruoyao 2024-02-12 02:39:17 UTC

(In reply to gooncreeper from comment #15)
> May I suggest we just add something like __attribute__((trace)) for the
> special abort case? Noreturn was added for code optimization after all, not
> for backtracing.

It will break any attempts to debug an abort until the libc headers are updated to use __attribute__((trace)).

Note that in GCC noreturn was added long before the WG14 _Noreturn paper (even this ticket predates the WG14 paper), so the rationale in the paper may not apply.

In practice most _Noreturn functions are abort, exit, ..., i.e. they are only executed one time so optimizing against a cold path does not help much.  I don't think it's a good idea to encourage people to construct some fancy code by a recursive _Noreturn function (why not just use a loop?!)  And if you must write such fancy code anyway IMO musttail attribute (PR83324) will be a better solution.

Comment 17 Lukas Grätz 2024-02-12 04:38:28 UTC

(In reply to Xi Ruoyao from comment #16)
> (In reply to gooncreeper from comment #15)
> > May I suggest we just add something like __attribute__((trace)) for the
> > special abort case? Noreturn was added for code optimization after all, not
> > for backtracing.
> 
> It will break any attempts to debug an abort until the libc headers are
> updated to use __attribute__((trace)).

"any attempts"? We could simply use the gdb debugger and ignore the backtrace. In comparison, the backtrace is a rather restricted debugging instrument.

If there are applications that really depend on GCC's backtrace, this should be the reason to keep the current behaviour.

> 
> Note that in GCC noreturn has been added far before the WG14 _Noreturn paper
> (even this ticket predates the WG14 paper), so the rationale in the paper
> may not apply.

Backtracing functionality is highly platform dependent, so there is no surprise that the C standard cannot guarantee anything about it.

> 
> In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> only executed one time so optimizing against a cold path does not help much.
> I don't think it's a good idea to encourage people to construct some fancy
> code by a recursive _Noreturn function (why not just use a loop?!)

... and why not just if and goto? Because it is considered good programming practice to structure source code into functions (not too long) and loops. If a function gets too big, GCC might not optimize it well.

>  And if
> you must write such fancy code anyway IMO musttail attribute (PR83324) will
> be a better solution.

I agree.

Comment 18 Lukas Grätz 2024-02-14 11:40:52 UTC

On second thought: I think something like -fignore-backtrace could be a reasonable optimization flag (enabled by default for -O4). By ignoring the backtrace we could do other optimizations for size and speed, like the one in this ticket and its duplicates.

There are use cases for that, see some of the duplicate tickets. For example, in PR56165 they didn't want to support any debugging at all. And even if you want debugging, you might want to disregard backtraces and use a more sophisticated debugging device. This is independent of the musttail attribute; with -fignore-backtrace we would leave GCC more freedom to optimize.

Comment 19 Petr Skocik 2024-02-25 11:48:39 UTC

(In reply to Xi Ruoyao from comment #16)
 
> In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> only executed one time so optimizing against a cold path does not help much.
> I don't think it's a good idea to encourage people to construct some fancy
> code by a recursive _Noreturn function (why not just use a loop?!)  And if
> you must write such fancy code anyway IMO musttail attribute (PR83324) will
> be a better solution.

There's also longjmp, which may not be all that cold and may be executed multiple times. And while nobody will notice the time saved by a single jmp instead of a call against a process spawn/exit, for a longjmp wrapper it will make it a few % faster (as would using _Noreturn attributes for better register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097, which would also save a bit of code size). Tail calls can also save a bit of code size if the target is near.

Comment 20 Lukas Grätz 2024-02-26 15:17:07 UTC

(In reply to Petr Skocik from comment #19)
> IMO(In reply to Xi Ruoyao from comment #16)
>  
> > In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> > only executed one time so optimizing against a cold path does not help much.
> > I don't think it's a good idea to encourage people to construct some fancy
> > code by a recursive _Noreturn function (why not just use a loop?!)  And if
> > you must write such fancy code anyway IMO musttail attribute (PR83324) will
> > be a better solution.
> 
> There's also longjmp, which may not be all that super cold and may be
> executed multiple times. And while yeah, nobody will notice a single call vs
> jmp time save against a process spawn/exit, for a longjmp wrapper, it'll
> make it a few % faster (as would utilizing _Noreturn attributes for better
> register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097,
> which would also save a bit of codesize too). Taillcalls can also save a bit
> of codesize if the target is near.


Just to emphasize, tail call optimization is not just for speed. It is essential to avoid wasting stack space. In particular, to avoid potential stack overflows, it should _not_ be necessary to replace all recursion with loops, as Xi Ruoyao suggests. Also, I think that recursion in C is not fancy (anymore), since everyone expects the compiler to do sibcall or similar optimizations. Noreturn functions are the exception to that. So it would indeed be consistent to do sibcall optimization for noreturn functions, too!

Personally, I would be satisfied with the new attribute musttail to enforce tail calls whenever necessary (given that this will be available for C, not C++ only). But speed-wise, musttail might not have the desired effect. It is meant for preserving stack space.
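
To illustrate the stack-space point with ordinary (returning) functions, here is a small sketch; sum_tail and sum_loop are hypothetical names made up for this example. With sibling call optimization (GCC enables -foptimize-sibling-calls at -O2 and above), the recursive call in sum_tail can be compiled to a jump and needs no additional stack per step; without it, every step pushes a new frame.

```c
/* Tail-recursive sum of 1..n. The recursive call is the last action
   of the function, so it is a sibling call candidate. */
unsigned long sum_tail(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_tail(n - 1, acc + n);
}

/* The same computation written as a loop, for comparison. */
unsigned long sum_loop(unsigned long n)
{
    unsigned long acc = 0;
    while (n != 0) {
        acc += n;
        n--;
    }
    return acc;
}
```

Both functions compute the same result; the difference discussed in this ticket is only whether the recursive form is guaranteed to run in constant stack space.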

---

Following Petr Skocik, I quick-tested on my computer:

===== longjmp_wrapper.c =====================
#include <setjmp.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val) {
    longjmp(env, val);
}

===== longjmp_main.c ========================
#include <setjmp.h>
#include <limits.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val);

int main(void) {
    jmp_buf env;
    for (int i = 0; i < INT_MAX; i++) {
        if (setjmp(env) == 0) {
            longjmp_wrapper(env, 1);
        }
    }
}
=============================================

After compiling with

$ gcc -O3 -m32 -c -S longjmp_wrapper.c -o longjmp_wrapper.S

I copied and manually modified the generated longjmp_wrapper.S as follows:

9,15c9
< 	subl	$20, %esp
< 	.cfi_def_cfa_offset 24
< 	pushl	28(%esp)
< 	.cfi_def_cfa_offset 28
< 	pushl	28(%esp)
< 	.cfi_def_cfa_offset 32
< 	call	longjmp
---
> 	jmp 	longjmp


Then I compiled both versions with longjmp_main.c, again with -m32. Measured with "time", the sibcall version and the unmodified version took around 23.5 sec and 24.5 sec respectively on my computer, so around 4% improvement for 32-bit x86. For 64-bit x86, both took around 18 sec without a noticeable speed difference (perhaps because both arguments are passed in registers instead of on the stack under the 64-bit calling convention).
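
For anyone who wants to repeat a scaled-down version of this measurement in a single file, a sketch follows. It is not the exact setup used above: count_round_trips is a hypothetical helper, and the noinline attribute is an assumption added here so the wrapper is not simply inlined when it lives in the same translation unit as its caller.

```c
#include <setjmp.h>

/* Same wrapper as in longjmp_wrapper.c, kept out of line so the
   call (or jmp) to longjmp stays inside a separate function. */
__attribute__((noreturn, noinline))
void longjmp_wrapper(jmp_buf env, int val)
{
    longjmp(env, val);
}

/* Performs `iterations` setjmp/longjmp round trips through the
   wrapper and returns how many were completed. */
long count_round_trips(long iterations)
{
    jmp_buf env;
    /* volatile is a precaution: neither variable is modified between
       setjmp and longjmp here, but this keeps their values
       well-defined if the loop body is changed later. */
    volatile long done = 0;
    volatile long i;

    for (i = 0; i < iterations; i++) {
        if (setjmp(env) == 0) {
            longjmp_wrapper(env, 1);  /* never returns; control resumes at setjmp */
        }
        done++;
    }
    return done;
}
```

Calling count_round_trips with a large iteration count from main and running the binary under "time" gives a comparable measurement; the absolute numbers will differ from the ones reported above.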