(https://gcc.godbolt.org/z/a4ox6oEfT) ``` struct impl; struct impl* get_impl(int key); int get_value(struct impl* p); extern __inline__ __attribute__((__gnu_inline__)) int get_value_by_key(int key) { struct impl* p = get_impl(key); if(!p) return -1; return get_value(p); } int real_get_value_by_key(int key) { return get_value_by_key(key); } ``` This is actually two functions, one is `gnu_inline` and the other is a non-inline one. It looks to me that if I mark a function `gnu_inline`, I assert that 'somewhere I shall provide an external definition for you' so when optimizing for size, GCC may generate a call instead of using the more complex inline definition. The `real_get_value_by_key` function is made a deliberate sibling call, so ideally this should be ``` real_get_value_by_key: jmp get_value_by_key ``` and not ``` real_get_value_by_key: push rsi call get_impl test rax, rax je .L2 mov rdi, rax pop rcx jmp get_value .L2: or eax, -1 pop rdx ret ``` It still gets inlined with `-finline-limit=0` and can only be disabled by `-fno-inline`. I have no idea how it is controlled. --------------------------- # Trivia These are two `gnu_inline` functions from the same library. Most of the time they should both be inlined in user code. However, external definitions are required when optimization is not turned on, or when their addresses are taken, so they must still exist. As they are unlikely to be used anyway, optimizing for size makes much more sense.
I thought it was documented that gnu_inline also causes always_inline if optimization is turned on but I can't seem to find that ...
(In reply to Andrew Pinski from comment #1) > I thought it was documented that gnu_inline also causes always_inline if > optimization is turned on but I can't seem to find that ... Is that the case in GCC source? I think I would have to find a workaround for it.
The C++ front-end does: /* Handle gnu_inline attribute. */ if (GNU_INLINE_P (decl1)) { DECL_EXTERNAL (decl1) = 1; DECL_NOT_REALLY_EXTERN (decl1) = 0; DECL_INTERFACE_KNOWN (decl1) = 1; DECL_DISREGARD_INLINE_LIMITS (decl1) = 1; } C front-end does: /* For GNU C extern inline functions disregard inline limits. */ if (DECL_EXTERNAL (fndecl) && DECL_DECLARED_INLINE_P (fndecl) && (flag_gnu89_inline || lookup_attribute ("gnu_inline", DECL_ATTRIBUTES (fndecl)))) DECL_DISREGARD_INLINE_LIMITS (fndecl) = 1; This specifically from r0-82849-gc536a6a77a19a8 but it was done different before that (using a language hook). https://gcc.gnu.org/pipermail/gcc-patches/2007-July/221806.html https://gcc.gnu.org/pipermail/gcc-patches/2007-August/223406.html It looks like it has been this way since r0-37737-g4838c5ee553f06 (2001) (or rather that is when it was used by the tree inline; I don't want to dig further back to understand the RTL inliner). So looks like this is just missing documentation ...
(In reply to Andrew Pinski from comment #3) > It looks like it has been this way since r0-37737-g4838c5ee553f06 (2001) (or > rather that is when it was used by the tree inline; I don't want to dig > further back to understand the RTL inliner). So looks like this is just > missing documentation ... It's not just about `gnu_inline`. If we switch to C++ inline we get the same result: (https://gcc.godbolt.org/z/ehbjqj5xh) ``` struct impl; struct impl* get_impl(int key); int get_value(struct impl* p); extern inline int get_value_by_key(int key) { struct impl* p = get_impl(key); if(!p) return -1; return get_value(p); } int real_get_value_by_key(int key) { return get_value_by_key(key); } ``` GCC outputs: ``` real_get_value_by_key(int): push rsi call get_impl(int) test rax, rax je .L2 mov rdi, rax pop rcx jmp get_value(impl*) .L2: or eax, -1 pop rdx ret ``` If we switched to C99 `extern inline` then it would produce desired result: ``` get_value_by_key: push rsi call get_impl test rax, rax je .L2 mov rdi, rax pop rcx jmp get_value .L2: or eax, -1 pop rdx ret real_get_value_by_key: jmp get_value_by_key `` The only difference between the C99 `extern inline` and C++ `extern inline` is that the C++ external definition is COMDAT.
(In reply to LIU Hao from comment #4) > The only difference between the C99 `extern inline` and C++ `extern inline` > is that the C++ external definition is COMDAT. Well not really. comdat changes heurstics here though. The reason being C++ inline functions are most likely smaller and should really be inlined. This is all heurstics of inlining and figuring out locally in the TU that it might not be called in another TU or not. Note GCC has not retuned its -Os heurstics for a long time because it has been decent enough for most folks and corner cases like this is almost never come up.
> Note GCC has not retuned its -Os heurstics for a long time because it has been > decent enough for most folks and corner cases like this is almost never come > up. There were quite few changes to -Os heuristics :) One of bigger challenges is that we do see more and more C++ code built with -Os which relies on certain functions to be inlined and optimized in context, so we had to get more optimistic in a hope that inlined code will optimize well. COMDAT functions are more likely inlined because statistics shows that many of them are not really shared between translations units (see -param=comdat-sharing-probability parameter). This was necessary to get reasonable code for Firefox approx 15 years ago.
(In reply to Jan Hubicka from comment #6) > > Note GCC has not retuned its -Os heurstics for a long time because it has been > > decent enough for most folks and corner cases like this is almost never come > > up. > There were quite few changes to -Os heuristics :) > One of bigger challenges is that we do see more and more C++ code built > with -Os which relies on certain functions to be inlined and optimized > in context, so we had to get more optimistic in a hope that inlined code > will optimize well. > > COMDAT functions are more likely inlined because statistics shows that > many of them are not really shared between translations units > (see -param=comdat-sharing-probability parameter). This was necessary to > get reasonable code for Firefox approx 15 years ago. So is there no way to get the C99 extern inline behavior? i.e. sibling calls to gnu_inline functions are inlined even when optimizing for size.