Consider the following code, compiled with -O2 -fPIC either in C or in C++: === void f(TYPE b) { f(0); } === if TYPE is a type of 32- or 64-bit width (int, unsigned, long, long long), GCC generates the following code (-m32, -mx32 or -m64): === f: .L2: jmp .L2 === If TYPE is shorter than 32-bit (bool, _Bool, char, short), GCC generates the following code (-mx32, -m64): === f: xorl %edi, %edi jmp f@PLT === and much worse code for -m32. For whatever reason, GCC decided to place the call via the PLT. That's a the missed optimisation: if this function was called, then the PLT must resolve back to itself. What's more, since the argument wasn't used, it's also unnecessary to set it. The output happens without -fPIC, but in that case there is no PLT. Tested on: GCC 4.7.2 (as shipped by Fedora 17) GCC 4.9 (trunk build from 20130318) This is a contrived example (infinite recursion) that no one would write in their sane mind. But it might point to missed optimisations in legitimate recursive functions.
This is due to promoting of the type to int on some targets (x86 is the main one).
Confirmed. We don't optimize callgraph cycles with one externally visible entry that way. And I believe we currently have no way of annotating a single call to resolve locally. Honza?
> Confirmed. We don't optimize callgraph cycles with one externally visible > entry that way. And I believe we currently have no way of annotating a > single call to resolve locally. Honza? The canonical way to do it is to create a static alias and the redirect call to the alias. We do that in FE for some specific examples (like thunks) but not in general. Doing so would indeed make sense. Honza
Note it would need to be done with lots of care, because you can e.g. have aliases to the function and in that case you should go through the PLT: __attribute__((noinline, noclone)) void f(short b) { f(0); } static void g (short) __attribute__ ((alias ("f"))); void h () { g (0); } Because the global scope f can be in different shared library, but if you call h () and is defined only in this CU, you call this CU's f, no the globally visible one.
Created attachment 40562 [details] Draft patch (In reply to Thiago Macieira from comment #0) > What's more, since the argument wasn't used, it's also > unnecessary to set it. This may be due to contrivedness of example (endless loop, etc.). The argument seems to be forwarded in normal cases. (In reply to Jan Hubicka from comment #3) > The canonical way to do it is to create a static alias and the redirect call > to the alias. We do that in FE for some specific examples (like thunks) but > not in general. Doing so would indeed make sense. Would something like attached draft patch be ok? (In reply to Jakub Jelinek from comment #4) > Note it would need to be done with lots of care, because you can e.g. have > aliases to the function and in that case you should go through the PLT: Note that we already violate this. E.g. in Thiago's reprocase compiler shouldn't have figured out that there is and endless loop. I can also see that recursive factorial implementation is expanded to loop even under -fPIC. Should I fix this?
Note that even without symbol aliases, such calls are not necessarily self-recursive when 'f' is first called via dlsym with RTLD_NEXT or a specific module as the first argument. In that case the result of dlsym won't necessarily be the prevailing definition of 'f' placed into the PLT.
(In reply to Alexander Monakov from comment #6) > Note that even without symbol aliases, such calls are not necessarily > self-recursive when 'f' is first called via dlsym with RTLD_NEXT or a > specific module as the first argument. Right. It seems -fno-semantic-interposition is the only option to disable strict interposition rules. So could someon close this as wontfix?
Well, if my argument is correct, then GCC generates wrong code for the very first example in comment #0. If that is deliberate as a compromise even though otherwise GCC suppresses optimizations to honor possible ELF interposition, I think this ought to be documented? To my knowledge, that is the sole instance where optimization doesn't fully honor ELF interposition possibility.
(In reply to Alexander Monakov from comment #8) > Well, if my argument is correct, then GCC generates wrong code for the very > first example in comment #0. I believe it does (see my #5, most probly author of some pass failed to check for interposition). Note that simple recursive factorial is expanded to loop too which is a more impressive instance of this bug. > To my knowledge, that > is the sole instance where optimization doesn't fully honor ELF > interposition possibility. +1
(In reply to Yuri Gribov from comment #9) > (In reply to Alexander Monakov from comment #8) > > Well, if my argument is correct, then GCC generates wrong code for the very > > first example in comment #0. > > I believe it does (see my #5, most probly author of some pass failed to > check for interposition). Note that simple recursive factorial is expanded > to loop too which is a more impressive instance of this bug. > > > To my knowledge, that > > is the sole instance where optimization doesn't fully honor ELF > > interposition possibility. > > +1 Don't we also inline any beneficial inline functions at -O3 even if they could be interposed (definitely not suggesting we stop doing that, that would totally kill compiler performance)? I'd say we shouldn't care about the semantic interposition for self-recursion either, just make sure we don't crash on such code.
(In reply to Jakub Jelinek from comment #10) > Don't we also inline any beneficial inline functions at -O3 even if they > could be interposed (definitely not suggesting we stop doing that, that > would totally kill compiler performance)? I'd say we shouldn't care about > the semantic interposition for self-recursion either, just make sure we > don't crash on such code. At least we should make things consistent and use direct calls for all local locals which could potentially be subject to inlining. Otherwise, interposition could break with apparently unrelated code changes.
(In reply to Jakub Jelinek from comment #10) > (In reply to Yuri Gribov from comment #9) > > (In reply to Alexander Monakov from comment #8) > > > Well, if my argument is correct, then GCC generates wrong code for the very > > > first example in comment #0. > > > > I believe it does (see my #5, most probly author of some pass failed to > > check for interposition). Note that simple recursive factorial is expanded > > to loop too which is a more impressive instance of this bug. > > > > > To my knowledge, that > > > is the sole instance where optimization doesn't fully honor ELF > > > interposition possibility. > > > > +1 > > Don't we also inline any beneficial inline functions at -O3 even if they > could be interposed (definitely not suggesting we stop doing that, that > would totally kill compiler performance)? Inlining inline functions is fine due to ODR rule.
(In reply to Yuri Gribov from comment #12) > Inlining inline functions is fine due to ODR rule. ODR doesn't apply just to inline functions. So all semantic interposition, except for the case when both functions do the very same thing, is ODR violation. But many programs and libraries rely on it heavily.
For the draft patch you need to check for aliases. If global symbol is indeed the only way to reach the function, then the transformation is IMO valid. As for tailcall, we have recursive_call_p predicate that uses symtab_node::semantically_equivalent_p which returns true, because it is called for t and t. It is valid for testcase as written, because we know there is no alias. However void f(int b) { f(0); } void g(int b) __attribute__((alias(("f")))); is indeed misopitmized and the bug is that recursive_call_p needs to check all aliases of the function being semantically equivalent to function itself. I will fix that. For the optimization redirecting to noninterposable aliases, I would be fine with the patch if it was extended by the check verifying that there is at most one externally visible symbol aliasing the definition or that they are all non-interposable. Honza
Author: hubicka Date: Sat Feb 11 17:56:02 2017 New Revision: 245359 URL: https://gcc.gnu.org/viewcvs?rev=245359&root=gcc&view=rev Log: PR tree-ssa/56727 * gcc.dg/tree-ssa/pr56727.c: New testcase. * ipa-utils.c (recursive_call_p): Be more careful about interposition. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr56727.c Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-utils.c trunk/gcc/testsuite/ChangeLog
Author: ygribov Date: Fri Jul 21 19:48:51 2017 New Revision: 250442 URL: https://gcc.gnu.org/viewcvs?rev=250442&root=gcc&view=rev Log: 2017-07-21 Yury Gribov <tetra2005@gmail.com> PR middle-end/56727 * ipa-visibility (function_and_variable_visibility): Convert recursive PLT call to direct call if appropriate. * gcc.dg/pr56727-1.c: New test. * gcc.dg/pr56727-2.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr56727-1.c trunk/gcc/testsuite/gcc.dg/pr56727-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-visibility.c trunk/gcc/testsuite/ChangeLog
Fixed.