https://bugzilla.suse.com/show_bug.cgi?id=1212101 reports crashes of Firefox because with LTO we end up with comdats built with different ISA flags and the linker choosing the "wrong" one. While that's general a user error the TUs in question have no uses of the offending symbols and they are indeed eliminated when optimizing. But this elimination happens only after LTO streaming which is unfortunate. We also seem to lack any diagnostic at WPA time that we have COMDAT (group members) from several TUs that are not built with the same options. Apart from diagnosing this (with -Wodr?) a possible DWIM solution could be to clone the COMDAT (groups), overriding the linker decision and preserving the edges to the original refered copy. That's also superior to using symbol visibility in the TUs as it would allow to share the comdats across TUs if they are built with the same set of flags. But first and foremost I wonder why we do not eliminate these symbols before LTO streaming.
Created attachment 55378 [details] preprocessed testcase This is one of the TUs in question, preprocessed with GCC 13. It was built with -O3 -mavx
The desired symbol table before LTO streaming has no symbols in the skvx:: namespace Just look at the final symbol table when not compiling with -flto, it takes until the 'Optimized Symbol table' to do that, interestingly we have Reclaiming functions: Reclaiming variables: Clearing address taken flags: Optimized Symbol table: so it doesn't claim to do anything.
So it's actually that we require the instantiations but we are not able to fully optimize avx::[rect_]memset* before LTO streaming. In fact we're not too far from that: void skvx::Vec<2, unsigned int>::Vec (struct Vec * const this, unsigned int D.78460) { unsigned int _3(D); <bb 2> [local count: 1073741824]: MEM[(struct VecStorage *)this_2(D)] ={v} {CLOBBER}; MEM[(struct Vec *)this_2(D)] ={v} {CLOBBER}; MEM[(struct Vec *)this_2(D)].val = _3(D); MEM[(struct Vec *)this_2(D) + 4B] ={v} {CLOBBER}; MEM[(struct Vec *)this_2(D) + 4B].val = _3(D); return; } void skvx::Vec<4, unsigned int>::Vec (struct Vec * const this, unsigned int D.78672) { unsigned int _3(D); struct Vec * _4; struct Vec * _5; <bb 2> [local count: 1073741824]: MEM[(struct VecStorage *)this_2(D)] ={v} {CLOBBER}; _4 = &MEM[(struct VecStorage *)this_2(D)].lo; skvx::Vec<2, unsigned int>::Vec (_4, _3(D)); _5 = &MEM[(struct VecStorage *)this_2(D)].hi; skvx::Vec<2, unsigned int>::Vec (_5, _3(D)); return; } void avx::memsetT<unsigned int> (unsigned int * buffer, unsigned int value, int count) { struct Vec wideValue; long unsigned int _17; <bb 2> [local count: 118111600]: MEM[(struct VecStorage *)&wideValue] ={v} {CLOBBER}; skvx::Vec<4, unsigned int>::Vec (&MEM[(struct VecStorage *)&wideValue].lo, value_8(D)); skvx::Vec<4, unsigned int>::Vec (&MEM[(struct VecStorage *)&wideValue].hi, value_8(D)); we go inlining skvx::Vec<8, unsigned int>::Vec here which eventually expands __attribute__((always_inline)) void skvx::VecStorage<8, unsigned int>::VecStorage (struct VecStorage * const this, unsigned int s) { but we do not inline into that function and as we do not iterate early inlining we stop here. Removing all always_inline from the TU fixes the issue.
So we run into /* Never inline regular functions into always-inline functions during incremental inlining. This sucks as functions calling always inline functions will get less optimized, but at the same time inlining of functions calling always inline function into an always inline function might introduce cycles of edges to be always inlined in the callgraph. We might want to be smarter and just avoid this type of inlining. */ || (DECL_DISREGARD_INLINE_LIMITS (node->decl) && lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl)))) which I think is sensible but the issue is that for some reason the skvx::Vec<4, unsigned int>::Vec (_1, s_6(D)) calls are not always_inline. (gdb) p node->callees $3 = <cgraph_edge* 0x7ffff301c068 (<cgraph_node * 0x7ffff2f43000 "__ct_base "/8008> -> <cgraph_node * 0x7ffff3338dd0 "__ct_comp "/8006>)> (gdb) p node->callees->next_callee $4 = <cgraph_edge* 0x7ffff301c000 (<cgraph_node * 0x7ffff2f43000 "__ct_base "/8008> -> <cgraph_node * 0x7ffff3338dd0 "__ct_comp "/8006>)> __attribute__((always_inline)) void skvx::VecStorage<8, unsigned int>::VecStorage (struct VecStorage * const this, unsigned int s) { struct Vec * _1; struct Vec * _2; <bb 2> : *this_4(D) ={v} {CLOBBER}; _1 = &this_4(D)->lo; skvx::Vec<4, unsigned int>::Vec (_1, s_6(D)); _2 = &this_4(D)->hi; skvx::Vec<4, unsigned int>::Vec (_2, s_6(D)); return; the location info points to __attribute__((always_inline)) Vec(typename ConvertNative<N, T>::type native) : Vec(bit_pun<Vec>(native)) {} I'm not sure what fails to work here.
So the CTOR we call is built (well, cloned from that) via implicitly_declare_fn (kind=sfk_inheriting_constructor, type=<record_type 0x7ffff3423e70 Vec>, const_p=false, pattern_fn=<function_decl 0x7ffff3427600 __ct >, inherited_parms=<tree_list 0x7ffff34298c0>) at /space/rguenther/src/gcc11queue/gcc/cp/method.cc:3245 and 'pattern_fn' has the always_inline attribute here but we don't seem to copy that anywhere? -fno-new-inheriting-ctors also "fixes" the optimization issue for me. So do we fail to copy DECL_ATTRIBUTES for these kind of implicitely declared functions (which are in fact explicitely declared?!)?
Comdats are really in conflict with the fact that we have command line options. I blame C++ standard for that and I don't think there is fully satisfactory solution to this problem. I was playing with the idea of warning when at lto time when comdats have different command line options, but this triggers way too often in practice. We would need to determine "dangerous" one i.e. when -fno-avx2 is replaced by -favx2 code. There are many ways one can stubly change semantics of the IL which makes merging possibly dangerous which is done often in larger projects, like firefox. With syntactic aliases it is possible to keep multiple copies of comdat function through merging process so inlining will chose corresponding one, but it does make other things harder. One important anoyance is that it makes it a lot harder to estimate overall size of the translation unit and how inlining affects it. We currently assume that every function will need offline unless all calls to it disappears. We will need to understand that this is true across all syntacit aliases. Also that conditional that disables early inliner for all always_inlines is probably bit harmful these days as libstdc++ indeed has quite interesting set of always_inlines that call normal inlines. I noticed that just recently while looking into the push_back implementation how complex maze it got. I will fix early inliner to allow safe inlining to always_inlines.
(In reply to Jan Hubicka from comment #6) > Comdats are really in conflict with the fact that we have command line > options. I blame C++ standard for that and I don't think there is fully > satisfactory solution to this problem. > > I was playing with the idea of warning when at lto time when comdats have > different command line options, but this triggers way too often in practice. Really? :/ > We would need to determine "dangerous" one i.e. when -fno-avx2 is replaced > by -favx2 code. > There are many ways one can stubly change semantics of the IL which makes > merging possibly dangerous which is done often in larger projects, like > firefox. I think it would be desirable to diagnose these, maybe with an option to selectively disable this specific diagnostic. Because while it is not always a correctness issue it can be a performance issue as well. > With syntactic aliases it is possible to keep multiple copies of comdat > function through merging process so inlining will chose corresponding one, > but it does make other things harder. One important anoyance is that it > makes it a lot harder to estimate overall size of the translation unit and > how inlining affects it. We currently assume that every function will need > offline unless all calls to it disappears. We will need to understand that > this is true across all syntacit aliases. > > Also that conditional that disables early inliner for all always_inlines is > probably bit harmful these days as libstdc++ indeed has quite interesting > set of always_inlines that call normal inlines. I noticed that just recently > while looking into the push_back implementation how complex maze it got. > > I will fix early inliner to allow safe inlining to always_inlines. Beware of new always-inline calls then appearing after greedy inlining (though that's exactly the case that we try to avoid here). I suppose you could disable inlining of a function which contains always-inline calls or simply functions that did not yet have the early inliner run on them (so keep the current behavior in cycles). Beware of indirect always-inline calls then. Btw, for Skia the issue is really that some auto-generated CTOR isn't marked always-inline but everything else is. Maybe they should use flatten instead of always-inline ...
> > I was playing with the idea of warning when at lto time when comdats have > > different command line options, but this triggers way too often in practice. > > Really? :/ Yep, for example firefox consist of many extra libraries (skia, video codecs, image format decoders...) each developed independently LTOed into one libxul. Each of them has its own configure that tampers with command line options. These brings in libstdc++ comdats with different command line sets (over 100 of different command lines for std::vector). Firefox is bit extreme, but it is common for other bigger projects, too. > > I think it would be desirable to diagnose these, maybe with an option to > selectively disable this specific diagnostic. Because while it is not > always a correctness issue it can be a performance issue as well. I can dig out the patch then. But it was simply printing something like warning: merging comdat function with function of same name but different flags note: first difference is -f..... which was produced similar way we do diffs for optimization and target attributes. Now those -f.... were usually quite misleading as they tended to be internal flags implied by something else (such as -Ofast instead of -O2). Often the picked -f flag was obviously harmless and false positive, however other -f flag might have been important. So I concluded that it is not very easy information to interpret from user perspective. But indeed I agree htat this is not very obvious source of interesting correctness & performance issues. > > Beware of new always-inline calls then appearing after greedy inlining > (though that's exactly the case that we try to avoid here). I suppose > you could disable inlining of a function which contains always-inline > calls or simply functions that did not yet have the early inliner run > on them (so keep the current behavior in cycles). Beware of indirect > always-inline calls then. > > Btw, for Skia the issue is really that some auto-generated CTOR isn't > marked always-inline but everything else is. Maybe they should use > flatten instead of always-inline ... We disable inlining to always_inline during early inline, but not greedy inline. Both of them can turn indirect calls to direct calls. So I was thinking that as first cut we can inline only callees with no indirect calls and no inlinable direct calls into always_inlines with no indirect calls. Are you worried about possibility of early opt inventing new call into builtin function that is defined as always_inline?
Just so it is somewhere, here is a testcase that we can't inline leaf functions to always_inlines unless we do some tracking of what calls were formerly indirect calls. We really overloaded always_inline from the original semantics "drop inlining heuristics" into "be sure that result is inlined" while for the second it does not make sense to take its address. Clang apparently simply does not error on failes always inlines which makes its life easier. int n; typedef void (*fnptr)(); fnptr get_me(); __attribute__ ((always_inline)) inline void test(void) { if (n < 10) (get_me())(); n++; return; } fnptr get_me() { return test; } void foo() { test(); }
On Fri, 23 Jun 2023, hubicka at ucw dot cz wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 > > --- Comment #8 from Jan Hubicka <hubicka at ucw dot cz> --- > > > I was playing with the idea of warning when at lto time when comdats have > > > different command line options, but this triggers way too often in practice. > > > > Really? :/ > Yep, for example firefox consist of many extra libraries (skia, video > codecs, image format decoders...) each developed independently LTOed > into one libxul. Each of them has its own configure that tampers with > command line options. > These brings in libstdc++ comdats with different command line > sets (over 100 of different command lines for std::vector). > > Firefox is bit extreme, but it is common for other bigger projects, too. > > > > I think it would be desirable to diagnose these, maybe with an option to > > selectively disable this specific diagnostic. Because while it is not > > always a correctness issue it can be a performance issue as well. > > I can dig out the patch then. But it was simply printing something like > warning: merging comdat function with function of same name but > different flags > note: first difference is -f..... > which was produced similar way we do diffs for optimization and target > attributes. > > Now those -f.... were usually quite misleading as they tended to be > internal flags implied by something else (such as -Ofast instead of > -O2). Often the picked -f flag was obviously harmless and false > positive, however other -f flag might have been important. > > So I concluded that it is not very easy information to interpret from user > perspective. > > But indeed I agree htat this is not very obvious source of interesting > correctness & performance issues. Hmm, maybe we should then really go the semantic alias route for QOI reasons? We can then still diagnose "preserving comdat symbol because of different flags in TU X vs. Y" and maybe we can also estimate a total unit growth because of doing that? > > > > Beware of new always-inline calls then appearing after greedy inlining > > (though that's exactly the case that we try to avoid here). I suppose > > you could disable inlining of a function which contains always-inline > > calls or simply functions that did not yet have the early inliner run > > on them (so keep the current behavior in cycles). Beware of indirect > > always-inline calls then. > > > > Btw, for Skia the issue is really that some auto-generated CTOR isn't > > marked always-inline but everything else is. Maybe they should use > > flatten instead of always-inline ... > > We disable inlining to always_inline during early inline, but not greedy > inline. Both of them can turn indirect calls to direct calls. So I was > thinking that as first cut we can inline only callees with no indirect > calls and no inlinable direct calls into always_inlines with no indirect > calls. > > Are you worried about possibility of early opt inventing new call into > builtin function that is defined as always_inline? No, just that we end up with a always-inline cycle (aka recursion) which we used to time/memory bomb on. That is, sth like the following has a "valid" inline solution (valid aka all always-inline calls are inlined), but only if we break the cycle at the 'forward' call. We used to mishandle this when unlucky with ordering (depending on which edge we enter the cycle when ordering for early opts). static void forward (void); static inline void __attribute__((always_inline)) always() { forward (); } static void forward (void) { always (); }; void entry1 () { forward1 (); } void entry2 () { always (); } now you can make this less obvious a cycle via indirect calls, maybe not without taking the address of always-inline functions which would be on the border of being a user error. Richard.
Hi, what about this. It should make at least quite basic inlining to happen to always_inline. I do not think many critical always_inlines have indirect calls in them. The test for lto is quite bad and I can work on solving this incrementally (it would be nice to have this tested and possibly backport it). diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc index efc8df7d4e0..dcec07e49e1 100644 --- a/gcc/ipa-inline.cc +++ b/gcc/ipa-inline.cc @@ -702,6 +702,38 @@ can_early_inline_edge_p (struct cgraph_edge *e) if (!can_inline_edge_p (e, true, true) || !can_inline_edge_by_limits_p (e, true, false, true)) return false; + /* When inlining regular functions into always-inline functions + during early inlining watch for possible inline cycles. */ + if (DECL_DISREGARD_INLINE_LIMITS (caller->decl) + && lookup_attribute ("always_inline", DECL_ATTRIBUTES (caller->decl)) + && (!DECL_DISREGARD_INLINE_LIMITS (callee->decl) + || !lookup_attribute ("always_inline", DECL_ATTRIBUTES (callee->decl)))) + { + /* If there are indirect calls, inlining may produce direct call. + TODO: We may lift this restriction if we avoid errors on formely + indirect calls to always_inline functions. Taking address + of always_inline function is generally bad idea and should + have been declared as undefined, but sadly we allow this. */ + if (caller->indirect_calls || e->callee->indirect_calls) + return false; + for (cgraph_edge *e2 = callee->callees; e2; e2 = e2->next_callee) + { + struct cgraph_node *callee2 = e2->callee->ultimate_alias_target (); + /* As early inliner runs in RPO order, we will see uninlined + always_inline calls only in the case of cyclic graphs. */ + if (DECL_DISREGARD_INLINE_LIMITS (callee2->decl) + || lookup_attribute ("always_inline", callee2->decl)) + return false; + /* With LTO watch for case where function is later replaced + by always_inline definition. + TODO: We may either stop treating noninlined cross-module always + inlines as errors, or we can extend decl merging to produce + syntacic alias and honor always inline only in units it has + been declared as such. */ + if (flag_lto && callee2->externally_visible) + return false; + } + } return true; } @@ -3034,18 +3066,7 @@ early_inliner (function *fun) if (!optimize || flag_no_inline - || !flag_early_inlining - /* Never inline regular functions into always-inline functions - during incremental inlining. This sucks as functions calling - always inline functions will get less optimized, but at the - same time inlining of functions calling always inline - function into an always inline function might introduce - cycles of edges to be always inlined in the callgraph. - - We might want to be smarter and just avoid this type of inlining. */ - || (DECL_DISREGARD_INLINE_LIMITS (node->decl) - && lookup_attribute ("always_inline", - DECL_ATTRIBUTES (node->decl)))) + || !flag_early_inlining) ; else if (lookup_attribute ("flatten", DECL_ATTRIBUTES (node->decl)) != NULL)
On Mon, 26 Jun 2023, hubicka at ucw dot cz wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 > > --- Comment #11 from Jan Hubicka <hubicka at ucw dot cz> --- > Hi, > what about this. It should make at least quite basic inlining to happen > to always_inline. I do not think many critical always_inlines have > indirect calls in them. The test for lto is quite bad and I can > work on solving this incrementally (it would be nice to have this > tested and possibly backport it). > > diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc > index efc8df7d4e0..dcec07e49e1 100644 > --- a/gcc/ipa-inline.cc > +++ b/gcc/ipa-inline.cc > @@ -702,6 +702,38 @@ can_early_inline_edge_p (struct cgraph_edge *e) > if (!can_inline_edge_p (e, true, true) > || !can_inline_edge_by_limits_p (e, true, false, true)) > return false; > + /* When inlining regular functions into always-inline functions > + during early inlining watch for possible inline cycles. */ > + if (DECL_DISREGARD_INLINE_LIMITS (caller->decl) > + && lookup_attribute ("always_inline", DECL_ATTRIBUTES (caller->decl)) > + && (!DECL_DISREGARD_INLINE_LIMITS (callee->decl) > + || !lookup_attribute ("always_inline", DECL_ATTRIBUTES > (callee->decl)))) > + { > + /* If there are indirect calls, inlining may produce direct call. > + TODO: We may lift this restriction if we avoid errors on formely > + indirect calls to always_inline functions. Taking address > + of always_inline function is generally bad idea and should > + have been declared as undefined, but sadly we allow this. */ > + if (caller->indirect_calls || e->callee->indirect_calls) why disallow caller->indirect_calls? > + return false; > + for (cgraph_edge *e2 = callee->callees; e2; e2 = e2->next_callee) I don't think this flys - it looks quadratic. Can we compute this in the inline summary once instead? As for indirect calls, can we maybe mark initial direct GIMPLE call stmts as "always-inline" and only look at that marking, thus an indirect call will never become "always-inline"? Iff cgraph edges prevail during all early inlining we could mark call edges for this purpose? > + { > + struct cgraph_node *callee2 = e2->callee->ultimate_alias_target (); > + /* As early inliner runs in RPO order, we will see uninlined > + always_inline calls only in the case of cyclic graphs. */ > + if (DECL_DISREGARD_INLINE_LIMITS (callee2->decl) > + || lookup_attribute ("always_inline", callee2->decl)) > + return false; > + /* With LTO watch for case where function is later replaced > + by always_inline definition. > + TODO: We may either stop treating noninlined cross-module always > + inlines as errors, or we can extend decl merging to produce > + syntacic alias and honor always inline only in units it has > + been declared as such. */ > + if (flag_lto && callee2->externally_visible) > + return false; > + } > + } > return true; > } > > @@ -3034,18 +3066,7 @@ early_inliner (function *fun) > > if (!optimize > || flag_no_inline > - || !flag_early_inlining > - /* Never inline regular functions into always-inline functions > - during incremental inlining. This sucks as functions calling > - always inline functions will get less optimized, but at the > - same time inlining of functions calling always inline > - function into an always inline function might introduce > - cycles of edges to be always inlined in the callgraph. > - > - We might want to be smarter and just avoid this type of inlining. */ > - || (DECL_DISREGARD_INLINE_LIMITS (node->decl) > - && lookup_attribute ("always_inline", > - DECL_ATTRIBUTES (node->decl)))) > + || !flag_early_inlining) > ; > else if (lookup_attribute ("flatten", > DECL_ATTRIBUTES (node->decl)) != NULL) > >
The trunk branch has been updated by Jason Merrill <jason@gcc.gnu.org>: https://gcc.gnu.org/g:abdf0b6cdff5783b97f35ad61ae31433f0569dfd commit r14-2149-gabdf0b6cdff5783b97f35ad61ae31433f0569dfd Author: Jason Merrill <jason@redhat.com> Date: Tue Jun 27 05:15:01 2023 -0400 c++: inherited constructor attributes Inherited constructors are like constructor clones; they don't exist from the language perspective, so they should copy the attributes in the same way. But it doesn't make sense to copy alias or ifunc attributes in either case. Unlike handle_copy_attribute, we do want to copy inlining attributes. The discussion of PR110334 pointed out that we weren't copying the always_inline attribute, leading to poor inlining choices. PR c++/110334 gcc/cp/ChangeLog: * cp-tree.h (clone_attrs): Declare. * method.cc (implicitly_declare_fn): Use it for inherited constructor. * optimize.cc (clone_attrs): New. (maybe_clone_body): Use it. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/nodiscard-inh1.C: New test.
> > why disallow caller->indirect_calls? See testcase in comment #9 > > > + return false; > > + for (cgraph_edge *e2 = callee->callees; e2; e2 = e2->next_callee) > > I don't think this flys - it looks quadratic. Can we compute this > in the inline summary once instead? I guess I can place a cache there. I think this check will become more global over time so it more fits IMO here. > > As for indirect calls, can we maybe mark initial direct GIMPLE call > stmts as "always-inline" and only look at that marking, thus an > indirect call will never become "always-inline"? Iff cgraph edges > prevail during all early inlining we could mark call edges for > this purpose? I also think we need call site specific info. Tagging gimple call statements and copying the info to gimple edges will probably be needed here. We want to keep the info from early inlining to late inlining since we output errors late. We already have plenty of GF_CALL_ flags, so adding one should be easy? Honza
On Wed, 28 Jun 2023, hubicka at ucw dot cz wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 > > --- Comment #14 from Jan Hubicka <hubicka at ucw dot cz> --- > > > > why disallow caller->indirect_calls? > See testcase in comment #9 > > > > > + return false; > > > + for (cgraph_edge *e2 = callee->callees; e2; e2 = e2->next_callee) > > > > I don't think this flys - it looks quadratic. Can we compute this > > in the inline summary once instead? > > I guess I can place a cache there. I think this check will become more > global over time so it more fits IMO here. > > > > As for indirect calls, can we maybe mark initial direct GIMPLE call > > stmts as "always-inline" and only look at that marking, thus an > > indirect call will never become "always-inline"? Iff cgraph edges > > prevail during all early inlining we could mark call edges for > > this purpose? > > I also think we need call site specific info. > Tagging gimple call statements and copying the info to gimple edges will > probably be needed here. We want to keep the info from early inlining > to late inlining since we output errors late. > We already have plenty of GF_CALL_ flags, so adding one should be easy? We have 3 bits left :/ I was hoping that cgraph_edge lives long enough? But I suppose we're not keeping them across the early opts pipeline.
> > We already have plenty of GF_CALL_ flags, so adding one should be easy? > > We have 3 bits left :/ I was hoping that cgraph_edge lives long > enough? But I suppose we're not keeping them across the early opts > pipeline. Hmm, so we have too many flags. Indeed problem is that we don't want to keep callgraph edges across all modifications gimple optimization passes does. Eventualy such annotations can probably go to hash_map just like we do for EH regions etc. Honza
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>: https://gcc.gnu.org/g:d88fd2e1d0720e6f892da9ff48e9a301a7ad0ad4 commit r14-2172-gd88fd2e1d0720e6f892da9ff48e9a301a7ad0ad4 Author: Jan Hubicka <jh@suse.cz> Date: Wed Jun 28 23:05:28 2023 +0200 Enable early inlining into always_inline functions Early inliner currently skips always_inline functions and moreover we ignore calls from always_inline in ipa_reverse_postorder. This leads to disabling most of propagation done using early optimization that is quite bad when early inline functions are not leaf functions, which is now quite common in libstdc++. This patch instead of fully disabling the inline checks calls in callee. I am quite conservative about what can be inlined as this patch is bit touchy anyway. To avoid problems with always_inline being optimized after early inline I extended inline_always_inline_functions to lazilly compute fnsummary when needed. gcc/ChangeLog: PR middle-end/110334 * ipa-fnsummary.h (ipa_fn_summary): Add safe_to_inline_to_always_inline. * ipa-inline.cc (can_early_inline_edge_p): ICE if SSA is not built; do cycle checking for always_inline functions. (inline_always_inline_functions): Be recrusive; watch for cycles; do not updat overall summary. (early_inliner): Do not give up on always_inlines. * ipa-utils.cc (ipa_reverse_postorder): Do not skip always inlines. gcc/testsuite/ChangeLog: PR middle-end/110334 * g++.dg/opt/pr66119.C: Disable early inlining. * gcc.c-torture/compile/pr110334.c: New test. * gcc.dg/tree-ssa/pr110334.c: New test.
For skia, wouldn't the best fix be stop compiling the TU(s) with -mavx but instead #pragma GCC push_options #pragma GCC target("avx") after including headers and #pragma GCC pop_options at the end (plus clang equivalents)? Then whatever is defined in that file would be -mavx and the comdats if inlined would do too, but if they aren't inlined would still not have it?
It seems that the C++ FE change in comment#13 causes libreoffice to fail to build with [ 553s] /home/abuild/rpmbuild/BUILD/libreoffice-7.5.4.2/workdir/UnpackedTarball/skia/include/private/SkVx.h: In function 'interpret_skvm': [ 553s] /home/abuild/rpmbuild/BUILD/libreoffice-7.5.4.2/workdir/UnpackedTarball/skia/include/private/SkVx.h:150: error: inlining failed in call to 'always_inline' '__ct_comp ': target specific option mismatch [ 553s] 150 | using VecStorage<N,T>::VecStorage; [ 553s] | I have yet to reproduce and extract a testcase though.
(In reply to Richard Biener from comment #19) > It seems that the C++ FE change in comment#13 causes libreoffice to fail to > build with > > [ 553s] > /home/abuild/rpmbuild/BUILD/libreoffice-7.5.4.2/workdir/UnpackedTarball/skia/ > include/private/SkVx.h: In function 'interpret_skvm': > [ 553s] > /home/abuild/rpmbuild/BUILD/libreoffice-7.5.4.2/workdir/UnpackedTarball/skia/ > include/private/SkVx.h:150: error: inlining failed in call to > 'always_inline' '__ct_comp ': target specific option mismatch > [ 553s] 150 | using VecStorage<N,T>::VecStorage; > [ 553s] | > > I have yet to reproduce and extract a testcase though. libreoffice has a skia import and links its Library/libskialo.so library with LTO, combining the AVX, HSW, SSE41, SSE42, SSSE3, CRC32, SKX, ... SkOpts_*.o objects which likely follow a very similar scheme as the firefox copy. SkOpts_hsw.cpp looks like #include "src/core/SkOpts.h" #define SK_OPTS_NS hsw #include "src/core/SkCubicSolver.h" #include "src/opts/SkBitmapProcState_opts.h" #include "src/opts/SkBlitRow_opts.h" #include "src/opts/SkRasterPipeline_opts.h" #include "src/opts/SkSwizzler_opts.h" #include "src/opts/SkUtils_opts.h" #include "src/opts/SkVM_opts.h" namespace SkOpts { void Init_hsw() { blit_row_color32 = hsw::blit_row_color32; blit_row_s32a_opaque = hsw::blit_row_s32a_opaque; S32_alpha_D32_filter_DX = hsw::S32_alpha_D32_filter_DX; cubic_solver = SK_OPTS_NS::cubic_solver; RGBA_to_BGRA = SK_OPTS_NS::RGBA_to_BGRA; RGBA_to_rgbA = SK_OPTS_NS::RGBA_to_rgbA; RGBA_to_bgrA = SK_OPTS_NS::RGBA_to_bgrA; gray_to_RGB1 = SK_OPTS_NS::gray_to_RGB1; grayA_to_RGBA = SK_OPTS_NS::grayA_to_RGBA; grayA_to_rgbA = SK_OPTS_NS::grayA_to_rgbA; inverted_CMYK_to_RGB1 = SK_OPTS_NS::inverted_CMYK_to_RGB1; inverted_CMYK_to_BGR1 = SK_OPTS_NS::inverted_CMYK_to_BGR1; #define M(st) stages_highp[SkRasterPipeline::st] = (StageFn)SK_OPTS_NS::st; SK_RASTER_PIPELINE_STAGES(M) just_return_highp = (StageFn)SK_OPTS_NS::just_return; start_pipeline_highp = SK_OPTS_NS::start_pipeline; #undef M #define M(st) stages_lowp[SkRasterPipeline::st] = (StageFn)SK_OPTS_NS::lowp::st; SK_RASTER_PIPELINE_STAGES(M) just_return_lowp = (StageFn)SK_OPTS_NS::lowp::just_return; start_pipeline_lowp = SK_OPTS_NS::lowp::start_pipeline; #undef M interpret_skvm = SK_OPTS_NS::interpret_skvm; } } // namespace SkOpts the question in the end is why we fail to process the always_inlines early here and only discover them late. Unfortunately the inline diagnostic doesn't print the original TU name (that would be useful for all late LTO diagnostics). I'm not sure whether this is an issue with the C++ frontend change or with the ODR issues present in Skia. Eventually since we're dealing with aliases(?), always_inline isn't appropriately detected or still missing on some callees. Unfortunately there isn't a knob to diagnose late inlined always-inline functions.
Created attachment 55512 [details] another testcase This one needs -mavx2 -mf16c -mfma -fPIC -O2 -std=c++17
I will cook up the patch to keep multiple variants of nodes pre-inline and we will see how much that affects compile time & how hard it will be to get unit size esitmates right.
But it would be nice to see why the functions are not early inlined.
On Tue, 11 Jul 2023, hubicka at ucw dot cz wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 > > --- Comment #23 from Jan Hubicka <hubicka at ucw dot cz> --- > But it would be nice to see why the functions are not early inlined. Note this was on the 13 branch with the C++ fix backported, so before your changes to allow to inline into always-inline. But yes, it would be nice to confirm the issue is gone on trunk and to see where always_inline is still missing (if it is).
GCC 13.2 is being released, retargeting bugs to GCC 13.3.
(In reply to Richard Biener from comment #20) > Unfortunately there isn't a knob to diagnose late inlined always-inline > functions. Is there a separate bug for this?
(In reply to Eric Gallager from comment #26) > (In reply to Richard Biener from comment #20) > > Unfortunately there isn't a knob to diagnose late inlined always-inline > > functions. > > Is there a separate bug for this? I don't think so.
GCC 13.3 is being released, retargeting bugs to GCC 13.4.
GCC 13.4 is being released, retargeting bugs to GCC 13.5.