Summary: | LTO runs forever in libfabric 1.15.1 linking | | |
---|---|---|---|
Product: | gcc | Reporter: | Tomasz Kłoczko <kloczko.tomasz> |
Component: | lto | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED INVALID | | |
Severity: | normal | CC: | hubicka, marxin, sjames |
Priority: | P3 | Keywords: | compile-time-hog, memory-hog |
Version: | unknown | | |
Target Milestone: | --- | | |
Host: | | Target: | |
Build: | | Known to work: | |
Known to fail: | | Last reconfirmed: | |
Description
Tomasz Kłoczko
2022-08-01 16:09:07 UTC
Last detail. I'm adding -Os to %build_cflags

> -flto-partition=none

I suspect this is not an infinite loop; rather, the amount of memory required for all of the IR and metadata is huge, which means you are now swapping.

I suspect -flto-partition=none should not (or cannot) be used here, really.
This box has 256GB of RAM and ZERO swap.

Another detail: the DSO libfabric.so.1.18.1 produced without LTO is only 1340096 bytes, so the question is why LTO needs +10GB of RAM in this case. 🤔

+100GB

Doctor it hurts! Then don't do it.

Sorry, seriously, it's caused by the flatten attribute and I can reproduce it for our openSUSE package. The following helps:

diff --git a/prov/opx/include/rdma/opx/fi_opx_compiler.h b/prov/opx/include/rdma/opx/fi_opx_compiler.h
index 80493bd..e216faa 100644
--- a/prov/opx/include/rdma/opx/fi_opx_compiler.h
+++ b/prov/opx/include/rdma/opx/fi_opx_compiler.h
@@ -41,7 +41,7 @@
 #define L2_CACHE_LINE_SIZE (64)
 #ifdef NDEBUG // No Debug, Optimizing
-#define __OPX_FORCE_INLINE_AND_FLATTEN__ static inline __attribute__ ((always_inline, flatten))
+#define __OPX_FORCE_INLINE_AND_FLATTEN__ static inline __attribute__ ((always_inline))
 #define __OPX_FORCE_INLINE__ static inline __attribute__ ((always_inline))
 #else // NDEBUG
 #define __OPX_FORCE_INLINE_AND_FLATTEN__ static inline

Hmm .. Martin, even if that can be fixed in the libfabric code, does it still mean that something is wrong with the LTO optimisation?
Sorry for asking a maybe dumb question, but this situation is not clear to me :)

(In reply to Tomasz Kłoczko from comment #7)
> Hmm .. Martin, even if that can be fixed in the libfabric code, does it still
> mean that something is wrong with the LTO optimisation?
> Sorry for asking a maybe dumb question, but this situation is not clear to me
> :)

Basically, with the flatten attribute and LTO, every function needs to be there, and gets cloned and inlined, which costs a lot of memory and time. Functions become huge. GCC's memory usage can be improved for some things, but it won't be enough.

(In reply to Andrew Pinski from comment #8)
[..]
> Basically, with the flatten attribute and LTO, every function needs to be
> there, and gets cloned and inlined, which costs a lot of memory and time.
> Functions become huge. GCC's memory usage can be improved for some things,
> but it won't be enough.

Knowing the size of the non-LTO optimised DSO, I suppose there may still be some (higher-level) design issue which makes inline operations cause such a gigantic increase in memory usage.
And/or maybe it would be good to keep some internal metric, a counter of such operations, to display at least a warning once some threshold of such operations has been reached?
Maybe I'm mumbling, but I'm trying to find at least some generic solution to get at least some linker fart that things are going in the wrong direction because of what is implemented in the code ..

(In reply to Tomasz Kłoczko from comment #9)
> (In reply to Andrew Pinski from comment #8)
> [..]
> > Basically, with the flatten attribute and LTO, every function needs to be
> > there, and gets cloned and inlined, which costs a lot of memory and time.
> > Functions become huge. GCC's memory usage can be improved for some things,
> > but it won't be enough.
>
> Knowing the size of the non-LTO optimised DSO, I suppose there may still be
> some (higher-level) design issue which makes inline operations cause such a
> gigantic increase in memory usage.
> And/or maybe it would be good to keep some internal metric, a counter of such
> operations, to display at least a warning once some threshold of such
> operations has been reached?
> Maybe I'm mumbling, but I'm trying to find at least some generic solution to
> get at least some linker fart that things are going in the wrong direction
> because of what is implemented in the code ..

The flatten attribute is designed to override all the heuristics in the compiler that exist to prevent gigantic memory usage and compile time. Basically, you told the compiler to ignore those.

(In reply to Andrew Pinski from comment #10)
> The flatten attribute is designed to override all the heuristics in the
> compiler that exist to prevent gigantic memory usage and compile time.
> Basically, you told the compiler to ignore those.

Now I'm a bit confused, because in this case it looks like using the flatten attribute caused the high memory usage.
Does it mean that (generally) flatten should not be used at the same time as inline?

(In reply to Tomasz Kłoczko from comment #11)
> (In reply to Andrew Pinski from comment #10)
> > The flatten attribute is designed to override all the heuristics in the
> > compiler that exist to prevent gigantic memory usage and compile time.
> > Basically, you told the compiler to ignore those.
>
> Now I'm a bit confused, because in this case it looks like using the flatten
> attribute caused the high memory usage.
> Does it mean that (generally) flatten should not be used at the same time
> as inline?

The flatten attribute combined with LTO causes the high memory usage. Flatten means: inline everything into that function and ignore the heuristics that might otherwise block the inlining. Basically, this means flatten should not be used combined with LTO. With LTO you can just let the heuristics do their job and back off as needed.

Thank you for the explanation.

In addition to that, -flto-partition=none is almost never a good idea either.

Note I think that we should honor flatten only during early inlining, to avoid all kinds of funny behavior when it is applied across TUs.

(In reply to Richard Biener from comment #14)
> In addition to that, -flto-partition=none is almost never a good idea either.
>
> Note I think that we should honor flatten only during early inlining, to
> avoid all kinds of funny behavior when it is applied across TUs.

The issue is that in a few cases, AFAIK, it is the only solution to some still-unresolved LTO issues :/

> The issue is that in a few cases, AFAIK, it is the only solution to some
> still-unresolved LTO issues :/
Well, in most cases it's used for symbol versioning which is implemented by assembly directives. However, we offer symver function attribute that survives LTO partitioning. One more reason can be usage of top-level asm, which can be mitigated by -fno-lto for units that use it.
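
For readers unfamiliar with the symver function attribute mentioned above, here is a minimal sketch of how it can stand in for a top-level `.symver` assembly directive. It assumes GCC 10 or later on an ELF target. All names (`demo_add_v1`, `demo_add_v2`, the version nodes `DEMO_1.0` and `DEMO_2.0`, and the `DEMO_BUILD_SHARED` / `DEMO_SYMVER` macros) are hypothetical and are not taken from libfabric or any other project.

```c
/* Hypothetical guard: DEMO_BUILD_SHARED is assumed to be defined by the build
 * system only when linking a shared library with a version script that
 * declares the nodes DEMO_1.0 and DEMO_2.0. */
#if defined(DEMO_BUILD_SHARED) && defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 10
#define DEMO_SYMVER(v) __attribute__((symver(v)))
#else
#define DEMO_SYMVER(v) /* plain symbols when versioning is not in use */
#endif

/* Older style, shown for comparison only: a top-level asm directive such as
 *   __asm__(".symver demo_add_v1, demo_add@DEMO_1.0");
 * is opaque to the compiler, which is why projects using it often fall back
 * to -flto-partition=none. */

/* Compatibility implementation, exported as demo_add@DEMO_1.0. */
DEMO_SYMVER("demo_add@DEMO_1.0")
int demo_add_v1(int a, int b)
{
    return a + b;
}

/* Current implementation, exported as the default demo_add@@DEMO_2.0. */
DEMO_SYMVER("demo_add@@DEMO_2.0")
int demo_add_v2(int a, int b)
{
    /* Hypothetical behaviour change: negative results are clamped to zero. */
    int sum = a + b;
    return sum < 0 ? 0 : sum;
}
```

Because the version is attached to the function itself, the compiler can keep the association when it splits the program into LTO partitions, which is the property described in the comment above.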
(In reply to Martin Liška from comment #16)
[..]
> Well, in most cases it's used for symbol versioning which is implemented by
> assembly directives. However, we offer symver function attribute that
> survives LTO partitioning. One more reason can be usage of top-level asm,
> which can be mitigated by -fno-lto for units that use it.

Yes, I know; however, many projects are still not using that macro.
BTW I just realised that, as long as the low-level symbol versioning is handled, it turns out that this bug seems to be only around the code which handles versioned symbols taken from the sym file.
It should not be so hard to fix that. Am I right?
This bug has been in the queue for at least two years. What is the difficulty with fixing it?

> It should not be so hard to fix that. Am I right?
Do you mean the usage of symver attribute? No, it's quite a straightforward transformation many projects have already done.
(In reply to Martin Liška from comment #18)
> > It should not be so hard to fix that. Am I right?
>
> Do you mean the usage of symver attribute? No, it's quite a straightforward
> transformation many projects have already done.

No, no .. I mean, IIRC, there are a few cases where a versioned sym file is handled incorrectly if -flto-partition=none is not used.

> No, no .. I mean, IIRC, there are a few cases where a versioned sym file is
> handled incorrectly if -flto-partition=none is not used.
I'm not aware of any..
FYI I've opened libfabric ticket https://github.com/ofiwg/libfabric/issues/7916
Thank you one more time for all your explanations :)

(In reply to Martin Liška from comment #6)
> Doctor it hurts! Then don't do it. Sorry, seriously, it's caused by the
> flatten attribute and I can reproduce it for our openSUSE package.

If I may ask yet another question 😋
Martin, can you tell how you managed to diagnose that this was exactly the cause in this case? 🤔
Thank you in advance.

> If I may ask yet another question 😋

Sure, don't hesitate ;)

> Martin, can you tell how you managed to diagnose that this was exactly the
> cause in this case?

I noticed we spent time in the inliner (perf top) and then I suspected a flatten attribute. So 'git grep flatten' proved that.

Thank you :)
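
To make the flatten behaviour discussed in this thread concrete, here is a small sketch. The function names (`demo_leaf`, `demo_middle`, `demo_top_flattened`, `demo_top_plain`) are hypothetical and unrelated to the libfabric sources. It only illustrates the semantics of the attribute; it is far too small to reproduce the memory blow-up.

```c
/* Two small helpers that form a call chain. */
static int demo_leaf(int x)
{
    return x * 2 + 1;
}

static int demo_middle(int x)
{
    return demo_leaf(x) + demo_leaf(x + 1);
}

/* With flatten, the compiler is asked to inline every call made inside this
 * function, recursively, where possible, regardless of its usual size and
 * growth heuristics. Without LTO only callees visible in this translation
 * unit qualify; with LTO the candidate set becomes the whole program. */
__attribute__((flatten))
int demo_top_flattened(int x)
{
    return demo_middle(x) + demo_middle(x + 2);
}

/* Without the attribute, the inliner's heuristics decide what to inline and
 * can back off when the function would grow too large. */
int demo_top_plain(int x)
{
    return demo_middle(x) + demo_middle(x + 2);
}
```

Both functions compute the same result; the attribute changes only how much code ends up inside the flattened function. When it is applied to many large entry points, as through a project-wide macro, each one may receive its own inlined copy of everything it reaches, which is consistent with the time spent in the inliner reported above.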