We routinely mix, in the same library, code compiled with -O2 and code compiled with -Os.
LTO seems to ignore the options used in the "compilation" steps.
It does honor "pragma GCC optimize", though.

Example:

cat optopt.cc
// #pragma GCC optimize ("0")
void bar(int);
inline void foo(int i, int j) {
  if (i>0) bar(i);
  if (j>0) bar(j);
  if (i>0) bar(j);
  if (j>0) bar(i);
};

void foo1(int i, int j) {
  foo(i,j);
}

void foo2(int i, int j) {
  foo(i,j);
}

void foo3(int i, int j) {
  foo(i,j);
}

Compiling with -Os and linking with -O2, foo is inlined everywhere:

c++ -flto -fno-fat-lto-objects -Os -c optopt.cc -fPIC
c++ -flto -O2 -shared optopt.o -fPIC -o optopt.so; nm -C optopt.so
….
                 U bar(int)
00000000000007a0 T foo1(int, int)
0000000000000740 T foo2(int, int)
00000000000006e0 T foo3(int, int)
….

Linking with -Os instead, foo is kept out of line:

c++ -flto -Os -shared optopt.o -fPIC -o optopt.so; nm -C optopt.so
….
                 U bar(int)
00000000000006e0 t foo(int, int) [clone .local.0.2370]
000000000000071c T foo1(int, int)
000000000000071a T foo2(int, int)
0000000000000718 T foo3(int, int)
…

If I uncomment the pragma I get what I intended:

c++ -flto -fno-fat-lto-objects -O2 -c optopt.cc -fPIC
c++ -flto -O2 -shared optopt.o -fPIC -o optopt.so; nm -C optopt.so
                 U bar(int)
00000000000006e0 t foo(int, int) [clone .local.0.2370]
0000000000000760 T foo1(int, int)
0000000000000750 T foo2(int, int)
0000000000000740 T foo3(int, int)

Due to PR53776 I cannot specify Os using a pragma, so I am a bit stuck
;)

There are a vast number of things the developer might expect when flags used at compile time differ from flags used at link-optimization time. At the moment the semantics are (roughly) that you can think of the link-optimization time flags as being appended to the flags used at compile time. Thus -O2 overrides -Os here. The implementation does not explicitly retain (all) of the compile-time flags (they might differ between different CUs, after all) but only a very selected subset, and it expects that meaningful link-optimization time options are present.

You seem to want code compiled with -Os to be implicitly wrapped in an optimize/target pragma with the options specified at compile time. Note that this would inhibit inlining -Os code into -O2 code or vice versa. Note also that in your example the -O2 given at link time would be ignored? Note also that the optimize attribute/pragma has serious implementation issues, and thus this kind of general use would not likely be a good idea.

Suggestions (with formal specification ;)) of how to produce from Options(CU1), Options(CU2) ..., Options(Link-Time) a set of options effective at link time are welcome.

The current logic is implemented inside lto-wrapper.c (the LTO driver) and lto-opts.c (which pre-filters which options from compile time are presented to the LTO driver).
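The "appended" semantics described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the function name effective_opt_level is hypothetical, the model treats every option as retained (whereas the real driver keeps only a selected subset of compile-time flags), and it is not the logic found in lto-wrapper.c.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: this is NOT the logic in lto-wrapper.c.
// Assumption: the effective command line is the compile-time options
// followed by the link-time options, and the last -O<level> seen wins.
inline std::string effective_opt_level(const std::vector<std::string>& compile_opts,
                                       const std::vector<std::string>& link_opts) {
    std::vector<std::string> all(compile_opts);
    all.insert(all.end(), link_opts.begin(), link_opts.end());

    std::string level = "-O0";  // GCC's default when no -O option is given
    for (const std::string& opt : all) {
        // Matches "-O", "-O2", "-Os", "-Ofast", ...
        if (opt.size() >= 2 && opt[0] == '-' && opt[1] == 'O') level = opt;
    }
    return level;
}
```

With the options from the report, effective_opt_level({"-flto", "-Os", "-fPIC"}, {"-flto", "-O2", "-shared"}) yields "-O2" in this model, which matches the observation that foo was inlined as if the whole unit had been built with -O2.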
I fully agree that combining code sections to be optimized differently is not well defined, in particular when optimization works by looking at a broader scope.
In principle one can imagine that an optimization-flag change should be considered an optimization barrier, but that clearly defeats the very purpose of optimization itself and may lead to code that is less performant than if compiled with the lowest of all options...

Most probably we should go back and understand WHY users choose different optimization levels for different code sections.

In some cases it is to work around a compiler or coding problem.

In our specific case the code to be compiled with -Os is "machine generated" and contains mostly streamers that are not very sensitive to aggressive optimization (vectorization helps in some, at best): we judged that smaller code (much smaller, actually!) and faster compilation were more effective.
It is packaged in the same library as the classes it has to stream, for convenience and dependency management.
In reality I noticed that with LTO the compilation time is fully dominated by those files, so it is surely not "fast-development" friendly. I suspect that the final solution will be to segregate them into their own library.

I also experimented with optimizing some computationally intensive code segments more aggressively.
In principle it could make sense; in practice I understand that as soon as inter-procedural optimization kicks in, having some code fragments with Ofast and others with O2 can make little sense even from a purely numerical point of view: think of an expression found in both sections that can be factored out.

I suspect that the only safe way is to segregate the code requested to be compiled with different options, for sure if the option is "lower". Most probably this is not what other users expect.
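To make the numerical concern concrete, here is a minimal sketch of why the same expression can give different results depending on whether reassociation is allowed (as it is under -Ofast, which implies -ffast-math). The two helper functions are hypothetical and hand-written: sum_two_lanes merely imitates what a 2-lane vectorized reduction might do and is not actual compiler output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict left-to-right summation, the order required without -ffast-math.
inline float sum_in_order(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Hand-written stand-in for a 2-lane vectorized reduction: even and odd
// elements are accumulated separately and combined at the end.
inline float sum_two_lanes(const std::vector<float>& v) {
    float lane[2] = {0.0f, 0.0f};
    for (std::size_t i = 0; i < v.size(); ++i) lane[i % 2] += v[i];
    return lane[0] + lane[1];
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f}, and assuming float arithmetic is evaluated in IEEE single precision without fast-math, sum_in_order gives 1 (the first 1.0f is absorbed by 1e8f) while sum_two_lanes gives 2. If such an expression appears in both an Ofast section and an O2 section and is then merged by inter-procedural optimization, it is not obvious which of the two results the user should get.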
(In reply to comment #2)
> I fully agree that combining code sections to be optimized differently is not
> well defined in particular when optimization works looking at a broader scope.
> In principle one can imagine that an optimization-flag change should be
> considered as an optimization barrier, but that clearly defeats the very
> purpose of optimization itself and may lead to code that is less performant
> than if compiled with the lower of all options...
>
> Most probably we should go back and understand WHY users choose different
> optimization level for different code sections.
>
> In some cases is to workaround a compiler or coding problem

Right. In that case I'd say the specific object had better not participate in the LTO link (thus do not use -flto at .c -> .o compile time for that object).

> In our specific case the code to be compiled with -Os is "machine generated"
> and contains mostly streamers not very sensitive to aggressive optimization
> (vectorization helps in some, at best) : we judged that a smaller code (much
> smaller actually!) and faster compilation was more effective.
> It is packaged in the same library with the class they have to stream for
> convenience and dependency management.
> In reality I noticed that with lto the compilation time is fully dominated by
> those files, so it is surely not "fast-development" friendly. I suspect that
> the final solution will be to segregate them into their own library.
>
> I also experimented with trying to optimize more aggressively some
> computational-intensive code segments.
> In principle it could make sense, in practice I understand that as soon as
> inter-procedural optimization kicks-in having code fragments with Ofast and
> other with O2 can make little sense even from a pure numerical point of view:
> think of an expression found in both sections that can be factorized out.

A very convenient way to "optimize more aggressively some computational-intensive code segments" is to use profile feedback ;)

> I suspect that the only safe way is to segregate the code requested to be
> compiled with different options, for sure if the option is "lower". Most
> probably this is not what other users expects.

So eventually you want to simply not compile the -Os code with -flto, so that it won't participate in link-time optimization (by using the linker plugin you can still get most of the effect of whole-program assumptions). Or, if you have several -Os sources, do a partial -flto link with -Os and link the resulting object into the link-time optimization with -O2.

As for what the user expects, that is not really clear ;) Several ideas have come up:

A quite extreme way would be to simply assert that the flags on all compiles and the link are the same, and otherwise give up (or warn).

Another idea was that if no compile options are specified at link time (which might be quite common, though at least people added -flto somehow), the options from compile time are used (but which, if they differed?).

Another idea was to tag the cgraph with the compile options used and adjust the partitioning done for the LTRANS stage so that all functions inside one LTRANS unit have consistent options (which might differ between LTRANS units).

For all of the different-flags-at-compile-time issues, the question remains what the flags used at link time mean. Do they override or amend flags? (We specifically thought of libraries shipped with LTO bytecode compiled at -O0 -g, to be used for both debugging and optimized compiles by means of optimizing only at link time.)
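Two of the ideas above (asserting identical flags, and tagging functions with their compile options so that each LTRANS unit has a consistent option set) can be sketched as follows. Everything here is hypothetical: FunctionNode is a stand-in and not a GCC cgraph structure, options are reduced to a single string, and real LTRANS partitioning also balances unit size and call-graph locality, which this sketch ignores.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a call-graph node; not a GCC data structure.
struct FunctionNode {
    std::string name;
    std::string options;  // options of the CU that defined the function
};

// "Extreme" variant: accept the link only if every function was compiled
// with the same options; a caller could warn or give up otherwise.
inline bool all_same_options(const std::vector<FunctionNode>& nodes) {
    for (const FunctionNode& n : nodes)
        if (n.options != nodes.front().options) return false;
    return true;
}

// Partitioning variant: group functions so that each partition holds
// exactly one option set, keyed by that option string.
inline std::map<std::string, std::vector<std::string>>
partition_by_options(const std::vector<FunctionNode>& nodes) {
    std::map<std::string, std::vector<std::string>> partitions;
    for (const FunctionNode& n : nodes) partitions[n.options].push_back(n.name);
    return partitions;
}
```

Given foo1 and foo2 tagged "-O2" and a streamer function tagged "-Os", partition_by_options returns two partitions, and all_same_options returns false. Note that, as pointed out earlier in this report, keeping option sets apart in this way would also inhibit inlining across the -Os/-O2 boundary.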
On 28 Jun, 2012, at 11:44 AM, rguenth at gcc dot gnu.org wrote:

>
> A very convenient way of "optimize more aggressively some
> computational-intensive code segments" is to use profile-feedback ;)
>

Indeed, and I'm very pleased that gcc (since 4.6) is able to PGO our million-line code!
The use of Ofast is to relax IEEE754 conformance only in specific routines, mainly to allow "more" vectorization.

>> I suspect that the only safe way is to segregate the code requested to be
>> compiled with different options, for sure if the option is "lower". Most
>> probably this is not what other users expects.
>
> So eventually you want to simply not compile the -Os code with -flto so
> that it won't participate in link-time optimization (with using the linker
> plugin you can still get most of the effect of whole-program assumptions).

This is what I'm doing now, as you may have guessed from PR53780...

> Or if you have several -Os sources, do a partial -flto link with -Os and
> link that object in the link-time optimization with -O2.

I have no experience with -Wl,-r (or -Ur?); I will try.

>
> As what the user expects that is not really clear ;) A quite extreme
> way would be to simply assert that flags on all compiles and the link
> are the same and otherwise give up (or warn). Another idea was that
> if at link-time no compile options are specified (might be quite common,
> but at least people added -flto somehow) then use the options from
> compile-time (but which, if they differed?).

Historically we have always propagated the compiler flags to the link step.
For sure, with LTO the meaning of flags at compile time versus link time is quite confusing.

> Another idea was to tag
> the cgraph with the compile options used and adjust the partitioning
> done for the LTRANS stage to be able to have consistent options (that
> might differ between LTRANS units) for all functions inside one LTRANS unit.

This is close to what I would expect to happen.

> For all of the different-flags-at-compile-time issues the issue remains
> what the flags used at link-time mean? Do they override or amend flags?

I would suggest issuing a warning, or at least clarifying your choices.

> (We specifically thought of libraries shipped with LTO bytecode
> compiled at -O0 -g to be used for both debugging and optimized compile
> by means of optimizing only at link-time)
>

Clever, a bit confusing though.
I've moved to -fno-fat-lto-objects also to make sure that the plugin is used.
I think the problem listed here is fully fixed on the trunk (there have been many improvements over time to get this fixed, even as recently as r12-5920 [PR 103515]).