Bug 53777 - [lto] lto does not propagate optimization flags from command lines given at "compilation time"
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: lto
Version: 4.7.1
Importance: P3 normal
Target Milestone: 12.0
Assignee: Not yet assigned to anyone
URL:
Keywords: lto
Depends on:
Blocks:
 
Reported: 2012-06-26 08:33 UTC by vincenzo Innocente
Modified: 2021-12-24 04:12 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-06-26 00:00:00


Description vincenzo Innocente 2012-06-26 08:33:59 UTC
We routinely mix code compiled with -O2 and code compiled with -Os in the same library.
LTO seems to ignore the options used in the "compilation" steps.
It does honor "#pragma GCC optimize", though.

example

cat optopt.cc
// #pragma GCC optimize ("0")

void bar(int);

inline void foo(int i, int j) {
  if (i>0) bar(i);
  if (j>0) bar(j);
  if (i>0) bar(j);
  if (j>0) bar(i);
}


void foo1(int i, int j) {
  foo(i,j);
}
void foo2(int i, int j) {
  foo(i,j);
}
void foo3(int i, int j) {
  foo(i,j);
}

c++ -flto -fno-fat-lto-objects -Os -c optopt.cc -fPIC
c++ -flto -O2 -shared optopt.o -fPIC -o optopt.so; nm -C optopt.so
….
                 U bar(int)
00000000000007a0 T foo1(int, int)
0000000000000740 T foo2(int, int)
00000000000006e0 T foo3(int, int)
….
c++ -flto -Os -shared optopt.o -fPIC -o optopt.so; nm -C optopt.so
….
                 U bar(int)
00000000000006e0 t foo(int, int) [clone .local.0.2370]
000000000000071c T foo1(int, int)
000000000000071a T foo2(int, int)
0000000000000718 T foo3(int, int)
…


If I uncomment the pragma I get what I intended:
c++ -flto -fno-fat-lto-objects -O2 -c optopt.cc -fPIC
c++ -flto -O2 -shared optopt.o -fPIC -o optopt.so; nm -C optopt.so
                 U bar(int)
00000000000006e0 t foo(int, int) [clone .local.0.2370]
0000000000000760 T foo1(int, int)
0000000000000750 T foo2(int, int)
0000000000000740 T foo3(int, int)

Due to PR53776 I cannot specify -Os using a pragma, so I am a bit stuck.
Comment 1 Richard Biener 2012-06-26 09:10:25 UTC
;)

There are a vast number of things the developer might expect when flags
used at compile time differ from flags used at link-optimization time.

At the moment the semantics are (roughly) that you can think of the
link-optimization time flags to be appended to the flags used at compile-time.
Thus -O2 overrides -Os here.
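
The "appended" semantics described above can be modelled with a small sketch. This is only an illustration of the rule "the last -O flag wins"; it is not the logic in lto-wrapper.c, and the helper names `last_opt_level` and `effective_opt_level` are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of GCC): return the last -O<level> flag found
// in `flags`, or an empty string when no such flag is present.
std::string last_opt_level(const std::vector<std::string>& flags) {
    std::string result;
    for (const std::string& f : flags) {
        if (f.size() >= 2 && f[0] == '-' && f[1] == 'O') {
            result = f;  // a later -O flag replaces an earlier one
        }
    }
    return result;
}

// Model of "link-time flags are appended to compile-time flags": concatenate
// both lists and let the last optimization level win.
std::string effective_opt_level(const std::vector<std::string>& compile_flags,
                                const std::vector<std::string>& link_flags) {
    std::vector<std::string> combined = compile_flags;
    combined.insert(combined.end(), link_flags.begin(), link_flags.end());
    return last_opt_level(combined);
}
```

Under this model, compiling with -Os and linking with -O2 yields -O2 for the whole link, which matches the first nm output in the description; a link line without any -O flag would keep the compile-time level.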

The implementation does not explicitly retain all of the compile-time
flags (they might differ between different CUs, after all) but only
a very select subset, and it expects meaningful link-optimization time
options to be present.

You seem to want code compiled with -Os to be implicitly wrapped
in an optimize/target pragma with the options specified at compile-time.
Note that this would inhibit inlining -Os code into -O2 code or vice-versa.
Note also that in your example the -O2 given at link-time would be
ignored?

Note also that the optimize attribute/pragma has serious implementation
issues and thus this kind of general use would not likely be a good idea.

Suggestions (with formal specification ;)) of how to produce from
Options(CU1), Options(CU2) ..., Options(Link-Time) a set of options
effective at link-time is welcome.  The current logic is implemented
inside lto-wrapper.c (the LTO driver) and lto-opts.c (which pre-filters
which options from compile-time are presented to the LTO driver).
Comment 2 vincenzo Innocente 2012-06-28 05:07:57 UTC
I fully agree that combining code sections to be optimized differently is not well defined, in particular when optimization works by looking at a broader scope.
In principle one can imagine that an optimization-flag change should be considered an optimization barrier, but that clearly defeats the very purpose of optimization itself and may lead to code that is less performant than if compiled with the lowest of all options...

Most probably we should go back and understand WHY users choose different optimization levels for different code sections.

In some cases it is to work around a compiler or coding problem.

In our specific case the code to be compiled with -Os is "machine generated" and consists mostly of streamers that are not very sensitive to aggressive optimization (vectorization helps in some, at best): we judged that smaller code (much smaller, actually!) and faster compilation were more effective.
It is packaged in the same library as the classes they have to stream, for convenience and dependency management.
In reality I noticed that with LTO the compilation time is fully dominated by those files, so it is surely not "fast-development" friendly. I suspect that the final solution will be to segregate them into their own library.

I also experimented with trying to optimize more aggressively some computational-intensive code segments.
In principle it could make sense; in practice I understand that as soon as inter-procedural optimization kicks in, having some code fragments at -Ofast and others at -O2 can make little sense, even from a purely numerical point of view: think of an expression found in both sections that can be factored out.

I suspect that the only safe way is to segregate the code requested to be compiled with different options, for sure if the option is "lower". Most probably this is not what other users expect.
Comment 3 Richard Biener 2012-06-28 09:44:40 UTC
(In reply to comment #2)
> I fully agree that combining code sections to be optimized differently is not
> well defined in particular when optimization works looking at a broader scope.
> In principle one can imagine that an optimization-flag change should be
> considered as an optimization barrier, but that clearly defeats the very
> purpose of optimization itself and may lead to code that is less performant
> than if compiled with the lowest of all options...
> 
> Most probably we should go back and understand WHY users choose different
> optimization levels for different code sections.
> 
> In some cases it is to work around a compiler or coding problem.

Right.  In that case I'd say the specific object should better not participate
in the LTO link (thus do not use -flto at .c -> .o compile time for that
object).

> In our specific case the code to be compiled with -Os is "machine generated"
> and contains mostly streamers not very sensitive to aggressive optimization
> (vectorization helps in some, at best) : we judged that a smaller code (much
> smaller actually!) and faster compilation was more effective.
> It is packaged in the same library with the class they have to stream for
> convenience and dependency management.
> In reality I noticed that with lto the compilation time is fully dominated by
> those files, so it is surely not "fast-development" friendly. I suspect that
> the final solution will be to segregate them into their own library.
> 
> I also experimented with trying to optimize more aggressively some
> computational-intensive code segments.
> In principle it could make sense, in practice I understand that as soon as
> inter-procedural optimization kicks-in having code fragments with Ofast and
> other with O2 can make little sense even from a pure numerical point of view:
> think of an expression found in both sections that can be factorized out.

A very convenient way of "optimize more aggressively some computational-intensive code segments" is to use profile-feedback ;)

> I suspect that the only safe way is to segregate the code requested to be
> compiled with different options, for sure if the option is "lower". Most
> probably this is not what other users expect.

So eventually you want to simply not compile the -Os code with -flto so
that it won't participate in link-time optimization (with using the linker
plugin you can still get most of the effect of whole-program assumptions).
Or if you have several -Os sources, do a partial -flto link with -Os and
link that object in the link-time optimization with -O2.

As for what the user expects, that is not really clear ;)  A quite extreme
way would be to simply assert that flags on all compiles and the link
are the same and otherwise give up (or warn).  Another idea was that
if at link-time no compile options are specified (might be quite common,
but at least people added -flto somehow) then use the options from
compile-time (but which, if they differed?).  Another idea was to tag
the cgraph with the compile options used and adjust the partitioning
done for the LTRANS stage to be able to have consistent options (that
might differ between LTRANS units) for all functions inside one LTRANS unit.
For all of the different-flags-at-compile-time issues the issue remains
what the flags used at link-time mean?  Do they override or amend flags?
(We specifically thought of libraries shipped with LTO bytecode
compiled at -O0 -g to be used for both debugging and optimized compile
by means of optimizing only at link-time)
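
The idea of tagging functions with their compile options and partitioning accordingly can be sketched as follows. This is a toy model only: the `FunctionInfo` record and `partition_by_options` function are hypothetical names, the options are compared as plain strings, and real LTRANS partitioning has other constraints (such as balancing partition sizes) that are ignored here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record: a function name plus the options that its
// compilation unit was compiled with.
struct FunctionInfo {
    std::string name;
    std::string compile_options;
};

// Group functions so that every partition contains only functions that were
// compiled with identical options. The map key is the option string.
std::map<std::string, std::vector<std::string>>
partition_by_options(const std::vector<FunctionInfo>& functions) {
    std::map<std::string, std::vector<std::string>> partitions;
    for (const FunctionInfo& fn : functions) {
        partitions[fn.compile_options].push_back(fn.name);
    }
    return partitions;
}
```

Each resulting group could then be compiled with its own consistent options. The sketch leaves open the question raised above: whether flags given at link time should override or amend the per-partition options.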
Comment 4 vincenzo Innocente 2012-06-28 12:10:44 UTC
On 28 Jun, 2012, at 11:44 AM, rguenth at gcc dot gnu.org wrote:

> 
> A very convenient way of "optimize more aggressively some
> computational-intensive code segments" is to use profile-feedback ;)
> 
Indeed, and I'm very pleased that gcc (since 4.6) is able to PGO our million-line code base!
The use of -Ofast is to relax IEEE 754 conformance only in specific routines, mainly to allow "more" vectorization.

>> I suspect that the only safe way is to segregate the code requested to be
>> compiled with different options, for sure if the option is "lower". Most
>> probably this is not what other users expect.
> 
> So eventually you want to simply not compile the -Os code with -flto so
> that it won't participate in link-time optimization (with using the linker
> plugin you can still get most of the effect of whole-program assumptions).
This is what I'm doing now, as you may have guessed from PR53780...
> Or if you have several -Os sources, do a partial -flto link with -Os and
> link that object in the link-time optimization with -O2.
I have no experience with -Wl,-r (or -Ur?); I will try.
> 
> As for what the user expects, that is not really clear ;)  A quite extreme
> way would be to simply assert that flags on all compiles and the link
> are the same and otherwise give up (or warn).  Another idea was that
> if at link-time no compile options are specified (might be quite common,
> but at least people added -flto somehow) then use the options from
> compile-time (but which, if they differed?).  
Historically we have always passed the compiler flags to the link step as well.
With LTO, the meaning of flags at compile time versus link time is certainly quite confusing.
> Another idea was to tag
> the cgraph with the compile options used and adjust the partitioning
> done for the LTRANS stage to be able to have consistent options (that
> might differ between LTRANS units) for all functions inside one LTRANS unit.
This is close to what I would expect to happen.
> For all of the different-flags-at-compile-time issues the issue remains
> what the flags used at link-time mean?  Do they override or amend flags?
I would suggest issuing a warning, or at least clarifying your choices.
> (We specifically thought of libraries shipped with LTO bytecode
> compiled at -O0 -g to be used for both debugging and optimized compile
> by means of optimizing only at link-time)
> 
Clever, a bit confusing though.
I've moved to -fno-fat-lto-objects also to make sure that the plugin is used.
Comment 5 Andrew Pinski 2021-12-24 04:12:34 UTC
I think the problem listed here is fully fixed on trunk (there have been many improvements over time toward fixing this, even as recently as r12-5920 [PR 103515]).