Bug 55102 - The options -flto and -On do not behave as described in GCC docs
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: lto
Version: 4.8.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: documentation, lto, missed-optimization
Duplicates: 56700
Depends on:
Blocks:
 
Reported: 2012-10-27 19:27 UTC by Dmitry Gorbachev
Modified: 2013-03-27 13:09 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.6.4, 4.7.3, 4.8.0
Last reconfirmed: 2012-10-29 00:00:00


Description Dmitry Gorbachev 2012-10-27 19:27:38 UTC
> Additionally, the optimization flags used to compile
> individual files are not necessarily related to those
> used at link time.  For instance,
>
>         gcc -c -O0 -flto foo.c
>         gcc -c -O0 -flto bar.c
>         gcc -o myprog -flto -O3 foo.o bar.o
>
> This produces individual object files with unoptimized
> assembler code, but the resulting binary myprog is
> optimized at -O3.  If, instead, the final binary is
> generated without -flto, then myprog is not optimized.

In fact, using -O3 only when linking the .o files is too late: the resulting binary will not be fully optimized. The .c files must be compiled with at least -O1. Thus, there is a bug either in GCC itself or in the documentation.

====== 8< ======
int foo(void)
{
  return 0;
}

int main(void)
{
  return foo();
}
====== >8 ======

$ gcc -flto -O0 -c foo.c -o foo-O0.o
$ gcc -flto -O1 -c foo.c -o foo-O1.o
$ gcc -flto -O0 foo-O0.o -o prog-O0-O0
$ gcc -flto -O3 foo-O0.o -o prog-O0-O3
$ gcc -flto -O0 foo-O1.o -o prog-O1-O0
$ gcc -flto -O3 foo-O1.o -o prog-O1-O3
$ nm -A prog-O0-O0 prog-O0-O3 prog-O1-O0 prog-O1-O3 | grep foo
prog-O0-O0:080483f0 t foo.2337
prog-O0-O3:080483f0 t foo.2337
prog-O1-O0:080483f0 t foo.2337

GCC 4.6 gives a slightly different result:

$ nm -A prog-O0-O0 prog-O0-O3 prog-O1-O0 prog-O1-O3 | grep foo
prog-O0-O0:08048381 t foo.1988
prog-O0-O3:08048380 t foo.1988

(GCC 4.5 crashes.)
Comment 1 Richard Biener 2012-10-29 14:38:24 UTC
Yes, the fact is that -O0 disables local optimizations before LTO streaming.
It also disables the local analysis phase of the IPA passes, which means that
even if you enable -O3 at the WPA / LTRANS stages you will _not_ get IPA
transforms enabled.

Eventually we'd want to enable all IPA analysis phases at all -O levels...
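
A minimal sketch of the rule described in this comment, written as a toy model rather than anything taken from GCC. The function name ipa_transforms_expected and its integer parameters are hypothetical; it only encodes the nm results reported above for GCC 4.7/4.8, where foo is inlined away only in prog-O1-O3.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not GCC code): IPA transforms such as inlining foo() into
   main() are expected only if the object was compiled at -O1 or higher
   (so local analysis ran before LTO streaming) AND the link step also
   optimizes. */
bool ipa_transforms_expected(int compile_level, int link_level)
{
    assert(compile_level >= 0 && link_level >= 0);
    return compile_level >= 1 && link_level >= 1;
}
```

Under this model, the documented example (compile at -O0, link at -O3) yields false, which is the mismatch with the documentation that this bug reports.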
Comment 2 Eric Botcazou 2013-03-26 20:51:48 UTC
*** Bug 56700 has been marked as a duplicate of this bug. ***
Comment 3 Jan Hubicka 2013-03-27 12:47:04 UTC
Doing IPA analysis at -O0 for LTO streaming won't really solve the fact that functions are not early optimized.  I would vote for at least issuing a warning when LTOing -O0 objects into an -On, n>=1 LTO binary, or simply declaring -O0 to be non-LTO only.

But indeed, we probably should do analysis/summary streaming for all IPA passes so -fno-ipa-cp and such work as expected all the time.  I have a patch for that somewhere already.

We probably should take the route of streaming the options used and honoring them rather than taking whatever is passed to the linker...
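
The warning suggested in this comment could look roughly like the following sketch. It is an illustration only: struct lto_object, warn_unoptimized_inputs, and the message text are hypothetical and do not correspond to existing GCC code or diagnostics. It assumes the per-object optimization level is already known, for example from the streamed options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of the level an LTO object was compiled with. */
struct lto_object {
    const char *name;
    int opt_level;          /* 0 for -O0, 1 for -O1, ... */
};

/* Print one warning per -O0 object that is linked into an -On (n >= 1)
   LTO binary.  Returns the number of warnings issued. */
size_t warn_unoptimized_inputs(const struct lto_object *objs, size_t n,
                               int link_opt_level, FILE *out)
{
    size_t warnings = 0;

    assert(objs != NULL || n == 0);
    if (link_opt_level < 1)
        return 0;           /* an -O0 link promises no optimization */

    for (size_t i = 0; i < n; i++) {
        if (objs[i].opt_level == 0) {
            fprintf(out,
                    "warning: %s was compiled at -O0 but is linked with "
                    "-flto -O%d; it will not be fully optimized\n",
                    objs[i].name, link_opt_level);
            warnings++;
        }
    }
    return warnings;
}
```

Called with the objects foo-O0.o (level 0) and foo-O1.o (level 1) and a link level of 3, the sketch warns once, for foo-O0.o.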
Comment 4 rguenther@suse.de 2013-03-27 13:09:18 UTC
On Wed, 27 Mar 2013, hubicka at gcc dot gnu.org wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55102
> 
> --- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> 2013-03-27 12:47:04 UTC ---
> Doing IPA analysis at -O0 for LTO streaming won't really solve the fact that
> functions are not early optimized.  I would vote for at least issuing a warning
> when LTOing -O0 objects into -On, n>=1 LTO binary or simply declaring -O0 to be
> non-LTO only.
> 
> But indeed, we probably should make analysis/summary streaming of all IPA
> passes so -fno-ipa-cp and such works as expected all the time.  I have patch
> for that somewhere already.
> 
> We probably should lean the route of streaming the options used and honoring
> them rather than taking whatever is passed to linker...

Well, we _do_ stream them.  The issue is that we need to formally
define how to merge N sets of options from the N input files
at WPA stage to M sets of options for the M LTRANS units
(with eventually, but not necessarily, M == 1).

Oh, and implement it, of course.

At the moment the LTO driver (lto-wrapper.c) has a brief look at
options because it creates options for the WPA stage (which
shouldn't really care about the options passed ... in which
case it could do the option processing from the TUs and eventually
simply partition them into sets of TUs that have the same options).

So - Honza, what about first making WPA "ignore" all flags?
(all optimization and target flags)  IPA pass processing should
just unconditionally run and handle inputs which have the IPA
sections present.
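
The idea of partitioning TUs into sets that share the same options can be sketched as follows. This is a simplified illustration, not lto-wrapper.c code: struct tu, partition_by_options, and MAX_TUS are hypothetical, and options are compared as plain strings, whereas a real implementation would presumably need to normalize and compare individual flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TUS 64          /* arbitrary limit for this sketch */

/* Hypothetical view of a translation unit and its streamed options. */
struct tu {
    const char *name;
    const char *options;    /* e.g. "-O0" or "-O3" */
};

/* Assign each TU to a partition so that all TUs in a partition have
   identical option strings.  group[i] receives the partition index of
   tus[i]; the return value is the number of partitions (the M sets). */
size_t partition_by_options(const struct tu *tus, size_t n, size_t group[])
{
    const char *seen[MAX_TUS];
    size_t ngroups = 0;

    assert(n <= MAX_TUS);
    for (size_t i = 0; i < n; i++) {
        size_t g = 0;
        /* Linear search for an existing partition with the same options. */
        while (g < ngroups && strcmp(seen[g], tus[i].options) != 0)
            g++;
        if (g == ngroups)
            seen[ngroups++] = tus[i].options;   /* open a new partition */
        group[i] = g;
    }
    return ngroups;
}
```

With three TUs compiled at -O0, -O3, and -O0, the sketch produces two partitions, with the first and third TU sharing one. How options that differ should be merged within a single LTRANS unit remains the open question raised above.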