Bug 61043 - LTO accumulates CPU requirements from all input objects
Summary: LTO accumulates CPU requirements from all input objects
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: lto
Version: 4.8.2
Importance: P3 normal
Target Milestone: 4.9.0
Assignee: Not yet assigned to anyone
URL:
Keywords: lto
Depends on:
Blocks:
 
Reported: 2014-05-03 11:42 UTC by andysem
Modified: 2016-10-04 17:44 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
A test case to reproduce the problem (975 bytes, application/gzip)
2014-05-03 11:42 UTC, andysem
A new testcase which produces invalid code with gcc 5.4 (640 bytes, application/gzip)
2016-10-04 16:31 UTC, andysem

Description andysem 2014-05-03 11:42:39 UTC
Created attachment 32726
A test case to reproduce the problem

I have a test case (attached) with several input files. main.cpp contains generic code that should run on any CPU, while add_sse2.c and add_avx2.c contain optimized code with SSE2 and AVX2 intrinsics, respectively. main.cpp detects CPU features at run time and invokes the optimized code when possible.

The problem is that when this test is compiled with LTO enabled, the resulting executable contains an add_sse2 function with VEX-encoded instructions (i.e. AVX-128 code instead of legacy SSE2). This does not happen when LTO is not enabled. My guess is that LTO computes the highest required CPU across all input object files (the one with AVX2 in this case) and generates code for it, instead of generating code for the CPU that was specified for each file at the compilation stage. The expected behavior would be to record the target-related compiler options for every function and use these options at the LTO stage.

To compile the test you can use compile.sh. To obtain disassembled code you can use disasm.sh. Look for add_sse2 code in the disassembly.
Comment 1 Andi Kleen 2014-05-04 03:48:25 UTC
Yes, LTO doesn't support different options for different files; it combines some of them (which is what happens in your case) and ignores others.

You could tag the functions in the different files with

__attribute__((target("...")))

This will also allow automatic switching.

Arguably gcc should do this automatically for LTO, but unfortunately it doesn't.

Alternatively, don't compile the file that needs the changed options with LTO.
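
For illustration, a minimal sketch of the attribute-based approach suggested above, assuming GCC (or a compatible compiler) on x86. The function names, signatures, and the select_add helper are hypothetical and are not taken from the attached test case. The kernels are plain loops, so no intrinsics are needed; the attribute only tells the compiler which instruction set it may use for each function.

```cpp
#include <cstddef>
#include <cstdint>

// Per-function target options; empty on non-x86 so the sketch still compiles.
#if defined(__x86_64__) || defined(__i386__)
#define TARGET_SSE2 __attribute__((target("sse2")))
#define TARGET_AVX2 __attribute__((target("avx2")))
#define ON_X86 1
#else
#define TARGET_SSE2
#define TARGET_AVX2
#endif

// Hypothetical kernels: each one may only use the ISA named in its attribute.
TARGET_SSE2 void add_sse2(const std::uint8_t* a, const std::uint8_t* b,
                          std::uint8_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<std::uint8_t>(a[i] + b[i]);
}

TARGET_AVX2 void add_avx2(const std::uint8_t* a, const std::uint8_t* b,
                          std::uint8_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<std::uint8_t>(a[i] + b[i]);
}

// Generic fallback built with the translation unit's default options.
void add_generic(const std::uint8_t* a, const std::uint8_t* b,
                 std::uint8_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<std::uint8_t>(a[i] + b[i]);
}

using add_fn = void (*)(const std::uint8_t*, const std::uint8_t*,
                        std::uint8_t*, std::size_t);

// Pick an implementation at run time from the CPU's reported features.
add_fn select_add() {
#ifdef ON_X86
    if (__builtin_cpu_supports("avx2")) return add_avx2;
    if (__builtin_cpu_supports("sse2")) return add_sse2;
#endif
    return add_generic;
}
```

A caller would obtain the pointer once, e.g. `add_fn add = select_add();`, and then call `add(a, b, out, n)`. Because the target options travel with each function rather than with the command line, this layout does not rely on per-file flags.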
Comment 2 andysem 2014-05-04 09:22:36 UTC
(In reply to Andi Kleen from comment #1)
> Yes LTO doesn't support different options for different files, and combines
> some of them (which happens in your case) and ignores some others.
> 
> You could tag the functions in the different files with 
> 
> __attribute__((target("...")))
> 
> This will also allow automatic switching.
> 
> Arguably gcc should do this automatically for LTO, but unfortunately it
> doesn't

Unfortunately, gcc does not allow using SIMD intrinsics if not enabled by compiler switches, so leaving the compiler options for a generic target CPU wouldn't work. At least that is the case with gcc 4.8.

> Or alternatively don't compile the file that needs the changed options with
> LTO

Yes, I'm currently not using LTO in my real-world project that exhibits this problem. But users of my project would like to enable LTO, and currently this silently produces incorrect binaries. The purpose of this ticket is to report the problem and suggest a possible solution (automatically marking each function in every translation unit with the target options).
Comment 3 Andi Kleen 2014-05-04 17:18:51 UTC
> Unfortunately, gcc does not allow using SIMD intrinsics if not enabled by
> compiler switches, so leaving the compiler options for a generic target CPU
> wouldn't work. At least that is the case with gcc 4.8.

This has been fixed in 4.9.
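
As a sketch of what this makes possible, assuming GCC 4.9 or later on x86: the file below can be compiled for a generic CPU (no -mavx2 on the command line) and still use AVX2 intrinsics inside a function that carries the target attribute. The names add_bytes, add_bytes_avx2, and add_bytes_scalar are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define ON_X86 1
#endif

#ifdef ON_X86
// AVX2 intrinsics are usable here because of the attribute, not because of
// any command-line switch.
__attribute__((target("avx2")))
void add_bytes_avx2(const std::uint8_t* a, const std::uint8_t* b,
                    std::uint8_t* out, std::size_t n) {
    std::size_t i = 0;
    // Process 32 bytes per iteration with unaligned loads and stores.
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i),
                            _mm256_add_epi8(va, vb));
    }
    // Handle the remaining tail bytes one at a time.
    for (; i < n; ++i)
        out[i] = static_cast<std::uint8_t>(a[i] + b[i]);
}
#endif

// Scalar fallback for CPUs without AVX2 and for non-x86 targets.
void add_bytes_scalar(const std::uint8_t* a, const std::uint8_t* b,
                      std::uint8_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<std::uint8_t>(a[i] + b[i]);
}

// Dispatch on the CPU features detected at run time.
void add_bytes(const std::uint8_t* a, const std::uint8_t* b,
               std::uint8_t* out, std::size_t n) {
#ifdef ON_X86
    if (__builtin_cpu_supports("avx2")) {
        add_bytes_avx2(a, b, out, n);
        return;
    }
#endif
    add_bytes_scalar(a, b, out, n);
}
```

With gcc 4.8, the intrinsic calls in add_bytes_avx2 would be rejected unless the whole file were built with the matching -m switch, which is the limitation described in the quoted text.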
Comment 4 Richard Biener 2014-05-05 09:10:51 UTC
Known issue (there are duplicate bugs for this AFAIK).
Comment 5 Andrew Pinski 2016-08-14 04:18:55 UTC
I think this has been fixed already in GCC 5 (maybe even in 4.9).  Can you try GCC 5 and see if it has been fixed?
Comment 6 Andrew Pinski 2016-08-14 04:20:24 UTC
Fixed in 4.9
Comment 7 andysem 2016-10-04 16:30:55 UTC
I believe this bug is not yet fixed in gcc 5.4 on Kubuntu 16.04 x86_64.

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2)

I attached a new testcase. The compiler still produces AVX instructions in both _Z17my_simd_func_sse2PKhPh and _Z16my_simd_func_avxPKhPh functions.
Comment 8 andysem 2016-10-04 16:31:56 UTC
Created attachment 39751
A new testcase which produces invalid code with gcc 5.4
Comment 9 Andrew Pinski 2016-10-04 16:39:07 UTC
(In reply to andysem from comment #8)
> Created attachment 39751 [details]
> A new testcase which produces invalid code with gcc 5.4

I think this testcase violates the C++ ODR, in that INSTRUCTION_SET::my_simd_func_impl is the same between the TUs. If you had used an anonymous namespace, it should have worked correctly. If an anonymous namespace does not work, please file a separate bug.
Comment 10 andysem 2016-10-04 17:28:47 UTC
(In reply to Andrew Pinski from comment #9)
> 
> I think this testcase is violating C++ ODR.  In that
> INSTRUCTION_SET::my_simd_func_impl is the same between the TUs.  If you had
> used an anonymous namespace, it should have worked correctly.  If anonymous
> namespace does not work, please file a separate bug.

INSTRUCTION_SET is defined differently for the two translation units, so we essentially have sse2::my_simd_func_impl and avx::my_simd_func_impl. This does not violate ODR.
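
To illustrate the point, here is a single-file sketch that mimics the two translation units. It is a reconstruction under assumptions, not the attached testcase: the wrapper signatures follow the mangled names quoted in comment 7, while the DEFINE_IMPL macro, the width parameter, and the copy-loop body are hypothetical placeholders. In the real setup the namespace name would come from the macro defined on the command line for each file.

```cpp
#include <cstddef>

// Stand-in for a shared header whose namespace name is supplied by a macro.
// Expanding it twice with different names mimics compiling the same header
// in two TUs with different INSTRUCTION_SET definitions.
#define DEFINE_IMPL(NS, WIDTH)                                        \
    namespace NS {                                                    \
    inline void my_simd_func_impl(const unsigned char* in,            \
                                  unsigned char* out) {               \
        for (std::size_t i = 0; i < (WIDTH); ++i) out[i] = in[i];     \
    }                                                                 \
    }

DEFINE_IMPL(sse2, 16)  // placeholder: one 128-bit vector's worth of bytes
DEFINE_IMPL(avx, 32)   // placeholder: one 256-bit vector's worth of bytes

// Each wrapper names a different entity, so the two definitions of
// my_simd_func_impl never collide.
void my_simd_func_sse2(const unsigned char* in, unsigned char* out) {
    sse2::my_simd_func_impl(in, out);
}

void my_simd_func_avx(const unsigned char* in, unsigned char* out) {
    avx::my_simd_func_impl(in, out);
}
```

Since sse2::my_simd_func_impl and avx::my_simd_func_impl are distinct functions in distinct namespaces, having different bodies for them is not by itself an ODR violation.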
Comment 11 Andrew Pinski 2016-10-04 17:31:47 UTC
(In reply to andysem from comment #10)
> (In reply to Andrew Pinski from comment #9)
> > 
> > I think this testcase is violating C++ ODR.  In that
> > INSTRUCTION_SET::my_simd_func_impl is the same between the TUs.  If you had
> > used an anonymous namespace, it should have worked correctly.  If anonymous
> > namespace does not work, please file a separate bug.
> 
> INSTRUCTION_SET is defined differently for the two translation units, so we
> essentially have sse2::my_simd_func_impl and avx::my_simd_func_impl. This
> does not violate ODR.

Open a new bug.
Comment 12 Andrew Pinski 2016-10-04 17:37:25 UTC
(In reply to andysem from comment #10)
> (In reply to Andrew Pinski from comment #9)
> > 
> > I think this testcase is violating C++ ODR.  In that
> > INSTRUCTION_SET::my_simd_func_impl is the same between the TUs.  If you had
> > used an anonymous namespace, it should have worked correctly.  If anonymous
> > namespace does not work, please file a separate bug.
> 
> INSTRUCTION_SET is defined differently for the two translation units, so we
> essentially have sse2::my_simd_func_impl and avx::my_simd_func_impl. This
> does not violate ODR.

Oh, I did not notice INSTRUCTION_SET was defined on the command line. As I said, please file a different bug.
Comment 13 andysem 2016-10-04 17:44:55 UTC
Ok. For the record, opened bug 77845.