Consider (-march=native is amdfam10):

markus@x4 tmp % cat foo.ii
markus@x4 tmp % cat bar.ii
typedef int __m128i __attribute__ ((__vector_size__ (16)));
__m128i a, b, c;
void dequant_scaling () { c = __builtin_ia32_pmulld128 (a, b); }
markus@x4 tmp % g++ -flto -fPIC -march=native -O2 -c foo.ii
markus@x4 tmp % g++ -flto -fPIC -march=native -O2 -msse4.1 -c bar.ii
markus@x4 tmp % g++ -flto -march=native -O2 -shared foo.o bar.o
bar.ii: In function ‘dequant_scaling’:
bar.ii:3:61: error: ‘__builtin_ia32_pmulld128’ needs isa option -m32 -msse4.1
 void dequant_scaling () { c = __builtin_ia32_pmulld128 (a, b); }
                                                             ^
lto-wrapper: /usr/x86_64-pc-linux-gnu/gcc-bin/4.9.0/g++ returned 1 exit status

Adding -msse4.1 to the final link step would fix the issue.
This causes e.g. media-libs/x265 build failures, see: PR60568 comment13.
The issue is that -march=native "explodes" to explicitly set options, including negative ones such as -mno-sse4.1. That's bad, as we now have conflicting options for bar.o and foo.o, which we merge like

  /* Do what the old LTO code did - collect exactly one option
     setting per OPT code, we pick the first we encounter.
     ???  This doesn't make too much sense, but when it doesn't
     then we should complain.  */

I think this option exploding done by -march=native is simply broken. At least exploding to full positive _and_ negative lists is. Either we have a separate option for each target feature, and then we don't need the -mno-xxx stuff, or we don't, and then we need to fix that.

Note that the plan for the future is to no longer "merge" any target options at link time but to use target attributes more aggressively. The current code merely tries to make the link step succeed somehow; it does not follow what the user intended by setting specific target options on specific TUs.
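To illustrate why "pick the first we encounter" loses bar.o's -msse4.1, here is a minimal sketch of a first-wins merge. It is not the lto-wrapper code: the Option struct, the string keys and merge_first_wins are hypothetical names made up for this illustration, and it only assumes that each object file contributes a list of (option, enabled) settings in command-line order.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one recorded option setting.
// "code" plays the role of an OPT code; "enabled" is false for -mno-xxx.
struct Option {
  std::string code;
  bool enabled;
};

// First-wins merge (illustrative only): for each option code, keep the
// first setting seen while walking the object files in link order.
std::map<std::string, bool>
merge_first_wins(const std::vector<std::vector<Option>>& per_object) {
  std::map<std::string, bool> merged;
  for (const auto& options : per_object) {
    for (const auto& option : options) {
      // emplace() does nothing if the key already exists, so a later
      // conflicting setting is silently dropped.
      merged.emplace(option.code, option.enabled);
    }
  }
  return merged;
}
```

With foo.o contributing {"msse4.1", false} (from the exploded -march=native) and bar.o contributing {"msse4.1", true}, merging in the order foo.o, bar.o leaves SSE4.1 disabled, which matches the error above. Swapping the link order would flip the result, which is why this kind of merge cannot reflect per-TU intent.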
Variation of the problem without -march=native:

markus@x4 tmp % cat foo.ii
markus@x4 tmp % cat bar.ii
typedef int __m128i __attribute__ ((__vector_size__ (16)));
__m128i a, b, c;
void dequant_scaling () { c = __builtin_ia32_pmulld128 (a, b); }
markus@x4 tmp % cat main.ii
void dequant_scaling();
int main () { dequant_scaling(); }
markus@x4 tmp % g++ -flto -fPIC -march=amdfam10 -O2 -c foo.ii
markus@x4 tmp % g++ -flto -fPIC -march=amdfam10 -O2 -msse4.1 -c bar.ii
markus@x4 tmp % g++ -flto -march=native -O2 -shared foo.o bar.o
markus@x4 tmp % ar cr test.a foo.o bar.o
markus@x4 tmp % g++ -march=amdfam10 -O2 main.ii test.a
bar.ii: In function ‘dequant_scaling’:
bar.ii:3:61: error: ‘__builtin_ia32_pmulld128’ needs isa option -m32 -msse4.1
 void dequant_scaling () { c = __builtin_ia32_pmulld128 (a, b); }
                                                             ^
lto-wrapper: /usr/x86_64-pc-linux-gnu/gcc-bin/4.9.0/g++ returned 1 exit status
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.0/../../../../x86_64-pc-linux-gnu/bin/ld: fatal error: lto-wrapper failed
collect2: error: ld returned 1 exit status
*** Bug 60964 has been marked as a duplicate of this bug. ***
GCC 4.9.1 has been released.
GCC 4.9.2 has been released.
Both issues are fixed on trunk. Closing.