All versions of gcc and g++ I've tested have issues with the SSE ordered/unordered compares (vector or scalar) when -ffast-math is on. The result is either an ICE, the instruction being omitted, or, rarely, the Right Thing. For example, with gcc/g++ 4.0.0 20041205:

#include <xmmintrin.h>

int main()
{
    __m128
        a = _mm_set1_ps(1.f/0.f),
        b = _mm_set1_ps(0),
        c = _mm_mul_ps(a,b),
        //z = _mm_setzero_ps(),
        comp = _mm_cmpunord_ps(c,c);     // ICE -O >= 1
        //comp = _mm_cmpord_ps(c,c);     // ok
        //comp = _mm_cmpord_ps(c,z);     // removed with -O >= 1
        //comp = _mm_cmpunord_ps(c,z);   // removed with -O >= 1
        //comp = _mm_cmpunord_ss(c,c);   // ok
        //comp = _mm_cmpord_ss(c,c);     // ok
        //comp = _mm_cmpord_ss(c,z);     // ok
        //comp = _mm_cmpunord_ss(c,z);   // ok
    return _mm_movemask_ps(comp);
}

$ /usr/local/gcc-405/bin/g++ sse.cpp -march=pentium4 -O1 -ffast-math
sse.cpp: In function 'int main()':
sse.cpp:7: warning: division by zero in '1.0e+0f / 0.'
sse.cpp:21: internal compiler error: in output_constant_pool_2, at varasm.c:3108
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

$ /usr/local/gcc-405/bin/gcc -v
Reading specs from /usr/local/gcc-405/lib/gcc/i686-pc-cygwin/4.0.0/specs
Configured with: ../configure --prefix=/usr/local/gcc-405 --enable-languages=c,c++ --enable-threads=posix --with-system-zlib --disable-checking --disable-nls --disable-shared --disable-win32-registry --verbose --with-gcc --with-gnu-ld --with-gnu-as --with-as=/usr/local/binutils/bin/as --with-ld=/usr/local/binutils/bin/ld
Thread model: posix
gcc version 4.0.0 20041205 (experimental)

That 20041205 snapshot handles this particular issue best; it's even worse with the other versions I've tried:

gcc version 3.3.3 (cygwin special)
gcc version 3.4.1 20040625 (prerelease)
gcc version 3.5.0 20040620 (experimental)

You get the same with -march=k8.
So it's not just a gcc 4.0.0 problem. I don't have older versions or other platforms to test it on, but I guess it would be wise to check there too :)
Reduced testcase:

typedef float __v4sf __attribute__ ((vector_size (16)));
int ffg (__v4sf __A);
int main()
{
  float f = 1.f/0.f;
  __v4sf __tmp = __builtin_ia32_loadss (&f);
  __v4sf a = __builtin_ia32_shufps(__tmp, __tmp, 0),
         comp = (__v4sf)__builtin_ia32_cmpunordps(a,a);
  return ffg(comp);
}

Search converges between 2004-01-01-trunk (#437) and 2004-01-17-trunk (#438).
It was fixed in the 3.4 branch: Search converges between 2004-10-10-004002-3.4 (#94) and 2004-10-11-004001-3.4 (#95).
This is a dup of bug 17767.

*** This bug has been marked as a duplicate of 17767 ***