The project I work on uses inline assembly to compute the floor and ceil of floating-point numbers, and it seems that in some cases, with gcc-4.2 and optimizations turned on, the computed values are not correct. I attached the smallest testcase that I could come up with to reproduce the error, along with the preprocessed input. Changing the testcase a little suffices to make the miscompilation disappear. Since I am not an expert on inline assembly in gcc, I am not sure that the iCeil/iFloor functions are completely correct, so perhaps this is not a bug...

I was unable to reproduce the error with gcc-4.1 and gcc-4.3. The error is still present with gcc-4.2.3. The error only appears with -O2/-O3.

Compilation command:
g++-4.2 -v -save-temps testcase.cpp -o testcase -O3

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-targets=all --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.1 (Ubuntu 4.2.1-5ubuntu4)
 /usr/lib/gcc/i486-linux-gnu/4.2.1/cc1plus -E -quiet -v -D_GNU_SOURCE testcase.cpp -mtune=generic -O3 -fpch-preprocess -o testcase.ii
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/4.2
 /usr/include/c++/4.2/i486-linux-gnu
 /usr/include/c++/4.2/backward
 /usr/local/include
 /usr/lib/gcc/i486-linux-gnu/4.2.1/include
 /usr/include
End of search list.
 /usr/lib/gcc/i486-linux-gnu/4.2.1/cc1plus -fpreprocessed testcase.ii -quiet -dumpbase testcase.cpp -mtune=generic -auxbase testcase -O3 -version -fstack-protector -fstack-protector -o testcase.s
GNU C++ version 4.2.1 (Ubuntu 4.2.1-5ubuntu4) (i486-linux-gnu)
 compiled by GNU C version 4.2.1 (Ubuntu 4.2.1-5ubuntu4).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 44e55ae5d2724830dee11801424b84d8
 as --traditional-format -V -Qy -o testcase.o testcase.s
GNU assembler version 2.18 (i486-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.18
 /usr/lib/gcc/i486-linux-gnu/4.2.1/collect2 --eh-frame-hdr -m elf_i386 --hash-style=both -dynamic-linker /lib/ld-linux.so.2 -o testcase /usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../lib/crt1.o /usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../lib/crti.o /usr/lib/gcc/i486-linux-gnu/4.2.1/crtbegin.o -L/usr/lib/gcc/i486-linux-gnu/4.2.1 -L/usr/lib/gcc/i486-linux-gnu/4.2.1 -L/usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/i486-linux-gnu/4.2.1/../../.. testcase.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/i486-linux-gnu/4.2.1/crtend.o /usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../lib/crtn.o
Created attachment 15317: Testcase to reproduce the bug
Created attachment 15318: Preprocessed testcase
This is caused by extra precision on the 387 FPU. *** This bug has been marked as a duplicate of 323 ***
I think I need some help here: I looked at bug 323 and I can't see how it is related to this issue. The assertion at the end of the test case compares integers, and the iFloor function is only applied to 0 in the test. My problem is that the result in sp.bbox[0] seems to be total garbage.

I looked at the assembly code generated by gcc (-S flag). The portion corresponding to:

sp.bbox[0] = std::min(sp.bbox[0], iFloor(txcum) );

is:

.L201:
    fld     %st(1)
    fadd    %st(2), %st
    fsubs   .LC4
#APP
    fistpl -44(%ebp)
#NO_APP
    movl    -44(%ebp), %edi
    movl    -88(%ebp), %eax
    sarl    %edi
    cmpl    16(%ebx), %edi
    jge     .L204
    leal    -28(%ebp), %eax
.L204:
    fld     %st(0)
    movl    (%eax), %eax
    fadd    %st(1), %st
    fsubs   .LC4
    movl    %eax, 16(%ebx)

The result of iFloor is stored in %edi, but whatever the result of cmpl, the value in %edi isn't used. Instead, the value at -28(%ebp), which appears to be uninitialized, is stored into sp.bbox[0].
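For readers without the attachment: the tree dump later in this thread shows iFloor as a float-to-int store of txcum * 2.0 - 0.5 followed by a right shift by one (the sarl above). The following is only a rough sketch of that rounding trick, not the project's code. It assumes std::lrint under the default round-to-nearest-even mode as a portable stand-in for the fist/fistp instruction, and the names floorSketch, ceilSketch and sketchesAgreeOnZero are hypothetical. The ceil variant is my own guess at the mirrored form, since iCeil does not appear in the thread.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for iFloor (assumption: not the attached testcase).
// With round-to-nearest-even, 2x - 0.5 rounds to 2*floor(x) or 2*floor(x) + 1,
// so dropping the low bit leaves floor(x). Only meant for small magnitudes.
// Note: right-shifting a negative value is implementation-defined before
// C++20; GCC performs an arithmetic shift.
inline int floorSketch(double x) {
    const long doubled = std::lrint(2.0 * x - 0.5);
    return static_cast<int>(doubled >> 1);
}

// Hypothetical mirrored form for ceil: ceil(x) == -floor(-x).
inline int ceilSketch(double x) {
    const long doubled = std::lrint(-0.5 - 2.0 * x);
    return -static_cast<int>(doubled >> 1);
}

// The testcase only applies iFloor to 0, so 0 is the case that matters here.
inline bool sketchesAgreeOnZero() {
    return floorSketch(0.0) == 0 && ceilSketch(0.0) == 0;
}
```

With this formulation there is no memory operand written by an asm statement, which is one way to check whether a wrong result comes from the rounding trick itself or from how the compiler handles the surrounding code.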
Sorry. The tree optimizers produce

<L62>:;
  __asm__ __volatile__("fistl %0":"=m" r:"t" txcum * 2.0e+0 - 5.0e-1);
  D.16879 = r >> 1;
  if (D.16879 < sp->bbox[0]) goto <L105>; else goto <L17>;

<L105>:;
  __b = &D.12083;
  goto <bb 20> (<L18>);

<L17>:;
  __b = &sp->bbox[0];

<L18>:;
  sp->bbox[0] = *__b;

which looks suspicious as well, as D.12083 is not the correct result here (but in fact is uninitialized). And this is store-sinking which makes a mess of it:

Sinking # D.12083_947 = V_MUST_DEF <D.12083_380>;
D.12083 = D.16879_333 from bb 21 to bb 52

because of wrong alias information computed right before this pass:

<L62>:;
  D.16874_329 = txcum_285 * 2.0e+0;
  x_330 = D.16874_329 - 5.0e-1;
  # r_946 = V_MAY_DEF <r_288>;
  __asm__ __volatile__("fistl %0":"=m" r:"t" x_330);
  # VUSE <r_946>;
  r.41_332 = r;
  D.16879_333 = r.41_332 >> 1;
  # D.12083_947 = V_MUST_DEF <D.12083_380>;
  D.12083 = D.16879_333;
  # VUSE <SFT.738_136>;
  # VUSE <SFT.739_451>;
  # VUSE <SFT.740_367>;
  D.16880_340 = sp_119->bbox[0];
  if (D.16879_333 < D.16880_340) goto <L91>; else goto <L17>;

<L91>:;
  goto <bb 23> (<L18>);

<L17>:;
  __b_342 = &sp_119->bbox[0];

  # __b_5 = PHI <&D.12083(48), __b_342(22)>;
<L18>:;
  # VUSE <r_946>;
  # VUSE <r_284>;
  # VUSE <r_6>;
  # VUSE <r_43>;
  D.12120_344 = *__b_5;

Oh well, it's not that 4.2 does not have known aliasing related problems.
In fact it's completely wrong.

  # __b_5 = PHI <&D.12083(48), __b_342(22)>;
<L18>:;
  # VUSE <r_946>;
  # VUSE <r_284>;
  # VUSE <r_6>;
  # VUSE <r_43>;
  D.12120_344 = *__b_5;

should be

  # __b_5 = PHI <&D.12083(48), __b_342(22)>;
<L18>:;
  # VUSE <SFT.738_136>;
  # VUSE <SFT.739_451>;
  # VUSE <SFT.740_367>;
  # VUSE <D.12083_947>;
  D.12120_344 = *__b_5;

This looks like a const vs. non-const issue (which I vaguely remember).
Points-to analysis works well and ends up with __b_5 pointing to anything, so we fall back to using SMTs, which in this case (for const int& __b) is:

SMT.761, UID 18425, const int, is addressable, is global, call clobbered, may aliases: { r r r r }

There you go. It should also (at least) alias

D.12083, UID 12083, int, is aliased, is addressable, call clobbered, default def: D.12083_242

SMT.763, UID 18427, struct SceneProps, is addressable, is global, call clobbered, default def: SMT.763_469, may aliases: { SFT.738 SFT.739 SFT.740 r r r r D.12089 D.12083 D.12084 D.12088 }

so it is flow-insensitive alias analysis that gets it wrong.
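To make the role of __b concrete: std::min takes and returns const int&, so the statement sp.bbox[0] = std::min(sp.bbox[0], iFloor(txcum)) loads through a reference that may name either the temporary holding the iFloor result (D.12083 in the dumps) or sp.bbox[0] itself. The sketch below spells that out by hand. ScenePropsSketch, its array size and all function names are hypothetical, and the by-value variant at the end is only a possible way to sidestep the pattern, not something proposed in this thread.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for struct SceneProps; the real layout is not shown.
struct ScenePropsSketch {
    int bbox[4];  // size is an assumption
};

// std::min returns a reference to one of its arguments, not a copy.
inline bool minReturnsReference() {
    const int a = 1;
    const int b = 2;
    const int& r = std::min(a, b);
    return &r == &a;
}

// Hand-expanded form of `sp.bbox[0] = std::min(sp.bbox[0], value);`.
// Returns true when the pointer selected the temporary.
inline bool minPicksTemporary(ScenePropsSketch& sp, int value) {
    const int tmp = value;  // plays the role of D.12083
    const int* b = (tmp < sp.bbox[0]) ? &tmp : &sp.bbox[0];  // plays __b
    const bool pickedTmp = (b == &tmp);
    sp.bbox[0] = *b;  // this load must be treated as possibly reading tmp
    return pickedTmp;
}

// Assumed alternative: select by value, so no pointer to a temporary exists.
inline void updateMinByValue(int& slot, int value) {
    if (value < slot) {
        slot = value;
    }
}
```

In the correct program, the store to tmp must stay ahead of the load through b. The dumps above show the compiler treating that load as if it could not read D.12083, which is what allowed the store to be sunk past it.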
4.2.4 is being released, changing milestones to 4.2.5.
Closing 4.2 branch, fixed for 4.3.