Bug 35585 - [4.2 Regression] Miscompiled inline assembly
Product: gcc
Classification: Unclassified
Component: tree-optimization
Importance: P2 normal
Target Milestone: 4.3.0
Assigned To: Not yet assigned to anyone
Keywords: alias, wrong-code
Reported: 2008-03-14 16:24 UTC by nicos
Modified: 2009-03-31 15:25 UTC
CC List: 2 users

Target: i486-linux-gnu
Known to work: 4.1.3 4.3.0
Known to fail: 4.2.1 4.2.3 4.2.5
Last reconfirmed: 2008-03-15 11:51:54

Attachments:
Testcase to reproduce the bug (2.42 KB, text/plain), 2008-03-14 16:26 UTC, nicos
Preprocessed testcase (267.81 KB, text/plain), 2008-03-14 16:29 UTC, nicos

Description nicos 2008-03-14 16:24:51 UTC
The project I work on uses inline assembly to compute the floor and ceiling of floating-point numbers, and it seems that in some cases, with gcc-4.2 and optimizations turned on, the computed values are incorrect. I have attached the smallest testcase I could come up with that reproduces the error, along with the preprocessed input. Changing the testcase even slightly makes the miscompilation disappear.
Since I am not an expert on GCC inline assembly, I am not sure that the iCeil/iFloor functions are completely correct, so perhaps this is not a bug after all.

I was unable to reproduce the error with gcc-4.1 and gcc-4.3. The error is still present with gcc-4.2.3. The error only appears with -O2/-O3.
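For readers unfamiliar with the trick: the GIMPLE dump in comment 5 shows that iFloor pushes `txcum * 2.0 - 0.5` onto the x87 stack, stores it to memory with `fistl` (which rounds in the current FPU rounding mode, round-to-nearest-even by default), and returns `r >> 1`. The sketch below reproduces that arithmetic portably with `std::lrint` instead of inline assembly; it is an illustration, not the project's code. `iCeil` here is an assumed mirror image, since its body does not appear in this report, and both functions assume the default rounding mode and inputs well inside `int` range.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Portable sketch of the floor trick the testcase implements with x87
// `fistl`: round 2*x - 0.5 to the nearest integer, then shift right by one.
// std::lrint rounds in the current rounding mode (round-to-nearest-even by
// default), as `fistl` does, so ties in 2*x - 0.5 land on even integers and
// the shift yields floor(x) for exact integers and half-integers too.
inline int iFloor(double x) {
    assert(std::fabs(x) < INT_MAX / 2);  // keep 2*x inside int range
    long r = std::lrint(x + x - 0.5);
    // Right shift of a negative value is arithmetic on GCC and Clang
    // (guaranteed since C++20), so it rounds toward minus infinity.
    return static_cast<int>(r >> 1);
}

// Hypothetical mirror image computing ceil(x) as -floor(-x); the testcase's
// own iCeil is not shown in this report.
inline int iCeil(double x) {
    assert(std::fabs(x) < INT_MAX / 2);
    long r = std::lrint(-0.5 - (x + x));
    return static_cast<int>(-(r >> 1));
}
```

Doubling before rounding is what lets a single round-to-nearest instruction implement floor: every interval [n, n+1) maps onto {2n, 2n+1}, and the final shift discards the low bit.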

Compilation command:
g++-4.2 -v -save-temps testcase.cpp -o testcase -O3
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-targets=all --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.1 (Ubuntu 4.2.1-5ubuntu4)
 /usr/lib/gcc/i486-linux-gnu/4.2.1/cc1plus -E -quiet -v -D_GNU_SOURCE testcase.cpp -mtune=generic -O3 -fpch-preprocess -o testcase.ii
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
End of search list.
 /usr/lib/gcc/i486-linux-gnu/4.2.1/cc1plus -fpreprocessed testcase.ii -quiet -dumpbase testcase.cpp -mtune=generic -auxbase testcase -O3 -version -fstack-protector -fstack-protector -o testcase.s
GNU C++ version 4.2.1 (Ubuntu 4.2.1-5ubuntu4) (i486-linux-gnu)
        compiled by GNU C version 4.2.1 (Ubuntu 4.2.1-5ubuntu4).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 44e55ae5d2724830dee11801424b84d8
 as --traditional-format -V -Qy -o testcase.o testcase.s
GNU assembler version 2.18 (i486-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.18
 /usr/lib/gcc/i486-linux-gnu/4.2.1/collect2 --eh-frame-hdr -m elf_i386 --hash-style=both -dynamic-linker /lib/ld-linux.so.2 -o testcase /usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../lib/crt1.o /usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../lib/crti.o /usr/lib/gcc/i486-linux-gnu/4.2.1/crtbegin.o -L/usr/lib/gcc/i486-linux-gnu/4.2.1 -L/usr/lib/gcc/i486-linux-gnu/4.2.1 -L/usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/i486-linux-gnu/4.2.1/../../.. testcase.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/i486-linux-gnu/4.2.1/crtend.o /usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../lib/crtn.o
Comment 1 nicos 2008-03-14 16:26:23 UTC
Created attachment 15317
Testcase to reproduce the bug
Comment 2 nicos 2008-03-14 16:29:07 UTC
Created attachment 15318
Preprocessed testcase
Comment 3 Richard Biener 2008-03-14 17:30:25 UTC
This is caused by extra precision on the 387 FPU.

*** This bug has been marked as a duplicate of 323 ***
Comment 4 nicos 2008-03-15 01:07:50 UTC
I think I need some help here: I looked at bug 323 and I can't see how it is related to this issue.
The assertion at the end of the testcase compares integers, and the iFloor function is only applied to 0 in the test. My problem is that the result in sp.bbox[0] seems to be total garbage.
I looked at the assembly code generated by gcc (-S flag). The portion corresponding to

sp.bbox[0] = std::min(sp.bbox[0], iFloor(txcum) );


	fld	%st(1)
	fadd	%st(2), %st
	fsubs	.LC4
	fistpl -44(%ebp)
	movl	-44(%ebp), %edi
	movl	-88(%ebp), %eax
	sarl	%edi
	cmpl	16(%ebx), %edi
	jge	.L204
	leal	-28(%ebp), %eax
	fld	%st(0)
	movl	(%eax), %eax
	fadd	%st(1), %st
	fsubs	.LC4
	movl	%eax, 16(%ebx)

is shown above. The result of iFloor is stored in %edi, but regardless of the outcome of cmpl, the value in %edi is never used; instead, the value at -28(%ebp), which appears to be uninitialized, is stored into sp.bbox[0].
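The quoted statement passes the iFloor result, a temporary, to `std::min`, which takes and returns `const int&`. The compiler therefore has to materialize that temporary in memory (it appears as `D.12083` in the dumps in the following comments) and read the final value through a reference that may point at either argument. A minimal sketch of the same shape, assuming a cut-down `SceneProps` with only a `bbox` member and using `std::floor` in place of iFloor; `update_min_x` and `min_aliases_argument` are hypothetical helpers:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the testcase's SceneProps; only the bbox member
// used by the quoted statement is modelled.
struct SceneProps {
    int bbox[4];
};

// Same shape as the quoted statement, with std::floor standing in for iFloor.
// std::min takes and returns const int&, so the prvalue floor result is
// materialized as a temporary object and the assignment reads through a
// reference that points either at that temporary or at sp.bbox[0].
inline void update_min_x(SceneProps& sp, double txcum) {
    sp.bbox[0] = std::min(sp.bbox[0], static_cast<int>(std::floor(txcum)));
}

// The reference semantics made explicit: std::min returns one of its
// arguments (the first one when they compare equal), not a copy.
inline bool min_aliases_argument(const int& a, const int& b) {
    const int& m = std::min(a, b);
    return &m == (b < a ? &b : &a);
}
```

Because the result is a reference rather than a value, alias analysis must treat the final load as possibly reading the temporary.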
Comment 5 Richard Biener 2008-03-15 11:28:14 UTC
Sorry.  The tree optimizers produce

  __asm__ __volatile__("fistl %0":"=m" r:"t" txcum * 2.0e+0 - 5.0e-1);
  D.16879 = r >> 1;
  if (D.16879 < sp->bbox[0]) goto <L105>; else goto <L17>;

  __b = &D.12083;
  goto <bb 20> (<L18>);

  __b = &sp->bbox[0];

  sp->bbox[0] = *__b;

which looks suspicious as well: D.12083 is not the correct result here
(in fact it is uninitialized).  It is store sinking that makes
a mess of it:

Sinking #   D.12083_947 = V_MUST_DEF <D.12083_380>;
D.12083 = D.16879_333 from bb 21 to bb 52

because of wrong alias information computed right before this pass:

  D.16874_329 = txcum_285 * 2.0e+0;
  x_330 = D.16874_329 - 5.0e-1;
  #   r_946 = V_MAY_DEF <r_288>;
  __asm__ __volatile__("fistl %0":"=m" r:"t" x_330);
  #   VUSE <r_946>;
  r.41_332 = r;
  D.16879_333 = r.41_332 >> 1;
  #   D.12083_947 = V_MUST_DEF <D.12083_380>;
  D.12083 = D.16879_333;
  #   VUSE <SFT.738_136>;
  #   VUSE <SFT.739_451>;
  #   VUSE <SFT.740_367>;
  D.16880_340 = sp_119->bbox[0];
  if (D.16879_333 < D.16880_340) goto <L91>; else goto <L17>;

  goto <bb 23> (<L18>);

  __b_342 = &sp_119->bbox[0];

  # __b_5 = PHI <&D.12083(48), __b_342(22)>;
  #   VUSE <r_946>;
  #   VUSE <r_284>;
  #   VUSE <r_6>;
  #   VUSE <r_43>;
  D.12120_344 = *__b_5;

Oh well, it's not as if 4.2 doesn't have known aliasing-related problems.
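Written back out as C++, the GIMPLE above amounts to roughly the following hand-lowered function, where `tmp` plays the role of `D.12083` and `b` the role of `__b_5`, the PHI of `&D.12083` and `&sp->bbox[0]`. It is a sketch for illustration only: the names are hypothetical, and it will not reproduce the miscompilation on a fixed compiler. The point is that the load through `b` may read `tmp`, so it needs a VUSE of `D.12083`; without one, nothing stops store sinking from moving the store to `tmp` off the path that performs the load.

```cpp
#include <cassert>

// Hand-lowered form of
//   sp.bbox[0] = std::min(sp.bbox[0], iFloor(txcum));
// mirroring the GIMPLE in comment 5. `r` is the integer written by `fistl`.
inline void lowered_update(int* bbox0, int r) {
    assert(bbox0 != nullptr);
    int tmp = r >> 1;       // D.12083 = D.16879_333;  must happen before *b
    const int* b;           // __b_5
    if (tmp < *bbox0)       // if (D.16879_333 < sp->bbox[0])
        b = &tmp;           //   __b = &D.12083;
    else
        b = bbox0;          //   __b = &sp->bbox[0];
    // This load may read tmp, so it depends on the store above. If alias
    // analysis omits tmp from its may-alias set, the store looks dead here
    // and can be sunk elsewhere, leaving *b to read an uninitialized tmp.
    *bbox0 = *b;            // sp->bbox[0] = *__b;
}
```

Compiled correctly, the function stores the smaller of the two values; the 4.2 output in comment 4 instead copies the never-written stack slot at -28(%ebp).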
Comment 6 Richard Biener 2008-03-15 11:33:09 UTC
In fact it's completely wrong.

  # __b_5 = PHI <&D.12083(48), __b_342(22)>;
  #   VUSE <r_946>;
  #   VUSE <r_284>;
  #   VUSE <r_6>;
  #   VUSE <r_43>;
  D.12120_344 = *__b_5;

should be

  # __b_5 = PHI <&D.12083(48), __b_342(22)>;
  #   VUSE <SFT.738_136>;
  #   VUSE <SFT.739_451>;
  #   VUSE <SFT.740_367>;
  #   VUSE <D.12083_947>;
  D.12120_344 = *__b_5;

This looks like a const vs. non-const issue (which I vaguely remember).
Comment 7 Richard Biener 2008-03-15 11:51:54 UTC
Points-to analysis works correctly and concludes that __b_5 may point to anything, so we fall
back to using SMTs (symbol memory tags); the one used in this case is (for const int& __b):

SMT.761, UID 18425, const int, is addressable, is global, call clobbered, may aliases: { r r r r }

There you go.  It should also (at least) alias

D.12083, UID 12083, int, is aliased, is addressable, call clobbered, default def: D.12083_242

SMT.763, UID 18427, struct SceneProps, is addressable, is global, call clobbered, default def: SMT.763_469, may aliases: { SFT.738 SFT.739 SFT.740 r r r r D.12089 D.12083 D.12084 D.12088 }

so it is flow-insensitive alias analysis that gets it wrong.
Comment 8 Joseph S. Myers 2008-05-19 20:25:15 UTC
4.2.4 is being released, changing milestones to 4.2.5.
Comment 9 Joseph S. Myers 2009-03-31 15:25:34 UTC
Closing 4.2 branch, fixed for 4.3.