Bug 44606

Summary: Wrong SPE floating point during computation
Product: gcc Reporter: Sebastian Andrzej Siewior <gcc>
Component: targetAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: bernds, dvdkhlng, froydnj, gcc-bugs, kyle+new, kyle+old, mark.workman, wd
Priority: P3    
Version: 4.4.4   
Target Milestone: ---   
Host: powerpc-linux-gnuspe Target: powerpc-linux-gnuspe
Build: powerpc-linux-gnuspe Known to work:
Known to fail: Last reconfirmed:
Attachments: testcase
slightly extended tc
-S output of the first tc
Further stripped testcase with problematic section identified
Makefile for "test.c"

Description Sebastian Andrzej Siewior 2010-06-21 09:09:44 UTC
I attached two testcase which is stripped down graphicsmagick code.
tc-resize2.c has a few instructions more than tc-resize.c. I belive the bug is 
the same.
 gcc -o tc2 -O0 -Wall -Wextra tc-resize.c
 gcc -o tc2 -O2 -Wall -Wextra tc-resize.c
 ( ./tc0 ; echo "-"; ./tc2 )
 .19 2484.000000
 .21 2700.000000
 .23 2916.000000
 .24 3132.000000
 .26 3348.000000
 -
 .0 2484.000000
 .0 2700.000000
 .0 2916.000000
 .0 3132.000000
 .0 3348.000000
 
 gcc -o tc20 -O0 -Wall -Wextra tc-resize2.c
 gcc -o tc22 -O2 -Wall -Wextra tc-resize2.c
 ( ./tc20 ; echo "-"; ./tc22 )
 .26 3264.000000
 .28 3520.000000
 .30 3776.000000
 .32 4032.000000
 .34 4288.000000
 -
 .14 1734.000000
 .15 1870.000000
 .16 2006.000000
 .17 2142.000000
 .18 2278.000000
 
 Now here is stripped down output of the assembly file:
 
 MinifyImage:
         evstdd 26,88(11)
 .L9:
         li 26,0
 
         stw 26,64(1)
 .L11:
  #APP
  # 105 "tc-resize.c" 1
         nop
  # 0 "" 2
 #NO_APP
         lis 19,.LC5@ha
         efdmul 5,9,26           r26 should be 0.0078125 but is 0
                                 r26 bits 32..63 are, dunno about 0..31    
         la 19,.LC5@l(19)
         evldd 19,0(19)          r19 is 0.5
         efdadd 5,5,19           r5 is expected to be .5
         efdctuiz 5,5            r5 is 0
         rlwinm 5,5,0,0xff
 
 #APP
  # 107 "tc-resize.c" 1
         nop
  # 0 "" 2
 #NO_APP
         evmergehi 6,9,9
         mr 7,9
         mr 8,7
         mr 7,6
 
         crxor 6,6,6     that should be 9,9,9 but is a different issue :)
         bl fprintf      # r5 int (bad), r6 pad, r7 & r8 double
 
r26 is zero and that is wrong. I belive 1/128.0 gets replaced by 0.0 and optimized.
 
Sebastian
Comment 1 Sebastian Andrzej Siewior 2010-06-21 09:10:17 UTC
Created attachment 20950 [details]
testcase
Comment 2 Sebastian Andrzej Siewior 2010-06-21 09:10:45 UTC
Created attachment 20951 [details]
slightly extended tc
Comment 3 Sebastian Andrzej Siewior 2010-06-21 09:11:13 UTC
Created attachment 20952 [details]
-S output of the first tc
Comment 4 Kyle Moffett 2010-07-16 18:48:08 UTC
(In reply to comment #0)
> I attached two testcase which is stripped down graphicsmagick code.
> tc-resize2.c has a few instructions more than tc-resize.c. I belive the bug is 
> the same.

I was able to get similarly erroneous results with just "-O1 -fschedule-insns":
gcc -o tc0 -O0 -Wall -Wextra tc-resize.c
gcc -o tc1x -O1 -fschedule-insns -Wall -Wextra tc-resize.c
( ./tc0 ; echo "-"; ./tc1x; )
.19 2484.000000
.21 2700.000000
.23 2916.000000
.24 3132.000000
.26 3348.000000
-
.7 954.000000
.8 1050.000000
.9 1146.000000
.10 1242.000000
.10 1338.000000

Comment 5 Kyle Moffett 2010-08-31 14:03:44 UTC
Created attachment 21605 [details]
Further stripped testcase with problematic section identified

Ok, I've spent a bit more time fiddling with this testcase, and I believe I can display exactly where the bug happens.  In the attached "test.c" file, you can see a section like this:

  saved_r = total_r;
  saved_g = total_g;
  saved_b = total_b;
  Minify(2*x + 10, 15.0);
  save2_r = total_r;
  save2_g = total_g;
  save2_b = total_b;

The "Minify()" macro is supposed to add nonzero values to total_[rgb] but when compiled with -O2 (or -O1 and a few extra optimizations) the values of save2_[rgb] are the same as those of saved_[rgb].

I'm also attaching a Makefile I used while testing this program.  In the Makefile I enable -O1 and then turn off every optimization pass that is not strictly necessary to reproduce the bug.  The Makefile simply builds 2 copies of the program, one with -O0 and one with -O2, then runs them and compares their output.

Some kind of loop optimization or unrolling seems to be involved.  The specific optimizations which are mandatory for the bug to show up are below, if any of these is turned off the bug seems to go away:
  -fdce
  -fguess-branch-probability
  -fschedule-insns
  -ftree-ch
  -ftree-dominator-opts
  -ftree-loop-optimize

Cheers,
Kyle Moffett
Comment 6 Kyle Moffett 2010-08-31 14:04:12 UTC
Created attachment 21606 [details]
Makefile for "test.c"
Comment 7 Nathan Froyd 2010-09-09 15:28:46 UTC
The problem is in the register allocator.  What happens is that after register allocation, we have something like:

  r27:DF = [r11:SI]
  ...
  r27:SI = 0
  ...

and then the first def gets deleted because it's obviously dead code.  I don't know why the register allocator chooses r27 for the SImode pseudo.

Scheduling is necessary to trigger the bug, but it's not the problem.
Comment 8 Kyle Moffett 2010-09-11 15:06:43 UTC
Is there anything else that any of us can do (dumps, testcases, etc) to help get this bug tracked down and fixed?  If access to a native SPE system would help I can set you up with remote access credentials to one of ours.

Cheers,
Kyle Moffett
Comment 9 Wolfgang Denk 2010-09-22 18:50:31 UTC
Known to work: 4.2.2 (from ELDK 4.2)
Known to fail: 4.4.1 (from CodeSourcery G++ Lite 4.4-254)
Comment 10 David Kühling 2010-11-02 14:39:12 UTC
I can reproduce this bug with gcc-4_4-branch SVN HEAD, compiling gcc in cross-compile mode and running the resulting executable with qemu-ppc.  Instructions to reproduce everything without any PPC hardware involved (tested on Ubuntu Maverick/AMD64):

Get a powerpcspe sysroot:

 apt-cross -a powerpcspe -S sid -m http://ftp.de.debian.org/debian-ports \
        libdb1-compat libdb4.8 libc6 libc6-dev linux-libc-dev \
	zlib1g zlib1g-dev libmpfr-dev
 apt-cross -a powerpcspe -S unreleased -m http://ftp.de.debian.org/debian-ports \
        libdb1-compat libdb4.8 libc6 libc6-dev linux-libc-dev \
	zlib1g zlib1g-dev libmpfr-dev

(repeat a few times until all dependencies resolved)

get Debian SID binutils source, then compile&install in cross-compile mode:
 
 dpkg-source -x binutils_*.dsc
 cd binutils-* && \
	    TARGET=powerpcspe fakeroot debian/rules binary-cross
 dpkg -i binutils-powepc-linux-gnuspe-*.deb

Get GCC from SVN and compile minimal C-only version:

 svn co svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch
 mkdir -p build
	cd build && ../gcc-4_4-branch/configure \
		--target=powerpc-linux-gnuspe \
		-with-cpu=8548 --enable-e500_double --with-long-double-128 \
		--enable-languages=c --disable-multilib \
		--prefix=/usr --with-headers=/usr/powerpc-linux-gnuspe/include \
		--with-libs=/usr/powerpc-linux-gnuspe/lib 
	make -C build -j5

Compile test-cases available as attachments to this bug, i.e. test.c:

 build/gcc/xgcc \
	-B./build/gcc/ -B/usr/powerpc-linux-gnuspe/bin/ \
	-B/usr/powerpc-linux-gnuspe/lib/ \
	-isystem /usr/powerpc-linux-gnuspe/include \
	-isystem /usr/powerpc-linux-gnuspe/sys-include \
	-fverbose-asm -dA -O2 -o test test.c

Use Ubuntu's qemu user-space emulation to run resulting executable:

 rootfs=/usr/powerpc-linux-gnuspe
 /usr/bin/qemu-ppc -cpu 'MPC8548E_v21' \
        -E LD_LIBRARY_PATH=$rootfs/lib/ $rootfs/lib/ld.so.1 \
        ./test

Repeat the same with -O0 and note the difference.  This test fails for all the 3 .c files attached here.  Output for tc-resize.c matches the output documented by Sebastian, so I'd say qemu is doing it right.
Comment 11 Nathan Froyd 2010-12-30 15:47:46 UTC
Author: froydnj
Date: Thu Dec 30 15:47:43 2010
New Revision: 168347

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168347
Log:
gcc/
        PR target/44606
        * reload1.c (choose_reload_regs): Don't look for equivalences for
        output reloads of constant loads.

gcc/testsuite/
        PR target/44606
        * gcc.dg/pr44606.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/pr44606.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/reload1.c
    trunk/gcc/testsuite/ChangeLog
Comment 12 David Kühling 2011-01-21 15:17:34 UTC
Ok, looks like the change to reload1.c introduced by GCC revision 168347 (author froydnj):

http://gcc.gnu.org/viewcvs/trunk/gcc/reload1.c?r1=168347&r2=168346&pathrev=168347

fixes the problem for me.  I just took that patch and applied it to a checkout of the GCC 4.4 branch, now my test-case for this floating point bug passes!
Comment 13 Nathan Froyd 2011-01-31 21:53:15 UTC
Author: froydnj
Date: Mon Jan 31 21:53:12 2011
New Revision: 169452

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169452
Log:
gcc/
	Backport from mainline:
	2010-12-30  Nathan Froyd  <froydnj@codesourcery.com>

        PR target/44606
        * reload1.c (choose_reload_regs): Don't look for equivalences for
        output reloads of constant loads.

gcc/testsuite/
	Backport from mainline:
	2010-12-30  Nathan Froyd  <froydnj@codesourcery.com>

        PR target/44606
        * gcc.dg/pr44606.c: New test.


Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/pr44606.c
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/reload1.c
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
Comment 14 Nathan Froyd 2011-02-01 02:11:57 UTC
Author: froydnj
Date: Tue Feb  1 02:11:54 2011
New Revision: 169465

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169465
Log:
gcc/
	Backport from mainline:
	2010-12-30  Nathan Froyd  <froydnj@codesourcery.com>

        PR target/44606
        * reload1.c (choose_reload_regs): Don't look for equivalences for
        output reloads of constant loads.

gcc/testsuite/
	Backport from mainline:
	2010-12-30  Nathan Froyd  <froydnj@codesourcery.com>

        PR target/44606
        * gcc.dg/pr44606.c: New test.


Added:
    branches/gcc-4_4-branch/gcc/testsuite/gcc.dg/pr44606.c
Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/reload1.c
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog
Comment 15 Nathan Froyd 2011-02-01 02:12:33 UTC
Fixed everywhere.
Comment 16 Bernd Schmidt 2011-03-16 22:47:15 UTC
Does the testcase still test for the problem if you replace the declarations of stderr etc. with "#include <stdio.h>"? This fails with link errors of the "unresolved reference to stderr" variety on newlib targets.
Comment 17 Sebastian Andrzej Siewior 2011-03-17 13:02:58 UTC
(In reply to comment #16)
> Does the testcase still test for the problem if you replace the declarations of
> stderr etc. with "#include <stdio.h>"? This fails with link errors of the
> "unresolved reference to stderr" variety on newlib targets.

As long as your stdio.h does not optimize fprintf away it should work :)