I attached two testcase which is stripped down graphicsmagick code. tc-resize2.c has a few instructions more than tc-resize.c. I belive the bug is the same. gcc -o tc2 -O0 -Wall -Wextra tc-resize.c gcc -o tc2 -O2 -Wall -Wextra tc-resize.c ( ./tc0 ; echo "-"; ./tc2 ) .19 2484.000000 .21 2700.000000 .23 2916.000000 .24 3132.000000 .26 3348.000000 - .0 2484.000000 .0 2700.000000 .0 2916.000000 .0 3132.000000 .0 3348.000000 gcc -o tc20 -O0 -Wall -Wextra tc-resize2.c gcc -o tc22 -O2 -Wall -Wextra tc-resize2.c ( ./tc20 ; echo "-"; ./tc22 ) .26 3264.000000 .28 3520.000000 .30 3776.000000 .32 4032.000000 .34 4288.000000 - .14 1734.000000 .15 1870.000000 .16 2006.000000 .17 2142.000000 .18 2278.000000 Now here is stripped down output of the assembly file: MinifyImage: evstdd 26,88(11) .L9: li 26,0 stw 26,64(1) .L11: #APP # 105 "tc-resize.c" 1 nop # 0 "" 2 #NO_APP lis 19,.LC5@ha efdmul 5,9,26 r26 should be 0.0078125 but is 0 r26 bits 32..63 are, dunno about 0..31 la 19,.LC5@l(19) evldd 19,0(19) r19 is 0.5 efdadd 5,5,19 r5 is expected to be .5 efdctuiz 5,5 r5 is 0 rlwinm 5,5,0,0xff #APP # 107 "tc-resize.c" 1 nop # 0 "" 2 #NO_APP evmergehi 6,9,9 mr 7,9 mr 8,7 mr 7,6 crxor 6,6,6 that should be 9,9,9 but is a different issue :) bl fprintf # r5 int (bad), r6 pad, r7 & r8 double r26 is zero and that is wrong. I belive 1/128.0 gets replaced by 0.0 and optimized. Sebastian
Created attachment 20950 [details] testcase
Created attachment 20951 [details] slightly extended tc
Created attachment 20952 [details] -S output of the first tc
(In reply to comment #0) > I attached two testcase which is stripped down graphicsmagick code. > tc-resize2.c has a few instructions more than tc-resize.c. I belive the bug is > the same. I was able to get similarly erroneous results with just "-O1 -fschedule-insns": gcc -o tc0 -O0 -Wall -Wextra tc-resize.c gcc -o tc1x -O1 -fschedule-insns -Wall -Wextra tc-resize.c ( ./tc0 ; echo "-"; ./tc1x; ) .19 2484.000000 .21 2700.000000 .23 2916.000000 .24 3132.000000 .26 3348.000000 - .7 954.000000 .8 1050.000000 .9 1146.000000 .10 1242.000000 .10 1338.000000
Created attachment 21605 [details] Further stripped testcase with problematic section identified Ok, I've spent a bit more time fiddling with this testcase, and I believe I can display exactly where the bug happens. In the attached "test.c" file, you can see a section like this: saved_r = total_r; saved_g = total_g; saved_b = total_b; Minify(2*x + 10, 15.0); save2_r = total_r; save2_g = total_g; save2_b = total_b; The "Minify()" macro is supposed to add nonzero values to total_[rgb] but when compiled with -O2 (or -O1 and a few extra optimizations) the values of save2_[rgb] are the same as those of saved_[rgb]. I'm also attaching a Makefile I used while testing this program. In the Makefile I enable -O1 and then turn off every optimization pass that is not strictly necessary to reproduce the bug. The Makefile simply builds 2 copies of the program, one with -O0 and one with -O2, then runs them and compares their output. Some kind of loop optimization or unrolling seems to be involved. The specific optimizations which are mandatory for the bug to show up are below, if any of these is turned off the bug seems to go away: -fdce -fguess-branch-probability -fschedule-insns -ftree-ch -ftree-dominator-opts -ftree-loop-optimize Cheers, Kyle Moffett
Created attachment 21606 [details] Makefile for "test.c"
The problem is in the register allocator. What happens is that after register allocation, we have something like: r27:DF = [r11:SI] ... r27:SI = 0 ... and then the first def gets deleted because it's obviously dead code. I don't know why the register allocator chooses r27 for the SImode pseudo. Scheduling is necessary to trigger the bug, but it's not the problem.
Is there anything else that any of us can do (dumps, testcases, etc) to help get this bug tracked down and fixed? If access to a native SPE system would help I can set you up with remote access credentials to one of ours. Cheers, Kyle Moffett
Known to work: 4.2.2 (from ELDK 4.2) Known to fail: 4.4.1 (from CodeSourcery G++ Lite 4.4-254)
I can reproduce this bug with gcc-4_4-branch SVN HEAD, compiling gcc in cross-compile mode and running the resulting executable with qemu-ppc. Instructions to reproduce everything without any PPC hardware involved (tested on Ubuntu Maverick/AMD64): Get a powerpcspe sysroot: apt-cross -a powerpcspe -S sid -m http://ftp.de.debian.org/debian-ports \ libdb1-compat libdb4.8 libc6 libc6-dev linux-libc-dev \ zlib1g zlib1g-dev libmpfr-dev apt-cross -a powerpcspe -S unreleased -m http://ftp.de.debian.org/debian-ports \ libdb1-compat libdb4.8 libc6 libc6-dev linux-libc-dev \ zlib1g zlib1g-dev libmpfr-dev (repeat a few times until all dependencies resolved) get Debian SID binutils source, then compile&install in cross-compile mode: dpkg-source -x binutils_*.dsc cd binutils-* && \ TARGET=powerpcspe fakeroot debian/rules binary-cross dpkg -i binutils-powepc-linux-gnuspe-*.deb Get GCC from SVN and compile minimal C-only version: svn co svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch mkdir -p build cd build && ../gcc-4_4-branch/configure \ --target=powerpc-linux-gnuspe \ -with-cpu=8548 --enable-e500_double --with-long-double-128 \ --enable-languages=c --disable-multilib \ --prefix=/usr --with-headers=/usr/powerpc-linux-gnuspe/include \ --with-libs=/usr/powerpc-linux-gnuspe/lib make -C build -j5 Compile test-cases available as attachments to this bug, i.e. test.c: build/gcc/xgcc \ -B./build/gcc/ -B/usr/powerpc-linux-gnuspe/bin/ \ -B/usr/powerpc-linux-gnuspe/lib/ \ -isystem /usr/powerpc-linux-gnuspe/include \ -isystem /usr/powerpc-linux-gnuspe/sys-include \ -fverbose-asm -dA -O2 -o test test.c Use Ubuntu's qemu user-space emulation to run resulting executable: rootfs=/usr/powerpc-linux-gnuspe /usr/bin/qemu-ppc -cpu 'MPC8548E_v21' \ -E LD_LIBRARY_PATH=$rootfs/lib/ $rootfs/lib/ld.so.1 \ ./test Repeat the same with -O0 and note the difference. This test fails for all the 3 .c files attached here. Output for tc-resize.c matches the output documented by Sebastian, so I'd say qemu is doing it right.
Author: froydnj Date: Thu Dec 30 15:47:43 2010 New Revision: 168347 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168347 Log: gcc/ PR target/44606 * reload1.c (choose_reload_regs): Don't look for equivalences for output reloads of constant loads. gcc/testsuite/ PR target/44606 * gcc.dg/pr44606.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr44606.c Modified: trunk/gcc/ChangeLog trunk/gcc/reload1.c trunk/gcc/testsuite/ChangeLog
Ok, looks like the change to reload1.c introduced by GCC revision 168347 (author froydnj): http://gcc.gnu.org/viewcvs/trunk/gcc/reload1.c?r1=168347&r2=168346&pathrev=168347 fixes the problem for me. I just took that patch and applied it to a checkout of the GCC 4.4 branch, now my test-case for this floating point bug passes!
Author: froydnj Date: Mon Jan 31 21:53:12 2011 New Revision: 169452 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169452 Log: gcc/ Backport from mainline: 2010-12-30 Nathan Froyd <froydnj@codesourcery.com> PR target/44606 * reload1.c (choose_reload_regs): Don't look for equivalences for output reloads of constant loads. gcc/testsuite/ Backport from mainline: 2010-12-30 Nathan Froyd <froydnj@codesourcery.com> PR target/44606 * gcc.dg/pr44606.c: New test. Added: branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/pr44606.c Modified: branches/gcc-4_5-branch/gcc/ChangeLog branches/gcc-4_5-branch/gcc/reload1.c branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
Author: froydnj Date: Tue Feb 1 02:11:54 2011 New Revision: 169465 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169465 Log: gcc/ Backport from mainline: 2010-12-30 Nathan Froyd <froydnj@codesourcery.com> PR target/44606 * reload1.c (choose_reload_regs): Don't look for equivalences for output reloads of constant loads. gcc/testsuite/ Backport from mainline: 2010-12-30 Nathan Froyd <froydnj@codesourcery.com> PR target/44606 * gcc.dg/pr44606.c: New test. Added: branches/gcc-4_4-branch/gcc/testsuite/gcc.dg/pr44606.c Modified: branches/gcc-4_4-branch/gcc/ChangeLog branches/gcc-4_4-branch/gcc/reload1.c branches/gcc-4_4-branch/gcc/testsuite/ChangeLog
Fixed everywhere.
Does the testcase still test for the problem if you replace the declarations of stderr etc. with "#include <stdio.h>"? This fails with link errors of the "unresolved reference to stderr" variety on newlib targets.
(In reply to comment #16) > Does the testcase still test for the problem if you replace the declarations of > stderr etc. with "#include <stdio.h>"? This fails with link errors of the > "unresolved reference to stderr" variety on newlib targets. As long as your stdio.h does not optimize fprintf away it should work :)