[ forwarded from http://bugs.debian.org/381710 ]

The following report has been submitted by Kurt Roeckx:

I've been looking at the perl testsuite failure on hppa. See
http://bugs.debian.org/374396

This code:

      while (cdouble < 0.0)
        cdouble += adouble;

Generated by gcc-4.1 with -O2 and -fdelayed-branch gives:

        fadd,dbl %fr13,%fr22,%fr13
.L1447:
        fcmp,dbl,!< %fr13,%fr0
        ftest
        b .L1447
        fadd,dbl %fr13,%fr22,%fr13
        fsub,dbl %fr13,%fr22,%fr13

With -O2 and -fno-delayed-branch:

.L1239:
        fadd,dbl %fr13,%fr22,%fr13
        fcmp,dbl,!< %fr13,%fr0
        ftest
        b,n .L1239

As you can see, with delayed branches it always executes an fadd at the start and an fsub at the end, which it doesn't do without delayed branches. This causes unwanted rounding problems, since the mantissa doesn't have enough bits to hold the required information.

I think that, at least in this case, it's not a good idea to do this optimization with floating point numbers.

The same code on gcc-4.0 with -fdelayed-branch seems to generate this code:

.L661:
        fadd,dbl %fr12,%fr22,%fr12
        fcmp,dbl,!< %fr12,%fr0
        ftest
        b .L661
        ldo -256(%r30),%r20

With -fno-delayed-branch:

.L643:
        fadd,dbl %fr12,%fr22,%fr12
        fcmp,dbl,!< %fr12,%fr0
        ftest
        b,n .L643

So gcc-4.0 looks good.

gcc-snapshot 20060721-1 gives with -fdelayed-branch:

        fadd,dbl %fr12,%fr22,%fr12
.L1449:
        fcmp,dbl,!< %fr12,%fr0
        ftest
        b .L1449
        fadd,dbl %fr12,%fr22,%fr12
        fsub,dbl %fr12,%fr22,%fr12

So that has the same problem.

For those not familiar with hppa assembler: a branch normally executes the instruction following it too, before branching. The ",n" in "b,n" prevents the next instruction from being executed, so it has the same effect as following the branch with a nop instruction.
The following code has the same effect:

#include <stdio.h>

double cdouble = -1;

int main()
{
        double adouble;

        adouble = 9007199254740992.0; /* 2^53 */

        while (cdouble < 0.0)
                cdouble += adouble;

        printf("%lf\n", cdouble);
        return 0;
}

With delayed branches it prints:
9007199254740992.000000
without:
9007199254740991.000000
This sounds like two target problems rather than generic ones.
Re comment #1: it's a generic bug in reorg.c (fill_slots_from_thread). I'm testing a patch.
Subject: Bug 28634

Author: rsandifo
Date: Mon Aug 14 12:55:52 2006
New Revision: 116124

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116124
Log:
gcc/
        PR rtl-optimization/28634
        * reorg.c (fill_slots_from_thread): Do not assume A + X - X == A
        for floating-point modes unless flag_unsafe_math_optimizations.

gcc/testsuite/
        PR rtl-optimization/28634
        * gcc.c-torture/execute/ieee/pr28634.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/ieee/pr28634.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/reorg.c
    trunk/gcc/testsuite/ChangeLog
Patch applied to mainline. It has been approved for 4.1, so I'll apply it there after testing.
Subject: Bug 28634

Author: rsandifo
Date: Sat Sep 9 10:56:31 2006
New Revision: 116796

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116796
Log:
gcc/
        PR rtl-optimization/28634
        * reorg.c (fill_slots_from_thread): Do not assume A + X - X == A
        for floating-point modes unless flag_unsafe_math_optimizations.

gcc/testsuite/
        PR rtl-optimization/28634
        * gcc.c-torture/execute/ieee/pr28634.c: New test.

Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.c-torture/execute/ieee/pr28634.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/reorg.c
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog
Applied to 4.1 after testing on mipsisa64-elf and mips64-linux-gnu. Although the bug has been around for a long time, it isn't known to be a regression in 4.0 relative to some earlier release, so it doesn't qualify for a 4.0 backport. I'll therefore close this PR as fixed.