55939 – [4.6/4.7/4.8 regression] gcc miscompiles gmp-5.0.5 on m68k-linux

Bug 55939 - [4.6/4.7/4.8 regression] gcc miscompiles gmp-5.0.5 on m68k-linux

Summary: [4.6/4.7/4.8 regression] gcc miscompiles gmp-5.0.5 on m68k-linux

Status:	RESOLVED DUPLICATE of bug 323

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.8.0

Importance:	P3 normal
Target Milestone:	4.6.4
Assignee:	Not yet assigned to anyone

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2013-01-11 11:00 UTC by Mikael Pettersson
Modified:	2013-02-18 19:38 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:	m68k-linux
Build:
Known to work:	4.5.4
Known to fail:	4.6.3, 4.7.2, 4.8.0
Last reconfirmed:	2013-01-29 00:00:00

Attachments
standalone test case (2.87 KB, text/plain) 2013-01-17 23:20 UTC, Mikael Pettersson	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mikael Pettersson 2013-01-11 11:00:31 UTC

Building gmp-5.0.5 for m68k (configure for --host=m68020-unknown-linux), with gcc 4.6, 4.7, or 4.8, and then running gmp's test suite results in:

make[4]: Entering directory `/home/mikpe/objdir-gmp/tests/mpq'
...
ERROR (check_random test 1): bad mpq_set_d results
3.098073531878046e-120
3.098073531878046e-120
...
1 of 11 tests failed

The failure is 100% repeatable.

Building with gcc-4.5.4 instead eliminates the failure, so this is a regression.

I don't have a reduced test case yet.  I'm going to start a bisection on gcc trunk to hopefully identify when this regression started.

Comment 1 Mikael Pettersson 2013-01-11 20:06:03 UTC

I cut out one line too many in the build log, it's tests/mpq/t-get_d that fails.

On the surface the problem started with Jan Hubicka's "Inline heuristic 4/4" patch in r166657:
http://gcc.gnu.org/ml/gcc-patches/2010-11/msg01305.html
http://gcc.gnu.org/ml/gcc-cvs/2010-11/msg00546.html

However, all that did was to bump the default value of early-inlining-insns from 8 to 10.  Compiling gmp with gcc-4.5.4 --param early-inlining-insns=10 also causes tests/mpq/t-get_d to fail, so the real problem is older.

Comment 2 Mikael Pettersson 2013-01-12 16:23:55 UTC

The regression started with Jan Hubicka's "Pretty-ipa merge: Inliner heruistics reorg" patch in r147852:
http://gcc.gnu.org/ml/gcc-patches/2009-05/msg00812.html
http://gcc.gnu.org/ml/gcc-cvs/2009-05/msg00829.html

However, that just changed inlining heuristics so most likely it exposed a latent problem.

I'll start working on a reduced test case now.

Comment 3 Jeffrey A. Law 2013-01-16 19:48:46 UTC

Mikael -- can you try adding -fno-rename-registers to the cflags used to compile gmp and see if that changes anything?  It'd be greatly appreciated.

Comment 4 Mikael Pettersson 2013-01-16 20:46:19 UTC

(In reply to comment #3)
> Mikael -- can you try adding -fno-rename-registers to the cflags used to
> compile gmp and see if that changes anything?  It'd be greatly appreciated.

Done, -fno-rename-registers made no difference with any of 4.6/4.7/4.8.

Comment 5 Mikael Pettersson 2013-01-17 23:20:26 UTC

Created attachment 29198 [details]
standalone test case

Here's a standalone test case, extracted from gmp's t-get_d.c.  It's largish (444 lines) but most of that are support functions needed for the program logic but otherwise unrelated to the wrong-code issue.  The wrong-code occurs in check_random() as a side-effect of the inlining of my_ldexp().

Comment 6 Jeffrey A. Law 2013-01-18 04:28:01 UTC

Thanks.  The fact that -fno-rename-registers does not affect the result indicates this is a separate code generation issue than the one I'm working on.  The reduced testcase should help considerably.

Thanks,
Jeff

Comment 7 Aldy Hernandez 2013-01-21 17:19:52 UTC

I'm going to throw this out there before I start doing various pirouettes to get some kind of emulator going to reproduce this problem...

Mikael, is there some 68k emulator with a Linux image available somewhere?  Are you testing on actual hardware?  Is this hardware available for debugging remotely?  I'm just basically trying to find some sort of platform where I can reproduce and debug this.

Comment 8 Mikael Pettersson 2013-01-21 23:18:34 UTC

I'm using the ARAnym full-system m68040 emulator -- that's the only realistic option for testing gcc on Linux/m68k at the moment.  I maintain my own small Fedora-based distro, so if you like I can supply a disk image with that, plus the scripts and ARAnym parameter files I use.  Otherwise you can probably find HOWTOs and pointers to images on the Debian/m68k wiki.

Comment 9 Aldy Hernandez 2013-01-22 00:23:03 UTC

An image with parameters/instructions would be ideal. Heck...not ideal...great! Thanks so much.

"mikpe at it dot uu.se" <gcc-bugzilla@gcc.gnu.org> wrote:


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55939

--- Comment #8 from Mikael Pettersson <mikpe at it dot uu.se> 2013-01-21 23:18:34 UTC ---
I'm using the ARAnym full-system m68040 emulator -- that's the only realistic
option for testing gcc on Linux/m68k at the moment.  I maintain my own small
Fedora-based distro, so if you like I can supply a disk image with that, plus
the scripts and ARAnym parameter files I use.  Otherwise you can probably find
HOWTOs and pointers to images on the Debian/m68k wiki.

Comment 10 Aldy Hernandez 2013-01-29 13:23:37 UTC

Confirmed with testcase in comment #5, compiling with:

[guest@aranym2 gcc]$ ./xgcc -B./ a.c -O2
[guest@aranym2 gcc]$ ./a.out
Aborted

Comment 11 Aldy Hernandez 2013-01-29 17:26:46 UTC

In adding debugging printfs throughout a locally reduced testcase, I ran into something peculiar, that may or may not be related to this PR.  Perhaps some floating point savvy person comment?

    double d, d2;
    ...
    if (d != d2) {
	dumpd(d,d2);
        return -1;
    }

By this point, "d" and "d2" are in fp2/fp3, and we compare them in extended-precision real mode (fcmp.x) below:

(gdb) si
0x80000670	154	    if (d != d2)
1: x/i $pc
=> 0x80000670 <check_random+260>:	fcmpx %fp3,%fp2
(gdb) info reg fp2
fp2            9.5869038983080764131982923476680188e-94	(raw 0x3ec90000fff801fffff84000)
(gdb) info reg fp3
fp3            9.5869038983080764090401284414100591e-94	(raw 0x3ec90000fff801fffff83ff8)

Notice above that the registers are slightly different, so d != d2.

(gdb) si
0x80000674	154	    if (d != d2)
1: x/i $pc
=> 0x80000674 <check_random+264>:	fbne 0x80000692 <check_random+294>
(gdb) 
156		dumpd(d,d2);
1: x/i $pc
=> 0x80000692 <check_random+294>:	fmoved %fp2,%sp@-
(gdb) si
0x80000696	156		dumpd(d,d2);
1: x/i $pc
=> 0x80000696 <check_random+298>:	fmoved %fp3,%sp@-
(gdb) 
0x8000069a	156		dumpd(d,d2);
1: x/i $pc
=> 0x8000069a <check_random+302>:	jsr 0x80000508 <dumpd>

Notice we pass fp2/fp3 in the stack as arguments to dumpd().  Interestingly, we move them with fmove.d which is double-precision (instead of the compare we did prior in .x mode).

With dumpd() implemented as below, "a" will be the same as "b", since we chopped off some bits with the fmove.d explained above.

void dumpd(double a, double b)
{
  ...
  if (a == b)
    dump_same();
  ...
  ...
}

dumpd() gets executed as follows:

1: x/i $pc
=> 0x8000050c <dumpd+4>:	fmoved %fp@(8),%fp1
(gdb) 
0x80000512	90	{
1: x/i $pc
=> 0x80000512 <dumpd+10>:	fmoved %fp@(16),%fp0
(gdb) info reg fp1
fp1            9.5869038983080764131982923476680188e-94	(raw 0x3ec90000fff801fffff84000)
(gdb) info reg fp0
fp0            9.5869038983080764131982923476680188e-94	(raw 0x3ec90000fff801fffff84000)

Notice fp0 and fp1 are now the same.

(gdb) 
93	  if (a == b)
1: x/i $pc
=> 0x80000528 <dumpd+32>:	fcmpx %fp0,%fp1
(gdb) 
0x8000052c	93	  if (a == b)
1: x/i $pc
=> 0x8000052c <dumpd+36>:	fbeq 0x80000538 <dumpd+48>
(gdb) 
110	}
1: x/i $pc
=> 0x80000538 <dumpd+48>:	unlk %fp

etc etc etc.

I'm a bit flaky as to FP standards, but it seems suspect that two values that are different, suddenly become the same when passed to a function.

Is this one of those FP quirks, and I'm horrendously misguided?

Comment 12 Jeffrey A. Law 2013-01-29 17:37:12 UTC

On 01/29/2013 10:26 AM, aldyh at gcc dot gnu.org wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55939
>
> --- Comment #11 from Aldy Hernandez <aldyh at gcc dot gnu.org> 2013-01-29 17:26:46 UTC ---
> In adding debugging printfs throughout a locally reduced testcase, I ran into
> something peculiar, that may or may not be related to this PR.  Perhaps some
> floating point savvy person comment?
>
>      double d, d2;
>      ...
>      if (d != d2) {
>      dumpd(d,d2);
>          return -1;
>      }
>
> By this point, "d" and "d2" are in fp2/fp3, and we compare them in
> extended-precision real mode (fcmp.x) below:
At this point alarm bells are already ringing...  When I see EQ/NE 
comparisons of floating points, there's a good bet something is wrong.


>
> (gdb) si
> 0x80000670    154        if (d != d2)
> 1: x/i $pc
> => 0x80000670 <check_random+260>:    fcmpx %fp3,%fp2
> (gdb) info reg fp2
> fp2            9.5869038983080764131982923476680188e-94    (raw
> 0x3ec90000fff801fffff84000)
> (gdb) info reg fp3
> fp3            9.5869038983080764090401284414100591e-94    (raw
> 0x3ec90000fff801fffff83ff8)
>
> Notice above that the registers are slightly different, so d != d2.
Right.

>
> (gdb) si
> 0x80000674    154        if (d != d2)
> 1: x/i $pc
> => 0x80000674 <check_random+264>:    fbne 0x80000692 <check_random+294>
> (gdb)
> 156        dumpd(d,d2);
> 1: x/i $pc
> => 0x80000692 <check_random+294>:    fmoved %fp2,%sp@-
> (gdb) si
> 0x80000696    156        dumpd(d,d2);
> 1: x/i $pc
> => 0x80000696 <check_random+298>:    fmoved %fp3,%sp@-
> (gdb)
> 0x8000069a    156        dumpd(d,d2);
> 1: x/i $pc
> => 0x8000069a <check_random+302>:    jsr 0x80000508 <dumpd>
>
> Notice we pass fp2/fp3 in the stack as arguments to dumpd().  Interestingly, we
> move them with fmove.d which is double-precision (instead of the compare we did
> prior in .x mode).
>
> With dumpd() implemented as below, "a" will be the same as "b", since we
> chopped off some bits with the fmove.d explained above.
Right.  This is very similar to the problems with the 80bit FP precision 
on i387.  The in-register format is 80 bits, but in memory the values 
are just 64 bits.  So a value can change simply by storing it into 
memory and reading it back out.

This causes all kinds of subtle problems -- think reload.  A programmer 
can't predict what the allocator is going to do, so a small change in 
the source might lead to changes in register allocation and a value 
which previously lived in a register might now be on the stack and 
reloaded.  And if the code is not tolerant of those extra bits 
appearing/disappearing causes problems.

A great example was Dreamworks where this extra precision suddenly 
appearing/disappearing was causing a single pixel to unexpectedly change 
color in one of their frame renders.

> Is this one of those FP quirks, and I'm horrendously misguided?
Yup, this is definitely one of those FP quirks.

It  might be helpful to see the entire function.

Jeff

Comment 13 Richard Henderson 2013-01-29 17:46:21 UTC

All hail extended precision, and lack of proper rounding operations
(until the 68040 anyway).

Aldy's analysis in #c11 suggests that compiling (all of gmp, including
the testcase) with -ffloat-store will probably fix this test case.

I can't immediately see the bug in gmp, as it looks as if most of the
double extraction should be done via integer arithmetic, but perhaps
if something went wrong in the configuration one of the fp arithmetic
fallback paths is being used accidentially...

Comment 14 Mikael Pettersson 2013-01-29 18:03:04 UTC

The wrong-code occurs in the test wrapper rather than in gmp proper, and the test wrapper does contain a helper function (my_ldexp) which computes and returns 'double'.  So the problem might simply be caused by excess precision.

Comment 15 Jakub Jelinek 2013-01-29 18:49:25 UTC

Perhaps -fexcess-precision=standard might fix this too (and be less expensive).

Comment 16 Aldy Hernandez 2013-01-29 19:47:24 UTC

>>       double d, d2;
>>       ...
>>       if (d != d2) {
>>       dumpd(d,d2);
>>           return -1;
>>       }
>>
>> By this point, "d" and "d2" are in fp2/fp3, and we compare them in
>> extended-precision real mode (fcmp.x) below:
> At this point alarm bells are already ringing...  When I see EQ/NE
> comparisons of floating points, there's a good bet something is wrong.

For the record, only the EQ/NE above comes from GMP.  The other one (in 
dumpd()) I added myself in testing.  But yes, that one "d != d2" is in 
the original testcase.

I will try the various suggestions and report back.

Thanks everyone.

Comment 17 Aldy Hernandez 2013-01-29 21:20:53 UTC

Both -fexcess-precision=standard and -ffloat-store fix the testcase.  This is perhaps a duplicate of PR323.

Although, Jakub mentions that according to:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323#c127

...-fexcess-precision=standard might not be 100% reliable on m68k, so you may have to go for -ffloat-store.

Mikael, can you check the GMP testsuite with -ffloat-store?  And perhaps see if there is a floating point EQ/NE compare in the GMP test assembly?

Thanks.

Comment 18 Mikael Pettersson 2013-01-30 08:53:31 UTC

gmp-5.0.5 builds and tests Ok on m68k-linux when configured with -ffloat-store in CFLAGS.  There is one "fcmp;fblt" and three "fcmp;fbne" sequences in the previously failing test case (just the .o file, so excluding libgmp.a).  There are 24 fcmps in libgmp.a, but they are all followed by relational conditional jumps, none is followed by an equality/inequality conditional jump.

Should I close this as a duplicate of PR323?

Comment 19 Jakub Jelinek 2013-01-30 08:57:24 UTC

Dup.

*** This bug has been marked as a duplicate of bug 323 ***

Comment 20 jsm-csl@polyomino.org.uk 2013-01-30 21:11:31 UTC

On Tue, 29 Jan 2013, jakub at gcc dot gnu.org wrote:

> Perhaps -fexcess-precision=standard might fix this too (and be less 
> expensive).

Note that -fexcess-precision=standard is not fully implemented for m68k: 
the back-end patterns claiming to carry out SFmode/DFmode operations, that 
actually do XFmode arithmetic, haven't been disabled for 
-fexcess-precision=standard as I did for x86 (along with some related 
fixes for conversions to SFmode/DFmode).  While the disabling of SFmode / 
DFmode operations is just for safety - if the architecture-independent 
parts of GCC are working correctly, it shouldn't matter whether they are 
enabled or not because there will be no arithmetic in those modes in the 
GIMPLE anyway - I don't know if m68k needs similar fixes to conversion 
operations.

Comment 21 Aldy Hernandez 2013-02-18 19:38:40 UTC

Another less invasive option could be to force the return value of my_ldexp() to go through memory, thus chopping off the excess precision before returning:

--- a.c	2013-02-18 20:37:23.000000000 +0100
+++ b.c	2013-02-18 20:36:42.000000000 +0100
@@ -404,7 +404,11 @@
 		e += 1;
 	    }
 	} else
-	  return d;
+	  {
+	    volatile double force_round = d;
+	    return force_round;
+
+	  }
     }
 }

But basically, you'd have to carefully keep track of when you have to use volatiles.